Below is an outline of my research over the years and some key publications.
For a full list of references, see Google Scholar.

Picture: points sampled from the bivariate von Mises distribution on the torus. This distribution belongs to directional statistics, a branch of statistics that deals with non-Euclidean data such as dihedral angles in biomolecules.

Structural biology

I did my PhD at the Free University of Brussels (VUB), Belgium, on the structural biology of lectins (proteins that bind complex sugars), studied by X-ray crystallography with the legume lectin family as a model system. Here are some of the key papers from that period. They shed light on how lectins distinguish between highly similar complex sugars, on the interplay between sugar binding and the quaternary structure of proteins, and on the structure of arcelin-5, a plant-defense protein related to the legume lectins. The review that appeared in BBA in 1998 is still a widely read paper on the subject.

Dolichos biflorus lectin

Structural bioinformatics

After my PhD in protein X-ray crystallography, I turned to structural bioinformatics. The Proteins (2005) article was my first publication in the field: a new way to look at solvent exposure, based on the surprising observation that the environment on the side-chain side of an amino acid differs radically from the environment on the opposite side with respect to the distribution of the number of neighbors. The result, Half Sphere Exposure (HSE), is now an established method to quantify the solvent exposure of an amino acid, next to contact number and accessible surface area. I also developed Biopython's Bio.PDB library, which is still one of the standard tools to extract data from PDB files.

  • Hamelryck, T. (2005). An amino acid has two sides: a new 2D measure provides a different view of solvent exposure. Proteins: Structure, Function, and Bioinformatics, 59(1), 38–48.
  • Hamelryck, T., Manderick, B. (2003). PDB file parser and structure class implemented in Python. Bioinformatics, 19(17), 2308–2310.
  • Cock, P. J., Antao, T., Chang, J. T., Chapman, B. A., Cox, C. J., Dalke, A., Friedberg, I., Hamelryck, T., Kauff, F., Wilczynski, B., and others. (2009). Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics, 25(11), 1422–1423.
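The idea behind HSE can be illustrated with a minimal, standard-library-only sketch. It is not the Bio.PDB implementation: the function name `half_sphere_exposure`, the 13 Å default radius, the use of the Cα→Cβ vector to define the side-chain direction, and the toy coordinates are all assumptions made for illustration. Neighbouring Cα atoms within the radius are counted separately in the half sphere that contains the side chain ("up") and in the opposite half sphere ("down").

```python
import math


def half_sphere_exposure(ca_coords, cb_coords, index, radius=13.0):
    """Return (hse_up, hse_down) for the residue at position `index`.

    ca_coords and cb_coords are lists of (x, y, z) tuples in Angstrom.
    The radius is an illustrative default, not a recommended value.
    """
    ca = ca_coords[index]
    cb = cb_coords[index]
    # Vector pointing from C-alpha towards the side chain (C-beta).
    side_chain_direction = [b - a for a, b in zip(ca, cb)]

    up, down = 0, 0
    for j, other_ca in enumerate(ca_coords):
        if j == index:
            continue
        to_neighbor = [o - a for a, o in zip(ca, other_ca)]
        distance = math.sqrt(sum(d * d for d in to_neighbor))
        if distance > radius:
            continue  # outside the sphere: not a neighbor
        # The sign of the dot product decides which half sphere
        # the neighbor falls into.
        dot = sum(s * d for s, d in zip(side_chain_direction, to_neighbor))
        if dot >= 0:
            up += 1
        else:
            down += 1
    return up, down


# Toy coordinates (hypothetical, not taken from a real structure).
ca_coords = [
    (0.0, 0.0, 0.0),    # residue of interest
    (0.0, 0.0, 5.0),    # on the side-chain side
    (3.0, 0.0, 4.0),    # on the side-chain side
    (0.0, 0.0, -6.0),   # on the opposite side
    (0.0, 0.0, 20.0),   # too far away to count
]
cb_coords = [
    (0.0, 0.0, 1.5),
    (0.0, 1.5, 5.0),
    (3.0, 1.5, 4.0),
    (0.0, 1.5, -6.0),
    (0.0, 1.5, 20.0),
]

print(half_sphere_exposure(ca_coords, cb_coords, 0))  # (2, 1)
```

In this toy example the two counts differ, which is the asymmetry that a single contact number cannot capture.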

Bio.PDB

Statistical structural bioinformatics

Combining Bayesian networks and directional statistics, I worked on formulating proper probabilistic models of protein and RNA structure. I think this work remains relevant for moving beyond the point predictions obtained from non-probabilistic deep network heuristics. Much of this work has been done with my two close collaborators at the University of Leeds, Prof. Kanti V. Mardia and Prof. John T. Kent, and my former PhD student (now professor) Wouter Boomsma. The evolutionary model published in 2017 was developed together with Prof. Jotun Hein (University of Oxford) and his PhD student Michael Golden.

  • Hamelryck, T., Kent, J. T., Krogh, A. (2006). Sampling realistic protein conformations using local structural bias. PLoS Computational Biology, 2, e131.
  • Boomsma, W., Mardia, K. V., Taylor, C. C., Ferkinghoff-Borg, J., Krogh, A., Hamelryck, T. (2008). A generative, probabilistic model of local protein structure. Proceedings of the National Academy of Sciences, 105(26), 8932–8937.
  • Frellsen, J., Moltke, I., Thiim, M., Mardia, K. V., Ferkinghoff-Borg, J., Hamelryck, T. (2009). A probabilistic model of RNA conformational space. PLoS Computational Biology, 5(6), e1000406.
  • Harder, T., Boomsma, W., Paluszewski, M., Frellsen, J., Johansson, K. E., Hamelryck, T. (2010). Beyond rotamers: a generative, probabilistic model of side chains in proteins. BMC Bioinformatics, 11(1), 1–13.
  • Hamelryck, T., Mardia, K. V., Ferkinghoff-Borg, J. (Eds.) (2012). Bayesian methods in structural bioinformatics. Springer series "Statistics for biology and health", 385 pages, 13 chapters. Springer Verlag, March 2012.
  • Golden, M., García-Portugués, E., Sørensen, M., Mardia, K. V., Hamelryck, T., Hein, J. (2017). A generative angular model of protein structure evolution. Molecular Biology and Evolution, 34(8), 2085–2100.

torusdbn

One of the results I am particularly happy with is the explanation of why so-called potentials of mean force work for protein structure prediction. Sippl-type "potentials of mean force" are widely used in simulations of protein folding (AlphaFold1 made use of them, for example). These potentials are typically justified by analogy with the (well-defined) potentials of mean force used in the physics of liquids. This justification is incorrect. We showed that these potentials are actually an application of Jeffrey's conditioning, a special form of Bayesian updating. This led to the formulation of the reference ratio method, which can be used to update probabilistic models of local protein structure using nonlocal models (for example, models of pairwise distances). This is a fine piece of probabilistic reasoning with many potential applications. We applied this method to the probabilistic modelling of structure ensembles of flexible proteins.

  • Hamelryck, T., Borg, M., Paluszewski, M., Paulsen, J., Frellsen, J., Andreetta, C., Boomsma, W., Bottaro, S., Ferkinghoff-Borg, J. (2010). Potentials of mean force for protein structure prediction vindicated, formalized and generalized. PLoS ONE, 5(11), e13714.
  • Olsson, S., Frellsen, J., Boomsma, W., Mardia, K. V., Hamelryck, T. (2013). Inference of structure ensembles of flexible biomolecules from sparse, averaged data. PLoS ONE, 8(11), e79439.
  • Valentin, J. B., Andreetta, C., Boomsma, W., Bottaro, S., Ferkinghoff-Borg, J., Frellsen, J., Mardia, K. V., Tian, P., Hamelryck, T. (2014). Formulation of probabilistic models of protein structure in atomic detail using the reference ratio method. Proteins: Structure, Function, and Bioinformatics, 82(2), 288–299.
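The core of the reference ratio update can be sketched on a toy discrete example. This is only an illustration under stated assumptions, not the published implementation: the function name `reference_ratio_update`, the four hypothetical states, and the coarse-grained variable (the number of "b" characters) are made up for the example. A local model q(x) is reweighted by the ratio p(y)/q(y), where y = f(x) is a coarse-grained variable, p(y) is the desired distribution over y, and q(y) is the distribution over y implied by the local model.

```python
from collections import defaultdict


def reference_ratio_update(q, coarse, target):
    """Reweight q(x) so that its marginal over y = coarse(x) equals target.

    q:      dict mapping each state x to its probability under the local model
    coarse: function mapping a state x to a coarse-grained value y
    target: dict mapping each y to its desired probability p(y)
    """
    # Reference distribution q(y): what the local model implies about y.
    q_y = defaultdict(float)
    for x, prob in q.items():
        q_y[coarse(x)] += prob

    # Jeffrey-style update: p(x) = q(x) * p(y) / q(y), with y = coarse(x).
    return {
        x: prob * target[coarse(x)] / q_y[coarse(x)]
        for x, prob in q.items()
    }


# Hypothetical local model over four states.
q = {"aa": 0.4, "ab": 0.3, "ba": 0.2, "bb": 0.1}


def count_b(state):
    # Coarse-grained variable: number of "b" characters in the state.
    return state.count("b")


# Desired distribution over the coarse-grained variable.
target = {0: 0.2, 1: 0.3, 2: 0.5}

updated = reference_ratio_update(q, count_b, target)

# The marginal of the updated model over y now matches the target,
# while the relative weights of states sharing the same y are unchanged.
marginal = defaultdict(float)
for x, prob in updated.items():
    marginal[count_b(x)] += prob
print(updated)
print(dict(marginal))
```

The states "ab" and "ba" keep their 3:2 ratio from the local model; only the total probability assigned to each value of y changes.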

reference ratio

I also served as main editor of a book on Bayesian methods in structural biology for Springer's "Statistics for biology and health" series.

book

Deep probabilistic programming

Deep probabilistic programming combines the modelling scope of deep learning with the principled treatment of uncertainty of Bayesian statistics. My group has applied this new paradigm in machine learning to Bayesian protein structure superposition, a deep generative model of local protein structure (used in vaccine design), and the reconstruction of ancestral protein sequences. The latter makes use of a deep model of protein evolution based on an Ornstein-Uhlenbeck process on a phylogenetic tree (see figure below). The model can be interpreted as an ensemble of variational autoencoders whose latent variables diffuse on a phylogenetic tree. Deep probabilistic programming and its applications are the current focus of my research group.
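How a latent variable can diffuse on a tree under an Ornstein-Uhlenbeck (OU) process can be sketched as follows. This is a simplified illustration, not the group's model: it uses a single scalar latent variable, omits the decoder and inference entirely, and the names `ou_transition` and `simulate_ou_on_tree`, the tree encoding, and all parameter values are assumptions. Along a branch of length t, the child value is drawn from a Gaussian whose mean relaxes from the parent value towards the long-term mean `mu`, and whose variance grows towards the stationary variance sigma²/(2·theta).

```python
import math
import random


def ou_transition(x_parent, t, theta, mu, sigma, rng):
    """Sample the OU process after time t, starting from x_parent.

    Uses the exact Gaussian transition density of the process
    dX = theta * (mu - X) dt + sigma dW, with theta > 0.
    """
    decay = math.exp(-theta * t)
    mean = mu + (x_parent - mu) * decay
    variance = sigma ** 2 / (2.0 * theta) * (1.0 - decay ** 2)
    return rng.gauss(mean, math.sqrt(variance))


def simulate_ou_on_tree(tree, root, x_root, theta, mu, sigma, seed=0):
    """Propagate a scalar latent value from the root to all other nodes.

    tree maps a node name to a list of (child name, branch length) pairs;
    leaves may simply be absent from the mapping.
    """
    rng = random.Random(seed)
    values = {root: x_root}
    stack = [root]
    while stack:
        node = stack.pop()
        for child, branch_length in tree.get(node, []):
            values[child] = ou_transition(
                values[node], branch_length, theta, mu, sigma, rng
            )
            stack.append(child)
    return values


# Hypothetical four-node tree: root -> A, root -> B, A -> C.
tree = {"root": [("A", 1.0), ("B", 2.0)], "A": [("C", 0.5)]}

latent = simulate_ou_on_tree(
    tree, "root", x_root=2.0, theta=1.0, mu=0.0, sigma=0.5
)
for name, value in latent.items():
    print(name, round(value, 3))
```

Closely related nodes (short branches) end up with correlated latent values, while long branches pull the values back towards the stationary distribution.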

deep evolutionary model

Thomas Hamelryck

thamelry@bio.ku.dk

Department of biology
Section for Computational and RNA Biology (SCARB)
Ole Maaløes Vej 5
DK-2200 Copenhagen N

and

Department of computer science
Programming languages and theory of computation section (PLTC)
Universitetsparken 5, HCØ, building B
DK-2100 Copenhagen Ø

University of Copenhagen
Denmark
