Computational Biology of Infection Research

The group focuses on the data-driven analysis of biological questions in infection research and on developing methods to solve prediction problems for large biological data sets.

Metagenomics Research

Research Overview

Metagenomics is a young research area concerned with the sequencing and study of whole communities of microorganisms, as opposed to the sequencing of genomes of individual organisms obtained in pure culture. Because an estimated less than 1% of all microorganisms can be cultured using standard techniques, current knowledge of prokaryote biology and the collection of sequenced genomes are strongly biased towards a few well-characterized phyla, while very little is known about the vast majority of the prokaryotic world. Metagenomics has the potential to correct this imbalance and has already delivered an enormous gain in biological knowledge. This is reflected in its recognition by the journal Science as one of the top ten scientific breakthroughs of 2004.

Taxonomic metagenome sequence assignment with structured output models

Computational inference of the taxonomic origin of sequence fragments is an essential step in metagenome analysis. Fragments can be assigned to individual populations or to corresponding higher-level evolutionary clades using methods based on homology, sequence similarity or sequence composition. This is a challenging task, because reference sequences are unavailable for most uncultured microorganisms and large amounts of data have to be processed. With this in mind, we introduce PhyloPythiaS, a successor to our previously published method PhyloPythia. It is a fast and accurate sequence composition-based classifier built on the structured output paradigm.
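As a rough illustration of what "sequence composition" means in this context, the sketch below computes a normalized k-mer frequency profile for a DNA fragment and compares two such profiles. It is not PhyloPythiaS and does not show the structured output model; the function names, the choice of k and the Manhattan distance are assumptions made only for this example.

```python
from collections import Counter
from itertools import product


def kmer_profile(sequence, k=4):
    """Return normalized k-mer frequencies of a DNA sequence as a dict.

    Hypothetical helper for illustration; k-mers containing ambiguous
    bases (anything other than A, C, G, T) are ignored.
    """
    sequence = sequence.upper()
    # Count every overlapping window of length k.
    counts = Counter(
        sequence[i:i + k] for i in range(len(sequence) - k + 1)
    )
    # Enumerate all 4**k unambiguous k-mers so every profile has the same keys.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[kmer] for kmer in kmers)
    if total == 0:
        return {kmer: 0.0 for kmer in kmers}
    return {kmer: counts[kmer] / total for kmer in kmers}


def profile_distance(p, q):
    """Manhattan distance between two profiles built with the same k."""
    return sum(abs(p[kmer] - q[kmer]) for kmer in p)
```

A composition-based classifier would use such profiles as input features, for example by comparing a fragment's profile with profiles derived from reference sequences of known taxonomic origin.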

Discovery of phenotype-defining functional modules

Metagenomic sequencing has retrieved large numbers of novel genes from uncultivable microorganisms that are of biotechnological, medical or agricultural interest. For instance, 1,700 novel protein families were discovered in a comprehensive sequencing study of the marine planktonic microbiota from oceanic surface waters. However, the functions and corresponding biological processes of these genes remain largely unknown, as they lack homology to experimentally characterized proteins of known function. We have developed a probabilistic method that processes large sets of (meta-)genome annotations and captures the inherent functional relationships between gene products acting in concert in a biological process. Such a model can be used to describe the functional modules encoded in a (meta-)genome, to discover new protein families belonging to known biological pathways and protein complexes, and to determine functional modules that are essential for a particular phenotype.


  • Johannes Dröge
  • Rubén Garrido Oter
  • Ivan Gregor
  • Sebastian Konietzny
  • Philipp Münch
  • Dr. Yao Pan 
  • Aaron Weimann 

Present and former collaborators

  • Andreas Brune, Research Group Leader, Department of Biogeochemistry, Max-Planck Institute for Terrestrial Microbiology, Marburg, Germany 
  • Mila Chistoserdova, Department of Chemical Engineering, University of Washington, Seattle, WA, USA
  • Jeffrey Gordon and Peter Turnbaugh, Center for Genome Sciences, Washington University, St. Louis, MO, USA
  • Phil Hugenholtz, Australian Center for Ecogenomics, Queensland, Australia 
  • Karl-Erich Jäger, Institute of Molecular Enzyme Technology, Research Center Jülich, Jülich, Germany
  • Mark Morrison, CSIRO Livestock Industries, Queensland, Australia 
  • Phil Pope and Vincent Eijsink, Norwegian University of Life Sciences, Aas, Norway
  • Isidore Rigoutsos, Computational Medicine Center, Thomas Jefferson University, Philadelphia, PA, USA
  • Paul Schulze-Lefert and Davide Bulgarelli, Max-Planck Institute for Plant Breeding Research, Cologne, Germany