Computational Biology of Infection Research
Metagenomics is a young research area that deals with the sequencing and study of whole communities of microorganisms, as opposed to the sequencing of genomes of individual organisms obtained in pure culture. Because, by current estimates, less than 1% of all microorganisms can be cultured using standard techniques, current knowledge of prokaryote biology and the collection of sequenced genomes are strongly biased towards a few well-characterized phyla, while very little is known about the vast majority of the prokaryotic world. Metagenomics has the potential to correct this imbalance and has already delivered an enormous gain in biological knowledge. This is also reflected in its recognition by Science as one of the top ten scientific breakthroughs of 2004.
Taxonomic metagenome sequence assignment with structured output models
Computational inference of the taxonomic origin of sequence fragments is an essential step in metagenome analysis. Fragments can be assigned to individual populations or corresponding higher-level evolutionary clades using methods based on homology, sequence similarity or sequence composition. This is a challenging task, as reference sequences are unavailable for most uncultured microorganisms and large amounts of data have to be processed. With this in mind, we introduce PhyloPythiaS, a successor to our previously published method PhyloPythia. It is a fast and accurate sequence compositional classifier based on the structured output paradigm.
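To illustrate what "sequence composition" means in this context, the following minimal sketch turns a DNA fragment into a vector of normalized k-mer frequencies, the kind of feature a compositional classifier could be trained on. It is not the PhyloPythiaS implementation: the function name `kmer_profile`, the choice of k = 4 and the handling of ambiguous bases are assumptions made purely for illustration.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_profile(sequence, k=4):
    """Return normalized k-mer frequencies of a DNA sequence.

    Hypothetical helper for illustration only. The vector has 4**k
    entries, ordered lexicographically (AAAA, AAAC, ...). Windows that
    contain characters other than A, C, G or T are skipped.
    """
    sequence = sequence.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        if set(window) <= set(ALPHABET):
            counts[window] += 1
    total = sum(counts.values())
    # An all-zero vector is returned when no valid window exists.
    return [counts[kmer] / total if total else 0.0 for kmer in kmers]


if __name__ == "__main__":
    profile = kmer_profile("ACGTACGTTTGACA", k=2)
    print(len(profile))  # 16 entries for k = 2
```

Vectors of this kind, computed for fragments of known origin, could serve as training input for any classifier; the structured output model itself is not shown here.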
Discovery of phenotype-defining functional modules
Metagenomic sequencing has retrieved large numbers of novel genes from uncultivable microorganisms that are of biotechnological, medical or agricultural interest. For instance, 1,700 novel protein families were discovered in a comprehensive sequencing study of the marine planktonic microbiota from oceanic surface waters. However, the functions and corresponding biological processes of these genes remain largely unknown, as they lack homology to experimentally characterized proteins of known function. We have developed a probabilistic method that processes large sets of (meta-)genome annotations and captures the inherent functional relationships between gene products acting in concert in a biological process. Such a model can be used to describe the functional modules encoded in a (meta-)genome, to discover new protein families belonging to known biological pathways and protein complexes, and to determine functional modules that are essential for a particular phenotype.
- Johannes Dröge
- Rubén Garrido Oter
- Ivan Gregor
- Sebastian Konietzny
- Philipp Münch
- Dr. Yao Pan
- Aaron Weimann
Present and former collaborators
- Andreas Brune, Research Group Leader, Department of Biogeochemistry, Max-Planck Institute for Terrestrial Microbiology, Marburg, Germany
- Mila Chistoserdova, Department of Chemical Engineering, University of Washington, Seattle, WA, USA
- Jeffrey Gordon and Peter Turnbaugh, Center for Genome Sciences, Washington University, St. Louis, MO, USA
- Phil Hugenholtz, Australian Center for Ecogenomics, Queensland, Australia
- Karl-Erich Jäger, Institute of Molecular Enzyme Technology, Research Center Jülich, Jülich, Germany
- Mark Morrison, CSIRO Livestock Industries, Queensland, Australia
- Phil Pope, Vincent Eijsink, Norwegian University of Life Sciences, Aas, Norway
- Isidore Rigoutsos, Computational Medicine Center, Thomas Jefferson University, Philadelphia, PA, USA
- Paul Schulze-Lefert, Davide Bulgarelli, Max-Planck Institute for Plant Breeding Research, Cologne, Germany
De novo prediction of the genomic components and capabilities for microbial plant biomass degradation from (meta-)genomes
A. Weimann, Y. Trukhina, P.B. Pope, S.G.A. Konietzny, A.C. McHardy. Biotechnology for Biofuels 2013, 6: 24, PMC3585893
Understanding the biological mechanisms used by microorganisms for plant biomass degradation is of considerable biotechnological interest. Despite the growing number of sequenced (meta-)genomes of plant biomass-degrading microbes, there is currently no technique for the systematic determination of the genomic components of this process from these data. We describe a computational method for the discovery of the protein domains and CAZy families involved in microbial plant biomass degradation. Our method furthermore accurately predicts the capability of microbial species to degrade plant biomass from their genome sequences. Application to a large, manually curated data set of microbial degraders and non-degraders identified gene families of enzymes known from physiological and biochemical tests to be implicated in cellulose degradation, such as GH5 and GH6. Additionally, genes of enzymes that degrade other plant polysaccharides, such as hemicellulose, pectins and oligosaccharides, were found, as well as gene families that have not previously been related to the process. For draft genomes reconstructed from a cow rumen metagenome, our method predicted Bacteroidetes-affiliated species and a relative of a known plant biomass degrader to be plant biomass degraders. This was supported by the presence of genes encoding enzymatically active glycoside hydrolases in these genomes. Our results show the potential of the method for generating novel insights into microbial plant biomass degradation from (meta-)genome data, where there is an increasing production of genome assemblages for uncultured microbes.
Taxonomic metagenome sequence assignment with structured output models
K. Patil, P. Haider, P.B. Pope, P.J. Turnbaugh, M. Morrison, T. Scheffer, A.C. McHardy. Nature Methods 2011, 8: 191-192
Computational inference of the taxonomic origin of sequence fragments is an essential step in metagenome analysis. Fragments can be assigned to individual populations or corresponding higher-level evolutionary clades using methods based on homology, sequence similarity or sequence composition. This is a challenging task, as reference sequences are unavailable for most uncultured microorganisms and large amounts of data have to be processed. With this in mind, we introduce PhyloPythiaS, a successor to our previously published method PhyloPythia. It is a fast and accurate sequence compositional classifier based on the structured output paradigm. We evaluated PhyloPythiaS on simulated and real metagenome data in comparison to four other methods: PhyloPythia, metagenome analyzer (MEGAN), Phymm and PhymmBL. PhyloPythiaS performed particularly well for taxonomic assignment of populations from novel genera, orders or higher-level clades when limited amounts of reference data were available. We observed that accurate assignments were obtained based on 100 kilobases (kb) of training data for a sample population. We observed this for simulated data and a predominant population of a novel family of the order Aeromonadales from the Australian Tammar wallaby gut. We performed experiments on simulated data for four settings: with genomes of the same species for the dominant strains made available as reference data (known species) or with genomes of the corresponding higher-level clades (genus, order and class) removed, while retaining 100 kb of sequence for the dominant strains. When the higher-level clades were removed, alignment-based methods performed poorly. If closely related genomes were available, the performance of all methods became more similar, with a slight advantage for alignment-based approaches. We observed this for simulated data and the predominant genera of two human gut metagenomes.
Prof Dr Alice McHardy
Head of department Computational Biology of Infection Research
+49 531 391-55271