Our Research
The Department of “Computational Biology for Infection Research” studies the human microbiome, viral and bacterial pathogens, and human cell lineages within individual patients by analyzing large-scale biological and epidemiological data sets with computational techniques. Focusing on high-throughput meta’omics, population genomic and single-cell sequencing data, we produce testable hypotheses, such as sets of key sites or relevant genes implicated in the onset of a disease, antibiotic resistance or immune defense. We work with experimental collaborators to verify our findings and to promote their translation into medical treatments or diagnostic procedures. To achieve its research goals, the department also develops novel algorithms and software.
Prof Dr Alice McHardy
Alice Carolyn McHardy holds a diploma in biochemistry and a doctoral degree (Dr. rer. nat.) in bioinformatics, both from Bielefeld University in Germany. From 2005 to 2007 she was first a postdoc and then a permanent staff member in the Bioinformatics and Pattern Discovery Group at the IBM T.J. Watson Research Center in Yorktown Heights, USA.
She then became head of the independent research group for Computational Genomics and Epidemiology at the Max Planck Institute for Informatics in Saarbrücken. In 2010, she was appointed Chair of Algorithmic Bioinformatics at Heinrich Heine University in Düsseldorf.
In 2014, she became head of the Department of Computational Biology for Infection Research at the Helmholtz Centre for Infection Research in Braunschweig and was appointed as a full professor at TU Braunschweig.
Team
Web Applications
Traitar is a web service for phenotyping bacteria based on their genome sequences.
PhyloPythiaS+ - a self-training method for the rapid reconstruction of low-ranking taxonomic bins from metagenomes
PhyloPythiaS - the PhyloPythiaS Web Server for Taxonomic Assignment of Metagenome Sequences.
Taxator-tk performs taxonomic sequence assignment by fast approximate determination of evolutionary neighbors from sequence similarities.
AdaPatch is a method for detecting positively selected patches of sites on the surface of viral proteins, which are likely candidates for being relevant for adaptive evolution.
Software Downloads
Software implementing the group’s research can be downloaded here or from GitHub at github.com/hzi-bifo, including:
SDplots: software for the detection of selective sweeps to monitor the adaptation of influenza A viruses
SDplots VaccineUpdates: results of the bi-annual vaccine strain prediction for influenza A viruses
PatchDetection: software for the detection of protein patches under positive selection
Phylogeography: software for phylogeographical reconstruction to infer origin and spread routes of viral pathogen outbreaks
FrechetTreeDistances: distances between phylogeographic reconstructions across tree topologies.
DiTaxa: nucleotide-pair encoding of 16S rRNA sequences for host phenotype and biomarker detection
CAMISIM: Simulating metagenomes and microbial communities
AMBER: Assessment of Metagenome BinnERs
OPAL: Open-community Profiling Assessment tooL
CAMITAX: Taxon labels for microbial genomes
Selected Publications
- Meyer, F., Fritz, A., Deng, Z. L., Koslicki, D., Lesker, T. R., Gurevich, A., Robertson, G., Alser, M., Antipov, D., Beghini, F., Bertrand, D., Brito, J. J., Brown, C. T., Buchmann, J., Buluç, A., Chen, B., Chikhi, R., Clausen, P., Cristian, A., Dabrowski, P. W., Darling, A. E., Egan, R., Eskin, E., Georganas, E., Goltsman, E., Gray, M. A., Hansen, L. H., Hofmeyr, S., Huang, P., Irber, L., Jia, H., Jørgensen, T. S., Kieser, S. D., Klemetsen, T., Kola, A., Kolmogorov, M., Korobeynikov, A., Kwan, J., LaPierre, N., Lemaitre, C., Li, C., Limasset, A., Malcher-Miranda, F., Mangul, S., Marcelino, V. R., Marchet, C., Marijon, P., Meleshko, D., Mende, D. R., Milanese, A., Nagarajan, N., Nissen, J., Nurk, S., Oliker, L., Paoli, L., Peterlongo, P., Piro, V. C., Porter, J. S., Rasmussen, S., Rees, E. R., Reinert, K., Renard, B., Robertsen, E. M., Rosen, G. L., Ruscheweyh, H. J., Sarwal, V., Segata, N., Seiler, E., Shi, L., Sun, F., Sunagawa, S., Sørensen, S. J., Thomas, A., Tong, C., Trajkovski, M., Tremblay, J., Uritskiy, G., Vicedomini, R., Wang, Z., Wang, Z., Wang, Z., Warren, A., Willassen, N. P., Yelick, K., You, R., Zeller, G., Zhao, Z., Zhu, S., Zhu, J., Garrido-Oter, R., Gastmeier, P., Hacquard, S., Häußler, S., Khaledi, A., Maechler, F., Mesny, F., Radutoiu, S., Schulze-Lefert, P., Smit, N., Strowig, T., Bremges, A., Sczyrba, A. & McHardy, A. C. Critical Assessment of Metagenome Interpretation: the second round of challenges. Nat Methods (2022) 19, 429, doi:10.1038/s41592-022-01431-4.
- Asgari, E., Münch, P. C., Lesker, T. R., McHardy, A. C.* & Mofrad, M. R. K.* (*shared last authors) DiTaxa: nucleotide-pair encoding of 16S rRNA for host phenotype and biomarker detection. Bioinformatics (2019) 35, 2498, doi:10.1093/bioinformatics/bty954.
- Bankwitz, D.*, Bahai, A.*, Labuhn, M., Doepke, M., Ginkel, C., Khera, T., Todt, D., Ströh, L. J., Dold, L., Klein, F., Klawonn, F., Krey, T., Behrendt, P., Cornberg, M., McHardy, A. C.* & Pietschmann, T.* (*shared first and last authors) Hepatitis C reference viruses highlight potent antibody responses and diverse viral functional interactions with neutralising antibodies. Gut (2021) 70, 1734, doi:10.1136/gutjnl-2020-321190.
- Fritz, A.*, Hofmann, P.*, Majda, S., Dahms, E., Dröge, J., Fiedler, J., Lesker, T. R., Belmann, P., DeMaere, M. Z., Darling, A. E., Sczyrba, A., Bremges, A. & McHardy, A. C. (*shared first authors) CAMISIM: simulating metagenomes and microbial communities. Microbiome (2019) 7, 17, doi:10.1186/s40168-019-0633-6.
- Münch, P. C., Franzosa, E. A., Stecher, B., McHardy, A. C.* & Huttenhower, C.* (*shared last authors) Identification of Natural CRISPR Systems and Targets in the Human Microbiome. Cell Host Microbe (2021) 29, 94, doi:10.1016/j.chom.2020.10.010.
Publications
Computational biology of viral pathogens
The Research Department “Computational Biology for Infection Research” at the HZI uses computational techniques to study rapidly evolving viral pathogens, such as influenza, hepatitis and human cytomegaloviruses, and their coevolution with the adaptive immune response of the human host. A particular focus is on influenza A viruses, where we combine epidemiological, genetic, antigenic and structural information on circulating viral strains to determine the antigenicity-altering areas on the protein structure, key sites and amino acid changes, and analyze how these affect viral fitness with regard to escaping the human immune response [1,3,5]. In collaboration with infection biologists and immunologists from the HZI, the Hannover Medical School and the German Centre for Infection Research (DZIF), we also study how influenza viruses adapt to novel hosts and maintain fitness under immune selection in animal models. Viral pathogens such as hepatitis and human cytomegaloviruses differ from influenza viruses in that they cause chronic rather than short-term infections. Together with collaborators, we track the evolution of these pathogens and the corresponding adaptive immune ‘evolution’ of the host within individual patients over time. We thus aim to generate novel insights into pathogen-host co-evolution and identify leads of translational relevance, such as for the development of a universal vaccine against hepatitis C virus infections.
We focus on:
- Computational prediction of suitable vaccine strains for human influenza A
- Determining epitopes for broadly neutralizing antibodies for development of a “universal vaccine” against hepatitis C virus infections
- Reconstructing viral haplotypes from deep sequencing data
- Inference of viral spread trajectories with viral phylogeographies
- Development of a universal vaccine against influenza A viruses
SARS-CoV-2 resources
VirusTracker
High-resolution reconstruction of the global spread of COVID-19 via air travel, based on genome and geographic location data available up to mid-February 2020, using the method of Reimering et al. (PLOS Computational Biology, 2020). Interactive visualisations using https://nextstrain.github.io/auspice/ are available at https://corona.bifo.helmholtz-hzi.de/ncov2.
Collaborators
- Wulf Blankenfeldt, Department of Structure and Functions of Proteins, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Mohammad Mofrad, Departments of Bioengineering and Mechanical Engineering, University of California, Berkeley, USA
- Gülsah Gabriel, Heinrich Pette Institute, Leibniz Institute for Experimental Virology, Hamburg, Germany
- Carlos Guzmán, Department of Vaccinology and Applied Microbiology, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Thomas Pietschmann, Institute for Experimental Virology, Twincore Centre for Experimental and Clinical Infection Research, Hannover, Germany (DZIF collaboration)
- Thomas Schulz, Institute of Virology, Hannover Medical School (MHH), Hannover, Germany
- Klaus Schughart, Infection Genetics, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Thomas Krey, Institute of Virology, Hannover Medical School (MHH), Hannover, Germany
Computational microbiome research
A research focus of BIFO is the study of microbial communities, including bacteria, viruses and eukaryotic community members, and their relevance for human health and disease. The human microbiota is implicated in a variety of diseases and is the subject of experimental studies at HZI. Direct metagenome, -transcriptome or -proteome sequencing of microbial community samples enables the study of microorganisms that cannot be obtained in pure culture, which make up the vast majority of the microbial world.
Research in BIFO focuses on establishing data-driven computational approaches that further advance individualized infection medicine in the clinic, such as computational biomarker discovery from microbial omics data, i.e. genotype-phenotype and genotype-environment inference, and the data-driven discovery of molecular predictors of host disease status and pathogen phenotypes. We also develop methods for common meta’ome data types, and promote the development of standards and best practices via the Initiative for the Critical Assessment of Metagenome Interpretation (CAMI).
BIFO currently focuses on the following problems and questions:
- Can we identify biomarkers for clinically relevant phenotypes from microbiome data using machine learning approaches and reliably predict these phenotypes? This is particularly relevant for the analysis of cost-efficient 16S data, which, however, do not encode any information about the functional gene repertoire of a sample.
- Which software is particularly well suited for processing different kinds of metagenome samples? A. McHardy founded and organizes (together with A. Sczyrba) CAMI, the Initiative for the Critical Assessment of Metagenome Interpretation, which aims to establish standards and best practices in metagenome analysis by organizing benchmarking challenges for method developers.
- Can we reconstruct the genomes of individual strains from metagenomic data? This question is of great clinical relevance, as individual strains of the same species can have very different phenotypes (e.g. the probiotic E. coli Nissle versus the EHEC strain).
- Which traces does the adaptation of microbial communities to a certain environment leave in the microbiome? Specifically, we are interested in this question for the human microbiota and for the spread of antibiotic resistance.
- What can we learn about the role of the microbial CRISPR-Cas system in the human microbiome by systematic metagenome analyses combined with deep learning techniques?
Selected publications
Researchers
- Dr. Fernando Meyer
- Dr. Ehsaneddin Asgari
- Dr. Till Robin Lesker
- Dr. Zhiluo Deng
- Adrian Fritz
- Philipp Münch
- Tzu-Hao Kuo
Collaborators
Current:
- Justin O’Grady & Gemma Kay, Quadram Institute, Norwich, UK
- Markus Cornberg, Hannover Medical School, Hannover, Germany
- Thomas Schulz, Hannover Medical School, Hannover, Germany
- Curtis Huttenhower, Harvard T.H. Chan School of Public Health, Boston, MA, U.S.
- Barbara Stecher, Medical Microbiology and Hospital Epidemiology, Max von Pettenkofer Institute, Ludwig Maximilian University of Munich, Munich, Germany
- Phil Pope and Vincent Eijsink, Norwegian University of Life Sciences, Aas, Norway
- Nadine Ziemert, Natural Product Genome Mining, Eberhard Karls University of Tübingen, Tübingen, Germany (DZIF collaboration)
- Alexander Sczyrba, Aaron Darling, Tanja Woyke… and further members of the CAMI initiative
- Till Strowig, Microbial Immune Regulation, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
Past:
- Paul Schulze-Lefert, Max Planck Institute for Plant Breeding Research, Cologne, Germany
- Phil Pope and Vincent Eijsink, Norwegian University of Life Sciences, Aas, Norway
- Johannes Gescher, Institute of Applied Biosciences (IAB), Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
- Mark Morrison, CSIRO Livestock Industries, Queensland, Australia
- Jeffrey Gordon and Peter Turnbaugh, Center for Genome Sciences, Washington University, St. Louis, Missouri, USA
- Phil Hugenholtz, Australian Center for Ecogenomics, Queensland, Australia
- Isidore Rigoutsos, Computational Medicine Center, Thomas Jefferson University, Philadelphia, Pennsylvania, USA
- Andreas Brune, Research Group Leader, Department of Biogeochemistry, Max Planck Institute for Terrestrial Microbiology, Marburg, Germany
- Mila Chistoserdova, Department of Chemical Engineering, University of Washington, Seattle, Washington, USA
Machine Learning
High-throughput sequencing combined with metagenomics has uncovered thousands of novel bacterial species directly from samples without isolation or culturing. This enables fine-grained analyses of the functions of microbial community members, and the study of their association with phenotypes and environments, as well as studies of microevolution and adaptation to changing environmental conditions. These sequencing efforts have produced vast amounts of data that can be analyzed using machine learning methods and can help to uncover relationships between human health and disease.
In this research focus we are:
- Developing novel machine learning methods implemented as software libraries that can be used by the research communities.
- Developing efficient machine learning models that utilize the vast amount of unlabeled biological data.
- Applying machine learning methods to microbial datasets to analyze the interaction between bacteria and the mammalian host (see Computational microbiome research).
- Developing machine and deep learning models to predict properties of molecules and identify their most important structural characteristics.
- Developing machine learning methods to predict B-cell epitopes for epitope mapping and designing novel vaccines.
We focus on:
- GenomeNet - A deep neural network for genomic modelling, semi-supervised classification and imputation: The GenomeNet project is a BMBF-funded joint research enterprise of the Helmholtz Centre for Infection Research and the University of Munich, in close collaboration with the Harvard T.H. Chan School of Public Health. In this project we aim to develop customized deep learning network architectures that are particularly suited to modeling large nucleotide sequences. These networks will then be applied to bacterial, viral and human genomes with the goal of understanding the complex structures underlying the code of life. This work is funded by the Federal Ministry of Education and Research (031L0199A).
- Learning structures in the CRISPR-Cas system using deep learning architectures: In this project we apply statistical models to specify structural properties of CRISPR cassettes. These properties will be used to further describe potential functions and their classification. This work is funded by the Deutsche Forschungsgemeinschaft (405892038).
Selected publications
Collaborators
- Prof. Bernd Bischl, Ludwig Maximilian University of Munich
- Prof. Bärbel Stecher, Ludwig Maximilian University of Munich
- Prof. Curtis Huttenhower, Harvard School of Public Health
- Dr. Eric Franzosa, Harvard School of Public Health
- Volkswagen Lab
Ongoing Grants
2021 “NFDI4Microbiota - National Research Data Infrastructure for Microbiota Research” (DFG)
2021 “GHGA-Microbiota: Increasing the multimodal use of human and microbiome-related omics data” - GHGA Flex Funds Project (DFG)
2021 “Hepatitis C Control: Towards prophylaxis and identification of those in need of treatment” (DZIF) (BMBF)
2019 “Paving the way towards individualized vaccination (i.Vacc) - Exploring multi-omics Big Data in the general population based on a digital mHealth cohort” – Volkswagen Foundation
2019 “Drug discovery and cheminformatics for new anti-infectives (iCA)” – Lower Saxony Doctoral Program
2019 “Rational design of a universal flu vaccine using recombinant neuraminidase” – Global Grand Challenges of the Bill & Melinda Gates Foundation
2019 “GenomeNet: A deep neural network for genomic modelling, semi-supervised classification and imputation” – Computational Life Sciences Call, Federal Ministry of Education and Research (BMBF)
2019 “RESIST - Resolving Infection Susceptibility” – Cluster of Excellence 2155, Deutsche Forschungsgemeinschaft (DFG)
2018 “Learning structures in the CRISPR-Cas system using deep learning architectures” – SPP2141 Deutsche Forschungsgemeinschaft (DFG)
2017 “HiGHmed (Heidelberg-Göttingen-Hannover Medizininformatik)” - Federal Ministry of Education and Research (BMBF)
2014 “TI Bioresources, Biodata and Digital Health” (formerly: “TI Bioinformatics Platform”) - German Centre for Infection Research (DZIF) (BMBF)
Concluded Grants
2020 Corona VAC “Proof of concept study of a SARS-CoV-2 vaccine based on recombinant spike protein” - Lower Saxony Ministry of Science and Culture (MWK)
2017 “Sparse2Big: Data fusion and imputation from massive sparse data consortium” - Information and Data Science Initiative, Helmholtz Association
2017 “Bioinformatics support for the development of a prophylactic HCV vaccine candidate” - German Centre for Infection Research (DZIF)
2017 “Communities Allied in Infection coalition” - Volkswagen Foundation
2016 “A Method for Tracking CRISPR/Plasmidome Dynamics in Complex Bacterial Communities” – Research Grant, Deutsche Forschungsgemeinschaft (DFG)
2014 “Isolation and characterization of novel acidophilic archaea (with J. Gescher)” – Deutsche Forschungsgemeinschaft (DFG)
Further Groups of the Department of Computational Biology for Infection Research
Host (epi)genomics - Dr Andreas Klötgen
High-throughput technologies such as microarray chips and next-generation sequencing have enabled screening of a wide range of patients in genome-wide association studies, revealing unprecedented insights into genomic, transcriptomic and many different epigenomic drivers of disease.
Transcriptional regulation is a multistep process that involves not only the transcription machinery and transcription factors but is also shaped by histone modifications, nuclear chromatin organization and the availability of cis-regulatory elements (for example, enhancers). We study how the composition of these factors relates to transcriptional output in immune development and infectious diseases [1]. To this end, integrating Hi-C data (3D chromatin information) with ChIP-seq of histone modifications and chromatin regulators has enabled us to predict the chromatin environment of leukemia and how targeted drug therapies affect this complex network [2]. Furthermore, more recent techniques such as HiChIP, which require advanced computational approaches that are not yet fully developed, have been used to identify the involvement of Klf4 in cellular reprogramming [3].
However, the complexity of protein expression and regulation does not end at gene transcription. Posttranscriptional regulation ranges from controlling mRNA stability to translational rate and more, and can be studied by specific sequencing approaches [4]. With the help of PAR-CLIP, we study the effects of RNA-binding proteins (RBPs) by projecting miRNA expression and binding onto mRNA expression regulation [5]. We also study the direct effects of RBPs on mRNA binding and stability, and how such regulators affect downstream cellular processes in disease [6]. As in any computational biology field, the development of new approaches for gaining deeper insights into the underlying biology is key [7], and we continue to develop computational methods and integrative pipelines for various purposes and sequencing approaches.
Our research focuses on
- How do chromatin regulators, histone modifications and chromatin organization orchestrate the complex transcription machinery in immune cells? Do these features hold predispositions of their own for infectious diseases?
- Integration of RNA-binding protein information with RNA-seq or miRNA-seq data to obtain a more targeted view of posttranscriptional gene regulation by individual RBPs
- How can novel machine learning or deep learning approaches support integrative tasks of genomics and transcriptomics data? To this end, we develop computational approaches to deal with novel and custom sequencing data, and to integrate complex data from various sources in order to achieve a complete view of (post-)transcriptional gene regulation.
Selected publications
- A. Kloetgen, P. Thandapani et al., 3D Chromosomal Landscapes in Hematopoiesis and Immunity. Trends Immunol. 2019 40: 809
- A. Kloetgen, P. Thandapani, P. Ntziachristos et al., Three-dimensional chromatin landscapes in T cell acute lymphoblastic leukemia. Nat Genet. 2020 52: 388
- D. C. Di Giammartino, A. Kloetgen, A. Polyzos et al., KLF4 is involved in the organization and regulation of pluripotency-associated three-dimensional enhancer networks. Nat Cell Biol. 2019 21: 1179
- A. Kloetgen et al., Biochemical and bioinformatic methods for elucidating the role of RNA-protein interactions in posttranscriptional regulation. Brief Funct Genomics. 2015 14: 102
- K. Hezaveh, A. Kloetgen, S. H. Bernhart et al., Alterations of microRNA and microRNA-regulated messenger RNA expression in germinal center B-cell lymphomas determined by integrative sequencing analysis. Haematologica. 2016 101: 1380
- S. Duggimpudi, A. Kloetgen, S. K. Maney et al., Transcriptome-wide analysis uncovers the targets of the RNA-binding protein MSI2 and effects of MSI2's RNA-binding activity on IL-6 signaling. J Biol Chem. 2018 293: 15359
- A. Kloetgen et al., The PARA-suite: PAR-CLIP specific sequence read simulation and processing. PeerJ. 2016 4:e2619
Researchers
- Dr. Andreas Kloetgen
Collaborators
- Abel Viejo-Borbolla, MHH, Hannover, Germany
- Effie Apostolou, Weill Cornell Medicine, New York, USA
- Ari Melnick, Weill Cornell Medicine, New York, USA
- Tracy McGaha, Princess Margaret Cancer Centre, Toronto, Canada
- Iannis Aifantis, NYU Langone Health, New York, USA
Job Offers
The Research Department “Computational Biology for Infection Research” welcomes applications at all seniority levels in the following fields:
- Bioinformatics / Computational Biology
- Computer Science
- Statistics
- Biomathematics
We are looking for motivated applicants with a strong background in the above-mentioned fields, good coding skills and interest in interdisciplinary research in biology and infection research.
Methods that we employ or develop in our research are related to the fields of:
- Bioinformatics
- Machine learning and deep learning
- Phylogenetics and population genetics
The goal of our projects is the development of algorithms and computer-aided methods to analyze the human microbiome, viral and bacterial pathogens, and human cell lineages within individual patients.
Only some of the currently open positions are listed below, so unsolicited applications are always welcome. Please send your applications to Jobs.
Open job offers:
Project Manager / Scientific Writer (f/m/d)
Deep Learning for Molecular Biology
The seminar "Deep Learning for Molecular Biology" is hosted by the department "Computational Biology for Infection Research" at the HZI, headed by Prof. Alice McHardy.
Kick-off Meeting: 23rd October 2024, 9 a.m.
Room Kick-off Meeting: BRICS, room 107/108
Seminar Date: TBA
Room seminar: TBA
Max. number of participants: 10
Language: English
Format: 30-minute presentation (including discussion) + 5-page written summary
Designated for Bachelor's and Master's students of Computer and Data Science
Prerequisites: Familiarity with programming in Python and Linear Algebra (matrix / vector multiplications)
In case you have questions about the seminar, contact Mohammad Hadi Foroughmand Araabi.
Description: In recent years, deep neural network models have revolutionized machine learning research and achieved state-of-the-art performance in almost every related research field, including computer vision, natural language processing, and computational biology. The goal of this seminar is to teach the basic principles of deep learning along with some basic implementations in the PyTorch framework. We will explore the most fundamental neural architectures, including convolutional neural networks, recurrent neural networks, and autoencoders, as well as language-model-based representation learning methods.
Topics:
- Deep Neural Networks and back propagation
- Convolutional Neural Networks (CNN)
- Recurrent Neural Networks (RNN)
- Transformer models
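To give a flavour of the first topic, here is a minimal sketch of a forward pass and backpropagation for a tiny one-hidden-layer network on one-hot encoded DNA. It is illustrative only and not seminar material: it uses plain Python rather than PyTorch so that it runs without dependencies, and the toy sequences, labels, function names and hyperparameters are all made up for this example.

```python
import math
import random

def one_hot(seq):
    """Encode a DNA string as a flat 0/1 vector (4 entries per base)."""
    vec = []
    for base in seq:
        vec.extend(1.0 if base == a else 0.0 for a in "ACGT")
    return vec

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, labels, hidden=4, epochs=200, lr=0.5, seed=0):
    """Train a tanh hidden layer + sigmoid output with per-sample gradient descent.

    Returns the mean cross-entropy loss recorded in each epoch.
    """
    rng = random.Random(seed)
    n_in = len(samples[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, y in zip(samples, labels):
            # Forward pass: matrix-vector product, tanh, then a dot product.
            h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                 for row, b in zip(W1, b1)]
            p = sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)
            # Clamp only for the logarithm, to avoid log(0).
            q = min(max(p, 1e-12), 1.0 - 1e-12)
            total += -(y * math.log(q) + (1.0 - y) * math.log(1.0 - q))
            # Backward pass (chain rule).
            dz2 = p - y                      # dLoss/d(output pre-activation)
            dh = [dz2 * w for w in W2]       # uses W2 before it is updated
            for j in range(hidden):
                dz1 = dh[j] * (1.0 - h[j] ** 2)   # derivative of tanh
                W2[j] -= lr * dz2 * h[j]
                for i in range(n_in):
                    W1[j][i] -= lr * dz1 * x[i]
                b1[j] -= lr * dz1
            b2 -= lr * dz2
        losses.append(total / len(samples))
    return losses

# Hypothetical toy task: label 1 for G/C-only sequences, 0 for A/T-only ones.
samples = [one_hot(s) for s in ["GGCC", "GCGC", "AATT", "ATAT"]]
labels = [1.0, 1.0, 0.0, 0.0]
losses = train(samples, labels)
print(f"loss in first epoch: {losses[0]:.3f}, in last epoch: {losses[-1]:.3f}")
```

In a framework such as PyTorch, the hand-written backward pass is replaced by automatic differentiation, but the underlying computation is the same chain-rule bookkeeping.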