Computational Biology for Infection Research
Our Research
The Department of “Computational Biology for Infection Research” studies the human microbiome, viral and bacterial pathogens, and human cell lineages within individual patients by analysis of large-scale biological and epidemiological data sets with computational techniques. Focusing on high throughput meta’omics, population genomic and single cell sequencing data, we produce testable hypotheses, such as sets of key sites or relevant genes implicated in onset of a disease, antibiotic resistance or immune defense. We interact with experimental collaborators to verify our findings and to promote their translation into medical treatment or diagnosis procedures. To achieve its research goals, the department also develops novel algorithms and software.
Our Research
The Department of “Computational Biology for Infection Research” studies the human microbiome, viral and bacterial pathogens, and human cell lineages within individual patients by analysis of large-scale biological and epidemiological data sets with computational techniques. Focusing on high throughput meta’omics, population genomic and single cell sequencing data, we produce testable hypotheses, such as sets of key sites or relevant genes implicated in onset of a disease, antibiotic resistance or immune defense. We interact with experimental collaborators to verify our findings and to promote their translation into medical treatment or diagnosis procedures. To achieve its research goals, the department also develops novel algorithms and software.
Prof Dr Alice McHardy
Alice Carolyn McHardy holds a diploma in biochemistry and a doctoral degree (Dr. rer. nat) in bioinformatics, both from Bielefeld University in Germany. From 2005 to 2007 she first was a postdoc and then a permanent staff member in the Bioinformatics and Pattern Discovery Group at the IBM T.J. Watson Research Center in Yorktown Heights, USA.
She then became the head of the independent research group for Computational Genomics and Epidemiology at the Max Planck Institute of Computer Science in Saarbrücken. In 2010, she was appointed Chair of Algorithmic Bioinformatics at Heinrich Heine University in Düsseldorf.
In 2014, she became head of the Department of Computational Biology for Infection Research at the Helmholtz Centre for Infection Research in Braunschweig and was appointed as a full professor at TU Braunschweig.
Team
Web Applications
Traitar is a web service for phenotyping bacteria based on their genome sequences.
PhyloPythiaS+ - a self-training method for the rapid reconstruction of low-ranking taxonomic bins from metagenomes
PhyloPythiaS - the PhyloPythiaS Web Server for Taxonomic Assignment of Metagenome Sequences.
Taxator-tk performs taxonomic sequence assignment by fast approximate determination of evolutionary neighbors from sequence similarities.
AdaPatch is a method for detecting positively selected patches of sites on the surface of viral proteins, which are likely candidates for being relevant for adaptive evolution.
Software Downloads
Software implementing the group’s research can be downloaded here or on GitHub at github.com/hzi-bifo, such as
SDplots: software for the detection of selective sweeps to monitor the adaptation of influenza A viruses
SDplots VaccineUpdates: results of the bi-annual vaccine strain prediction for influenza A viruses
PatchDetection: software for the detection of protein patches under positive selection
Phylogeography: software for phylogeographical reconstruction to infer origin and spread routes of viral pathogen outbreaks
FrechetTreeDistances: distances between phylogeographic reconstructions across tree topologies.
DiTaxa: nucleotide-pair encoding of 16S rRNA sequences for host phenotype and biomarker detection
CAMISIM: Simulating metagenomes and microbial communities
AMBER: Assessment of Metagenome BinnERs
OPAL: Open-community Profiling Assessment tooL
CAMITAX: Taxon labels for microbial genomes
Selected Publications
- Meyer, F., Fritz, A., Deng, Z. L., Koslicki, D., Lesker, T. R., Gurevich, A., Robertson, G., Alser, M., Antipov, D., Beghini, F., Bertrand, D., Brito, J. J., Brown, C. T., Buchmann, J., Buluç, A., Chen, B., Chikhi, R., Clausen, P., Cristian, A., Dabrowski, P. W., Darling, A. E., Egan, R., Eskin, E., Georganas, E., Goltsman, E., Gray, M. A., Hansen, L. H., Hofmeyr, S., Huang, P., Irber, L., Jia, H., Jørgensen, T. S., Kieser, S. D., Klemetsen, T., Kola, A., Kolmogorov, M., Korobeynikov, A., Kwan, J., LaPierre, N., Lemaitre, C., Li, C., Limasset, A., Malcher-Miranda, F., Mangul, S., Marcelino, V. R., Marchet, C., Marijon, P., Meleshko, D., Mende, D. R., Milanese, A., Nagarajan, N., Nissen, J., Nurk, S., Oliker, L., Paoli, L., Peterlongo, P., Piro, V. C., Porter, J. S., Rasmussen, S., Rees, E. R., Reinert, K., Renard, B., Robertsen, E. M., Rosen, G. L., Ruscheweyh, H. J., Sarwal, V., Segata, N., Seiler, E., Shi, L., Sun, F., Sunagawa, S., Sørensen, S. J., Thomas, A., Tong, C., Trajkovski, M., Tremblay, J., Uritskiy, G., Vicedomini, R., Wang, Z., Wang, Z., Wang, Z., Warren, A., Willassen, N. P., Yelick, K., You, R., Zeller, G., Zhao, Z., Zhu, S., Zhu, J., Garrido-Oter, R., Gastmeier, P., Hacquard, S., Häußler, S., Khaledi, A., Maechler, F., Mesny, F., Radutoiu, S., Schulze-Lefert, P., Smit, N., Strowig, T., Bremges, A., Sczyrba, A. & McHardy, A. C. Critical Assessment of Metagenome Interpretation: the second round of challenges. Nat Methods (2022) 19, 429, doi:10.1038/s41592-022-01431-4.
- Asgari, E., Münch, P. C., Lesker, T. R., McHardy, A. C.* & Mofrad, M. R. K.* (*shared last authors) DiTaxa: nucleotide-pair encoding of 16S rRNA for host phenotype and biomarker detection. Bioinformatics (2019) 35, 2498, doi:10.1093/bioinformatics/bty954.
- Bankwitz, D.*, Bahai, A.*, Labuhn, M., Doepke, M., Ginkel, C., Khera, T., Todt, D., Ströh, L. J., Dold, L., Klein, F., Klawonn, F., Krey, T., Behrendt, P., Cornberg, M., McHardy, A. C.* & Pietschmann, T*. (*shared first and last authors) Hepatitis C reference viruses highlight potent antibody responses and diverse viral functional interactions with neutralising antibodies. Gut (2021) 70, 1734, doi:10.1136/gutjnl-2020-321190.
- Fritz, A.*, Hofmann, P.*, Majda, S., Dahms, E., Dröge, J., Fiedler, J., Lesker, T. R., Belmann, P., DeMaere, M. Z., Darling, A. E., Sczyrba, A., Bremges, A. & McHardy, A. C. (*shared first authors) CAMISIM: simulating metagenomes and microbial communities. Microbiome (2019) 7, 17, doi:10.1186/s40168-019-0633-6.
- Münch, P. C., Franzosa, E. A., Stecher, B., McHardy, A. C.* & Huttenhower, C.* (*shared last authors) Identification of Natural CRISPR Systems and Targets in the Human Microbiome. Cell Host Microbe (2021) 29, 94, doi:10.1016/j.chom.2020.10.010.
Publications
Computational biology of viral pathogens
The Research Department “Computational Biology for Infection Research” at the HZI studies rapidly evolving viral pathogens using computational and data-driven approaches.
A recent focus is the computational genomic surveillance and evolutionary analysis of SARS-CoV-2. Based on large-scale international genome-sequencing data, we investigate how viral lineages emerge, spread and acquire mutations that may affect transmissibility, immune escape or viral fitness. To this end, we develop approaches for the early prediction and characterization of emerging variants, including the CoVerage platform (www.sarscoverage.org), as well as for reconstructing viral spread and introduction dynamics using phylogenetics, phylogeography and epidemiological data.
Beyond SARS-CoV-2, the department has long-standing expertise in the computational analysis of rapidly evolving viral pathogens, including influenza viruses and chronic viral infections. These studies address viral adaptation, immune escape and pathogen-host co-evolution, with the aim of identifying translationally relevant leads for surveillance, diagnostics and vaccine development.
We focus on
- Computational genomic surveillance of SARS-CoV-2 and early detection of emerging variants
- Prediction and characterization of SARS-CoV-2 variant properties, including lineage dynamics, antigenic change and immune-escape potential
- Phylogenetic and phylogeographic reconstruction of SARS-CoV-2 spread and lineage importation dynamics, as well as temporal links to nonpharmaceutical interventions during the pandemic
- Analysis of viral evolution from large-scale genome-sequencing data
- Computational prediction of vaccine strains
- Analysis of viral adaptation, immune escape and pathogen-host co-evolution in acute and chronic viral infections
Researchers
- Dr. Sama Goliaei
- Dr. Zhi-Luo Deng
- Dr. Mohammad Hadi Fouroghmand
Alumni
- Dr. Katrina Norwood
- Dr. Akash Bahai
- Dr. Adrian Fritz
- Dr. Thorsten Klingen
- Dr. Susanne Reimering
- Dr. Lars Steinbrück
Collaborators
- Thomas Pietschmann, Institute for Experimental Virology, Twincore Centre for Experimental and Clinical Infection Research, Hannover, Germany (DZIF collaboration)
- Denise Kühnert, Robert Koch Institute, Berlin, Germany, and Max Planck Institute of Geoanthropology, Jena, Germany
- Martin Hölzer, Genome Competence Center, Robert Koch Institute, Berlin, Germany
- Frank Klawonn, Biostatistics, Helmholtz Centre for Infection Research, Braunschweig, Germany, and Department of Computer Science, Ostfalia University of Applied Sciences, Wolfenbüttel, Germany
- Nick Goldman, Nicola De Maio, EMBL-EBI, Hinxton, United Kingdom
- Sebastian Duchene, Anna Zhukova, Frédéric Lemoine, Institut Pasteur, Paris, France
Former Collaborators
- Gülsah Gabriel, Heinrich Pette Institute, Leibniz Institute for Experimental Virology, Hamburg, Germany
- Carlos Guzmán, Department of Vaccinology and Applied Microbiology, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Thomas Schulz, Institute of Virology, Hannover Medical School (MHH), Hannover, Germany
- Klaus Schughart, Infection Genetics, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Thomas Krey, Institute of Virology, Hannover Medical School (MHH), Hannover, Germany
- Wulf Blankenfeldt, Department of Structure and Functions of Proteins, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Mohammad Mofrad, Departments of Bioengineering and Mechanical Engineering, University of California, Berkeley, USA
Selected publications
*K. Norwood, *Z.-L. Deng, S. Reimering, G. Robertson, M.-H. Foroughmand-Araabi, S. Goliaei, M. Hölzer, F. Klawonn, A.C. McHardy. In silico genomic surveillance by CoVerage predicts and characterizes SARS-CoV-2 variants of interest (*shared first authors). doi: 10.1038/s41467-025-60231-4. Nature Communications 2025, 16(1): 6281.
*S. Goliaei, *M.-H. Foroughmand-Araabi, A. Roddy, A. Weber, S. Översti, …, D. Kühnert, A.C. McHardy. Importations of SARS-CoV-2 lineages decline after nonpharmaceutical interventions in phylogeographic analyses (*shared first authors). doi: 10.1038/s41467-024-48641-2. Nature Communications 2024, 15(1): 5267.
*D. Bankwitz, *A. Bahai, M. Labuhn, M. Doepke, C. Ginkel, T. Khera, D. Todt, L. J. Ströh, L. Dold, F. Klein, F. Klawonn, T. Krey, P. Behrendt, M. Cornberg, *A.C. McHardy, *T. Pietschmann. Hepatitis C reference viruses highlight potent antibody responses and diverse viral functional interactions with neutralising antibodies (*shared first and last authors). doi: 10.1136/gutjnl-2020-321190. Gut 2021, 70(9): 1734–1745.
S. Reimering, S. Muñoz, A.C. McHardy. Phylogeographic reconstruction using air transportation data and its application to the 2009 H1N1 influenza A pandemic. doi: 10.1371/journal.pcbi.1007101. PLoS Computational Biology 2020, 16(2): e1007101.
Computational microbiome and microbial pathogen research
BIFO studies microbial communities, including bacteria, viruses and eukaryotic community members, and their relevance for human health and disease. The human microbiota is implicated in a variety of diseases and subject of experimental studies at HZI. Direct metagenome, -transcriptome or -proteome sequencing of microbial community samples enables the study of the majority of microorganisms that cannot be obtained in pure culture, corresponding to the vast majority of the microbial world.
Our research focuses on establishing data-driven computational approaches that further advance individualized infection medicine in the clinic, such as computational biomarker discovery from microbial omics data, i.e. genotype-phenotype and genotype-environment inference, and the data-driven discovery molecular predictors of host disease status and pathogen phenotypes. We also develop methods for common meta’ome data types, and promote the development of standards and best practices via the Initiative for the Critical Assessment of Metagenome Interpretation (CAMI).
BIFO focuses on the following problems and questions:
- Can we identify biomarkers for clinically relevant phenotypes from microbiome data using artificial intelligence approaches and reliably predict these phenotypes?
- Which software with which settings is particularly well suited for processing different types of metagenome samples? A. McHardy founded and organizes (together with A. Sczyrba) CAMI, the Initiative for the Critical Assessment of Metagenome Interpretation, which aims to establish standards and best practices in metagenome analysis by organizing benchmarking challenges for method developers.
- Can we reconstruct the genomes of individual strains from metagenomics data? This question has large clinical relevance, as individual strains of the same species can have very different phenotypes (e.g. the probiotic E. coli Nissle versus the EHEC strain).
- Which traces does the adaptation of microbial communities and pathogens to a certain environment leave in the microbiome and their genomes? Specifically we are interested in this question for the spread of antibiotic resistances.
- What can we learn about the role of the microbial CRISPR-CAS system in the human microbiome by systematic metagenome analyses combined with deep learning techniques?
Researchers
- Dr. Zhiluo Deng
- Dr. Fernando Meyer
- Dr. Philipp Münch
- Dr. Nasim Safaei
- Steven Medina
- Dr. Ehsaneddin Asgari (associated)
Collaborators
- Benjamin Maasoumy, Anke Kraft, Markus Cornberg, Hannover Medical School, Hannover, Germany
- Curtis Huttenhower, Harvard T.H. Chan School of Public Health, Boston, MA, U.S.
- Barbara Stecher, Medical Microbiology and Hospital Epidemiology, Max von Pettenkofer Institute, Ludwig Maximilian University of Munich, Munich, Germany
- the CAMI initiative
- Till Strowig, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
Past collaborators
- Justin O’Grady & Gemma Kay, Quadram Institute, Norwich, UK
- Thomas Schulz, Hannover Medical School, Hannover, Germany
- Paul Schulze-Lefert, Max Planck Institute for Plant Breeding Research, Cologne, Germany
- Phil Pope and Vincent Eijsink, Norwegian University of Life Sciences, Aas, Norway
- Johannes Gescher, Institute of Applied Biosciences (IAB), Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
- Mark Morrison, CSIRO Livestock Industries, Queensland, Australia
- Jeffrey Gordon and Peter Turnbaugh, Center for Genome Sciences, Washington University, St. Louis, Missouri, USA
- Phil Hugenholtz, Australian Center for Ecogenomics, Queensland, Australia
- Isidore Rigoutsos, Computational Medicine Center, Thomas Jefferson University, Philadelphia, Pennsylvania, USA
- Andreas Brune, Research Group Leader, Department of Biogeochemistry, Max Planck Institute for Terrestrial Microbiology, Marburg, Germany
- Mila Chistoserdova, Department of Chemical Engineering, University of Washington, Seattle, Washington, USA
Selected publications
*Z.-L. Deng, *N. Safaei, S. *L. Schütte, V. Ohlendorf, *B. Maasoumy, *A. C. McHardy. High Prevalence and Local Dissemination of Daptomycin-Resistance Mutations for Enterococcus faecium in Cirrhotic Patients (*shared first and senior authors). doi: 10.1053/j.gastro.2025.08.046. Gastroenterology 2026, 170(1): 228–230.
K. Hu, F. Meyer, Z.-L. Deng, E. Asgari, T. H. Kuo, P. C. Münch, A. C. McHardy. Assessing computational predictions of antimicrobial resistance phenotypes from microbial genomes. doi: 10.1093/bib/bbae206. Briefings in Bioinformatics 2024, 25(3): bbae206.
C. Huttenhower, R. D. Finn, A. C. McHardy. Challenges and opportunities in sharing microbiome data and analyses. doi: 10.1038/s41564-023-01484-x. Nature Microbiology 2023, 8(11): 1960–1970.
P. C. Münch, C. Eberl, S. Woelfel, …, C. Huttenhower, *A. C. McHardy, *B. Stecher. Pulsed antibiotic treatments of gnotobiotic mice manifest in complex bacterial community dynamics and resistance effects (*shared last authors). doi: 10.1016/j.chom.2023.05.013. Cell Host & Microbe 2023, 31(6): 1007–1020.e4.
F. Meyer, A. Fritz, Z.-L. Deng, …, A. C. McHardy. Critical Assessment of Metagenome Interpretation: the second round of challenges. doi: 10.1038/s41592-022-01431-4. Nature Methods 2022, 19(4): 429–440.
Z.-L. Deng, P. C. Münch, R. Mreches, A. C. McHardy. Rapid and accurate identification of ribosomal RNA sequences via deep learning. doi: 10.1093/nar/gkac112. Nucleic Acids Research 2022, 50(10): e60.
P. C. Münch, E. A. Franzosa, B. Stecher, *A. C. McHardy, *C. Huttenhower. Identification of Natural CRISPR Systems and Targets in the Human Microbiome (*shared last authors). doi: 10.1016/j.chom.2020.10.010. Cell Host & Microbe 2021, 29(1): 94–106.e4.
A. Khaledi, A. Weimann, M. Schniederjans, E. Asgari, T. H. Kuo, A. Oliver, G. Cabot, A. Kola, P. Gastmeier, M. Hogardt, D. Jonas, M. R. Mofrad, A. Bremges, *A. C. McHardy, *S. Häussler. Predicting antimicrobial resistance in Pseudomonas aeruginosa with machine learning-enabled molecular diagnostics (*shared last authors). doi: 10.15252/emmm.201910264. EMBO Molecular Medicine 2020, 12(3): e10264.
A. Sczyrba, P. Hofmann, P. Belmann, …, A. C. McHardy. Critical Assessment of Metagenome Interpretation—a benchmark of metagenomics software. doi: 10.1038/nmeth.4458. Nature Methods 2017, 14(11): 1063–1071.
K. Patil, P. Haider, P. B. Pope, P. J. Turnbaugh, M. Morrison, T. Scheffer, A. C. McHardy. Taxonomic metagenome sequence assignment with structured output models. doi: 10.1038/nmeth0311-191. Nature Methods 2011, 8(3): 191–192.
Machine Learning
High-throughput sequencing combined with metagenomics has uncovered thousands of novel bacterial species directly from samples without isolation or culturing. This enables fine-grained analyses of the functions of microbial community members, and the study of their association with phenotypes and environments, as well as studies of microevolution and adaptation to changing environmental conditions. These sequencing efforts led to a sheer amount of data that can be analyzed using machine learning methods and can help to uncover relationships between human health and diseases.
In this research focus we are:
- Developing novel machine learning methods implemented as software libraries that can be used by the research communities.
- Developing efficient machine learning models that utilize the vast amount of unlabeled biological data.
- Applying machine learning methods on microbial datasets to analyze the interaction between bacteria and the mammalian host (see Computational microbiome research).
- Developing machine and deep learning models to predict properties of molecules and identify their most important structural characteristics.
- Developing machine learning methods to predict B-cell epitopes for epitope mapping and designing novel vaccines.
We focus on:
- GenomeNet - A deep neural network for genomic modelling, semi-supervised classification and imputation: The GenomeNet project is a BMBF funded joint research enterprise of the Helmholtz Centre for Infection Research and the University of Munich with close collaboration with the Harvard T.H. Chan School of Public Health. In this project we aim to develop customized deep learning network architectures which are particularly suited for modeling of large nucleotide sequences. These networks will then be employed on bacterial, viral and human genomes with the goal to understand the complex structures underlying the code of life. This work is funded by the Federal Ministry of Education and Research (031L0199A).
- Learning structures in the CRISPR-Cas system using deep learning architectures: In this project we apply statistical models to specify structural properties of CRISPR cassettes. These properties will be used to further describe potential functions and their classification. This work is funded by the Deutsche Forschungsgemeinschaft (405892038).
Selected publications
Collaborators
- Prof. Bernd Bischl, Ludwig Maximilian University of Munich
- Prof. Bärbel Stecher, Ludwig Maximilian University of Munich
- Prof. Curtis Huttenhower, Harvard School of Public Health
- Dr. Eric Franzosa, Harvard School of Public Health
- Volkswagen Lab
Ongoing Grants
2026 “Advancing Research Infrastructures to Support Infection Research at DZIF” (DZIF - BMFTR)
2026 “Design of an optimized HCV vaccine candidate” (DZIF - BMFTR)
2025 “Computational Tools and Resources for PAIS Data Analysis (ACT-PDA)” (BMFTR)
2025 “Individualized prevention and treatment of infections in patients with liver cirrhosis (INDIVO)” (zukunft.niedersachsen - Lower Saxony Ministry of Science and Culture and the Volkswagen Foundation)
2025 “Deep Learning Integration of Genomic Sequences, Transcriptomics and Interaction Networks for Phenotype Prediction in Eukaryotes” (Helmholtz AI)
2024 “Randomized trial assessing individualized microbiota-based prevention of healthcare-associated infections with multidrug-resistant Enterobacteriaceae (RESET-MDR)” (DZIF - BMFTR)
2021 “NFDI4Microbiota - National Research Data Infrastructure for Microbiota Research” (DFG)
2019 “RESIST - Resolving Infection Susceptibility“ – Exzellenzcluster 2155, Deutsche Forschungsgemeinschaft (DFG)
Concluded Grants
2023 “Establishing data broker functionalities at HZI for optimizing omics data submissions to public repositories” (NFDI4Microbiota Strategy Funds Project – DFG)
2022 “DZIF-IT Plattform“ (DZIF - BMFTR)
2021 “GHGA-Microbiota: Increasing the multimodal use of human and microbiome-related omics data” (GHGA Flex Funds Project - DFG)
2021 “Hepatitis C Control: Towards prophylaxis and identification of those in need of treatment” (DZIF - BMFTR)
2020 “DZIF Bioinformatics Platform“ (DZIF - BMFTR)
2020 “Proof of concept study of a SARS-COV-2 vaccine based on recombinant spike protein“ (Lower Saxony Ministry of Science and Culture (MWK))
2019 “Paving the way towards individualized vaccination (i.Vacc) - Exploring multi-omics Big Data in the general population based on a digital mHealth cohort” (Volkswagen Foundation)
2019 “Drug discovery and cheminformatics for new anti-infectives (iCA)” (Lower Saxony Doctoral Program)
2019 “Rational design of a universal flu vaccine using recombinant neuraminidase” (Global Grand Challenges of the Bill & Melinda Gates Foundation)
2019 “GenomeNet: A deep neural network for genomic modelling, semi-supervised classification and imputation” (BMFTR)
2018 “Learning structures in the CRISPR-Cas system using deep learning architectures” – SPP2141 Deutsche Forschungsgemeinschaft (DFG)
2017 “HiGHmed (Heidelberg-Göttingen-Hannover Medizininformatik)“ (BMFTR)
2017 “Sparse2Big: Data fusion and imputation from massive sparse data consortium” (Information and Data Science Initiative, Helmholtz Society)
2017 “Bioinformatics support for the development of a prophylactic HCV vaccine candidate” (Deutsches Zentrum für Infektionsforschung (DZIF - BMFTR)
2017 “Communities Allied in Infection coalition" (Volkswagen Foundation)
2016 “A Method for Tracking CRISPR/Plasmidome Dynamics in Complex Bacterial Communities“ (DFG)
2014 “Isolation and characterization of novel azidophilic archaea (with J. Gescher)” (DFG)
2014 “DZIF Bioinformatics” (DZIF - BMFTR)
Further Groups of the Department Computational Biology of Infection Research
Agentic AI in Data-Driven Infection Research – Dr. Philipp C. Münch
Dr. Philipp C. Münch works at the interface of artificial intelligence, bioinformatics, and infection research. He has an interdisciplinary background in bioinformatics and epidemiology, with a B.Sc. in Bioinformatics and an M.Sc. in Epidemiology, and received his PhD from LMU Munich in 2022. He is a permanent research scientist at the Helmholtz Centre for Infection Research and is additionally affiliated with the Huttenhower Lab at the Harvard T.H. Chan School of Public Health.
Recent developments in large language models and agentic AI systems are opening new possibilities for data-driven infection research. At the same time, they raise fundamental questions about reliability, reproducibility, transparency, biological validity, and trustworthiness. Our goal is to systematically evaluate the capabilities and limitations of these methods and to define settings, workflows, and benchmarks that maximize their value for bioinformatics and infection research.
We study these questions in the context of several applied problems: microbial genome interpretation, phenotype prediction, metagenomic analysis, automated literature-based knowledge extraction, reproducible computational workflows, and FAIR data capture. Across these areas, our team benchmarks, evaluates, adapts, and applies machine learning, deep learning, large language models, and AI-agent-based systems to concrete biological questions.
We are interested in how agentic AI systems can support future data-driven infection research by connecting data, models, literature, workflows, and biological interpretation. Such systems can help researchers move from raw sequencing data and published evidence toward structured knowledge, reproducible analyses, interpretable predictions, and improved biological databases.
One central application area is phenotype prediction from genomic data. We investigate how information encoded in complete genomes can be combined with literature-derived knowledge, phenotype databases, molecular profiles, and biological networks to infer biologically relevant traits. These approaches are relevant for antimicrobial resistance, microbial physiology, host–pathogen interactions, biotechnology, and the broader interpretation of genomic variation.
We are particularly interested in making deep-learning models for genomics more robust, interpretable, and reproducible. This includes model benchmarking, comparison, documentation, error analysis, interpretation, and transfer across datasets, organisms, and experimental settings. The aim is to better understand when genomic models generalize, where they fail, and how their predictions can be connected to biological evidence.
Part of our work is funded through a Helmholtz AI call, supporting the development and application of multimodal deep-learning approaches for phenotype prediction from genomic and molecular data. In this context, we investigate how models trained on rich experimental datasets can improve our understanding of genotype–phenotype relationships and provide a foundation for transfer to other biological systems.
Within the DZIF-TI BBD and related activities in NFDI4Microbiota, we work together with BacDive to augment missing phenotype information with high-quality AI-inferred predictions. This aims to make microbial phenotype information more complete, accessible, FAIR, and useful for infection research, microbiology, and downstream computational analyses.
Beyond predictive modeling, our team also works on meta-science and infrastructure for bioinformatics workflows. We develop tools and concepts that improve how sequencing studies are planned, documented, executed, and reproduced. One example is SeqDesk.com, a sample and study management platform for sequencing centers. SeqDesk is designed to improve structured data capture at the point where samples and studies enter sequencing workflows, helping to make research data more FAIR, reusable, and suitable for downstream computational analysis.
We also have ongoing projects on the automation of reproducibility in bioinformatics, including work connected to the Critical Assessment of Metagenome Interpretation (CAMI) challenge. Here, we explore how AI agents can support reproducible computational benchmarking, workflow execution, method comparison, and the interpretation of complex metagenomic analysis tasks.
A further research direction is the use of AI to process scientific literature and extract structured biological knowledge. We investigate how research papers can be analyzed with AI methods to identify relevant models, datasets, biological entities, phenotypes, experimental evidence, and methodological claims. This supports knowledge extraction, the augmentation of deep-learning models, and the development of predictive systems that connect published evidence with computational biology.
Our Research
- How can agentic AI systems be evaluated and used reliably in data-driven infection research?
- How can AI agents connect data capture, literature mining, model training, prediction, validation, and database augmentation?
- How can large language models support microbial phenotype annotation, biological interpretation, and structured knowledge extraction?
- How can deep-learning models for genomics be made more robust, interpretable, reproducible, and transferable across datasets and organisms?
- How can multimodal AI integrate genome sequences, molecular profiles, biological networks, phenotype databases, and literature-derived evidence?
- How can AI agents improve reproducible bioinformatics workflows, benchmarking, and interpretation of metagenomic analyses?
- How can sequencing centers capture sample and study metadata in a more structured and FAIR way?
Selected Publications
Jordan S. L. Jensen, Sagun Maharjan, Philipp C. Münch, Jiaxian Shen, Bailey Bowcutt, Jack T. Sumner, Xochitl C. Morgan, Kelsey N. Thompson, Long H. Nguyen, Eric A. Franzosa, Curtis Huttenhower. Enhanced multi-omic viral profiling from microbial community sequencing with BAQLaVa. bioRxiv, 2026.
Philipp C. Münch, Nasim Safaei, René Mreches, Martin Binder, Yichen Han, Gary Robertson, Eric A. Franzosa, Curtis Huttenhower, Alice C. McHardy. Comparative Assessment of Large Language Models for Microbial Phenotype Annotation. bioRxiv, 2025.
Kaixin Hu, Fernando Meyer, Zhi-Luo Deng, Ehsaneddin Asgari, Tzu-Hao Kuo, Philipp C. Münch, Alice C. McHardy.
Assessing computational predictions of antimicrobial resistance phenotypes from microbial genomes. Briefings in Bioinformatics, 2024.
Hüseyin Anil Gündüz, Martin Binder, Xiao-Yin To, René Mreches, Bernd Bischl, Alice C. McHardy, Philipp C. Münch, Mina Rezaei.
A self-supervised deep learning method for data-efficient training in genomics. Briefings in Bioinformatics, 2023.
Amadeu Scheppach, Hüseyin Anil Gündüz, Emilio Dorigatti, Philipp C. Münch, Alice C. McHardy, Bernd Bischl, Mina Rezaei, Martin Binder.
Neural architecture search for genomic sequence data. IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology, 2023.
Philipp Münch, Hüseyin Anil Gündüz, René Mreches, Julia Moosbauer, Gary Robertson, Xiao-Yin To, Eric Franzosa, Curtis Huttenhower, Mina Rezaei, Alice McHardy, Bernd Bischl, Martin Binder. Optimized model architectures for deep learning on genomic data. 2023.
Team
- Dr. Philipp C. Münch
- Tim Tjalma
- Yichen Han
Alumni
- Xiao-Yin To
- René Mreches
Collaboration and Funding
Our work is supported in part by a Helmholtz AI call and embedded in collaborative infection research activities, including work within the DZIF-TI BBD together with BacDive. The team also contributes to reproducibility and benchmarking activities connected to the Critical Assessment of Metagenome Interpretation (CAMI) challenge. Dr. Philipp C. Münch is additionally affiliated with the Huttenhower Lab at the Harvard T.H. Chan School of Public Health.
Job Offers
The Research Department “Computational Biology for Infection Research” welcomes applications at all seniority levels in the following fields:
- Bioinformatics / Computational Biology
- Computer Science
- Statistics
- Biomathematics
We are looking for motivated applicants with a strong background in the above-mentioned fields, good coding skills and interest in interdisciplinary research in biology and infection research.
Methods that we employ or develop in our research are related to the fields of:
- Bioinformatics
- Machine learning and deep learning
- Phylogenetics and population genetics
The goal of our projects is the development of algorithms and computer-aided methods to analyze the human microbiome, viral and bacterial pathogens, and human cell lineages within individual patients.
Some, but not all, currently open positions are listed below. Therefore unsolicited applications are always welcome. Please send your applications to Jobs.
Open job offers:
None at the moment
Deep Learning for Molecular Biology
The seminar "Deep Learning for molecular biology" is hosted by the department "Computational Biology of Infection Research" at the HZI headed by Prof. Alice McHardy.
- Kick-off Meeting: 15th April 2026, 9.30h
- Room Kick-off Meeting: BRICS, Raum 01 (Ground floor)
- Seminar Date: TBA
- Room seminar: TBA
- Max. number of participants: 10
- Language: English
- Modus: 30 minutes of presentation (including discussion) + 5 pages written summary
- Designated for Bachelor and Master Students of Computer and Data Scienc
- Prerequisites: Familiarity with programming in Python and Linear Algebra (matrix / vector multiplications)
In case you have questions about the seminar, contact Mohammad Hadi Foroughmand Araabi.
Description:
Recently, deep neural network models have revolutionized machine learning research and achieved state-of-the-art performance in almost every related research, including computer vision, natural language processing, and computational biology. The goal of this seminar is to teach the basic principles of deep learning along with some basic implementations in pytorch framework. We will explore the most fundamental neural architectures including convolutional neural networks, recurrent neural networks, and autoencoders as well as language-model based representation learning methods.
Topics:
- Deep Neural Networks and back propagation
- Convolutional Neural Networks (CNN)
- Recurrent Neural Networks (RNN)
- Transformer models