Computational Biology for Infection Research

The Department of “Computational Biology for Infection Research” studies the human microbiome, viral and bacterial pathogens, and human cell lineages within individual patients by analyzing large-scale biological and epidemiological data sets with computational techniques. Focusing on high-throughput meta’omics, population genomic and single-cell sequencing data, we produce testable hypotheses, such as sets of key sites or relevant genes associated with the presence of a disease, with antibiotic resistance or with pathogen evasion of immune defense. We work with experimental collaborators to verify our findings and to promote their translation into medical treatments or diagnostic procedures. To achieve its research goals, the department also develops novel algorithms and software.


Machine Learning

High-throughput sequencing combined with metagenomics has uncovered thousands of novel bacterial species directly from samples, without isolation or culturing. This enables fine-grained analyses of the functions of microbial community members and of their association with phenotypes and environments, as well as studies of microevolution and adaptation to changing environmental conditions. These sequencing efforts have produced vast amounts of data that can be analyzed with machine learning methods to help uncover relationships relevant to human health and disease.

In this research focus we are:

  • Developing novel machine learning methods, implemented as software libraries that can be used by the research community.
  • Developing efficient machine learning models that utilize the vast amount of unlabeled biological data.
  • Applying machine learning methods to microbial datasets to analyze the interaction between bacteria and the mammalian host (see Computational microbiome research).
  • Developing machine and deep learning models to predict properties of molecules and identify their most important structural characteristics.
  • Developing machine learning methods to predict B-cell epitopes for epitope mapping and designing novel vaccines.

We focus on: 

  • GenomeNet - A deep neural network for genomic modelling, semi-supervised classification and imputation: The GenomeNet project is a BMBF-funded joint research enterprise of the Helmholtz Centre for Infection Research and the University of Munich, in close collaboration with the Harvard T.H. Chan School of Public Health. In this project we aim to develop customized deep learning network architectures that are particularly suited to modelling large nucleotide sequences. These networks will then be applied to bacterial, viral and human genomes with the goal of understanding the complex structures underlying the code of life. This work is funded by the Federal Ministry of Education and Research (031L0199A).
  • Learning structures in the CRISPR-Cas system using deep learning architectures: In this project we apply statistical models to characterize structural properties of CRISPR cassettes. These properties will then be used to describe and classify the potential functions of the cassettes. This work is funded by the Deutsche Forschungsgemeinschaft (405892038).

Selected publications

  1. Asgari, E., McHardy, A. C., & Mofrad, M. R. (2019). Probabilistic variable-length segmentation of protein sequences for discriminative motif discovery (DiMotif) and sequence embedding (ProtVecX). Scientific Reports, 9(1), 1-16.
  2. Asgari, E., et al. (2018). MicroPheno: predicting environments and host phenotypes from 16S rRNA gene sequencing using a k-mer based representation of shallow sub-samples. Bioinformatics, 34(13), i32-i42.
  3. Khaledi, A., Weimann, A., Schniederjans, M., Asgari, E., Kuo, T. H., Oliver, A., ... & Häussler, S. (2020). Predicting antimicrobial resistance in Pseudomonas aeruginosa with machine learning‐enabled molecular diagnostics. EMBO Molecular Medicine, 12(3), e10264.
  4. Asgari, E., Münch, P. C., Lesker, T. R., McHardy, A. C., & Mofrad, M. R. (2019). DiTaxa: nucleotide-pair encoding of 16S rRNA for host phenotype and biomarker detection. Bioinformatics, 35(14), 2498-2500.
  5. Deng, Z. L., Münch, P. C., Mreches, R., & McHardy, A. C. (2022). Rapid and accurate identification of ribosomal RNA sequences via deep learning. Nucleic Acids Research.
  6. Mreches, R., McHardy, A. C., Bischl, B., Moosbauer, J., Gündüz, H. A., Klawitter, S., Deng, Z. L., Franzosa, E., Huttenhower, C., Robertson, G., Asgari, E., To, X. Y., Binder, M., & Münch, P. C. (2021). GenomeNet/deepG: DeepG pre-release version (v0.1.0-alpha). Zenodo.
  7. Asgari, E., Poerner, N., McHardy, A. C., & Mofrad, M. R. (2019). DeepPrime2Sec: deep learning for protein secondary structure prediction from the primary sequences. bioRxiv, 705426.


  • Philipp C. Münch
  • René Mreches
  • Dr. Ehsan Asgari
  • Akash Bahai
  • Kaixin Hu
  • Giorgos Kallergis


  • Prof. Bernd Bischl, Ludwig Maximilian University of Munich
  • Prof. Bärbel Stecher, Ludwig Maximilian University of Munich
  • Prof. Curtis Huttenhower, Harvard T.H. Chan School of Public Health
  • Dr. Eric Franzosa, Harvard T.H. Chan School of Public Health
  • Volkswagen Lab