Computational Biology of Infection Research

The research of the group focuses on the data-driven analysis of biological questions from infection research, as well as method development to solve prediction problems for large biological data sets.


Evolution of the Influenza Virus

Endemic influenza is estimated to cause 500,000 deaths each year, particularly among young children and the elderly [1]. Three distinct viral types (A, B and C) circulate in the human population. While types B and C evolve slowly and circulate at low levels, influenza A evolves rapidly, continuously evading host immunity acquired through previous infection or vaccination, and regularly causes large epidemics. The virus is a single-stranded, negative-sense RNA virus of the family Orthomyxoviridae with a genome consisting of eight segments.

The antigenic properties of a specific virus are defined by the surface glycoproteins haemagglutinin (HA) and neuraminidase (NA). These proteins are under ongoing selection for change and continuously acquire mutations that reduce immune recognition in previously infected or vaccinated hosts, a process referred to as antigenic drift. Larger changes in antigenicity can also arise through the acquisition of different HA or NA genes in a process known as segment reassortment [4]. In this process, two viruses that co-infect the same cell produce a descendant virus that carries a combination of their genome segments.
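
To illustrate the size of the space that reassortment opens up, the following minimal Python sketch enumerates every way eight segments can be drawn from two co-infecting parent viruses. The segment abbreviations and the parent labels "A" and "B" are not from the text above and serve only as illustration; the sketch ignores any biological constraints on which combinations are viable.

```python
from itertools import product

# The eight genome segments of influenza A (abbreviations added for illustration).
SEGMENTS = ["PB2", "PB1", "PA", "HA", "NP", "NA", "M", "NS"]


def reassortant_genotypes(parents=("A", "B")):
    """Yield every assignment of a parental origin to each segment."""
    for origins in product(parents, repeat=len(SEGMENTS)):
        yield dict(zip(SEGMENTS, origins))


genotypes = list(reassortant_genotypes())
# Keep only genotypes that mix segments from both parents,
# i.e. drop the two purely parental combinations.
reassortants = [g for g in genotypes if len(set(g.values())) > 1]

print(len(genotypes), len(reassortants))  # 256 254
```

Two parents therefore allow 2^8 = 256 segment combinations, of which 254 are true reassortants.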

Vaccine Strain Selection

Immune protection can be achieved by either infection or vaccination, but it is not permanent because of the rapid evolution of the virus [2, 3]. Accordingly, the composition of influenza vaccines is updated annually, following the recommendations of the World Health Organization (WHO). Although a successful match to the dominant circulating strain is achieved in the majority of cases, there is still room for improvement. We developed a computational method to map the genetic and antigenic evolution of the virus, to identify patches of sites under selection on the structure of viral proteins, and to improve the prediction of suitable vaccine strains.

Our computational decision procedure allows for quantitative interpretation of the growing amounts of data. It may help to match the vaccine better to the predominating strains in seasonal influenza epidemics and thereby to improve on the recommendations made by the WHO.

Detecting and Characterizing Reassortment Events

The segmented nature of the influenza genome allows the exchange of genome segments between co-circulating viruses, leading to antigenic and other genetic differences. Such changes can provide a fitness advantage relative to previously predominating strains. Similar to lateral gene transfer events in Bacteria and Archaea, reassortment events result in discordant evolutionary histories for different genetic regions, which is why both types are also known as reticulation events [5].

We are researching computational models to detect reassortment events and to infer evolutionary networks that display the reticulate evolution of influenza A viruses. Identification of such events, in particular of events that have provided a selective advantage to the respective viruses, will increase our understanding of pathogenicity- and transmissibility-defining genetic regions for influenza A viruses.
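
The discordance between segment histories can be made concrete with a small sketch. Assuming rooted trees for two segments are already available as nested tuples (a hypothetical representation, as are the strain names), the code below lists clades found in one tree but not in the other. It is a simplified illustration, not the group's method: real analyses must also account for uncertainty in tree inference, which can produce conflicting clades without any reassortment.

```python
def clades(tree):
    """Return (leaf set, set of clades) for a rooted tree given as nested tuples."""
    if isinstance(tree, str):
        # A leaf contributes its own name but no clade of its own.
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)  # the clade rooted at this inner node
    return leaves, found


def discordant_clades(tree_a, tree_b):
    """Clades present in exactly one of the two trees."""
    _, clades_a = clades(tree_a)
    _, clades_b = clades(tree_b)
    return clades_a ^ clades_b


# Hypothetical toy trees for two segments of the same four strains.
ha_tree = (("s1", "s2"), ("s3", "s4"))
na_tree = (("s1", "s3"), ("s2", "s4"))

conflicts = discordant_clades(ha_tree, na_tree)
for clade in sorted(conflicts, key=sorted):
    print(sorted(clade))
```

An empty result means the two segment trees group the strains identically; conflicting clades mark candidate regions of the tree to inspect for reassortment.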


  1. A.C. McHardy, B. Adams
    The role of genomics in tracking the evolution of influenza A virus
    PLoS Pathogens 2009, 5(10): e1000566
  2. R.A. Medina, A. García-Sastre
    Influenza A viruses: new research developments
    Nature Reviews Microbiology 2011, 9(8): 590-603
  3. B. Adams, A.C. McHardy, C. Lundegaard, T. Lengauer
    Viral Bioinformatics
    in Modern Genome Annotation, D. Frishman, A. Valencia (Editors), Springer Verlag, 2009: 429-452
  4. L. Steinbrück, A.C. McHardy
    Inference of Genotype-Phenotype Relationships in the Antigenic Evolution of Human Influenza A (H3N2) Viruses
    PLoS Comput Biol 2012, 8(4): e1002492
  5. C. Tusche, L. Steinbrück, A.C. McHardy
    Detecting patches of protein sites of influenza A viruses under positive selection
    Mol Biol Evol 2012, 29(8): 2063-2071
  6. L. Steinbrück, T.R. Klingen, A.C. McHardy
    Computational prediction of vaccine strains for human influenza A(H3N2) viruses
    J Virol 2014, 88(20): 12123-32
  7. M. Baroni, C. Semple, M. Steel
    A Framework for Representing Reticulate Evolution
    Annals of Combinatorics 2005, 8(4): 391-408


  • Thorsten Klingen
  • Susanne Reimering
  • Kyra Mooren
  • Jens Loers


  • Gülsah Gabriel, Heinrich Pette Institute, Leibniz Institute for Experimental Virology, Hamburg
  • Eddie Holmes, School of Biological Sciences, University of Sydney, Australia
  • Dirk Hoeper and Michael Beer, Institute of Diagnostic Virology, Friedrich-Loeffler-Institut, Greifswald - Insel Riems, Germany
  • Juergen and Olga Stech, Institute of Molecular Biology, Friedrich-Loeffler-Institut, Greifswald - Insel Riems, Germany

Recent Highlights

L. Steinbrück, T.R. Klingen, A.C. McHardy
Computational prediction of vaccine strains for human influenza A(H3N2) viruses
J Virol 2014, 88(20): 12123-32


 Human influenza A viruses are rapidly evolving pathogens that cause substantial morbidity and mortality in seasonal epidemics around the globe. To ensure continued protection, the strains used for the production of the seasonal influenza vaccine have to be regularly updated, which involves data collection and analysis by numerous experts worldwide. Computer-guided analysis is becoming increasingly important in this problem due to the vast amounts of generated data. We here describe a computational method for selecting a suitable strain for production of the human influenza A virus vaccine. It interprets available antigenic and genomic sequence data based on measures of antigenic novelty and rate of propagation of the viral strains throughout the population. For viral isolates sampled between 2002 and 2007, we used this method to predict the antigenic evolution of the H3N2 viruses in retrospective testing scenarios. When seasons were scored as true or false predictions, our method returned six true positives, three false negatives, eight true negatives, and one false positive, or 78% accuracy overall. In comparison to the recommendations by the WHO, we identified the correct antigenic variant once at the same time and twice one season ahead. Even though it cannot be ruled out that practical reasons such as lack of a sufficiently well-growing candidate strain may in some cases have prevented recommendation of the best-matching strain by the WHO, our computational decision procedure allows quantitative interpretation of the growing amounts of data and may help to match the vaccine better to predominating strains in seasonal influenza epidemics.
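
The reported accuracy follows directly from the four counts given in the abstract. The short sketch below redoes the arithmetic; sensitivity and specificity are not reported in the abstract and are derived here only from those counts.

```python
# Counts of season-level predictions as stated in the abstract.
true_pos, false_neg, true_neg, false_pos = 6, 3, 8, 1

total = true_pos + false_neg + true_neg + false_pos

# Accuracy: share of all scored cases that were predicted correctly.
accuracy = (true_pos + true_neg) / total
# Derived measures (not stated in the abstract).
sensitivity = true_pos / (true_pos + false_neg)
specificity = true_neg / (true_neg + false_pos)

print(f"accuracy    = {accuracy:.1%}")     # 77.8%, i.e. the reported 78%
print(f"sensitivity = {sensitivity:.1%}")  # 66.7%
print(f"specificity = {specificity:.1%}")  # 88.9%
```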

C. Kratsch, A.C. McHardy
RidgeRace: ridge regression for continuous ancestral character estimation on phylogenetic trees
Bioinformatics 2014, 30(17): i527-i533


Motivation: Ancestral character state reconstruction describes a set of techniques for estimating phenotypic or genetic features of species or related individuals that are the predecessors of those present today. Such reconstructions can reach into the distant past and can provide insights into the history of a population or a set of species when fossil data are not available, or they can be used to test evolutionary hypotheses, e.g. on the co-evolution of traits. Typical methods for ancestral character state reconstruction of continuous characters consider the phylogeny of the underlying data and estimate the ancestral process along the branches of the tree. They usually assume a Brownian motion model of character evolution or extensions thereof, requiring specific assumptions on the rate of phenotypic evolution.

Results: We suggest using ridge regression to infer rates for each branch of the tree and the ancestral values at each inner node. We performed extensive simulations to evaluate the performance of this method and have shown that the accuracy of its reconstructed ancestral values is competitive with reconstructions using other state-of-the-art software. Using a hierarchical clustering of gene mutation profiles from an ovarian cancer dataset, we demonstrate the use of the method as a feature selection tool.
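
The idea of inferring per-branch rates by ridge regression can be sketched on a toy tree. The model used here is an assumption made for illustration and is not taken from the published software: each leaf value is the root value plus, for every branch on the path from the root, rate times branch length, with an L2 penalty on the rates only. The tree, the leaf values, and the penalty weight `lam` are hypothetical.

```python
# Hypothetical toy tree: branch name -> (parent node, child node, length).
BRANCHES = {
    "b1": ("root", "N", 1.0),
    "b2": ("N", "A", 0.5),
    "b3": ("N", "B", 0.5),
    "b4": ("root", "C", 1.5),
}
LEAF_VALUES = {"A": 2.0, "B": 2.4, "C": 1.0}


def path_to(node):
    """Names of the branches on the path from the root to `node`."""
    path = []
    while node != "root":
        name = next(b for b, (_, child, _) in BRANCHES.items() if child == node)
        path.append(name)
        node = BRANCHES[name][0]
    return path


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit(lam):
    """Return (root value, {branch: rate}) minimising squared error + lam * sum(rate^2)."""
    names = list(BRANCHES)
    # Column 0 models the root value; other columns hold branch lengths on each leaf's path.
    rows = [[1.0] + [BRANCHES[b][2] if b in path_to(leaf) else 0.0 for b in names]
            for leaf in LEAF_VALUES]
    y = list(LEAF_VALUES.values())
    p = len(names) + 1
    # Normal equations (X^T X + lam * D) beta = X^T y; D leaves the root value unpenalised.
    gram = [[sum(r[i] * r[j] for r in rows) + (lam if i == j and i > 0 else 0.0)
             for j in range(p)] for i in range(p)]
    rhs = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(gram, rhs)
    return beta[0], dict(zip(names, beta[1:]))


def ancestral_value(node, root_value, rates):
    """Predicted character value at any node under the fitted model."""
    return root_value + sum(rates[b] * BRANCHES[b][2] for b in path_to(node))


root_value, rates = fit(lam=1.0)
for node in ("root", "N", "A", "B", "C"):
    print(node, round(ancestral_value(node, root_value, rates), 3))
```

Larger values of `lam` shrink all rates towards zero, so every node approaches the mean of the leaf values; smaller values let individual branches absorb more of the observed variation.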

L. Steinbrück, A.C. McHardy
Evaluating the Evolutionary Dynamics of Viral Populations
in Dynamic Models of Infectious Diseases, V. Sree Hari Rao, Ravi Durvasula (Editors), Springer Verlag, 2013: 205-225


Phylodynamic techniques combine epidemiological and genetic information to analyze the evolutionary and spatiotemporal dynamics of rapidly evolving pathogens, such as influenza A or human immunodeficiency viruses. We introduce allele dynamics plots (AD-plots) as a method for visualizing the evolutionary dynamics of a gene in a population. Using AD-plots, we propose to identify the alleles that are likely to be subject to directional selection. We analyze the method’s merits with a detailed study of the evolutionary dynamics of seasonal influenza A viruses. AD-plots for the major surface protein of seasonal influenza A (H3N2) and the 2009 swine-origin influenza A (H1N1) viruses show the succession of substitutions that became fixed in the evolution of the two viral populations. They also allow the early identification of those viral strains that later rise to predominance, which is important for the problem of vaccine strain selection. In summary, we describe a technique that reveals the evolutionary dynamics of a rapidly evolving population and allows us to identify alleles and associated genetic changes that might be under directional selection. The method can be applied for the study of influenza A viruses and other rapidly evolving species or viruses.
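
As a rough illustration of the kind of signal such plots display, the sketch below tallies how often each allele at one site is observed per season and flags alleles whose frequency rises in every consecutive season. It is a simplified frequency count under assumed inputs, not the AD-plot method itself; the isolate list and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical isolates: (season, amino acid observed at one alignment position).
ISOLATES = [
    ("2003", "K"), ("2003", "K"), ("2003", "K"), ("2003", "N"),
    ("2004", "K"), ("2004", "N"), ("2004", "N"), ("2004", "N"),
    ("2005", "N"), ("2005", "N"), ("2005", "N"), ("2005", "N"),
]


def allele_frequencies(isolates):
    """Return {season: {allele: relative frequency}}, seasons in sorted order."""
    counts = defaultdict(Counter)
    for season, allele in isolates:
        counts[season][allele] += 1
    return {
        season: {a: n / sum(c.values()) for a, n in c.items()}
        for season, c in sorted(counts.items())
    }


def rising_alleles(freqs):
    """Alleles whose frequency increases between every pair of consecutive seasons."""
    seasons = list(freqs)
    alleles = {a for per_season in freqs.values() for a in per_season}
    return sorted(
        a for a in alleles
        if all(freqs[s1].get(a, 0.0) < freqs[s2].get(a, 0.0)
               for s1, s2 in zip(seasons, seasons[1:]))
    )


freqs = allele_frequencies(ISOLATES)
rising = rising_alleles(freqs)
print(freqs)
print(rising)  # ['N']
```

A steadily rising allele is only a candidate for directional selection; sampling bias and genetic hitchhiking can produce the same pattern.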

I. Gregor, L. Steinbrück, A.C. McHardy
PTree: pattern-based, stochastic search for maximum parsimony phylogenies
PeerJ 2013, 1: e89


Phylogenetic reconstruction is vital to analyzing the evolutionary relationship of genes within and across populations of different species. Nowadays, with next generation sequencing technologies producing sets comprising thousands of sequences, robust identification of the tree topology, which is optimal according to standard criteria such as maximum parsimony, maximum likelihood or posterior probability, with phylogenetic inference methods is a computationally very demanding task. Here, we describe a stochastic search method for a maximum parsimony tree, implemented in a software package we named PTree. Our method is based on a new pattern-based technique that enables us to infer intermediate sequences efficiently where the incorporation of these sequences in the current tree topology yields a phylogenetic tree with a lower cost. Evaluation across multiple datasets showed that our method is comparable to the algorithms implemented in PAUP* or TNT, which are widely used by the bioinformatics community, in terms of topological accuracy and runtime. We show that our method can process large-scale datasets of 1,000–8,000 sequences. We believe that our novel pattern-based method enriches the current set of tools and methods for phylogenetic tree inference. The software is available under:
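
The maximum parsimony criterion mentioned above can be illustrated with Fitch's classic small-parsimony algorithm, which counts the minimum number of substitutions a fixed tree topology requires. This is a textbook scoring routine, not PTree's pattern-based stochastic search; the toy alignment, the strain names, and the nested-tuple tree encoding are assumptions for illustration.

```python
def fitch(tree, states):
    """Return (candidate state set, substitution count) for one alignment site.

    `tree` is a rooted binary tree as nested 2-tuples of leaf names;
    `states` maps each leaf name to its character at this site.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra substitution needed.
        return common, left_cost + right_cost
    # Children disagree: take the union and count one substitution.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the Fitch cost over all sites of an alignment {leaf: sequence}."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


# Hypothetical toy alignment and two candidate topologies.
alignment = {"s1": "AAG", "s2": "AAG", "s3": "ACT", "s4": "ACT"}
tree_1 = (("s1", "s2"), ("s3", "s4"))
tree_2 = (("s1", "s3"), ("s2", "s4"))

score_1 = parsimony_score(tree_1, alignment)
score_2 = parsimony_score(tree_2, alignment)
print(score_1, score_2)  # 2 4
```

A search method for maximum parsimony trees explores many topologies and keeps those with the lowest score; here `tree_1` is preferred because it explains the alignment with two substitutions instead of four.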
