High-throughput technologies generate large amounts of data. The drawback: these biological and medical data must be processed into results using statistical methods and models. Concepts from robust and computational statistics, as well as visualisation techniques, help to better understand, estimate and cope with the uncertainty and strong variation that are often inherent in biological and medical data.
The project group "Biostatistics" is part of the research group "Cellular Proteome Research", which is led by Lothar Jänsch.
Infection researchers are rarely specialists in bioinformatics and statistics, yet modern infection research depends on large amounts of data. We develop and apply statistical methods and models to analyse and evaluate biological and medical data, especially from high-throughput experiments.
Cellular biomolecules are accessible through different high-throughput technologies and are the target of systematic analyses that aim to reveal the molecular workings of organisms, which is essential for understanding infectious diseases and finding treatments against them. Data analysis of high-throughput experiments can provide novel and biologically relevant hypotheses, but it is often a challenge and a severe limitation for functional genomics. The following aspects need to be considered:
Functional genomics at HZI aims at identifying molecular components involved in infection. However, the types (gene, RNA, protein, metabolite…) and number of components contributing to the observed infection phenotypes are often unknown at the beginning of a project. We utilise methods from exploratory data analysis and data mining as well as corrections for multiple testing to handle such multivariate data sets and to reveal components actually involved in infection.
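As an illustration of what a correction for multiple testing does, the following minimal sketch applies the Benjamini-Hochberg procedure to a handful of p-values. The choice of this particular procedure, the function name and the p-values are assumptions made for the example; the text above does not state which correction the group uses.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that the adjusted values
    # never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for five measured components (illustrative only).
p_values = [0.001, 0.008, 0.039, 0.041, 0.6]
adjusted = benjamini_hochberg(p_values)
# Components whose adjusted p-value stays at or below a 5 % false discovery rate.
significant = [q <= 0.05 for q in adjusted]
```

In this made-up example, four of the five raw p-values lie below 0.05, but only the two smallest remain below that threshold after adjustment.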
Data from functional genomics are highly complex, and computational efficiency largely determines whether their analysis is feasible. We therefore put special emphasis on developing fast and efficient data analysis algorithms.
Raw data (for example from mass spectrometry (MS) and imaging technologies) are inherently corrupted by noise. We build dedicated methods and noise models to distinguish biological from random effects. Although the data sets themselves can be large, in most cases only a few replicates per experiment are available, which causes problems for classical statistical methods that often assume larger sample sizes. Thus, careful experimental design as well as suitable statistical techniques, for instance robust and Bayesian methods, are needed to generate reliable hypotheses in functional genomics studies.
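To illustrate why robust estimators matter when only a few replicates are available, the sketch below compares the mean and standard deviation with the median and the scaled median absolute deviation (MAD) on five replicate measurements, one of which is an outlier. The values and the function name are hypothetical, and this is only one simple robust approach, not necessarily the one used by the group.

```python
import statistics


def median_and_scaled_mad(values):
    """Return the median and the MAD scaled to match the SD of normal data."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    # The factor 1.4826 makes the MAD comparable to the standard
    # deviation when the data are normally distributed.
    return med, 1.4826 * mad


# Hypothetical intensities from five replicates; the last one is an outlier.
replicates = [10.1, 9.8, 10.3, 10.0, 25.0]

mean = statistics.mean(replicates)
sd = statistics.stdev(replicates)
med, scaled_mad = median_and_scaled_mad(replicates)

print(f"mean = {mean:.2f}, SD = {sd:.2f}")
print(f"median = {med:.2f}, scaled MAD = {scaled_mad:.2f}")
```

A single outlier pulls the mean and standard deviation far away from the bulk of the measurements, whereas the median and MAD stay close to the four consistent replicates.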