Building "Google search engines" for microbial sequence
Prof. Dr. Zamin Iqbal, European Bioinformatics Institute (EMBL-EBI), will give a presentation entitled "Building 'Google search engines' for microbial sequence".
Although we continue to sequence and store progressively more genomes, we have barely scratched the surface of the genetic diversity of microbial life on Earth. One of the unexpected challenges is that we don't have very good access even to the genomes that are publicly available. There are two reasons. First, 80% of publicly archived data is deposited as raw unassembled sequence, which is not (usefully) searchable by BLAST. Second, the full DNA archive is too big for standard tools like BLAST and is doubling every 18 months.
The ability to query these data for sequence search terms would facilitate both basic research and applications such as real-time genomic epidemiology and surveillance. To solve this problem, we combine knowledge of microbial population genomics with computational methods devised for web search to produce a searchable data structure named BItsliced Genomic Signature Index (BIGSI, https://www.ncbi.nlm.nih.gov/pubmed/30718882). We indexed the entire global corpus of 447,833 bacterial and viral whole-genome sequence datasets using four orders of magnitude less storage than previous methods; you can try the search out at http://bigsi.io. I will talk about how we applied our BIGSI search function to rapidly find the resistance genes MCR-1, MCR-2, and MCR-3, determine the host range of 2,827 plasmids, and quantify antibiotic resistance in archived datasets.
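For readers unfamiliar with the idea, the following is a minimal toy sketch of a bitsliced signature index. It is not the BIGSI implementation: the k-mer length, filter size, number of hash functions, the SHA-256-based hashing, and all function names are assumptions chosen for illustration. Each dataset is summarised as a Bloom filter of its k-mers, and the filters are stored "sliced" by bit position, so that one row holds the same bit for every dataset and a query reduces to AND-ing a few rows.

```python
import hashlib

K = 5    # k-mer length (toy value, assumed for illustration)
M = 256  # bits per Bloom filter (toy value)
H = 3    # hash functions per k-mer (toy value)


def kmers(seq, k=K):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def positions(kmer, m=M, h=H):
    """Map a k-mer to h bit positions using seeded SHA-256 (an assumed hash choice)."""
    out = []
    for seed in range(h):
        digest = hashlib.sha256(f"{seed}:{kmer}".encode()).digest()
        out.append(int.from_bytes(digest[:8], "big") % m)
    return out


def build_index(datasets):
    """Build the bitsliced index: rows[r] is an integer whose bit c is set
    when dataset number c has bit r set in its Bloom filter."""
    names = list(datasets)
    rows = [0] * M
    for col, name in enumerate(names):
        for kmer in kmers(datasets[name]):
            for row in positions(kmer):
                rows[row] |= 1 << col
    return names, rows


def search(query, names, rows):
    """Return datasets whose filters contain every k-mer of the query.
    Bloom filters give no false negatives but may give false positives."""
    query_kmers = kmers(query)
    if not query_kmers:  # query shorter than K: nothing to look up
        return []
    hits = (1 << len(names)) - 1  # start with every dataset as a candidate
    for kmer in query_kmers:
        for row in positions(kmer):
            hits &= rows[row]  # one AND per row tests all datasets at once
    return [name for i, name in enumerate(names) if (hits >> i) & 1]


if __name__ == "__main__":
    toy = {"sample_a": "ACGTACGTTGCA", "sample_b": "TTTTGGGGCCCCAAAA"}
    names, rows = build_index(toy)
    print(search("ACGTACG", names, rows))
```

In this layout the cost of a query depends on the number of k-mers in the query rather than on scanning each dataset in turn, which is the property that makes the approach attractive for very large collections.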
Methods of this kind open up valuable resources for science in just the same way that Google opens up the internet: this is an enabling technology rather than an end in itself. I will also discuss the limitations of searching in DNA rather than protein space, and plans to address them.
Building and room
Building E8.1, Seminar Room/Ground Floor
Prof. Dr. Olga Kalinina (Drug Bioinformatics)