Diversity and Dynamics of Genomes (Seminar)

392217 | Wittler | Winter 2017 | Tuesday 14-16 in U10-146 | eKVV

Course Description

This semester, everybody is invited to give a tutorial on a selected topic, e.g., a method they are working with.

Schedule

Date | Name | Topic | Details/Materials
10.10. | | Administrative matters |
17.10. | | |
24.10. | | |
31.10. | | – exceptional national holiday – |
07.11. | Omar | Introduction to Systems Biology: Dynamic mathematical modelling (with ODEs) | Requirements: Python with SciPy. represilator.py.zip
14.11. | Georges | Introduction to color and color usage | Short presentation about color, terminology, tips, and useful links. slides
21.11. | | |
28.11. | Linda | Group Games: For Youth Groups or Your Next Flat Party | Just be open to playing some funny/silly games.
05.12. | Lukas | Data Science in Python: Crash Course | Python with Jupyter Notebook, pandas, scikit-learn and more. Anaconda distribution of Python highly recommended. GitHub
12.12. | Roland | Basics in pictorial design with application in polarization photography | Results in /vol/didy/Pictures/20171212_Polarization
19.12. | Robert | Short introduction to analytic combinatorics (Christmas edition) | Mathematica or Wolfram Programming Lab (for hands-on experience, fully optional). slides, notebooks
09.01. | Michel T. | TikZ | You'll find the latest version of the manual here.
16.01. | Markus | Merits and Pitfalls of Clustering |
23.01. | Guillaume | Bifrost: Highly Parallel and Memory Efficient Compacted de Bruijn Graph Construction | Abstract below
30.01. | Karsten | Brewing 101 | Basic introduction to brewing beer. Tasting included if some beer is ready.

Bifrost: Highly Parallel and Memory Efficient Compacted de Bruijn Graph Construction

De Bruijn graphs are the core data structure of a wide range of whole-genome and transcriptome assemblers that process High Throughput Sequencing datasets. However, the memory consumption of such assemblers is often prohibitive because of the large number of vertices and edges in the graph, to the point of hindering their use on large and complex genomes. Most short-read assemblers based on the de Bruijn graph paradigm reduce assembly complexity and memory usage by first compacting all maximal non-branching paths of the graph into single vertices. Yet such a compaction is challenging, as it requires the uncompacted de Bruijn graph to be available in memory. We present a new parallel and memory-efficient algorithm that constructs the compacted de Bruijn graph directly, without producing the intermediate uncompacted graph. Our method relies on a space- and time-efficient data structure, the Bloom filter, enhanced with minimizer hashing to improve cache performance. Despite making extensive use of a probabilistic data structure, our algorithm guarantees that the produced compacted de Bruijn graph is deterministic. Furthermore, the algorithm features de Bruijn graph simplification steps used by assemblers, such as tip clipping and isolated unitig removal. In addition, because the performance of disk-based software is strongly affected by the differing speeds of disk storage technologies, our method uses main memory storage only. Experimental results show that our algorithm is competitive with state-of-the-art de Bruijn graph compaction methods.
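To make "compacting maximal non-branching paths" concrete, here is a minimal Python sketch of what a compacted de Bruijn graph contains. It is not Bifrost's algorithm and rests on simplifying assumptions: k-mers are stored in an exact Python set instead of a Bloom filter, only the forward strand is considered (no reverse complements or canonical k-mers), and there is no parallelism, minimizer hashing, or tip clipping. Vertices are k-mers, and an edge joins two k-mers when the (k-1)-suffix of one equals the (k-1)-prefix of the other. Unlike the method in the abstract, this sketch first materializes every k-mer, so it illustrates the output rather than the memory-efficient construction. The function names `kmers_from_reads` and `compact` are hypothetical.

```python
from typing import Iterable, List, Set

ALPHABET = "ACGT"


def kmers_from_reads(reads: Iterable[str], k: int) -> Set[str]:
    """Collect every k-mer of the reads (forward strand only, exact set, no Bloom filter)."""
    kmers: Set[str] = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    return kmers


def successors(kmer: str, kmers: Set[str]) -> List[str]:
    """k-mers whose (k-1)-prefix equals the (k-1)-suffix of `kmer`."""
    return [kmer[1:] + c for c in ALPHABET if kmer[1:] + c in kmers]


def predecessors(kmer: str, kmers: Set[str]) -> List[str]:
    """k-mers whose (k-1)-suffix equals the (k-1)-prefix of `kmer`."""
    return [c + kmer[:-1] for c in ALPHABET if c + kmer[:-1] in kmers]


def compact(kmers: Set[str]) -> List[str]:
    """Merge every maximal non-branching path into a single unitig string."""
    seen: Set[str] = set()
    unitigs: List[str] = []
    for start in sorted(kmers):  # sorted only to make the output order reproducible
        if start in seen:
            continue
        # Walk backwards while prev -> head is the only edge into head and the
        # only edge out of prev; stop if we come back round a cycle to start.
        head = start
        while True:
            preds = predecessors(head, kmers)
            if len(preds) != 1:
                break
            prev = preds[0]
            if prev == start or len(successors(prev, kmers)) != 1:
                break
            head = prev
        # Walk forwards from head under the same condition, adding one base per k-mer.
        unitig, node = head, head
        seen.add(head)
        while True:
            succs = successors(node, kmers)
            if len(succs) != 1:
                break
            nxt = succs[0]
            if nxt in seen or len(predecessors(nxt, kmers)) != 1:
                break
            unitig += nxt[-1]
            seen.add(nxt)
            node = nxt
        unitigs.append(unitig)
    return unitigs


if __name__ == "__main__":
    # The two reads branch after TGC, so three unitigs remain.
    print(compact(kmers_from_reads(["ATGCA", "ATGCT"], 3)))  # ['ATGC', 'GCA', 'GCT']
```

For example, the reads ATGCA and ATGCT with k = 3 share the path ATG → TGC, which then branches into GCA and GCT; the sketch therefore keeps three unitigs (ATGC, GCA, GCT) instead of four k-mer vertices. A real tool would additionally merge each k-mer with its reverse complement, which this sketch deliberately omits.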