K-mer-based Algorithms in Bioinformatics (S)

392219 Wittler Summer 2025 Wednesday 10:15-11:45 U10-146


(Depending on the students' preferences, the seminar will be held in German or English.)

Based on original research papers, the participants will give oral presentations (20-45 min) and write short summaries (5-10 pages) about algorithmic problems in bioinformatics and their solutions. Talks and essays can be done in German or English. The first day covers an overview of possible topics, which will then be distributed to the students. Aspects of scientific writing and presenting will be covered as well.

The overarching topic of this semester is k-mers (a.k.a. q-grams). This simple concept forms the basis for many algorithmic solutions in bioinformatics, such as assembly, alignment, genome comparison, pangenomics, etc.
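As a small illustration of the concept, the k-mers of a sequence are all of its overlapping substrings of length k. The following is a minimal sketch; the function name `kmers` and the example sequence are chosen here for illustration only.

```python
def kmers(sequence, k):
    """Return all overlapping substrings of length k, from left to right."""
    # A sequence of length n has n - k + 1 k-mers (none if n < k).
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


print(kmers("GATTACA", 3))
# → ['GAT', 'ATT', 'TTA', 'TAC', 'ACA']
```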

To practice algorithm design and presentation, each participant will specify a simple k-mer counting algorithm and present it using pseudocode (a LaTeX template will be provided). Afterwards, they will implement the algorithm as a basic prototype. In a coding showdown, all implementations will battle it out to see which one takes the crown! (This can be credited as “392041 Implementation of Algorithms (Ü)”, 1 LP.)


Possible concrete methods/publications to be presented/discussed in the seminar are:

Assembly

De Bruijn Graphs

  • Holley, Guillaume, and Páll Melsted. “Bifrost: highly parallel construction and indexing of colored and compacted de Bruijn graphs.” Genome biology 21.1 (2020): 1-20.
  • Ekim, Barış, Bonnie Berger, and Rayan Chikhi. “Minimizer-space de Bruijn graphs: Whole-genome assembly of long reads in minutes on a personal computer.” Cell systems 12.10 (2021): 958-968.

Alignment

Counting k-mers

Storing k-mers

Pangenomics

Timetable

09.04. Organization, topic selection, Scientific reading Slides: HowToRead
16.04. Pseudocode algorithm2e-docu, LaTeX template (remove .pdf from file name), notes/example
23.04. – (self-study) Scientific Writing; Ten Simple Rules for Making Good Oral Presentations (Philip E. Bourne); Ten simple rules for short and swift presentations (Christopher J. Lortie)
30.04.
07.05. Present pseudocode
14.05. KMC 3: counting and manipulating k-mer statistics Kathrin
Minimizer-space de Bruijn graphs: Whole-genome assembly of long reads in minutes on a personal computer Liliana
21.05. k-mer counting challenge (almost) all
28.05.
04.06. Fast gapped k-mer counting with subdivided multi-way bucketed Cuckoo hash table Sofie
11.06. Velvet: algorithms for de novo short read assembly using de Bruijn graphs Simon
18.06. Space-efficient and exact de Bruijn graph representation based on a Bloom filter. (Minia) Max
Minimap2: pairwise alignment for nucleotide sequences Igor
25.06. Revisiting pangenome openness with k-mers Mathis
02.07.
09.07.
16.07.

Details on the k-mer counting exercise

Input:

  • Multi-FASTA file containing nucleotide sequences (lowercase or uppercase letters, possibly including N's)
  • k-mer length k
  • threshold c

Output:

  • Number of canonical k-mers occurring at least c times.
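To make the task concrete, here is a minimal baseline sketch of the input/output behaviour described above. It is not a reference solution, and it relies on several assumptions that the specification leaves open: the canonical form of a k-mer is taken to be the lexicographically smaller of the k-mer and its reverse complement, k-mers containing N (or any other non-ACGT character) are skipped, k-mers do not span record boundaries, and the output counts distinct canonical k-mers. The function names are hypothetical.

```python
from collections import Counter

# Translation table for complementing upper-case nucleotides.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID = frozenset("ACGT")


def read_fasta(lines):
    """Yield one upper-cased sequence per FASTA record."""
    chunks = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A header line closes the previous record, if any.
            if chunks:
                yield "".join(chunks)
            chunks = []
        elif line:
            chunks.append(line.upper())
    if chunks:
        yield "".join(chunks)


def canonical(kmer):
    """Assumed convention: the smaller of the k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_frequent_kmers(lines, k, c):
    """Count distinct canonical k-mers that occur at least c times."""
    counts = Counter()
    for seq in read_fasta(lines):
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Assumption: k-mers with N or other ambiguous symbols are ignored.
            if _VALID.issuperset(kmer):
                counts[canonical(kmer)] += 1
    return sum(1 for n in counts.values() if n >= c)


example = [">s1", "ACGTN", ">s2", "acgt"]
print(count_frequent_kmers(example, k=2, c=3))
# → 1  (only AC/GT occurs at least 3 times; CG occurs twice)
```

A dictionary of strings is simple but slow and memory-hungry; for the challenge, a 2-bit encoding of the k-mers, rolling updates of the canonical form, or other compact data structures would be natural things to explore.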