This is an old revision of the document!


https://ekvv.uni-bielefeld.de/kvv_publ/publ/vd?id=659868809 Parmigiani Summer 2026 Wednesday 10:15-11:45 U10-146

This course is held in English.

Based on selected book chapters from *Genome-Scale Algorithm Design* (Veli Mäkinen, Djamal Belazzougui, Fabio Cunial, Alexandru I. Tomescu), the participants will give oral presentations (20-45 min) and write a short summary (5-10 pages) about the an algorithmic problem in bioinformatics and their solutions. Talks and essays must be done in English. The first day covers an overview of possible topics, which will then be distributed to the students. Aspects of scientific writing and presenting will be covered as well.

The overarching topic of this semester are string algorithms for genome-scale data, with a focus on fundamental algorithmic techniques underlying modern bioinformatics tools, including indexing, pattern matching, compressed data structures, and graph-based genome representations.

To practice algorithm design and presentation, each participant will present one core algorithmic idea from the assigned book chapter using pseudocode (in LaTeX). Afterwards, they will implement the algorihtm as a basic prototype in a programming language of their choice (esoteric programming languages are accepted, as long as accompanied by a proper explanation ;)).

Possible concrete chapters/topics to be presented and discussed in the seminar include:

Indexing and Pattern Matching

  • Exact pattern matching algorithms
  • Suffix arrays and suffix trees
  • Burrows-Wheeler Transform (BWT)
  • FM-index and backward search

Compressed Data Structures

  • Rank and select data structures
  • Wavelet trees
  • Compressed suffix arrays
  • Succinct text indexes

k-mer and q-gram based methods

  • q-gram indexes
  • Minimizers and sketches
  • Applications of k-mer spectra
  • String similarity and filtering techniques

Graph-based genome representations

  • De Bruijn graphs
  • Compacted and colored de Bruijn graphs
  • Variation graphs and pan-genome models

Approximate Matching

  • Edit distance algorithms
  • Seed-and-extend paradigm
  • Filtering techniques for approximate string matching

Timetable

15.04.
22.04. Organization, topic selection, Scientific reading
29.04.
06.05.
13.05.
20.05.
27.05. luca at conference
03.06.
10.06.
17.06.
24.06.
01.07.
08.07.
15.07.
22.07.