Genome Informatics

Based on selected book chapters from Genome-Scale Algorithm Design (Veli Mäkinen, Djamal Belazzougui, Fabio Cunial, Alexandru I. Tomescu), the participants will give oral presentations (20-45 min) and write a short summary (5-10 pages) about the an algorithmic problem in bioinformatics and their solutions. Talks and essays must be done in English. The first day covers an overview of possible topics, which will then be distributed to the students. Aspects of scientific writing and presenting will be covered as well.

The overarching topic of this semester are string algorithms for genome-scale data, with a focus on fundamental algorithmic techniques underlying modern bioinformatics tools, including indexing, pattern matching, compressed data structures, and graph-based genome representations.

To practice algorithm design and presentation, each participant will present one core algorithmic idea from the assigned book chapter using pseudocode (in LaTeX). Afterwards, they will implement the algorihtm as a basic prototype in a programming language of their choice (esoteric programming languages are accepted, as long as accompanied by a proper explanation ;)).

Possible concrete chapters/topics to be presented and discussed in the seminar include:

5. Network flows

5.1 Flows and their decompositions
5.2 Minimum-cost flows and circulations
5.3 Bipartite matching problems
5.4 Covering problems

7. Hidden Markov models

7.1 Definition and basic problems
7.2 The Viterbi algorithm
7.3 The forward and backward algorithms
7.4 Estimating HMM parameters

8. Classical indexes

8.1 k-mer index
8.2 Suffix array
8.3 Suffix tree
8.4 Applications of the suffix tree

9. Burrows-Wheeler indexes

9.1 Burrows-Wheeler transform (BWT)
9.2 BWT index
9.3 Space-efficient construction of the BWT
9.4 Bidirectional BWT index

10. Alignment-based genome analysis

10.1 Variation calling
10.2 Pattern partitioning for read alignment
10.3 Dynamic programming along suffix tree paths
10.4 Backtracking on BWT indexes

11. Alignment-free genome analysis and comparison

11.1 De novo variation calling
11.2 Space-efficient genome analysis
11.3 Comparing genomes without alignment

12. Compression of genome collections

12.1 Lempel-Ziv parsing
- 12.1.1 Basic algorithm for Lempel-Ziv parsing
- 12.1.2 Space-efficient Lempel-Ziv parsing
12.2 Bit-optimal Lempel-Ziv compression

13. Fragment assembly

13.1 Sequencing by hybridization
13.2 Contig assembly
13.3 Scaffolding
13.4 Gap filling

14. Haplotype analysis

14.1 Haplotype assembly and phasing
- 14.1.1 Minimum error correction
- 14.1.2 Hardness
14.2 Haplotype matching and positional BWT

15. Pangenomics

15.1 Overview of pangenome representations
- 15.1.1 Colored de Bruijn graphs
- 15.2 Aligning reads to a pangenome
15.3 Variation calling over pangenomes

Timetable

Date	Topic	Name
17.04.2026	Organization of the seminar	–
22.04.2026	Selection of topics	–
29.04.2026	Slides: HowToRead, algorithm2e-docu, LaTeX template (remove .pdf from file name), notes/example	Luca
06.05.2026	The MOVE data structure	Jens
13.05.2026	Other basic data structures	Luca
20.05.2026	Compression using Lempel-Ziv parsing	Kiril
27.05.2026	– (Jens and Luca at conference)
03.06.2026	Common intervals in sequences	Adrian
10.06.2026
17.06.2026
24.06.2026
01.07.2026	Pseudocode presentations
08.07.2026
15.07.2026	Code presentations
22.07.2026