392174 Parmigiani Summer 2026 Wednesday 10:15-11:45 U10-146

This course is held in English.

Based on selected book chapters from Genome-Scale Algorithm Design (Veli Mäkinen, Djamal Belazzougui, Fabio Cunial, Alexandru I. Tomescu), the participants will give oral presentations (20-45 min) and write a short summary (5-10 pages) about the an algorithmic problem in bioinformatics and their solutions. Talks and essays must be done in English. The first day covers an overview of possible topics, which will then be distributed to the students. Aspects of scientific writing and presenting will be covered as well.

The overarching topic of this semester are string algorithms for genome-scale data, with a focus on fundamental algorithmic techniques underlying modern bioinformatics tools, including indexing, pattern matching, compressed data structures, and graph-based genome representations.

To practice algorithm design and presentation, each participant will present one core algorithmic idea from the assigned book chapter using pseudocode (in LaTeX). Afterwards, they will implement the algorihtm as a basic prototype in a programming language of their choice (esoteric programming languages are accepted, as long as accompanied by a proper explanation ;)).

Possible concrete chapters/topics to be presented and discussed in the seminar include:

5. Network flows

  • 5.1 Flows and their decompositions
  • 5.2 Minimum-cost flows and circulations
  • 5.3 Bipartite matching problems
  • 5.4 Covering problems

7. Hidden Markov models

  • 7.1 Definition and basic problems
  • 7.2 The Viterbi algorithm
  • 7.3 The forward and backward algorithms
  • 7.4 Estimating HMM parameters

8. Classical indexes

  • 8.1 k-mer index
  • 8.2 Suffix array
  • 8.3 Suffix tree
  • 8.4 Applications of the suffix tree

9. Burrows-Wheeler indexes

  • 9.1 Burrows-Wheeler transform (BWT)
  • 9.2 BWT index
  • 9.3 Space-efficient construction of the BWT
  • 9.4 Bidirectional BWT index

10. Alignment-based genome analysis

  • 10.1 Variation calling
  • 10.2 Pattern partitioning for read alignment
  • 10.3 Dynamic programming along suffix tree paths
  • 10.4 Backtracking on BWT indexes

11. Alignment-free genome analysis and comparison

  • 11.1 De novo variation calling
  • 11.2 Space-efficient genome analysis
  • 11.3 Comparing genomes without alignment

12. Compression of genome collections

  • 12.1 Lempel-Ziv parsing
    • 12.1.1 Basic algorithm for Lempel-Ziv parsing
    • 12.1.2 Space-efficient Lempel-Ziv parsing
  • 12.2 Bit-optimal Lempel-Ziv compression

13. Fragment assembly

  • 13.1 Sequencing by hybridization
  • 13.2 Contig assembly
  • 13.3 Scaffolding
  • 13.4 Gap filling

14. Haplotype analysis

  • 14.1 Haplotype assembly and phasing
    • 14.1.1 Minimum error correction
    • 14.1.2 Hardness
  • 14.2 Haplotype matching and positional BWT

15. Pangenomics

  • 15.1 Overview of pangenome representations
    • 15.1.1 Colored de Bruijn graphs
    • 15.2 Aligning reads to a pangenome
  • 15.3 Variation calling over pangenomes

Timetable

15.04.
22.04. Organization, topic selection, Scientific reading
29.04.
06.05.
13.05.
20.05.
27.05. luca at conference
03.06.
10.06.
17.06.
24.06.
01.07.
08.07.
15.07.
22.07.