Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
teaching:2025summer:kmer_algbioinf [2025/03/28 10:12]
rwittler
teaching:2025summer:kmer_algbioinf [2025/06/17 09:05] (current)
rwittler [Timetable]
Line 11: Line 11:
 \\ \\
  
-The overarching topic of this semester are **k-mers** (a.k.a. //​q-grams//​). This simple concept builds a basis for many algorithmic solutions in bioinformatics,​ such as assembly, alignment, genome comparison, pangenomics,​ etc.\\+The overarching topic of this semester are **//k//-mers** (a.k.a. //​q-grams//​). This simple concept builds a basis for many algorithmic solutions in bioinformatics,​ such as assembly, alignment, genome comparison, pangenomics,​ etc.\\
 \\ \\
 +
 +To practice algorithm design and presentation,​ each participant will specify a simple //k//-mer counting algorithm and present it using pseudocode (a LaTeX template will be provided). Afterwards, they will implement the algorithm as a basic prototype. In a coding showdown, all implementations will battle it out to see which one takes the crown! (This can be credited as "​392041 Implementation of Algorithms (Ü)", 1 LP.) 
 +
 +\\
 +
  
 Possible concrete methods/​publications to be presented/​discussed in the seminar are: Possible concrete methods/​publications to be presented/​discussed in the seminar are:
Line 35: Line 40:
 ==== Counting k-mers ==== ==== Counting k-mers ====
  
-  * Kokot, Marek, Maciej Długosz, and Sebastian Deorowicz. "KMC 3: counting and manipulating k-mer statistics."​ [[https://​academic.oup.com/​bioinformatics/​article/​33/​17/​2759/​3796399?​login=false|Bioinformatics 33.17 (2017): 2759-2761.]] (plus previous versions)+  * Kokot, Marek, Maciej Długosz, and Sebastian Deorowicz. "KMC 3: counting and manipulating ​//k//-mer statistics."​ [[https://​academic.oup.com/​bioinformatics/​article/​33/​17/​2759/​3796399?​login=false|Bioinformatics 33.17 (2017): 2759-2761.]] (plus previous versions)
   * Lucas Robidou and Pierre Peterlongo. "​fimpera:​ drastic improvement of Approximate Membership Query data-structures with counts"​ [[https://​www.biorxiv.org/​content/​10.1101/​2022.06.27.497694v3|bioRxiv (2023)]]   * Lucas Robidou and Pierre Peterlongo. "​fimpera:​ drastic improvement of Approximate Membership Query data-structures with counts"​ [[https://​www.biorxiv.org/​content/​10.1101/​2022.06.27.497694v3|bioRxiv (2023)]]
   *  Téo Lemane, Paul Medvedev, Rayan Chikhi, and Pierre Peterlongo. "​kmtricks:​ efficient and flexible construction of Bloom filters for large sequencing data collections."​ [[https://​academic.oup.com/​bioinformaticsadvances/​article/​2/​1/​vbac029/​6576015|Bioinformatics Advances, 2.1 (2022)]]   *  Téo Lemane, Paul Medvedev, Rayan Chikhi, and Pierre Peterlongo. "​kmtricks:​ efficient and flexible construction of Bloom filters for large sequencing data collections."​ [[https://​academic.oup.com/​bioinformaticsadvances/​article/​2/​1/​vbac029/​6576015|Bioinformatics Advances, 2.1 (2022)]]
-  *  Jens Zentgraf and Sven Rahmann. "Fast gapped k-mer counting with subdivided multi-way bucketed Cuckoo hash table."​ [[https://​drops.dagstuhl.de/​opus/​volltexte/​2022/​17046/​pdf/​LIPIcs-WABI-2022-12.pdf|In 22nd International Workshop on Algorithms in Bioinformatics (WABI). Schloss Dagstuhl-Leibniz-Zentrum für Informatik (2022)]]+  *  Jens Zentgraf and Sven Rahmann. "Fast gapped ​//k//-mer counting with subdivided multi-way bucketed Cuckoo hash table."​ [[https://​drops.dagstuhl.de/​opus/​volltexte/​2022/​17046/​pdf/​LIPIcs-WABI-2022-12.pdf|In 22nd International Workshop on Algorithms in Bioinformatics (WABI). Schloss Dagstuhl-Leibniz-Zentrum für Informatik (2022)]]
  
 ==== Storing k-mers ==== ==== Storing k-mers ====
  
-  *  Sebastian Schmidt and Jarno N. Alanko. "​Eulertigs:​ minimum plain text representation of k-mer sets without repetitions in linear time." [[https://​link.springer.com/​article/​10.1186/​s13015-023-00227-1|Algorithms for Molecular Biology 18(1), 5 (2023)]]+  *  Sebastian Schmidt and Jarno N. Alanko. "​Eulertigs:​ minimum plain text representation of //k//-mer sets without repetitions in linear time." [[https://​link.springer.com/​article/​10.1186/​s13015-023-00227-1|Algorithms for Molecular Biology 18(1), 5 (2023)]]
   * Giulio Ermanno Pibir. "​Sparse and skew hashing of K-mers."​ [[https://​academic.oup.com/​bioinformatics/​article/​38/​Supplement_1/​i185/​6617506?​login=true|Bioinformatics 38.Suppl1 (2022):​i185-i194.]]   * Giulio Ermanno Pibir. "​Sparse and skew hashing of K-mers."​ [[https://​academic.oup.com/​bioinformatics/​article/​38/​Supplement_1/​i185/​6617506?​login=true|Bioinformatics 38.Suppl1 (2022):​i185-i194.]]
  
 +==== Pangenomics ====
  
 +  * Luca Parmigiani, Roland Wittler, and Jens Stoye. “Revisiting pangenome openness with k-mers.” [[https://​peercommunityjournal.org/​item/​10.24072/​pcjournal.415.pdf|Peer Community Journal (4):e47 (2024)]] ​
  
 ==== Timetable ==== ==== Timetable ====
  
 | 09.04. | Organization, ​ topic selection, Scientific reading ​ | {{:​teaching:​2022summer:​algbioinf:​howtoread.pdf|Slides:​ HowToRead}} | | 09.04. | Organization, ​ topic selection, Scientific reading ​ | {{:​teaching:​2022summer:​algbioinf:​howtoread.pdf|Slides:​ HowToRead}} |
-| 16.04. |  +| 16.04. | Pseudocode | [[https://​texdoc.org/​serve/​algorithm2e/​0|algorithm2e-docu]],​ {{:​teaching:​2025summer:​kmer:​template_algo.tex.pdf|LaTeX template (remove .pdf from file name)}}, {{:​teaching:​2025summer:​kmer:​template_algo.pdf|notes/​example}} ​
-| 23.04. |  |+| 23.04. | -- (self study) | [[:​teaching:​2024summer:​karp:​writing|Scientific Writing]], [[https://​doi.org/​10.1371/​journal.pcbi.0030077|Ten Simple Rules for Making Good Oral Presentations,​ Philip E Bourne]], [[https://​doi.org/​10.1371/​journal.pcbi.1005373|Ten simple rules for short and swift presentations,​ Christopher J. Lortie]] ​|
 | 30.04. | -- | | 30.04. | -- |
-| 07.05. |  +| 07.05. | Pseudocode vorstellen ​
-| 14.05. |  +| 14.05. | KMC 3: counting and manipulating //k//-mer statistics | Kathrin ​
-| 21.05. |  |+| 21.05. | k-mer counting challenge | (fast) alle 
 | 28.05. | -- | | 28.05. | -- |
-| 04.05. |  | +| 04.06. | Fast gapped //k//-mer counting with subdivided multi-way bucketed Cuckoo hash table | Sofie  
-| 11.06. |  +|            | Minimizer-space de Bruijn graphs: Whole-genome assembly of long reads in minutes on a personal computer | Liliana ​
-| 18.06. |  +| 11.06. | Sparse and skew hashing of K-mers | Alena 
-| 25.06. |  |+| 18.06. | Space-efficient and exact de Bruijn graph representation based on a Bloom filter. (Minia) | Max 
 +| 25.06. | Minimap2: pairwise alignment for nucleotide sequences | Igor | 
 +|             | Revisiting pangenome openness with k-mers | Mathis ​|
 | 02.07. | -- | | 02.07. | -- |
-| 09.07. |  |+| 09.07. | |
 | 16.07. | -- | | 16.07. | -- |
    
 +==== Details on the k-mer counting exercise ====
 +
 +**Input:**
 +  * Multiple fasta file, containing nucleotide sequences (small or capital letters, maybe N's)
 +  * //k//-mer length **k**
 +  * threshold **c**
  
 +**Output:**
 +  * Number of canonical //k//-mers occurring at least **c** times.