This shows you the differences between two versions of the page.
Both sides previous revision Previous revision Next revision | Previous revision | ||
teaching:2025summer:kmer_algbioinf [2025/03/28 10:12] rwittler |
teaching:2025summer:kmer_algbioinf [2025/06/17 09:05] (current) rwittler [Timetable] |
||
---|---|---|---|
Line 11: | Line 11: | ||
\\ | \\ | ||
- | The overarching topic of this semester are **k-mers** (a.k.a. //q-grams//). This simple concept builds a basis for many algorithmic solutions in bioinformatics, such as assembly, alignment, genome comparison, pangenomics, etc.\\ | + | The overarching topic of this semester are **//k//-mers** (a.k.a. //q-grams//). This simple concept builds a basis for many algorithmic solutions in bioinformatics, such as assembly, alignment, genome comparison, pangenomics, etc.\\ |
\\ | \\ | ||
+ | |||
+ | To practice algorithm design and presentation, each participant will specify a simple //k//-mer counting algorithm and present it using pseudocode (a LaTeX template will be provided). Afterwards, they will implement the algorithm as a basic prototype. In a coding showdown, all implementations will battle it out to see which one takes the crown! (This can be credited as "392041 Implementation of Algorithms (Ü)", 1 LP.) | ||
+ | |||
+ | \\ | ||
+ | |||
Possible concrete methods/publications to be presented/discussed in the seminar are: | Possible concrete methods/publications to be presented/discussed in the seminar are: | ||
Line 35: | Line 40: | ||
==== Counting k-mers ==== | ==== Counting k-mers ==== | ||
- | * Kokot, Marek, Maciej Długosz, and Sebastian Deorowicz. "KMC 3: counting and manipulating k-mer statistics." [[https://academic.oup.com/bioinformatics/article/33/17/2759/3796399?login=false|Bioinformatics 33.17 (2017): 2759-2761.]] (plus previous versions) | + | * Kokot, Marek, Maciej Długosz, and Sebastian Deorowicz. "KMC 3: counting and manipulating //k//-mer statistics." [[https://academic.oup.com/bioinformatics/article/33/17/2759/3796399?login=false|Bioinformatics 33.17 (2017): 2759-2761.]] (plus previous versions) |
* Lucas Robidou and Pierre Peterlongo. "fimpera: drastic improvement of Approximate Membership Query data-structures with counts" [[https://www.biorxiv.org/content/10.1101/2022.06.27.497694v3|bioRxiv (2023)]] | * Lucas Robidou and Pierre Peterlongo. "fimpera: drastic improvement of Approximate Membership Query data-structures with counts" [[https://www.biorxiv.org/content/10.1101/2022.06.27.497694v3|bioRxiv (2023)]] | ||
* Téo Lemane, Paul Medvedev, Rayan Chikhi, and Pierre Peterlongo. "kmtricks: efficient and flexible construction of Bloom filters for large sequencing data collections." [[https://academic.oup.com/bioinformaticsadvances/article/2/1/vbac029/6576015|Bioinformatics Advances, 2.1 (2022)]] | * Téo Lemane, Paul Medvedev, Rayan Chikhi, and Pierre Peterlongo. "kmtricks: efficient and flexible construction of Bloom filters for large sequencing data collections." [[https://academic.oup.com/bioinformaticsadvances/article/2/1/vbac029/6576015|Bioinformatics Advances, 2.1 (2022)]] | ||
- | * Jens Zentgraf and Sven Rahmann. "Fast gapped k-mer counting with subdivided multi-way bucketed Cuckoo hash table." [[https://drops.dagstuhl.de/opus/volltexte/2022/17046/pdf/LIPIcs-WABI-2022-12.pdf|In 22nd International Workshop on Algorithms in Bioinformatics (WABI). Schloss Dagstuhl-Leibniz-Zentrum für Informatik (2022)]] | + | * Jens Zentgraf and Sven Rahmann. "Fast gapped //k//-mer counting with subdivided multi-way bucketed Cuckoo hash table." [[https://drops.dagstuhl.de/opus/volltexte/2022/17046/pdf/LIPIcs-WABI-2022-12.pdf|In 22nd International Workshop on Algorithms in Bioinformatics (WABI). Schloss Dagstuhl-Leibniz-Zentrum für Informatik (2022)]] |
==== Storing k-mers ==== | ==== Storing k-mers ==== | ||
- | * Sebastian Schmidt and Jarno N. Alanko. "Eulertigs: minimum plain text representation of k-mer sets without repetitions in linear time." [[https://link.springer.com/article/10.1186/s13015-023-00227-1|Algorithms for Molecular Biology 18(1), 5 (2023)]] | + | * Sebastian Schmidt and Jarno N. Alanko. "Eulertigs: minimum plain text representation of //k//-mer sets without repetitions in linear time." [[https://link.springer.com/article/10.1186/s13015-023-00227-1|Algorithms for Molecular Biology 18(1), 5 (2023)]] |
* Giulio Ermanno Pibir. "Sparse and skew hashing of K-mers." [[https://academic.oup.com/bioinformatics/article/38/Supplement_1/i185/6617506?login=true|Bioinformatics 38.Suppl1 (2022):i185-i194.]] | * Giulio Ermanno Pibir. "Sparse and skew hashing of K-mers." [[https://academic.oup.com/bioinformatics/article/38/Supplement_1/i185/6617506?login=true|Bioinformatics 38.Suppl1 (2022):i185-i194.]] | ||
+ | ==== Pangenomics ==== | ||
+ | * Luca Parmigiani, Roland Wittler, and Jens Stoye. “Revisiting pangenome openness with k-mers.” [[https://peercommunityjournal.org/item/10.24072/pcjournal.415.pdf|Peer Community Journal (4):e47 (2024)]] | ||
==== Timetable ==== | ==== Timetable ==== | ||
| 09.04. | Organization, topic selection, Scientific reading | {{:teaching:2022summer:algbioinf:howtoread.pdf|Slides: HowToRead}} | | | 09.04. | Organization, topic selection, Scientific reading | {{:teaching:2022summer:algbioinf:howtoread.pdf|Slides: HowToRead}} | | ||
- | | 16.04. | | | + | | 16.04. | Pseudocode | [[https://texdoc.org/serve/algorithm2e/0|algorithm2e-docu]], {{:teaching:2025summer:kmer:template_algo.tex.pdf|LaTeX template (remove .pdf from file name)}}, {{:teaching:2025summer:kmer:template_algo.pdf|notes/example}} | |
- | | 23.04. | | | + | | 23.04. | -- (self study) | [[:teaching:2024summer:karp:writing|Scientific Writing]], [[https://doi.org/10.1371/journal.pcbi.0030077|Ten Simple Rules for Making Good Oral Presentations, Philip E Bourne]], [[https://doi.org/10.1371/journal.pcbi.1005373|Ten simple rules for short and swift presentations, Christopher J. Lortie]] | |
| 30.04. | -- | | | 30.04. | -- | | ||
- | | 07.05. | | | + | | 07.05. | Pseudocode vorstellen | |
- | | 14.05. | | | + | | 14.05. | KMC 3: counting and manipulating //k//-mer statistics | Kathrin | |
- | | 21.05. | | | + | | 21.05. | k-mer counting challenge | (fast) alle | |
| 28.05. | -- | | | 28.05. | -- | | ||
- | | 04.05. | | | + | | 04.06. | Fast gapped //k//-mer counting with subdivided multi-way bucketed Cuckoo hash table | Sofie | |
- | | 11.06. | | | + | | | Minimizer-space de Bruijn graphs: Whole-genome assembly of long reads in minutes on a personal computer | Liliana | |
- | | 18.06. | | | + | | 11.06. | Sparse and skew hashing of K-mers | Alena | |
- | | 25.06. | | | + | | 18.06. | Space-efficient and exact de Bruijn graph representation based on a Bloom filter. (Minia) | Max | |
+ | | 25.06. | Minimap2: pairwise alignment for nucleotide sequences | Igor | | ||
+ | | | Revisiting pangenome openness with k-mers | Mathis | | ||
| 02.07. | -- | | | 02.07. | -- | | ||
- | | 09.07. | | | + | | 09.07. | | |
| 16.07. | -- | | | 16.07. | -- | | ||
+ | ==== Details on the k-mer counting exercise ==== | ||
+ | |||
+ | **Input:** | ||
+ | * Multiple fasta file, containing nucleotide sequences (small or capital letters, maybe N's) | ||
+ | * //k//-mer length **k** | ||
+ | * threshold **c** | ||
+ | **Output:** | ||
+ | * Number of canonical //k//-mers occurring at least **c** times. |