===== K-mer basierte Algorithmen in der Bioinformatik (S) ===== | [[https://ekvv.uni-bielefeld.de/kvv_publ/publ/vd?id=391455209|392219]] | Wittler | Summer 2023 | Wednesday 10:15-11:45 | U10-146 |\\ \\ (Je nach Wunsch/Bedarf der Studierenden wird das Seminar auf Deutsch oder Englisch durchgeführt.\\ Depending on the wishes/demands of the students, this seminar can be held in English or German.)\\ \\ Based on original research papers, the participants will give oral presentations (20-45 min) and write short summaries (5-10 pages) about algorithmic problems in bioinformatics and their solutions. Talks and essays can be done in German or English. The first day covers an overview of possible topics, which will then be distributed to the students. Aspects of scientific writing and presenting will be covered as well.\\ \\ The overarching topic of this semester are **k-mers** (a.k.a. //q-grams//). This simple concept builds a basis for many algorithmic solutions in bioinformatics, such as assembly, alignment, genome comparison, pangenomics, etc.\\ \\ Possible concrete methods/publications to be presented/discussed in the seminar are: ==== Assembly ==== * Zerbino, Daniel R., and Ewan Birney. "Velvet: algorithms for de novo short read assembly using de Bruijn graphs." [[https://genome.cshlp.org/content/18/5/821.full.pdf|Genome research 18.5 (2008): 821-829.]] * Li, Heng. "Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences." [[https://academic.oup.com/bioinformatics/article/32/14/2103/1742895?login=false|Bioinformatics 32.14 (2016): 2103-2110.]] * Chikhi, Rayan, and Guillaume Rizk. "Space-efficient and exact de Bruijn graph representation based on a Bloom filter. (Minia)" [[https://link.springer.com/article/10.1186/1748-7188-8-22|Algorithms for Molecular Biology 8.1 (2013): 1-9.]] ==== De Bruijn Graphs ==== * Holley, Guillaume, and Páll Melsted. "Bifrost: highly parallel construction and indexing of colored and compacted de Bruijn graphs." [[https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02135-8|Genome biology 21.1 (2020): 1-20.]] * Luhmann, Nina, Guillaume Holley, and Mark Achtman. "BlastFrost: fast querying of 100,000 s of bacterial genomes in Bifrost graphs." [[https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02237-3|Genome biology 22.1 (2021): 1-15.]] * Ekim, Barış, Bonnie Berger, and Rayan Chikhi. "Minimizer-space de Bruijn graphs: Whole-genome assembly of long reads in minutes on a personal computer." [[https://doi.org/10.1016/j.cels.2021.08.009|Cell systems 12.10 (2021): 958-968.]] ==== Alignment ==== * Rasmussen, Kim R., Jens Stoye, and Eugene W. Myers. "Efficient q-gram filters for finding all ε-matches over a given length. (SWIFT)" [[https://www.researchgate.net/profile/Jens-Stoye/publication/7183100_Efficient_q-gram_filters_for_finding_all_e-Matches_over_a_given_length/links/0fcfd512b2ebaac910000000/Efficient-q-gram-filters-for-finding-all-e-Matches-over-a-given-length.pdf|Journal of Computational Biology 13.2 (2006): 296-308.]] * Li, Heng. "Minimap2: pairwise alignment for nucleotide sequences." [[https://academic.oup.com/bioinformatics/article/34/18/3094/4994778?login=false|Bioinformatics 34.18 (2018): 3094-3100.]] * Ondov, Brian D., et al. "Mash: fast genome and metagenome distance estimation using MinHash." [[https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0997-x?ref=https://codemonkey.link|Genome biology 17.1 (2016): 1-14.]] ==== Counting k-mers ==== * Kokot, Marek, Maciej Długosz, and Sebastian Deorowicz. "KMC 3: counting and manipulating k-mer statistics." [[https://academic.oup.com/bioinformatics/article/33/17/2759/3796399?login=false|Bioinformatics 33.17 (2017): 2759-2761.]] (plus previous versions) * Lucas Robidou and Pierre Peterlongo. "fimpera: drastic improvement of Approximate Membership Query data-structures with counts" [[https://www.biorxiv.org/content/10.1101/2022.06.27.497694v3|bioRxiv (2023)]] * Téo Lemane, Paul Medvedev, Rayan Chikhi, and Pierre Peterlongo. "kmtricks: efficient and flexible construction of Bloom filters for large sequencing data collections." [[https://academic.oup.com/bioinformaticsadvances/article/2/1/vbac029/6576015|Bioinformatics Advances, 2.1 (2022)]] * Jens Zentgraf and Sven Rahmann. "Fast gapped k-mer counting with subdivided multi-way bucketed Cuckoo hash table." [[https://drops.dagstuhl.de/opus/volltexte/2022/17046/pdf/LIPIcs-WABI-2022-12.pdf|In 22nd International Workshop on Algorithms in Bioinformatics (WABI). Schloss Dagstuhl-Leibniz-Zentrum für Informatik (2022)]] * Giulio Ermanno Pibir. "Sparse and skew hashing of K-mers." [[https://academic.oup.com/bioinformatics/article/38/Supplement_1/i185/6617506?login=true|Bioinformatics 38.Suppl1 (2022):i185-i194.]] ==== Timetable ==== | 05.04. | Organization, topic selection, Scientific reading | {{:teaching:2022summer:algbioinf:howtoread.pdf|Slides: HowToRead}} | | 12.04. | -- | | 19.04. | -- | | 26.04. | -- | | 03.05. | Scientific writing, introduction to LaTeX | {{:teaching:2018summer:funal:scientific_writing_citing.txt|Notes on Scientific writing and citing}}, {{:teaching:2022summer:algbioinf:template.tex.pdf|Latex-template}}, {{:teaching:2018summer:funal:example.bib.pdf|Bibtex-example}} (remove ".pdf" from filenames), [[https://de.sharelatex.com/learn/bibtex_bibliography_styles|Bibliography styles]] | | 10.05. | Introductions, conclusions, math | {{:teaching:2018summer:funal:scientific_writing_introduction.txt| Notes 1}}, {{:teaching:2018summer:funal:scientific_writing_math.txt| Notes 2}}, {{:teaching:2018summer:funal:template_math.tex.pdf|extended Latex-template}} (remove ".pdf" from filename) | | 17.05. | -- | | 24.05.| Tables, figures, algorithms, Checklist on writing (1/2) | {{:teaching:2018summer:funal:template_algo.tex.pdf|extended Latex-template}} (remove ".pdf" from filename), {{:teaching:2022summer:algbioinf:checklist.txt|checklist}} | | 31.05. | -- | | 07.06. | Tables, figures, algorithms, Checklist on writing (2/2) | {{:teaching:2018summer:funal:template_algo.tex.pdf|extended Latex-template}} (remove ".pdf" from filename), {{:teaching:2022summer:algbioinf:checklist.txt|checklist}} | | 14.06. | N.V.: Altschul, Stephen F., et al. "Basic local alignment search tool." (1990) | {{https://www.sciencedirect.com/science/article/pii/S0022283605803602|paper}} | | 21.06. | -- | | 28.06. | -- | | 05.07. | P.A.: Li, H. "Minimap2: pairwise alignment for nucleotide sequences." (2018) | {{https://academic.oup.com/bioinformatics/article-abstract/34/18/3094/4994778|paper}} | | 12.07. | -- |