Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Next revision
Previous revision
teaching:alggrliterature [2020/02/14 09:07]
127.0.0.1 external edit
teaching:alggrliterature [2022/11/21 09:57] (current)
jstoye [Genome assembly IIb: Hybrid/long read assembly]
Line 1: Line 1:
 ===== References for Algorithms in Genome Research ===== ===== References for Algorithms in Genome Research =====
- 
  
 ==== Overview ==== ==== Overview ====
 Parts of the material is covered by the following two textbooks. Some topics are newer than these books. For most of these, specialized references are given below. ​ Parts of the material is covered by the following two textbooks. Some topics are newer than these books. For most of these, specialized references are given below. ​
- 
  
   - D. Gusfield. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, New York, NY, 1997.    - D. Gusfield. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, New York, NY, 1997. 
   - D. W. Mount, editor. Bioinformatics ​ Sequence and Genome Analysis. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 2nd edition, 2004.    - D. W. Mount, editor. Bioinformatics ​ Sequence and Genome Analysis. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 2nd edition, 2004. 
- 
  
 ==== Biological data types and formats ==== ==== Biological data types and formats ====
Line 21: Line 18:
 Papers on SOLiD sequencing: ​ Papers on SOLiD sequencing: ​
  
-  - Voelkerding et al. [[http://dx.doi.org/​10.1373/​clinchem.2008.112789|Next-generation sequencing: from basic research to diagnostics]]. //Clinical Chemistry// **55**(4):​641-658,​ 2009.  +  - Voelkerding et al. [[https://​doi.org/​10.1373/​clinchem.2008.112789|Next-generation sequencing: from basic research to diagnostics]]. //Clinical Chemistry// **55**(4):​641-658,​ 2009.  
-  - Rumble et al. [[http://dx.doi.org/​10.1371/​journal.pcbi.1000386|SHRiMP:​ Accurate Mapping of Short Color-space Reads]]. //PLOS Computational Biology// **5**(5), 2009.  +  - Rumble et al. [[https://​doi.org/​10.1371/​journal.pcbi.1000386|SHRiMP:​ Accurate Mapping of Short Color-space Reads]]. //PLOS Computational Biology// **5**(5), 2009. 
  
 ==== Physical mapping ==== ==== Physical mapping ====
 This section is mainly based on Chapter 16 of Gusfield'​s textbook. The Tightest Layout Problem is originally from (Alizadeh//​et al.//, 1995). Another reference is (Heber //et al.//, 2000). ​ This section is mainly based on Chapter 16 of Gusfield'​s textbook. The Tightest Layout Problem is originally from (Alizadeh//​et al.//, 1995). Another reference is (Heber //et al.//, 2000). ​
  
-  - F. Alizadeh, R. M. Karp, L. A. Newberg, and D. K. Weisser. [[http://dx.doi.org/​10.1007/​BF01188581|Physical mapping of chromosomes:​ A combinatorial problem in molecular biology]]. //​Algorithmica//​ **13**(1/​2):​52-76,​ 1995.  +  - F. Alizadeh, R. M. Karp, L. A. Newberg, and D. K. Weisser. [[https://​doi.org/​10.1007/​BF01188581|Physical mapping of chromosomes:​ A combinatorial problem in molecular biology]]. //​Algorithmica//​ **13**(1/​2):​52-76,​ 1995.  
-  - S. Heber, J. Stoye, M. Frohme, J. Hoheisel, and M. Vingron. [[http://dx.doi.org/​10.1089/​106652700750050853|Contig selection in physical mapping]]. //J. Comp. Biol.// **7**(3/​4):​395-408,​ 2000.  +  - S. Heber, J. Stoye, M. Frohme, J. Hoheisel, and M. Vingron. [[https://​doi.org/​10.1089/​106652700750050853|Contig selection in physical mapping]]. //J. Comp. Biol.// **7**(3/​4):​395-408,​ 2000.
-==== Genome assembly I: Overlap-Layout-Consensus - Clone-by-clone and whole-genome shotgun assembly ====+
  
-A document with a good algorithmic description of the method will be published here soon.+==== Genome assembly Ia: Overlap-Layout-Consensus - Clone-by-clone and whole-genome shotgun assembly ==== 
 +Lander-Waterman Statistics:
  
-References to the original papers ​(to be completed):+  - E.S. Lander and M.S. Waterman, [[https://​doi.org/​10.1016/​0888-7543(88)90007-9|Genomic mapping by fingerprinting random clones: a mathematical analysis]]. //​Genomics//​ **2**:231-239, 1988.
  
-Lander-Waterman Statistics:+Overlap-Layout-Consensus AssemblyThere are many good description available online, just search for them. 
 + 
 +==== Genome assembly Ib: Re-sequencing,​ comparative (reference-based) assembly ==== 
 +A good introduction to comparative genome assembly is [1]. The main algorithmic challenge is to map millions of (most very short) sequence reads onto one or more referene geneome(s). Suitable mapping algorithms for this task are [[http://​bibiserv.cebitec.uni-bielefeld.de/​swift/​|SWIFT]] [2], [[http://​bowtie-bio.sourceforge.net/​index.shtml|Bowtie]] [6], ELAND (Cox, unpublished),​ [[http://​maq.sourceforge.net/​|MAQ]] [3], [[http://​rulai.cshl.edu/​rmap/​|RMAP]],​ [[http://​soap.genomics.org.cn/​|SOAP]] [4], [[http://​compbio.cs.toronto.edu/​shrimp/​|SHRiMP]],​ SeqMap [5], TAGGER [7], ZOOM [8], [[http://​bio-bwa.sourceforge.net/​bwa.shtml|BWA]] [9], GSNAP [10], SARUMAN [11], SSAHA2 [12], NextGenMap [13], etc.
  
-E.S. Lander ​and M.S. Waterman, [[http://dx.doi.org/​10.1016/0888-7543(88)90007-9|Genomic mapping by fingerprinting random clonesa mathematical analysis]]. //Genomics// **2**:231-2391988+  ​MPop, A. Phillippy, A. L. Delcher, and S. L. Salzberg. [[https://​doi.org/​10.1093/​bib/​5.3.237|Comparative genome assembly]]. //Briefings in Bioinformatics//​ **5**(3):​237-248,​ 2004.  
 +  - K. Rasmussen, J. Stoye, E. W. Myers. [[https://​doi.org/​10.1089/​cmb.2006.13.296|Efficient q-gram filters for finding all epsilon-matches over a given length]]. //J. Comp. Biol.// **13**(2):​296-308,​ 2006.  
 +  - H. Li, J. Ruan, R. Durbin. [[https://​doi.org/​10.1101/​gr.078212.108|Mapping short DNA sequencing reads and calling variants using mapping quality scores]]. //Genome Res.// **18**:​1851-1858,​ 2008.  
 +  - R. Li, Y. Li, K. Kristiansen,​ J. Wang. [[https://​doi.org/​10.1093/​bioinformatics/​btn025|SOAP:​ short oligonucleotide alignment program]]. //​Bioinformatics//​ **24**(5):​713-714,​ 2008.  
 +  - H. Jiang, W. H Wong. [[https://​doi.org/​10.1093/​bioinformatics/​btn429|SeqMap:​ mapping massive amount of oligonucleotides to the genome]]. //​Bioinformatics//​ **24**(20):​2395-2396,​ 2008.  
 +  - B. Langmead, C. Trapnell, ​M. Pop, S. Salzberg. [[https://​doi.org/​10.1186/​gb-2009-10-3-r25|Ultrafast and memory-efficient alignment of short DNA sequences to the human genome]]. //Genome Biol.// **10**:R252009.  
 +  - C. Iseli, G. Ambrosini, P. Bucher, C. V. Jongeneel. ​[[https://doi.org/​10.1371/​journal.pone.0000579|Indexing Strategies for Rapid Searches of Short Words in Genome Sequences]]. //PLoS ONE// **2**(6):​e579,​ 2007.  
 +  - H. Lin, Z. Zhang, M. Q. Zhang, B. Ma, M. Li. [[https://doi.org/10.1093/bioinformatics/​btn416|ZOOM! Zillions of oligos mapped]]. //​Bioinformatics//​ **24**(21): 2431-2437, 2008.  
 +  - H. Li, R. Durbin. [[https://​doi.org/​10.1093/​bioinformatics/​btp324|Fast and accurate short read alignment with Burrows–Wheeler transform]]. //​Bioinformatics//​ **25**(14): 1754-1760, 2009.  
 +  - T. Wu, S. Nacu. [[https://​doi.org/​10.1093/​bioinformatics/​btq057|Fast and SNP-tolerant detection of complex variants and splicing in short reads]]. //​Bioinformatics//​ **26**(7)873–881, 2010.  
 +  - J. Blom, T. Jakobi, D. Doppmeier, S. Jaenicke, J. Kalinowski, J. Stoye, A. Goesmann. [[https://​doi.org/​10.1093/​bioinformatics/​btr151|Exact and complete short read alignment to microbial genomes using GPU programming]]. //Bioinformatics// **27**(10)1351-1358, 2011.  
 +  - Z. Ning, A.J. Cox. [[https://​doi.org/​10.1101/​gr.194201|SSAHA:​ A Fast Search Method for Large DNA Databases]]. //Genome Res.// **11**(10): 1725-1729, 2001.  
 +  - F. J. Sedlazeck, P. Rescheneder,​ A. von Haeseler. [[https://​doi.org/​10.1093/​bioinformatics/​btt468|NextGenMap:​ fast and accurate read mapping in highly polymorphic genomes]]. //​Bioinformatics//​ **29**(21): 2790-2791, 2013. 
 +  - L. Oesper, A. Ritz, S. J. Aerni, R. Drebin, B. J. Raphael. [[https://​doi.org/​10.1186/​1471-2105-13-S6-S10|Reconstructing cancer genomes from paired-end sequencing data]]. //BMC Bioinformatics//​ **13**(Suppl. 6):S102012
  
-==== Genome assembly ​II: De-novo assembly of HTS data ====+==== Genome assembly ​IIa: De-novo assembly of HTS data ====
 These methods are all very similar, using variants of de-Bruijn graphs plus additional tricks: EULER-SR [1], Velvet [2,3], MIRA2 ([[http://​www.chevreux.org/​projectsmira.html|Link]]),​ SSAKE [4],VCAKE [5], SHARCGS [6], Medvedev/​Brudno [7], ABySS ([[http://​www.bcgsc.ca/​platform/​bioinfo/​software/​abyss|Link]]),​ ALLPATHS [8], SOAPdenovo [9], IDBA [10], etc. A very good recent review paper is [11].  These methods are all very similar, using variants of de-Bruijn graphs plus additional tricks: EULER-SR [1], Velvet [2,3], MIRA2 ([[http://​www.chevreux.org/​projectsmira.html|Link]]),​ SSAKE [4],VCAKE [5], SHARCGS [6], Medvedev/​Brudno [7], ABySS ([[http://​www.bcgsc.ca/​platform/​bioinfo/​software/​abyss|Link]]),​ ALLPATHS [8], SOAPdenovo [9], IDBA [10], etc. A very good recent review paper is [11]. 
  
-  - M. J. Chaisson, P. A. Pevzner. [[http://dx.doi.org/​10.1101/​gr.7088808|Short read fragment assembly of bacterial genomes]]. //Genome Res.// **18**(2):​324-330,​ 2008.  +  - M. J. Chaisson, P. A. Pevzner. [[https://​doi.org/​10.1101/​gr.7088808|Short read fragment assembly of bacterial genomes]]. //Genome Res.// **18**(2):​324-330,​ 2008.  
-  - D. R. Zerbino, E. Birney. [[http://dx.doi.org/​10.1101/​gr.074492.107|Velvet:​ Algorithms for de novo short read assembly using de Bruijn graphs]]. //Genome Res.// **18**(5):​821-829,​ 2008.  +  - D. R. Zerbino, E. Birney. [[https://​doi.org/​10.1101/​gr.074492.107|Velvet:​ Algorithms for de novo short read assembly using de Bruijn graphs]]. //Genome Res.// **18**(5):​821-829,​ 2008.  
-  - D. R. Zerbino, G. K. McEwen, E. H. Margulies, E. Birney. [[http://dx.doi.org/​10.1371/​journal.pone.0008407|Pebble and Rock Band: Heuristic Resolution of Repeats and Scaffolding in the Velvet Short-Read de Novo Assembler]]. //PLoS ONE// **4**(12):​e8407,​ 2009.  +  - D. R. Zerbino, G. K. McEwen, E. H. Margulies, E. Birney. [[https://​doi.org/​10.1371/​journal.pone.0008407|Pebble and Rock Band: Heuristic Resolution of Repeats and Scaffolding in the Velvet Short-Read de Novo Assembler]]. //PLoS ONE// **4**(12):​e8407,​ 2009.  
-  - R. L. Warren, G. G. Sutton, S. J. M. Jones, R. A. Holt. [[http://dx.doi.org/​10.1093/​bioinformatics/​btl629|Assembling millions of short dna sequences using ssake]]. //​Bioinformatics//​ **23**(4):​500-501,​ 2006.  +  - R. L. Warren, G. G. Sutton, S. J. M. Jones, R. A. Holt. [[https://​doi.org/​10.1093/​bioinformatics/​btl629|Assembling millions of short dna sequences using ssake]]. //​Bioinformatics//​ **23**(4):​500-501,​ 2006.  
-  - W. R. Jeck, J. A. Reinhardt, D. A. Baltrus, M. T. Hickenbotham,​ V. Magrini, E. R. Mardis, J. L. Dang, C. D. Jones. [[http://dx.doi.org/​10.1093/​bioinformatics/​btm451|Extending assembly of short dna sequences to handle error]]. //​Bioinformatics//​ **23**(21):​2942-2944,​ 2007.  +  - W. R. Jeck, J. A. Reinhardt, D. A. Baltrus, M. T. Hickenbotham,​ V. Magrini, E. R. Mardis, J. L. Dang, C. D. Jones. [[https://​doi.org/​10.1093/​bioinformatics/​btm451|Extending assembly of short dna sequences to handle error]]. //​Bioinformatics//​ **23**(21):​2942-2944,​ 2007.  
-  - J. C. Dohm, C. Lottaz, T. Borodina, and H. Himmelbauer. [[http://dx.doi.org/​10.1101/​gr.6435207|Sharcgs,​ a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing]]. //Genome Res.// **17**(11):​1697-1706,​ 2007.  +  - J. C. Dohm, C. Lottaz, T. Borodina, and H. Himmelbauer. [[https://​doi.org/​10.1101/​gr.6435207|Sharcgs,​ a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing]]. //Genome Res.// **17**(11):​1697-1706,​ 2007.  
-  - P. Medvedev, M. Brudno. [[http://dx.doi.org/​10.1007/​978-3-540-78839-3_5|Ab initio whole genome shotgun assembly with mated short reads]]. //Proc. RECOMB 2008// pages 50-64, 2008.  +  - P. Medvedev, M. Brudno. [[https://​doi.org/​10.1007/​978-3-540-78839-3_5|Ab initio whole genome shotgun assembly with mated short reads]]. //Proc. RECOMB 2008// pages 50-64, 2008.  
-  - J. Butler, I. Maccallum, M. Kleber, I. A. Shlyakhter, M. K. Belmonte, E. S. Lander, C. Nusbaum, D. B. Jaffe. [[http://dx.doi.org/​10.1101/​gr.7337908|Allpaths:​ De novo assembly of whole-genome shotgun microreads]]. //Genome Res.// **18**(5):​810-820,​ 2008.  +  - J. Butler, I. Maccallum, M. Kleber, I. A. Shlyakhter, M. K. Belmonte, E. S. Lander, C. Nusbaum, D. B. Jaffe. [[https://​doi.org/​10.1101/​gr.7337908|Allpaths:​ De novo assembly of whole-genome shotgun microreads]]. //Genome Res.// **18**(5):​810-820,​ 2008.  
-  - R. Li, H. Zhu, J. Ruan, W. Qian, X. Fang, Z. Shi, Y. Li, S. Li, G. Shan, K. Kristiansen,​ S. Li, H. Yang, J. Wang, J. Wang. [[http://dx.doi.org/​10.1101/​gr.097261.109|De novo assembly of human genomes with massively parallel short read sequencing]]. //Genome Res.// **20**(2):​265-272,​ 2010.  +  - R. Li, H. Zhu, J. Ruan, W. Qian, X. Fang, Z. Shi, Y. Li, S. Li, G. Shan, K. Kristiansen,​ S. Li, H. Yang, J. Wang, J. Wang. [[https://​doi.org/​10.1101/​gr.097261.109|De novo assembly of human genomes with massively parallel short read sequencing]]. //Genome Res.// **20**(2):​265-272,​ 2010.  
-  - Y. Peng, H. C. M. Leung, S. M. Yiu, F. Y. L. Chin. [[http://dx.doi.org/​10.1007/​978-3-642-12683-3_28|IDBA – A Practical Iterative de Bruijn Graph De Novo Assembler]]. Proceedings of RECOMB 2010, LNBI 6044, 426-440, 2010.  +  - Y. Peng, H. C. M. Leung, S. M. Yiu, F. Y. L. Chin. [[https://​doi.org/​10.1007/​978-3-642-12683-3_28|IDBA – A Practical Iterative de Bruijn Graph De Novo Assembler]]. Proceedings of RECOMB 2010, LNBI 6044, 426-440, 2010.  
-  - J. R. Miller, S. Koren, G. Sutton. [[http://dx.doi.org/​10.1016/​j.ygeno.2010.03.001|Assembly algorithms for next-generation sequencing data]]. //​Genomics//​ in press, 2010.  +  - J. R. Miller, S. Koren, G. Sutton. [[https://​doi.org/​10.1016/​j.ygeno.2010.03.001|Assembly algorithms for next-generation sequencing data]]. //​Genomics//​ in press, 2010.  
-  - Phillip E. C. Compeau, Pavel A. Pevzner, Glenn Tesler. [[http://dx.doi.org/​10.1038/​nbt.2023|How to apply de Bruijn graphs to genome assembly]]. //Nature biotechnology//,​ **29**, 987-991, 2011.  +  - Phillip E. C. Compeau, Pavel A. Pevzner, Glenn Tesler. [[https://​doi.org/​10.1038/​nbt.2023|How to apply de Bruijn graphs to genome assembly]]. //Nature biotechnology//,​ **29**, 987-991, 2011.
-==== Genome assembly III: Re-sequencing,​ comparative assembly ==== +
-A good introduction to comparative genome assembly is [1]. The main algorithmic challenge is to map millions of (most very short) sequence reads onto one or more referene geneome(s). Suitable mapping algorithms for this task are [[http://​bibiserv.cebitec.uni-bielefeld.de/​swift/​|SWIFT]] [2], [[http://​bowtie-bio.sourceforge.net/​index.shtml|Bowtie]] [6], ELAND (Cox, unpublished),​ [[http://​maq.sourceforge.net/​|MAQ]] [3], [[http://​rulai.cshl.edu/​rmap/​|RMAP]],​ [[http://​soap.genomics.org.cn/​|SOAP]] [4], [[http://​compbio.cs.toronto.edu/​shrimp/​|SHRiMP]],​ SeqMap [5], TAGGER [7], ZOOM [8], [[http://​bio-bwa.sourceforge.net/​bwa.shtml|BWA]] [9], GSNAP [10], SARUMAN [11], SSAHA2 [12] etc. Methods especially suited for mapping SOLiD reads are presented in [13,14]+
  
-  - M. Pop, A. Phillippy, A. L. Delcher, and S. L. Salzberg. [[http://​dx.doi.org/​10.1093/​bib/​5.3.237|Comparative genome assembly]]. //Briefings in Bioinformatics//​ **5**(3):​237-248,​ 2004.  +==== Genome assembly ​IIb: Hybrid/long read assembly ==== 
-  - K. Rasmussen, J. Stoye, E. W. Myers. [[http://​dx.doi.org/​10.1089/​cmb.2006.13.296|Efficient q-gram filters for finding all epsilon-matches over a given length]]. //J. Comp. Biol.// **13**(2):​296-308,​ 2006.  +  - A. C. English, S. Richards, Y. Han, M. Wang, V. Vee, J. Qu, X. Qin, D. M. Muzny, J. G. Reid, K. C. Worley, R. A. Gibbs. [[https://​doi.org/​10.1371/​journal.pone.0047768|Mind the Gap: Upgrading Genomes with Pacific Biosciences RS Long-Read Sequencing Technology]]. //PLoS ONE// **7**(11): e47768, 2012. 
-  - H. Li, J. Ruan, R. Durbin. [[http://​dx.doi.org/​10.1101/​gr.078212.108|Mapping short DNA sequencing reads and calling variants using mapping quality scores]]. //Genome Res.// **18**:​1851-1858,​ 2008.  +  - S. Koren, M. C. Schatz, B. P. Walenz, J. Martin, J. T. Howard, G. Ganapathy, Z. Wang, D. A. Rasko, W. R. McCombie, E. D. Jarvis, A. M. Phillippy. [[https://​doi.org/​10.1038/​nbt.2280|Hybrid error correction and de novo assembly of single-molecule sequencing reads]]. //Nature Biotechnolopgy//​ **30**:​693-700,​ 2012. 
-  - R. Li, Y. Li, K. Kristiansen,​ J. Wang. [[http://​dx.doi.org/​10.1093/​bioinformatics/​btn025|SOAP:​ short oligonucleotide alignment program]]. //​Bioinformatics//​ **24**(5):​713-714,​ 2008.  +  - C.-S. Chin, D. H. Alexander, P. Marks, A. A. Klammer, J. Drake, C. Heiner, A. Clum, A. Copeland, J. Huddleston, E. E. Eichler, S. W. Turner, J. Korlach. [[https://​doi.org/​10.1038/​nmeth.2474|Nonhybrid,​ finished microbial genome assemblies from long-read SMRT sequencing data]]. //Nature Methods// **10**:​563-569,​ 2013. 
-  - H. Jiang, W. H Wong. [[http://​dx.doi.org/​10.1093/​bioinformatics/​btn429|SeqMap:​ mapping massive amount of oligonucleotides to the genome]]. //​Bioinformatics//​ **24**(20):​2395-2396,​ 2008.  +  - G. Myers. [[https://​doi.org/​10.1007/​978-3-662-44753-6_5|Efficient Local Alignment Discovery amongst Noisy Long Reads]]. //​Proceedings of WABI 2014//, LNBI 8701, 52-67, 2014
-  - B. Langmead, C. Trapnell, M. Pop, S. Salzberg. [[http://​dx.doi.org/​10.1186/​gb-2009-10-3-r25|Ultrafast and memory-efficient alignment of short DNA sequences to the human genome]]. //Genome Biol.// **10**:R25, 2009.  +  - F. J. Sedlazeck, P. Rescheneder,​ M. Smolka, H. Fang, M. Nattestad, A. von Haeseler, M. C. Schatz. [[https://​doi.org/​10.1038/​s41592-018-0001-7|Accurate detection of complex structural variations using single molecule sequencing]]. //Nat. Methods// **15**(6): 461–468, 2018. 
-  - C. Iseli, G. Ambrosini, P. Bucher, C. V. Jongeneel. [[http://​dx.doi.org/​10.1371/​journal.pone.0000579|Indexing Strategies for Rapid Searches of Short Words in Genome Sequences]]. //PLoS ONE// **2**(6):​e579,​ 2007.  +  - E. Haghshenas, H. Asghari, J. Stoye, C. Chauve, F. Hach. [[https://​doi.org/​10.1016/​j.isci.2020.101389|HASLR:​ Fast Hybrid Assembly of Long Reads]]. //​iScience//​ **23**(8): 101389, 2020.
-  - H. Lin, Z. Zhang, M. Q. Zhang, B. Ma, M. Li. [[http://​dx.doi.org/​10.1093/​bioinformatics/​btn416|ZOOM! Zillions of oligos mapped]]. //​Bioinformatics//​ **24**(21): 2431-2437, 2008.  +
-  - H. Li, R. Durbin. [[http://​dx.doi.org/​10.1093/​bioinformatics/​btp324|Fast and accurate short read alignment with Burrows–Wheeler transform]]. //​Bioinformatics//​ **25**(14): 1754-1760, 2009.  +
-  - T. Wu, S. Nacu. [[http://​dx.doi.org/​10.1093/​bioinformatics/​btq057|Fast and SNP-tolerant detection of complex variants and splicing in short reads]]. //​Bioinformatics//​ **26**(7): 873–881, 2010.  +
-  - J. Blom, T. Jakobi, D. Doppmeier, S. Jaenicke, J. Kalinowski, J. Stoye, A. Goesmann. [[http://​dx.doi.org/​10.1093/​bioinformatics/​btr151|Exact and complete short read alignment to microbial genomes using GPU programming]]. //​Bioinformatics//​ **27**(10): 1351-1358, 2011.  +
-  - Z. Ning, A.J. Cox. [[http://​dx.doi.org/​10.1101/​gr.194201|SSAHA:​ A Fast Search Method for Large DNA Databases]]. //Genome Res.// **11**(10): 1725-1729, 2001.  +
-  - L. Noé, M. Gîrdea, G. Kucherov. [[http://​dx.doi.org/​10.1007/​978-3-642-12683-3_25|Seed Design Framework for Mapping SOLiD Reads]]. Proceedings of RECOMB 2010, LNBI 6044, 384-396, 2010.  +
-  - M. Csűrös, Sz. Juhos, A. Bérces. [[http://​dx.doi.org/​10.1007/​978-3-642-15294-8_15|Fast Mapping and Precise Alignment of AB SOLiD Color Reads to Reference DNA]]. Proceedings of WABI 2010, LNBI 6293, 176-188, 2010.  +
-  - L. Oesper, A. Ritz, S. J. Aerni, R. Drebin, B. J. Raphael. [[http://​dx.doi.org/​10.1186/​1471-2105-13-S6-S10|Reconstructing cancer genomes from paired-end sequencing data]]. //BMC Bioinformatics//​ **13**(Suppl. 6):S10, 2012.  +
-==== Genome assembly ​IV: Hybrid/long read assembly ==== +
-  - A. C. English, S. Richards, Y. Han, M. Wang, V. Vee, J. Qu, X. Qin, D. M. Muzny, J. G. Reid, K. C. Worley, R. A. Gibbs. [[http://dx.doi.org/​10.1371/​journal.pone.0047768|Mind the Gap: Upgrading Genomes with Pacific Biosciences RS Long-Read Sequencing Technology]]. //PLoS ONE// **7**(11): e47768, 2012. +
-  - S. Koren, M. C. Schatz, B. P. Walenz, J. Martin, J. T. Howard, G. Ganapathy, Z. Wang, D. A. Rasko, W. R. McCombie, E. D. Jarvis, A. M. Phillippy. [[http://dx.doi.org/​10.1038/​nbt.2280|Hybrid error correction and de novo assembly of single-molecule sequencing reads]]. //Nature Biotechnolopgy//​ **30**:​693-700,​ 2012. +
-  - C.-S. Chin, D. H. Alexander, P. Marks, A. A. Klammer, J. Drake, C. Heiner, A. Clum, A. Copeland, J. Huddleston, E. E. Eichler, S. W. Turner, J. Korlach. [[http://dx.doi.org/​10.1038/​nmeth.2474|Nonhybrid,​ finished microbial genome assemblies from long-read SMRT sequencing data]]. //Nature Methods// **10**:​563-569,​ 2013. +
-  - G. Myers. [[http://dx.doi.org/​10.1007/​978-3-662-44753-6_5|Efficient Local Alignment Discovery amongst Noisy Long Reads]]. //​Proceedings of WABI 2014//, LNBI 8701, 52-67, 2014.+
  
 String graph assembly for diploid genomes with long reads is explained on the following [[https://​s3.amazonaws.com/​files.pacb.com/​pdf/​String+Graph+Assembly+For+Diploid+Genomes+with+Long+Reads.pdf|poster by PacBio]]. ​ String graph assembly for diploid genomes with long reads is explained on the following [[https://​s3.amazonaws.com/​files.pacb.com/​pdf/​String+Graph+Assembly+For+Diploid+Genomes+with+Long+Reads.pdf|poster by PacBio]]. ​
Line 83: Line 80:
 Pre-processing (correction of long reads): Pre-processing (correction of long reads):
  
-  - L. Salmela and E. Rivals. [[http://dx.doi.org/​10.1093/​bioinformatics/​btu538|LoRDEC:​ accurate and efficient long read error correction]]. //​Bioinformatics//​ **30**(24):​3506–3514,​ 2014. +  - L. Salmela and E. Rivals. [[https://​doi.org/​10.1093/​bioinformatics/​btu538|LoRDEC:​ accurate and efficient long read error correction]]. //​Bioinformatics//​ **30**(24):​3506–3514,​ 2014. 
-  - L. Salmela, R. Walve, E. Rivals and E. Ukkonen. [[http://dx.doi.org/​10.1093/​bioinformatics/​btw321|Accurate self-correction of errors in long reads using de Bruijn graphs]]. //​Bioinformatics//,​ **33**(6):​799–806,​ 2017. +  - L. Salmela, R. Walve, E. Rivals and E. Ukkonen. [[https://​doi.org/​10.1093/​bioinformatics/​btw321|Accurate self-correction of errors in long reads using de Bruijn graphs]]. //​Bioinformatics//,​ **33**(6):​799–806,​ 2017.
    
 ==== Genome annotation I: Basic techniques ==== ==== Genome annotation I: Basic techniques ====
Line 91: Line 87:
  
   - R. Durbin, S. Eddy, A. Krogh, G. Mitchison. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, Cambridge, UK, 1998.    - R. Durbin, S. Eddy, A. Krogh, G. Mitchison. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, Cambridge, UK, 1998. 
-  - S. R. Eddy, R. Durbin. [[http://dx.doi.org/​10.1093/​nar/​22.11.2079|RNA sequence analysis using covariance models]]. //Nucleic Acids Res.// **22**(11):​2079-2088,​ 1994.  +  - S. R. Eddy, R. Durbin. [[https://​doi.org/​10.1093/​nar/​22.11.2079|RNA sequence analysis using covariance models]]. //Nucleic Acids Res.// **22**(11):​2079-2088,​ 1994.  
-  - Y. Sakakibara, M. Brown, R. Hughey, I. S. Mian, K. Sjølander, R. C. Underwood, D. Haussler. [[http://dx.doi.org/​10.1093/​nar/​22.23.5112|Stochastic context-free grammars for tRNA modeling]]. //Nucleic Acids Res.// **22**(23):​5112-5120,​ 1994. +  - Y. Sakakibara, M. Brown, R. Hughey, I. S. Mian, K. Sjølander, R. C. Underwood, D. Haussler. [[https://​doi.org/​10.1093/​nar/​22.23.5112|Stochastic context-free grammars for tRNA modeling]]. //Nucleic Acids Res.// **22**(23):​5112-5120,​ 1994. 
 ==== Genome annotation II: Finding RNA genes ==== ==== Genome annotation II: Finding RNA genes ====
 Here are several references, in chronological order: ​ Here are several references, in chronological order: ​
  
   - R. Staden. [[http://​www.pubmedcentral.nih.gov/​picrender.fcgi?​artid=327313&​amp;​blobtype=pdf|A computer program to search for tRNA genes]]. //Nucleic Acids Res.// **8**:​817-825,​ 1980.    - R. Staden. [[http://​www.pubmedcentral.nih.gov/​picrender.fcgi?​artid=327313&​amp;​blobtype=pdf|A computer program to search for tRNA genes]]. //Nucleic Acids Res.// **8**:​817-825,​ 1980. 
-  - G. A. Fichant and C. Burks. [[http://dx.doi.org/​10.1016/​0022-2836(91)90108-I|Identifying potential tRNA genes in genomic DNA sequences]]. //J. Mol. Biol.// **220**:​659-671,​ 1991.  +  - G. A. Fichant and C. Burks. [[https://​doi.org/​10.1016/​0022-2836(91)90108-I|Identifying potential tRNA genes in genomic DNA sequences]]. //J. Mol. Biol.// **220**:​659-671,​ 1991.  
-  - A. Pavesi, F. Conterio, A. Bolchi, G. Dieci, and S. Ottonello. [[http://dx.doi.org/​10.1093/​nar/​22.7.1247|Identification of new eukaryotic tRNA genes in genomic DNA databases by a multistep weight matrix analysis of transcriptional control regions]]. //Nucleic Acids Res.// **22**:​1247-1256,​ 1994.  +  - A. Pavesi, F. Conterio, A. Bolchi, G. Dieci, and S. Ottonello. [[https://​doi.org/​10.1093/​nar/​22.7.1247|Identification of new eukaryotic tRNA genes in genomic DNA databases by a multistep weight matrix analysis of transcriptional control regions]]. //Nucleic Acids Res.// **22**:​1247-1256,​ 1994.  
-  - S. R. Eddy and R. Durbin. [[http://dx.doi.org/​10.1093/​nar/​22.11.2079|RNA sequence analysis using covariance models]]. //Nucleic Acids Res.// **22**(11):​2079-2088,​ 1994.  +  - S. R. Eddy and R. Durbin. [[https://​doi.org/​10.1093/​nar/​22.11.2079|RNA sequence analysis using covariance models]]. //Nucleic Acids Res.// **22**(11):​2079-2088,​ 1994.  
-  - T. M. Lowe and S. R. Eddy. [[http://dx.doi.org/​10.1093/​nar/​25.5.0955|tRNAscan-SE:​ A program for improved detection of transfer RNA genes in genomic sequence]]. //Nucleic Acids Res.// **25**:​955-964,​ 1997. +  - T. M. Lowe and S. R. Eddy. [[https://​doi.org/​10.1093/​nar/​25.5.0955|tRNAscan-SE:​ A program for improved detection of transfer RNA genes in genomic sequence]]. //Nucleic Acids Res.// **25**:​955-964,​ 1997. 
 ==== Genome annotation III: Finding protein-coding genes ==== ==== Genome annotation III: Finding protein-coding genes ====
 Here are several references, in chronological order. First, for **prokaryotes** ... Here are several references, in chronological order. First, for **prokaryotes** ...
  
-  - M. Borodovsky and J. McIninch. [[http://dx.doi.org/​10.1016/​0097-8485(93)85004-V|Genmark:​ Parallel gene recognition for both DNA strands]]. //Computers Chem.// **17**(2):​123-133,​ 1993.  +  - M. Borodovsky and J. McIninch. [[https://​doi.org/​10.1016/​0097-8485(93)85004-V|Genmark:​ Parallel gene recognition for both DNA strands]]. //Computers Chem.// **17**(2):​123-133,​ 1993.  
-  - A. V. Lukashin and M. Borodovsky. [[http://dx.doi.org/​10.1093/​nar/​26.4.1107|GeneMark.hmm:​ New solutions for gene finding]]. //Nucleic Acids Res.// **26**(4):​1107-1115,​ 1998.  +  - A. V. Lukashin and M. Borodovsky. [[https://​doi.org/​10.1093/​nar/​26.4.1107|GeneMark.hmm:​ New solutions for gene finding]]. //Nucleic Acids Res.// **26**(4):​1107-1115,​ 1998.  
-  - S. L. Salzberg, A. L. Delcher, S. Kasif, and O. White. [[http://dx.doi.org/​10.1093/​nar/​26.2.544|Microbial gene identification using interpolated Markov models]]. //Nucleic Acids Res.// **26**(2):​544-548,​ 1998.  +  - S. L. Salzberg, A. L. Delcher, S. Kasif, and O. White. [[https://​doi.org/​10.1093/​nar/​26.2.544|Microbial gene identification using interpolated Markov models]]. //Nucleic Acids Res.// **26**(2):​544-548,​ 1998.  
-  - A. L. Delcher, D. Harmon, S. Kasif, O. White, and S. L. Salzberg. [[http://dx.doi.org/​10.1093/​nar/​27.23.4636|Improved microbial gene identification with GLIMMER]]. //Nucleic Acids Res.// **27**(23):​4636-3641,​ 1999.  +  - A. L. Delcher, D. Harmon, S. Kasif, O. White, and S. L. Salzberg. [[https://​doi.org/​10.1093/​nar/​27.23.4636|Improved microbial gene identification with GLIMMER]]. //Nucleic Acids Res.// **27**(23):​4636-3641,​ 1999.  
-  - D. Frishman, A. Mironov, H.-W. Mewes, and M. Gelfand. [[http://dx.doi.org/​10.1093/​nar/​26.12.2941|Combining diverse evidence for gene recognition in completely sequenced bacterial genomes]]. //Nucleic Acids Res.// **26**(12):​2941-2947,​ 1998.  +  - D. Frishman, A. Mironov, H.-W. Mewes, and M. Gelfand. [[https://​doi.org/​10.1093/​nar/​26.12.2941|Combining diverse evidence for gene recognition in completely sequenced bacterial genomes]]. //Nucleic Acids Res.// **26**(12):​2941-2947,​ 1998.  
-  - S. S. Hannenhalli,​ W. S. Hayes, A. G. Hatzigeorgiou,​ and J. W. Fickett. [[http://dx.doi.org/​10.1093/​nar/​27.17.3577|Bacterial start site prediction]]. //Nucleic Acids Res.// **27**(17):​3577-3582,​ 1999. +  - S. S. Hannenhalli,​ W. S. Hayes, A. G. Hatzigeorgiou,​ and J. W. Fickett. [[https://​doi.org/​10.1093/​nar/​27.17.3577|Bacterial start site prediction]]. //Nucleic Acids Res.// **27**(17):​3577-3582,​ 1999. 
   - J. H. Badger and G. J. Olsen. [[http://​mbe.oxfordjournals.org/​content/​16/​4/​512|CRITICA:​ Coding region identification tool invoking comparative analysis]]. //Mol. Biol. Evol.// **16**(4):​512-524,​ 1999.    - J. H. Badger and G. J. Olsen. [[http://​mbe.oxfordjournals.org/​content/​16/​4/​512|CRITICA:​ Coding region identification tool invoking comparative analysis]]. //Mol. Biol. Evol.// **16**(4):​512-524,​ 1999. 
-  - T. S. Larsen and A. Krogh. [[http://dx.doi.org/​10.1186/​1471-2105-4-21|EasyGene ​ a prokaryotic gene finder that ranks ORFs by statistical significance]]. //BMC Bioinformatics//​ **4**(21), 2003.  +  - T. S. Larsen and A. Krogh. [[https://​doi.org/​10.1186/​1471-2105-4-21|EasyGene ​ a prokaryotic gene finder that ranks ORFs by statistical significance]]. //BMC Bioinformatics//​ **4**(21), 2003.  
-  - L. Krause, A. McHardy, T. Nattkemper, A. Pühler, J. Stoye, F. Meyer. [[http://dx.doi.org/​10.1093/​nar/​gkl1083|GISMO - gene identification using a support vector machine for ORF classification]]. //Nucleic Acids Res.// **35**(2):​540-549,​ 2006.+  - L. Krause, A. McHardy, T. Nattkemper, A. Pühler, J. Stoye, F. Meyer. [[https://​doi.org/​10.1093/​nar/​gkl1083|GISMO - gene identification using a support vector machine for ORF classification]]. //Nucleic Acids Res.// **35**(2):​540-549,​ 2006.
    
 ... and now for **eukaryotes**:​ ... and now for **eukaryotes**:​
  
-  - G. Stormo. [[http://dx.doi.org/​10.1016/​0076-6879(90)83015-2|Consensus patterns in DNA]]. In R. F. Doolittle, editor, Molecular Evolution: Computer Analysis of Protein and Nucleic Acid Sequences, //Meth. Enzymol.// **volume 183** chapter 13, pages 211-221. Academic Press, San Diego, CA, 1990.  +  - G. Stormo. [[https://​doi.org/​10.1016/​0076-6879(90)83015-2|Consensus patterns in DNA]]. In R. F. Doolittle, editor, Molecular Evolution: Computer Analysis of Protein and Nucleic Acid Sequences, //Meth. Enzymol.// **volume 183** chapter 13, pages 211-221. Academic Press, San Diego, CA, 1990.  
-  - R. Guigó, S. Knudsen, N. Drake, and T. Smith. [[http://dx.doi.org/​10.1016/​0022-2836(92)90130-C|Prediction of gene structure]]. //J. Mol. Biol.// **226**(1):​141-157,​ 1992. +  - R. Guigó, S. Knudsen, N. Drake, and T. Smith. [[https://​doi.org/​10.1016/​0022-2836(92)90130-C|Prediction of gene structure]]. //J. Mol. Biol.// **226**(1):​141-157,​ 1992. 
   - E. C. Uberbacher and R. J. Mural. [[http://​www.pnas.org/​content/​88/​24/​11261|Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach]]. //Proc. Natl. Acad. Sci. USA// **88**(24):​11261-11265,​ 1991.    - E. C. Uberbacher and R. J. Mural. [[http://​www.pnas.org/​content/​88/​24/​11261|Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach]]. //Proc. Natl. Acad. Sci. USA// **88**(24):​11261-11265,​ 1991. 
-  - E. C. Uberbacher, Y. Xu, and R. J. Mural. [[http://dx.doi.org/​10.1016/​S0076-6879(96)66018-2|Discovering and understanding genes in human DNA sequence using GRAIL]]. In R. F. Doolittle, editor, Computer Methods for Macromolecular Sequence Analysis, //Meth. Enzymol.// **volume 266** chapter 28, pages 259-281. Academic Press, San Diego, CA, 1996.  +  - E. C. Uberbacher, Y. Xu, and R. J. Mural. [[https://​doi.org/​10.1016/​S0076-6879(96)66018-2|Discovering and understanding genes in human DNA sequence using GRAIL]]. In R. F. Doolittle, editor, Computer Methods for Macromolecular Sequence Analysis, //Meth. Enzymol.// **volume 266** chapter 28, pages 259-281. Academic Press, San Diego, CA, 1996.  
-  - S. L. Salzberg. [[http://dx.doi.org/​10.1093/​bioinformatics/​13.4.365|A method for identifying splice sites and translation start sites in eukaryotic mRNA]]. //CABIOS// **13**(4):​365-376,​ 1997.  +  - S. L. Salzberg. [[https://​doi.org/​10.1093/​bioinformatics/​13.4.365|A method for identifying splice sites and translation start sites in eukaryotic mRNA]]. //CABIOS// **13**(4):​365-376,​ 1997.  
-  - M. G. Reese, F. H. Eeckman, D. Kulp, and D. Haussler. [[http://dx.doi.org/​10.1089/​cmb.1997.4.311|Improved splice site detection in Genie]]. //J. Comp. Biol.// **4**(3):​311-323,​ 1997.  +  - M. G. Reese, F. H. Eeckman, D. Kulp, and D. Haussler. [[https://​doi.org/​10.1089/​cmb.1997.4.311|Improved splice site detection in Genie]]. //J. Comp. Biol.// **4**(3):​311-323,​ 1997.  
-  - C. Burge and S. Karlin. [[http://dx.doi.org/​10.1006/​jmbi.1997.0951|Prediction of complete gene structures in human genomic DNA]]. //J. Mol. Biol.// **268**(1):​78-94,​ 1997.  +  - C. Burge and S. Karlin. [[https://​doi.org/​10.1006/​jmbi.1997.0951|Prediction of complete gene structures in human genomic DNA]]. //J. Mol. Biol.// **268**(1):​78-94,​ 1997.  
-  - G. Parra, E. Blanco, and R. Guigó. [[http://dx.doi.org/​10.1101/​gr.10.4.511|GeneID in Drosophila]]. //Genome Res.// **10**(4):​511-515,​ 2000.  +  - G. Parra, E. Blanco, and R. Guigó. [[https://​doi.org/​10.1101/​gr.10.4.511|GeneID in Drosophila]]. //Genome Res.// **10**(4):​511-515,​ 2000.  
-  - M. G. Reese, D. Kulp, H. Tammana, and D. Haussler. [[http://dx.doi.org/​10.1101/​gr.10.4.529|Genie - gene finding in Drosophila melanogaster]]. //Genome Res.// **10**:​529-538,​ 2000.  +  - M. G. Reese, D. Kulp, H. Tammana, and D. Haussler. [[https://​doi.org/​10.1101/​gr.10.4.529|Genie - gene finding in Drosophila melanogaster]]. //Genome Res.// **10**:​529-538,​ 2000.  
-  - M. Pertea and S. L. Salzberg. [[http://dx.doi.org/​10.1023/​A:​1013770123580|Computational gene finding in plants]]. //Plant Mol. Biol.// **48**(1-2):​39-48,​ 2002. +  - M. Pertea and S. L. Salzberg. [[https://​doi.org/​10.1023/​A:​1013770123580|Computational gene finding in plants]]. //Plant Mol. Biol.// **48**(1-2):​39-48,​ 2002. 
 ==== Genome annotation IV: Patterns in non-coding regions ==== ==== Genome annotation IV: Patterns in non-coding regions ====
 This part is almost completely based on the work of Mathieu Blanchette and co-authors: ​ This part is almost completely based on the work of Mathieu Blanchette and co-authors: ​
  
-  - M. Blanchette, B. Schwikowski,​ and M. Tompa. [[http://dx.doi.org/​10.1089/​10665270252935421|Algorithms for phylogenetic footprinting]]. //J. Comp. Biol.// **9**(2):​211-223,​ 2002.  +  - M. Blanchette, B. Schwikowski,​ and M. Tompa. [[https://​doi.org/​10.1089/​10665270252935421|Algorithms for phylogenetic footprinting]]. //J. Comp. Biol.// **9**(2):​211-223,​ 2002.  
-  - M. Blanchette and M. Tompa. [[http://dx.doi.org/​10.1101/​gr.6902|Discovery of regulatory elements by a computational method for phylogenetic footprinting]]. //Genome Res.// **12**:​739-748,​ 2002. +  - M. Blanchette and M. Tompa. [[https://​doi.org/​10.1101/​gr.6902|Discovery of regulatory elements by a computational method for phylogenetic footprinting]]. //Genome Res.// **12**:​739-748,​ 2002. 
 ==== Genome annotation V: Functional genome annotation ==== ==== Genome annotation V: Functional genome annotation ====
 All bioinformatics textbooks contain good overviews of sequence analysis by Smith-Waterman and fast heuristic methods for database search like FASTA and BLAST. Here are some references to the original papers, including one by ourselves: ​ All bioinformatics textbooks contain good overviews of sequence analysis by Smith-Waterman and fast heuristic methods for database search like FASTA and BLAST. Here are some references to the original papers, including one by ourselves: ​
  
-  - T. F. Smith and M. S. Waterman. [[http://dx.doi.org/​10.1016/​0022-2836(81)90087-5|Identification of common molecular subsequences]]. //J. Mol. Biol.// **147**(1):​195-197,​ 1981.  +  - T. F. Smith and M. S. Waterman. [[https://​doi.org/​10.1016/​0022-2836(81)90087-5|Identification of common molecular subsequences]]. //J. Mol. Biol.// **147**(1):​195-197,​ 1981.  
-  - W. R. Pearson. [[http://dx.doi.org/​10.1016/​0076-6879(90)83007-V|Rapid and sensitive sequence comparison with FASTP and FASTA]]. In R. F. Doolittle, editor, Molecular Evolution: Computer Analysis of Protein and Nucleic Acid Sequences, //Meth. Enzymol.// **volume 183** chapter 5, pages 63-98. Academic Press, San Diego, CA, 1990.  +  - W. R. Pearson. [[https://​doi.org/​10.1016/​0076-6879(90)83007-V|Rapid and sensitive sequence comparison with FASTP and FASTA]]. In R. F. Doolittle, editor, Molecular Evolution: Computer Analysis of Protein and Nucleic Acid Sequences, //Meth. Enzymol.// **volume 183** chapter 5, pages 63-98. Academic Press, San Diego, CA, 1990.  
-  - S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. [[http://dx.doi.org/​10.1016/​S0022-2836(05)80360-2|Basic local alignment search tool]]. //J. Mol. Biol.// **215**(3):​403-410,​ 1990.  +  - S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. [[https://​doi.org/​10.1016/​S0022-2836(05)80360-2|Basic local alignment search tool]]. //J. Mol. Biol.// **215**(3):​403-410,​ 1990.  
-  - S. F. Altschul, T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. [[http://dx.doi.org/​10.1093/​nar/​25.17.3389|Gapped BLAST and PSI-BLAST: A new generation of protein database search programs]]. //Nucleic Acids Res.// **25**(17):​3389-3402,​ 1997.  +  - S. F. Altschul, T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. [[https://​doi.org/​10.1093/​nar/​25.17.3389|Gapped BLAST and PSI-BLAST: A new generation of protein database search programs]]. //Nucleic Acids Res.// **25**(17):​3389-3402,​ 1997.  
-  - K. Rasmussen, J. Stoye, and E. W. Myers. [[http://dx.doi.org/​10.1089/​cmb.2006.13.296|Efficient q-gram filters for finding all epsilon-matches over a given length]]. //J. Comp. Biol.// **13**(2):​296-308,​ 2006. +  - K. Rasmussen, J. Stoye, and E. W. Myers. [[https://​doi.org/​10.1089/​cmb.2006.13.296|Efficient q-gram filters for finding all epsilon-matches over a given length]]. //J. Comp. Biol.// **13**(2):​296-308,​ 2006. 
 ==== Transcriptomics I: mRNA analysis, RNA-Seq ==== ==== Transcriptomics I: mRNA analysis, RNA-Seq ====
 A few papers on EST clustering and splicing graphs: ​ A few papers on EST clustering and splicing graphs: ​
  
-  - S. Heber, M. Alekseyev, S.-H. Sze, H. Tang, P. A. Pevzner. [[http://dx.doi.org/​10.1093/​bioinformatics/​18.suppl_1.S181+  - S. Heber, M. Alekseyev, S.-H. Sze, H. Tang, P. A. Pevzner. [[https://​doi.org/​10.1093/​bioinformatics/​18.suppl_1.S181
 |Splicing graphs and EST assembly problem]]. //​Bioinformatics//​ **18**(Suppl. 1):​S181-S188,​ 2002.  |Splicing graphs and EST assembly problem]]. //​Bioinformatics//​ **18**(Suppl. 1):​S181-S188,​ 2002. 
-  - M. Sammeth, S. Foissac, R. Guigó. [[http://dx.doi.org/​10.1371/​journal.pcbi.1000147|A general definition and nomenclature for alternative splicing events]]. //PLoS Comput. Biol.// **4**(8):​e1000147,​ 2008.  +  - M. Sammeth, S. Foissac, R. Guigó. [[https://​doi.org/​10.1371/​journal.pcbi.1000147|A general definition and nomenclature for alternative splicing events]]. //PLoS Comput. Biol.// **4**(8):​e1000147,​ 2008.  
-  - V. Lacroix, M. Sammeth, R. Guigó, A. Bergeron. [[http://dx.doi.org/​10.1007/​978-3-540-87361-7_5|Exact Transcriptome Reconstruction from Short Sequence reads]]. Proc. WABi 2008, LNBI 5251, 50-63, 2008. +  - V. Lacroix, M. Sammeth, R. Guigó, A. Bergeron. [[https://​doi.org/​10.1007/​978-3-540-87361-7_5|Exact Transcriptome Reconstruction from Short Sequence reads]]. Proc. WABi 2008, LNBI 5251, 50-63, 2008. 
  
 Very good review about NGS transcriptomics (RNA-seq): ​ Very good review about NGS transcriptomics (RNA-seq): ​
  
-  - J. A. Martin, Z. Wang. [[http://dx.doi.org/​10.1038/​nrg3068|Next-generation transcriptome assembly]]. Nat. Rev. Genet., **12**(10), 671–682, 2011.  ​+  - J. A. Martin, Z. Wang. [[https://​doi.org/​10.1038/​nrg3068|Next-generation transcriptome assembly]]. Nat. Rev. Genet., **12**(10), 671–682, 2011.  ​
  
 Examples for software supporting RNA-seq: ​ Examples for software supporting RNA-seq: ​
  
-  - C. Trapnell, L. Pachter, S. Salzberg. [[http://dx.doi.org/​10.1093/​bioinformatics/​btp120|TopHat:​ discovering splice junctions with RNA-Seq]]. //​Bioinformatics//​ **25**(9): 1105-1111, 2009.  +  - C. Trapnell, L. Pachter, S. Salzberg. [[https://​doi.org/​10.1093/​bioinformatics/​btp120|TopHat:​ discovering splice junctions with RNA-Seq]]. //​Bioinformatics//​ **25**(9): 1105-1111, 2009.  
-  - K. Wang, D. Singh, Z. Zeng, S. J. Coleman, Y. Huang, G. L. Savich, X. He, P. Mieczkowski,​ S. A. Grimm, C. M. Perou, J. N. MacLeod, D. Y. Chiang, J. F. Prins, J. Liu. [[http://dx.doi.org/​10.1093/​nar/​gkq622|MapSplice:​ Accurate mapping of RNA-seq reads for splice junction discovery]]. //Nucleic Acids Res.// **38**(18): e178, 2010. +  - K. Wang, D. Singh, Z. Zeng, S. J. Coleman, Y. Huang, G. L. Savich, X. He, P. Mieczkowski,​ S. A. Grimm, C. M. Perou, J. N. MacLeod, D. Y. Chiang, J. F. Prins, J. Liu. [[https://​doi.org/​10.1093/​nar/​gkq622|MapSplice:​ Accurate mapping of RNA-seq reads for splice junction discovery]]. //Nucleic Acids Res.// **38**(18): e178, 2010. 
 ==== Tanscriptomics II: DNA microarray design ==== ==== Tanscriptomics II: DNA microarray design ====
 Just a few in-house papers, because they are so nice:  Just a few in-house papers, because they are so nice: 
  
-  - S. Rahmann. [[http://dx.doi.org/​10.1142/​S0219720003000125|Fast large scale oligonucleotide selection using the longest common factor approach]]. //J. Bioinf. Comput. Biol.// **1**(2):​343-361,​ 2003.  +  - S. Rahmann. [[https://​doi.org/​10.1142/​S0219720003000125|Fast large scale oligonucleotide selection using the longest common factor approach]]. //J. Bioinf. Comput. Biol.// **1**(2):​343-361,​ 2003.  
-  - S. Rahmann. [[http://dx.doi.org/​10.1093/​bioinformatics/​btg1073|The shortest common supersequence problem in a microarray production setting]]. //​Bioinformatics//​ **19**(Suppl. 2):156-161, 2003. (Proceedings of ECCB 2003).  +  - S. Rahmann. [[https://​doi.org/​10.1093/​bioinformatics/​btg1073|The shortest common supersequence problem in a microarray production setting]]. //​Bioinformatics//​ **19**(Suppl. 2):156-161, 2003. (Proceedings of ECCB 2003).  
-  - S.A. de Carvalho Jr. and S. Rahmann. [[http://dx.doi.org/​10.1007/​11851561_30|Improving the layout of oligonucleotide microarrays:​ Pivot Partitioning]]. In Proc. 6th Workshop of Algorithms in Bioinformatics //LNBI// volume **4175** pages 321-332. Springer, 2006.  +  - S.A. de Carvalho Jr. and S. Rahmann. [[https://​doi.org/​10.1007/​11851561_30|Improving the layout of oligonucleotide microarrays:​ Pivot Partitioning]]. In Proc. 6th Workshop of Algorithms in Bioinformatics //LNBI// volume **4175** pages 321-332. Springer, 2006.  
-  - S.A. de Carvalho Jr. and S. Rahmann. [[http://​subs.emis.de/​LNI/​Proceedings/​Proceedings83/​article7005.html|Microarray layout as a quadratic assignment problem]]. In Proc. German Conference on Bioinformatics,​ volume P-83 of Lecture Notes in Informatics,​ pages 11-20. GI, 2006. +  - S.A. de Carvalho Jr. and S. Rahmann. [[http://​subs.emis.de/​LNI/​Proceedings/​Proceedings83/​article7005.html|Microarray layout as a quadratic assignment problem]]. In Proc. German Conference on Bioinformatics,​ volume P-83 of Lecture Notes in Informatics,​ pages 11-20. GI, 2006. 
 ==== Tanscriptomics III: DNA microarray analysis ==== ==== Tanscriptomics III: DNA microarray analysis ====
 A couple of papers and one book:  A couple of papers and one book: 
  
-  - W. Huber, A. von Heydebreck, H. Sueltmann, A. Poustka, and M. Vingron. [[http://dx.doi.org/​10.1093/​bioinformatics/​18.suppl_1.S96|Variance stabilization applied to microarray data calibration and to the quantification of differential expression]]. //​Bioinformatics//​ **18**(Suppl. 1):96-104, 2002. +  - W. Huber, A. von Heydebreck, H. Sueltmann, A. Poustka, and M. Vingron. [[https://​doi.org/​10.1093/​bioinformatics/​18.suppl_1.S96|Variance stabilization applied to microarray data calibration and to the quantification of differential expression]]. //​Bioinformatics//​ **18**(Suppl. 1):96-104, 2002. 
   - T. P. Speed, editor. [[https://​www.crcpress.com/​Statistical-Analysis-of-Gene-Expression-Microarray-Data/​Speed/​p/​book/​9781584883272|Statistical Analysis of Gene Expression Microarray Data]]. Chapman and Hall/CRC, 2003.    - T. P. Speed, editor. [[https://​www.crcpress.com/​Statistical-Analysis-of-Gene-Expression-Microarray-Data/​Speed/​p/​book/​9781584883272|Statistical Analysis of Gene Expression Microarray Data]]. Chapman and Hall/CRC, 2003. 
   - G. K. Smyth. [[https://​doi.org/​10.2202/​1544-6115.1027|Linear models and empirical bayes methods for assessing differential expression in microarray experiments]]. //Stat. Appl. Genet. Mol. Biol.// **3**(1):​1-25,​ 2004.    - G. K. Smyth. [[https://​doi.org/​10.2202/​1544-6115.1027|Linear models and empirical bayes methods for assessing differential expression in microarray experiments]]. //Stat. Appl. Genet. Mol. Biol.// **3**(1):​1-25,​ 2004. 
-  - C. Steinhoff and M. Vingron. [[http://dx.doi.org/​10.1093/​bib/​bbl002|Normalization and quantification of differential expression in gene expression microarrays]]. //Brief. Bioinform.//​ **7**(2):​166-177,​ 2006. +  - C. Steinhoff and M. Vingron. [[https://​doi.org/​10.1093/​bib/​bbl002|Normalization and quantification of differential expression in gene expression microarrays]]. //Brief. Bioinform.//​ **7**(2):​166-177,​ 2006. 
 ==== RNA secondary structure, structure space, RNA-based regulation ==== ==== RNA secondary structure, structure space, RNA-based regulation ====
 The most classical algorithm for RNA secondary structure prediction is Nussinov'​s algorithm: The most classical algorithm for RNA secondary structure prediction is Nussinov'​s algorithm:
  
-  - R. Nussinov, G. Pieczenik, J. R. Griggs, D. J. Kleitman. [[http://dx.doi.org/​10.1137/​0135006|Algorithms for Loop Matchings]]. //SIAM J. Appl. Math.// **35**(1):​68–82,​ 1978. +  - R. Nussinov, G. Pieczenik, J. R. Griggs, D. J. Kleitman. [[https://​doi.org/​10.1137/​0135006|Algorithms for Loop Matchings]]. //SIAM J. Appl. Math.// **35**(1):​68–82,​ 1978. 
-  - R. Giegerich, B. Voß, M. Rehmsmeier. [[http://dx.doi.org/​10.1093/​nar/​gkh779|Abstract shapes of RNA]]. //Nucleic Acids Res.// **32**(16):​4843-4851,​ 2004.+  - R. Giegerich, B. Voß, M. Rehmsmeier. [[https://​doi.org/​10.1093/​nar/​gkh779|Abstract shapes of RNA]]. //Nucleic Acids Res.// **32**(16):​4843-4851,​ 2004. 
 ==== Proteomics I: 2D gels ==== ==== Proteomics I: 2D gels ====
 A survey of various techniques for two-dimensional gel alignment is [1]. The method described in the lecture is from [2], [3].  A survey of various techniques for two-dimensional gel alignment is [1]. The method described in the lecture is from [2], [3]. 
  
-  - A. W. Dowsey, M. J. Dunn, and G.-Z. Yang. [[http://dx.doi.org/​10.1002/​pmic.200300459|The role of bioinformatics in two-dimensional gel electrophoresis]]. //​Proteomics//​ **3**(8):​1567-1596,​ 2003.  +  - A. W. Dowsey, M. J. Dunn, and G.-Z. Yang. [[https://​doi.org/​10.1002/​pmic.200300459|The role of bioinformatics in two-dimensional gel electrophoresis]]. //​Proteomics//​ **3**(8):​1567-1596,​ 2003.  
-  - F. A. Potra, X. Liu, F. Seillier-Moiseiwitsch,​ A. Roy, Y. Hang, M. R. Marten, B. Raman, and C. Whisnant. [[http://dx.doi.org/​10.1089/​cmb.2006.13.614|Protein image alignment via piecewise affine transformations]]. //J. Comp. Biol.// **13**(3):​614-630,​ 2006.  +  - F. A. Potra, X. Liu, F. Seillier-Moiseiwitsch,​ A. Roy, Y. Hang, M. R. Marten, B. Raman, and C. Whisnant. [[https://​doi.org/​10.1089/​cmb.2006.13.614|Protein image alignment via piecewise affine transformations]]. //J. Comp. Biol.// **13**(3):​614-630,​ 2006.  
-  - F. A. Potra, X. Liu. [[http://dx.doi.org/​10.1089/​cmb.2006.13.1384|Aligning Families of Two-Dimensional Gels by a Combined Multiresolution Forward-Inverse Transformation Approach]]. //J. Comp. Biol.// **13**(7):​1384-1395,​ 2006.  +  - F. A. Potra, X. Liu. [[https://​doi.org/​10.1089/​cmb.2006.13.1384|Aligning Families of Two-Dimensional Gels by a Combined Multiresolution Forward-Inverse Transformation Approach]]. //J. Comp. Biol.// **13**(7):​1384-1395,​ 2006.  
-  - Andrew W. Dowsey, Michael J. Dunn and Guang-Zhong Yang. [[http://dx.doi.org/​10.1093/​bioinformatics/​btn059|Automated image alignment for 2D gel electrophoresis in a high-throughput proteomics pipeline]]. //​Bioinformatics//​ **24**(7):​950-957,​ 2008 +  - Andrew W. Dowsey, Michael J. Dunn and Guang-Zhong Yang. [[https://​doi.org/​10.1093/​bioinformatics/​btn059|Automated image alignment for 2D gel electrophoresis in a high-throughput proteomics pipeline]]. //​Bioinformatics//​ **24**(7):​950-957,​ 2008
 ==== Proteomics II: Analysis of mass spectra ==== ==== Proteomics II: Analysis of mass spectra ====
 The algorithms discussed include: de-novo protein sequencing by mass spectra [1,2], the money changing problem [3], alignment of time-series of mass spectra [4].  The algorithms discussed include: de-novo protein sequencing by mass spectra [1,2], the money changing problem [3], alignment of time-series of mass spectra [4]. 
  
-  - T Chen, Ming-Yang Kao, M Tepel, J Rush, and G M Church. [[http://dx.doi.org/​10.1089/​10665270152530872|A dynamic programming approach to de novo peptide sequencing via tandem mass spectrometry]]. //J. Comput. Biol.// **8**(3):​325-337,​ 2001.  +  - T Chen, Ming-Yang Kao, M Tepel, J Rush, and G M Church. [[https://​doi.org/​10.1089/​10665270152530872|A dynamic programming approach to de novo peptide sequencing via tandem mass spectrometry]]. //J. Comput. Biol.// **8**(3):​325-337,​ 2001.  
-  - Lijuan Mo, Debojyoti Dutta, Yunhu Wan, and Ting Chen. [[http://dx.doi.org/​10.1021/​ac070039n|MSNovo:​ A Dynamic Programming Algorithm for de Novo Peptide Sequencing via Tandem Mass Spectrometry]]. //Anal. Chem.// **79**(13): 4870–4878,​ 2007.  +  - Lijuan Mo, Debojyoti Dutta, Yunhu Wan, and Ting Chen. [[https://​doi.org/​10.1021/​ac070039n|MSNovo:​ A Dynamic Programming Algorithm for de Novo Peptide Sequencing via Tandem Mass Spectrometry]]. //Anal. Chem.// **79**(13): 4870–4878,​ 2007.  
-  - Sebastian Böcker and Zsuzsanna Lipták. [[http://dx.doi.org/​10.1007/​s00453-007-0162-8|A fast and simple algorithm for the money changing problem]]. //​Algorithmica//​ **48**(4):​413-432,​ 2007.  +  - Sebastian Böcker and Zsuzsanna Lipták. [[https://​doi.org/​10.1007/​s00453-007-0162-8|A fast and simple algorithm for the money changing problem]]. //​Algorithmica//​ **48**(4):​413-432,​ 2007.  
-  - Amol Prakash, Parag Mallick, Jeffrey Whiteaker, Heidi Zhang, Amanda Paulovich, Mark Flory, Hookeun Lee, Ruedi Aebersold, and Benno Schwikowski. [[http://dx.doi.org/​10.1074/​mcp.M500133-MCP200|Signal maps for mass spectrometry-based comparative proteomics]]. //Mol Cell Proteomics//​ **5**(3):​423-432,​ 2006. +  - Amol Prakash, Parag Mallick, Jeffrey Whiteaker, Heidi Zhang, Amanda Paulovich, Mark Flory, Hookeun Lee, Ruedi Aebersold, and Benno Schwikowski. [[https://​doi.org/​10.1074/​mcp.M500133-MCP200|Signal maps for mass spectrometry-based comparative proteomics]]. //Mol Cell Proteomics//​ **5**(3):​423-432,​ 2006. 
  
 ==== Computational Metabolomics ==== ==== Computational Metabolomics ====
 Some software developed in Bielefeld has been published here:  Some software developed in Bielefeld has been published here: 
  
-  - H. Neuweger, S. P. Albaum, M. Dondrup, M. Persicke, T. Watt, K. Niehaus, J. Stoye, A. Goesmann. [[http://dx.doi.org/​10.1093/​bioinformatics/​btn452|MeltDB:​ A software Platform for the Analysis and Integration of Metabolomics Experiment Data]]. //​Bioinformatics//​ **24**(23):​2726-2732,​ 2008. +  - H. Neuweger, S. P. Albaum, M. Dondrup, M. Persicke, T. Watt, K. Niehaus, J. Stoye, A. Goesmann. [[https://​doi.org/​10.1093/​bioinformatics/​btn452|MeltDB:​ A software Platform for the Analysis and Integration of Metabolomics Experiment Data]]. //​Bioinformatics//​ **24**(23):​2726-2732,​ 2008. 
-  - N. Hoffmann, J. Stoye. [[http://dx.doi.org/​10.5772/​31224|Generic Software Frameworks for GC-MS Based Metabolomics]]. In: U. Roessner (ed.): Metabolomics. Chapter 4, pp. 73-98. InTech, 2012.+  - N. Hoffmann, J. Stoye. [[https://​doi.org/​10.5772/​31224|Generic Software Frameworks for GC-MS Based Metabolomics]]. In: U. Roessner (ed.): Metabolomics. Chapter 4, pp. 73-98. InTech, 2012.
  
 ==== Computational Systems Biology: Networks, their models and algorithms ==== ==== Computational Systems Biology: Networks, their models and algorithms ====
 A textbook that covers this (and much more) is:  A textbook that covers this (and much more) is: 
  
-  - E. Klipp, R. Herwig, A. Kowald, C. Wierling, H. Lehrach. [[http://dx.doi.org/​10.1002/​3527603603|Systems Biology in Practice - Concepts, Implementation and Application]]. Wiley-VCH, 2005. +  - E. Klipp, R. Herwig, A. Kowald, C. Wierling, H. Lehrach. [[https://​doi.org/​10.1002/​3527603603|Systems Biology in Practice - Concepts, Implementation and Application]]. Wiley-VCH, 2005.  
 + 
 +==== Computational pangenomics ==== 
 +The gene based method is considered here (for example): 
 + 
 +  - H. Tettelin et al. [[https://​doi.org/​10.1073/​pnas.0506758102|Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: Implicationsfor the microbial ‘‘pan-genome’’]]. //Proc. Natl. Academy. Sci. USA// **102**(39):​ 13950-13955,​ 2005. 
 +  - J. Blom, S. P. Albaum, D. Doppmeier, A. Pühler, F.-J. Vorhölter, M. Zakrzewski, and A. Goesmann. [[https://​doi.org/​10.1186/​1471-2105-10-154|EDGAR:​ A software framework for the comparative analysis of prokaryotic genomes]]. //BMC Bioinformatics//​ 10:154, 2009. 
 +  - J. Blom,  J. Kreis, ​ S. Spänig, ​ T. Juhre, ​ C. Bertelli, C. Ernst, and A. Goesmann. [[https://​doi.org/​10.1093/​nar/​gkw255|EDGAR 2.0: an enhanced software platform for comparative gene content analyses]]. //Nucleic Acids Res.// **44**(W1):​W22–W28,​ 2016. 
 +  - J. Blom, S. P. Glaeser, T. Juhre, J. Kreis, P. H. G. Hanel, J. G. Schrader, P. Kämpfer, and A. Goesmann. [[https://​doi.org/​10.1002/​9781118960608.bm00038|EDGAR:​ A Versatile Tool for Phylogenomics]]. In: W. B. Whitman (ed.). Bergey'​s Manual of Systematics of Archaea and Bacteria, Wiley, 2019. 
 + 
 +A good overview of genome-based computational pangenomics gives the following review paper: 
 + 
 +  - The Computational Pan-Genomics Consortium. [[https://​doi.org/​10.1093/​bib/​bbw089|Computational pan-genomics:​ status, promises and challenges]]. //Brief. Bioinf.// **19**(1), 118–135, 2018. 
 + 
 +Some more specialized papers are the following. 
 + 
 +(A) Data structures 
 +  - B. Paten, D. Earl, N. Nguyen, M. Diekhans, D. Zerbino, D. Haussler. [[https://​doi.org/​10.1101/​gr.123356.111|Cactus:​ Algorithms for genome multiple sequence alignment]]. //Genome Research// **21**, 1512–1528,​ 2011 
 +  - C. Ernst, S. Rahmann. [[https://​drops.dagstuhl.de/​opus/​volltexte/​2013/​4231/​pdf/​p035-ernst.pdf|PanCake:​ A Data Structure for Pangenomes]]. Proc. of //GCB 2013//, 35-45, 2013. 
 +  - G. Holley, R. Wittler, and J. Stoye. [[https://​doi.org/​10.1186/​s13015-016-0066-8 |Bloom Filter Trie: an alignment-free and reference-free data structure for pan-genome storage]]. //​Algorithms Mol. Biol.// **11**: 3, 2016. 
 +  -  E. Garrison, J. Sirén, A. M. Novak, G. Hickey, J. M. Eizenga, E. T. Dawson, W. Jones, S. Garg, C. Markello, M. F Lin, B. Paten, and R. Durbin. [[https://​doi.org/​10.1038/​nbt.4227|Variation graph toolkit improves read mapping by representing genetic variation in the reference]]. //Nat. Biotechnol.//​ **36**, 875–879, 2018. 
 +  - G. Holley and P. Melsted. [[https://​doi.org/​10.1186/​s13059-020-02135-8|Bifrost:​ highly parallel construction and indexing of colored and compacted de Bruijn graphs]]. //Genome Biol.// **21**: 249, 2020. 
 + 
 +(B) Sequence-to-graph mapping/​alignment 
 +  - M. Rautiainen, V. Mäkinen, and T. Marschall. [[https://​doi.org/​10.1093/​bioinformatics/​btz162|Bit-parallel sequence-to-graph alignment]]. //​Bioinformatics//​ **35**(19), 3599-3607, 2019.  
 +  -  R. Martiniano, E. Garrison, E. R. Jones, A. Manica, and R. Durbin. [[ https://​doi.org/​10.1186/​s13059-020-02160-7|Removing reference bias and improving indel calling in ancient DNA data analysis by mapping to a sequence variation graph]]. //Genome Biol.// **21**: 250, 2020. 
 +  - M. Rautiainen and T. Marschall. [[https://​doi.org/​10.1186/​s13059-020-02157-2|GraphAligner:​ rapid and versatile sequence-to-graph alignment]]. //Genome Biol.// **21**: 253, 2020. 
 +  - A. Kuhnle, T. Mun, C. Boucher, T. Gagie, B. Langmead, and G. Manzini. [[https://​doi.org/​10.1089/​cmb.2019.0309|Efficient Construction of a Complete Index for Pan-Genomics Read Alignment]]. //J. Comp. Biol.// **27**(4), 500-513, 2020. 
 +  -  N. Luhmann, G. Holley, and M. Achtman. [[https://​doi.org/​10.1101/​2020.01.21.914168|BlastFrost:​ Fast querying of 100,000s of bacterial genomes in Bifrost graphs]]. //​BioRxiv//,​ 2020. 
 +  -  T. Schulz, R. Wittler, S. Rahmann, F. Hach, and J. Stoye. [[https://​doi.org/​10.1093/​bioinformatics/​btab077|Detecting High Scoring Local Alignments in Pangenome Graphs]]. //​Bioinformatics//​ **37**(16), 2266–2274,​ 2021. 
 + 
 +(C) Phylogenomics:​ 
 + 
 +  - R. Wittler. [[https://​doi.org/​10.1186/​s13015-020-00164-3|Alignment- and reference-free phylogenomics with colored de Bruijn graphs]]. //​Algorithms Mol. Biol.// **15**: 4, 2020. 
 +  - A. Rempel, R. Wittler. [[https://​doi.org/​10.1093/​bioinformatics/​btab444|SANS serif: alignment-free,​ whole-genome-based phylogenetic reconstruction]]. //​Bioinformatics//​ **37**(24), 4868-4870, 2021. 
 + 
 +(D) Haplotype inference:​ 
 + 
 +See [[#​haplotype_inference|below]].
  
 ==== Comparative genomics I: Genome alignment, repeat analysis ==== ==== Comparative genomics I: Genome alignment, repeat analysis ====
 Some relevant papers in this area are:  Some relevant papers in this area are: 
  
-  - A. L. Delcher, S. Kasif, R. D. Fleischmann,​ J. Peterson, O. White, and S. L. Salzberg. [[http://dx.doi.org/​10.1093/​nar/​27.11.2369|Alignment of whole genomes]]. //Nucleic Acids Res.// **27**(11):​2369-2376,​ 1999.  +  - A. L. Delcher, S. Kasif, R. D. Fleischmann,​ J. Peterson, O. White, and S. L. Salzberg. [[https://​doi.org/​10.1093/​nar/​27.11.2369|Alignment of whole genomes]]. //Nucleic Acids Res.// **27**(11):​2369-2376,​ 1999.  
-  - S. Schwartz, Z. Zhang, K. A. Fraser, A. Smit, C. Riemer, J. Bouck, R. Gibson, R. Hardisson, and W. Miller. [[http://dx.doi.org/​10.1101/​gr.10.4.577|PipMaker-A web server for aligning two genomic DNA sequences]]. //Genome Res.// **10**(4):​577-586,​ 2000.  +  - S. Schwartz, Z. Zhang, K. A. Fraser, A. Smit, C. Riemer, J. Bouck, R. Gibson, R. Hardisson, and W. Miller. [[https://​doi.org/​10.1101/​gr.10.4.577|PipMaker-A web server for aligning two genomic DNA sequences]]. //Genome Res.// **10**(4):​577-586,​ 2000.  
-  - M. Höhl, S. Kurtz, and E. Ohlebusch. [[http://dx.doi.org/​10.1093/​bioinformatics/​18.suppl_1.S312|Efficient multiple genome alignment]]. //​Bioinformatics//​ **18**(Suppl. 1):​S312-S320,​ 2002. (Proceedings of ISMB 2002).  +  - M. Höhl, S. Kurtz, and E. Ohlebusch. [[https://​doi.org/​10.1093/​bioinformatics/​18.suppl_1.S312|Efficient multiple genome alignment]]. //​Bioinformatics//​ **18**(Suppl. 1):​S312-S320,​ 2002. (Proceedings of ISMB 2002).  
-  - S. Kurtz, A. Phillippy, A. L. Delcher, M. Smoot, Shumway M., C. Antonescu, and S. L. Salzberg. [[http://dx.doi.org/​10.1186/​gb-2004-5-2-r12|Versatile and open software for comparing large genomes]]. //Genome Biol.// **5**:R 12, 2004.  +  - S. Kurtz, A. Phillippy, A. L. Delcher, M. Smoot, Shumway M., C. Antonescu, and S. L. Salzberg. [[https://​doi.org/​10.1186/​gb-2004-5-2-r12|Versatile and open software for comparing large genomes]]. //Genome Biol.// **5**:R 12, 2004.  
-  - A. C. E. Darling, B. Mau, F. R. Blattner, and N. T. Perna. [[http://dx.doi.org/​10.1101/​gr.2289704|Mauve:​ Multiple alignment of conserved genomic sequence with rearrangements]]. //Genome Res.// **14**(7):​1394-1403,​ 2004.  +  - A. C. E. Darling, B. Mau, F. R. Blattner, and N. T. Perna. [[https://​doi.org/​10.1101/​gr.2289704|Mauve:​ Multiple alignment of conserved genomic sequence with rearrangements]]. //Genome Res.// **14**(7):​1394-1403,​ 2004.  
-  - E. Ohlebusch, M. I. Abouelhoda. [[http://dx.doi.org/​10.1201/​9781420036275.ch15|Chaining Algorithms and Applications in Comparative Genomics]]. In: Handbook of Computational Molecular Biology (Chapter 15), edited by Srinivas ​Aluru, Chapman & Hall/CRC Computer and Information Science Series, 2006.  +  - E. Ohlebusch, M. I. Abouelhoda. [[https://​doi.org/​10.1201/​9781420036275.ch15|Chaining Algorithms and Applications in Comparative Genomics]]. In: Handbook of Computational Molecular Biology (Chapter 15), edited by S. Aluru, Chapman & Hall/CRC Computer and Information Science Series, 2006.  
-  - S. Kurtz, J. V. Choudhuri, E. Ohlebusch, C. Schleiermacher,​ J. Stoye, and R. Giegerich. [[http://dx.doi.org/​10.1093/​nar/​29.22.4633|REPuter:​ the manifold applications of repeat analysis on a genomic scale]]. //Nucleic Acids Res.// **29**(22):​4643-4653,​2001. ​+  - S. Kurtz, J. V. Choudhuri, E. Ohlebusch, C. Schleiermacher,​ J. Stoye, and R. Giegerich. [[https://​doi.org/​10.1093/​nar/​29.22.4633|REPuter:​ the manifold applications of repeat analysis on a genomic scale]]. //Nucleic Acids Res.// **29**(22):​4643-4653,​ 2001. 
 ==== Comparative genomics II: Genome rearrangement ==== ==== Comparative genomics II: Genome rearrangement ====
 The classical papers are by Hannenhalli and Pevzner (1999 and 1995). A much better readable description for the reversal distance can be found in the book chapter by Bergeron //et al.// (2005). The general DCJ model is described in (Yancopoulos //et al.//, 2005) and (much nicer) in (Bergeron //et al.//, 2006a). A correct algorithm for sorting by translocations is given in (Bergeron //et al.//, 2006b), an almost correct one for the HP distance in (Bergeron //et al.//, 2009), one formula of which is corrected in (Erdős et al., 2011). A very good overview of median, halving and guided halving results can be found in (Tannier et al., 2009). ​ The classical papers are by Hannenhalli and Pevzner (1999 and 1995). A much better readable description for the reversal distance can be found in the book chapter by Bergeron //et al.// (2005). The general DCJ model is described in (Yancopoulos //et al.//, 2005) and (much nicer) in (Bergeron //et al.//, 2006a). A correct algorithm for sorting by translocations is given in (Bergeron //et al.//, 2006b), an almost correct one for the HP distance in (Bergeron //et al.//, 2009), one formula of which is corrected in (Erdős et al., 2011). A very good overview of median, halving and guided halving results can be found in (Tannier et al., 2009). ​
  
-  - S. Hannenhalli and P. A. Pevzner. [[http://dx.doi.org/​10.1145/​300515.300516|Transforming cabbage into turnip: Polynomial algorithm for sorting signed permutations by reversals]]. //J. ACM// **46**(1):​127,​ 1999.  +  - S. Hannenhalli and P. A. Pevzner. [[https://​doi.org/​10.1145/​300515.300516|Transforming cabbage into turnip: Polynomial algorithm for sorting signed permutations by reversals]]. //J. ACM// **46**(1):​127,​ 1999.  
-  - S. Hannenhalli and P. A. Pevzner. [[http://dx.doi.org/​10.1109/​SFCS.1995.492588|Transforming men into mice (polynomial algorithm for genomic distance problem)]]. In //Proc. 36th Annu. Symp. Found. Comput. Sci.// (FOCS 1995), pages 581-592. IEEE Press, 1995.+  - S. Hannenhalli and P. A. Pevzner. [[https://​doi.org/​10.1109/​SFCS.1995.492588|Transforming men into mice (polynomial algorithm for genomic distance problem)]]. In //Proc. 36th Annu. Symp. Found. Comput. Sci.// (FOCS 1995), pages 581-592. IEEE Press, 1995.
   - A. Bergeron. [[https://​doi.org/​10.1007/​3-540-48194-X_9|A very elementary presentation of the Hannenhalli-Pevzner theory]]. In: Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching (CPM 2001), volume 2089 of LNCS, pages 106-117, 2001.   - A. Bergeron. [[https://​doi.org/​10.1007/​3-540-48194-X_9|A very elementary presentation of the Hannenhalli-Pevzner theory]]. In: Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching (CPM 2001), volume 2089 of LNCS, pages 106-117, 2001.
   - A. Bergeron, J. Mixtacki, and J. Stoye. The inversion distance problem. In O. Gascuel, editor, [[https://​global.oup.com/​academic/​product/​mathematics-of-evolution-and-phylogeny-9780198566106?​q=9780198566106&​lang=en&​cc=de|Mathematics of Evolution and Phylogeny]],​ chapter 10, pages 262-290. Oxford University Press, Oxford, UK, 2005.    - A. Bergeron, J. Mixtacki, and J. Stoye. The inversion distance problem. In O. Gascuel, editor, [[https://​global.oup.com/​academic/​product/​mathematics-of-evolution-and-phylogeny-9780198566106?​q=9780198566106&​lang=en&​cc=de|Mathematics of Evolution and Phylogeny]],​ chapter 10, pages 262-290. Oxford University Press, Oxford, UK, 2005. 
-  - S. Yancopoulos,​ O. Attie, and R. Friedberg. [[http://dx.doi.org/​10.1093/​bioinformatics/​bti535|Efficient sorting of genomic permutations by translocation,​ inversion and block interchange]]. //​Bioinformatics//​ **21**(16):​3340-3346,​ 2005.  +  - S. Yancopoulos,​ O. Attie, and R. Friedberg. [[https://​doi.org/​10.1093/​bioinformatics/​bti535|Efficient sorting of genomic permutations by translocation,​ inversion and block interchange]]. //​Bioinformatics//​ **21**(16):​3340-3346,​ 2005.  
-  - A. Bergeron, J. Mixtacki, and J. Stoye. [[http://dx.doi.org/​10.1007/​11851561_16|A unifying view of genome rearrangements]]. In: Proceedings of the 6th International Workshop on Algorithms in Bioinformatics (WABI 2006), volume 4175 of LNBI, pages 163-173, 2006.  +  - A. Bergeron, J. Mixtacki, and J. Stoye. [[https://​doi.org/​10.1007/​11851561_16|A unifying view of genome rearrangements]]. In: Proceedings of the 6th International Workshop on Algorithms in Bioinformatics (WABI 2006), volume 4175 of LNBI, pages 163-173, 2006.  
-  - A. Bergeron, J. Mixtacki, and J. Stoye. [[http://dx.doi.org/​10.1089/​cmb.2006.13.567|On sorting by translocations]]. //J. Comp. Biol.// **13**(2):​567-578,​ 2006.  +  - A. Bergeron, J. Mixtacki, and J. Stoye. [[https://​doi.org/​10.1089/​cmb.2006.13.567|On sorting by translocations]]. //J. Comp. Biol.// **13**(2):​567-578,​ 2006.  
-  - A. Bergeron, J. Mixtacki, and J. Stoye. [[http://dx.doi.org/​10.1016/​j.tcs.2009.09.008|A new linear time algorithm to compute the genomic distance via the double cut and join distance]]. //Theor. Comput. Sci.// **410**(51):​5300-5316,​ 2009.  +  - A. Bergeron, J. Mixtacki, and J. Stoye. [[https://​doi.org/​10.1016/​j.tcs.2009.09.008|A new linear time algorithm to compute the genomic distance via the double cut and join distance]]. //Theor. Comput. Sci.// **410**(51):​5300-5316,​ 2009.  
-  - P. L. Erdős, L. Soukup, J. Stoye. [[http://dx.doi.org/​10.1016/​j.aml.2010.08.021|Balanced vertices in trees and a simpler algorithm to compute the genomic distance]]. //Appl. Math. Lett.// **24**:​82-86,​ 2011.  +  - P. L. Erdős, L. Soukup, J. Stoye. [[https://​doi.org/​10.1016/​j.aml.2010.08.021|Balanced vertices in trees and a simpler algorithm to compute the genomic distance]]. //Appl. Math. Lett.// **24**:​82-86,​ 2011.  
-  - E. Tannier, C. Zheng, D. Sankoff. [[http://dx.doi.org/​10.1186/​1471-2105-10-120|Multichromosomal median and halving problems under different genomic distances]]. //BMC Bioinformatics//​ **10**:120, 2009.+  - E. Tannier, C. Zheng, D. Sankoff. [[https://​doi.org/​10.1186/​1471-2105-10-120|Multichromosomal median and halving problems under different genomic distances]]. //BMC Bioinformatics//​ **10**:120, 2009.
  
-==== Comparative genomics III: Synteny Hierarchies and Gene clusters ====+==== Comparative genomics III: Gene clusters ====
 The following are the algorithmic papers in this area. Apart from that, many papers on applications of gene clusters and statistical properties exist, but are not listed here.  The following are the algorithmic papers in this area. Apart from that, many papers on applications of gene clusters and statistical properties exist, but are not listed here. 
  
-  ​- T. Uno and M. Yagiura. [[http://dx.doi.org/​10.1007/​s004539910014|Fast algorithms to enumerate all common intervals of two permutations]]. //​Algorithmica//​ **26**(2):​290-309,​ 2000.  +(a.) Common intervals of permutations:​ 
-  - S. Heber and J. Stoye. [[http://dx.doi.org/​10.1007/​3-540-48194-X_19|Finding all common intervals of k permutations]]. In A. Amir and G. Landau, editors, Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching, //CPM 2001//, volume 2089 of LNCS, pages 207-218, Berlin, 2001. Springer Verlag.  +  ​- T. Uno and M. Yagiura. [[https://​doi.org/​10.1007/​s004539910014|Fast algorithms to enumerate all common intervals of two permutations]]. //​Algorithmica//​ **26**(2):​290-309,​ 2000.  
-  - S. Heber and J. Stoye. [[http://dx.doi.org/​10.1007/​3-540-44696-6_20|Algorithms for finding gene clusters]]. In O. Gascuel and B. Moret, editors,​Proceedings of the First International Workshop on Algorithms in Bioinformatics,​ //WABI 2001//, volume 2149 of LNCS, pages 252-263, Berlin, 2001. Springer Verlag.  +  - S. Heber and J. Stoye. [[https://​doi.org/​10.1007/​3-540-48194-X_19|Finding all common intervals of k permutations]]. In A. Amir and G. Landau, editors, Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching, //CPM 2001//, volume 2089 of LNCS, pages 207-218, Berlin, 2001. Springer Verlag.  
-  - A. Bergeron, S. Corteel, and M. Raffinot. [[http://dx.doi.org/​10.1007/​3-540-45784-4_36|The algorithmic of gene teams]]. In R. Guigó and D. Gusfield,​editors,​ Proceedings of the Second International Workshop on Algorithms in Bioinformatics,​ //WABI 2002//, volume 2452 of LNCS, pages 464-476, Berlin, 2002. Springer Verlag.  +  - S. Heber and J. Stoye. [[https://​doi.org/​10.1007/​3-540-44696-6_20|Algorithms for finding gene clusters]]. In O. Gascuel and B. Moret, editors,​Proceedings of the First International Workshop on Algorithms in Bioinformatics,​ //WABI 2001//, volume 2149 of LNCS, pages 252-263, Berlin, 2001. Springer Verlag. 
-  - N. Luc, J.-L. Risler, A. Bergeron, and M. Raffinot. [[http://dx.doi.org/​10.1016/​S1476-9271(02)00097-X|Gene teams: a new formalization of gene clusters for comparative genomics]]. //Comp. Biol. Chem.// **27**:​59-67,​ 2003.  +  - A. Bergeron, S. Corteel, and M. Raffinot. [[https://​doi.org/​10.1007/​3-540-45784-4_36|The algorithmic of gene teams]]. In R. Guigó and D. Gusfield,​editors,​ Proceedings of the Second International Workshop on Algorithms in Bioinformatics,​ //WABI 2002//, volume 2452 of LNCS, pages 464-476, Berlin, 2002. Springer Verlag.  
-  - T. Schmidt and J. Stoye. [[http://dx.doi.org/​10.1007/​978-3-540-27801-6_26|Quadratic time algorithms for finding common intervals in two and more sequences]]. In S. C. Sahinalp, S. Muthukrishnan,​ and U. Dogrusoz, editors, Proceedings of the 15th Annual Symposium on Combinatorial Pattern Matching, //CPM 2004//, volume 3109 of LNCS, pages 347-358, Berlin, 2004. Springer Verlag.  +  - N. Luc, J.-L. Risler, A. Bergeron, and M. Raffinot. [[https://​doi.org/​10.1016/​S1476-9271(02)00097-X|Gene teams: a new formalization of gene clusters for comparative genomics]]. //Comp. Biol. Chem.// **27**:​59-67,​ 2003.  
-  - XHe and MHGoldwasser. [[http://dx.doi.org/10.1089/cmb.2005.12.638|Identifying conserved gene clusters in the presence ​of homology families]]. //J. CompBiol.// **12**(6):638-6562005+  - G. M. Landau, L. Parida, and O. Weimann. [[https://​doi.org/​10.1089/​cmb.2005.12.1289|Gene proximity analysis across whole genomes via PQ tree]]. //J. Comp. Biol.// **12**(10):​1289–1306,​ 2005. 
-  - G. M. Landau, LParida, and OWeimann. {{http://​doi.org/​10.1089/​cmb.2005.12.1289|Gene proximity analysis across whole genomes via PQ tree}}. //J. Comp. Biol.// **12**(10):1289–1306, 2005. +  - A. Bergeron, C. Chauve, F. de Montgolfier,​ and M. Raffinot. [[https://​doi.org/​10.1137/​060651331|Computing common intervals of K permutations,​ with applications to modular decomposition of graphs]]. //SIAM J. Discrete Mathematics//​ **22**(3):​1022–1039,​ 2008. 
-  - ABergeron, C. Chauve, F. de Montgolfier,​ and M. Raffinot. {{http://​doi.org/​10.1137/​060651331|Computing ​common intervals of K permutations,​ with applications to modular decomposition of graphs.}} //SIAM J. Discrete Mathematics//​ **22**(3):1022–1039,​ 2008. +  - S. Heber, R. Mayr, J. Stoye. [[https://​doi.org/​10.1007/​s00453-009-9332-1|Common Intervals of Multiple Permutations]]. //​Algorithmica//​ **60**(2):​175-206,​ 2011. 
-  - S. Böcker, K. Jahn, J. Mixtacki, J. Stoye. [[http://dx.doi.org/​10.1089/​cmb.2009.0098|Computation of Median Gene Clusters]]. //J. Comp. Biol.// **16**(8):​1085-1099,​ 2009. + 
-  - F. Hufsky, L. Kuchenbecker,​ K. Jahn, J. Stoye, S. Böcker. [[http://dx.doi.org/​10.1186/​1471-2105-12-106|Swiftly Computing Center Strings]]. //BMC Bioinformatics//​ **12**:106, 2011+(b.) Common intervals of sequences: 
-  - D. Doerr, J. Stoye, S. Böcker, K. Jahn. [[http://​dx.doi.org/​10.1186/​1471-2164-15-S6-S2|Identifying Gene Clusters by Discovering Common Intervals in Indeterminate Strings]]. //BMC Genomics// **15**(Suppl. 6): S2, 2014.+  - T. Schmidt and J. Stoye. [[https://​doi.org/​10.1007/​978-3-540-27801-6_26|Quadratic time algorithms for finding common intervals in two and more sequences]]. In S. C. Sahinalp, S. Muthukrishnan,​ and U. Dogrusoz, editors, Proceedings of the 15th Annual Symposium on Combinatorial Pattern Matching, //CPM 2004//, volume 3109 of LNCS, pages 347-358, Berlin, 2004. Springer Verlag.  
 +  - GDidier, TSchmidt, JStoye, D. Tsur. [[https://​doi.org/​10.1016/j.jda.2006.03.021|Character Sets of Strings]]. //J. DiscrAlg.// **5**(2):330-3402007
 +  - XHe and M. HGoldwasser[[https://​doi.org/​10.1089/​cmb.2005.12.638|Identifying conserved gene clusters in the presence of homology families]]. //J. Comp. Biol.// **12**(6):638-656, 2005. 
 + 
 +(c.) Approximate ​common intervals of sequences
 +  - S. Böcker, K. Jahn, J. Mixtacki, J. Stoye. [[https://​doi.org/​10.1089/​cmb.2009.0098|Computation of Median Gene Clusters]]. //J. Comp. Biol.// **16**(8):​1085-1099,​ 2009. 
 +  - F. Hufsky, L. Kuchenbecker,​ K. Jahn, J. Stoye, S. Böcker. [[https://​doi.org/​10.1186/​1471-2105-12-106|Swiftly Computing Center Strings]]. //BMC Bioinformatics//​ **12**:106, 2011.
  
 +(d.) Common intervals of indeterminate strings:
 +  - D. Doerr, J. Stoye, S. Böcker, K. Jahn. [[https://​doi.org/​10.1186/​1471-2164-15-S6-S2|Identifying Gene Clusters by Discovering Common Intervals in Indeterminate Strings]]. //BMC Genomics// **15**(Suppl. 6): S2, 2014.
  
 ==== Comparative Genomics IV: Multiple genome rearrangement ==== ==== Comparative Genomics IV: Multiple genome rearrangement ====
Line 248: Line 301:
   - R. Warren, D. Sankoff. [[https://​doi.org/​10.1142/​S0219720009004102|Genome Halving with Double Cut and Join]]. //Journal of Bioinformatics and Computational Biology// **7**(2): 357-371, 2009.   - R. Warren, D. Sankoff. [[https://​doi.org/​10.1142/​S0219720009004102|Genome Halving with Double Cut and Join]]. //Journal of Bioinformatics and Computational Biology// **7**(2): 357-371, 2009.
   - P Feijão, J. Meidanis. [[https://​doi.org/​10.1109/​TCBB.2011.34|SCJ:​ A Breakpoint-Like Distance that Simplifies Several Rearrangement Problems]]. //IEEE/ACM Transactions on Computational Biology and Bioinformatics//​ **6**(3): 1318-1329, 2011.   - P Feijão, J. Meidanis. [[https://​doi.org/​10.1109/​TCBB.2011.34|SCJ:​ A Breakpoint-Like Distance that Simplifies Several Rearrangement Problems]]. //IEEE/ACM Transactions on Computational Biology and Bioinformatics//​ **6**(3): 1318-1329, 2011.
- 
  
 ==== Comparative Genomics V: Reconstruction of ancestral gene clusters ==== ==== Comparative Genomics V: Reconstruction of ancestral gene clusters ====
  
   - R. Wittler. [[https://​pub.uni-bielefeld.de/​publication/​2301939|Phylogeny-based analysis of gene clusters]]. Ph.D. Thesis, Faculty of Technology, Bielefeld University, 2010.    - R. Wittler. [[https://​pub.uni-bielefeld.de/​publication/​2301939|Phylogeny-based analysis of gene clusters]]. Ph.D. Thesis, Faculty of Technology, Bielefeld University, 2010. 
-  - J. Stoye, R. Wittler. [[http://dx.doi.org/​10.1109/​TCBB.2008.135|A Unified Approach for Reconstructing Ancient Gene Clusters]]. //IEEE/ACM Transactions on Computational Biology and Bioinformatics//​ **6**(3):​387-400,​ 2009. +  - J. Stoye, R. Wittler. [[https://​doi.org/​10.1109/​TCBB.2008.135|A Unified Approach for Reconstructing Ancient Gene Clusters]]. //IEEE/ACM Transactions on Computational Biology and Bioinformatics//​ **6**(3):​387-400,​ 2009. 
-  - R. Wittler, J. Maňuch, M. Patterson, J. Stoye. [[http://dx.doi.org/​10.1089/​cmb.2011.0083|Consistency of Sequence-Based Gene Clusters]]. //J. Comp. Biol.// **18**(9):​1023-1039,​ 2011.+  - R. Wittler, J. Maňuch, M. Patterson, J. Stoye. [[https://​doi.org/​10.1089/​cmb.2011.0083|Consistency of Sequence-Based Gene Clusters]]. //J. Comp. Biol.// **18**(9):​1023-1039,​ 2011. 
 ==== Haplotype inference ==== ==== Haplotype inference ====
 A great overview of the combinatorial problems and algorithms in the following book chapter: ​ A great overview of the combinatorial problems and algorithms in the following book chapter: ​
  
-  - D. Gusfield, S. Hecht Orzack. [[http://dx.doi.org/​10.1201/​9781420036275.ch18|Haplotype Inference]]. In: Handbook of Computational Molecular Biology (Chapter 18), edited by Srinivas ​Aluru, Chapman & Hall/CRC Computer and Information Science Series, 2006.+  - D. Gusfield, S. Hecht Orzack. [[https://​doi.org/​10.1201/​9781420036275|Haplotype Inference]]. In: Handbook of Computational Molecular Biology (Chapter 18), edited by S. Aluru, Chapman & Hall/CRC Computer and Information Science Series, 2006.
  
-A more recent ​paper on the topic is:+A more recent ​works on the topic, focussing on molecular haplotyping:
  
   - G. W. Klau, T. Marschall. [[https://​doi.org/​10.1007/​978-3-319-58741-7_6|A guided tour to computational haplotyping]]. In: Proc. of CiE 2017, LNCS 10307, Springer Verlag, 2017.   - G. W. Klau, T. Marschall. [[https://​doi.org/​10.1007/​978-3-319-58741-7_6|A guided tour to computational haplotyping]]. In: Proc. of CiE 2017, LNCS 10307, Springer Verlag, 2017.
 +  - M. Patterson, T. Marschall, N. Pisanti, L. v. Iersel, L. Stougie, G. W. Klau, A. Schönhuth. [[https://​doi.org/​10.1089/​cmb.2014.0157|WhatsHap:​ Weighted Haplotype Assembly for Future-Generation Sequencing Reads]]. //Journal of Computational Biology// **22**(6), 498-509, 2015.
 +
 +The ILP discussed in class is from the following textbook, Section 20.2:
 +
 +  - Dan Gusfield. [[https://​doi.org/​10.1017/​9781108377737|Integer Linear Programming in Computational and Systems Biology]]. Cambridge University Press, 2019.
 +
 ==== SNP-disease associations ==== ==== SNP-disease associations ====
 Here are a few of the more algorithmic papers on the topic, but there exist several more. You may look up the references in this one.  Here are a few of the more algorithmic papers on the topic, but there exist several more. You may look up the references in this one. 
  
-  - H. T. T. Toivonen, P. Onkamo, K. Vasko, V. Ollikainen, P. Sevon, H. Mannila, and J. Kere. [[http://dx.doi.org/​10.1109/​BIBE.2000.889596|Gene mapping by haplotype pattern mining]]. In //IEEE International Symposium on Bio-Informatics and Biomedical Engineering//​ (BIBE'​00):​ 99, 2000.  +  - H. T. T. Toivonen, P. Onkamo, K. Vasko, V. Ollikainen, P. Sevon, H. Mannila, and J. Kere. [[https://​doi.org/​10.1109/​BIBE.2000.889596|Gene mapping by haplotype pattern mining]]. In //IEEE International Symposium on Bio-Informatics and Biomedical Engineering//​ (BIBE'​00):​ 99, 2000.  
-  - T. Mailund, S. Besenbacher,​ and M. Schierup. [[http://dx.doi.org/​10.1186/​1471-2105-7-454|Whole genome association mapping by incompatibilities and local perfect phylogenies]]. //BMC Bioinformatics//​ **7**:454, 2006.  +  - T. Mailund, S. Besenbacher,​ and M. Schierup. [[https://​doi.org/​10.1186/​1471-2105-7-454|Whole genome association mapping by incompatibilities and local perfect phylogenies]]. //BMC Bioinformatics//​ **7**:454, 2006.  
-  - J. Nielsen and T. Mailund. [[http://dx.doi.org/​10.1186/​1471-2105-9-526|SNPFile - A software library and file format for large scale association mapping and population genetics studies]]. //BMC Bioinformatics//​ **9**:526, 2008.  +  - J. Nielsen and T. Mailund. [[https://​doi.org/​10.1186/​1471-2105-9-526|SNPFile - A software library and file format for large scale association mapping and population genetics studies]]. //BMC Bioinformatics//​ **9**:526, 2008.  
-  - S. Besenbacher,​ C. N. S. Pedersen, and T. Mailund. [[http://dx.doi.org/​10.1186/​1471-2105-10-S1-S74|A fast algorithm for genome-wide haplotype pattern mining]]. //BMC Bioinformatics//​ **10**(Suppl 1):S74, 2009. +  - S. Besenbacher,​ C. N. S. Pedersen, and T. Mailund. [[https://​doi.org/​10.1186/​1471-2105-10-S1-S74|A fast algorithm for genome-wide haplotype pattern mining]]. //BMC Bioinformatics//​ **10**(Suppl 1):S74, 2009. 
-  - S. Besenbacher,​ T. Mailand, M. H. Schierup. [[http://dx.doi.org/​10.1007/​978-1-61779-585-5_11|Association Mapping and Disease: Evolutionary Perspectives]]. //​Evolutionary Genomics: Statistical and Computational Methods, Volume 2// Methods in Molecular Biology, vol. 856. Chapter 11, 2012.+  - S. Besenbacher,​ T. Mailand, M. H. Schierup. [[https://​doi.org/​10.1007/​978-1-61779-585-5_11|Association Mapping and Disease: Evolutionary Perspectives]]. //​Evolutionary Genomics: Statistical and Computational Methods, Volume 2// Methods in Molecular Biology, vol. 856. Chapter 11, 2012. 
 ==== Computational metagenomics,​ astrobiology ==== ==== Computational metagenomics,​ astrobiology ====
 Papers on tools for the analysis of metagenomics data: Papers on tools for the analysis of metagenomics data:
  
-  - D. H. Huson, A. F. Auch, J. Qi, S. C. Schuster. [[http://dx.doi.org/​10.1101/​gr.5969107|MEGAN analysis of metagenomic data]]. //Genome Res.// **17**:​377-386,​ 2007. +  - D. H. Huson, A. F. Auch, J. Qi, S. C. Schuster. [[https://​doi.org/​10.1101/​gr.5969107|MEGAN analysis of metagenomic data]]. //Genome Res.// **17**:​377-386,​ 2007. 
-  - Q. Wang, G. M. Garrity, J. M. Tiedje, J. R. Cole. [[http://dx.doi.org/​10.1128/​AEM.00062-07|Naïve Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy]]. //Appl. Env. Microbiol.//,​ **73**(16):​5261-5267,​ 2007. +  - Q. Wang, G. M. Garrity, J. M. Tiedje, J. R. Cole. [[https://​doi.org/​10.1128/​AEM.00062-07|Naïve Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy]]. //Appl. Env. Microbiol.//,​ **73**(16):​5261-5267,​ 2007. 
-  - F. Meyer, D. Paarmann, M. D'​Souza,​ R. Olson, E. M. Glass, M. Kubal, T. Paczian, A. Rodriguez, R. Stevens, A. Wilke, J. Wilkening, R. A. Edwards. [[http://dx.doi.org/​10.1186/​1471-2105-9-386|The Metagenomics RAST server - A public resource for the automatic phylogenetic and functional analysis of metagenomes]]. //BMC Bioinformatics//​ **9**:386, 2008. +  - F. Meyer, D. Paarmann, M. D'​Souza,​ R. Olson, E. M. Glass, M. Kubal, T. Paczian, A. Rodriguez, R. Stevens, A. Wilke, J. Wilkening, R. A. Edwards. [[https://​doi.org/​10.1186/​1471-2105-9-386|The Metagenomics RAST server - A public resource for the automatic phylogenetic and functional analysis of metagenomes]]. //BMC Bioinformatics//​ **9**:386, 2008. 
-  - L. Krause, N. N. Diaz, A. Goesmann, S. Kelley, T. W. Nattkemper, F. Rohwer, R. A. Edwards, J. Stoye. [[http://dx.doi.org/​10.1093/​nar/​gkn038|Phylogenetic classification of short environmental DNA fragments]]. //Nuleic Acids Res.// **36**(7):​2230-2239,​ 2008. +  - L. Krause, N. N. Diaz, A. Goesmann, S. Kelley, T. W. Nattkemper, F. Rohwer, R. A. Edwards, J. Stoye. [[https://​doi.org/​10.1093/​nar/​gkn038|Phylogenetic classification of short environmental DNA fragments]]. //Nuleic Acids Res.// **36**(7):​2230-2239,​ 2008. 
-  - M. M. Haque, T. S. Ghosh, D. Komanduri, S. S. Mande. [[http://dx.doi.org/​10.1093/​bioinformatics/​btp317|SOrt-ITEMS:​ Sequence orthology based approach for improved taxonomic estimation of metagenomic sequences]]. //​Bioinformatics//​ **25**(14):​1722-1730,​ 2009. +  - M. M. Haque, T. S. Ghosh, D. Komanduri, S. S. Mande. [[https://​doi.org/​10.1093/​bioinformatics/​btp317|SOrt-ITEMS:​ Sequence orthology based approach for improved taxonomic estimation of metagenomic sequences]]. //​Bioinformatics//​ **25**(14):​1722-1730,​ 2009. 
-  - W. Gerlach, S. Jünemann, F. Tille, A. Goesmann, J. Stoye. [[http://dx.doi.org/​10.1186/​1471-2105-10-430|WebCARMA:​ a web application for the functional and taxonomic classification of unassembled metagenomic reads]]. //BMC Bioinformatics//​ **10**:430, 2009. +  - W. Gerlach, S. Jünemann, F. Tille, A. Goesmann, J. Stoye. [[https://​doi.org/​10.1186/​1471-2105-10-430|WebCARMA:​ a web application for the functional and taxonomic classification of unassembled metagenomic reads]]. //BMC Bioinformatics//​ **10**:430, 2009. 
-  - D. E. Wood, S. L. Salzberg. [[http://dx.doi.org/​10.1186/​gb-2014-15-3-r46|Kraken:​ ultrafast metagenomic sequence classification using exact alignments]]. //Genome Biol.// **15**:R46, 2014. +  - D. E. Wood, S. L. Salzberg. [[https://​doi.org/​10.1186/​gb-2014-15-3-r46|Kraken:​ ultrafast metagenomic sequence classification using exact alignments]]. //Genome Biol.// **15**:R46, 2014. 
-  - A. Oulas, C. Pavloudi, P. Polymenakou,​ G. A. Pavlopoulos,​ N. Papanikolaou,​ G. Kotoulas, C. Arvanitidis,​ I. Iliopoulos. [[http://dx.doi.org/​10.4137/​BBI.S12462|Metagenomics:​ Tools and insights for analyzing next-generation sequencing data derived from biodiversity studies.]] //​Bioinformatics and Biology Insights// **9**:​75-88,​ 2015.+  - A. Oulas, C. Pavloudi, P. Polymenakou,​ G. A. Pavlopoulos,​ N. Papanikolaou,​ G. Kotoulas, C. Arvanitidis,​ I. Iliopoulos. [[https://​doi.org/​10.4137/​BBI.S12462|Metagenomics:​ Tools and insights for analyzing next-generation sequencing data derived from biodiversity studies.]] //​Bioinformatics and Biology Insights// **9**:​75-88,​ 2015.
  
 (//​Astrobiology//​ was mainly meant as a joke, but have a look here: [[http://​astrobiology.nasa.gov/​nai/​|NAI]]. Or here: [[https://​www.wired.com/​2010/​12/​nasa-finds-arsenic-life-form|NASA unveils arsenic life form]].) ​ (//​Astrobiology//​ was mainly meant as a joke, but have a look here: [[http://​astrobiology.nasa.gov/​nai/​|NAI]]. Or here: [[https://​www.wired.com/​2010/​12/​nasa-finds-arsenic-life-form|NASA unveils arsenic life form]].) ​