This shows you the differences between two versions of the page.
Both sides previous revision Previous revision | Previous revision | ||
teaching:alggrliterature [2019/02/08 14:33] |
teaching:alggrliterature [2022/11/21 09:57] (current) jstoye [Genome assembly IIb: Hybrid/long read assembly] |
||
---|---|---|---|
Line 1: | Line 1: | ||
+ | ===== References for Algorithms in Genome Research ===== | ||
+ | ==== Overview ==== | ||
+ | Parts of the material is covered by the following two textbooks. Some topics are newer than these books. For most of these, specialized references are given below. | ||
+ | |||
+ | - D. Gusfield. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, New York, NY, 1997. | ||
+ | - D. W. Mount, editor. Bioinformatics Sequence and Genome Analysis. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 2nd edition, 2004. | ||
+ | |||
+ | ==== Biological data types and formats ==== | ||
+ | The main issues of this section are discussed in Mounts textbook. Some of the major biological sequence databases are: | ||
+ | |||
+ | - Genbank (at NCBI): www.ncbi.nlm.nih.gov/Genbank | ||
+ | - Ensembl Genome Sequence Database: www.ensembl.org | ||
+ | - DDBJ (Japan): www.ddbj.nig.ac.jp | ||
+ | - UniProt (SwissProt and trEMBL): www.uniprot.org | ||
+ | |||
+ | ==== Sequencing technologies and applications ==== | ||
+ | Papers on SOLiD sequencing: | ||
+ | |||
+ | - Voelkerding et al. [[https://doi.org/10.1373/clinchem.2008.112789|Next-generation sequencing: from basic research to diagnostics]]. //Clinical Chemistry// **55**(4):641-658, 2009. | ||
+ | - Rumble et al. [[https://doi.org/10.1371/journal.pcbi.1000386|SHRiMP: Accurate Mapping of Short Color-space Reads]]. //PLOS Computational Biology// **5**(5), 2009. | ||
+ | |||
+ | ==== Physical mapping ==== | ||
+ | This section is mainly based on Chapter 16 of Gusfield's textbook. The Tightest Layout Problem is originally from (Alizadeh//et al.//, 1995). Another reference is (Heber //et al.//, 2000). | ||
+ | |||
+ | - F. Alizadeh, R. M. Karp, L. A. Newberg, and D. K. Weisser. [[https://doi.org/10.1007/BF01188581|Physical mapping of chromosomes: A combinatorial problem in molecular biology]]. //Algorithmica// **13**(1/2):52-76, 1995. | ||
+ | - S. Heber, J. Stoye, M. Frohme, J. Hoheisel, and M. Vingron. [[https://doi.org/10.1089/106652700750050853|Contig selection in physical mapping]]. //J. Comp. Biol.// **7**(3/4):395-408, 2000. | ||
+ | |||
+ | ==== Genome assembly Ia: Overlap-Layout-Consensus - Clone-by-clone and whole-genome shotgun assembly ==== | ||
+ | Lander-Waterman Statistics: | ||
+ | |||
+ | - E.S. Lander and M.S. Waterman, [[https://doi.org/10.1016/0888-7543(88)90007-9|Genomic mapping by fingerprinting random clones: a mathematical analysis]]. //Genomics// **2**:231-239, 1988. | ||
+ | |||
+ | Overlap-Layout-Consensus Assembly: There are many good description available online, just search for them. | ||
+ | |||
+ | ==== Genome assembly Ib: Re-sequencing, comparative (reference-based) assembly ==== | ||
+ | A good introduction to comparative genome assembly is [1]. The main algorithmic challenge is to map millions of (most very short) sequence reads onto one or more referene geneome(s). Suitable mapping algorithms for this task are [[http://bibiserv.cebitec.uni-bielefeld.de/swift/|SWIFT]] [2], [[http://bowtie-bio.sourceforge.net/index.shtml|Bowtie]] [6], ELAND (Cox, unpublished), [[http://maq.sourceforge.net/|MAQ]] [3], [[http://rulai.cshl.edu/rmap/|RMAP]], [[http://soap.genomics.org.cn/|SOAP]] [4], [[http://compbio.cs.toronto.edu/shrimp/|SHRiMP]], SeqMap [5], TAGGER [7], ZOOM [8], [[http://bio-bwa.sourceforge.net/bwa.shtml|BWA]] [9], GSNAP [10], SARUMAN [11], SSAHA2 [12], NextGenMap [13], etc. | ||
+ | |||
+ | - M. Pop, A. Phillippy, A. L. Delcher, and S. L. Salzberg. [[https://doi.org/10.1093/bib/5.3.237|Comparative genome assembly]]. //Briefings in Bioinformatics// **5**(3):237-248, 2004. | ||
+ | - K. Rasmussen, J. Stoye, E. W. Myers. [[https://doi.org/10.1089/cmb.2006.13.296|Efficient q-gram filters for finding all epsilon-matches over a given length]]. //J. Comp. Biol.// **13**(2):296-308, 2006. | ||
+ | - H. Li, J. Ruan, R. Durbin. [[https://doi.org/10.1101/gr.078212.108|Mapping short DNA sequencing reads and calling variants using mapping quality scores]]. //Genome Res.// **18**:1851-1858, 2008. | ||
+ | - R. Li, Y. Li, K. Kristiansen, J. Wang. [[https://doi.org/10.1093/bioinformatics/btn025|SOAP: short oligonucleotide alignment program]]. //Bioinformatics// **24**(5):713-714, 2008. | ||
+ | - H. Jiang, W. H Wong. [[https://doi.org/10.1093/bioinformatics/btn429|SeqMap: mapping massive amount of oligonucleotides to the genome]]. //Bioinformatics// **24**(20):2395-2396, 2008. | ||
+ | - B. Langmead, C. Trapnell, M. Pop, S. Salzberg. [[https://doi.org/10.1186/gb-2009-10-3-r25|Ultrafast and memory-efficient alignment of short DNA sequences to the human genome]]. //Genome Biol.// **10**:R25, 2009. | ||
+ | - C. Iseli, G. Ambrosini, P. Bucher, C. V. Jongeneel. [[https://doi.org/10.1371/journal.pone.0000579|Indexing Strategies for Rapid Searches of Short Words in Genome Sequences]]. //PLoS ONE// **2**(6):e579, 2007. | ||
+ | - H. Lin, Z. Zhang, M. Q. Zhang, B. Ma, M. Li. [[https://doi.org/10.1093/bioinformatics/btn416|ZOOM! Zillions of oligos mapped]]. //Bioinformatics// **24**(21): 2431-2437, 2008. | ||
+ | - H. Li, R. Durbin. [[https://doi.org/10.1093/bioinformatics/btp324|Fast and accurate short read alignment with Burrows–Wheeler transform]]. //Bioinformatics// **25**(14): 1754-1760, 2009. | ||
+ | - T. Wu, S. Nacu. [[https://doi.org/10.1093/bioinformatics/btq057|Fast and SNP-tolerant detection of complex variants and splicing in short reads]]. //Bioinformatics// **26**(7): 873–881, 2010. | ||
+ | - J. Blom, T. Jakobi, D. Doppmeier, S. Jaenicke, J. Kalinowski, J. Stoye, A. Goesmann. [[https://doi.org/10.1093/bioinformatics/btr151|Exact and complete short read alignment to microbial genomes using GPU programming]]. //Bioinformatics// **27**(10): 1351-1358, 2011. | ||
+ | - Z. Ning, A.J. Cox. [[https://doi.org/10.1101/gr.194201|SSAHA: A Fast Search Method for Large DNA Databases]]. //Genome Res.// **11**(10): 1725-1729, 2001. | ||
+ | - F. J. Sedlazeck, P. Rescheneder, A. von Haeseler. [[https://doi.org/10.1093/bioinformatics/btt468|NextGenMap: fast and accurate read mapping in highly polymorphic genomes]]. //Bioinformatics// **29**(21): 2790-2791, 2013. | ||
+ | - L. Oesper, A. Ritz, S. J. Aerni, R. Drebin, B. J. Raphael. [[https://doi.org/10.1186/1471-2105-13-S6-S10|Reconstructing cancer genomes from paired-end sequencing data]]. //BMC Bioinformatics// **13**(Suppl. 6):S10, 2012. | ||
+ | |||
+ | ==== Genome assembly IIa: De-novo assembly of HTS data ==== | ||
+ | These methods are all very similar, using variants of de-Bruijn graphs plus additional tricks: EULER-SR [1], Velvet [2,3], MIRA2 ([[http://www.chevreux.org/projectsmira.html|Link]]), SSAKE [4],VCAKE [5], SHARCGS [6], Medvedev/Brudno [7], ABySS ([[http://www.bcgsc.ca/platform/bioinfo/software/abyss|Link]]), ALLPATHS [8], SOAPdenovo [9], IDBA [10], etc. A very good recent review paper is [11]. | ||
+ | |||
+ | - M. J. Chaisson, P. A. Pevzner. [[https://doi.org/10.1101/gr.7088808|Short read fragment assembly of bacterial genomes]]. //Genome Res.// **18**(2):324-330, 2008. | ||
+ | - D. R. Zerbino, E. Birney. [[https://doi.org/10.1101/gr.074492.107|Velvet: Algorithms for de novo short read assembly using de Bruijn graphs]]. //Genome Res.// **18**(5):821-829, 2008. | ||
+ | - D. R. Zerbino, G. K. McEwen, E. H. Margulies, E. Birney. [[https://doi.org/10.1371/journal.pone.0008407|Pebble and Rock Band: Heuristic Resolution of Repeats and Scaffolding in the Velvet Short-Read de Novo Assembler]]. //PLoS ONE// **4**(12):e8407, 2009. | ||
+ | - R. L. Warren, G. G. Sutton, S. J. M. Jones, R. A. Holt. [[https://doi.org/10.1093/bioinformatics/btl629|Assembling millions of short dna sequences using ssake]]. //Bioinformatics// **23**(4):500-501, 2006. | ||
+ | - W. R. Jeck, J. A. Reinhardt, D. A. Baltrus, M. T. Hickenbotham, V. Magrini, E. R. Mardis, J. L. Dang, C. D. Jones. [[https://doi.org/10.1093/bioinformatics/btm451|Extending assembly of short dna sequences to handle error]]. //Bioinformatics// **23**(21):2942-2944, 2007. | ||
+ | - J. C. Dohm, C. Lottaz, T. Borodina, and H. Himmelbauer. [[https://doi.org/10.1101/gr.6435207|Sharcgs, a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing]]. //Genome Res.// **17**(11):1697-1706, 2007. | ||
+ | - P. Medvedev, M. Brudno. [[https://doi.org/10.1007/978-3-540-78839-3_5|Ab initio whole genome shotgun assembly with mated short reads]]. //Proc. RECOMB 2008// pages 50-64, 2008. | ||
+ | - J. Butler, I. Maccallum, M. Kleber, I. A. Shlyakhter, M. K. Belmonte, E. S. Lander, C. Nusbaum, D. B. Jaffe. [[https://doi.org/10.1101/gr.7337908|Allpaths: De novo assembly of whole-genome shotgun microreads]]. //Genome Res.// **18**(5):810-820, 2008. | ||
+ | - R. Li, H. Zhu, J. Ruan, W. Qian, X. Fang, Z. Shi, Y. Li, S. Li, G. Shan, K. Kristiansen, S. Li, H. Yang, J. Wang, J. Wang. [[https://doi.org/10.1101/gr.097261.109|De novo assembly of human genomes with massively parallel short read sequencing]]. //Genome Res.// **20**(2):265-272, 2010. | ||
+ | - Y. Peng, H. C. M. Leung, S. M. Yiu, F. Y. L. Chin. [[https://doi.org/10.1007/978-3-642-12683-3_28|IDBA – A Practical Iterative de Bruijn Graph De Novo Assembler]]. Proceedings of RECOMB 2010, LNBI 6044, 426-440, 2010. | ||
+ | - J. R. Miller, S. Koren, G. Sutton. [[https://doi.org/10.1016/j.ygeno.2010.03.001|Assembly algorithms for next-generation sequencing data]]. //Genomics// in press, 2010. | ||
+ | - Phillip E. C. Compeau, Pavel A. Pevzner, Glenn Tesler. [[https://doi.org/10.1038/nbt.2023|How to apply de Bruijn graphs to genome assembly]]. //Nature biotechnology//, **29**, 987-991, 2011. | ||
+ | |||
+ | ==== Genome assembly IIb: Hybrid/long read assembly ==== | ||
+ | - A. C. English, S. Richards, Y. Han, M. Wang, V. Vee, J. Qu, X. Qin, D. M. Muzny, J. G. Reid, K. C. Worley, R. A. Gibbs. [[https://doi.org/10.1371/journal.pone.0047768|Mind the Gap: Upgrading Genomes with Pacific Biosciences RS Long-Read Sequencing Technology]]. //PLoS ONE// **7**(11): e47768, 2012. | ||
+ | - S. Koren, M. C. Schatz, B. P. Walenz, J. Martin, J. T. Howard, G. Ganapathy, Z. Wang, D. A. Rasko, W. R. McCombie, E. D. Jarvis, A. M. Phillippy. [[https://doi.org/10.1038/nbt.2280|Hybrid error correction and de novo assembly of single-molecule sequencing reads]]. //Nature Biotechnolopgy// **30**:693-700, 2012. | ||
+ | - C.-S. Chin, D. H. Alexander, P. Marks, A. A. Klammer, J. Drake, C. Heiner, A. Clum, A. Copeland, J. Huddleston, E. E. Eichler, S. W. Turner, J. Korlach. [[https://doi.org/10.1038/nmeth.2474|Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data]]. //Nature Methods// **10**:563-569, 2013. | ||
+ | - G. Myers. [[https://doi.org/10.1007/978-3-662-44753-6_5|Efficient Local Alignment Discovery amongst Noisy Long Reads]]. //Proceedings of WABI 2014//, LNBI 8701, 52-67, 2014. | ||
+ | - F. J. Sedlazeck, P. Rescheneder, M. Smolka, H. Fang, M. Nattestad, A. von Haeseler, M. C. Schatz. [[https://doi.org/10.1038/s41592-018-0001-7|Accurate detection of complex structural variations using single molecule sequencing]]. //Nat. Methods// **15**(6): 461–468, 2018. | ||
+ | - E. Haghshenas, H. Asghari, J. Stoye, C. Chauve, F. Hach. [[https://doi.org/10.1016/j.isci.2020.101389|HASLR: Fast Hybrid Assembly of Long Reads]]. //iScience// **23**(8): 101389, 2020. | ||
+ | |||
+ | String graph assembly for diploid genomes with long reads is explained on the following [[https://s3.amazonaws.com/files.pacb.com/pdf/String+Graph+Assembly+For+Diploid+Genomes+with+Long+Reads.pdf|poster by PacBio]]. | ||
+ | |||
+ | Pre-processing (correction of long reads): | ||
+ | |||
+ | - L. Salmela and E. Rivals. [[https://doi.org/10.1093/bioinformatics/btu538|LoRDEC: accurate and efficient long read error correction]]. //Bioinformatics// **30**(24):3506–3514, 2014. | ||
+ | - L. Salmela, R. Walve, E. Rivals and E. Ukkonen. [[https://doi.org/10.1093/bioinformatics/btw321|Accurate self-correction of errors in long reads using de Bruijn graphs]]. //Bioinformatics//, **33**(6):799–806, 2017. | ||
+ | |||
+ | ==== Genome annotation I: Basic techniques ==== | ||
+ | A general introduction to HMMs in Bioinformatics is in the textbook by Durbin//et al.// [1]. Covariance models were introduced in [2]. A similar concept was developed in [3]. | ||
+ | |||
+ | - R. Durbin, S. Eddy, A. Krogh, G. Mitchison. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, Cambridge, UK, 1998. | ||
+ | - S. R. Eddy, R. Durbin. [[https://doi.org/10.1093/nar/22.11.2079|RNA sequence analysis using covariance models]]. //Nucleic Acids Res.// **22**(11):2079-2088, 1994. | ||
+ | - Y. Sakakibara, M. Brown, R. Hughey, I. S. Mian, K. Sjølander, R. C. Underwood, D. Haussler. [[https://doi.org/10.1093/nar/22.23.5112|Stochastic context-free grammars for tRNA modeling]]. //Nucleic Acids Res.// **22**(23):5112-5120, 1994. | ||
+ | |||
+ | ==== Genome annotation II: Finding RNA genes ==== | ||
+ | Here are several references, in chronological order: | ||
+ | |||
+ | - R. Staden. [[http://www.pubmedcentral.nih.gov/picrender.fcgi?artid=327313&blobtype=pdf|A computer program to search for tRNA genes]]. //Nucleic Acids Res.// **8**:817-825, 1980. | ||
+ | - G. A. Fichant and C. Burks. [[https://doi.org/10.1016/0022-2836(91)90108-I|Identifying potential tRNA genes in genomic DNA sequences]]. //J. Mol. Biol.// **220**:659-671, 1991. | ||
+ | - A. Pavesi, F. Conterio, A. Bolchi, G. Dieci, and S. Ottonello. [[https://doi.org/10.1093/nar/22.7.1247|Identification of new eukaryotic tRNA genes in genomic DNA databases by a multistep weight matrix analysis of transcriptional control regions]]. //Nucleic Acids Res.// **22**:1247-1256, 1994. | ||
+ | - S. R. Eddy and R. Durbin. [[https://doi.org/10.1093/nar/22.11.2079|RNA sequence analysis using covariance models]]. //Nucleic Acids Res.// **22**(11):2079-2088, 1994. | ||
+ | - T. M. Lowe and S. R. Eddy. [[https://doi.org/10.1093/nar/25.5.0955|tRNAscan-SE: A program for improved detection of transfer RNA genes in genomic sequence]]. //Nucleic Acids Res.// **25**:955-964, 1997. | ||
+ | |||
+ | ==== Genome annotation III: Finding protein-coding genes ==== | ||
+ | Here are several references, in chronological order. First, for **prokaryotes** ... | ||
+ | |||
+ | - M. Borodovsky and J. McIninch. [[https://doi.org/10.1016/0097-8485(93)85004-V|Genmark: Parallel gene recognition for both DNA strands]]. //Computers Chem.// **17**(2):123-133, 1993. | ||
+ | - A. V. Lukashin and M. Borodovsky. [[https://doi.org/10.1093/nar/26.4.1107|GeneMark.hmm: New solutions for gene finding]]. //Nucleic Acids Res.// **26**(4):1107-1115, 1998. | ||
+ | - S. L. Salzberg, A. L. Delcher, S. Kasif, and O. White. [[https://doi.org/10.1093/nar/26.2.544|Microbial gene identification using interpolated Markov models]]. //Nucleic Acids Res.// **26**(2):544-548, 1998. | ||
+ | - A. L. Delcher, D. Harmon, S. Kasif, O. White, and S. L. Salzberg. [[https://doi.org/10.1093/nar/27.23.4636|Improved microbial gene identification with GLIMMER]]. //Nucleic Acids Res.// **27**(23):4636-3641, 1999. | ||
+ | - D. Frishman, A. Mironov, H.-W. Mewes, and M. Gelfand. [[https://doi.org/10.1093/nar/26.12.2941|Combining diverse evidence for gene recognition in completely sequenced bacterial genomes]]. //Nucleic Acids Res.// **26**(12):2941-2947, 1998. | ||
+ | - S. S. Hannenhalli, W. S. Hayes, A. G. Hatzigeorgiou, and J. W. Fickett. [[https://doi.org/10.1093/nar/27.17.3577|Bacterial start site prediction]]. //Nucleic Acids Res.// **27**(17):3577-3582, 1999. | ||
+ | - J. H. Badger and G. J. Olsen. [[http://mbe.oxfordjournals.org/content/16/4/512|CRITICA: Coding region identification tool invoking comparative analysis]]. //Mol. Biol. Evol.// **16**(4):512-524, 1999. | ||
+ | - T. S. Larsen and A. Krogh. [[https://doi.org/10.1186/1471-2105-4-21|EasyGene a prokaryotic gene finder that ranks ORFs by statistical significance]]. //BMC Bioinformatics// **4**(21), 2003. | ||
+ | - L. Krause, A. McHardy, T. Nattkemper, A. Pühler, J. Stoye, F. Meyer. [[https://doi.org/10.1093/nar/gkl1083|GISMO - gene identification using a support vector machine for ORF classification]]. //Nucleic Acids Res.// **35**(2):540-549, 2006. | ||
+ | |||
+ | ... and now for **eukaryotes**: | ||
+ | |||
+ | - G. Stormo. [[https://doi.org/10.1016/0076-6879(90)83015-2|Consensus patterns in DNA]]. In R. F. Doolittle, editor, Molecular Evolution: Computer Analysis of Protein and Nucleic Acid Sequences, //Meth. Enzymol.// **volume 183** chapter 13, pages 211-221. Academic Press, San Diego, CA, 1990. | ||
+ | - R. Guigó, S. Knudsen, N. Drake, and T. Smith. [[https://doi.org/10.1016/0022-2836(92)90130-C|Prediction of gene structure]]. //J. Mol. Biol.// **226**(1):141-157, 1992. | ||
+ | - E. C. Uberbacher and R. J. Mural. [[http://www.pnas.org/content/88/24/11261|Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach]]. //Proc. Natl. Acad. Sci. USA// **88**(24):11261-11265, 1991. | ||
+ | - E. C. Uberbacher, Y. Xu, and R. J. Mural. [[https://doi.org/10.1016/S0076-6879(96)66018-2|Discovering and understanding genes in human DNA sequence using GRAIL]]. In R. F. Doolittle, editor, Computer Methods for Macromolecular Sequence Analysis, //Meth. Enzymol.// **volume 266** chapter 28, pages 259-281. Academic Press, San Diego, CA, 1996. | ||
+ | - S. L. Salzberg. [[https://doi.org/10.1093/bioinformatics/13.4.365|A method for identifying splice sites and translation start sites in eukaryotic mRNA]]. //CABIOS// **13**(4):365-376, 1997. | ||
+ | - M. G. Reese, F. H. Eeckman, D. Kulp, and D. Haussler. [[https://doi.org/10.1089/cmb.1997.4.311|Improved splice site detection in Genie]]. //J. Comp. Biol.// **4**(3):311-323, 1997. | ||
+ | - C. Burge and S. Karlin. [[https://doi.org/10.1006/jmbi.1997.0951|Prediction of complete gene structures in human genomic DNA]]. //J. Mol. Biol.// **268**(1):78-94, 1997. | ||
+ | - G. Parra, E. Blanco, and R. Guigó. [[https://doi.org/10.1101/gr.10.4.511|GeneID in Drosophila]]. //Genome Res.// **10**(4):511-515, 2000. | ||
+ | - M. G. Reese, D. Kulp, H. Tammana, and D. Haussler. [[https://doi.org/10.1101/gr.10.4.529|Genie - gene finding in Drosophila melanogaster]]. //Genome Res.// **10**:529-538, 2000. | ||
+ | - M. Pertea and S. L. Salzberg. [[https://doi.org/10.1023/A:1013770123580|Computational gene finding in plants]]. //Plant Mol. Biol.// **48**(1-2):39-48, 2002. | ||
+ | |||
+ | ==== Genome annotation IV: Patterns in non-coding regions ==== | ||
+ | This part is almost completely based on the work of Mathieu Blanchette and co-authors: | ||
+ | |||
+ | - M. Blanchette, B. Schwikowski, and M. Tompa. [[https://doi.org/10.1089/10665270252935421|Algorithms for phylogenetic footprinting]]. //J. Comp. Biol.// **9**(2):211-223, 2002. | ||
+ | - M. Blanchette and M. Tompa. [[https://doi.org/10.1101/gr.6902|Discovery of regulatory elements by a computational method for phylogenetic footprinting]]. //Genome Res.// **12**:739-748, 2002. | ||
+ | |||
+ | ==== Genome annotation V: Functional genome annotation ==== | ||
+ | All bioinformatics textbooks contain good overviews of sequence analysis by Smith-Waterman and fast heuristic methods for database search like FASTA and BLAST. Here are some references to the original papers, including one by ourselves: | ||
+ | |||
+ | - T. F. Smith and M. S. Waterman. [[https://doi.org/10.1016/0022-2836(81)90087-5|Identification of common molecular subsequences]]. //J. Mol. Biol.// **147**(1):195-197, 1981. | ||
+ | - W. R. Pearson. [[https://doi.org/10.1016/0076-6879(90)83007-V|Rapid and sensitive sequence comparison with FASTP and FASTA]]. In R. F. Doolittle, editor, Molecular Evolution: Computer Analysis of Protein and Nucleic Acid Sequences, //Meth. Enzymol.// **volume 183** chapter 5, pages 63-98. Academic Press, San Diego, CA, 1990. | ||
+ | - S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. [[https://doi.org/10.1016/S0022-2836(05)80360-2|Basic local alignment search tool]]. //J. Mol. Biol.// **215**(3):403-410, 1990. | ||
+ | - S. F. Altschul, T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. [[https://doi.org/10.1093/nar/25.17.3389|Gapped BLAST and PSI-BLAST: A new generation of protein database search programs]]. //Nucleic Acids Res.// **25**(17):3389-3402, 1997. | ||
+ | - K. Rasmussen, J. Stoye, and E. W. Myers. [[https://doi.org/10.1089/cmb.2006.13.296|Efficient q-gram filters for finding all epsilon-matches over a given length]]. //J. Comp. Biol.// **13**(2):296-308, 2006. | ||
+ | |||
+ | ==== Transcriptomics I: mRNA analysis, RNA-Seq ==== | ||
+ | A few papers on EST clustering and splicing graphs: | ||
+ | |||
+ | - S. Heber, M. Alekseyev, S.-H. Sze, H. Tang, P. A. Pevzner. [[https://doi.org/10.1093/bioinformatics/18.suppl_1.S181 | ||
+ | |Splicing graphs and EST assembly problem]]. //Bioinformatics// **18**(Suppl. 1):S181-S188, 2002. | ||
+ | - M. Sammeth, S. Foissac, R. Guigó. [[https://doi.org/10.1371/journal.pcbi.1000147|A general definition and nomenclature for alternative splicing events]]. //PLoS Comput. Biol.// **4**(8):e1000147, 2008. | ||
+ | - V. Lacroix, M. Sammeth, R. Guigó, A. Bergeron. [[https://doi.org/10.1007/978-3-540-87361-7_5|Exact Transcriptome Reconstruction from Short Sequence reads]]. Proc. WABi 2008, LNBI 5251, 50-63, 2008. | ||
+ | |||
+ | Very good review about NGS transcriptomics (RNA-seq): | ||
+ | |||
+ | - J. A. Martin, Z. Wang. [[https://doi.org/10.1038/nrg3068|Next-generation transcriptome assembly]]. Nat. Rev. Genet., **12**(10), 671–682, 2011. | ||
+ | |||
+ | Examples for software supporting RNA-seq: | ||
+ | |||
+ | - C. Trapnell, L. Pachter, S. Salzberg. [[https://doi.org/10.1093/bioinformatics/btp120|TopHat: discovering splice junctions with RNA-Seq]]. //Bioinformatics// **25**(9): 1105-1111, 2009. | ||
+ | - K. Wang, D. Singh, Z. Zeng, S. J. Coleman, Y. Huang, G. L. Savich, X. He, P. Mieczkowski, S. A. Grimm, C. M. Perou, J. N. MacLeod, D. Y. Chiang, J. F. Prins, J. Liu. [[https://doi.org/10.1093/nar/gkq622|MapSplice: Accurate mapping of RNA-seq reads for splice junction discovery]]. //Nucleic Acids Res.// **38**(18): e178, 2010. | ||
+ | |||
+ | ==== Tanscriptomics II: DNA microarray design ==== | ||
+ | Just a few in-house papers, because they are so nice: | ||
+ | |||
+ | - S. Rahmann. [[https://doi.org/10.1142/S0219720003000125|Fast large scale oligonucleotide selection using the longest common factor approach]]. //J. Bioinf. Comput. Biol.// **1**(2):343-361, 2003. | ||
+ | - S. Rahmann. [[https://doi.org/10.1093/bioinformatics/btg1073|The shortest common supersequence problem in a microarray production setting]]. //Bioinformatics// **19**(Suppl. 2):156-161, 2003. (Proceedings of ECCB 2003). | ||
+ | - S.A. de Carvalho Jr. and S. Rahmann. [[https://doi.org/10.1007/11851561_30|Improving the layout of oligonucleotide microarrays: Pivot Partitioning]]. In Proc. 6th Workshop of Algorithms in Bioinformatics //LNBI// volume **4175** pages 321-332. Springer, 2006. | ||
+ | - S.A. de Carvalho Jr. and S. Rahmann. [[http://subs.emis.de/LNI/Proceedings/Proceedings83/article7005.html|Microarray layout as a quadratic assignment problem]]. In Proc. German Conference on Bioinformatics, volume P-83 of Lecture Notes in Informatics, pages 11-20. GI, 2006. | ||
+ | |||
+ | ==== Tanscriptomics III: DNA microarray analysis ==== | ||
+ | A couple of papers and one book: | ||
+ | |||
+ | - W. Huber, A. von Heydebreck, H. Sueltmann, A. Poustka, and M. Vingron. [[https://doi.org/10.1093/bioinformatics/18.suppl_1.S96|Variance stabilization applied to microarray data calibration and to the quantification of differential expression]]. //Bioinformatics// **18**(Suppl. 1):96-104, 2002. | ||
+ | - T. P. Speed, editor. [[https://www.crcpress.com/Statistical-Analysis-of-Gene-Expression-Microarray-Data/Speed/p/book/9781584883272|Statistical Analysis of Gene Expression Microarray Data]]. Chapman and Hall/CRC, 2003. | ||
+ | - G. K. Smyth. [[https://doi.org/10.2202/1544-6115.1027|Linear models and empirical bayes methods for assessing differential expression in microarray experiments]]. //Stat. Appl. Genet. Mol. Biol.// **3**(1):1-25, 2004. | ||
+ | - C. Steinhoff and M. Vingron. [[https://doi.org/10.1093/bib/bbl002|Normalization and quantification of differential expression in gene expression microarrays]]. //Brief. Bioinform.// **7**(2):166-177, 2006. | ||
+ | |||
+ | ==== RNA secondary structure, structure space, RNA-based regulation ==== | ||
+ | The most classical algorithm for RNA secondary structure prediction is Nussinov's algorithm: | ||
+ | |||
+ | - R. Nussinov, G. Pieczenik, J. R. Griggs, D. J. Kleitman. [[https://doi.org/10.1137/0135006|Algorithms for Loop Matchings]]. //SIAM J. Appl. Math.// **35**(1):68–82, 1978. | ||
+ | - R. Giegerich, B. Voß, M. Rehmsmeier. [[https://doi.org/10.1093/nar/gkh779|Abstract shapes of RNA]]. //Nucleic Acids Res.// **32**(16):4843-4851, 2004. | ||
+ | |||
+ | ==== Proteomics I: 2D gels ==== | ||
+ | A survey of various techniques for two-dimensional gel alignment is [1]. The method described in the lecture is from [2], [3]. | ||
+ | |||
+ | - A. W. Dowsey, M. J. Dunn, and G.-Z. Yang. [[https://doi.org/10.1002/pmic.200300459|The role of bioinformatics in two-dimensional gel electrophoresis]]. //Proteomics// **3**(8):1567-1596, 2003. | ||
+ | - F. A. Potra, X. Liu, F. Seillier-Moiseiwitsch, A. Roy, Y. Hang, M. R. Marten, B. Raman, and C. Whisnant. [[https://doi.org/10.1089/cmb.2006.13.614|Protein image alignment via piecewise affine transformations]]. //J. Comp. Biol.// **13**(3):614-630, 2006. | ||
+ | - F. A. Potra, X. Liu. [[https://doi.org/10.1089/cmb.2006.13.1384|Aligning Families of Two-Dimensional Gels by a Combined Multiresolution Forward-Inverse Transformation Approach]]. //J. Comp. Biol.// **13**(7):1384-1395, 2006. | ||
+ | - Andrew W. Dowsey, Michael J. Dunn and Guang-Zhong Yang. [[https://doi.org/10.1093/bioinformatics/btn059|Automated image alignment for 2D gel electrophoresis in a high-throughput proteomics pipeline]]. //Bioinformatics// **24**(7):950-957, 2008. | ||
+ | |||
+ | ==== Proteomics II: Analysis of mass spectra ==== | ||
+ | The algorithms discussed include: de-novo protein sequencing by mass spectra [1,2], the money changing problem [3], alignment of time-series of mass spectra [4]. | ||
+ | |||
+ | - T Chen, Ming-Yang Kao, M Tepel, J Rush, and G M Church. [[https://doi.org/10.1089/10665270152530872|A dynamic programming approach to de novo peptide sequencing via tandem mass spectrometry]]. //J. Comput. Biol.// **8**(3):325-337, 2001. | ||
+ | - Lijuan Mo, Debojyoti Dutta, Yunhu Wan, and Ting Chen. [[https://doi.org/10.1021/ac070039n|MSNovo: A Dynamic Programming Algorithm for de Novo Peptide Sequencing via Tandem Mass Spectrometry]]. //Anal. Chem.// **79**(13): 4870–4878, 2007. | ||
+ | - Sebastian Böcker and Zsuzsanna Lipták. [[https://doi.org/10.1007/s00453-007-0162-8|A fast and simple algorithm for the money changing problem]]. //Algorithmica// **48**(4):413-432, 2007. | ||
+ | - Amol Prakash, Parag Mallick, Jeffrey Whiteaker, Heidi Zhang, Amanda Paulovich, Mark Flory, Hookeun Lee, Ruedi Aebersold, and Benno Schwikowski. [[https://doi.org/10.1074/mcp.M500133-MCP200|Signal maps for mass spectrometry-based comparative proteomics]]. //Mol Cell Proteomics// **5**(3):423-432, 2006. | ||
+ | |||
+ | ==== Computational Metabolomics ==== | ||
+ | Some software developed in Bielefeld has been published here: | ||
+ | |||
+ | - H. Neuweger, S. P. Albaum, M. Dondrup, M. Persicke, T. Watt, K. Niehaus, J. Stoye, A. Goesmann. [[https://doi.org/10.1093/bioinformatics/btn452|MeltDB: A software Platform for the Analysis and Integration of Metabolomics Experiment Data]]. //Bioinformatics// **24**(23):2726-2732, 2008. | ||
+ | - N. Hoffmann, J. Stoye. [[https://doi.org/10.5772/31224|Generic Software Frameworks for GC-MS Based Metabolomics]]. In: U. Roessner (ed.): Metabolomics. Chapter 4, pp. 73-98. InTech, 2012. | ||
+ | |||
+ | ==== Computational Systems Biology: Networks, their models and algorithms ==== | ||
+ | A textbook that covers this (and much more) is: | ||
+ | |||
+ | - E. Klipp, R. Herwig, A. Kowald, C. Wierling, H. Lehrach. [[https://doi.org/10.1002/3527603603|Systems Biology in Practice - Concepts, Implementation and Application]]. Wiley-VCH, 2005. | ||
+ | |||
+ | ==== Computational pangenomics ==== | ||
+ | The gene based method is considered here (for example): | ||
+ | |||
+ | - H. Tettelin et al. [[https://doi.org/10.1073/pnas.0506758102|Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: Implicationsfor the microbial ‘‘pan-genome’’]]. //Proc. Natl. Academy. Sci. USA// **102**(39): 13950-13955, 2005. | ||
+ | - J. Blom, S. P. Albaum, D. Doppmeier, A. Pühler, F.-J. Vorhölter, M. Zakrzewski, and A. Goesmann. [[https://doi.org/10.1186/1471-2105-10-154|EDGAR: A software framework for the comparative analysis of prokaryotic genomes]]. //BMC Bioinformatics// 10:154, 2009. | ||
+ | - J. Blom, J. Kreis, S. Spänig, T. Juhre, C. Bertelli, C. Ernst, and A. Goesmann. [[https://doi.org/10.1093/nar/gkw255|EDGAR 2.0: an enhanced software platform for comparative gene content analyses]]. //Nucleic Acids Res.// **44**(W1):W22–W28, 2016. | ||
+ | - J. Blom, S. P. Glaeser, T. Juhre, J. Kreis, P. H. G. Hanel, J. G. Schrader, P. Kämpfer, and A. Goesmann. [[https://doi.org/10.1002/9781118960608.bm00038|EDGAR: A Versatile Tool for Phylogenomics]]. In: W. B. Whitman (ed.). Bergey's Manual of Systematics of Archaea and Bacteria, Wiley, 2019. | ||
+ | |||
+ | A good overview of genome-based computational pangenomics gives the following review paper: | ||
+ | |||
+ | - The Computational Pan-Genomics Consortium. [[https://doi.org/10.1093/bib/bbw089|Computational pan-genomics: status, promises and challenges]]. //Brief. Bioinf.// **19**(1), 118–135, 2018. | ||
+ | |||
+ | Some more specialized papers are the following. | ||
+ | |||
+ | (A) Data structures | ||
+ | - B. Paten, D. Earl, N. Nguyen, M. Diekhans, D. Zerbino, D. Haussler. [[https://doi.org/10.1101/gr.123356.111|Cactus: Algorithms for genome multiple sequence alignment]]. //Genome Research// **21**, 1512–1528, 2011 | ||
+ | - C. Ernst, S. Rahmann. [[https://drops.dagstuhl.de/opus/volltexte/2013/4231/pdf/p035-ernst.pdf|PanCake: A Data Structure for Pangenomes]]. Proc. of //GCB 2013//, 35-45, 2013. | ||
+ | - G. Holley, R. Wittler, and J. Stoye. [[https://doi.org/10.1186/s13015-016-0066-8 |Bloom Filter Trie: an alignment-free and reference-free data structure for pan-genome storage]]. //Algorithms Mol. Biol.// **11**: 3, 2016. | ||
+ | - E. Garrison, J. Sirén, A. M. Novak, G. Hickey, J. M. Eizenga, E. T. Dawson, W. Jones, S. Garg, C. Markello, M. F Lin, B. Paten, and R. Durbin. [[https://doi.org/10.1038/nbt.4227|Variation graph toolkit improves read mapping by representing genetic variation in the reference]]. //Nat. Biotechnol.// **36**, 875–879, 2018. | ||
+ | - G. Holley and P. Melsted. [[https://doi.org/10.1186/s13059-020-02135-8|Bifrost: highly parallel construction and indexing of colored and compacted de Bruijn graphs]]. //Genome Biol.// **21**: 249, 2020. | ||
+ | |||
+ | (B) Sequence-to-graph mapping/alignment | ||
+ | - M. Rautiainen, V. Mäkinen, and T. Marschall. [[https://doi.org/10.1093/bioinformatics/btz162|Bit-parallel sequence-to-graph alignment]]. //Bioinformatics// **35**(19), 3599-3607, 2019. | ||
+ | - R. Martiniano, E. Garrison, E. R. Jones, A. Manica, and R. Durbin. [[ https://doi.org/10.1186/s13059-020-02160-7|Removing reference bias and improving indel calling in ancient DNA data analysis by mapping to a sequence variation graph]]. //Genome Biol.// **21**: 250, 2020. | ||
+ | - M. Rautiainen and T. Marschall. [[https://doi.org/10.1186/s13059-020-02157-2|GraphAligner: rapid and versatile sequence-to-graph alignment]]. //Genome Biol.// **21**: 253, 2020. | ||
+ | - A. Kuhnle, T. Mun, C. Boucher, T. Gagie, B. Langmead, and G. Manzini. [[https://doi.org/10.1089/cmb.2019.0309|Efficient Construction of a Complete Index for Pan-Genomics Read Alignment]]. //J. Comp. Biol.// **27**(4), 500-513, 2020. | ||
+ | - N. Luhmann, G. Holley, and M. Achtman. [[https://doi.org/10.1101/2020.01.21.914168|BlastFrost: Fast querying of 100,000s of bacterial genomes in Bifrost graphs]]. //BioRxiv//, 2020. | ||
+ | - T. Schulz, R. Wittler, S. Rahmann, F. Hach, and J. Stoye. [[https://doi.org/10.1093/bioinformatics/btab077|Detecting High Scoring Local Alignments in Pangenome Graphs]]. //Bioinformatics// **37**(16), 2266–2274, 2021. | ||
+ | |||
+ | (C) Phylogenomics: | ||
+ | |||
+ | - R. Wittler. [[https://doi.org/10.1186/s13015-020-00164-3|Alignment- and reference-free phylogenomics with colored de Bruijn graphs]]. //Algorithms Mol. Biol.// **15**: 4, 2020. | ||
+ | - A. Rempel, R. Wittler. [[https://doi.org/10.1093/bioinformatics/btab444|SANS serif: alignment-free, whole-genome-based phylogenetic reconstruction]]. //Bioinformatics// **37**(24), 4868-4870, 2021. | ||
+ | |||
+ | (D) Haplotype inference: | ||
+ | |||
+ | See [[#haplotype_inference|below]]. | ||
+ | |||
+ | ==== Comparative genomics I: Genome alignment, repeat analysis ==== | ||
+ | Some relevant papers in this area are: | ||
+ | |||
+ | - A. L. Delcher, S. Kasif, R. D. Fleischmann, J. Peterson, O. White, and S. L. Salzberg. [[https://doi.org/10.1093/nar/27.11.2369|Alignment of whole genomes]]. //Nucleic Acids Res.// **27**(11):2369-2376, 1999. | ||
+ | - S. Schwartz, Z. Zhang, K. A. Fraser, A. Smit, C. Riemer, J. Bouck, R. Gibson, R. Hardisson, and W. Miller. [[https://doi.org/10.1101/gr.10.4.577|PipMaker-A web server for aligning two genomic DNA sequences]]. //Genome Res.// **10**(4):577-586, 2000. | ||
+ | - M. Höhl, S. Kurtz, and E. Ohlebusch. [[https://doi.org/10.1093/bioinformatics/18.suppl_1.S312|Efficient multiple genome alignment]]. //Bioinformatics// **18**(Suppl. 1):S312-S320, 2002. (Proceedings of ISMB 2002). | ||
+ | - S. Kurtz, A. Phillippy, A. L. Delcher, M. Smoot, Shumway M., C. Antonescu, and S. L. Salzberg. [[https://doi.org/10.1186/gb-2004-5-2-r12|Versatile and open software for comparing large genomes]]. //Genome Biol.// **5**:R 12, 2004. | ||
+ | - A. C. E. Darling, B. Mau, F. R. Blattner, and N. T. Perna. [[https://doi.org/10.1101/gr.2289704|Mauve: Multiple alignment of conserved genomic sequence with rearrangements]]. //Genome Res.// **14**(7):1394-1403, 2004. | ||
+ | - E. Ohlebusch, M. I. Abouelhoda. [[https://doi.org/10.1201/9781420036275.ch15|Chaining Algorithms and Applications in Comparative Genomics]]. In: Handbook of Computational Molecular Biology (Chapter 15), edited by S. Aluru, Chapman & Hall/CRC Computer and Information Science Series, 2006. | ||
+ | - S. Kurtz, J. V. Choudhuri, E. Ohlebusch, C. Schleiermacher, J. Stoye, and R. Giegerich. [[https://doi.org/10.1093/nar/29.22.4633|REPuter: the manifold applications of repeat analysis on a genomic scale]]. //Nucleic Acids Res.// **29**(22):4643-4653, 2001. | ||
+ | |||
+ | ==== Comparative genomics II: Genome rearrangement ==== | ||
+ | The classical papers are by Hannenhalli and Pevzner (1999 and 1995). A much better readable description for the reversal distance can be found in the book chapter by Bergeron //et al.// (2005). The general DCJ model is described in (Yancopoulos //et al.//, 2005) and (much nicer) in (Bergeron //et al.//, 2006a). A correct algorithm for sorting by translocations is given in (Bergeron //et al.//, 2006b), an almost correct one for the HP distance in (Bergeron //et al.//, 2009), one formula of which is corrected in (Erdős et al., 2011). A very good overview of median, halving and guided halving results can be found in (Tannier et al., 2009). | ||
+ | |||
+ | - S. Hannenhalli and P. A. Pevzner. [[https://doi.org/10.1145/300515.300516|Transforming cabbage into turnip: Polynomial algorithm for sorting signed permutations by reversals]]. //J. ACM// **46**(1):127, 1999. | ||
+ | - S. Hannenhalli and P. A. Pevzner. [[https://doi.org/10.1109/SFCS.1995.492588|Transforming men into mice (polynomial algorithm for genomic distance problem)]]. In //Proc. 36th Annu. Symp. Found. Comput. Sci.// (FOCS 1995), pages 581-592. IEEE Press, 1995. | ||
+ | - A. Bergeron. [[https://doi.org/10.1007/3-540-48194-X_9|A very elementary presentation of the Hannenhalli-Pevzner theory]]. In: Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching (CPM 2001), volume 2089 of LNCS, pages 106-117, 2001. | ||
+ | - A. Bergeron, J. Mixtacki, and J. Stoye. The inversion distance problem. In O. Gascuel, editor, [[https://global.oup.com/academic/product/mathematics-of-evolution-and-phylogeny-9780198566106?q=9780198566106&lang=en&cc=de|Mathematics of Evolution and Phylogeny]], chapter 10, pages 262-290. Oxford University Press, Oxford, UK, 2005. | ||
+ | - S. Yancopoulos, O. Attie, and R. Friedberg. [[https://doi.org/10.1093/bioinformatics/bti535|Efficient sorting of genomic permutations by translocation, inversion and block interchange]]. //Bioinformatics// **21**(16):3340-3346, 2005. | ||
+ | - A. Bergeron, J. Mixtacki, and J. Stoye. [[https://doi.org/10.1007/11851561_16|A unifying view of genome rearrangements]]. In: Proceedings of the 6th International Workshop on Algorithms in Bioinformatics (WABI 2006), volume 4175 of LNBI, pages 163-173, 2006. | ||
+ | - A. Bergeron, J. Mixtacki, and J. Stoye. [[https://doi.org/10.1089/cmb.2006.13.567|On sorting by translocations]]. //J. Comp. Biol.// **13**(2):567-578, 2006. | ||
+ | - A. Bergeron, J. Mixtacki, and J. Stoye. [[https://doi.org/10.1016/j.tcs.2009.09.008|A new linear time algorithm to compute the genomic distance via the double cut and join distance]]. //Theor. Comput. Sci.// **410**(51):5300-5316, 2009. | ||
+ | - P. L. Erdős, L. Soukup, J. Stoye. [[https://doi.org/10.1016/j.aml.2010.08.021|Balanced vertices in trees and a simpler algorithm to compute the genomic distance]]. //Appl. Math. Lett.// **24**:82-86, 2011. | ||
+ | - E. Tannier, C. Zheng, D. Sankoff. [[https://doi.org/10.1186/1471-2105-10-120|Multichromosomal median and halving problems under different genomic distances]]. //BMC Bioinformatics// **10**:120, 2009. | ||
+ | |||
+ | ==== Comparative genomics III: Gene clusters ==== | ||
+ | The following are the algorithmic papers in this area. Apart from that, many papers on applications of gene clusters and statistical properties exist, but are not listed here. | ||
+ | |||
+ | (a.) Common intervals of permutations: | ||
+ | - T. Uno and M. Yagiura. [[https://doi.org/10.1007/s004539910014|Fast algorithms to enumerate all common intervals of two permutations]]. //Algorithmica// **26**(2):290-309, 2000. | ||
+ | - S. Heber and J. Stoye. [[https://doi.org/10.1007/3-540-48194-X_19|Finding all common intervals of k permutations]]. In A. Amir and G. Landau, editors, Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching, //CPM 2001//, volume 2089 of LNCS, pages 207-218, Berlin, 2001. Springer Verlag. | ||
+ | - S. Heber and J. Stoye. [[https://doi.org/10.1007/3-540-44696-6_20|Algorithms for finding gene clusters]]. In O. Gascuel and B. Moret, editors,Proceedings of the First International Workshop on Algorithms in Bioinformatics, //WABI 2001//, volume 2149 of LNCS, pages 252-263, Berlin, 2001. Springer Verlag. | ||
+ | - A. Bergeron, S. Corteel, and M. Raffinot. [[https://doi.org/10.1007/3-540-45784-4_36|The algorithmic of gene teams]]. In R. Guigó and D. Gusfield,editors, Proceedings of the Second International Workshop on Algorithms in Bioinformatics, //WABI 2002//, volume 2452 of LNCS, pages 464-476, Berlin, 2002. Springer Verlag. | ||
+ | - N. Luc, J.-L. Risler, A. Bergeron, and M. Raffinot. [[https://doi.org/10.1016/S1476-9271(02)00097-X|Gene teams: a new formalization of gene clusters for comparative genomics]]. //Comp. Biol. Chem.// **27**:59-67, 2003. | ||
+ | - G. M. Landau, L. Parida, and O. Weimann. [[https://doi.org/10.1089/cmb.2005.12.1289|Gene proximity analysis across whole genomes via PQ tree]]. //J. Comp. Biol.// **12**(10):1289–1306, 2005. | ||
+ | - A. Bergeron, C. Chauve, F. de Montgolfier, and M. Raffinot. [[https://doi.org/10.1137/060651331|Computing common intervals of K permutations, with applications to modular decomposition of graphs]]. //SIAM J. Discrete Mathematics// **22**(3):1022–1039, 2008. | ||
+ | - S. Heber, R. Mayr, J. Stoye. [[https://doi.org/10.1007/s00453-009-9332-1|Common Intervals of Multiple Permutations]]. //Algorithmica// **60**(2):175-206, 2011. | ||
+ | |||
+ | (b.) Common intervals of sequences: | ||
+ | - T. Schmidt and J. Stoye. [[https://doi.org/10.1007/978-3-540-27801-6_26|Quadratic time algorithms for finding common intervals in two and more sequences]]. In S. C. Sahinalp, S. Muthukrishnan, and U. Dogrusoz, editors, Proceedings of the 15th Annual Symposium on Combinatorial Pattern Matching, //CPM 2004//, volume 3109 of LNCS, pages 347-358, Berlin, 2004. Springer Verlag. | ||
+ | - G. Didier, T. Schmidt, J. Stoye, D. Tsur. [[https://doi.org/10.1016/j.jda.2006.03.021|Character Sets of Strings]]. //J. Discr. Alg.// **5**(2):330-340, 2007. | ||
+ | - X. He and M. H. Goldwasser. [[https://doi.org/10.1089/cmb.2005.12.638|Identifying conserved gene clusters in the presence of homology families]]. //J. Comp. Biol.// **12**(6):638-656, 2005. | ||
+ | |||
+ | (c.) Approximate common intervals of sequences: | ||
+ | - S. Böcker, K. Jahn, J. Mixtacki, J. Stoye. [[https://doi.org/10.1089/cmb.2009.0098|Computation of Median Gene Clusters]]. //J. Comp. Biol.// **16**(8):1085-1099, 2009. | ||
+ | - F. Hufsky, L. Kuchenbecker, K. Jahn, J. Stoye, S. Böcker. [[https://doi.org/10.1186/1471-2105-12-106|Swiftly Computing Center Strings]]. //BMC Bioinformatics// **12**:106, 2011. | ||
+ | |||
+ | (d.) Common intervals of indeterminate strings: | ||
+ | - D. Doerr, J. Stoye, S. Böcker, K. Jahn. [[https://doi.org/10.1186/1471-2164-15-S6-S2|Identifying Gene Clusters by Discovering Common Intervals in Indeterminate Strings]]. //BMC Genomics// **15**(Suppl. 6): S2, 2014. | ||
+ | |||
+ | ==== Comparative Genomics IV: Multiple genome rearrangement ==== | ||
+ | |||
+ | - D. Sankoff, G. Sunaram, J. Kececioglu. [[https://doi.org/10.1142/S0129054196000026|Steiner points in the space of genome rearrangements]]. //International Journal of Foundations of Computer Science// **7**(1):1-9, 1996. | ||
+ | - G. Bourque, P. A. Pevzner. [[http://genome.cshlp.org/content/12/1/26|Genome-Scale Evolution: Reconstructing Gene Orders in the Ancestral Species]]. //Genome Res.// **12**:26-36, 2002. | ||
+ | - J. Mixtacki. [[https://doi.org/10.1007/978-3-540-69733-6_28|Genome halving under DCJ revisited]]. In: Proceedings of the 14th Annual International Conference on Computing and Combinatorics (COCOON 2008), volume 5092 of LNCS, pages 276-286, 2008. | ||
+ | - R. Warren, D. Sankoff. [[https://doi.org/10.1142/S0219720009004102|Genome Halving with Double Cut and Join]]. //Journal of Bioinformatics and Computational Biology// **7**(2): 357-371, 2009. | ||
+ | - P Feijão, J. Meidanis. [[https://doi.org/10.1109/TCBB.2011.34|SCJ: A Breakpoint-Like Distance that Simplifies Several Rearrangement Problems]]. //IEEE/ACM Transactions on Computational Biology and Bioinformatics// **6**(3): 1318-1329, 2011. | ||
+ | |||
+ | ==== Comparative Genomics V: Reconstruction of ancestral gene clusters ==== | ||
+ | |||
+ | - R. Wittler. [[https://pub.uni-bielefeld.de/publication/2301939|Phylogeny-based analysis of gene clusters]]. Ph.D. Thesis, Faculty of Technology, Bielefeld University, 2010. | ||
+ | - J. Stoye, R. Wittler. [[https://doi.org/10.1109/TCBB.2008.135|A Unified Approach for Reconstructing Ancient Gene Clusters]]. //IEEE/ACM Transactions on Computational Biology and Bioinformatics// **6**(3):387-400, 2009. | ||
+ | - R. Wittler, J. Maňuch, M. Patterson, J. Stoye. [[https://doi.org/10.1089/cmb.2011.0083|Consistency of Sequence-Based Gene Clusters]]. //J. Comp. Biol.// **18**(9):1023-1039, 2011. | ||
+ | |||
+ | ==== Haplotype inference ==== | ||
+ | A great overview of the combinatorial problems and algorithms in the following book chapter: | ||
+ | |||
+ | - D. Gusfield, S. Hecht Orzack. [[https://doi.org/10.1201/9781420036275|Haplotype Inference]]. In: Handbook of Computational Molecular Biology (Chapter 18), edited by S. Aluru, Chapman & Hall/CRC Computer and Information Science Series, 2006. | ||
+ | |||
+ | A more recent works on the topic, focussing on molecular haplotyping: | ||
+ | |||
+ | - G. W. Klau, T. Marschall. [[https://doi.org/10.1007/978-3-319-58741-7_6|A guided tour to computational haplotyping]]. In: Proc. of CiE 2017, LNCS 10307, Springer Verlag, 2017. | ||
+ | - M. Patterson, T. Marschall, N. Pisanti, L. v. Iersel, L. Stougie, G. W. Klau, A. Schönhuth. [[https://doi.org/10.1089/cmb.2014.0157|WhatsHap: Weighted Haplotype Assembly for Future-Generation Sequencing Reads]]. //Journal of Computational Biology// **22**(6), 498-509, 2015. | ||
+ | |||
+ | The ILP discussed in class is from the following textbook, Section 20.2: | ||
+ | |||
+ | - Dan Gusfield. [[https://doi.org/10.1017/9781108377737|Integer Linear Programming in Computational and Systems Biology]]. Cambridge University Press, 2019. | ||
+ | |||
+ | ==== SNP-disease associations ==== | ||
+ | Here are a few of the more algorithmic papers on the topic, but there exist several more. You may look up the references in this one. | ||
+ | |||
+ | - H. T. T. Toivonen, P. Onkamo, K. Vasko, V. Ollikainen, P. Sevon, H. Mannila, and J. Kere. [[https://doi.org/10.1109/BIBE.2000.889596|Gene mapping by haplotype pattern mining]]. In //IEEE International Symposium on Bio-Informatics and Biomedical Engineering// (BIBE'00): 99, 2000. | ||
+ | - T. Mailund, S. Besenbacher, and M. Schierup. [[https://doi.org/10.1186/1471-2105-7-454|Whole genome association mapping by incompatibilities and local perfect phylogenies]]. //BMC Bioinformatics// **7**:454, 2006. | ||
+ | - J. Nielsen and T. Mailund. [[https://doi.org/10.1186/1471-2105-9-526|SNPFile - A software library and file format for large scale association mapping and population genetics studies]]. //BMC Bioinformatics// **9**:526, 2008. | ||
+ | - S. Besenbacher, C. N. S. Pedersen, and T. Mailund. [[https://doi.org/10.1186/1471-2105-10-S1-S74|A fast algorithm for genome-wide haplotype pattern mining]]. //BMC Bioinformatics// **10**(Suppl 1):S74, 2009. | ||
+ | - S. Besenbacher, T. Mailand, M. H. Schierup. [[https://doi.org/10.1007/978-1-61779-585-5_11|Association Mapping and Disease: Evolutionary Perspectives]]. //Evolutionary Genomics: Statistical and Computational Methods, Volume 2// Methods in Molecular Biology, vol. 856. Chapter 11, 2012. | ||
+ | |||
+ | ==== Computational metagenomics, astrobiology ==== | ||
+ | Papers on tools for the analysis of metagenomics data: | ||
+ | |||
+ | - D. H. Huson, A. F. Auch, J. Qi, S. C. Schuster. [[https://doi.org/10.1101/gr.5969107|MEGAN analysis of metagenomic data]]. //Genome Res.// **17**:377-386, 2007. | ||
+ | - Q. Wang, G. M. Garrity, J. M. Tiedje, J. R. Cole. [[https://doi.org/10.1128/AEM.00062-07|Naïve Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy]]. //Appl. Env. Microbiol.//, **73**(16):5261-5267, 2007. | ||
+ | - F. Meyer, D. Paarmann, M. D'Souza, R. Olson, E. M. Glass, M. Kubal, T. Paczian, A. Rodriguez, R. Stevens, A. Wilke, J. Wilkening, R. A. Edwards. [[https://doi.org/10.1186/1471-2105-9-386|The Metagenomics RAST server - A public resource for the automatic phylogenetic and functional analysis of metagenomes]]. //BMC Bioinformatics// **9**:386, 2008. | ||
+ | - L. Krause, N. N. Diaz, A. Goesmann, S. Kelley, T. W. Nattkemper, F. Rohwer, R. A. Edwards, J. Stoye. [[https://doi.org/10.1093/nar/gkn038|Phylogenetic classification of short environmental DNA fragments]]. //Nuleic Acids Res.// **36**(7):2230-2239, 2008. | ||
+ | - M. M. Haque, T. S. Ghosh, D. Komanduri, S. S. Mande. [[https://doi.org/10.1093/bioinformatics/btp317|SOrt-ITEMS: Sequence orthology based approach for improved taxonomic estimation of metagenomic sequences]]. //Bioinformatics// **25**(14):1722-1730, 2009. | ||
+ | - W. Gerlach, S. Jünemann, F. Tille, A. Goesmann, J. Stoye. [[https://doi.org/10.1186/1471-2105-10-430|WebCARMA: a web application for the functional and taxonomic classification of unassembled metagenomic reads]]. //BMC Bioinformatics// **10**:430, 2009. | ||
+ | - D. E. Wood, S. L. Salzberg. [[https://doi.org/10.1186/gb-2014-15-3-r46|Kraken: ultrafast metagenomic sequence classification using exact alignments]]. //Genome Biol.// **15**:R46, 2014. | ||
+ | - A. Oulas, C. Pavloudi, P. Polymenakou, G. A. Pavlopoulos, N. Papanikolaou, G. Kotoulas, C. Arvanitidis, I. Iliopoulos. [[https://doi.org/10.4137/BBI.S12462|Metagenomics: Tools and insights for analyzing next-generation sequencing data derived from biodiversity studies.]] //Bioinformatics and Biology Insights// **9**:75-88, 2015. | ||
+ | |||
+ | (//Astrobiology// was mainly meant as a joke, but have a look here: [[http://astrobiology.nasa.gov/nai/|NAI]]. Or here: [[https://www.wired.com/2010/12/nasa-finds-arsenic-life-form|NASA unveils arsenic life form]].) |