This shows you the differences between two versions of the page.
teaching:alggrliterature [2021/02/05 13:00] jstoye [Comparative genomics III: Synteny Hierarchies and Gene clusters] |
teaching:alggrliterature [2022/11/21 09:57] (current) jstoye [Genome assembly IIb: Hybrid/long read assembly] |
||
---|---|---|---|
Line 35: | Line 35: | ||
==== Genome assembly Ib: Re-sequencing, comparative (reference-based) assembly ==== | ==== Genome assembly Ib: Re-sequencing, comparative (reference-based) assembly ==== | ||
- | A good introduction to comparative genome assembly is [1]. The main algorithmic challenge is to map millions of (mostly very short) sequence reads onto one or more reference genome(s). Suitable mapping algorithms for this task are [[http://bibiserv.cebitec.uni-bielefeld.de/swift/|SWIFT]] [2], [[http://bowtie-bio.sourceforge.net/index.shtml|Bowtie]] [6], ELAND (Cox, unpublished), [[http://maq.sourceforge.net/|MAQ]] [3], [[http://rulai.cshl.edu/rmap/|RMAP]], [[http://soap.genomics.org.cn/|SOAP]] [4], [[http://compbio.cs.toronto.edu/shrimp/|SHRiMP]], SeqMap [5], TAGGER [7], ZOOM [8], [[http://bio-bwa.sourceforge.net/bwa.shtml|BWA]] [9], GSNAP [10], SARUMAN [11], SSAHA2 [12] etc. Methods especially suited for mapping SOLiD reads are presented in [13,14]. | + | A good introduction to comparative genome assembly is [1]. The main algorithmic challenge is to map millions of (mostly very short) sequence reads onto one or more reference genome(s). Suitable mapping algorithms for this task are [[http://bibiserv.cebitec.uni-bielefeld.de/swift/|SWIFT]] [2], [[http://bowtie-bio.sourceforge.net/index.shtml|Bowtie]] [6], ELAND (Cox, unpublished), [[http://maq.sourceforge.net/|MAQ]] [3], [[http://rulai.cshl.edu/rmap/|RMAP]], [[http://soap.genomics.org.cn/|SOAP]] [4], [[http://compbio.cs.toronto.edu/shrimp/|SHRiMP]], SeqMap [5], TAGGER [7], ZOOM [8], [[http://bio-bwa.sourceforge.net/bwa.shtml|BWA]] [9], GSNAP [10], SARUMAN [11], SSAHA2 [12], NextGenMap [13], etc. |
- M. Pop, A. Phillippy, A. L. Delcher, and S. L. Salzberg. [[https://doi.org/10.1093/bib/5.3.237|Comparative genome assembly]]. //Briefings in Bioinformatics// **5**(3):237-248, 2004. | - M. Pop, A. Phillippy, A. L. Delcher, and S. L. Salzberg. [[https://doi.org/10.1093/bib/5.3.237|Comparative genome assembly]]. //Briefings in Bioinformatics// **5**(3):237-248, 2004. | ||
Line 49: | Line 49: | ||
- J. Blom, T. Jakobi, D. Doppmeier, S. Jaenicke, J. Kalinowski, J. Stoye, A. Goesmann. [[https://doi.org/10.1093/bioinformatics/btr151|Exact and complete short read alignment to microbial genomes using GPU programming]]. //Bioinformatics// **27**(10): 1351-1358, 2011. | - J. Blom, T. Jakobi, D. Doppmeier, S. Jaenicke, J. Kalinowski, J. Stoye, A. Goesmann. [[https://doi.org/10.1093/bioinformatics/btr151|Exact and complete short read alignment to microbial genomes using GPU programming]]. //Bioinformatics// **27**(10): 1351-1358, 2011. | ||
- Z. Ning, A.J. Cox. [[https://doi.org/10.1101/gr.194201|SSAHA: A Fast Search Method for Large DNA Databases]]. //Genome Res.// **11**(10): 1725-1729, 2001. | - Z. Ning, A.J. Cox. [[https://doi.org/10.1101/gr.194201|SSAHA: A Fast Search Method for Large DNA Databases]]. //Genome Res.// **11**(10): 1725-1729, 2001. | ||
- | - L. Noé, M. Gîrdea, G. Kucherov. [[https://doi.org/10.1007/978-3-642-12683-3_25|Seed Design Framework for Mapping SOLiD Reads]]. Proceedings of RECOMB 2010, LNBI 6044, 384-396, 2010. | + | - F. J. Sedlazeck, P. Rescheneder, A. von Haeseler. [[https://doi.org/10.1093/bioinformatics/btt468|NextGenMap: fast and accurate read mapping in highly polymorphic genomes]]. //Bioinformatics// **29**(21): 2790-2791, 2013. |
- | - M. Csűrös, Sz. Juhos, A. Bérces. [[https://doi.org/10.1007/978-3-642-15294-8_15|Fast Mapping and Precise Alignment of AB SOLiD Color Reads to Reference DNA]]. Proceedings of WABI 2010, LNBI 6293, 176-188, 2010. | + | |
- L. Oesper, A. Ritz, S. J. Aerni, R. Drebin, B. J. Raphael. [[https://doi.org/10.1186/1471-2105-13-S6-S10|Reconstructing cancer genomes from paired-end sequencing data]]. //BMC Bioinformatics// **13**(Suppl. 6):S10, 2012. | - L. Oesper, A. Ritz, S. J. Aerni, R. Drebin, B. J. Raphael. [[https://doi.org/10.1186/1471-2105-13-S6-S10|Reconstructing cancer genomes from paired-end sequencing data]]. //BMC Bioinformatics// **13**(Suppl. 6):S10, 2012. | ||
Line 74: | Line 73: | ||
- C.-S. Chin, D. H. Alexander, P. Marks, A. A. Klammer, J. Drake, C. Heiner, A. Clum, A. Copeland, J. Huddleston, E. E. Eichler, S. W. Turner, J. Korlach. [[https://doi.org/10.1038/nmeth.2474|Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data]]. //Nature Methods// **10**:563-569, 2013. | - C.-S. Chin, D. H. Alexander, P. Marks, A. A. Klammer, J. Drake, C. Heiner, A. Clum, A. Copeland, J. Huddleston, E. E. Eichler, S. W. Turner, J. Korlach. [[https://doi.org/10.1038/nmeth.2474|Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data]]. //Nature Methods// **10**:563-569, 2013. | ||
- G. Myers. [[https://doi.org/10.1007/978-3-662-44753-6_5|Efficient Local Alignment Discovery amongst Noisy Long Reads]]. //Proceedings of WABI 2014//, LNBI 8701, 52-67, 2014. | - G. Myers. [[https://doi.org/10.1007/978-3-662-44753-6_5|Efficient Local Alignment Discovery amongst Noisy Long Reads]]. //Proceedings of WABI 2014//, LNBI 8701, 52-67, 2014. | ||
+ | - F. J. Sedlazeck, P. Rescheneder, M. Smolka, H. Fang, M. Nattestad, A. von Haeseler, M. C. Schatz. [[https://doi.org/10.1038/s41592-018-0001-7|Accurate detection of complex structural variations using single molecule sequencing]]. //Nat. Methods// **15**(6): 461–468, 2018. | ||
- E. Haghshenas, H. Asghari, J. Stoye, C. Chauve, F. Hach. [[https://doi.org/10.1016/j.isci.2020.101389|HASLR: Fast Hybrid Assembly of Long Reads]]. //iScience// **23**(8): 101389, 2020. | - E. Haghshenas, H. Asghari, J. Stoye, C. Chauve, F. Hach. [[https://doi.org/10.1016/j.isci.2020.101389|HASLR: Fast Hybrid Assembly of Long Reads]]. //iScience// **23**(8): 101389, 2020. | ||
Line 207: | Line 207: | ||
==== Computational pangenomics ==== | ==== Computational pangenomics ==== | ||
- | The gene-based method is from the following papers: | + | The gene-based method is considered here (for example): |
- | - J. Blom, S. P. Albaum, D. Doppmeier, A. Pühler, F.-J. Vorhölter, M. Zakrzewski, and A. Goesmann. [[https://doi.org/10.1186/1471-2105-10-154| EDGAR: A software framework for the comparative analysis of prokaryotic genomes]]. //BMC Bioinformatics// 10:154, 2009. | + | - H. Tettelin et al. [[https://doi.org/10.1073/pnas.0506758102|Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: Implications for the microbial “pan-genome”]]. //Proc. Natl. Acad. Sci. USA// **102**(39): 13950-13955, 2005. |
- | - J. Blom, J. Kreis, S. Spänig, T. Juhre, C. Bertelli, C. Ernst, and A. Goesmann. [[https://doi.org/10.1093/nar/gkw255| EDGAR 2.0: an enhanced software platform for comparative gene content analyses]]. //Nucleic Acids Res.// **44**(W1):W22–W28, 2016. | + | - J. Blom, S. P. Albaum, D. Doppmeier, A. Pühler, F.-J. Vorhölter, M. Zakrzewski, and A. Goesmann. [[https://doi.org/10.1186/1471-2105-10-154|EDGAR: A software framework for the comparative analysis of prokaryotic genomes]]. //BMC Bioinformatics// 10:154, 2009. |
- | - J. Blom, S. P. Glaeser, T. Juhre, J. Kreis, P. H. G. Hanel, J. G. Schrader, P. Kämpfer, and A. Goesmann. [[https://doi.org/10.1002/9781118960608.bm00038| EDGAR: A Versatile Tool for Phylogenomics]]. In: W. B. Whitman (ed.). Bergey's Manual of Systematics of Archaea and Bacteria, Wiley, 2019. | + | - J. Blom, J. Kreis, S. Spänig, T. Juhre, C. Bertelli, C. Ernst, and A. Goesmann. [[https://doi.org/10.1093/nar/gkw255|EDGAR 2.0: an enhanced software platform for comparative gene content analyses]]. //Nucleic Acids Res.// **44**(W1):W22–W28, 2016. |
+ | - J. Blom, S. P. Glaeser, T. Juhre, J. Kreis, P. H. G. Hanel, J. G. Schrader, P. Kämpfer, and A. Goesmann. [[https://doi.org/10.1002/9781118960608.bm00038|EDGAR: A Versatile Tool for Phylogenomics]]. In: W. B. Whitman (ed.). Bergey's Manual of Systematics of Archaea and Bacteria, Wiley, 2019. | ||
The following review paper gives a good overview of genome-based computational pangenomics: | The following review paper gives a good overview of genome-based computational pangenomics: | ||
Line 220: | Line 221: | ||
(A) Data structures | (A) Data structures | ||
+ | - B. Paten, D. Earl, N. Nguyen, M. Diekhans, D. Zerbino, D. Haussler. [[https://doi.org/10.1101/gr.123356.111|Cactus: Algorithms for genome multiple sequence alignment]]. //Genome Research// **21**, 1512–1528, 2011. | ||
- C. Ernst, S. Rahmann. [[https://drops.dagstuhl.de/opus/volltexte/2013/4231/pdf/p035-ernst.pdf|PanCake: A Data Structure for Pangenomes]]. Proc. of //GCB 2013//, 35-45, 2013. | - C. Ernst, S. Rahmann. [[https://drops.dagstuhl.de/opus/volltexte/2013/4231/pdf/p035-ernst.pdf|PanCake: A Data Structure for Pangenomes]]. Proc. of //GCB 2013//, 35-45, 2013. | ||
- G. Holley, R. Wittler, and J. Stoye. [[https://doi.org/10.1186/s13015-016-0066-8 |Bloom Filter Trie: an alignment-free and reference-free data structure for pan-genome storage]]. //Algorithms Mol. Biol.// **11**: 3, 2016. | - G. Holley, R. Wittler, and J. Stoye. [[https://doi.org/10.1186/s13015-016-0066-8 |Bloom Filter Trie: an alignment-free and reference-free data structure for pan-genome storage]]. //Algorithms Mol. Biol.// **11**: 3, 2016. | ||
Line 231: | Line 233: | ||
- A. Kuhnle, T. Mun, C. Boucher, T. Gagie, B. Langmead, and G. Manzini. [[https://doi.org/10.1089/cmb.2019.0309|Efficient Construction of a Complete Index for Pan-Genomics Read Alignment]]. //J. Comp. Biol.// **27**(4), 500-513, 2020. | - A. Kuhnle, T. Mun, C. Boucher, T. Gagie, B. Langmead, and G. Manzini. [[https://doi.org/10.1089/cmb.2019.0309|Efficient Construction of a Complete Index for Pan-Genomics Read Alignment]]. //J. Comp. Biol.// **27**(4), 500-513, 2020. | ||
- N. Luhmann, G. Holley, and M. Achtman. [[https://doi.org/10.1101/2020.01.21.914168|BlastFrost: Fast querying of 100,000s of bacterial genomes in Bifrost graphs]]. //BioRxiv//, 2020. | - N. Luhmann, G. Holley, and M. Achtman. [[https://doi.org/10.1101/2020.01.21.914168|BlastFrost: Fast querying of 100,000s of bacterial genomes in Bifrost graphs]]. //BioRxiv//, 2020. | ||
- | - T. Schulz, R. Wittler, S. Rahmann, F. Hach, and J. Stoye. [[https://doi.org/10.1101/2020.09.03.280958|Detecting High Scoring Local Alignments in Pangenome Graphs]]. //BioRxiv//, 2020. | + | - T. Schulz, R. Wittler, S. Rahmann, F. Hach, and J. Stoye. [[https://doi.org/10.1093/bioinformatics/btab077|Detecting High Scoring Local Alignments in Pangenome Graphs]]. //Bioinformatics// **37**(16), 2266–2274, 2021. |
(C) Phylogenomics: | (C) Phylogenomics: | ||
- R. Wittler. [[https://doi.org/10.1186/s13015-020-00164-3|Alignment- and reference-free phylogenomics with colored de Bruijn graphs]]. //Algorithms Mol. Biol.// **15**: 4, 2020. | - R. Wittler. [[https://doi.org/10.1186/s13015-020-00164-3|Alignment- and reference-free phylogenomics with colored de Bruijn graphs]]. //Algorithms Mol. Biol.// **15**: 4, 2020. | ||
+ | - A. Rempel, R. Wittler. [[https://doi.org/10.1093/bioinformatics/btab444|SANS serif: alignment-free, whole-genome-based phylogenetic reconstruction]]. //Bioinformatics// **37**(24), 4868-4870, 2021. | ||
(D) Haplotype inference: | (D) Haplotype inference: | ||
Line 266: | Line 269: | ||
- E. Tannier, C. Zheng, D. Sankoff. [[https://doi.org/10.1186/1471-2105-10-120|Multichromosomal median and halving problems under different genomic distances]]. //BMC Bioinformatics// **10**:120, 2009. | - E. Tannier, C. Zheng, D. Sankoff. [[https://doi.org/10.1186/1471-2105-10-120|Multichromosomal median and halving problems under different genomic distances]]. //BMC Bioinformatics// **10**:120, 2009. | ||
- | ==== Comparative genomics III: Synteny Hierarchies and Gene clusters ==== | + | ==== Comparative genomics III: Gene clusters ==== |
The following are the algorithmic papers in this area. Apart from that, many papers on applications of gene clusters and statistical properties exist, but are not listed here. | The following are the algorithmic papers in this area. Apart from that, many papers on applications of gene clusters and statistical properties exist, but are not listed here. | ||