Differences

This shows you the differences between two versions of the page.

teaching:alggrliterature [2021/02/05 13:00]
jstoye [Comparative genomics III: Synteny Hierarchies and Gene clusters]
teaching:alggrliterature [2022/11/21 09:57] (current)
jstoye [Genome assembly IIb: Hybrid/long read assembly]
Line 35: Line 35:
  
 ==== Genome assembly Ib: Re-sequencing,​ comparative (reference-based) assembly ==== ==== Genome assembly Ib: Re-sequencing,​ comparative (reference-based) assembly ====
-A good introduction to comparative genome assembly is [1]. The main algorithmic challenge is to map millions of (mostly very short) sequence reads onto one or more reference genome(s). Suitable mapping algorithms for this task are [[http://bibiserv.cebitec.uni-bielefeld.de/swift/|SWIFT]] [2], [[http://bowtie-bio.sourceforge.net/index.shtml|Bowtie]] [6], ELAND (Cox, unpublished), [[http://maq.sourceforge.net/|MAQ]] [3], [[http://rulai.cshl.edu/rmap/|RMAP]], [[http://soap.genomics.org.cn/|SOAP]] [4], [[http://compbio.cs.toronto.edu/shrimp/|SHRiMP]], SeqMap [5], TAGGER [7], ZOOM [8], [[http://bio-bwa.sourceforge.net/bwa.shtml|BWA]] [9], GSNAP [10], SARUMAN [11], SSAHA2 [12] etc. Methods especially suited for mapping SOLiD reads are presented in [13,14]. +A good introduction to comparative genome assembly is [1]. The main algorithmic challenge is to map millions of (mostly very short) sequence reads onto one or more reference genome(s). Suitable mapping algorithms for this task are [[http://bibiserv.cebitec.uni-bielefeld.de/swift/|SWIFT]] [2], [[http://bowtie-bio.sourceforge.net/index.shtml|Bowtie]] [6], ELAND (Cox, unpublished), [[http://maq.sourceforge.net/|MAQ]] [3], [[http://rulai.cshl.edu/rmap/|RMAP]], [[http://soap.genomics.org.cn/|SOAP]] [4], [[http://compbio.cs.toronto.edu/shrimp/|SHRiMP]], SeqMap [5], TAGGER [7], ZOOM [8], [[http://bio-bwa.sourceforge.net/bwa.shtml|BWA]] [9], GSNAP [10], SARUMAN [11], SSAHA2 [12], NextGenMap [13], etc.
  
   - M. Pop, A. Phillippy, A. L. Delcher, and S. L. Salzberg. [[https://​doi.org/​10.1093/​bib/​5.3.237|Comparative genome assembly]]. //Briefings in Bioinformatics//​ **5**(3):​237-248,​ 2004.    - M. Pop, A. Phillippy, A. L. Delcher, and S. L. Salzberg. [[https://​doi.org/​10.1093/​bib/​5.3.237|Comparative genome assembly]]. //Briefings in Bioinformatics//​ **5**(3):​237-248,​ 2004. 
Line 49: Line 49:
   - J. Blom, T. Jakobi, D. Doppmeier, S. Jaenicke, J. Kalinowski, J. Stoye, A. Goesmann. [[https://​doi.org/​10.1093/​bioinformatics/​btr151|Exact and complete short read alignment to microbial genomes using GPU programming]]. //​Bioinformatics//​ **27**(10): 1351-1358, 2011.    - J. Blom, T. Jakobi, D. Doppmeier, S. Jaenicke, J. Kalinowski, J. Stoye, A. Goesmann. [[https://​doi.org/​10.1093/​bioinformatics/​btr151|Exact and complete short read alignment to microbial genomes using GPU programming]]. //​Bioinformatics//​ **27**(10): 1351-1358, 2011. 
   - Z. Ning, A.J. Cox. [[https://​doi.org/​10.1101/​gr.194201|SSAHA:​ A Fast Search Method for Large DNA Databases]]. //Genome Res.// **11**(10): 1725-1729, 2001.    - Z. Ning, A.J. Cox. [[https://​doi.org/​10.1101/​gr.194201|SSAHA:​ A Fast Search Method for Large DNA Databases]]. //Genome Res.// **11**(10): 1725-1729, 2001. 
-  - L. Noé, M. Gîrdea, G. Kucherov. [[https://doi.org/10.1007/978-3-642-12683-3_25|Seed Design Framework for Mapping SOLiD Reads]]. Proceedings of RECOMB 2010, LNBI 6044, 384-396, 2010. +  - F. J. Sedlazeck, P. Rescheneder, A. von Haeseler. [[https://doi.org/10.1093/bioinformatics/btt468|NextGenMap: fast and accurate read mapping in highly polymorphic genomes]]. //Bioinformatics// **29**(21): 2790-2791, 2013.
-  - M. Csűrös, Sz. Juhos, A. Bérces. [[https://doi.org/10.1007/978-3-642-15294-8_15|Fast Mapping and Precise Alignment of AB SOLiD Color Reads to Reference DNA]]. Proceedings of WABI 2010, LNBI 6293, 176-188, 2010. +
   - L. Oesper, A. Ritz, S. J. Aerni, R. Drebin, B. J. Raphael. [[https://​doi.org/​10.1186/​1471-2105-13-S6-S10|Reconstructing cancer genomes from paired-end sequencing data]]. //BMC Bioinformatics//​ **13**(Suppl. 6):S10, 2012.    - L. Oesper, A. Ritz, S. J. Aerni, R. Drebin, B. J. Raphael. [[https://​doi.org/​10.1186/​1471-2105-13-S6-S10|Reconstructing cancer genomes from paired-end sequencing data]]. //BMC Bioinformatics//​ **13**(Suppl. 6):S10, 2012. 
  
Line 74: Line 73:
   - C.-S. Chin, D. H. Alexander, P. Marks, A. A. Klammer, J. Drake, C. Heiner, A. Clum, A. Copeland, J. Huddleston, E. E. Eichler, S. W. Turner, J. Korlach. [[https://​doi.org/​10.1038/​nmeth.2474|Nonhybrid,​ finished microbial genome assemblies from long-read SMRT sequencing data]]. //Nature Methods// **10**:​563-569,​ 2013.   - C.-S. Chin, D. H. Alexander, P. Marks, A. A. Klammer, J. Drake, C. Heiner, A. Clum, A. Copeland, J. Huddleston, E. E. Eichler, S. W. Turner, J. Korlach. [[https://​doi.org/​10.1038/​nmeth.2474|Nonhybrid,​ finished microbial genome assemblies from long-read SMRT sequencing data]]. //Nature Methods// **10**:​563-569,​ 2013.
   - G. Myers. [[https://​doi.org/​10.1007/​978-3-662-44753-6_5|Efficient Local Alignment Discovery amongst Noisy Long Reads]]. //​Proceedings of WABI 2014//, LNBI 8701, 52-67, 2014.   - G. Myers. [[https://​doi.org/​10.1007/​978-3-662-44753-6_5|Efficient Local Alignment Discovery amongst Noisy Long Reads]]. //​Proceedings of WABI 2014//, LNBI 8701, 52-67, 2014.
 +  - F. J. Sedlazeck, P. Rescheneder,​ M. Smolka, H. Fang, M. Nattestad, A. von Haeseler, M. C. Schatz. [[https://​doi.org/​10.1038/​s41592-018-0001-7|Accurate detection of complex structural variations using single molecule sequencing]]. //Nat. Methods// **15**(6): 461–468, 2018.
   - E. Haghshenas, H. Asghari, J. Stoye, C. Chauve, F. Hach. [[https://​doi.org/​10.1016/​j.isci.2020.101389|HASLR:​ Fast Hybrid Assembly of Long Reads]]. //​iScience//​ **23**(8): 101389, 2020.   - E. Haghshenas, H. Asghari, J. Stoye, C. Chauve, F. Hach. [[https://​doi.org/​10.1016/​j.isci.2020.101389|HASLR:​ Fast Hybrid Assembly of Long Reads]]. //​iScience//​ **23**(8): 101389, 2020.
  
Line 207: Line 207:
  
 ==== Computational pangenomics ==== ==== Computational pangenomics ====
-The gene-based method is from the following papers: +The gene-based method is considered here (for example):
  
-  - J. Blom, S. P. Albaum, D. Doppmeier, A. Pühler, F.-J. Vorhölter, M. Zakrzewski, and A. Goesmann. [[https://doi.org/10.1186/1471-2105-10-154| EDGAR: A software framework for the comparative analysis of prokaryotic genomes]]. //BMC Bioinformatics// 10:154, 2009. +  - H. Tettelin et al. [[https://doi.org/10.1073/pnas.0506758102|Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: Implications for the microbial "pan-genome"]]. //Proc. Natl. Acad. Sci. USA// **102**(39): 13950-13955, 2005.
-  - J. Blom,  J. Kreis, ​ S. Spänig, ​ T. Juhre, ​ C. Bertelli, C. Ernst, and A. Goesmann. [[https://​doi.org/​10.1093/​nar/​gkw255| EDGAR 2.0: an enhanced software platform for comparative gene content analyses]]. //Nucleic Acids Res.// **44**(W1):​W22–W28,​ 2016. +  ​- J. Blom, S. P. Albaum, D. Doppmeier, A. Pühler, F.-J. Vorhölter, M. Zakrzewski, and A. Goesmann. [[https://​doi.org/​10.1186/​1471-2105-10-154|EDGAR:​ A software framework for the comparative analysis of prokaryotic genomes]]. //BMC Bioinformatics//​ 10:154, 2009. 
-  - J. Blom, S. P. Glaeser, T. Juhre, J. Kreis, P. H. G. Hanel, J. G. Schrader, P. Kämpfer, and A. Goesmann. [[https://​doi.org/​10.1002/​9781118960608.bm00038| EDGAR: A Versatile Tool for Phylogenomics]]. In: W. B. Whitman (ed.). Bergey'​s Manual of Systematics of Archaea and Bacteria, Wiley, 2019.+  - J. Blom,  J. Kreis, ​ S. Spänig, ​ T. Juhre, ​ C. Bertelli, C. Ernst, and A. Goesmann. [[https://​doi.org/​10.1093/​nar/​gkw255|EDGAR 2.0: an enhanced software platform for comparative gene content analyses]]. //Nucleic Acids Res.// **44**(W1):​W22–W28,​ 2016. 
 +  - J. Blom, S. P. Glaeser, T. Juhre, J. Kreis, P. H. G. Hanel, J. G. Schrader, P. Kämpfer, and A. Goesmann. [[https://​doi.org/​10.1002/​9781118960608.bm00038|EDGAR:​ A Versatile Tool for Phylogenomics]]. In: W. B. Whitman (ed.). Bergey'​s Manual of Systematics of Archaea and Bacteria, Wiley, 2019.
  
 A good overview of genome-based computational pangenomics is given by the following review paper: A good overview of genome-based computational pangenomics is given by the following review paper:
Line 220: Line 221:
  
 (A) Data structures (A) Data structures
 +  - B. Paten, D. Earl, N. Nguyen, M. Diekhans, D. Zerbino, D. Haussler. [[https://doi.org/10.1101/gr.123356.111|Cactus: Algorithms for genome multiple sequence alignment]]. //Genome Research// **21**, 1512–1528, 2011.
   - C. Ernst, S. Rahmann. [[https://​drops.dagstuhl.de/​opus/​volltexte/​2013/​4231/​pdf/​p035-ernst.pdf|PanCake:​ A Data Structure for Pangenomes]]. Proc. of //GCB 2013//, 35-45, 2013.   - C. Ernst, S. Rahmann. [[https://​drops.dagstuhl.de/​opus/​volltexte/​2013/​4231/​pdf/​p035-ernst.pdf|PanCake:​ A Data Structure for Pangenomes]]. Proc. of //GCB 2013//, 35-45, 2013.
   - G. Holley, R. Wittler, and J. Stoye. [[https://​doi.org/​10.1186/​s13015-016-0066-8 |Bloom Filter Trie: an alignment-free and reference-free data structure for pan-genome storage]]. //​Algorithms Mol. Biol.// **11**: 3, 2016.   - G. Holley, R. Wittler, and J. Stoye. [[https://​doi.org/​10.1186/​s13015-016-0066-8 |Bloom Filter Trie: an alignment-free and reference-free data structure for pan-genome storage]]. //​Algorithms Mol. Biol.// **11**: 3, 2016.
Line 231: Line 233:
   - A. Kuhnle, T. Mun, C. Boucher, T. Gagie, B. Langmead, and G. Manzini. [[https://​doi.org/​10.1089/​cmb.2019.0309|Efficient Construction of a Complete Index for Pan-Genomics Read Alignment]]. //J. Comp. Biol.// **27**(4), 500-513, 2020.   - A. Kuhnle, T. Mun, C. Boucher, T. Gagie, B. Langmead, and G. Manzini. [[https://​doi.org/​10.1089/​cmb.2019.0309|Efficient Construction of a Complete Index for Pan-Genomics Read Alignment]]. //J. Comp. Biol.// **27**(4), 500-513, 2020.
   -  N. Luhmann, G. Holley, and M. Achtman. [[https://​doi.org/​10.1101/​2020.01.21.914168|BlastFrost:​ Fast querying of 100,000s of bacterial genomes in Bifrost graphs]]. //​BioRxiv//,​ 2020.   -  N. Luhmann, G. Holley, and M. Achtman. [[https://​doi.org/​10.1101/​2020.01.21.914168|BlastFrost:​ Fast querying of 100,000s of bacterial genomes in Bifrost graphs]]. //​BioRxiv//,​ 2020.
-  -  T. Schulz, R. Wittler, S. Rahmann, F. Hach, and J. Stoye. [[https://doi.org/10.1101/2020.09.03.280958|Detecting High Scoring Local Alignments in Pangenome Graphs]]. //BioRxiv//, 2020. +  -  T. Schulz, R. Wittler, S. Rahmann, F. Hach, and J. Stoye. [[https://doi.org/10.1093/bioinformatics/btab077|Detecting High Scoring Local Alignments in Pangenome Graphs]]. //Bioinformatics// **37**(16), 2266–2274, 2021.
  
 (C) Phylogenomics:​ (C) Phylogenomics:​
  
   - R. Wittler. [[https://​doi.org/​10.1186/​s13015-020-00164-3|Alignment- and reference-free phylogenomics with colored de Bruijn graphs]]. //​Algorithms Mol. Biol.// **15**: 4, 2020.   - R. Wittler. [[https://​doi.org/​10.1186/​s13015-020-00164-3|Alignment- and reference-free phylogenomics with colored de Bruijn graphs]]. //​Algorithms Mol. Biol.// **15**: 4, 2020.
 +  - A. Rempel, R. Wittler. [[https://​doi.org/​10.1093/​bioinformatics/​btab444|SANS serif: alignment-free,​ whole-genome-based phylogenetic reconstruction]]. //​Bioinformatics//​ **37**(24), 4868-4870, 2021.
  
 (D) Haplotype inference: (D) Haplotype inference: