Differences

This shows you the differences between two versions of the page.

teaching:alggrliterature [2021/02/05 13:00]
jstoye [Comparative genomics III: Synteny Hierarchies and Gene clusters]
teaching:alggrliterature [2022/11/21 09:57] (current)
jstoye [Genome assembly IIb: Hybrid/long read assembly]
Line 35: Line 35:
  
 ==== Genome assembly Ib: Re-sequencing,​ comparative (reference-based) assembly ==== ==== Genome assembly Ib: Re-sequencing,​ comparative (reference-based) assembly ====
-A good introduction to comparative genome assembly is [1]. The main algorithmic challenge is to map millions of (mostly very short) sequence reads onto one or more reference genome(s). Suitable mapping algorithms for this task are [[http://bibiserv.cebitec.uni-bielefeld.de/swift/|SWIFT]] [2], [[http://bowtie-bio.sourceforge.net/index.shtml|Bowtie]] [6], ELAND (Cox, unpublished), [[http://maq.sourceforge.net/|MAQ]] [3], [[http://rulai.cshl.edu/rmap/|RMAP]], [[http://soap.genomics.org.cn/|SOAP]] [4], [[http://compbio.cs.toronto.edu/shrimp/|SHRiMP]], SeqMap [5], TAGGER [7], ZOOM [8], [[http://bio-bwa.sourceforge.net/bwa.shtml|BWA]] [9], GSNAP [10], SARUMAN [11], SSAHA2 [12] etc. Methods especially suited for mapping SOLiD reads are presented in [13,14]. +A good introduction to comparative genome assembly is [1]. The main algorithmic challenge is to map millions of (mostly very short) sequence reads onto one or more reference genome(s). Suitable mapping algorithms for this task are [[http://bibiserv.cebitec.uni-bielefeld.de/swift/|SWIFT]] [2], [[http://bowtie-bio.sourceforge.net/index.shtml|Bowtie]] [6], ELAND (Cox, unpublished), [[http://maq.sourceforge.net/|MAQ]] [3], [[http://rulai.cshl.edu/rmap/|RMAP]], [[http://soap.genomics.org.cn/|SOAP]] [4], [[http://compbio.cs.toronto.edu/shrimp/|SHRiMP]], SeqMap [5], TAGGER [7], ZOOM [8], [[http://bio-bwa.sourceforge.net/bwa.shtml|BWA]] [9], GSNAP [10], SARUMAN [11], SSAHA2 [12], NextGenMap [13], etc.
  
   - M. Pop, A. Phillippy, A. L. Delcher, and S. L. Salzberg. [[https://​doi.org/​10.1093/​bib/​5.3.237|Comparative genome assembly]]. //Briefings in Bioinformatics//​ **5**(3):​237-248,​ 2004.    - M. Pop, A. Phillippy, A. L. Delcher, and S. L. Salzberg. [[https://​doi.org/​10.1093/​bib/​5.3.237|Comparative genome assembly]]. //Briefings in Bioinformatics//​ **5**(3):​237-248,​ 2004. 
Line 49: Line 49:
   - J. Blom, T. Jakobi, D. Doppmeier, S. Jaenicke, J. Kalinowski, J. Stoye, A. Goesmann. [[https://​doi.org/​10.1093/​bioinformatics/​btr151|Exact and complete short read alignment to microbial genomes using GPU programming]]. //​Bioinformatics//​ **27**(10): 1351-1358, 2011.    - J. Blom, T. Jakobi, D. Doppmeier, S. Jaenicke, J. Kalinowski, J. Stoye, A. Goesmann. [[https://​doi.org/​10.1093/​bioinformatics/​btr151|Exact and complete short read alignment to microbial genomes using GPU programming]]. //​Bioinformatics//​ **27**(10): 1351-1358, 2011. 
   - Z. Ning, A.J. Cox. [[https://​doi.org/​10.1101/​gr.194201|SSAHA:​ A Fast Search Method for Large DNA Databases]]. //Genome Res.// **11**(10): 1725-1729, 2001.    - Z. Ning, A.J. Cox. [[https://​doi.org/​10.1101/​gr.194201|SSAHA:​ A Fast Search Method for Large DNA Databases]]. //Genome Res.// **11**(10): 1725-1729, 2001. 
-  - L. Noé, M. Gîrdea, G. Kucherov. [[https://doi.org/10.1007/978-3-642-12683-3_25|Seed Design Framework for Mapping SOLiD Reads]]. Proceedings of RECOMB 2010, LNBI 6044, 384-396, 2010. +  - F. J. Sedlazeck, P. Rescheneder, A. von Haeseler. [[https://doi.org/10.1093/bioinformatics/btt468|NextGenMap: fast and accurate read mapping in highly polymorphic genomes]]. //Bioinformatics// **29**(21): 2790-2791, 2013.
-  - M. Csűrös, Sz. Juhos, A. Bérces. [[https://doi.org/10.1007/978-3-642-15294-8_15|Fast Mapping and Precise Alignment of AB SOLiD Color Reads to Reference DNA]]. Proceedings of WABI 2010, LNBI 6293, 176-188, 2010. +
   - L. Oesper, A. Ritz, S. J. Aerni, R. Drebin, B. J. Raphael. [[https://​doi.org/​10.1186/​1471-2105-13-S6-S10|Reconstructing cancer genomes from paired-end sequencing data]]. //BMC Bioinformatics//​ **13**(Suppl. 6):S10, 2012.    - L. Oesper, A. Ritz, S. J. Aerni, R. Drebin, B. J. Raphael. [[https://​doi.org/​10.1186/​1471-2105-13-S6-S10|Reconstructing cancer genomes from paired-end sequencing data]]. //BMC Bioinformatics//​ **13**(Suppl. 6):S10, 2012. 
  
Line 74: Line 73:
   - C.-S. Chin, D. H. Alexander, P. Marks, A. A. Klammer, J. Drake, C. Heiner, A. Clum, A. Copeland, J. Huddleston, E. E. Eichler, S. W. Turner, J. Korlach. [[https://​doi.org/​10.1038/​nmeth.2474|Nonhybrid,​ finished microbial genome assemblies from long-read SMRT sequencing data]]. //Nature Methods// **10**:​563-569,​ 2013.   - C.-S. Chin, D. H. Alexander, P. Marks, A. A. Klammer, J. Drake, C. Heiner, A. Clum, A. Copeland, J. Huddleston, E. E. Eichler, S. W. Turner, J. Korlach. [[https://​doi.org/​10.1038/​nmeth.2474|Nonhybrid,​ finished microbial genome assemblies from long-read SMRT sequencing data]]. //Nature Methods// **10**:​563-569,​ 2013.
   - G. Myers. [[https://​doi.org/​10.1007/​978-3-662-44753-6_5|Efficient Local Alignment Discovery amongst Noisy Long Reads]]. //​Proceedings of WABI 2014//, LNBI 8701, 52-67, 2014.   - G. Myers. [[https://​doi.org/​10.1007/​978-3-662-44753-6_5|Efficient Local Alignment Discovery amongst Noisy Long Reads]]. //​Proceedings of WABI 2014//, LNBI 8701, 52-67, 2014.
 +  - F. J. Sedlazeck, P. Rescheneder,​ M. Smolka, H. Fang, M. Nattestad, A. von Haeseler, M. C. Schatz. [[https://​doi.org/​10.1038/​s41592-018-0001-7|Accurate detection of complex structural variations using single molecule sequencing]]. //Nat. Methods// **15**(6): 461–468, 2018.
   - E. Haghshenas, H. Asghari, J. Stoye, C. Chauve, F. Hach. [[https://​doi.org/​10.1016/​j.isci.2020.101389|HASLR:​ Fast Hybrid Assembly of Long Reads]]. //​iScience//​ **23**(8): 101389, 2020.   - E. Haghshenas, H. Asghari, J. Stoye, C. Chauve, F. Hach. [[https://​doi.org/​10.1016/​j.isci.2020.101389|HASLR:​ Fast Hybrid Assembly of Long Reads]]. //​iScience//​ **23**(8): 101389, 2020.
  
Line 207: Line 207:
  
 ==== Computational pangenomics ==== ==== Computational pangenomics ====
-The gene-based method is from the following papers: +The gene-based method is considered here (for example):
  
-  - J. Blom, S. P. Albaum, D. Doppmeier, A. Pühler, F.-J. Vorhölter, M. Zakrzewski, and A. Goesmann. [[https://doi.org/10.1186/1471-2105-10-154| EDGAR: A software framework for the comparative analysis of prokaryotic genomes]]. //BMC Bioinformatics// 10:154, 2009. +  - H. Tettelin et al. [[https://doi.org/10.1073/pnas.0506758102|Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: Implications for the microbial “pan-genome”]]. //Proc. Natl. Acad. Sci. USA// **102**(39): 13950-13955, 2005.
-  - J. Blom,  J. Kreis, ​ S. Spänig, ​ T. Juhre, ​ C. Bertelli, C. Ernst, and A. Goesmann. [[https://​doi.org/​10.1093/​nar/​gkw255| EDGAR 2.0: an enhanced software platform for comparative gene content analyses]]. //Nucleic Acids Res.// **44**(W1):​W22–W28,​ 2016. +  ​- J. Blom, S. P. Albaum, D. Doppmeier, A. Pühler, F.-J. Vorhölter, M. Zakrzewski, and A. Goesmann. [[https://​doi.org/​10.1186/​1471-2105-10-154|EDGAR:​ A software framework for the comparative analysis of prokaryotic genomes]]. //BMC Bioinformatics//​ 10:154, 2009. 
-  - J. Blom, S. P. Glaeser, T. Juhre, J. Kreis, P. H. G. Hanel, J. G. Schrader, P. Kämpfer, and A. Goesmann. [[https://​doi.org/​10.1002/​9781118960608.bm00038| EDGAR: A Versatile Tool for Phylogenomics]]. In: W. B. Whitman (ed.). Bergey'​s Manual of Systematics of Archaea and Bacteria, Wiley, 2019.+  - J. Blom,  J. Kreis, ​ S. Spänig, ​ T. Juhre, ​ C. Bertelli, C. Ernst, and A. Goesmann. [[https://​doi.org/​10.1093/​nar/​gkw255|EDGAR 2.0: an enhanced software platform for comparative gene content analyses]]. //Nucleic Acids Res.// **44**(W1):​W22–W28,​ 2016. 
 +  - J. Blom, S. P. Glaeser, T. Juhre, J. Kreis, P. H. G. Hanel, J. G. Schrader, P. Kämpfer, and A. Goesmann. [[https://​doi.org/​10.1002/​9781118960608.bm00038|EDGAR:​ A Versatile Tool for Phylogenomics]]. In: W. B. Whitman (ed.). Bergey'​s Manual of Systematics of Archaea and Bacteria, Wiley, 2019.
  
 A good overview of genome-based computational pangenomics is given in the following review paper: A good overview of genome-based computational pangenomics is given in the following review paper:
Line 220: Line 221:
  
 (A) Data structures (A) Data structures
 +  - B. Paten, D. Earl, N. Nguyen, M. Diekhans, D. Zerbino, D. Haussler. [[https://​doi.org/​10.1101/​gr.123356.111|Cactus:​ Algorithms for genome multiple sequence alignment]]. //Genome Research// **21**, 1512–1528,​ 2011
   - C. Ernst, S. Rahmann. [[https://​drops.dagstuhl.de/​opus/​volltexte/​2013/​4231/​pdf/​p035-ernst.pdf|PanCake:​ A Data Structure for Pangenomes]]. Proc. of //GCB 2013//, 35-45, 2013.   - C. Ernst, S. Rahmann. [[https://​drops.dagstuhl.de/​opus/​volltexte/​2013/​4231/​pdf/​p035-ernst.pdf|PanCake:​ A Data Structure for Pangenomes]]. Proc. of //GCB 2013//, 35-45, 2013.
   - G. Holley, R. Wittler, and J. Stoye. [[https://​doi.org/​10.1186/​s13015-016-0066-8 |Bloom Filter Trie: an alignment-free and reference-free data structure for pan-genome storage]]. //​Algorithms Mol. Biol.// **11**: 3, 2016.   - G. Holley, R. Wittler, and J. Stoye. [[https://​doi.org/​10.1186/​s13015-016-0066-8 |Bloom Filter Trie: an alignment-free and reference-free data structure for pan-genome storage]]. //​Algorithms Mol. Biol.// **11**: 3, 2016.
Line 231: Line 233:
   - A. Kuhnle, T. Mun, C. Boucher, T. Gagie, B. Langmead, and G. Manzini. [[https://​doi.org/​10.1089/​cmb.2019.0309|Efficient Construction of a Complete Index for Pan-Genomics Read Alignment]]. //J. Comp. Biol.// **27**(4), 500-513, 2020.   - A. Kuhnle, T. Mun, C. Boucher, T. Gagie, B. Langmead, and G. Manzini. [[https://​doi.org/​10.1089/​cmb.2019.0309|Efficient Construction of a Complete Index for Pan-Genomics Read Alignment]]. //J. Comp. Biol.// **27**(4), 500-513, 2020.
   -  N. Luhmann, G. Holley, and M. Achtman. [[https://​doi.org/​10.1101/​2020.01.21.914168|BlastFrost:​ Fast querying of 100,000s of bacterial genomes in Bifrost graphs]]. //​BioRxiv//,​ 2020.   -  N. Luhmann, G. Holley, and M. Achtman. [[https://​doi.org/​10.1101/​2020.01.21.914168|BlastFrost:​ Fast querying of 100,000s of bacterial genomes in Bifrost graphs]]. //​BioRxiv//,​ 2020.
-  -  T. Schulz, R. Wittler, S. Rahmann, F. Hach, and J. Stoye. [[https://doi.org/10.1101/2020.09.03.280958|Detecting High Scoring Local Alignments in Pangenome Graphs]]. //BioRxiv//, 2020. +  -  T. Schulz, R. Wittler, S. Rahmann, F. Hach, and J. Stoye. [[https://doi.org/10.1093/bioinformatics/btab077|Detecting High Scoring Local Alignments in Pangenome Graphs]]. //Bioinformatics// **37**(16), 2266–2274, 2021.
  
 (C) Phylogenomics:​ (C) Phylogenomics:​
  
   - R. Wittler. [[https://​doi.org/​10.1186/​s13015-020-00164-3|Alignment- and reference-free phylogenomics with colored de Bruijn graphs]]. //​Algorithms Mol. Biol.// **15**: 4, 2020.   - R. Wittler. [[https://​doi.org/​10.1186/​s13015-020-00164-3|Alignment- and reference-free phylogenomics with colored de Bruijn graphs]]. //​Algorithms Mol. Biol.// **15**: 4, 2020.
 +  - A. Rempel, R. Wittler. [[https://​doi.org/​10.1093/​bioinformatics/​btab444|SANS serif: alignment-free,​ whole-genome-based phylogenetic reconstruction]]. //​Bioinformatics//​ **37**(24), 4868-4870, 2021.
  
 (D) Haplotype inference: (D) Haplotype inference:
Line 266: Line 269:
   - E. Tannier, C. Zheng, D. Sankoff. [[https://​doi.org/​10.1186/​1471-2105-10-120|Multichromosomal median and halving problems under different genomic distances]]. //BMC Bioinformatics//​ **10**:120, 2009.   - E. Tannier, C. Zheng, D. Sankoff. [[https://​doi.org/​10.1186/​1471-2105-10-120|Multichromosomal median and halving problems under different genomic distances]]. //BMC Bioinformatics//​ **10**:120, 2009.
  
-==== Comparative genomics III: Synteny Hierarchies and Gene clusters ====+==== Comparative genomics III: Gene clusters ====
 The following are the algorithmic papers in this area. Apart from that, many papers on applications of gene clusters and statistical properties exist, but are not listed here.  The following are the algorithmic papers in this area. Apart from that, many papers on applications of gene clusters and statistical properties exist, but are not listed here.