Parts of the material are covered by the following two textbooks. Some topics are newer than these books; for most of these, specialized references are given below.
The main issues of this section are discussed in Mount's textbook. Some of the major biological sequence databases are:
Papers on SOLiD sequencing:
This section is mainly based on Chapter 16 of Gusfield's textbook. The Tightest Layout Problem is originally from (Alizadeh et al., 1995). Another reference is (Heber et al., 2000).
A document with a good algorithmic description of the method will be published here soon.
References to the original papers (to be completed):
- E.S. Lander and M.S. Waterman, Genomic mapping by fingerprinting random clones: a mathematical analysis. Genomics 2:231-239, 1988.
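As a small illustration of the Lander-Waterman analysis, the following Python sketch evaluates its two best-known quantities under the simplest assumptions: N reads of equal length L are placed uniformly at random on a genome of length G, and an overlap becomes detectable once it exceeds a fraction θ of the read length (θ = 0 by default). The coverage is then c = NL/G, the expected number of contigs (islands) is N·e^(-c(1-θ)), and the expected fraction of the genome covered by at least one read is 1 - e^(-c). The function name and the example numbers are hypothetical, chosen only for illustration.

```python
import math

def lander_waterman(genome_len: int, read_len: int, num_reads: int, theta: float = 0.0):
    """Return (coverage c, expected number of contigs, expected covered fraction)."""
    c = num_reads * read_len / genome_len         # redundancy of coverage c = NL/G
    sigma = 1.0 - theta                           # 1 - minimal detectable overlap fraction
    expected_contigs = num_reads * math.exp(-c * sigma)
    covered_fraction = 1.0 - math.exp(-c)         # P(a given base is covered by some read)
    return c, expected_contigs, covered_fraction

# Hypothetical example: 10,000 reads of length 500 on a 1 Mb genome.
c, contigs, covered = lander_waterman(genome_len=1_000_000, read_len=500, num_reads=10_000)
print(f"c = {c:.1f}, expected contigs ~ {contigs:.1f}, covered fraction ~ {covered:.4f}")
# prints: c = 5.0, expected contigs ~ 67.4, covered fraction ~ 0.9933
```

Even at 5x coverage the model predicts dozens of contigs, and a larger detection threshold θ increases that number further.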
These methods are all very similar, using variants of de Bruijn graphs plus additional tricks: EULER-SR, Velvet [2,3], MIRA2 (Link), SSAKE, VCAKE, SHARCGS, Medvedev/Brudno, ABySS (Link), ALLPATHS, SOAPdenovo, IDBA, etc. A very good recent review paper is .
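To make the common core of these assemblers concrete, here is a deliberately simplified Python sketch; it is not the algorithm of any particular tool listed above. It assumes error-free reads from a single strand and a genome in which every (k-1)-mer occurs only once. Nodes of the de Bruijn graph are (k-1)-mers, each distinct k-mer of the reads is an edge from its prefix to its suffix, and the genome is spelled along an Eulerian path found with Hierholzer's algorithm. The function names are hypothetical.

```python
from collections import defaultdict

def de_bruijn_edges(reads, k):
    """Each distinct k-mer becomes an edge from its (k-1)-prefix to its (k-1)-suffix."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):  # sorted only to make the traversal deterministic
        graph[kmer[:-1]].append(kmer[1:])
    return graph

def eulerian_path(graph):
    """Hierholzer's algorithm; assumes that an Eulerian path exists."""
    out_deg = {u: len(vs) for u, vs in graph.items()}
    in_deg = defaultdict(int)
    for vs in graph.values():
        for v in vs:
            in_deg[v] += 1
    nodes = set(out_deg) | set(in_deg)
    # Start at the node with one more outgoing than incoming edge, if there is one.
    start = next((u for u in nodes if out_deg.get(u, 0) - in_deg[u] == 1), next(iter(graph)))
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())          # dead end: node is final in this part
    return path[::-1]

def spell(path):
    """Concatenate overlapping (k-1)-mers into a sequence."""
    return path[0] + "".join(node[-1] for node in path[1:])

reads = ["ACGTTG", "GTTGCA", "TGCATT"]
print(spell(eulerian_path(de_bruijn_edges(reads, k=4))))  # prints ACGTTGCATT
```

Real assemblers additionally have to cope with sequencing errors, reverse complements and repeats, usually by simplifying the graph and reporting contigs rather than a single Eulerian path.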
A good introduction to comparative genome assembly is . The main algorithmic challenge is to map millions of (mostly very short) sequence reads onto one or more reference genome(s). Suitable mapping algorithms for this task are SWIFT, Bowtie, ELAND (Cox, unpublished), MAQ, RMAP, SOAP, SHRiMP, SeqMap, TAGGER, ZOOM, BWA, GSNAP, SARUMAN, SSAHA2, etc. Methods especially suited for mapping SOLiD reads are presented in [13,14].
String graph assembly for diploid genomes with long reads is explained on the following poster by PacBio.
Pre-processing (correction of long reads):
A general introduction to HMMs in bioinformatics is given in the textbook by Durbin et al. Covariance models were introduced in . A similar concept was developed in .
Here are several references, in chronological order:
Here are several references, in chronological order. First, for prokaryotes …
… and now for eukaryotes:
This part is almost completely based on the work of Mathieu Blanchette and co-authors:
All bioinformatics textbooks contain good overviews of sequence analysis with the Smith-Waterman algorithm and of fast heuristic methods for database search such as FASTA and BLAST. Here are some references to the original papers, including one of our own:
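For orientation, here is a minimal Python sketch of the Smith-Waterman recurrence for the optimal local alignment score. The scoring scheme (match +2, mismatch -1, linear gap cost -2) is an arbitrary assumption, only the score is computed (no traceback), and the heuristics of FASTA and BLAST are not shown.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b with linear gap costs."""
    prev = [0] * (len(b) + 1)  # row i-1 of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 lets a local alignment start afresh at any cell.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best

print(smith_waterman("GGACGTGG", "TTACGTTT"))  # prints 8 (the shared ACGT)
```

Keeping only two matrix rows gives O(|b|) memory; a traceback for the actual alignment would require the full matrix or a divide-and-conquer scheme.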
A few papers on EST clustering and splicing graphs:
A very good review of NGS transcriptomics (RNA-seq):
Examples of software supporting RNA-seq:
Just a few in-house papers, because they are so nice:
A couple of papers and one book:
The most classical algorithm for RNA secondary structure prediction is Nussinov's algorithm:
A survey of various techniques for two-dimensional gel alignment is . The method described in the lecture is from , .
The algorithms discussed include de novo protein sequencing from mass spectra [1,2], the money changing problem, and the alignment of time series of mass spectra.
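The money changing problem asks in how many ways (or whether at all) an integer amount can be written as a sum of given coin values. In mass spectrometry the coins are integer (for example nominal) residue masses, and the amount is a measured mass. The Python sketch below counts decompositions with a textbook dynamic program; it is an illustration under the assumption of integer masses, not necessarily the method presented in the lecture, and the function name is hypothetical.

```python
def count_decompositions(target: int, masses: list[int]) -> int:
    """Number of multisets of the given integer masses that sum to target."""
    ways = [0] * (target + 1)
    ways[0] = 1                       # the empty decomposition
    for m in masses:                  # coins in the outer loop: order does not matter
        for total in range(m, target + 1):
            ways[total] += ways[total - m]
    return ways[target]

# Nominal residue masses of glycine (57) and alanine (71).
print(count_decompositions(128, [57, 71]))  # prints 1 (57 + 71)
```

For a real spectrum one would use all amino acid masses and typically a finer mass resolution; 128 is, for instance, also the nominal residue mass of glutamine and of lysine, so a single mass can be ambiguous.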
Some software developed in Bielefeld has been published here:
A textbook that covers this (and much more) is:
Some relevant papers in this area are:
The classical papers are by Hannenhalli and Pevzner (1999 and 1995). A much more readable description of the reversal distance can be found in the book chapter by Bergeron et al. (2005). The general DCJ model is described in (Yancopoulos et al., 2005) and (much nicer) in (Bergeron et al., 2006a). A correct algorithm for sorting by translocations is given in (Bergeron et al., 2006b), an almost correct one for the HP distance in (Bergeron et al., 2009), one formula of which is corrected in (Erdős et al., 2011). A very good overview of median, halving and guided halving results can be found in (Tannier et al., 2009).
The following are the algorithmic papers in this area. Apart from that, many papers on applications of gene clusters and statistical properties exist, but are not listed here.
A great overview of the combinatorial problems and algorithms is given in the following book chapter:
A more recent paper on the topic is:
Here are a few of the more algorithmic papers on the topic, but there exist several more. You may look up the references in this one.
Papers on tools for the analysis of metagenomics data: