====== Preliminary Discussion ======
===== Discussions on the new Rose =====
==== What type of program do we want ? ====
- **Rose 2.0 (+ Rearrangements + parameter estimation)**
- Game
* Educational
* Entertaining
Rearrangements + parameter estimation
==== Closer look at Rose 2.0 ====
* Tree vs. DAG
* Genome data structure including meta data
* Evolution simulator
* Individuals vs. species
* Operations:
* Indels
* Substitutions
* Rearrangements
* Horizontal gene transfer
* Duplications
And/or: a grammar, or something else ...
===== Considerations and feature requests for the new Rose version =====
* A nice user interface that helps with setting up a configuration file
* A quality ("niceness") score for sequences, to serve as a fitness function and to pass //abilities// on to the child generation.
What //dawg// has and Rose hasn't:
* General time-reversible (GTR) model
* Model for Recombination
* Indel parameter estimation
* Poisson process as model for indel formations
* Consideration of Indels overlapping at sequence ends
* Alignment algorithm
* In Rose, the mean sequence length always grows after many indel operations
* Substitution model in Rose: minimum branch length can't get smaller than 1 PAM
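The "Poisson process as model for indel formation" item above can be illustrated with a minimal sketch. It assumes a homogeneous process with a single rate per branch; the class name, method name and the ''rate'' parameter are hypothetical and are not taken from //dawg// or Rose.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch: indel events on one branch as a homogeneous Poisson process. */
class IndelPoissonSketch {

    /**
     * Returns the times (in branch-length units) at which indel events occur.
     * Waiting times between events are exponential with the given rate, so the
     * number of events on the branch is Poisson(rate * branchLength) distributed.
     */
    static List<Double> sampleEventTimes(double rate, double branchLength, Random rng) {
        List<Double> times = new ArrayList<>();
        if (rate <= 0.0) {
            return times; // no events without a positive rate
        }
        double t = 0.0;
        while (true) {
            // Inverse-transform sampling of an Exp(rate) waiting time.
            t += -Math.log(1.0 - rng.nextDouble()) / rate;
            if (t > branchLength) {
                break;
            }
            times.add(t);
        }
        return times;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        List<Double> events = sampleEventTimes(0.5, 10.0, rng);
        System.out.println(events.size() + " indel events at " + events);
    }
}
```

Because the event times are continuous, this kind of sampler works for arbitrarily short branches; whether an event is an insertion or a deletion, and its length, would have to be drawn separately.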
What //iSG// has:
* Generate root sequence out of a multiple alignment
What //GSIMULATOR/SIMGRAM/SIMGENOME// has:
* a model!
* //GSIMULATOR//: transducer-based simulator supplying substitution, indels and transducer mutations along a phylogenetic tree
* //SIMGRAM//: samples data using phylo-grammars. Uses //XRATE// for parameter estimation
* //SIMGENOME//: combines //SIMGRAM// and //GSIMULATOR//, can model protein-coding genes, non-coding genes, pseudogenes, transposons, conserved elements, microsatellites
What does the //infinite sites model// cover?
* two breakpoint rearrangement
* deletion/insertion as special cases of two breakpoint rearrangements
* three breakpoint rearrangement
* duplication
* speciation
The //infinite sites model// treats chromosomes either as continuous intervals or as continuous circles, which are divided into sites. No breakpoint is ever reused. The model can be transferred to a //finite sites model// with some special characteristics.
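As an illustration of a two-breakpoint rearrangement, the sketch below applies an inversion to a linear chromosome. It is a discrete simplification made up for this example: the chromosome is a signed gene order stored in an ''int[]'', not a continuous interval as in the infinite sites model, and the class and method names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch: an inversion as a two-breakpoint rearrangement on a signed gene order. */
class TwoBreakpointSketch {

    /**
     * Cuts the linear chromosome before index i and before index j (the two breakpoints)
     * and re-inserts the segment [i, j) in reverse order with flipped signs (orientation).
     * The input array is left unchanged.
     */
    static int[] invert(int[] genes, int i, int j) {
        if (i < 0 || j > genes.length || i > j) {
            throw new IllegalArgumentException("breakpoints must satisfy 0 <= i <= j <= length");
        }
        int[] result = Arrays.copyOf(genes, genes.length);
        for (int k = i; k < j; k++) {
            result[k] = -genes[i + j - 1 - k];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] chromosome = {1, 2, 3, 4, 5};
        // prints [1, -4, -3, -2, 5]
        System.out.println(Arrays.toString(invert(chromosome, 1, 4)));
    }
}
```

Applying the same inversion twice restores the original order, which makes the operation easy to check in a unit test.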
===== Requirements =====
* fixed default parameters
* parameter file (created by a wizard)
* we can use an existing tree format
* DAG nice to have
* no wasteful data structure
* modular & extendable
* genome annotatable (Intron, Exon, Tetramer, ...)
* annotation per clicky-clicky possible
* output:
* per leaf: sequence
* per block: multiple alignment
* per edge: operations
* order of the blocks
* colorful point-and-click GUI
* interactive explorer nice to have
Meaning of edge lengths: not as in Rose 1 (edge length 1 = mutation probability of 1%)
use of a Markov chain
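One possible reading of the two notes above, offered only as an assumption: if substitutions follow a continuous-time Markov chain with rate matrix $Q$ (a symbol introduced here, not in the notes), the substitution probabilities along an edge of length $t$ are given by a matrix exponential. The edge length $t$ may then be any non-negative real number and is no longer bounded below by 1 PAM.

```latex
P(t) = e^{Qt} = \sum_{k=0}^{\infty} \frac{(Qt)^k}{k!}, \qquad P(s+t) = P(s)\,P(t)
```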
===== Input Parameters =====
==== ROSE ====
- Alphabet $\Sigma,\ |\Sigma|=\ell$
- root sequence //**s**//
* OR
average sequence length //**n**//
character frequencies $f=(f_1,\ldots,f_\ell)$
mutation guide tree //**T**// (edge length, standard: 1)
* OR
sequence distance $d_{AV}$ (generate binary //**T**// over average pairwise sequence distance)
mutation matrix $M,\ \ell\times\ell$ (pairwise mutation frequencies for substitutions)
insertion / deletion probability functions
* $p_{ins}$ / $p_{del}$: probabilities; $\ell_{ins}$ / $\ell_{del}$: indel lengths
mutation probability likelihood vector $\nu,\ |\nu|=n$ (specify sequence motifs)
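When no root sequence //**s**// is given, one can be drawn from the alphabet, the length //**n**// and the character frequencies. The sketch below is a simplified assumption of how this might look: it treats //**n**// as an exact length rather than an average, and the class and method names are hypothetical, not Rose's actual code.

```java
import java.util.Random;

/** Hypothetical sketch: draw a root sequence of length n from character frequencies f. */
class RootSequenceSketch {

    /**
     * Samples each position independently: character alphabet[c] is chosen
     * with probability freqs[c] (the frequencies are assumed to sum to 1).
     */
    static String sampleRoot(char[] alphabet, double[] freqs, int n, Random rng) {
        if (alphabet.length == 0 || alphabet.length != freqs.length) {
            throw new IllegalArgumentException("need one frequency per alphabet character");
        }
        StringBuilder sb = new StringBuilder(n);
        for (int pos = 0; pos < n; pos++) {
            double u = rng.nextDouble();
            double cumulative = 0.0;
            int chosen = alphabet.length - 1; // fallback guards against rounding error
            for (int c = 0; c < alphabet.length; c++) {
                cumulative += freqs[c];
                if (u < cumulative) {
                    chosen = c;
                    break;
                }
            }
            sb.append(alphabet[chosen]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        char[] dna = {'A', 'C', 'G', 'T'};
        double[] f = {0.3, 0.2, 0.2, 0.3};
        System.out.println(sampleRoot(dna, f, 60, new Random(7)));
    }
}
```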
===== Group Meeting 19.05.2009 =====
Groups as follows:
| **Input** | **Tree/Evolution** | **Grammar** | **Output** |
| Marvin \\ Rolf \\ Stefan | Christoph \\ Eyla \\ Konstantin | Daniel \\ Kai \\ Madis | Marvin \\ Rolf \\ Stefan |
==== Input/Output Group ====
* Marvin: Tree parser (Roland/Pina)
* Rolf: Wizard page (swing labs wizard), branches & threading possible
* Out: AGCT, AGCU, RNA, gene order
* convert sequences
==== Grammar Group ====
2 possibilities:
- Sequence -> Annotation (Intron/Exon) **parameter estimation**
- Evolution (parameters given)
Problems:
* haskell -> java: no SCFG (stochastic context free grammars)
* haskell -> java2: still under construction (Georg)
Alternative: Markov Models/chains (HMM)
==== Datastructure/Evolution Group ====
* Memory:
* Sequence vs. Operations + root
* bases, affiliation to regions (eg introns, exons, telomeres, CDS, open reading frame ...)
Circular vs. Linear
Edge length: discrete vs. continuous
==== Interfaces / Interactions between groups ====
| **Input** | **Tree** | **Grammar** | **Output** | **Input->Interior** |
|genomes ? | | | | |
|root sequence **OR** \\ length | X | X | root sequence ?| |
|annotation | X | X | | frequencies, sequences WATCH: copies!!|
|Newick tree **OR** \\ #species | X | X | Newick tree ?| Roland-Tree|
|character frequencies | Transitions | (H)MM | | Matrix |
=== Data structures ===
* How: interfaces (pass everything as Java objects, e.g. "Genome", "Sequence", ... with getter & setter methods)
* Sequence: Genom.getChromosome.getSequence returns a String, of type either DNA or AA
* Proteins consist of domains. The AA container should contain a list of sequences and an array of annotation intervals for each sequence. The sequence interface should be inherited by the protein sequence and the DNA sequence, which go into the sequence container
* (Sequence has: alphabet, String (the sequence), annotation or hash)
Newick tree: ask Roland about the format and pass it on already parsed as a "tree object" - Input - Output - Working environment:
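The data structure notes above could translate into Java roughly as follows. This is only a sketch under assumptions: all type names (''Sequence'', ''AbstractSequence'', ''DnaSequence'', ''ProteinSequence'', ''AnnotationInterval'') and the choice of alphabets are hypothetical and were not agreed on in the meeting.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Annotated half-open interval [start, end) on a sequence, e.g. an exon. */
final class AnnotationInterval {
    final int start;
    final int end;
    final String label;

    AnnotationInterval(int start, int end, String label) {
        this.start = start;
        this.end = end;
        this.label = label;
    }
}

/** Common view on DNA and protein sequences. */
interface Sequence {
    String getAlphabet();
    String getSequence();
    List<AnnotationInterval> getAnnotations();
}

/** Shared implementation: alphabet, residue string and annotation intervals. */
abstract class AbstractSequence implements Sequence {
    private final String alphabet;
    private final String residues;
    private final List<AnnotationInterval> annotations = new ArrayList<>();

    AbstractSequence(String alphabet, String residues) {
        for (char c : residues.toCharArray()) {
            if (alphabet.indexOf(c) < 0) {
                throw new IllegalArgumentException("symbol not in alphabet: " + c);
            }
        }
        this.alphabet = alphabet;
        this.residues = residues;
    }

    public String getAlphabet() { return alphabet; }
    public String getSequence() { return residues; }
    public List<AnnotationInterval> getAnnotations() {
        return Collections.unmodifiableList(annotations);
    }

    /** Attaches a labelled interval after checking that it lies inside the sequence. */
    void annotate(int start, int end, String label) {
        if (start < 0 || end > residues.length() || start > end) {
            throw new IllegalArgumentException("interval outside sequence");
        }
        annotations.add(new AnnotationInterval(start, end, label));
    }
}

final class DnaSequence extends AbstractSequence {
    DnaSequence(String residues) { super("ACGT", residues); }
}

final class ProteinSequence extends AbstractSequence {
    ProteinSequence(String residues) { super("ACDEFGHIKLMNPQRSTVWY", residues); }
}

class DataStructureSketch {
    public static void main(String[] args) {
        DnaSequence dna = new DnaSequence("ATGGCCTAA");
        dna.annotate(0, 9, "CDS");
        System.out.println(dna.getSequence() + " has " + dna.getAnnotations().size() + " annotation(s)");
    }
}
```

Handing out the annotation list as an unmodifiable view is one way to address the "WATCH: copies!!" concern from the interface table without duplicating the data.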
==== Conventions ====
* The first group sets the style
* ENGLISH!!
* Checkstyle
* Eclipse
* No commit before update and ALWAYS runnable
* .... and more ....
==== Open Questions ====
Apart from sequences in the input: Genomes (i.e. linear, circular chromosomes) ?
==== Meeting 4.6.09 ====
{{ 656Tafelbild_090604.jpg | Tafelbild_090604.jpg }}