====== Preliminary Discussion ======


===== Discussions on the new Rose =====


==== What type of program do we want? ====

  - **Rose 2.0 (+ Rearrangements + parameter estimation)** 
  - Game 
    * Educational 
    * Entertaining 

    Rearrangements + parameter estimation 


==== Closer look at Rose 2.0 ====

  * Tree vs. DAG 
  * Genome data structure including meta data 
  * Evolution simulator 
  * <del>Individuals</del> vs. species 
  * Operations: 
    * Indels 
    * Substitutions 
    * Rearrangements 
    * Horizontal gene transfer 
    * Duplications 

    and/or: Grammar or else ... 


===== Considerations and feature requests for the new Rose version =====

  * Nice User Interface which helps setting up a configuration file 
  * A quality ("niceness") score for sequences, usable as a fitness function and to pass //abilities// on to the child generation. 
What //dawg// has and Rose hasn't:  


  * General time-reversible model 
  * Model for Recombination  
  * Indel parameter estimation 
  * Poisson process as model for indel formations 
  * Consideration of Indels overlapping at sequence ends 
  * Alignment algorithm 
  * In Rose, the mean sequence length always grows after many indel operations 
  * Substitution model in Rose: minimum branch length can't get smaller than 1 PAM 
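
The Poisson-process idea for indel formation listed above can be sketched as follows. This is only an illustration under assumptions, not the way //dawg// implements it: the class and method names, the per-site rate parameter and the choice of Knuth's sampling method are all hypothetical.

```java
import java.util.Random;

/** Hypothetical sketch: number of indel events on one branch as a Poisson draw. */
final class PoissonIndelSketch {

    /** Draws from Poisson(mean) with Knuth's multiplication method (suitable for small means). */
    static int samplePoisson(double mean, Random rng) {
        if (mean < 0) {
            throw new IllegalArgumentException("mean must be >= 0");
        }
        double limit = Math.exp(-mean);
        double product = rng.nextDouble();
        int count = 0;
        // Multiply uniform draws until the product falls to or below e^(-mean).
        while (product > limit) {
            count++;
            product *= rng.nextDouble();
        }
        return count;
    }

    /** Assumed model: expected event count = rate per site * sequence length * branch length. */
    static int indelEventsOnBranch(double ratePerSite, int sequenceLength,
                                   double branchLength, Random rng) {
        return samplePoisson(ratePerSite * sequenceLength * branchLength, rng);
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        System.out.println(indelEventsOnBranch(0.01, 500, 0.2, rng));
    }
}
```

Because the event count scales with the branch length, arbitrarily short branches simply yield few or no events, which avoids a fixed minimum step size.
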
What //iSG// has:  


  * Generate root sequence out of a multiple alignment 
What //GSIMULATOR/SIMGRAM/SIMGENOME// has 


  * a model! 
  * //GSIMULATOR//: transducer-based simulator supplying substitution, indels and transducer mutations along a phylogenetic tree 
  * //SIMGRAM//: samples data using phylo-grammars. Uses //XRATE// for parameter estimation 
  * //SIMGENOME//: combines //SIMGRAM// and //GSIMULATOR//, can model protein-coding genes, non-coding genes, pseudogenes, transposons, conserved elements, microsatellites 
What covers the //infinite sites model//? 


  * two breakpoint rearrangement 
  * deletion/insertion as special cases of two breakpoint rearrangements 
  * three breakpoint rearrangement 
  * duplication 
  * speciation 
The //infinite sites model// treats chromosomes either as continuous intervals or continuous circles, which are divided into sites. No breakpoints are reused. The model can be transferred to a //finite sites model// with some special characteristics. 
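
As an illustration of a two-breakpoint rearrangement, an inversion can be sketched on a linear chromosome. The sketch makes simplifying assumptions: the chromosome is a finite list of signed markers rather than a continuous interval, the rule that breakpoints are never reused is not enforced, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: inversion as one kind of two-breakpoint rearrangement. */
final class InversionSketch {

    /**
     * Cuts the chromosome before index 'from' and before index 'to',
     * then rejoins it with the segment [from, to) reversed and every sign flipped.
     * The input list is left unchanged.
     */
    static List<Integer> invert(List<Integer> chromosome, int from, int to) {
        if (from < 0 || to > chromosome.size() || from > to) {
            throw new IllegalArgumentException("invalid breakpoints");
        }
        List<Integer> result = new ArrayList<>(chromosome);
        List<Integer> segment = result.subList(from, to); // view backed by 'result'
        Collections.reverse(segment);
        segment.replaceAll(marker -> -marker);
        return result;
    }

    public static void main(String[] args) {
        List<Integer> chromosome = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(invert(chromosome, 1, 4)); // prints [1, -4, -3, -2, 5]
    }
}
```

Applying the same inversion twice restores the original marker order, which makes the operation easy to check.
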


===== Requirements =====

  * fixed default parameters 
  * parameter file (created by a wizard) 
  * we can use an existing tree format 
  * DAG nice to have 
  * no wasteful data structure 
  * modular & extendable 
  * genome annotatable (Intron, Exon, Tetramer, ...) 
  * annotation by point-and-click possible 
  * output: 
    * per leaf: sequence 
    * per block: multiple alignment 
    * per edge: operations 
    * order of the blocks 
    * colorful, clickable GUI 
    * interactive explorer nice to have 

    Meaning of edge lengths: not as in Rose 1 (edge length 1 = mutation probability 1%) 
    use of a Markov chain 


===== Input Parameters =====


==== ROSE ====

  - Alphabet $\Sigma$ with $|\Sigma| = \ell$ 
  - root sequence //**s**// **OR** average sequence length //**n**// 
  - character frequencies $f = (f_1, \ldots, f_\ell)$ 
  - mutation guide tree //**T**// (edge length, standard: 1) **OR** sequence distance $d_{AV}$ (generate binary //**T**// over average pairwise sequence distance) 
  - mutation matrix $M$ of size $\ell \times \ell$ (pairwise mutation frequencies for substitutions) 
  - insertion / deletion probability functions 
    * $p_{ins}$ / $p_{del}$: probabilities 
    * $\ell_{ins}$ / $\ell_{del}$: indel lengths 
  - mutation <del>probability</del> likelihood vector $\nu$ with $|\nu| = n$ (specify sequence motifs) 
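
When no root sequence is supplied, one can be drawn from the alphabet, the character frequencies and the sequence length. The following is a minimal sketch of that step, assuming the length is used as a fixed value rather than as an average. The class and method names are hypothetical and do not reflect the actual Rose code.

```java
import java.util.Random;

/** Hypothetical sketch: draw a random root sequence from character frequencies. */
final class RootSequenceSketch {

    /**
     * Samples 'length' characters independently; character k is chosen with
     * probability frequencies[k] / sum(frequencies).
     */
    static String sampleRootSequence(char[] alphabet, double[] frequencies,
                                     int length, Random rng) {
        if (alphabet.length == 0 || alphabet.length != frequencies.length) {
            throw new IllegalArgumentException("alphabet and frequencies must match");
        }
        double total = 0.0;
        for (double f : frequencies) {
            total += f;
        }
        StringBuilder sequence = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            double u = rng.nextDouble() * total;
            int k = 0;
            double cumulative = frequencies[0];
            // Walk the cumulative distribution until it exceeds u.
            while (k < alphabet.length - 1 && u >= cumulative) {
                k++;
                cumulative += frequencies[k];
            }
            sequence.append(alphabet[k]);
        }
        return sequence.toString();
    }

    public static void main(String[] args) {
        char[] dna = {'A', 'C', 'G', 'T'};
        double[] f = {0.3, 0.2, 0.2, 0.3};
        System.out.println(sampleRootSequence(dna, f, 20, new Random(7)));
    }
}
```
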


===== Group Meeting 19.05.2009 =====
Groups as follows: 


| \\  \\ **Input** \\ Marvin \\ Rolf \\ Stefan | **Tree/Evolution** \\ Christoph \\ Eyla \\ Konstantin | \\  \\ **Output** \\ Marvin \\ Rolf \\ Stefan |
| |
| **Grammar** \\ Daniel \\ Kai \\ Madis |

==== Input/Output Group ====

  * Marvin: Tree parser (Roland/Pina) 
  * Rolf: Wizard page (SwingLabs wizard), branches & threading possible 
  * Out: AGCT, AGCU, RNA, gene order 
  * convert sequences 


==== Grammar Group ====
2 possibilities: 


  - Sequence -> Annotation (Intron/Exon) **parameter estimation** 
  - Evolution (parameters given) 
Problems: 


  * Haskell -> Java: no SCFG (stochastic context-free grammars) 
  * Haskell -> Java2: still under construction (Georg) 
Alternative: Markov Models/chains (HMM) 


==== Datastructure/Evolution Group ====

  * Memory:  
    * Sequence vs. Operations + root 
    * bases, affiliation to regions (e.g. introns, exons, telomeres, CDS, open reading frame ...) 

    Circular vs. Linear 
    Edge length: discrete vs. continuous 
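
The memory trade-off "Sequence vs. Operations + root" can be made concrete with a small sketch: only the root sequence and the operations on each edge are stored, and a node's sequence is rebuilt by replaying the operations along the path from the root. All names are hypothetical, and only substitutions and indels on a plain String are covered.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: store root + per-edge operations, rebuild sequences on demand. */
final class EdgeOperationsSketch {

    /** One mutation event that turns a sequence into its successor. */
    interface Operation {
        String apply(String sequence);
    }

    static Operation substitution(int position, char replacement) {
        return s -> s.substring(0, position) + replacement + s.substring(position + 1);
    }

    static Operation insertion(int position, String inserted) {
        return s -> s.substring(0, position) + inserted + s.substring(position);
    }

    static Operation deletion(int position, int length) {
        return s -> s.substring(0, position) + s.substring(position + length);
    }

    /** Replays the operations of every edge on the path from the root, in order. */
    static String replay(String root, List<List<Operation>> edgesOnPath) {
        String current = root;
        for (List<Operation> edge : edgesOnPath) {
            for (Operation operation : edge) {
                current = operation.apply(current);
            }
        }
        return current;
    }

    public static void main(String[] args) {
        List<List<Operation>> path = Arrays.asList(
                Arrays.asList(substitution(0, 'T'), insertion(2, "AA")),
                Arrays.asList(deletion(1, 2)));
        System.out.println(replay("ACGT", path)); // prints TAGT
    }
}
```

Storing operations saves memory for long sequences with few changes per edge, at the cost of recomputing sequences whenever they are needed. Storing full sequences at every node is the opposite trade-off.
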


==== Interfaces / Interactions between groups ====

| **Input** | **Tree** | **Grammar** | **Output** | **Input->Interior** |
|genomes ? | | | | |
|root sequence **OR** \\ length | X | X | root sequence ?| |
|annotation | X | X | | frequencies, sequences WATCH: copies!!|
|Newick tree **OR** \\ #species | X | X | Newick tree ?| Roland-Tree|
|character frequencies | Transitions | (H)MM | | Matrix |

=== Data structures ===

  * How: interfaces (pass everything as Java objects, e.g. "Genom" (genome), "Sequenz" (sequence), ... with getter and setter methods) 
    * Sequence: Genom.getChromosome.getSequence returns a String of type either DNA or AA 
      * Proteins consist of domains/AA. The container should hold a list of sequences and an array of annotation intervals for each sequence. The sequence interface should be inherited by the protein sequence and the DNA sequence, which go into the sequence container 
        * (A sequence has: alphabet, String (sequence), annotation or hash) 

Newick tree: ask Roland about the format and hand it over already parsed as a "tree object". 

  - Input 
  - Output 
  - Working environment: 
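
The interface notes above could translate into Java roughly as follows. This is a sketch under assumptions, not an agreed design: every type and method name is hypothetical, annotations are kept as a list of intervals per sequence, and the alphabets are plain strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical annotation interval [start, end) with a label such as "Exon". */
final class AnnotationInterval {
    private final int start;
    private final int end;
    private final String label;

    AnnotationInterval(int start, int end, String label) {
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("invalid interval");
        }
        this.start = start;
        this.end = end;
        this.label = label;
    }

    int getStart() { return start; }
    int getEnd() { return end; }
    String getLabel() { return label; }
}

/** Common view on DNA and protein sequences. */
interface Sequence {
    String getAlphabet();
    String getResidues();
    List<AnnotationInterval> getAnnotations();
    void annotate(AnnotationInterval interval);
}

/** Shared implementation: checks residues against the alphabet and stores annotations. */
abstract class AbstractSequence implements Sequence {
    private final String alphabet;
    private final String residues;
    private final List<AnnotationInterval> annotations = new ArrayList<>();

    AbstractSequence(String alphabet, String residues) {
        for (char c : residues.toCharArray()) {
            if (alphabet.indexOf(c) < 0) {
                throw new IllegalArgumentException("character not in alphabet: " + c);
            }
        }
        this.alphabet = alphabet;
        this.residues = residues;
    }

    public String getAlphabet() { return alphabet; }
    public String getResidues() { return residues; }
    public List<AnnotationInterval> getAnnotations() {
        return Collections.unmodifiableList(annotations);
    }
    public void annotate(AnnotationInterval interval) {
        if (interval.getEnd() > residues.length()) {
            throw new IllegalArgumentException("interval exceeds sequence length");
        }
        annotations.add(interval);
    }
}

final class DnaSequence extends AbstractSequence {
    DnaSequence(String residues) { super("ACGT", residues); }
}

final class ProteinSequence extends AbstractSequence {
    ProteinSequence(String residues) { super("ACDEFGHIKLMNPQRSTVWY", residues); }
}

/** Holds sequences of either type behind the common interface. */
final class SequenceContainer {
    private final List<Sequence> sequences = new ArrayList<>();

    void add(Sequence sequence) { sequences.add(sequence); }
    Sequence get(int index) { return sequences.get(index); }
    int size() { return sequences.size(); }
}
```

A caller would create a `DnaSequence`, attach intervals with `annotate`, and place it in a `SequenceContainer` next to protein sequences, so downstream groups depend only on the `Sequence` interface.
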


==== Conventions ====

  * First group sets the style 
  * ENGLISH!! 
  * Checkstyle 
  * Eclipse 
  * No commit before update and ALWAYS runnable 
  * .... and more .... 


==== Open Questions ====
Apart from sequences in the input: genomes (i.e. linear or circular chromosomes)? 


==== Meeting 4.6.09 ====
{{ 656Tafelbild_090604.jpg | Tafelbild_090604.jpg }}