====== Preliminary Discussion ======

===== Discussions on the new Rose =====

==== What type of program do we want? ====

  - **Rose 2.0 (+ rearrangements + parameter estimation)**
  - Game
    * Educational
    * Entertaining

Rearrangements + parameter estimation

==== Closer look at Rose 2.0 ====

  * Tree vs. DAG
  * Genome data structure including meta data
  * Evolution simulator
  * Individuals vs. species
  * Operations:
    * Indels
    * Substitutions
    * Rearrangements
    * Horizontal gene transfer
    * Duplications

and/or: Grammar or else ...

===== Considerations and feature requests for the new Rose version =====

  * Nice user interface that helps with setting up a configuration file
  * Some niceness score for the quality of sequences, to be used as a fitness function and to pass //abilities// on to the child generation

What //dawg// has and Rose hasn't:

  * General time-reversible (GTR) model
  * Model for recombination
  * Indel parameter estimation
  * Poisson process as model for indel formation
  * Consideration of indels overlapping the sequence ends
  * Alignment algorithm
  * In Rose, the mean sequence length always grows after many indel operations
  * Substitution model in Rose: the minimum branch length cannot be smaller than 1 PAM

What //iSG// has:

  * Generates the root sequence from a multiple alignment

What //GSIMULATOR/SIMGRAM/SIMGENOME// have:

  * a model!
  * //GSIMULATOR//: transducer-based simulator supplying substitutions, indels and transducer mutations along a phylogenetic tree
  * //SIMGRAM//: samples data using phylo-grammars. Uses //XRATE// for parameter estimation
  * //SIMGENOME//: combines //SIMGRAM// and //GSIMULATOR//; can model protein-coding genes, non-coding genes, pseudogenes, transposons, conserved elements, microsatellites

What does the //infinite sites model// cover?
  * two-breakpoint rearrangement
  * deletion/insertion as special cases of two-breakpoint rearrangements
  * three-breakpoint rearrangement
  * duplication
  * speciation

The //infinite sites model// treats chromosomes either as continuous intervals or as continuous circles, which are divided into sites. No breakpoint is reused. The model can be transferred to a //finite sites model// with some special characteristics.

===== Requirements =====

  * fixed default parameters
  * parameter file (created by a wizard)
  * we can use an existing tree format
  * DAG nice to have
  * no wasteful data structure
  * modular & extendable
  * genome annotatable (intron, exon, tetramer, ...)
  * annotation by point-and-click possible
  * output:
    * per leaf: sequence
    * per block: multiple alignment
    * per edge: operations
    * order of the blocks
    * colorful, clickable GUI
  * interactive explorer nice to have

Meaning of edge lengths: not as in Rose 1 (edge length 1 = mutation probability 1%); use of a Markov chain

===== Input Parameters =====

==== ROSE ====

  - Alphabet $\Sigma,\ \ |\Sigma|=\ell$
  - root sequence //**s**//
    * OR average sequence length //**n**//
  - character frequencies $f=(f_1,\ldots,f_\ell)$
  - mutation guide tree //**T**// (edge length, standard: 1)
    * OR sequence distance $d_{AV}$ (generate binary //**T**// over average pairwise sequence distance)
  - mutation matrix $M,\ \ \ell\times\ell$ (pairwise mutation frequencies for substitutions)
  - insertion / deletion probability functions
    * $\begin{array}{lclll}p_{ins}&/&p_{del}&&\mbox{probabilities}\\ \ell_{ins}&/&\ell_{del}&&\mbox{indel lengths}\end{array}$
  - mutation probability likelihood vector $\nu,\ \ |\nu|=n$ (specify sequence motifs)

===== Group Meeting 19.05.2009 =====

Groups as follows:

| \\ \\ **Input** \\ Marvin \\ Rolf \\ Stefan | **Tree/Evolution** \\ Christoph \\ Eyla \\ Konstantin | \\ \\ **Output** \\ Marvin \\ Rolf \\ Stefan |
| | | **Grammar** \\ Daniel \\ Kai \\ Madis |

==== Input/Output Group ====

  * Marvin: Tree parser (Roland/Pina)
  * Rolf: Wizard page (swing labs wizard), branches & threading possible
  * Out: AGCT, AGCU, RNA, gene order
  * convert sequences

==== Grammar Group ====

2 possibilities:

  - Sequence -> Annotation (Intron/Exon) **parameter estimation**
  - Evolution (parameters given)

Problems:

  * haskell -> java: no SCFG (stochastic context-free grammars)
  * haskell -> java2: still under construction (Georg)

Alternative: Markov models/chains (HMM)

==== Datastructure/Evolution Group ====

  * Memory:
    * Sequence vs. Operations + root
  * bases, affiliation to regions (e.g. introns, exons, telomeres, CDS, open reading frame ...)

Circular vs. linear

Edge length: discrete vs. continuous

==== Interfaces / Interactions between groups ====

| **Input** | **Tree** | **Grammar** | **Output** | **Input->Interior** |
| genomes ? | | | | |
| root sequence **OR** \\ length | X | X | root sequence ? | |
| annotation | X | X | | frequencies, sequences WATCH: copies!! |
| Newick tree **OR** \\ #species | X | X | Newick tree ? | Roland-Tree |
| character frequencies | Transitions | (H)MM | | Matrix |

=== Data structures ===

  * How: interfaces (pass everything as Java objects, e.g. "Genom", "Sequenz", ... with getter & setter methods)
  * Sequence: Genom.getChromosome.getSequence = returns a String, of type either DNA or AA
  * Proteins consist of domains/AA

The container should hold: a list of sequences & an array of annotation intervals for each sequence

The sequence interface should be inherited by the protein sequence & DNA sequence classes, which go into the sequence container

  * (A sequence has: alphabet, String (the sequence), annotation or hash)

Newick tree: ask Roland about the format & pass it on already parsed as a "tree object"

  - Input
  - Output
  - Working environment:

==== Conventions ====

  * The first group sets the style
  * ENGLISH!!
  * Checkstyle
  * Eclipse
  * No commit before update and ALWAYS runnable
  * .... and more ....

==== Open Questions ====

Apart from sequences in the input: genomes (i.e. linear, circular chromosomes)?

==== Meeting 4.6.09 ====

{{ 656Tafelbild_090604.jpg | Tafelbild_090604.jpg }}
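
=== Sketch: sequence interface and container ===

The notes under "Data structures" describe a sequence interface inherited by a DNA and a protein sequence class, plus a container holding the sequences together with their annotation intervals. A minimal Java sketch of that idea might look as follows. Every name in it (''Alphabet'', ''AnnotationInterval'', ''Sequence'', ''AbstractSequence'', ''DNASequence'', ''ProteinSequence'', ''SequenceContainer'', ''annotate'') is a hypothetical placeholder chosen for illustration, not an agreed interface, and storing the annotations inside each sequence is only one of the options the notes leave open.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical alphabet tag; the notes only say "DNA or AA". */
enum Alphabet { DNA, AA }

/** Hypothetical half-open annotation interval [start, end) with a label such as "exon". */
final class AnnotationInterval {
    final int start;
    final int end;
    final String label;

    AnnotationInterval(int start, int end, String label) {
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("invalid interval");
        }
        this.start = start;
        this.end = end;
        this.label = label;
    }
}

/** Assumed common interface: alphabet, sequence string, annotations. */
interface Sequence {
    Alphabet getAlphabet();
    String getSequence();
    List<AnnotationInterval> getAnnotations();
}

/** Shared storage for both sequence types (an assumption, not from the notes). */
abstract class AbstractSequence implements Sequence {
    private final String residues;
    private final List<AnnotationInterval> annotations = new ArrayList<>();

    AbstractSequence(String residues) {
        this.residues = residues;
    }

    @Override
    public String getSequence() {
        return residues;
    }

    @Override
    public List<AnnotationInterval> getAnnotations() {
        // Read-only view so callers cannot bypass the bounds check in annotate().
        return Collections.unmodifiableList(annotations);
    }

    /** Adds an annotation; rejects intervals that run past the sequence end. */
    void annotate(int start, int end, String label) {
        if (end > residues.length()) {
            throw new IllegalArgumentException("interval exceeds sequence length");
        }
        annotations.add(new AnnotationInterval(start, end, label));
    }
}

final class DNASequence extends AbstractSequence {
    DNASequence(String residues) {
        super(residues);
    }

    @Override
    public Alphabet getAlphabet() {
        return Alphabet.DNA;
    }
}

final class ProteinSequence extends AbstractSequence {
    ProteinSequence(String residues) {
        super(residues);
    }

    @Override
    public Alphabet getAlphabet() {
        return Alphabet.AA;
    }
}

/** Container holding a list of sequences, each carrying its own annotations. */
final class SequenceContainer {
    private final List<Sequence> sequences = new ArrayList<>();

    void add(Sequence sequence) {
        sequences.add(sequence);
    }

    List<Sequence> getSequences() {
        return Collections.unmodifiableList(sequences);
    }
}
```

In this sketch a caller would create a ''DNASequence'', mark regions with ''annotate(start, end, label)'', and hand the ''SequenceContainer'' to the tree or grammar group. Because ''getSequence()'' returns an immutable ''String'', the "WATCH: copies!!" concern from the interface table is limited to the annotation lists, which are only exposed as read-only views.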