12. Ancestral reconstruction

Literature:
- Duchemin, Wandrille, et al. "DeCoSTAR: Reconstructing the ancestral organization of genes or genomes using reconciled phylogenies." Genome Biology and Evolution 9.5 (2017): 1312-1319.

12.1 Small Parsimony Problem

The Small Parsimony Problem is similar to the Multiple Genome Rearrangement Problem, except that the species tree is given:

Problem 12.1 (Small Parsimony Problem): Given a tree T, extant genomes at its leaves, and a measure of genome rearrangements d (DCJ, reversals, breakpoint, ...), find for each internal vertex v of T a genome G_v such that the measure D(T) = \sum_{(u,v) \in E(T)} d(G_u,G_v) is minimized.

Solving the Small Parsimony Problem is appropriate for genome models that do not allow gene content modifications such as deletions, insertions or duplications. Yet such modifications are frequently observed in biological data. This raises the question of how the ancestral organization of genes can be reconstructed in the face of gene loss and gain, assuming that such events are known (in the form of a gene tree).

12.2 Ancestral reconstruction using reconciled phylogenies

Adjacencies between extant genes can be partitioned into homologous families. Two adjacencies (a_1, a_2) and (b_1, b_2) are homologous if a_1 and b_1 have a common ancestor i_1, and a_2 and b_2 have a common ancestor i_2, such that i_1 and i_2 are either in different gene trees or, if they are in the same gene tree, neither is an ancestor of the other. This relation is transitive, yielding a partition of the full set of input adjacencies into families.

Assume that the internal nodes of the species tree are totally ordered in time. The time interval between two consecutive internal nodes of the species tree in this total order (one is not necessarily the descendant of the other) defines a /time slice/. Genome rearrangement events that co-occur within the same time slice are said to be /synchronous/.
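The notions of time slice and synchronous events can be illustrated with a minimal sketch. It assumes that the total order on internal species tree nodes comes from numeric dates (time before present), which the text above does not specify; the names `time_slices`, `slice_index`, `synchronous` and the example dates are hypothetical, and the half-open interval convention at slice boundaries is an arbitrary choice.

```python
def time_slices(node_dates):
    """Return time slices as (older_date, younger_date) pairs, oldest first.

    node_dates maps each internal species tree node to a date, measured as
    time before present (larger = older). Dates are an assumption of this
    sketch; any total order on the internal nodes would do.
    """
    dates = sorted(set(node_dates.values()), reverse=True)
    # Each pair of consecutive dates in the total order bounds one slice.
    return list(zip(dates, dates[1:]))


def slice_index(event_date, slices):
    """Index of the slice containing event_date, or None if outside all slices."""
    for i, (older, younger) in enumerate(slices):
        # Assumed convention: a slice includes its older bound, not its younger one.
        if younger < event_date <= older:
            return i
    return None


def synchronous(date_x, date_y, slices):
    """Two events are synchronous if they fall into the same time slice."""
    i = slice_index(date_x, slices)
    return i is not None and i == slice_index(date_y, slices)


# Hypothetical dated species tree with three internal nodes.
node_dates = {"root": 10.0, "u": 6.0, "v": 3.0}
slices = time_slices(node_dates)
print(slices)
print(synchronous(5.0, 4.0, slices), synchronous(7.0, 5.0, slices))
```

Note that nodes u and v bound a common slice here even if neither is a descendant of the other, which matches the remark in the definition.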
Problem 12.2: Given a species tree T, a homologous family of adjacencies with its two reconciled gene trees, and a cost function d(A, B) for adjacency gains and breakages between two adjacency sets A and B, find for each internal vertex v of T an adjacency set A_v such that the measure D(T) = \sum_{(u,v) \in E(T)} d(A_u,A_v) is minimized.

Definition 12.1: Given two gene trees T, T' and any two gene tree nodes a \in T and b \in T', let c_1(a, b) be the minimum cost of a history for the two gene subtrees rooted at a and b, assuming there is an adjacency between a and b, and let c_0(a, b) be the minimum cost of a history for the two gene subtrees rooted at a and b, assuming there is no adjacency between a and b.

Proposition 12.1: The minimum cost of a history of two gene tree nodes a, b depends only on the costs of their children and the costs assigned at the two nodes themselves.

This allows Problem 12.2 to be solved by a dynamic programming scheme that propagates costs from the leaves of the tree to the root (bottom-up). The approach has been implemented in the algorithm DeCoSTAR. It features 19 propagation rules that score adjacency gains and breakages according to all possible scenarios that can occur due to the gene gain/loss events recorded by the gene trees.

Definition 12.2: Two gene tree nodes a and b (from the same gene tree or not) are said to be /comparable/ if they are in the same species, if they are synchronous, and if neither is an ancestor of the other. Otherwise they are /incomparable/.

Both synchronous and asynchronous events must be taken into account, which results in the 19 propagation rules described in Duchemin et al. (2017), Table 1.
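The bottom-up propagation of c_1 and c_0 can be sketched for one deliberately simplified case. This is not the DeCoSTAR rule set: the sketch assumes that both gene trees contain speciation events only and that the children of a node pair (a, b) have already been paired by species, so a single recurrence suffices. The names `costs`, `GAIN`, `BREAK`, `children` and `observed`, as well as the unit costs and the example data, are hypothetical.

```python
import math

GAIN = 1.0   # assumed cost of an adjacency gain
BREAK = 1.0  # assumed cost of an adjacency breakage


def costs(pair, children, observed):
    """Return (c1, c0) for a pair (a, b) of gene tree nodes.

    children maps a pair to its child pairs, already matched by species
    (speciation-only assumption); pairs missing from it are leaf pairs.
    observed is the set of leaf pairs that are adjacent in extant genomes.
    """
    kids = children.get(pair)
    if not kids:
        # Leaf pair: the state contradicting the extant data is forbidden.
        return (0.0, math.inf) if pair in observed else (math.inf, 0.0)
    c1 = c0 = 0.0
    for kid in kids:
        k1, k0 = costs(kid, children, observed)
        # Adjacency present at (a, b): keep it, or break it on this branch.
        c1 += min(k1, k0 + BREAK)
        # Adjacency absent at (a, b): stay absent, or gain it on this branch.
        c0 += min(k0, k1 + GAIN)
    return c1, c0


# Hypothetical example: the adjacency is observed in two of three extant species.
children = {
    ("a", "b"): [("a1", "b1"), ("a2", "b2")],
    ("a1", "b1"): [("a3", "b3"), ("a4", "b4")],
}
observed = {("a3", "b3"), ("a4", "b4")}
print(costs(("a1", "b1"), children, observed))
print(costs(("a", "b"), children, observed))
```

Consistent with Proposition 12.1, each call uses only the costs returned for the child pairs and the transition costs at the pair itself. Handling duplications, losses and incomparable node pairs requires the further rules of Table 1 in Duchemin et al., which this sketch omits.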