12. Ancestral reconstruction

Literature:
    - Duchemin, Wandrille, et al. "DeCoSTAR: Reconstructing the ancestral
      organization of genes or genomes using reconciled phylogenies." Genome
      biology and evolution 9.5 (2017): 1312-1319.

12.1 Small Parsimony Problem

The Small Parsimony Problem is similar to the Multiple Genome Rearrangement
Problem, except that the species tree is given:

Problem 12.1 (Small Parsimony Problem): Given a tree T, extant genomes at its
leaves, and a genome rearrangement distance d (DCJ, reversals, breakpoint, ...),
find for each internal vertex v of T a genome G_v such that the total cost
    D(T) = \sum_{(u,v) \in E(T)} d(G_u,G_v)
is minimized. 
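
To illustrate the objective only, the following minimal sketch evaluates D(T)
for one fixed labelling of the internal vertices. It does not search for an
optimal labelling. The representation of a genome as a set of adjacencies, the
function names, and the use of the symmetric difference as a stand-in for d are
assumptions made for this example.

```python
def adjacency_distance(g_u, g_v):
    """Toy distance: number of adjacencies found in exactly one genome."""
    return len(g_u ^ g_v)


def tree_score(edges, genomes, d=adjacency_distance):
    """Sum d(G_u, G_v) over all edges (u, v) of the tree."""
    return sum(d(genomes[u], genomes[v]) for u, v in edges)


# Hypothetical example: a root with two leaves A and B.
# Gene extremities are written as <gene><h|t> (head or tail).
edges = [("root", "A"), ("root", "B")]
genomes = {
    "A": {frozenset({"1h", "2t"}), frozenset({"2h", "3t"})},
    "B": {frozenset({"1h", "2t"}), frozenset({"2h", "3h"})},
    # Candidate ancestral genome; here simply a copy of A.
    "root": {frozenset({"1h", "2t"}), frozenset({"2h", "3t"})},
}

print(tree_score(edges, genomes))  # 2
```

Replacing `adjacency_distance` by another measure changes the score but not the
structure of the sum.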

Solving the Small Parsimony Problem is appropriate for genome models that do
not allow gene content modifications such as deletions, insertions or
duplications. Yet such modifications are frequently observed in biological
data. This raises the question of how the ancestral organization of genes can
be reconstructed in the face of gene loss and gain, assuming these events are
known (in the form of a gene tree).

12.2 Ancestral reconstruction using reconciled phylogenies

Adjacencies between extant genes can be partitioned into homologous families.
Two adjacencies (a_1, a_2) and (b_1, b_2) are homologous if a_1 and b_1 have a
common ancestor i_1, and a_2 and b_2 have a common ancestor i_2, such that i_1
and i_2 are in different gene trees or, if they are in the same gene tree,
neither is an ancestor of the other. This relation is transitive, yielding
a partition of the full set of input adjacencies into families.
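
Because the relation is transitive, the families can be obtained by merging
adjacencies that are pairwise homologous. A minimal sketch using a union-find
structure follows. It assumes that the homologous pairs have already been
determined from the gene trees; the function name and the string labels of the
adjacencies are hypothetical.

```python
def families(adjacencies, homologous_pairs):
    """Partition adjacencies into families given pairwise homology."""
    parent = {adj: adj for adj in adjacencies}

    def find(x):
        # Follow parent pointers to the representative, halving the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for x, y in homologous_pairs:
        parent[find(x)] = find(y)

    groups = {}
    for adj in adjacencies:
        groups.setdefault(find(adj), set()).add(adj)
    return list(groups.values())


# Hypothetical input: four adjacencies, two known homologous pairs.
adjs = ["a1-a2", "b1-b2", "c1-c2", "x1-x2"]
pairs = [("a1-a2", "b1-b2"), ("b1-b2", "c1-c2")]
result = families(adjs, pairs)
```

Here the first three adjacencies end up in one family and "x1-x2" forms a
family of its own.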

Assume that the internal nodes of the species tree are totally ordered in time.
The time interval between two consecutive internal nodes in this total order
(one is not necessarily the descendant of the other) defines a /time slice/.
Genome rearrangement events that co-occur within the same time slice are said
to be /synchronous/.

Problem 12.2: Given a species tree T, a homologous family of adjacencies with
its two reconciled gene trees, and a cost function d(A, B) for adjacency gains
and breakages between two adjacency sets A and B, find for each internal vertex
v of T an adjacency set A_v such that the total cost
    D(T) = \sum_{(u,v) \in E(T)} d(A_u,A_v)
is minimized.

Definition 12.1: Given two gene trees T, T', any two gene tree nodes a \in T and
b \in T', let c_1(a, b) be the minimum cost of a history for the two gene
subtrees rooted at a and b, assuming there is an adjacency between a and b, and
let c_0(a, b) be the minimum cost of a history for two gene subtrees rooted at a
and b, assuming there is no adjacency between a and b.

Proposition 12.1: The minimum cost of a history of two gene tree nodes a, b
depends only on the costs of their children and the costs assigned at the two
nodes themselves.

This allows Problem 12.2 to be solved with a dynamic programming scheme that
propagates costs from the leaves of the tree to the root (bottom-up). The
approach has been implemented in the algorithm DeCoSTAR. It features 19
propagation rules that score adjacency gains and breakages according to all
possible scenarios that can occur due to the gene gain/loss events recorded in
the gene trees.
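
The following sketch illustrates the bottom-up propagation of c_0 and c_1 from
Definition 12.1 for a single, strongly simplified case. It assumes that both
gene trees contain only speciation events, have the same shape, and list their
children in matching species order. It is not the DeCoSTAR rule set; the unit
costs GAIN and BREAK, the tree encoding and all names are assumptions made for
this example.

```python
import math
from functools import lru_cache

GAIN, BREAK = 1.0, 1.0  # hypothetical costs of an adjacency gain / breakage

# Hypothetical gene trees: internal node -> (left child, right child).
# Nodes that do not appear as keys are leaves (extant genes).
tree_a = {"a": ("a1", "a2")}
tree_b = {"b": ("b1", "b2")}

# Adjacencies observed between extant genes.
extant_adjacencies = {frozenset({"a1", "b1"})}


@lru_cache(maxsize=None)
def cost(a, b):
    """Return (c0, c1) for the subtrees rooted at a and b."""
    if a not in tree_a and b not in tree_b:
        # Leaves: the extant data fix whether the adjacency exists.
        adjacent = frozenset({a, b}) in extant_adjacencies
        return (math.inf, 0.0) if adjacent else (0.0, math.inf)

    c0 = c1 = 0.0
    # Speciation in both trees: pair the children that share a species.
    for child_a, child_b in zip(tree_a[a], tree_b[b]):
        k0, k1 = cost(child_a, child_b)
        c1 += min(k1, k0 + BREAK)  # adjacency kept, or broken on this branch
        c0 += min(k0, k1 + GAIN)   # adjacency still absent, or gained
    return c0, c1


print(cost("a", "b"))  # (1.0, 1.0)
```

In this toy instance both root states cost one event: with an ancestral
adjacency, one breakage on the lineage leading to a2 and b2; without it, one
gain on the lineage leading to a1 and b1. The memoized recursion visits the
same pairs of nodes as an explicit leaves-to-root traversal would.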

Definition 12.2: Two gene tree nodes a and b (from the same gene tree or not)
are said to be /comparable/ if they are in the same species, if they are
synchronous, and if one is not an ancestor of the other. Otherwise they are
/incomparable/. 
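
Definition 12.2 can be read as a conjunction of three tests. The sketch below
spells this out. The GeneNode record, with a species label, a time slice index
and a precomputed set of ancestor names, is a hypothetical representation
chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneNode:
    name: str
    species: str
    time_slice: int
    ancestors: frozenset = frozenset()  # names of all ancestors in its gene tree


def comparable(a, b):
    """Check the three conditions of Definition 12.2."""
    same_species = a.species == b.species
    synchronous = a.time_slice == b.time_slice
    unrelated = a.name not in b.ancestors and b.name not in a.ancestors
    return same_species and synchronous and unrelated


# Hypothetical nodes: g and h are comparable, g and its descendant c are not.
g = GeneNode("g", "S1", 2)
h = GeneNode("h", "S1", 2)
c = GeneNode("c", "S1", 2, frozenset({"g"}))
```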

Both synchronous and asynchronous events must be taken into account, which
results in the 19 propagation rules described in Duchemin, et al. "DeCoSTAR:
Reconstructing the ancestral organization of genes or genomes using reconciled
phylogenies." Genome biology and evolution 9.5 (2017): 1312-1319, Table 1.