Literature:
- Landau, G. M., Parida, L., & Weimann, O. (2005). Gene proximity analysis
across whole genomes via PQ trees. Journal of Computational Biology : a
Journal of Computational Molecular Cell Biology, 12(10), 1289–1306.
http://doi.org/10.1089/cmb.2005.12.1289
- Bergeron, A., Chauve, C., de Montgolfier, F., & Raffinot, M. (2008).
Computing common intervals of K permutations, with applications to modular
decomposition of graphs. SIAM Journal on Discrete Mathematics, 22(3),
1022–1039. http://doi.org/10.1137/060651331
------------------------------------------------------------------------------
1. Synteny Hierarchies
Synteny Hierarchies can be represented by PQ-trees
------------------------------------------------------------------------------
2. PQ-Trees
Def.: A PQ-tree on a set V is a rooted tree whose leaves are labeled from 1 to
|V| and whose internal nodes are either P-nodes or Q-nodes. A P-node must have
at least two children, and a Q-node must have at least three children. The
children of a P-node are unordered (they may be permuted arbitrarily), and the
children of a Q-node are totally ordered (up to reversal of the whole order).
Example: The PQ-tree of permutations P1 = (1..8) = ID and P2 = (2 1 5 3 4 8 6 7)
is shown below.
       _____Q______
       |  __P_  __P_
      _P_ | _P_ | _P_
      | | | | | | | |
P2 = (2 1 5 3 4 8 6 7)
In the following, we will always assume that the identity permutation is part of
the studied collection of permutations.
If a binary matrix has the consecutive ones property (C1P), then it can be
represented by a PQ-tree. The PQ-tree represents the class of all column
permutations under which the ones of every row are consecutive.
Example:
C1P Matrix:
___________________|_1_|_2_|_3_|_4_|_5_|_6_|_7_|_8_|
O_1 = {3,4,5,6,7,8}|         1   1   1   1   1   1 |
O_2 = {1,2,3,4,5}  | 1   1   1   1   1             |
O_3 = {1,2}        | 1   1                         |
O_4 = {3,4,5}      |         1   1   1             |
O_5 = {3,4}        |         1   1                 |
O_6 = {6,7,8}      |                     1   1   1 |
O_7 = {6,7}        |                     1   1     |
We will now study an algorithm that constructs the PQ-tree of a collection of k
permutations of size n in optimal O(kn) time, if such a tree exists.
Observe that there is a direct relation between a PQ tree and certain
collections of intervals of the studied permutations:
A PQ-tree is a generator for the following sets:
1. Trivial intervals: (1), .. (n), and (1..n)
2. The (single) interval represented by the union of all intervals generated by
its children.
3. Intervals of a Q node: Union of any consecutive subset of the intervals of
its children.
These sets correspond to collections of intervals in the given set of
permutations that satisfy the following property:
Def. (Common Intervals): Intervals (l1, r1), .., (lk, rk) of k permutations P1,
.., Pk are /common/ if they contain the same set of elements, i.e., if
P1[l1,r1] = .. = Pk[lk,rk] as sets.
The common intervals of permutation P2 are:
P2 = (2 1 5 3 4 8 6 7)
     |_______________| <- "root" interval
     |_|_|_|_|_|_|_|_| <- singletons
         |___________|
     |_________|
         |_____|_____|
     |___| |___| |___|
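As a sanity check on the definition, the common intervals of one permutation and
the identity can be enumerated by brute force. The sketch below is a quadratic
illustration only, not the linear-time method developed in these notes; the
function name common_intervals is made up for this example. It relies on the
fact that positions l..r hold a contiguous range of values exactly when
max - min = r - l.

```python
def common_intervals(perm):
    """All common intervals of `perm` and the identity, as value ranges (lo, hi)."""
    n = len(perm)
    found = set()
    for l in range(n):
        lo = hi = perm[l]
        for r in range(l, n):
            lo = min(lo, perm[r])
            hi = max(hi, perm[r])
            # positions l..r hold a contiguous value range iff hi - lo == r - l
            if hi - lo == r - l:
                found.add((lo, hi))
    return sorted(found)


P2 = [2, 1, 5, 3, 4, 8, 6, 7]
# drop the trivial intervals (singletons and the root) for readability
nontrivial = [(a, b) for a, b in common_intervals(P2)
              if a < b and (a, b) != (1, 8)]
print(nontrivial)
# -> [(1, 2), (1, 5), (3, 4), (3, 5), (3, 8), (6, 7), (6, 8)]
```

The seven non-trivial intervals are exactly the rows O_1..O_7 of the C1P matrix
shown earlier.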
The set of common intervals C of a collection of permutations is /closed/, i.e.,
if two intervals (i1..j1), (i2..j2) \in C overlap, i.e., i1 < i2 <= j1 < j2,
then their union (i1..j2), their intersection (i2..j1), and the two differences
(i1..i2-1) and (j1+1..j2) are also in C.
   i1         j1
   |__________|
        i2          j2
        |___________|
=> |________________|
   |____|     |_____|
        |_____|
A straightforward strategy would be to (i) identify all common intervals of a
permutation, then (ii) start with the most confined PQ tree (which corresponds
to a single P-Node whose children are the 1..n leaves) that can only generate
the trivial common intervals that are part of every set of permutations (i.e.,
{(1..n), (1), (2), ..., (n)}). Subsequently, (iii) refine the tree by iterating
through the set of intervals and adding additional internal vertices to the
PQ-tree. In each iteration, the tree will be the most confined tree that can
generate the intervals that have been observed so far.
Even if all common intervals could be found in optimal time and the tree be
refined in constant time per interval, the algorithm would take quadratic time
in the worst case. That's because the number of common intervals can be as
large as n(n+1)/2 (e.g., when all permutations are the identity), which is in
Θ(n^2).
But: not all intervals are required! The set of common intervals can be
partitioned into "overlap classes":
Def. (commuting and overlapping intervals): Two intervals A and B /commute/ if A
\subseteq B or B \subseteq A or A and B are disjoint; otherwise, they
/overlap/.
Def. (overlap class) An /overlap class/ is an equivalence class formed by the
transitive closure of the overlap relation within a given set of intervals.
The overlap classes of a set of all common intervals of a permutation can be
organized as follows:
Def.: An overlap class holding only a single member is /trivial/, and
/non-trivial/ otherwise.
Def. The intervals corresponding to trivial overlap classes are /strong/.
Lemma: The set of strong (common) intervals of k permutations is in bijection
with the vertices of their PQ-tree.
Obs.: The set of strong intervals commutes.
The set of strong intervals of permutation P2 is (1..8), {all singletons},
(3,4,5), (6,7,8), (1,2), (3,4), and (6,7).
P2 = (2 1 5 3 4 8 6 7)
     |_______________|
     |_|_|_|_|_|_|_|_|
         |_____|_____|
     |___| |___| |___|
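The definitions of overlap and strong intervals translate directly into a
brute-force filter. The following sketch assumes the common intervals of P2 are
already known and are written as value ranges (lo, hi); the helper names
overlaps and strong_intervals are illustrative. It is quadratic in the number of
intervals and only meant to make the definitions concrete.

```python
def overlaps(a, b):
    """True if intervals a and b intersect but neither contains the other."""
    (a1, a2), (b1, b2) = a, b
    return (a1 < b1 <= a2 < b2) or (b1 < a1 <= b2 < a2)


def strong_intervals(intervals):
    """Intervals that overlap no other interval (trivial overlap classes)."""
    return [a for a in intervals if not any(overlaps(a, b) for b in intervals)]


# common intervals of P2 = (2 1 5 3 4 8 6 7) and the identity
common_P2 = [(i, i) for i in range(1, 9)] + [
    (1, 8), (1, 2), (1, 5), (3, 4), (3, 5), (3, 8), (6, 7), (6, 8)]

strong = sorted(iv for iv in strong_intervals(common_P2) if iv[0] < iv[1])
print(strong)
# -> [(1, 2), (1, 8), (3, 4), (3, 5), (6, 7), (6, 8)]
```

Only (1..5) and (3..8) are dropped, because they overlap each other.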
Obs.: A PQ-tree is an inclusion tree.
Inclusion trees can be built in time linear in their number of intervals
(vertices).
Algorithm 1 (Construction of an inclusion tree)-------------------------------
Input: Set F of commuting intervals
Output: Inclusion tree of F
1. Bucket-sort the intervals of F in decreasing order of their right bound
2. Stably bucket-sort the result in increasing order of their left bound
3. Let I1..Im be the list of sorted intervals
4. N <- I1                 // I1 = V is the root; N is the current vertex
5. k <- 2
6. While k ≤ m
7.     If Ik ⊂ N
8.         Parent(Ik) <- N
9.         N <- Ik
10.        k <- k+1
11.    Else
12.        N <- Parent(N)
------------------------------------------------------------------------------
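A minimal Python sketch of Algorithm 1, under two assumptions that are not part
of the notes: intervals are (left, right) tuples, and a single comparison sort
replaces the two bucket sorts (so this version runs in O(m log m) rather than
linear time). The function name inclusion_tree is illustrative.

```python
def inclusion_tree(intervals):
    """Map each interval to its parent in the inclusion tree (root -> None).

    Assumes the intervals commute pairwise and that one of them contains
    all others (it becomes the root).
    """
    # left bound ascending, ties broken by right bound descending
    order = sorted(set(intervals), key=lambda iv: (iv[0], -iv[1]))
    parent = {order[0]: None}
    cur = order[0]                      # current vertex, starts at the root
    k = 1
    while k < len(order):
        l, r = order[k]
        if cur[0] <= l and r <= cur[1]:  # order[k] lies inside cur
            parent[order[k]] = cur
            cur = order[k]
            k += 1
        else:                            # climb until an enclosing vertex is found
            cur = parent[cur]
    return parent


# strong intervals of P2 = (2 1 5 3 4 8 6 7)
strong_P2 = [(i, i) for i in range(1, 9)] + [
    (1, 8), (1, 2), (3, 5), (3, 4), (6, 8), (6, 7)]
parent = inclusion_tree(strong_P2)
print(parent[(3, 4)], parent[(5, 5)], parent[(6, 8)])
# -> (3, 5) (3, 5) (1, 8)
```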
Labeling the internal vertices of the PQ tree can be done by the following rule
set:
1. If v has exactly two children, label it P.
2. Otherwise, test if the union of the intervals represented by its first two
children is a common interval: If so, label it Q, otherwise P.
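The rule set can be sketched as a small function. The names label_node and
is_common are hypothetical; the sketch assumes that the children of a vertex are
given as (left, right) intervals in left-to-right order and that a membership
test for common intervals is available.

```python
def label_node(children, is_common):
    """Return 'P' or 'Q' for an internal vertex with the given child intervals."""
    if len(children) == 2:
        return "P"
    (l1, r1), (l2, r2) = children[0], children[1]
    union = (min(l1, l2), max(r1, r2))   # interval spanned by the first two children
    return "Q" if is_common(union) else "P"


# non-singleton common intervals of P2 = (2 1 5 3 4 8 6 7)
common_P2 = {(1, 8), (1, 2), (1, 5), (3, 4), (3, 5), (3, 8), (6, 7), (6, 8)}
root_label = label_node([(1, 2), (3, 5), (6, 8)], lambda iv: iv in common_P2)
inner_label = label_node([(3, 4), (5, 5)], lambda iv: iv in common_P2)
print(root_label, inner_label)
# -> Q P
```

The root is a Q-node because (1..5), the union of its first two children, is a
common interval.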
Thus, all that is left is to identify the strong intervals.
------------------------------------------------------------------------------
2. Generators of common intervals
Def. A generator for the common intervals of a set of permutations P is a pair
(R, L) of vectors of size n such that:
1. R[i] ≥ i and L[j] ≤ j for all i,j ∈ {1,2,...,n},
2. (i..j) is a common interval of P if and only if (i..j) = (i..R[i]) ∩
(L[j]..j), or, equivalently L[j] <= i <= j <= R[i].
There are many possible generators, here is one:
Def.: Let P = (p1,..,pn ) be a permutation of size n. For each element pi, we
define two intervals containing pi:
- IMax[pi] is the largest *set* of elements ≥ pi that forms an interval
around pi in P.
- IMin[pi] is the largest *set* of elements ≤ pi that forms an interval
around pi in P.
And we define the following two integer vectors:
- Sup[pi] is the largest integer such that (pi..Sup[pi]) ⊆ IMax[pi];
- Inf[pi] is the smallest integer such that (Inf[pi]..pi) ⊆ IMin[pi].
The pair of vectors (Sup, Inf) is a generator for the common intervals of a
permutation P.
Example: IMax and IMin, Sup, and Inf of permutation P2 are
IMax[pi]                    pi [Sup]   IMin[pi]                    pi [Inf]
p[1,8]= (1,2,3,4,5,6,7,8)   1  [8]     p[2,2]= (1)                 1  [1]
p[1,1]= (2)                 2  [2]     p[1,2]= (1,2)               2  [1]
p[3,8]= (3,4,5,6,7,8)       3  [8]     p[4,4]= (3)                 3  [3]
p[5,8]= (4,6,7,8)           4  [4]     p[4,5]= (3,4)               4  [3]
p[3,3]= (5)                 5  [5]     p[1,5]= (1,2,3,4,5)         5  [1]
p[6,8]= (6,7,8)             6  [8]     p[7,7]= (6)                 6  [6]
p[8,8]= (7)                 7  [7]     p[7,8]= (6,7)               7  [6]
p[6,6]= (8)                 8  [8]     p[1,8]= (1,2,3,4,5,6,7,8)   8  [1]
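The table can be reproduced directly from the definitions. The sketch below
computes Sup and Inf naively in quadratic time; it is meant as a check of the
definitions, not as the linear-time construction. The function name sup_inf and
the dictionary-based return type are choices made for this example.

```python
def sup_inf(perm):
    """Return (Sup, Inf) as dicts keyed by element value (naive, O(n^2))."""
    n = len(perm)
    pos = {v: i for i, v in enumerate(perm)}
    sup, inf = {}, {}
    for v in range(1, n + 1):
        # IMax[v]: maximal window around v containing only elements >= v
        l = r = pos[v]
        while l > 0 and perm[l - 1] >= v:
            l -= 1
        while r < n - 1 and perm[r + 1] >= v:
            r += 1
        imax = set(perm[l:r + 1])
        # IMin[v]: maximal window around v containing only elements <= v
        l = r = pos[v]
        while l > 0 and perm[l - 1] <= v:
            l -= 1
        while r < n - 1 and perm[r + 1] <= v:
            r += 1
        imin = set(perm[l:r + 1])
        s = v
        while s + 1 in imax:        # largest s with (v..s) inside IMax[v]
            s += 1
        i = v
        while i - 1 in imin:        # smallest i with (i..v) inside IMin[v]
            i -= 1
        sup[v], inf[v] = s, i
    return sup, inf


sup, inf = sup_inf([2, 1, 5, 3, 4, 8, 6, 7])
print([sup[v] for v in range(1, 9)])   # -> [8, 2, 8, 4, 5, 8, 7, 8]
print([inf[v] for v in range(1, 9)])   # -> [1, 1, 3, 3, 1, 6, 6, 1]
```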
Lemma: Let (R1, L1 ) and (R2, L2) be generators for the common intervals of two
sets A1 and A2 of permutations. The pair (min(R1, R2), max(L1, L2)) is a
generator for the common intervals of A1 ∪ A2.
Example: Generators (R1, L1) = (Sup, Inf) of permutation P2 and (R2, L2) =
(Sup, Inf) of permutation P3 = (1 3 2 4 5 7 6 8), and the combined generator
(R, L):
    R1   L1    R2   L2    R=min(R1, R2)  L=max(L1, L2)
1   [8]  [1]   [8]  [1]   [8]            [1]
2   [2]  [1]   [8]  [2]   [2]            [2]
3   [8]  [3]   [3]  [1]   [3]            [3]
4   [4]  [3]   [8]  [1]   [4]            [3]
5   [5]  [1]   [8]  [1]   [5]            [1]
6   [8]  [6]   [8]  [6]   [8]            [6]
7   [7]  [6]   [7]  [1]   [7]            [6]
8   [8]  [1]   [8]  [1]   [8]            [1]
Proof. Interval (i..j) is a common interval of A1 ∪ A2 if and only if it is a
common interval of both A1 and A2, which is equivalent to L1[j] ≤ i ≤ j ≤ R1[i]
and L2[j] ≤ i ≤ j ≤ R2[i] and finally to max(L1[j],L2[j]) ≤ i ≤ j ≤ min(R1 [i],
R2 [i]).
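The lemma and the generator condition L[j] <= i <= j <= R[i] can be exercised
with a few lines of Python. This is a sketch under stated assumptions: vectors
are stored as 0-indexed lists holding 1-based bounds, the function names are
made up, and the (Sup, Inf) vectors of P2 and P3 = (1 3 2 4 5 7 6 8) were
computed from the definitions of Sup and Inf given earlier.

```python
def combine(gen_a, gen_b):
    """Generator of the union of two permutation sets: (min of R, max of L)."""
    (ra, la), (rb, lb) = gen_a, gen_b
    return ([min(x, y) for x, y in zip(ra, rb)],
            [max(x, y) for x, y in zip(la, lb)])


def intervals_from_generator(R, L):
    """All (i, j) with L[j] <= i <= j <= R[i], using 1-based i and j."""
    n = len(R)
    return [(i, j)
            for i in range(1, n + 1)
            for j in range(i, R[i - 1] + 1)
            if L[j - 1] <= i]


R1, L1 = [8, 2, 8, 4, 5, 8, 7, 8], [1, 1, 3, 3, 1, 6, 6, 1]   # P2
R2, L2 = [8, 8, 3, 8, 8, 8, 7, 8], [1, 2, 1, 1, 1, 6, 1, 1]   # P3
R, L = combine((R1, L1), (R2, L2))
print(R)   # -> [8, 2, 3, 4, 5, 8, 7, 8]
print(L)   # -> [1, 2, 3, 3, 1, 6, 6, 1]
print([iv for iv in intervals_from_generator(R, L) if iv[0] < iv[1]])
# -> [(1, 5), (1, 8), (6, 7), (6, 8)]
```

Note that (3..4) is a common interval of P2 and the identity but not of P3, so
it no longer appears after combining.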
Given IMin, Inf can be computed in linear time with this simple algorithm:
Algorithm 2 (Construction of Inf from IMin)-----------------------------------
1. Inf[k] <- k for k=1..n
2. For k from 2 to n
3.     While Inf[k] − 1 is in IMin[k]
4.         Inf[k] <- Inf[Inf[k] − 1]
------------------------------------------------------------------------------
(A similar algorithm can be designed for the computation of Sup from IMax.)
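Algorithm 2 can be written in Python almost line by line. In this sketch IMin is
passed as a list of Python sets with index 0 unused, which is a simplification
chosen for readability; to keep the membership test constant-time in a real
implementation one would store each IMin[k] as a pair of positions instead. The
function name inf_from_imin is illustrative.

```python
def inf_from_imin(imin):
    """Compute Inf[1..n] from IMin; imin[k] is the set IMin[k], imin[0] unused."""
    n = len(imin) - 1
    inf = list(range(n + 1))              # Inf[k] <- k
    for k in range(2, n + 1):
        # 0 is never an element, so the loop stops at the latest when Inf[k] = 1
        while inf[k] - 1 in imin[k]:
            inf[k] = inf[inf[k] - 1]      # jump over the block already resolved
    return inf[1:]


# IMin of P2 = (2 1 5 3 4 8 6 7), taken from the table above
imin_P2 = [set(), {1}, {1, 2}, {3}, {3, 4}, {1, 2, 3, 4, 5},
           {6}, {6, 7}, {1, 2, 3, 4, 5, 6, 7, 8}]
print(inf_from_imin(imin_P2))
# -> [1, 1, 3, 3, 1, 6, 6, 1]
```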
We will now aim to use the set of common intervals defined by R and L to find
the strong intervals of the given set of permutations. But there are still two
problems:
1. Not all intervals of R and L are necessarily common intervals.
2. Recall that the strong intervals commute. The intervals defined by R
commute, and the same holds for the intervals defined by L. However, their
union does not!
------------------------------------------------------------------------------
2.1 Fixing Problem 1
Example: Intervals of R and L of permutations P2 and P3:
   Intervals of R:              Intervals of L:
   1 2 3 4 5 6 7 8              1 2 3 4 5 6 7 8
1  ---------------           1  -
2    -                       2    -
3      -                     3      -
4        -                   4      ---   <-- interval is not common!
5          -                 5  ---------
6            -----           6            -
7              -             7            ---
8                -           8  ---------------
Generators are called /canonical/ if each interval of R or L is a common
interval:
Def.: A generator (R, L) for a closed family of common intervals F is
/canonical/ if, for all i=1..n, intervals (i..R[i]) and (L[i]..i) belong to F.
We can always construct a canonical version of any generator by processing R and
L, independently. To find a canonical variant of R, we apply the following
strategy:
1. Iterate through each interval I_i = (i..R[i]), 2 <= i <= n, in *decreasing*
   order of i.
2. If I_i is not common:
3.     Truncate I_i's right border to the right border of the largest
       subinterval I_j, j > i, if such exists; otherwise set R[i] = i.
(The same can be done for L.)
But there is a faster way, which uses the /support/ of R and L respectively.
Def. The /support/ of vector R is a vector Support_R that refers at each
position Support_R[i] to the index i' of the smallest interval (i'..R[i']) that
is a super interval of (i..R[i]) and is undefined if no such interval exists.
Example:
   Intervals of R:    Support_R
   1 2 3 4 5 6 7 8
1  ---------------    /
2    -                1
3      -              1
4        -            1
5          -          1
6            -----    1
7              -      6
8                -    6
We make the following observations:
- The support is undefined only for the first interval of R, which is always
(1..n).
- The support of interval (i..R[i]) must be the interval corresponding to
the *highest index* i' < i s.t. (i..R[i]) \subset (i'..R[i']).
Support_R can be computed in O(n) (No proof).
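One way to obtain the linear bound is a single left-to-right scan with a stack.
The sketch below is an assumption of how this could be done, not necessarily the
construction the cited papers use; it relies on the intervals (i..R[i])
commuting pairwise, so that any earlier interval still "open" at position i
must contain (i..R[i]). R is a 0-indexed list of 1-based right bounds, and
undefined support is represented by None.

```python
def support_R(R):
    """Support_R as a list; entry i-1 holds the support index of (i..R[i])."""
    n = len(R)
    support = [None] * n
    stack = []                            # indices i' with R[i'] not yet passed
    for i in range(1, n + 1):
        # discard intervals that end before position i
        while stack and R[stack[-1] - 1] < i:
            stack.pop()
        if stack:
            support[i - 1] = stack[-1]    # innermost enclosing interval
        stack.append(i)
    return support


print(support_R([8, 2, 3, 4, 5, 8, 7, 8]))
# -> [None, 1, 1, 1, 1, 1, 6, 6]
```

Each index is pushed and popped at most once, so the scan takes O(n) time.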
Using support, we can compute the canonical vector of R in linear time, using
the following algorithm:
Algorithm 3 (Construction of canonical variant of vector R)-------------------
Input: Support_R, R, L
Output: canonical vector R'
1. R' <- [1..n]
2. R'[1] <- n
3. For k from n to 2
       // test if (Support_R[k]..R'[k]) is a common interval, i.e. whether
       // L[j] <= i <= j <= R[i] holds for i = Support_R[k] and j = R'[k]
4.     If L[R'[k]] <= Support_R[k] <= R'[k] <= R[Support_R[k]]
5.         R'[Support_R[k]] <- max(R'[k], R'[Support_R[k]])
------------------------------------------------------------------------------
(A similar algorithm can be designed for computing the canonical variant of L,
using Support_L.)
The correctness of the algorithm follows from the facts that
1. (k..R'[k]) is the largest common interval with leftmost bound k by the time
the iteration for k starts, and
2. once the largest common interval for R'[k'] is found, it will never be
truncated (the max(.) function in line 5 ensures that).
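A Python sketch of Algorithm 3 follows. It tests whether
(Support_R[k]..R'[k]) is a common interval with the generator condition
L[j] <= i <= j <= R[i], taking i = Support_R[k] and j = R'[k]. The list layout
(0-indexed lists of 1-based bounds, None for undefined support) and the function
name canonical_R are assumptions of this example.

```python
def canonical_R(R, L, support):
    """Canonical right-bound vector R' for the generator (R, L)."""
    n = len(R)
    Rc = list(range(1, n + 1))        # R'[k] <- k
    Rc[0] = n                         # R'[1] <- n
    for k in range(n, 1, -1):         # k = n, n-1, ..., 2
        s = support[k - 1]            # Support_R[k], defined for every k >= 2
        j = Rc[k - 1]                 # current R'[k]
        if L[j - 1] <= s <= j <= R[s - 1]:     # (s..j) is a common interval
            Rc[s - 1] = max(j, Rc[s - 1])
    return Rc


# generator of {Id, P2, P3} and of {Id, P2}, with their supports
print(canonical_R([8, 2, 3, 4, 5, 8, 7, 8], [1, 2, 3, 3, 1, 6, 6, 1],
                  [None, 1, 1, 1, 1, 1, 6, 6]))
# -> [8, 2, 3, 4, 5, 8, 7, 8]
print(canonical_R([8, 2, 8, 4, 5, 8, 7, 8], [1, 1, 3, 3, 1, 6, 6, 1],
                  [None, 1, 1, 3, 3, 3, 6, 6]))
# -> [8, 2, 8, 4, 5, 8, 7, 8]
```

In both examples R is already canonical, so the algorithm reproduces it.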
Example: Intervals of canonical R and L of permutations P2 and P3:
   Intervals of canonical R:        Intervals of canonical L:
   1 2 3 4 5 6 7 8                  1 2 3 4 5 6 7 8
1  ---------------               1  -
2    -                           2    -
3      -                         3      -
4        -                       4        -
5          -                     5  ---------
6            -----               6            -
7              -                 7            ---
8                -               8  ---------------
------------------------------------------------------------------------------
2.2 Fixing Problem 2
Lemma: A trivial overlap class of interval set {(i..R[i]) | i=1..n} \cup
{(L[i]..i) | i=1..n} is a strong interval of the given set of permutations.
Example: Intervals of canonical R and L of permutation P2:
   Intervals of canonical R:        Intervals of canonical L:
   1 2 3 4 5 6 7 8                  1 2 3 4 5 6 7 8
1  ---------------               1  -
2    -                           2  ---
3      -----------               3      -
4        -                       4      ---
5          -                     5  ---------
6            -----               6            -
7              -                 7            ---
8                -               8  ---------------
We make the following observation:
Observation: Let (R,L) be the canonical generator of a closed family of
intervals. We have the following: (1) if (i..R[i]) overlaps (L[j]..j) and
(L[j']..j'), then L[j] = L[j']; and (2) if (L[j]..j) overlaps (i..R[i]) and
(i'..R[i']), then R[i] = R[i'].
The following lemma shows that the overlaps generate strong intervals of the
given permutation:
Lemma: Let (R,L) be the canonical generator of a closed family of intervals F,
and let C be a nontrivial overlap class containing (i1..R[C]), ... , (ik..R[C])
and (L[C]..j1), ... , (L[C]..jl), with i1 <···< ik and j1 <···< jl. Then k=l,
and for all a ∈ (1..k), (ia..ja) is a strong interval of F.
Theorem: The set of intervals given by the union of trivial overlap classes of R
and L, and the set of strong intervals constructed from the overlaps of
non-trivial overlap classes is equal to the family of strong intervals of a
closed family of intervals.
------------------------------------------------------------------------------
2.3 Algorithm for enumerating all strong intervals
The above theorem motivates the following simple strategy for enumerating all
strong intervals of a closed family of intervals.
1. Sort the 4n bounds of intervals of the families (i..R[i]) and (L[j]..j) for
i, j ∈ (1..n) in increasing order, with the left bounds placed before the
right bounds when they are equal.
2. Apply Algorithm 4
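Example: The 4n bounds of intervals for permutation P2, written with 0-based
indices (every bound is shifted down by one; "(" marks a left bound and ")" a
right bound):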
0(,0(,0(,0(,0(,0),1(,1),1),2(,2(,2(,2),3(,3),3),4(,4),4),5(,5(,5(,5),6(,6),6),
7(,7),7),7),7),7)
Algorithm 4 (Computation of the strong intervals)-----------------------------
Input: Sorted list a1..a4n of the 4n bounds of intervals of the families
(i..R[i]) and (L[j]..j)
Output: Set of strong intervals
(S is a stack of bounds; s denotes the top of S. )
1. For i from 1 to 4n:
2.     If ai is a left bound
3.         Push ai on S
4.     Else
5.         Output (s..ai) // Interval (s..ai) is strong
6.         Pop the top of S
------------------------------------------------------------------------------
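The two steps (sorting the bounds, then the stack scan of Algorithm 4) can be
sketched in Python as follows. Assumptions of this example: (R, L) is a
canonical generator stored as 0-indexed lists of 1-based bounds, a comparison
sort stands in for the linear-time sort, and the output is collected in a set
because the scan may report the same strong interval more than once. The
function name strong_intervals is illustrative.

```python
def strong_intervals(R, L):
    """Strong intervals of the closed family generated by canonical (R, L)."""
    n = len(R)
    bounds = []
    for i in range(1, n + 1):
        for lo, hi in ((i, R[i - 1]), (L[i - 1], i)):
            bounds.append((lo, 0))    # kind 0 = left bound, sorts before kind 1
            bounds.append((hi, 1))    # kind 1 = right bound
    bounds.sort()                     # increasing, left bounds first on ties
    stack, out = [], set()
    for value, kind in bounds:
        if kind == 0:
            stack.append(value)
        else:
            out.add((stack[-1], value))   # (top of stack .. right bound) is strong
            stack.pop()
    return sorted(out)


# canonical generator of P2 = (2 1 5 3 4 8 6 7) with the identity
R = [8, 2, 8, 4, 5, 8, 7, 8]
L = [1, 1, 3, 3, 1, 6, 6, 1]
print([iv for iv in strong_intervals(R, L) if iv[0] < iv[1]])
# -> [(1, 2), (1, 8), (3, 4), (3, 5), (6, 7), (6, 8)]
```

The overlapping pair (3..8) and (1..5) is replaced by the strong interval (3..5),
in line with the lemma on non-trivial overlap classes.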