>>> RECALL
2.2 A 2-approximation algorithm for the reversal distance (see Kececioglu and
Sankoff, 1992)
Definition 2.2: For a permutation π = (π_1 π_2 ··· π_n), two elements π_i and
π_{i+1} form a /breakpoint/ (BP) if |π_i − π_{i+1}| > 1, and an /adjacency/
(ADJ) otherwise.
Ex.: in π^2, (1 5) and (5 3) are breakpoints
... but so are (2) and (4)!
-> add 0 and n+1 to the beginning and end of the permutation.
Observation 2.1: There are at most n+1 breakpoints and the only permutation
without breakpoints is the identity.
<<<
Definition 2.2 (cont'd): The number of breakpoints in permutation \pi is denoted
by b(\pi).
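Sketch (Python, not part of the lecture): counting b(\pi) directly from
Definition 2.2. The function name `breakpoints` is an illustrative choice; the
input is the permutation without its frame, and 0 and n+1 are added inside.

```python
def breakpoints(perm):
    """Count breakpoints of an unsigned permutation of 1..n.

    The permutation is framed with 0 and n+1 before counting.
    """
    ext = [0] + list(perm) + [len(perm) + 1]
    # A breakpoint is a pair of neighbours whose values differ by more than 1.
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) > 1)


print(breakpoints([2, 3, 1, 4, 6, 5]))  # 5
print(breakpoints([1, 2, 3, 4, 5, 6]))  # 0 (identity)
```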
Idea: Apply a reversal that reduces b(.) in every step.
Ex.: π^4  = (2 3 1 4 6 5)
     π^4' = (0|2 3|1|4|6 5|7)   # b(.) = 5
                      ^---^
            (0|2 3|1|4 5 6 7)   # b(.) = 3
              ^-----^
            (0 1|3 2|4 5 6 7)   # b(.) = 2
                ^---^
            (0 1 2 3 4 5 6 7)   # b(.) = 0
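Sketch (Python): replaying the example above. The helper names and the 0-based,
inclusive index convention on the unframed permutation are assumptions made
for this illustration only.

```python
def breakpoints(perm):
    """Breakpoints of an unsigned permutation, framed with 0 and n+1."""
    ext = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) > 1)


def reverse(perm, i, j):
    """Return a copy of perm with perm[i..j] reversed (0-based, inclusive)."""
    return perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]


pi = [2, 3, 1, 4, 6, 5]
trace = [breakpoints(pi)]
# The three reversals marked in the example: (6 5), (2 3 1), (3 2).
for i, j in [(4, 5), (0, 2), (1, 2)]:
    pi = reverse(pi, i, j)
    trace.append(breakpoints(pi))

print(pi)     # [1, 2, 3, 4, 5, 6]
print(trace)  # [5, 3, 2, 0]
```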
Observation 2.2: An algorithm that reduces b(.) by at least one in each step is
a 2-approximation.
Proof: Any reversal can eliminate at most 2 breakpoints (one at the left end and
one at the right end), therefore OPT(\pi) >= b(\pi)/2. An algorithm A that
removes at least one breakpoint per step needs at most b(\pi) steps, i.e.
A(\pi) <= b(\pi). Thus,
r = A(\pi)/OPT(\pi) <= b(\pi)/(b(\pi)/2) = 2.
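Sketch (Python): a brute-force sanity check of the lower bound
rd(\pi) >= b(\pi)/2 for small n. Exact distances are obtained by breadth-first
search from the identity over all reversals; this is only feasible for tiny n
and is not an algorithm from the lecture. All names are illustrative.

```python
from collections import deque


def breakpoints(perm):
    """Breakpoints of an unsigned permutation, framed with 0 and n+1."""
    ext = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) > 1)


def exact_distances(n):
    """BFS from the identity over all reversals; maps each permutation to rd."""
    start = tuple(range(1, n + 1))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        p = queue.popleft()
        for i in range(n):
            for j in range(i + 1, n):
                q = p[:i] + p[i:j + 1][::-1] + p[j + 1:]
                if q not in dist:
                    dist[q] = dist[p] + 1
                    queue.append(q)
    return dist


dist = exact_distances(5)
# Every reversal removes at most two breakpoints, so 2 * rd >= b must hold.
print(all(2 * d >= breakpoints(p) for p, d in dist.items()))  # True
```

Because a reversal is its own inverse, the distance from the identity to \pi
equals the distance from \pi to the identity, so one search covers all
permutations of size n.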
However, it is not always possible to reduce b(.):
π^5 = (0|4 5 6|1 2 3|7)
Any permutation can be partitioned into increasing strips (overlined) and
decreasing strips (underlined). A strip with one element is increasing for 0 and
n+1, decreasing otherwise.
π^6 = (^0^_21_^345^^78^_6_^9^)
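Sketch (Python): splitting a permutation into strips, assuming a strip is a
maximal run of elements without a breakpoint inside. The function name and the
"inc"/"dec" labels are choices made for this illustration.

```python
def strips(perm):
    """Return the strips of the framed permutation as (direction, elements)."""
    n = len(perm)
    ext = [0] + list(perm) + [n + 1]
    runs, current = [], [ext[0]]
    for x in ext[1:]:
        if abs(x - current[-1]) == 1:   # adjacency: the strip continues
            current.append(x)
        else:                           # breakpoint: a new strip starts
            runs.append(current)
            current = [x]
    runs.append(current)

    labelled = []
    for run in runs:
        if len(run) > 1:
            direction = "inc" if run[1] > run[0] else "dec"
        else:
            # Singletons are decreasing, except the frame elements 0 and n+1.
            direction = "inc" if run[0] in (0, n + 1) else "dec"
        labelled.append((direction, run))
    return labelled


for direction, run in strips([2, 1, 3, 4, 5, 7, 8, 6]):
    print(direction, run)
```

For π^6 this yields the strips 0, 2 1, 3 4 5, 7 8, 6, 9 with the same
increasing/decreasing labels as in the notation above.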
Lemma 2.2: If there is at least one decreasing strip, there is a reversal that
reduces the number of breakpoints.
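Proof: Consider the smallest element k in all decreasing strips. The element k
− 1 must be in an increasing strip (it is smaller than k, or it is 0), and it is
the last element of that strip, since its only possible successor there would
be k. Likewise, k is the last element of its decreasing strip. Hence there is a
breakpoint directly after k − 1 and directly after k. Reversing the interval
between these two breakpoints places k next to k − 1, which removes at least one
breakpoint.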
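Sketch (Python): one constructive reading of Lemma 2.2, assuming the reversal
used is the one that brings k − 1 and k next to each other. Positions refer to
the framed permutation (0-based, inclusive); all names are illustrative.

```python
def in_decreasing_strip(ext, p):
    """True if ext[p] (not a frame element) lies in a decreasing strip."""
    x = ext[p]
    if ext[p - 1] == x + 1 or ext[p + 1] == x - 1:
        return True                      # part of a descending run
    if ext[p - 1] == x - 1 or ext[p + 1] == x + 1:
        return False                     # part of an ascending run
    return True                          # singleton strip, decreasing by convention


def reducing_reversal(perm):
    """Return (lo, hi), inclusive positions in the framed permutation, or None."""
    n = len(perm)
    ext = [0] + list(perm) + [n + 1]
    dec = [ext[p] for p in range(1, n + 1) if in_decreasing_strip(ext, p)]
    if not dec:
        return None                      # all strips increasing: lemma does not apply
    k = min(dec)
    i, j = ext.index(k), ext.index(k - 1)
    # Reverse the block separating k-1 and k, including whichever of the two
    # lies further right, so that they end up next to each other.
    return (j + 1, i) if j < i else (i + 1, j)


print(reducing_reversal([2, 1, 3, 4, 5, 7, 8, 6]))  # (1, 2)
print(reducing_reversal([4, 5, 6, 1, 2, 3]))        # None
```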
All the strips in permutation π^5 are increasing. What can we do to guarantee
that we can decrease the number of breakpoints in such a case?
A reversal of an increasing strip (b(.) does not change) produces a decreasing
strip.
(^0^_654_^123^^7^)
Theorem 2.1: Let \pi be a permutation with a decreasing strip. If every
reversal that reduces b(\pi) leaves a permutation with no decreasing strips,
then \pi admits a reversal that reduces b(\pi) by two. (Exercise: Prove!)
Algorithm 2.2 (greedy):
Input: \pi
Output: number of reversals d (an upper bound on rd(\pi)), sorting scenario
        (\rho_1, \rho_2, ..., \rho_d)
d <- 0
while \pi contains a breakpoint do
    d <- d + 1
    Let \rho_d be a reversal that removes the most breakpoints of \pi,
    resolving ties among those that remove one breakpoint in favor of
    reversals that leave a decreasing strip
    \pi <- \pi \circ \rho_d
end
return d, (\rho_1, \rho_2, ..., \rho_d)
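Sketch (Python): a naive rendering of the greedy idea that simply tries all
O(n^2) reversals in every round. It is an illustration under assumptions, not
an efficient or authoritative implementation: ties are broken for every
breakpoint count (not only for one removed breakpoint) in favor of leaving a
decreasing strip, reversals are (i, j) pairs on the unframed permutation
(0-based, inclusive), and all names are made up for this example.

```python
def framed(perm):
    return [0] + list(perm) + [len(perm) + 1]


def breakpoints(perm):
    ext = framed(perm)
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) > 1)


def has_decreasing_strip(perm):
    ext = framed(perm)
    for p in range(1, len(ext) - 1):
        x = ext[p]
        if ext[p - 1] == x + 1 or ext[p + 1] == x - 1:
            return True                  # element inside a descending run
        if ext[p - 1] != x - 1 and ext[p + 1] != x + 1:
            return True                  # singleton strip (decreasing)
    return False


def greedy_sort(perm):
    perm = list(perm)
    scenario = []
    while breakpoints(perm) > 0:
        best = None
        for i in range(len(perm)):
            for j in range(i + 1, len(perm)):
                cand = perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]
                # Primary criterion: breakpoints removed; secondary: a
                # decreasing strip remains afterwards.
                key = (breakpoints(perm) - breakpoints(cand),
                       has_decreasing_strip(cand))
                if best is None or key > best[0]:
                    best = (key, (i, j), cand)
        _, rho, perm = best
        scenario.append(rho)
    return len(scenario), scenario


d, scenario = greedy_sort([2, 3, 1, 4, 6, 5])
print(d, scenario)
```

When no reversal removes a breakpoint (all strips increasing), the best key
has zero removed breakpoints and the tie-break picks a reversal that creates a
decreasing strip, so the next round makes progress again.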
Lemma 2.3: Algorithm 2.2 sorts every permutation \pi in at most b(\pi)
reversals.
Best known approximation ratio: 11/8 (Berman, Hannenhalli, and Karpinski, 2002)
3. The signed reversal distance
Definition 3.1: A signed permutation is a permutation on the set {1, . . . , n}
in which every element has an orientation, indicated by a sign "+" or "-". To
simplify, the "+" is usually omitted.
Example: \pi^1=(-2 -1 4 3 5 -8 6 7 9)
Application in genome rearrangement studies:
A chromosome is a DNA molecule composed of antiparallel strands that can be read
in either of the two possible directions. A /gene/ is associated with an
interval on a DNA strand and has a /reading direction/ (5'-to-3' or
left-to-right, by convention).
(draw genome above in arrow notation)
Definition 3.1 (cont'd): By convention, a permutation of size n representing a
chromosomal sequence with n genes is bordered by 0 and n+1.
Definition 3.2: In a signed permutation, a pair of consecutive elements i·(i+1)
or -(i+1)·-i is called an /adjacency/ (ADJ); any other pair of consecutive
elements is a /breakpoint/ (BP).
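Sketch (Python): counting breakpoints of a signed permutation. Both adjacency
patterns, i·(i+1) and -(i+1)·-i, are exactly the pairs (a, b) with b − a = 1,
which is what the test below uses. The function name is illustrative; the
input is given without the frame, and 0 and n+1 are added inside.

```python
def signed_breakpoints(perm):
    """Count breakpoints of a signed permutation, framed with 0 and n+1."""
    ext = [0] + list(perm) + [len(perm) + 1]
    # (a, b) is an adjacency iff b - a == 1; every other pair is a breakpoint.
    return sum(1 for a, b in zip(ext, ext[1:]) if b - a != 1)


print(signed_breakpoints([-2, -1, 4, 3, 5, -8, 6, 7, 9]))  # 7
print(signed_breakpoints([1, 2, 3, 4, 5]))                 # 0
```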
Definition 3.3: A /reversal/ of an interval in a signed permutation reverses the
order of the elements of the interval and flips the sign of each of them.
Let's sort this signed permutation
\pi^2 = (0 -3 -4 1 -5 -2 6)
\pi^2 \circ \rho(5,6) = (0 -3 -4 1 2 5 6)
.. \circ \rho(3,5) = (0 -3 -2 -1 4 5 6)
.. \circ \rho(2,4) = (0 1 2 3 4 5 6)
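Sketch (Python): replaying the three reversals above. The index convention for
\rho(i, j) is inferred from the example, not stated in the notes: positions
are 1-based, inclusive, and count the leading 0 as position 1. The function
name is illustrative.

```python
def signed_reversal(perm, i, j):
    """Apply rho(i, j): reverse positions i..j (1-based, inclusive), flip signs."""
    segment = [-x for x in reversed(perm[i - 1:j])]
    return perm[:i - 1] + segment + perm[j:]


pi = [0, -3, -4, 1, -5, -2, 6]
for i, j in [(5, 6), (3, 5), (2, 4)]:
    pi = signed_reversal(pi, i, j)
    print(pi)
# [0, -3, -4, 1, 2, 5, 6]
# [0, -3, -2, -1, 4, 5, 6]
# [0, 1, 2, 3, 4, 5, 6]
```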
Problem 3.1 ("Reversal Distance"): Given two signed permutations \pi and \sigma,
find srd(\pi, \sigma), the minimum number of reversals needed to transform \pi
into \sigma.
(Again, we assume that \sigma is always the identity and use the abbreviated
notation srd(\pi) := srd(\pi, id).)
First linear-time algorithm solving Problem 3.1: Bader, Moret, and Yan (2001)
-> srd(\pi^2) = 3 (optimal, thus a solution to Problem 3.1)
3.1 A tight lower bound for srd(\pi)
Definition 3.4: The /breakpoint graph/ of a signed permutation \pi is the graph
BG(\pi) = (V, E), whose vertex set V contains, for 1 \leq g \leq n, two vertices
g^t and g^h called the /tail/ and the /head/ of gene g, plus two vertices 0^h
and (n+1)^t. The edge set E is the union of two perfect matchings R and D of V:
- "reality edges" R: for 0 \leq i \leq n (with \pi_0 = 0 and \pi_{n+1} = n+1),
R contains the edge from |\pi_i|^h if \pi_i is non-negative, and from
|\pi_i|^t otherwise, to |\pi_{i+1}|^t if \pi_{i+1} is non-negative, and to
|\pi_{i+1}|^h otherwise.
- "desire edges" D := {{g^h, (g+1)^t} | 0 \leq g \leq n } (adjacencies of the
identity) --> Question: what would BG(id) look like?
(BG(\pi^2) drawn, using two different colors for the two matchings R and D)
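Sketch (Python): building the two matchings R and D of Definition 3.4 as sets
of unordered vertex pairs. The representation is an assumption made for this
illustration: a vertex is a tuple (g, "h") or (g, "t"), an edge is a frozenset
of two vertices, and the input is the signed permutation without its frame.

```python
def breakpoint_graph(perm):
    """Return (reality, desire) edge sets of the breakpoint graph of perm."""
    n = len(perm)
    ext = [0] + list(perm) + [n + 1]
    reality = set()
    for a, b in zip(ext, ext[1:]):
        # Leave a through its head if a is non-negative, else through its tail;
        # enter b through its tail if b is non-negative, else through its head.
        left = (abs(a), "h" if a >= 0 else "t")
        right = (abs(b), "t" if b >= 0 else "h")
        reality.add(frozenset((left, right)))
    desire = {frozenset(((g, "h"), (g + 1, "t"))) for g in range(n + 1)}
    return reality, desire


R, D = breakpoint_graph([-3, -4, 1, -5, -2])
for edge in sorted(tuple(sorted(e)) for e in R):
    print(edge)
```

Calling the function on the identity and comparing the two returned sets is a
quick way to check an answer to the question above.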