5. Algebraic theory for genome rearrangements

5.1 Composition of permutations

Have you ever wondered what the '\circ' operator does? What does \pi \circ \rho mean?

'\circ' is the composition (multiplication) operator of permutations. Just like the composition of functions, it is applied from right to left: in the composition \pi \circ \sigma we first apply \sigma (to (1 ... n)) and then \pi, resulting in the permutation (\pi_{\sigma_1} ... \pi_{\sigma_n}).

ex.: \pi = (2 1 4 3), \sigma = (1 4 2 3)
     \pi \circ \sigma = (\pi_1 \pi_4 \pi_2 \pi_3) = (2 3 1 4)
     The reversal \rho(2, 3) = (1 3 2 4), hence \pi \circ \rho(2, 3) = (\pi_1 \pi_3 \pi_2 \pi_4) = (2 4 1 3)

The set of all permutations of n symbols, denoted \Pi_n, forms a group (\Pi_n, \circ) under the composition operator \circ. Its neutral element is (1 ... n), and the inverse of a permutation (\pi_1 \pi_2 ... \pi_n) is obtained by exchanging positions and elements in \pi, i.e. \pi^{-1}_{\pi_i} = i. Note that in general \pi \circ \sigma \neq \sigma \circ \pi; in the example above, \sigma \circ \pi = (4 1 3 2) \neq (2 3 1 4).

5.2 Cycles of a permutation

A permutation can also be represented as a /composition of one or more cycles/:

Definition 5.1: A /cycle/ of a permutation \pi, denoted by C = (i_1, ..., i_k), is a set of elements such that \pi_{i_j} = i_{j+1} for 1 \leq j \leq k-1, and \pi_{i_k} = i_1.

IMPORTANT: we distinguish between a permutation and its cycle decomposition by delimiting the elements of the latter with ','.

Theorem 5.2: Every permutation has a unique disjoint cycle decomposition (up to the order of the cycles and the rotation of the elements within each cycle).

ex.: the disjoint cycle decomposition of \pi^1 is:
                1 2 3 4 5 6 7 8
       \pi^1 = (2 1 4 3 5 7 8 6) = (1, 2) (3, 4) (5) (6, 7, 8)
     (figure: cycle graph of \pi^1)

Note:
- (1 2 3) \neq (1, 2, 3): the former is the identity in one-line notation, but (1, 2, 3) = (2, 3, 1) = (3, 1, 2)
- the inverse of a cycle is the cycle with its elements in reversed order; the neutral element is (1) ... (n)
- cycles with a single element are usually omitted from the cycle representation, e.g.
  \pi^1 = (1, 2) (3, 4) (6, 7, 8)

Permutations can also be decomposed into multiple non-disjoint cycles. Alternative representations of the cycle \pi^2 = (1, 2, 3, 4, 5) include
     (1, 2, 3) (3, 4, 5)
     (1, 2) (2, 3) (3, 4) (4, 5)
     (1, 5) (1, 4) (1, 3) (1, 2)
The order of non-disjoint cycles is no longer arbitrary, e.g. (1, 2) (2, 3) = (1, 2, 3) \neq (1, 3, 2) = (2, 3) (1, 2).

How do we obtain the disjoint cycle decomposition of a sequence of cycles? The product is evaluated from right to left, i.e. the rightmost cycle is applied first:

- Algorithm 5.1 ---------------------------------------------------------------
Input:  collection of cycles \mathcal C = C^1, C^2, ... on the elements 1 ... n
Output: collection of disjoint cycles \mathcal C' = C'^1, C'^2, ...
 1: i_1 = 1
 2: let \mathcal C' be an empty collection of cycles
 3: add a new cycle C' = (i_1) to \mathcal C'
 4: for j = 2 .. n do
 5:     i_j = i_{j-1}
 6:     traverse all cycles C of \mathcal C from right to left; in each cycle C,
        apply the transformation i_j <- C_{k+1} if and only if C_k = i_j
        (C_{k+1} wraps around to C_1 if C_k is the last element of C)
 7:     if i_j = C'_1:
 8:         let i_j be the smallest element not contained in any cycle of \mathcal C'
 9:         associate the variable C' with a new cycle (i_j)
10:         add C' to \mathcal C'
11:     else:
12:         append i_j to cycle C'
13: end
14: return \mathcal C'
-------------------------------------------------------------------------------

ex.: \pi^3 = (1, 5, 3, 2), \pi^4 = (1, 4) (3, 5)
     \pi^3 \pi^4 = (1, 5, 3, 2) (1, 4) (3, 5) = (1, 4, 5, 2)   (the 1-cycle (3) is omitted)

5.3 Algebraic theory meets genome rearrangements (or: going beyond reversals)

Definition 5.2: A /genome/ is an (unsigned) permutation \pi in disjoint cycle notation where each cycle corresponds to a circular chromosome.

ex.: \pi^5 = (1, 4, 5) (2, 3) is a genome with two chromosomes.

A rearrangement in a genome \pi can be modeled by a product \rho \pi with a permutation \rho.
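Algorithm 5.1, and with it the product \rho \pi by which a rearrangement is modeled, can be sketched in Python. This is a minimal illustration under stated assumptions, not the notes' reference implementation: the names compose_one_line, cycle_image, disjoint_cycles and drop_fixed_points are hypothetical, cycles are tuples over the elements 1 ... n, and instead of the counter j of Algorithm 5.1 the sketch simply follows each orbit until it returns to the first element C'_1 of the current cycle.

```python
from typing import List, Sequence, Tuple

Cycle = Tuple[int, ...]  # hypothetical alias: a cycle (i_1, ..., i_k) as a tuple


def compose_one_line(pi: Sequence[int], sigma: Sequence[int]) -> Tuple[int, ...]:
    """pi o sigma in one-line notation: (pi_{sigma_1} ... pi_{sigma_n}), sigma applied first."""
    return tuple(pi[s - 1] for s in sigma)


def cycle_image(cycle: Cycle, x: int) -> int:
    """Image of x under (i_1, ..., i_k): i_j -> i_{j+1}, i_k wraps to i_1, others fixed."""
    if x in cycle:
        return cycle[(cycle.index(x) + 1) % len(cycle)]
    return x


def disjoint_cycles(cycles: Sequence[Cycle], n: int) -> List[Cycle]:
    """Disjoint cycle decomposition of the product C^1 C^2 ... on 1..n.

    Follows the idea of Algorithm 5.1; fixed points are kept as 1-cycles.
    """
    result: List[Cycle] = []
    placed = set()
    for start in range(1, n + 1):  # smallest element not yet in any output cycle
        if start in placed:
            continue
        current = [start]
        placed.add(start)
        x = start
        while True:
            for c in reversed(cycles):  # product is evaluated right to left
                x = cycle_image(c, x)
            if x == start:  # back at C'_1: the current cycle is closed
                break
            current.append(x)
            placed.add(x)
        result.append(tuple(current))
    return result


def drop_fixed_points(cycles: Sequence[Cycle]) -> List[Cycle]:
    """Omit 1-cycles, as is customary in cycle notation."""
    return [c for c in cycles if len(c) > 1]


print(compose_one_line((2, 1, 4, 3), (1, 4, 2, 3)))  # (2, 3, 1, 4)
print(drop_fixed_points(disjoint_cycles([(1, 5, 3, 2), (1, 4), (3, 5)], 5)))  # [(1, 4, 5, 2)]
print(disjoint_cycles([(2, 4, 5), (1, 2, 3, 4, 5)], 5))  # [(1, 4, 2, 3, 5)]
```

Keeping fixed points as 1-cycles in the output makes the cycle count c needed for the norm directly available; drop_fixed_points only restores the usual written form.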
ex.: \rho = (2, 4, 5)
     \rho \pi^2 = (2, 4, 5) (1, 2, 3, 4, 5) = (1, 4, 2, 3, 5)
     \rho is a transposition of the blocks [2, 3] and [4]

Applying a 2-cycle \rho = (a, b) to a genome has the following effect:
- Fission: if a and b are in the same cycle, this cycle is split in two, separating a and b
- Fusion: if a and b are in different cycles, the two cycles are joined into one

ex.: fission: (4, 5) \pi^5 = (1, 5) (2, 3) (4)
     fusion:  (2, 5) \pi^5 = (1, 4, 2, 3, 5)

5.4 Rearrangement power of permutations

Definition 5.3: The /norm/ of a permutation \pi, denoted by ||\pi||, is the minimum number of 2-cycles needed to decompose \pi. The norm of a permutation can be seen as a measure of its rearrangement power.

Observation 5.1:
(i) The norm of a cycle with k elements is k - 1.
(ii) The norm of a permutation \pi with n elements and c cycles (counting 1-cycles) is ||\pi|| = n - c.

ex.: ||\pi^5|| = 5 - 2 = 3, witnessed by \pi^5 = (1, 4) (4, 5) (2, 3)

Problem 5.1 (Algebraic Rearrangement Problem): Given genomes \pi and \sigma, find permutations \rho_1, \rho_2, ..., \rho_k that transform \pi into \sigma, i.e. \rho_k ... \rho_2 \rho_1 \pi = \sigma, such that the algebraic distance ad(\pi, \sigma) := \sum_{i=1}^{k} ||\rho_i|| is minimum.

How to compute ad(\pi, \sigma)?
     \rho_k ... \rho_2 \rho_1 \pi = \sigma  <=>  \rho_k ... \rho_2 \rho_1 = \sigma \pi^{-1}
                                          =>   ||\rho_k ... \rho_2 \rho_1|| = ||\sigma \pi^{-1}||
     \sum_{i=1}^{k} ||\rho_i|| \geq ||\rho_k ... \rho_2 \rho_1|| = ||\sigma \pi^{-1}||
          (norm property: ||\alpha \beta|| \leq ||\alpha|| + ||\beta||)
     => ad(\pi, \sigma) \geq ||\sigma \pi^{-1}||

Observation 5.2: We can obtain the rearrangement operations by decomposing \sigma \pi^{-1}, and ||\sigma \pi^{-1}|| serves as a lower bound for ad(\pi, \sigma). Thus, a solution to Problem 5.1 is to find a minimal 2-cycle decomposition \rho_k ... \rho_2 \rho_1 of \sigma \pi^{-1}. Since each 2-cycle has norm 1, this decomposition attains the lower bound, and by definition ad(\pi, \sigma) = ||\sigma \pi^{-1}|| = k. Each 2-cycle corresponds to a fusion or a fission.
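Observation 5.1 and the identity ad(\pi, \sigma) = ||\sigma \pi^{-1}|| translate directly into code. The Python sketch below is an illustration under stated assumptions: genomes are lists of tuples on the elements 1 ... n, all helper names are hypothetical, and splitting each cycle (i_1, ..., i_k) into (i_1, i_2) (i_2, i_3) ... (i_{k-1}, i_k) is just one of several minimal 2-cycle decompositions.

```python
from typing import Dict, List, Sequence, Tuple

Cycle = Tuple[int, ...]  # hypothetical alias: a cycle as a tuple of elements


def to_map(cycles: Sequence[Cycle], n: int) -> Dict[int, int]:
    """Map i -> image of i under the product of `cycles` (rightmost applied first)."""
    mapping = {}
    for x in range(1, n + 1):
        y = x
        for c in reversed(cycles):
            if y in c:
                y = c[(c.index(y) + 1) % len(c)]
        mapping[x] = y
    return mapping


def to_cycles(mapping: Dict[int, int]) -> List[Cycle]:
    """Disjoint cycles of a mapping, fixed points included as 1-cycles."""
    seen, result = set(), []
    for start in sorted(mapping):
        if start in seen:
            continue
        cycle, x = [], start
        while x not in seen:  # follow the orbit until it returns to `start`
            seen.add(x)
            cycle.append(x)
            x = mapping[x]
        result.append(tuple(cycle))
    return result


def norm(cycles: Sequence[Cycle], n: int) -> int:
    """||pi|| = n - c, where c counts all disjoint cycles incl. 1-cycles."""
    return n - len(to_cycles(to_map(cycles, n)))


def inverse(cycles: Sequence[Cycle]) -> List[Cycle]:
    """(C^1 ... C^m)^-1 = (C^m)^-1 ... (C^1)^-1, each cycle reversed."""
    return [tuple(reversed(c)) for c in reversed(cycles)]


def algebraic_distance(pi: Sequence[Cycle], sigma: Sequence[Cycle], n: int) -> int:
    """ad(pi, sigma) = ||sigma pi^-1||."""
    return norm(list(sigma) + inverse(pi), n)


def two_cycle_decomposition(cycles: Sequence[Cycle], n: int) -> List[Cycle]:
    """A minimal decomposition into 2-cycles, leftmost factor first:
    each (i_1, ..., i_k) becomes (i_1, i_2) (i_2, i_3) ... (i_{k-1}, i_k)."""
    result: List[Cycle] = []
    for c in to_cycles(to_map(cycles, n)):
        result.extend((c[j], c[j + 1]) for j in range(len(c) - 1))
    return result


pi5 = [(1, 4, 5), (2, 3)]
print(norm(pi5, 5))  # 3
print(two_cycle_decomposition(pi5, 5))  # [(1, 4), (4, 5), (2, 3)]
print(algebraic_distance(pi5, [(1, 5), (2, 3)], 5))  # 1 (a single fission)
```

To recover the operations themselves, decompose \sigma \pi^{-1} with two_cycle_decomposition and apply the returned 2-cycles to \pi from the last one to the first, since \rho_1 is the rightmost factor.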
A fission followed by a fusion can model a transposition:

ex.: \pi^6 = (1, 3, 4, 2, 5, 6)
     (4, 6) (2, 4) \pi^6 = (2, 6, 4) \pi^6 = (1, 3, 2, 5, 4, 6)   // transposition exchanging the blocks [4] and [2, 5]
     ((2, 4) first splits \pi^6 into (1, 3, 2, 5, 6) (4); (4, 6) then fuses the two cycles.)

Observation 5.3: If the elements a, b and c are in the same cycle in \pi and appear in this (cyclic) order, then \rho = (a, b, c) is a transposition in \pi. However, in terms of the algebraic distance, a transposition still costs ||(a, b, c)|| = 2, i.e. one fission and one fusion. Therefore the algebraic distance ad(\pi, \sigma) = ||\sigma \pi^{-1}|| is also known as the fusion, fission and transposition (FFT) distance.
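Observation 5.3 can be checked by brute force on small examples. The Python sketch below is an illustration, not part of the original notes: product, in_cyclic_order and check_observation_5_3 are hypothetical helpers, and the check assumes a genome consisting of a single cycle that contains exactly the elements 1 ... n. For every ordered triple a, b, c it compares the number of cycles of (a, b, c) \pi with the cyclic order of a, b, c in \pi.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Cycle = Tuple[int, ...]  # hypothetical alias: a cycle as a tuple of elements


def product(cycles: Sequence[Cycle], n: int) -> List[Cycle]:
    """Disjoint cycles (1-cycles included) of C^1 C^2 ..., rightmost applied first."""
    image = {}
    for x in range(1, n + 1):
        y = x
        for c in reversed(cycles):
            if y in c:
                y = c[(c.index(y) + 1) % len(c)]
        image[x] = y
    seen, result = set(), []
    for start in range(1, n + 1):
        if start in seen:
            continue
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = image[x]
        result.append(tuple(cycle))
    return result


def in_cyclic_order(cycle: Cycle, a: int, b: int, c: int) -> bool:
    """True if walking around `cycle` starting at a meets b before c."""
    k = cycle.index(a)
    rotated = cycle[k:] + cycle[:k]
    return rotated.index(b) < rotated.index(c)


def check_observation_5_3(cycle: Cycle) -> bool:
    """(a, b, c) pi stays one chromosome (a transposition) exactly when a, b, c
    appear in this cyclic order; otherwise pi is split into three cycles."""
    n = len(cycle)  # assumes the single cycle contains exactly the elements 1..n
    for a, b, c in permutations(cycle, 3):
        expected = 1 if in_cyclic_order(cycle, a, b, c) else 3
        if len(product([(a, b, c), cycle], n)) != expected:
            return False
    return True


pi6 = (1, 3, 4, 2, 5, 6)
print(product([(2, 6, 4), pi6], 6))  # [(1, 3, 2, 5, 4, 6)]: a transposition
print(product([(2, 4, 6), pi6], 6))  # [(1, 3, 6), (2, 5), (4,)]: two fissions
print(check_observation_5_3(pi6))  # True
```

The second line shows why the order matters: with 2, 4, 6 taken against their cyclic order in \pi^6, both factors of (2, 4, 6) = (2, 4) (4, 6) act as fissions, while either 3-cycle still has norm 2.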