Edited by: Marco Pellegrini, Consiglio Nazionale Delle Ricerche (CNR), Italy

Reviewed by: Hao Wang, University of Georgia, United States; Fabio Henrique Viduani Martinez, Federal University of Mato Grosso do Sul, Brazil

*Correspondence: Max A. Alekseyev

This article was submitted to Bioinformatics and Computational Biology, a section of the journal Frontiers in Genetics

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Genome rearrangements are large-scale evolutionary events that shuffle genomic architectures. The minimal number of such events between two genomes is often used in phylogenomic studies to measure the evolutionary distance between the genomes. Double-Cut-and-Join (DCJ) operations represent a convenient model of the most common genome rearrangements (reversals, translocations, fissions, and fusions), while other genome rearrangements, such as transpositions, can be modeled by pairs of DCJs. Since the DCJ model does not directly account for transpositions, their impact on DCJ scenarios is unclear. In the present work, we study the implicit appearance of transpositions (as pairs of DCJs) in DCJ scenarios. We consider shortest DCJ scenarios satisfying the maximum parsimony assumption, as well as more general DCJ scenarios based on some realistic but less restrictive assumptions. In both cases, we derive a uniform lower bound for the rate of implicit transpositions, which depends only on the genomes and not on a particular DCJ scenario between them. Our results imply that the implicit appearance of transpositions in DCJ scenarios may be unavoidable or even abundant for some pairs of genomes. We estimate that for mammalian genomes, implicit transpositions constitute at least 6% of genome rearrangements.

Genome rearrangements are dramatic evolutionary events that change genome structures. The number of genome rearrangements between two genomes represents a good measure for their evolutionary closeness and is used as such in phylogenomic studies. This measure is often based on the maximum parsimony assumption, implying that the evolutionary distance can be estimated as the

The most common rearrangements are

^{1}

While a transposition cannot be directly modeled by a DCJ, it can be modeled by a pair of DCJs. We refer to such a pair of DCJs as an

The paper is organized as follows. In section 2, we describe the graph-theoretical representation of genomes and DCJ rearrangements. In section 3, we analyze shuffling of DCJ scenarios and introduce the dependency graphs capturing their combinatorial structure. In section 4, we study the appearance of (disjoint) implicit transpositions in proper and shortest DCJ scenarios between two genomes, and prove uniform lower bounds for their rate. In section 5, we use our results to estimate the rate of implicit transpositions in DCJ scenarios between mammalian genomes and between yeast genomes. We conclude the paper with a discussion in section 6.

Let

_{even}(_{odd}(_{DCJ}(

A DCJ in genome

{

{

{

{

where

For genomes _{2}(_{2}(_{even}(_{odd}(_{even}(_{odd}(

While at the ends of an even path there is always a

A DCJ scenario transforming genome

which consists of trivial cycles and trivial paths (Figure

Reconstructing the DCJs that occurred in the course of evolution between genomes of extant species is a challenging task in comparative genomics. Such reconstruction is often based on the parsimony assumption that evolutionary DCJs (i.e., genome rearrangements) between two genomes form a

(P1) any edge once removed is never recreated (that is, in the course of evolution, each gene adjacency is either preserved, or broken and never restored);

(P2) no pair of DCJs (not necessarily adjacent) can be replaced by an equivalent single DCJ (that is, there is no obvious way to shorten the scenario);

(P3) the number of fusions and fissions does not exceed

Below we prove that shortest DCJ scenarios satisfy these properties and thus are proper. We start with recalling and proving some useful lemmas.

THEOREM 1 (Tannier et al.

^{2}

It is easy to construct a shortest DCJ scenario that uses only DCJs of types (i), (ii), (iv), and (v). Indeed, these types of DCJs define how to process existing connected components in the breakpoint graph until they all turn into trivial paths/cycles. Such a scenario eliminates

Now, we are ready to prove that any shortest DCJ scenario is proper.

To prove the condition (P1), we notice that if an edge (

Recall that each DCJ removes and adds some edges in a breakpoint graph. Two adjacent DCJs α and β in a DCJ scenario are called ^{3}

In a DCJ scenario, one can change the order of two adjacent independent DCJs and obtain another DCJ scenario of the same length between the same two genomes. Similarly, a pair of adjacent weakly dependent DCJs in a DCJ scenario can be replaced with another pair of weakly dependent DCJs, resulting in a new DCJ scenario of the same length between the same two genomes (Braga and Stoye,

We therefore consider the following two types of length-preserving operations, which can be applied to a pair of adjacent DCJs (α, β) in a DCJ scenario:

(T1) If α and β are independent, replace (α, β) with (β, α).

(T2) If α and β are weakly dependent, replace (α, β) with an equivalent pair of weakly dependent DCJs.
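Operation (T1) can be illustrated with a minimal Python sketch. It rests on assumptions that are not spelled out above: a genome graph is stored as a set of undirected edges, a DCJ is a pair (two removed edges, two added edges), and the dependence of two adjacent DCJs (α, β) is classified by how many edges created by α are removed by β (0 for independent, 1 for weakly dependent, 2 for strongly dependent). The names `DCJ`, `make_dcj`, `dependence`, `apply_scenario`, and `swap_independent` are hypothetical.

```python
from typing import FrozenSet, Iterable, List, NamedTuple, Set, Tuple

Edge = FrozenSet[str]


class DCJ(NamedTuple):
    removed: FrozenSet[Edge]  # the two edges cut by the operation
    added: FrozenSet[Edge]    # the two edges created by the operation


def make_dcj(removed: Iterable[Tuple[str, str]],
             added: Iterable[Tuple[str, str]]) -> DCJ:
    """Build a DCJ from vertex pairs (illustrative helper)."""
    return DCJ(frozenset(frozenset(e) for e in removed),
               frozenset(frozenset(e) for e in added))


def dependence(alpha: DCJ, beta: DCJ) -> str:
    """Classify adjacent DCJs (alpha, beta).

    Assumption: the class is determined by the number of edges
    created by alpha that beta removes.
    """
    shared = len(alpha.added & beta.removed)
    return ("independent", "weakly dependent", "strongly dependent")[shared]


def apply_scenario(edges: Set[Edge], scenario: List[DCJ]) -> Set[Edge]:
    """Apply the DCJs in order; fail if a DCJ cuts an absent edge."""
    graph = set(edges)
    for op in scenario:
        if not op.removed <= graph:
            raise ValueError("DCJ removes an edge that is not present")
        graph = (graph - op.removed) | op.added
    return graph


def swap_independent(scenario: List[DCJ], i: int) -> List[DCJ]:
    """Operation (T1): swap the adjacent DCJs at positions i and i + 1."""
    if dependence(scenario[i], scenario[i + 1]) != "independent":
        raise ValueError("(T1) applies only to independent DCJs")
    result = list(scenario)
    result[i], result[i + 1] = result[i + 1], result[i]
    return result
```

Under these assumptions, swapping two independent DCJs leaves the final genome graph unchanged, which is what makes (T1) length-preserving.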

To better capture and analyze the combinatorial structure of DCJs in a DCJ scenario

The dependency graph DG(

Furthermore, any DCJ in

If (α, β) is an arc in DG(

Braga and Stoye (

Let t_{1} and t_{2} be proper DCJ scenarios between the same two genomes. Scenario t_{1} can be obtained from scenario t_{2} with operations (T1) if and only if DG(t_{1}) = DG(t_{2}).

_{1} and _{2} correspond to the same dependency graph, i.e., DG(_{1}) = DG(_{2}) = _{1} and _{2} represent topological orderings of _{1} and _{2} can be obtained from each other with operations (T1). Suppose that _{1} and _{2} start with the same _{1} = (α_{1}, α_{2}, …, α_{k}, γ, …) and _{2} = (α_{1}, α_{2}, …, α_{k}, β_{1}, β_{2}, …, β_{m}, γ, …), where γ≠β_{1} are the first DCJs different in the two scenarios. We will show that γ in _{2} can be moved to (_{1}) with operations (T1). Since β_{m} follows γ in _{1} but precedes γ in _{2}, these vertices are not connected with an arc in _{2} to obtain (α_{1}, α_{2}, …, α_{k}, β_{1}, β_{2}, …, γ, β_{m}, …). After _{1}, α_{2}, …, α_{k}, γ, β_{1}, β_{2}, …, β_{m}, …), where γ is at the same position as in _{1}. Using induction on _{1} can be obtained from _{2} with operations (T1), and vice versa.

Now, suppose that DCJ scenarios t_{1} and t_{2} can be obtained from each other with operations (T1). Since operations (T1) change only the order of DCJs in the scenario but keep the DCJs themselves intact, the dependency graph is not affected by such operations either. Therefore, DG(t_{1}) = DG(t_{2}). □
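The constructive direction of this argument can be sketched in Python. The sketch treats a scenario abstractly as an ordering of vertex labels and the dependency graph as a set of arcs. These are simplifying assumptions, and the function name `reorder_by_swaps` is hypothetical. It moves each vertex into its target position using only swaps of adjacent vertices that are not joined by an arc, mirroring how γ is moved leftwards past β_{m}, …, β_{1}.

```python
from typing import Hashable, List, Sequence, Set, Tuple

Arc = Tuple[Hashable, Hashable]


def reorder_by_swaps(arcs: Set[Arc],
                     source: Sequence[Hashable],
                     target: Sequence[Hashable]) -> List[Arc]:
    """Transform `source` into `target` by adjacent swaps.

    Both sequences are assumed to be topological orderings of the same
    DAG given by `arcs`. Returns the swapped pairs, each reported as
    (left, right) in the order held just before the swap.
    """
    current = list(source)
    swaps: List[Arc] = []
    for pos, gamma in enumerate(target):
        j = current.index(gamma)
        # Invariant: current[:pos] == target[:pos], hence j >= pos.
        while j > pos:
            left = current[j - 1]
            if (left, gamma) in arcs:
                # An arc here would mean `target` violates the DAG.
                raise ValueError("not topological orderings of one DAG")
            current[j - 1], current[j] = current[j], current[j - 1]
            swaps.append((left, gamma))
            j -= 1
    return swaps
```

Every swap exchanges two vertices that appear in opposite relative order in the two orderings, so no arc can join them when both orderings are topological. This is the property that makes each swap a legitimate (T1) operation.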

Let

which gives a lower bound for the number of arcs in DG(

From Theorems 1 and 7, we easily get the following statement:

COROLLARY 8.

While DCJs mimic most common genome rearrangements (reversals, translocations, fissions, fusions), more complex rearrangements such as transpositions cannot be modeled by a single DCJ. A transposition, which cuts off a segment of a chromosome and inserts it into some other place in the genome, can be modeled by a pair of weakly dependent DCJs, replacing three undirected edges with three other undirected edges on the same six vertices in the genome graph. We remark that this graph operation is also known as a 3-break rearrangement (Alekseyev and Pevzner,
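As a small worked illustration, the following Python sketch models a transposition that turns the gene order 1 2 3 4 into 1 3 2 4 as a pair of DCJs. The vertex labels (`"2t"` and `"2h"` for the tail and head of gene 2), the helper names, and the omission of chromosome ends are assumptions made for the example only.

```python
from typing import FrozenSet, Iterable, Set

Edge = FrozenSet[str]


def edge(u: str, v: str) -> Edge:
    """Undirected adjacency edge between two gene extremities."""
    return frozenset((u, v))


def apply_dcj(edges: Set[Edge],
              removed: Iterable[Edge],
              added: Iterable[Edge]) -> Set[Edge]:
    """Replace two existing edges with two new ones."""
    removed, added = set(removed), set(added)
    if not removed <= edges:
        raise ValueError("DCJ removes an edge that is not present")
    return (edges - removed) | added


# Adjacencies of the gene order 1 2 3 4 (chromosome ends omitted).
before = {edge("1h", "2t"), edge("2h", "3t"), edge("3h", "4t")}

# First DCJ: excise gene 2 as a temporary circular chromosome.
alpha_removed = {edge("1h", "2t"), edge("2h", "3t")}
alpha_added = {edge("1h", "3t"), edge("2h", "2t")}

# Second DCJ: reinsert gene 2 between genes 3 and 4. It removes
# exactly one edge created by the first DCJ.
beta_removed = {edge("2h", "2t"), edge("3h", "4t")}
beta_added = {edge("3h", "2t"), edge("2h", "4t")}

middle = apply_dcj(before, alpha_removed, alpha_added)
after = apply_dcj(middle, beta_removed, beta_added)

# Net effect: three edges replaced by three others on the same vertices.
net_removed = before - after
net_added = after - before
```

In this example `net_removed` and `net_added` each hold three edges spanning the same six vertices, which is the 3-break view of the transposition. The intermediate edge between `"2h"` and `"2t"` exists only between the two DCJs.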

Below we study how transpositions appearing in the course of evolution between two genomes may be captured by DCJ scenarios between these genomes. While a transposition constitutes a pair of DCJs, their positions in a DCJ scenario may not always be reconstructed correctly. In particular, the two DCJs forming a transposition may appear interleaved with other independent DCJs that precede or follow this transposition in the course of evolution. This inspires the following definition.

In a DCJ scenario _{1}, α_{2}, …, α_{n}), a pair of weakly dependent DCJs (α_{i}, α_{j}) forms an

Since two distinct implicit transpositions in a proper DCJ scenario

Simultaneously recovering DIT(

From the definition of an implicit transposition it follows that an implicit transposition formed by a pair of DCJs (α, β) in a proper DCJ scenario

An arc (α_{1}, α_{2}) is a shortcut in G if and only if there does not exist a topological ordering of G in which α_{1} and α_{2} are adjacent.

Now, we prove that if an arc (α_{1}, α_{2}) is not a shortcut then there exists a topological ordering of _{1}, α_{2}) are adjacent. Let _{1}, …, β_{k}, α_{1}, γ_{1}, …, γ_{m}, α_{2}, δ_{1}, …, δ_{w}) be any topological ordering of _{i} such that there is a directed path from α_{1} to γ_{i}, and let _{1}, γ_{2}, …, γ_{m}} \ _{1} to a vertex _{2}. Hence, we construct a new topological ordering _{1}, α_{2}) are adjacent. □
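The construction in this proof can be sketched in Python. The sketch assumes that an arc (α_{1}, α_{2}) of a DAG is a shortcut when α_{2} is reachable from α_{1} by a directed path that avoids the arc itself. The function names are hypothetical, and `graphlib` requires Python 3.9 or later. Starting from any topological ordering, the vertices between α_{1} and α_{2} that are reachable from α_{1} are moved after α_{2}, and the remaining ones stay before α_{1}.

```python
from collections import defaultdict
from graphlib import TopologicalSorter
from typing import Dict, Hashable, Iterable, List, Optional, Set, Tuple

Arc = Tuple[Hashable, Hashable]


def _successors(arcs: Iterable[Arc]) -> Dict[Hashable, List[Hashable]]:
    succ: Dict[Hashable, List[Hashable]] = defaultdict(list)
    for u, v in arcs:
        succ[u].append(v)
    return succ


def _reachable(succ, start, skip: Optional[Arc] = None) -> Set[Hashable]:
    """Vertices reachable from `start`, optionally ignoring one arc."""
    seen: Set[Hashable] = set()
    stack = [start]
    while stack:
        u = stack.pop()
        for v in succ[u]:
            if (u, v) != skip and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def is_shortcut(arcs: Set[Arc], a: Hashable, b: Hashable) -> bool:
    """Assumed definition: b is reachable from a without the arc (a, b)."""
    return b in _reachable(_successors(arcs), a, skip=(a, b))


def ordering_with_adjacent(vertices: Iterable[Hashable], arcs: Set[Arc],
                           a: Hashable, b: Hashable) -> List[Hashable]:
    """Topological ordering in which a is immediately followed by b."""
    if (a, b) not in arcs or is_shortcut(arcs, a, b):
        raise ValueError("(a, b) must be an arc that is not a shortcut")
    preds = {v: set() for v in vertices}
    for u, v in arcs:
        preds[v].add(u)
    order = list(TopologicalSorter(preds).static_order())
    i, j = order.index(a), order.index(b)
    from_a = _reachable(_successors(arcs), a)
    between = order[i + 1:j]
    stay_before = [v for v in between if v not in from_a]
    move_after = [v for v in between if v in from_a]
    return order[:i] + stay_before + [a, b] + move_after + order[j + 1:]
```

The reordering stays topological because no moved vertex has an arc into α_{2} (that would make the arc a shortcut) and no moved vertex has an arc into a vertex that stays before α_{1} (that vertex would then be reachable from α_{1}).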

_{1}, α_{2}) ∈ M, DCJs α_{1} and α_{2} are adjacent in t

Let |_{1}, α_{2}) be an arc in _{1},α_{2})}. Let _{1}, α_{2}) and gluing vertices α_{1}, α_{2} into a new single vertex β. Since the arc (α_{1}, α_{2}) is not a shortcut, such contraction of arc (α_{1}, α_{2}) cannot create a cycle in _{1} and γ_{2} are adjacent in

We obtain _{1}, α_{2}. It is easy to see that such

For a directed graph

We will need the following lemma.

_{l} be the set of vertices _{0} contains all the sources. Let _{0}| be the number of sources.

From the definition, it follows that each vertex from _{l} for _{l−1}. Let us fix one such incoming arc for each vertex from _{l}, and consider the subgraph

From the definition of _{l−1} and _{l} for some _{l}. Therefore,

The following theorem gives a uniform lower bound for

COROLLARY 14.

_{2}(_{2}(

□

In this section, we focus on shortest DCJ scenarios, which represent a special case of proper DCJ scenarios. For shortest DCJ scenarios, we can refine the uniform lower bound for the rate of implicit transpositions given in Corollary 14.

Let ^{4}

THEOREM 15 (Shao et al.

By Theorem 15, DG(

Similarly to Corollary 14, from Theorem 16 we can immediately derive a better lower bound for

COROLLARY 17.

In this section, we estimate the rate of implicit transpositions recovered from pairwise DCJ scenarios between mammalian genomes, and between yeast genomes. For each pair of genomes, we use Corollary 14 and Corollary 17 for proper and shortest DCJ scenarios, respectively, to compute the lower bound for the rate of disjoint implicit transpositions between these genomes.

We analyze a set of three mammalian genomes:

The results in Table

Lower bounds for the rate of disjoint implicit transpositions between pairs of mammalian genomes among

Human and macaque | 106 | 0.06:0.06 | 0.09:0.10 | 0.25 |
Human and rat | 707 | 0.10:0.11 | 0.15:0.17 | 0.26 |
Macaque and rat | 701 | 0.09:0.10 | 0.15:0.17 | 0.28 |

We also analyze a set of five yeast genomes:

Lower bounds for the rate of disjoint implicit transpositions between pairs of yeast genomes among

Ago and Kla | 359 | 0.15 | 0.25 |
Ago and Kth | 247 | 0.14 | 0.23 |
Ago and Skl | 215 | 0.13 | 0.20 |
Ago and Zro | 317 | 0.14 | 0.23 |
Kla and Kth | 272 | 0.14 | 0.23 |
Kla and Skl | 238 | 0.12 | 0.20 |
Kla and Zro | 342 | 0.14 | 0.24 |
Kth and Skl | 69 | 0.06 | 0.11 |
Kth and Zro | 193 | 0.11 | 0.17 |
Skl and Zro | 158 | 0.10 | 0.15 |

The present work continues the study of the combinatorial structure of DCJ scenarios from the perspective of simple shuffling operations, each affecting only a pair of consecutive DCJs (first introduced in Braga and Stoye

Recently it was shown (Jiang and Alekseyev,

In the present work, we study how evolutionary transpositions may implicitly appear in DCJ scenarios and prove a uniform lower bound for their rate. Since transpositions are rather powerful rearrangements, it is not surprising that they may appear in a significant proportion that cannot be easily bounded in rearrangement scenarios between some genomes. Even though we do not yet have a recipe for limiting the effect of transpositions in the combined DCJ (2-break) and 3-break model (for which the weighting approach was proved to be a failure by Jiang and Alekseyev

Our analysis of mammalian genomes demonstrates that the lower bound for the (disjoint) implicit transposition rate is consistent with the estimation for the transposition rate obtained with statistical methods (Alexeev et al.,

In future work, we plan to extend our method to support other evolutionary events such as gene deletions/insertions and duplications. This will increase the accuracy and make the method applicable to genomes (such as those of plants) whose evolutionary history is rich in such events.

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Some preliminary results of the present work appeared in the proceedings of the 2nd International Conference on Algorithms for Computational Biology (Jiang and Alekseyev,

^{1}While not all 3-breaks represent transpositions, they provide a convenient model for analysis of transpositions and other transposition-like rearrangements. In the present study, we adopt this model and commonly refer to 3-breaks as (generalized) transpositions.

^{2}Bergeron et al. (

^{3}Such DCJs are called

^{4}The study (Shao et al.,