
Edited by: Mariana Benítez, Masaryk University, Czech Republic

Reviewed by: Dongying Gao, University of Georgia, USA; Pedro T. Monteiro, Instituto Gulbenkian de Ciência, Portugal

*Correspondence: David A. Rosenblueth, Departamento de Ciencias de la Computación, Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Apdo. 20-726, 01000 México D.F., México. e-mail:

This article was submitted to Frontiers in Plant Genetics and Genomics, a specialty of Frontiers in Plant Science.

This is an open-access article distributed under the terms of the

Model checking is a well-established technique for automatically verifying complex systems. Recently, model checkers have appeared in computer tools for the analysis of biochemical (and gene regulatory) networks. We survey several such tools to assess the potential of model checking in computational biology. Next, our overview focuses on direct applications of existing model checkers, as well as on algorithms for biochemical network analysis influenced by model checking, such as those using binary decision diagrams (BDDs) or Boolean-satisfiability solvers. We conclude with advantages and drawbacks of model checking for the analysis of biochemical networks.

A basic conviction in computational biology is that it should be possible to create computational tools allowing us to considerably increase our understanding of the functional properties of living organisms.

Model checking is a verification technique for determining whether or not a system model meets a specification. Compared with other verification techniques, model checking has a number of features that make it an industrial-strength methodology. It is, for example, routinely used in the design of integrated circuits. Its inventors, moreover, received the A. M. Turing Award in 2007.

The verification process in a model checker often uses a graph-search algorithm accumulating, in a set, system states having a desired property. Model checkers typically do not represent the elements of such sets explicitly, but implicitly, with techniques named

Graph-traversal in model checkers should be contrasted with that of simulators. A simulator visits system states in the

The possibility of having more than one successor for a state translates to

It might seem at first sight that model checking is an ordinary method for exhaustive graph-traversal. Such methods are often shunned as they are subject to the “state-explosion problem” and can readily become intolerably inefficient as the size of a network increases. Although model checkers do perform exhaustive search (Emerson,

Model checking is most often employed in the analysis of state-transition systems. The reason is that initially model checking was applied to “Kripke structures,” which can be regarded as such systems. Variables in such structures are Boolean, and time is discrete. Model checking, nonetheless, has since been extended to numerous other kinds of models. For example, by adding probabilities, a Kripke structure can be regarded as either a discrete- or continuous-time Markov chain or even a Markov decision process. Logics and algorithms have been developed for model-checking such processes (Hansson and Jonsson,

Therefore, although the most direct use of model checking in biochemical networks would be in discrete models, other kinds of models are also potentially amenable to model checking.

In the sequel, we will encounter model checking employed in various ways for the analysis of biochemical networks. Perhaps the kind of model that is most often used for such networks is a set of differential equations. If ordinary model checking is chosen, however, the gap between such continuous models and discrete state-transition systems must be bridged.

On a different dimension, we will see that many computer tools for biochemical network analysis employ model checking for verifying that a model has a desired property (as is usually done in other domains, like digital circuits). Other systems, nevertheless, are able to extract more information from a model checker. For instance, by forcing a model checker to compute a counterexample, it is possible to obtain a path with a certain property. Another way is to have the model checker report all the states having a specified property.
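The two ways of extracting information mentioned above can be illustrated on an explicit state graph. The following is only a minimal sketch: the two-gene network, the function names, and the use of plain breadth-first search are assumptions made for illustration, whereas an actual model checker would typically work symbolically. A witness path to a state satisfying a property plays the role of the counterexample that a model checker returns when asked to verify that such a state is never reached.

```python
from collections import deque
from itertools import product

def target(state):
    """Hypothetical two-gene negative-feedback loop: y represses x, x activates y."""
    x, y = state
    return (int(not y), x)

def async_successors(state):
    """Asynchronous semantics: each transition updates exactly one variable."""
    goal = target(state)
    succs = [state[:i] + (goal[i],) + state[i + 1:]
             for i in range(len(state)) if state[i] != goal[i]]
    return succs or [state]  # a steady state only loops on itself

def witness_path(start, prop):
    """Breadth-first search for a shortest path from start to a state satisfying prop."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        if prop(s):
            path = []
            while s is not None:
                path.append(s)
                s = parent[s]
            return path[::-1]
        for t in async_successors(s):
            if t not in parent:
                parent[t] = s
                queue.append(t)
    return None  # no reachable state satisfies prop

def states_satisfying(prop, n_vars=2):
    """Report all states (reachable or not) having the specified property."""
    return {s for s in product((0, 1), repeat=n_vars) if prop(s)}

path = witness_path((0, 0), lambda s: s == (0, 1))
x_on = states_satisfying(lambda s: s[0] == 1)
```

In this toy network the witness path visits every state of the single cycle before reaching the requested one.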

Finally, we will see examples of less usual kinds of model checking, like probabilistic and hybrid model checking.

After reviewing models of biochemical networks in section 2, we turn our attention to model checking in section 3. Section 4 summarizes the use of model checking in tools for biochemical network analysis, while section 5 is devoted to works reporting direct applications of model checkers. Section 6, by contrast, describes computer tools or isolated algorithms that do not necessarily employ full-fledged model checking, but do use a symbolic technique. Section 7 draws some conclusions.

From the point of view of model checking, there is no essential difference between the various kinds of biochemical networks or GRNs. We, therefore, use “biochemical network” to refer to several families of networks, such as gene, metabolic, signal-transduction, and cell-cycle networks (Deville et al.,

A GRN is a collection of DNA fragments indirectly interacting with each other and controlling the transcription of genes into mRNA. In the study of GRNs, analytical approaches represent the more realistic end of the model spectrum. Such models consist of nonlinear systems of ordinary differential equations (ODEs), where each variable denotes the concentration of a different gene product. Non-linearities, often modeled with sigmoids, arise because the concentration of one product often changes non-linearly with respect to that of another. These non-linearities create mathematical difficulties, even for finding the set of attractors. A simplification of such models approximates sigmoids by step functions, giving rise to stepwise-linear equations (Gouzé and Sari, _{1}, _{2}, … , _{nx} at time _{1}, _{2}, … , _{nx} at time

Metabolic pathways are series of chemical reactions catalyzed by enzymes, often employing vitamins and other dietary substances, termed metabolites. Metabolites are modified through formation and dissolution of chemical bonds. Non-probabilistic models can be adequate because unstable equilibria are rare and large numbers of molecules are present (Bower and Bolouri,

Signal transduction refers to the transfer of information (called signals) from the extracellular medium, first to the cell membrane, and then to the intracellular medium, causing a response. By comparison with metabolic pathways, signal transduction pathways exhibit more complex dynamics with small numbers of relevant molecules. This makes probabilistic models more appropriate for such networks (Bower and Bolouri,

The cell cycle is the series of phenomena happening when a cell grows, divides, and duplicates. Models of cell-cycle networks also range from differential equations (Chen et al.,

Model checking (Clarke and Emerson,

Unlike other verification methods, model checking is totally automatic, and the specification is formulated in mathematical (temporal) logic. In addition, model checking not only deals with correctness but also with incorrectness, often providing a

The expressiveness of the logics used in model checking, however, must be limited to achieve good performance. Hence, (temporal) logic is sometimes restricted so that only a partial behavior of the system may be specified. In this sense, model checking is a weak version of the verification problem. In spite of this, restricted temporal logics can express, among others, liveness properties, e.g., “every request will eventually be granted,” and safety properties, e.g., “a certain state will never be reached” (Emerson,

An important breakthrough in model checking was the development of “symbolic” techniques, where states are represented implicitly. The first of these techniques was the introduction (Burch et al.,

The main problem model checking faces is that of “state explosion,” as the size of the model increases

It must be emphasized that not all systems with, say, 10^{90} states can be handled (Emerson,

We illustrate model checking for a logic ℒ, by using the case of ℒ = CTL. More thorough treatments are in: (Clarke et al.,

Truth of CTL formulas is defined in Kripke structures (also called Kripke models). Figure _{0}, … , _{3}}), each labeled with a subset of _{0} is an infinite sequence _{0} _{1} … of states such that _{i} and _{i+1} are related by the accessibility relation.

CTL formulas can have Boolean operators, such as

Operator | Meaning |
---|---|
E | Some path (i.e., there Exists a path) |
A | All paths |
X | NeXt state (i.e., immediate future) |
F | Some state either in the present or in the Future |
G | All states in the present and in the future (Global) |

A CTL

Often more temporal operators are included in CTL. For example, a generalization of “_{i}” holds only at a state _{i}, then “_{1}) _{2}]” expresses that it is necessary to go through a state _{1} to reach a state _{2}. Such a formula asserts (equivalently) that there does not exist a path that can reach _{2} without reaching _{1}.

Normally, an ordinary CTL model checker follows the “state-labeling” algorithm (Clarke et al.,
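The idea of labeling states with the subformulas they satisfy can be sketched on an explicit Kripke structure. The sketch below is an assumption-laden illustration rather than the algorithm of any particular tool: the four-state structure, the function names, and the explicit (non-symbolic) set representation are all hypothetical. Each operator is computed as a fixpoint over sets of states, using the existential operators EX, EU, and EG, from which the remaining CTL operators follow by duality.

```python
def pre_exists(states, R, target):
    """States having at least one successor in target (i.e., EX target)."""
    return {s for s in states if R[s] & target}

def check_EU(states, R, phi, psi):
    """E[phi U psi] as a least fixpoint: start from psi and grow backwards through phi."""
    result = set(psi)
    while True:
        new = result | (phi & pre_exists(states, R, result))
        if new == result:
            return result
        result = new

def check_EF(states, R, phi):
    """EF phi is E[true U phi]."""
    return check_EU(states, R, set(states), phi)

def check_EG(states, R, phi):
    """EG phi as a greatest fixpoint: start from phi and discard states that must leave it."""
    result = set(phi)
    while True:
        new = result & pre_exists(states, R, result)
        if new == result:
            return result
        result = new

def check_AG(states, R, phi):
    """AG phi by duality: no path reaches a state violating phi."""
    return set(states) - check_EF(states, R, set(states) - phi)

# Hypothetical Kripke structure: accessibility relation and two labels.
STATES = {"s0", "s1", "s2", "s3"}
R = {"s0": {"s1", "s2"}, "s1": {"s1"}, "s2": {"s3"}, "s3": {"s0"}}
p = {"s1", "s3"}          # states labeled with p
q = {"s0", "s2", "s3"}    # states labeled with q

ef_p = check_EF(STATES, R, p)
eg_p = check_EG(STATES, R, p)
eg_q = check_EG(STATES, R, q)
ag_q = check_AG(STATES, R, q)
```

Here EG q holds on the cycle through s0, s2, and s3, whereas AG q holds nowhere because every state can reach s1.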

In addition to CTL, linear-time temporal logic (LTL) is often employed in practice. LTL also uses Kripke structures, but the temporal operators of this logic lack the

When model checking is applied to biochemical network analysis, there are often interesting properties that cannot be expressed in CTL or LTL. For example, neither of these logics can specify the states from which it is possible for a Boolean variable to oscillate (i.e., to switch infinitely many times back and forth between 0 and 1). To be sure, there exists a CTL formula, namely “

It is also possible to apply model checking to other kinds of models. A probabilistic version of CTL (PCTL) has been developed (Hansson and Jonsson,

Biocham and Genetic Network Analyzer (GNA) are perhaps the computer tools for biochemical network analysis that use model checking most extensively. We thus start with these two systems, and proceed with SMBioNet, Pathway Logic, Antelope, and XSSYS.

Biocham (BIOCHemical Abstract Machine) (Fages and Soliman,

Biocham's models are specified with a set of reaction rules of the form “$e_i$ for $S_i$ => $S'_i$,” $i = 1, \ldots, m$, where $e_i$ is a kinetic expression involving the concentrations of molecules, $S_i$ is a set of molecules with their stoichiometric coefficients, and $S'_i$ is the transformed set of molecules. Examples of kinetic expressions are: (1) the mass action law kinetics, (2) the Michaelis–Menten kinetics, and (3) the Hill kinetics. A set of such rules defines a (hyper) graph which can be interpreted by Biocham under different semantics.

Under the ODE semantics, Biocham can simulate the resulting system of equations using the Runge–Kutta method or Rosenbrock's method. In addition, Biocham can interpret rules with a stochastic semantics as a continuous-time Markov chain where the kinetic expressions are transition rates. Simulations in this case can be performed with Gillespie's algorithm (Gillespie,
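The stochastic interpretation can be made concrete with a short sketch of Gillespie's direct method. This is a generic textbook formulation, not Biocham's implementation; the function name, the data layout of the reactions, and the single mass-action reaction A → B with rate constant `K` are assumptions chosen for illustration.

```python
import random

def gillespie(initial, reactions, t_end, rng):
    """Direct-method stochastic simulation.

    reactions is a list of (propensity, change) pairs, where propensity maps a
    state to a transition rate and change maps species names to count deltas.
    """
    state = dict(initial)
    t = 0.0
    trace = [(t, dict(state))]
    while True:
        rates = [propensity(state) for propensity, _ in reactions]
        total = sum(rates)
        if total <= 0:
            break  # no reaction can fire any more
        t += rng.expovariate(total)  # exponentially distributed waiting time
        if t > t_end:
            break
        # Choose the reaction to fire with probability proportional to its rate.
        _, change = rng.choices(reactions, weights=rates)[0]
        for species, delta in change.items():
            state[species] += delta
        trace.append((t, dict(state)))
    return trace

K = 1.0  # hypothetical mass-action rate constant
reactions = [(lambda s: K * s["A"], {"A": -1, "B": +1})]
trace = gillespie({"A": 20, "B": 0}, reactions, t_end=1000.0, rng=random.Random(0))
```

Each run yields one trajectory of the underlying Markov chain; properties of the chain are then estimated from many such runs.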

These abstractions, which start from a reaction model and proceed to stochastic, to discrete, to Boolean networks, overapproximate the Boolean semantics obtained from the quantitative semantics. Hence, the non-existence of a behavior in the Boolean semantics implies its non-existence in the quantitative semantics of the rules (Fages and Soliman,

Finally, Fages and Soliman (

Using the NuSMV model checker (Cimatti et al.,

Biocham can use model checking in other forms. First, an extension of ordinary LTL with constraints over the reals allows the analysis of traces obtained from simulations (Rizk et al.,

Next, probabilistic model checking is also provided. Biocham estimates the probability of an LTL formula holding by sampling stochastic simulations (Fages and Soliman,
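Estimating the probability of a temporal property by sampling can be sketched as follows. The model (a symmetric random walk on 0..4 started at 1), the finite-trace reading of “until,” and all names are assumptions made for illustration and do not reproduce Biocham's procedure. The property checked is “the level stays positive until it reaches 4,” whose exact probability for this walk is 1/4, so the estimate can be compared against a known value.

```python
import random

def sample_trace(rng, start=1, top=4):
    """One run of a symmetric random walk on 0..top, stopped at either end."""
    x, trace = start, [start]
    while 0 < x < top:
        x += rng.choice((-1, 1))
        trace.append(x)
    return trace

def holds_until(trace, phi, psi):
    """Finite-trace check of 'phi U psi': psi must occur, with phi holding before it."""
    for x in trace:
        if psi(x):
            return True
        if not phi(x):
            return False
    return False

def estimate_probability(n_samples, rng):
    """Fraction of sampled traces satisfying the property."""
    hits = sum(
        holds_until(sample_trace(rng), lambda x: x > 0, lambda x: x == 4)
        for _ in range(n_samples)
    )
    return hits / n_samples

estimate = estimate_probability(20000, random.Random(1))
```

The accuracy of such an estimate depends only on the number of samples, not on the size of the state space, which is what makes sampling attractive for large stochastic models.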

Finally, Biocham has an “update” component for automatically modifying a network that does not satisfy a given CTL formula. The algorithm of this component is based on counterexamples computed by NuSMV. Although incomplete (in the sense that it sometimes fails to find the appropriate changes to a network), this component is useful because it can handle large networks (Chabrier-Rivier et al.,

Biocham has been applied (by its developers) to a budding yeast cell cycle model, to the Mitogen-Activated Protein Kinase (MAPK) cascades, and to the mammalian cell-cycle control. This last network involves 732 reactions over 165 proteins and genes, and 532 variables (implying 2^{532} ≃ 10^{160} states) (Chabrier-Rivier et al.,

GNA (de Jong et al.,

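The equations of this kind of model have the form:

$$\dot{x}_i = f_i(x) - \gamma_i x_i, \qquad (1)$$

where $x_i$ is the concentration of the $i$-th gene product, $f_i$ the rate of synthesis of $x_i$, and $\gamma_i$ the rate of degradation of $x_i$. The rate of synthesis is defined as:

$$f_i(x) = \sum_{l} \kappa_{il}\, b_{il}(x),$$

where $\kappa_{il}$ is a rate parameter ($\kappa_{il} > 0$), $b_{il}$ is a regulation function, and $\gamma_i$ is strictly positive. Observe that these equations are piecewise-linear.

A regulation function $b_{il}$ can be defined as an expression of step functions:

$$s^{+}(x_j, \theta_j) = \begin{cases} 1 & \text{if } x_j > \theta_j, \\ 0 & \text{if } x_j < \theta_j, \end{cases} \qquad s^{-}(x_j, \theta_j) = 1 - s^{+}(x_j, \theta_j),$$

where $\theta_j$ is a threshold. These thresholds divide the _{i} = θ^{j}_{i} (Gouzé and Sari,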

GNA performs an abstraction of a system of equations of the form (1) by associating such a system with a state-transition graph. In such a graph, each domain of dimension

Instead of having to give precise numerical values of the threshold and rate parameters, it is possible to supplement the state equations with inequality constraints on such values. GNA is then able to perform a “qualitative simulation” on the resulting model. Such a simulation results in a state-transition graph consisting of qualitative states and transitions between qualitative states. It is then possible to search for steady states, for example.
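For contrast with qualitative simulation, the behavior of a piecewise-linear model can also be explored numerically once parameter values are fixed. The sketch below is not GNA's method: it assumes a hypothetical two-gene mutual-inhibition network in which each gene is synthesized at rate `kappa` only while the other gene is below its threshold `theta`, and degraded at rate `gamma` times its own concentration, integrated with a plain Euler scheme. All names and parameter values are illustrative assumptions.

```python
def step_minus(x, theta):
    """Step function s^-(x, theta): 1 below the threshold, 0 above it."""
    return 1.0 if x < theta else 0.0

def derivatives(xa, xb, kappa=2.0, gamma=1.0, theta=1.0):
    """Right-hand side of the piecewise-linear equations for genes a and b."""
    dxa = kappa * step_minus(xb, theta) - gamma * xa
    dxb = kappa * step_minus(xa, theta) - gamma * xb
    return dxa, dxb

def simulate(xa, xb, dt=0.01, steps=2000):
    """Forward-Euler integration; returns the final concentrations."""
    for _ in range(steps):
        dxa, dxb = derivatives(xa, xb)
        xa, xb = xa + dt * dxa, xb + dt * dxb
    return xa, xb

# Two initial conditions on opposite sides of the thresholds.
final_a, final_b = simulate(1.5, 0.5)
other_a, other_b = simulate(0.5, 1.5)
```

Within each region delimited by the thresholds the equations are linear, so each trajectory relaxes toward kappa/gamma or toward 0. The two initial conditions end in different steady states, which is the kind of information a qualitative state-transition graph captures without numerical parameter values.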

GNA is able to perform model checking with NuSMV (Cimatti et al.,

Through the CADP toolbox, more complex properties can be analyzed (Monteiro et al.,

In addition, the GNA team has developed computation-tree regular logic (CTRL) (Mateescu et al.,

GNA has been used (by its developers) for analyzing the GRN controlling the carbon starvation response of

SMBioNet (Selection of Models of Biological Networks) (Bernot et al.,

Thomas' models of GRNs can be viewed either as an abstraction of a special case of piecewise-linear differential equations or as a generalization of a restriction of Boolean GRNs. These models are multi-valued state-transition systems, where concentrations are represented with discrete variables. In addition, time is viewed as proceeding in discrete steps. The value of every gene _{1}, _{2}, … , _{nx} at time

Thomas and his colleagues developed a method establishing a mapping from an “interaction” graph into a set of multi-valued state-transition systems. An

Thomas' method associates state-transition systems with an interaction graph as follows. First, an instantiated interaction graph is obtained by associating a set of possible “levels” with each gene, and a “threshold” with each edge. Each gene _{x} + 1 levels, where _{x} is the number of genes influenced by _{x} ≤ _{x}. In addition, with each interaction

_{x} = {0, 1, 2} and _{y} = {0, 1}.

The set of possible states is _{1} × · × _{n}. The level of _{1}, … , _{n}) ∈ _{x} ∈ _{x}. The set of ^{−1}_{x} denotes the set of predecessors of

Next, Thomas maps each ω_{x} (_{x} (ω_{x} (_{x} (the value toward which ^{−1}_{x}, _{x} (_{x} is an integer satisfying the following

If

If

The following table shows, for each state, the effective regulators of each gene, as well as possible values toward which each gene tends.

_{x} | _{y} | ω_{x} ( | ω_{y}( | _{x} (ω_{x} ( | _{y} (ω_{y} ( |
---|---|---|---|---|---|
0 | 0 | ∅ | ∅ | 1 | 1 |
0 | 1 | ∅ | { | 1 | 1 |
1 | 0 | { | ∅ | 2 | 1 |
1 | 1 | { | { | 2 | 1 |
2 | 0 | { | { | 2 | 0 |
2 | 1 | { | { | 2 | 1 |

The

A state-transition system _{x1} (_{xn}(_{i} (_{i} for at most one

SMBioNet extends Thomas' formalism with processes. This extension enables the incorporation of biological information possibly constraining the set of state-transition systems associated with an interaction graph. Essentially, processes constrain the regulators of a gene with Boolean functions over inequalities. For example, instead of having

SMBioNet takes as input an interaction graph with processes

To build the expected output, SMBioNet exhaustively enumerates all the possible values of the sets of parameters and, by using NuSMV, retains the corresponding state graphs satisfying the given CTL formula. Processes and restrictions on the parameters help to reduce the number of parameter sets to be processed.

The number of states of a state graph corresponding to a Boolean network with $n$ genes is $2^{n}$. Each of the $2^{n}$ states has at most $n$ asynchronous successors, so there are at most $n2^{n}$ transitions. Since each of the $n2^{n}$ possible transitions may or may not be present in an asynchronous state graph, the number of asynchronous state graphs for a Boolean network with $n$ genes is $2^{(n2^{n})}$. Thus, if the input of SMBioNet is a property over a network with $n$ genes, the number of candidate state graphs to check is bounded by $2^{(n2^{n})}$. Therefore, SMBioNet only works in general for small values of $n$.
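The growth of this bound can be checked with a few lines of arithmetic. The sketch assumes the counting argument for Boolean networks with n genes: 2^n states, at most n asynchronous successors per state, and one binary choice per possible transition. The function name is hypothetical.

```python
def async_state_graph_bound(n):
    """Return (states, possible transitions, upper bound on asynchronous state graphs)."""
    states = 2 ** n
    transitions = n * states
    return states, transitions, 2 ** transitions

# Tabulate the bound for the first few network sizes.
bounds = {n: async_state_graph_bound(n) for n in range(1, 5)}
for n, (states, transitions, graphs) in bounds.items():
    print(f"n={n}: {states} states, {transitions} transitions, {graphs} graphs")
```

Already for four genes the bound equals 2^64, which illustrates why biological knowledge that restricts the parameters is needed to keep the enumeration tractable.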

SMBioNet has been applied to the tail resorption in tadpole metamorphosis (Khalis et al.,

Pathway Logic (Eker et al.,

Similarly to Biocham, the interpretation of a system in Pathway Logic is qualitative and binary (Talcott,

Pathway Logic can perform forward search (i.e., simulation), as well as LTL model checking (Eker et al.,

In addition, Pathway Logic provides metalevel analysis: Rules can be abstracted into families, each family corresponding, for example, to a particular type of reaction, such as activation, inhibition, or translocation. It is thus possible, for instance, to find all rules involving a given protein.

Pathway Logic has been applied to analyze the MAPK pathway (Eker et al.,

Antelope (Analysis of Networks through TEmporal-LOgic sPEcifications) (Arellano et al.,

As with other systems, Antelope has to face the fact that ordinary logics for model checking such as CTL and LTL are not expressive enough for many biological applications. Antelope employs an extension of CTL with “hybrid” operators (not to be confused with “hybrid model checking”, having both discrete and continuous variables). The additional expressiveness of Hybrid CTL essentially consists in formulas being able to refer to particular states explicitly. Hybrid CTL can express many interesting properties such as oscillations and multistability.

Antelope encourages the use of branching time beyond asynchrony, for incompletely specified behavior and environment interaction. The authors exemplify these other uses of branching time in the development of a Boolean GRN of the

Antoniotti et al. (

This section is devoted to examples of model checking applied to biochemical network analysis, without necessarily involving the development of a specialized tool.

In (Fisher et al.,

The proposal of Fisher et al. (

Finally, it is worth noting that a large number of states have many successors. As a consequence, the model has approximately 10^{36} possible executions, but only about 92,000 different reachable states. Thus, a simulation approach in this context would not be feasible.

In (Ahmad et al.,

We now review three works employing the PRISM model checker (Kwiatkowska et al.,

In the first work, starting from a kinetics with non-linear ODEs, Calder et al. (

In the next work, Heath et al. (

The authors' model consists of a PRISM module for each component of the pathway (e.g., FGF, FGFR, Src, etc.), and also a module for each possible compound and receptor residue. Module synchronization allows describing interactions involving multiple elements.

For example, two queries in PRISM are:

Two classes of state-reduction techniques are described: exact and approximate. The exact approaches group equivalent states in the underlying CTMC, and are (1) “lumpability” (Derisavi et al.,

The approximate approaches are applicable to networks where the proteins and the receptors have multiple docking sites and engage multiple downstream signaling proteins. The first approximate approach is based on identifying and removing “micro-states” in the network. When the model's reactions differ in orders of magnitude, it is possible to separate “fast” from “slow” reactions. Similarly, it is possible to model molecules' concentrations with abstract quantities, such as “low” and “high.” A final reduction is that of abstraction, involving manual grouping of states.

By using approximate techniques, Heath et al. (

In the final work, Ciocchetta et al. (

In this section, we survey some systems and algorithms that do not necessarily utilize full-fledged model checking, but are related to model checking because of employing either temporal logic or a symbolic technique.

Building upon SMBioNet, Fromentin et al. (

Another system also using both Thomas' formalism and temporal logic is that by Mateus et al. (

A system also based on Thomas' formalism is GINsim (Gene Interaction Network simulation) (Chaouiya et al.,

There are close connections between Boolean satisfiability (SAT) and Boolean networks (Milano and Roli, _{0}, … , _{n−1}) and _{0}, … , _{n−1}). Next, ^{k}. Any assignment satisfying ^{k} is a finite path of length
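The link between satisfiability and paths can be illustrated by treating the transition relation of a Boolean network as a constraint over two copies of the variables and unrolling it k times: every assignment to the k + 1 copies that satisfies all k constraints is a path of length k. The sketch below is a generic illustration under stated assumptions, not the encoding of any cited work: the three-gene synchronous network is hypothetical, and brute-force enumeration of assignments stands in for a SAT solver.

```python
from itertools import product

def f(state):
    """Hypothetical synchronous three-gene network: a' = b, b' = a, c' = a AND b."""
    a, b, c = state
    return (b, a, a & b)

def T(s, t):
    """Transition relation as a Boolean constraint over two copies of the variables."""
    return t == f(s)

def paths(k, n_vars=3):
    """All assignments to k + 1 copies of the variables satisfying T unrolled k times."""
    states = list(product((0, 1), repeat=n_vars))
    return [p for p in product(states, repeat=k + 1)
            if all(T(p[i], p[i + 1]) for i in range(k))]

# Steady states: paths of length 1 that return to their first state.
fixed_points = {p[0] for p in paths(1) if p[0] == p[1]}
# States lying on cycles of length 1 or 2: paths of length 2 that close on themselves.
on_short_cycles = {p[0] for p in paths(2) if p[0] == p[2]}
```

Adding a constraint that the path revisits a state turns the search for paths into a search for attractors, which is where a SAT solver would replace the enumeration used here.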

We encountered model checking used in numerous and varied ways for biochemical network analysis. This verification technique has been applied to many kinds of biochemical models, ranging from Boolean networks, to Thomas' formalism, to hybrid and timed automata, to CTMC (see Table

Tool or work | Kind of model | Logic | Model checker |
---|---|---|---|
Biocham | ODEs, stochastic, discrete, Boolean | CTL, LTL + num. constr., PLTL | NuSMV, PLTL, violation-degree |
GNA | piecewise-linear eq., Boolean | CTL, variant μ-calculus, CTRL | NuSMV, CADP, CTRL |
SMBioNet | Thomas' | CTL | NuSMV |
Pathway Logic | rewrite rules, Petri, Boolean | LTL | LoLA |
Antelope | Boolean | Hybrid CTL | Antelope's |
Simpathica, XSSYS | ODEs | variant LTL | XSSYS |
Fisher et al. ( | reactive modules | Alternating-time temp. logic (ATL) | Mocha |
Ahmad et al. ( | LHA | “while” language | HyTech |
Calder et al. ( | continuous-time Markov chains | CSL | PRISM |
Heath et al. ( | continuous-time Markov chains | CSL | PRISM |
Ciocchetta et al. ( | continuous-time Markov chains | CSL | PRISM |

It is clear, on the other hand, that the application of model checkers for biochemical network analysis is still incipient. Many tools we reviewed have only been used by their developers. Two relevant exceptions, however, are Biocham, used e.g., by Bellé et al. (

In our opinion, there are two situations that could be improved to further the applicability of model checking in this area. First, there is often a mismatch between the kind of model that can be checked and the kind of biochemical network model that can be built under constraints outside the modeler's control, such as the available data. Nevertheless, Biocham and GNA employ two different ways of bridging this gap.

Second, writing a formula in the logic underlying the utilized model checker is usually difficult. To be sure, Biocham provides syntactic sugar abbreviating certain common formulas and GNA has a pattern-based query language, but the difficulty persists.

In addition to model checking, there are other computational techniques employed in biochemical network analysis. The potential of simulation and constraint-solving, for instance, should also be assessed. Simulation is perhaps the most direct tool, and its relevance can hardly be overstated. Constraint-solving, in turn, has been successfully employed, as illustrated in Devloo et al. (

We devoted this work to biochemical pathways, but model checking has also been applied to other problems in computational biology. An example is (Grosu et al.,

Our main interest was that of exploring model-checking contributions to biochemical network analysis. We saw, nevertheless, contributions in the other direction as well: the development of model-checking results motivated by biochemical problems. We can mention the update component of Biocham (Chabrier-Rivier et al.,

We thus believe that model checking is ready to make substantial contributions to biochemical network analysis in particular, and to computational biology in general.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

We acknowledge the facilities provided by IIMAS, UNAM. We are also grateful to Elena Alvarez-Buylla, Eugenio Azpeitia, Julio Collado-Vides, Elizabeth Ortiz, and Nathan Weinstein, with whom we had fruitful discussions. Members of the Biocham and GNA teams generously gave us helpful information. Finally, we are grateful to the reviewers, who gave us useful suggestions.

Pedro A. Góngora was supported by Conacyt.
