Edited by: Philippe Monneveux, International Potato Center, Peru

Reviewed by: Jin Chen, Michigan State University, USA; Pawel Krajewski, Institute of Plant Genetics, Poland; John Doonan, Aberystwyth University, UK

*Correspondence: Marcos Malosetti, Biometris - Applied Statistics, Department of Plant Science, Wageningen University, PO Box 100, 6700 AA, Wageningen, Netherlands. e-mail:

This article was submitted to Frontiers in Plant Physiology, a specialty of Frontiers in Physiology.

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.

Genotype-by-environment interaction (GEI) is an important phenomenon in plant breeding. This paper presents a series of models for describing, exploring, understanding, and predicting GEI. All models start from a two-way table of genotype-by-environment means. First, a series of descriptive and exploratory models/approaches are presented: the Finlay–Wilkinson model, the AMMI model, and the GGE biplot. What these approaches have in common is that they merely try to group genotypes and environments and use no information other than the two-way table of means. Next, factorial regression is introduced as an approach to explicitly include genotypic and environmental covariates for describing and explaining GEI. Finally, QTL modeling is presented as a natural extension of factorial regression, where marker information is translated into genetic predictors. Tests for regression coefficients corresponding to these genetic predictors are tests for main effect QTL expression and QTL by environment interaction (QEI). QTL models for which QEI depends on environmental covariables form an interesting model class for predicting GEI for new genotypes and new environments. For realistic modeling of genotypic differences across multiple environments, sophisticated mixed models are necessary to allow for heterogeneity of genetic variances and correlations across environments. The use and interpretation of all models is illustrated by an example data set from the CIMMYT maize breeding program, containing environments differing in drought and nitrogen stress. To help readers carry out the statistical analyses, GenStat® programs, 15th Edition and Discovery® version, are presented as “Appendix.”

The success of a plant breeding program depends on its ability to provide farmers with genotypes with guaranteed superior performance (phenotype) in terms of yield and/or quality across a range of environmental conditions. To achieve this aim, it is necessary to have an understanding of the factors leading to a good phenotype.

Usually the phenotype is the value for a trait at the end of the growing season. The reason is that we are primarily interested in phenotypes like yield or grain weight at maturity and not, or less, in yield or grain weight at earlier stages. The final state of a trait is the cumulative result of a number of causal interactions between the genetic make-up of the plant (the genotype) and the conditions in which that plant developed (the environment). Plants differ in the efficiency and adequacy with which they capture and convert environmental inputs and stimuli into the biomass and organs that constitute a final product. The capture and conversion abilities of a plant are determined by its particular ensemble of genes. Environments differ in the amount and quality of inputs and stimuli that they convey to plants including, e.g., the amount of water, nutrients or incoming radiation. A primary objective in plant breeding is to match genotypes and environments in such a way that improved phenotypes are obtained. For example, a breeder might be interested in selecting genotypes that do well under water stress conditions.

While there can be genotypes that do well across a wide range of conditions (widely adapted genotypes), there are also genotypes that do relatively better than others exclusively under a restricted set of conditions (specifically adapted genotypes). Specific adaptation of genotypes is closely related to the phenomenon of genotype-by-environment interaction (GEI). GEI exists whenever the relative phenotypic performance of genotypes depends on the environment, or, in other words, whenever the differences between genotypes vary from one environment to another.

To illustrate the phenomenon of GEI, we can consider two different genotypes that differ in the genetic machinery involved in tolerance to water-limited conditions, while being equal for all other characteristics. If these two genotypes are exposed to a poorly watered environment, their performance will differ depending on the genetic properties related to tolerance for water-limited conditions. However, this genotypic difference will disappear in an environment that provides the right amount of water. So, the difference in performance between the two genotypes depends on the environment, through the amount of water that it provides.

Some scenarios that can occur when comparing the performances of pairs of genotypes across environments are presented in Figure

GEI was introduced in terms of the relative difference between genotypic means. GEI can also be regarded in terms of heterogeneity of genetic variance and covariance, or correlation. As a consequence of GEI, the magnitude of the genetic variance as observed within individual environments will change from one environment to the next. Often, the genetic variance tends to be larger in better environments than in poorer environments, although the opposite can be observed as well (Przystalski et al.,

GEI has also consequences for the correlations between genotypic performances in different environments. When GEI is large, the observed performance of a set of genotypes in one environment may not be very informative for the performance of the same genotypes in another environment. Environments with similar characteristics will induce corresponding responses in plants and will lead to strong genetic correlations. Figure

In conclusion, given the complexity of the mechanisms and processes underlying the phenotypic response across diverse and changing environmental conditions—frequently in an unpredictable way—it is necessary to develop analytical tools to help breeders understand GEI. The use of adequate strategies to analyze GEI is a first and important step toward more informed breeding decisions. Good analytical methods are a prerequisite for predicting the performance of genotypes as accurately as possible. This paper explores several strategies to model GEI, starting with simple methods that have been historically popular within the plant breeding community. It then moves to more elaborate models in which additional information is used in the form of explicit environmental characterization to model GEI. A final section is devoted to the integration of molecular marker information into GEI models, leading to the detection of quantitative trait loci (QTLs) and more specifically, to the modeling of QTL by environment interaction (QEI). The statistical methodology is illustrated using a maize data set obtained from a series of drought and nitrogen stress trials from the maize breeding program at Centro Internacional de Mejoramiento de Maíz y Trigo (CIMMYT; the International Maize and Wheat Improvement Center; Ribaut et al.,

An obvious first step to investigate GEI is to obtain phenotypic observations on a set of genotypes exposed to a range of environmental conditions. The set of genotypes can include advanced lines of a breeding program, cultivars, and segregating offspring from a specific cross such as F_{2}, a backcross, or a recombinant inbred line (RIL) population.

Genotypes can be tested under different management regimes that represent increasing levels of a particular stress, or a combination of stresses. This type of experiment is called a “managed stress trial” and is appropriate when the researcher wishes to focus on a particular type of stress. When performing managed stress trials, it is important to control the system in such a way that all other factors influencing the phenotype are as homogeneous as possible.

Managed stress trials are not a default option in plant breeding, because stress type and level can be difficult to implement and because the relationship between phenotype and stress is complex, with genes and environmental stress(es) interacting throughout the various developmental phases. In those situations, a common way for plant breeders to screen for genotypic reactions to environmental factors is by “multi-environment trials” (METs). In a MET, a number of genotypes are evaluated at a number of geographical locations for a number of years in the hope that the pattern of stresses that the genotypes experience is representative of future growing environments.

A convenient way to summarize data from managed stress trials and METs is in the form of two-way tables of means, with genotypes in the rows and environments in the columns. Each cell of such a table contains an estimate of the performance (adjusted mean) of a particular genotype in a specific environment. To identify genotypes and environments unequivocally, we use indices, the letter

The models in the following sections will assume as a starting point a genotype-by-environment table of means. These models are used in a so-called two-stage strategy for analyzing MET data. In the first stage, individual trials are analyzed with models including terms for design features and spatial variation. From these individual trial analyses, adjusted means and weights, usually reciprocals of the variances of the means, are carried forward to the second stage, where a model is fitted to the genotype-by-environment means, using either no weights or weights estimated in the first stage. Various choices can be made for the weights in a two-stage analysis (Möhring and Piepho,
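The first stage of the two-stage strategy can be sketched in a few lines of Python. This is a minimal sketch under simplifying assumptions: one trial laid out as a completely randomized design with equal replication, a single pooled residual variance, and made-up yields in a hypothetical `trial` dictionary. A real first stage would fit the design and spatial terms described above. Each genotype's adjusted mean is carried forward with weight 1/Var(mean).

```python
from statistics import mean, variance

# Hypothetical plot yields from ONE trial: genotype -> replicate plots.
trial = {
    "G1": [5.1, 5.5, 4.9],
    "G2": [6.0, 6.4, 6.2],
    "G3": [4.2, 4.0, 4.7],
}


def stage_one(plots):
    """Return {genotype: (adjusted_mean, weight)} for a single trial.

    Simplifying assumptions: completely randomized design, equal
    replication, and one pooled residual variance for all genotypes.
    """
    # With equal replication, the pooled variance is the average of the
    # within-genotype sample variances.
    pooled_var = mean(variance(values) for values in plots.values())
    result = {}
    for genotype, values in plots.items():
        var_of_mean = pooled_var / len(values)  # Var(mean) = sigma^2 / n
        result[genotype] = (mean(values), 1.0 / var_of_mean)
    return result


first_stage = stage_one(trial)
for genotype, (adj_mean, weight) in first_stage.items():
    print(f"{genotype}: mean = {adj_mean:.3f}, weight = {weight:.2f}")
```

With equal replication and a pooled variance, all genotypes receive the same weight. Unequal replication or spatial models would make the weights differ between genotypes and trials.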

The models to be presented here are illustrated using data produced by the maize drought stress breeding program of CIMMYT. A brief description of the data is given here; a more detailed description is available in the original publications (Ribaut et al., …). An F_{2} population was generated by crossing a drought tolerant parent (P_{1}) with a drought susceptible one (P_{2}). Seeds harvested from each of 211 F_{2} plants formed F_{3} families, which were stored for further evaluation. The F_{3} families were evaluated in managed stress trials in 1992, 1994, and 1996. In the winter of 1992, a managed water stress trial was conducted in Mexico, including no stress (NS), intermediate stress (IS), and severe stress (SS) treatments. In the winter of 1994, a similar trial was conducted, but it only included the IS and SS treatments. In the summer of 1996, the families were tested in a nitrogen stress trial with two levels: low (LN) and high nitrogen (HN). An extra LN trial was conducted in the winter of the same year. In total, the families were evaluated in eight different environments, each characterized by year, stress type and intensity, and management factors. DNA was extracted from each of the 211 F_{2} plants and genotyped with a total of 132 restriction fragment length polymorphism (RFLP) markers covering the 10 maize chromosomes.

The phenomenon of GEI is of primary interest in plant breeding, and has resulted in a large body of literature on models and strategies for analysis of GEI [see, for example, the reviews in Cooper and Hammer (…) …]. The starting point is the additive model (model 1). It describes the mean of genotype i in environment j, y_{ij}, as the sum of four terms. The first is the common fixed intercept term μ. The second is a fixed genotypic main effect corresponding to genotype i, G_{i}. The third is a fixed environmental main effect corresponding to environment j, E_{j}. The last is the random error term, ε_{ij}, typically assumed normally distributed with a mean of zero and constant variance σ^{2}; ε_{ij} ~ N(0, σ^{2}):

$$y_{ij} = \mu + G_i + E_j + \varepsilon_{ij} \qquad (1)$$

Model 1 predicts that, for any genotype, the difference between the means in any two environments j and j^{*} will be equal to the difference in the environmental main effects: E_{j} − E_{j*}. Consequently, the norms of reaction of genotypes will be parallel (Figure

The results of the fit of an additive model to the maize data set are presented in Table

| Source | d.f. | SS | MS | F | P |
|---|---|---|---|---|---|
| E | 7 | 5679 | 811.2 | 1466.5 | <0.001 |
| G | 210 | 614 | 2.9 | 5.3 | <0.001 |
| Residual | 1470 | 813 | 0.6 | | |
| Total | 1687 | 7106 | 4.2 | | |
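For a complete, balanced two-way table, model 1 can be fitted using row and column means, and the sums of squares in the table above follow directly. The sketch below uses a small hypothetical table `Y` (three genotypes by three environments), not the maize data. It checks that the fitted reaction norms are parallel and that the sums of squares add up to the total.

```python
# Hypothetical complete table of means: rows = genotypes, columns = environments.
Y = [
    [4.0, 6.0, 9.0],
    [5.0, 6.5, 8.5],
    [3.0, 5.5, 9.5],
]


def additive_fit(y):
    """Least-squares fit of model 1, y_ij = mu + G_i + E_j + e_ij, to a complete table."""
    n_gen, n_env = len(y), len(y[0])
    mu = sum(map(sum, y)) / (n_gen * n_env)
    G = [sum(row) / n_env - mu for row in y]  # genotypic main effects
    E = [sum(y[i][j] for i in range(n_gen)) / n_gen - mu for j in range(n_env)]  # environmental main effects
    fitted = [[mu + G[i] + E[j] for j in range(n_env)] for i in range(n_gen)]
    cells = [(i, j) for i in range(n_gen) for j in range(n_env)]
    ss = {
        "E": n_gen * sum(e * e for e in E),
        "G": n_env * sum(g * g for g in G),
        "Residual": sum((y[i][j] - fitted[i][j]) ** 2 for i, j in cells),
        "Total": sum((y[i][j] - mu) ** 2 for i, j in cells),
    }
    return mu, G, E, fitted, ss


mu, G, E, fitted, ss = additive_fit(Y)
# Parallel reaction norms: the gap between two environments is E_j - E_j* for every genotype.
print([round(row[2] - row[0], 6) for row in fitted])
print({k: round(v, 3) for k, v in ss.items()})
```

Each cell holds a single mean, so the residual in this fit confounds GEI with experimental error.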

There are two reasons for the disagreement between the predicted values from an additive model and the observed means for environment-specific genotypic performances: (1) an effect proper to the particular combination of genotype and environment; and (2) experimental error. Model 1 can be extended with an effect that is specific for genotype-by-environment combinations, GEI, or a double-indexed term GEI_{ij}:

$$y_{ij} = \mu + G_i + E_j + GEI_{ij} + \varepsilon_{ij} \qquad (2)$$

A more attractive alternative is to extend the additive model (model 1) by incorporating terms that explain as much as possible of the GEI. A popular strategy in plant breeding is the regression on the mean model proposed by Finlay and Wilkinson (…). In this model, each genotype has its own regression on the environmental main effect:

$$y_{ij} = \mu + G_i + E_j + b_i E_j + \varepsilon_{ij} \qquad (3a)$$

which can be rewritten as

$$y_{ij} = G'_i + b'_i E_j + \varepsilon_{ij} \qquad (3b)$$

with μ + G_{i} = G′_{i} and E_{j} + b_{i}E_{j} = (1 + b_{i})E_{j} = b′_{i}E_{j}. Model 3b is easier to interpret because it looks like a set of regression lines; each genotype has a linear reaction norm with intercept G′_{i} and slope b′_{i}. The explanatory environmental variable in these reaction norms is simply the environmental main effect E_{j}. Model 3a shows more clearly how GEI is captured by a regression on the environmental main effect, with the hope that as much as possible of the GEI signal will be retained by the term b_{i}E_{j}.

In the regression on the mean model, GEI is explained in terms of differential sensitivities to the improvement of the environment, with some genotypes (the ones with larger values of b_{i}) benefiting more than others from an increase in environmental quality. Note that in model 3a, Σb_{i} = 0, so that the average slope value is zero, while in model 3b the average value of b′ is 1, meaning that b′ > 1 for genotypes with a higher than average sensitivity, and b′ < 1 for genotypes that are less sensitive than average.
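The regression on the mean can be sketched as one ordinary least squares fit per genotype, using the estimated environmental main effects as the covariate. This simplified version treats the estimated E_{j} as known, which is the usual Finlay–Wilkinson practice. The function name `finlay_wilkinson` and the table `Y` are illustrative only. Because the E_{j} are centred, the intercept G′_{i} equals the genotype mean and the slopes b′_{i} average exactly 1, as stated for model 3b.

```python
# Hypothetical complete table of means: rows = genotypes, columns = environments.
Y = [
    [4.0, 6.0, 9.0],
    [5.0, 6.5, 8.5],
    [3.0, 5.5, 9.5],
]


def finlay_wilkinson(y):
    """Regression on the mean (model 3b): y_ij = G'_i + b'_i * E_j + e_ij.

    The environmental main effects are estimated first and then treated as a
    known covariate in one least-squares regression per genotype.
    """
    n_gen, n_env = len(y), len(y[0])
    mu = sum(map(sum, y)) / (n_gen * n_env)
    E = [sum(y[i][j] for i in range(n_gen)) / n_gen - mu for j in range(n_env)]
    see = sum(e * e for e in E)  # E is centred, so this is its sum of squares
    fits = []
    for row in y:
        row_mean = sum(row) / n_env  # intercept G'_i, because mean(E) = 0
        slope = sum(e * (v - row_mean) for e, v in zip(E, row)) / see
        fits.append((row_mean, slope))
    return E, fits


E, fits = finlay_wilkinson(Y)
slopes = [b for _, b in fits]
print([round(b, 3) for b in slopes])
```

In this toy table, the third genotype (b′ > 1) profits more than average from better environments, and the second (b′ < 1) profits less.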

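The fit of the regression on the mean model to the maize data is summarized in Table …. The residual sum of squares of the additive model (SS_{ε} = 813) has been divided into a part explained by genotypic sensitivities to environmental quality (SS_{b} = 230), and a residual (SS_{ε} = 583).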

| Source | d.f. | SS | MS | F | P |
|---|---|---|---|---|---|
| E | 7 | 5679 | 811.2 | 1752.3 | <0.001 |
| G | 210 | 614 | 2.9 | 6.3 | <0.001 |
| Heterogeneity of slopes | 210 | 230 | 1.1 | 2.4 | <0.001 |
| Residual | 1260 | 583 | 0.5 | | |
| Total | 1687 | 7106 | 4.2 | | |

By way of example, the fitted reaction norms of five genotypes (out of the full set of 211 genotypes) are given in Figure …. Genotype G025 is more responsive to environmental quality than G045 (b′_{G025} = 1.27 > b′_{G045} = 0.99). A similar observation can be made for G008 vs. G012 and G016. While G008 does relatively better in low quality environments, it is clearly surpassed by G012 and G016 in the best environments, since it is not capable of profiting from the better environmental conditions (b′_{G008} = 0.65, which is the lowest sensitivity among the five genotypes).

In summary, the regression on the mean model describes GEI in terms of parameters that can be given some biological meaning. In addition, and in contrast with the full interaction model (model 2), model 3 can be used to predict the performance of genotypes in environments that were not present in the MET, as long as the environment for which predictions are required can reasonably be placed within the range of environments used in the original MET. Nevertheless, the regression on the mean model suffers from the fact that the environmental characterization is based on a single dimension. Environmental quality can be hard to summarize within a single explanatory variable. Therefore, a substantial amount of GEI can remain unexplained. In the next section, the regression on the mean model will be extended by including multidimensional environmental characterizations in the statistical model for the genotype-by-environment data.

The limitation of a single dimension in environmental characterization can be removed by employing a more flexible model, in which more than one environmental quality variable is allowed. A popular model of this type is the additive main effects and multiplicative interaction (AMMI) model (Gollob, …):

$$y_{ij} = \mu + G_i + E_j + \sum_{k=1}^{K} c_{ik} z_{jk} + \varepsilon_{ij} \qquad (4)$$

In this model, GEI is described as a sum of K multiplicative terms. Each term is the product of a genotypic sensitivity c_{ik} (genotypic score) and a hypothetical environmental characterization z_{jk} (environmental score). Although genotypic and environmental scores are deemed to represent genetic and environmental qualities, they come from a mathematical procedure, a principal components analysis on the GEI (Gabriel, …). For the maize data (Table …), the first multiplicative term explains the largest part of the GEI (SS_{PCA1} = 242), and the second one explains a little less (SS_{PCA2} = 173). The total explained sum of squares for GEI is therefore 242 + 173 = 415, an improvement over the explained sum of squares in the regression on the mean model (SS_{b} = 230).

| Source | d.f. | SS | MS | F | P |
|---|---|---|---|---|---|
| E | 7 | 5679 | 811.2 | 1752.3 | <0.001 |
| G | 210 | 614 | 2.9 | 6.3 | <0.001 |
| PCA1 | 216 | 242 | 1.1 | 2.8 | <0.001 |
| PCA2 | 214 | 173 | 0.8 | 2.0 | <0.001 |
| Residual | 1040 | 398 | 0.4 | | |
| Total | 1687 | 7106 | 4.2 | | |

A desirable property of the AMMI model is that the genotypic and environmental scores can be used to construct powerful graphical representations called biplots (Gabriel,

Biplots facilitate the exploration of relationships between genotypes and/or environments. Genotypes that are more similar to each other are closer to each other in the plot than genotypes that are less similar. The same is true for environments. Genotypes/environments that are alike tend to cluster together. The angle between environmental axes is related to the correlation between the environments. An acute angle indicates positive correlation (e.g., between LN96a and LN96b), a right angle indicates no correlation (e.g., between HN96b and NS92a), and an obtuse angle indicates negative correlation (e.g., between NS92a and LN96a). The projection of a genotype onto an environmental axis reflects the interaction (GEI) of that genotype with that environment. For example, genotype G091 projects on the NS92a axis above the origin, indicating a positive interaction with that environment; i.e., the relative performance (GEI part) of G091 in NS92a is above the average of all genotypes in NS92a. Conversely, genotype G041 (on the right-hand side of the plot) projects below the origin on the same axis, which points to a negative interaction with environment NS92a (i.e., G041 performs worse than average). Following a similar procedure, it is possible to conclude that while genotype G091 showed positive adaptation to environment NS92a, it is not well adapted to environments LN96a and LN96b (the projection of G091 on the LN96a and LN96b axes falls below the origin). Biplots are useful tools to investigate patterns in GEI, because they can help to identify interesting genotypes that are adapted to particular environments, and to classify environments into groups.
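The AMMI multiplicative terms are obtained from a singular value decomposition of the matrix of interaction residuals (the double-centred table). The sketch below extracts only the first term, using power iteration, so that it needs nothing beyond the Python standard library. In practice a library SVD routine would be used. The table `Y4` is hypothetical and contains an exact rank-one interaction by construction. Scores are scaled symmetrically (the square root of the singular value is applied on both sides), which is one common biplot convention. Under this scaling, the product of a genotype score and an environment score approximates that cell's GEI, which is the projection rule described above.

```python
import math

# Hypothetical table with an exact rank-one interaction built in.
Y4 = [
    [5.0, 5.0, 5.0],
    [4.0, 6.0, 8.0],
    [3.0, 4.0, 5.0],
]


def interaction_residuals(y):
    """Double-centre the table: r_ij = y_ij - row mean - column mean + grand mean."""
    n_gen, n_env = len(y), len(y[0])
    mu = sum(map(sum, y)) / (n_gen * n_env)
    row_means = [sum(row) / n_env for row in y]
    col_means = [sum(y[i][j] for i in range(n_gen)) / n_gen for j in range(n_env)]
    return [[y[i][j] - row_means[i] - col_means[j] + mu for j in range(n_env)]
            for i in range(n_gen)]


def first_ammi_term(r, iterations=500):
    """Leading singular triplet of r by power iteration, with symmetric score scaling.

    Assumes the interaction matrix is not identically zero.
    """
    n_gen, n_env = len(r), len(r[0])
    # Start vector; not all-ones, because R times the all-ones vector is zero.
    v = [float(j + 1) for j in range(n_env)]
    for _ in range(iterations):
        u = [sum(r[i][j] * v[j] for j in range(n_env)) for i in range(n_gen)]  # u = R v
        v = [sum(r[i][j] * u[i] for i in range(n_gen)) for j in range(n_env)]  # v = R^T u
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    u = [sum(r[i][j] * v[j] for j in range(n_env)) for i in range(n_gen)]
    singular_value = math.sqrt(sum(x * x for x in u))
    u = [x / singular_value for x in u]
    root = math.sqrt(singular_value)
    genotype_scores = [root * x for x in u]
    environment_scores = [root * x for x in v]
    return singular_value, genotype_scores, environment_scores


R = interaction_residuals(Y4)
s1, gen_scores, env_scores = first_ammi_term(R)
print(round(s1 ** 2, 6))  # sum of squares explained by the first term
```

The squared singular value equals the sum of squares explained by that multiplicative term, which is the quantity reported as SS_{PCA1} in the AMMI table.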

Plant breeders are interested in the total genetic variation and not exclusively in the GEI part. For that reason, it is useful to have a modification of model 4 that considers the joint effects of the genotypic main effect and the GEI as a sum of multiplicative terms. Effectively, the two-way table of genotype-by-environment means is subjected to a standard principal components analysis, with genotypes as objects and environments as variables (Yan et al.,
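Computationally, the step that separates GGE from AMMI is the centring. GGE removes only the environment (column) means, so the matrix that enters the principal components analysis still contains G_{i} + GEI_{ij}. The sketch below shows only that step, on a hypothetical table `Y4`; the decomposition itself would then proceed as for AMMI.

```python
# Hypothetical complete table of means: rows = genotypes, columns = environments.
Y4 = [
    [5.0, 5.0, 5.0],
    [4.0, 6.0, 8.0],
    [3.0, 4.0, 5.0],
]


def environment_centred(y):
    """GGE input: subtract each environment (column) mean, keeping G_i + GEI_ij."""
    n_gen, n_env = len(y), len(y[0])
    col_means = [sum(y[i][j] for i in range(n_gen)) / n_gen for j in range(n_env)]
    return [[y[i][j] - col_means[j] for j in range(n_env)] for i in range(n_gen)]


C = environment_centred(Y4)
grand = sum(map(sum, Y4)) / 9
G = [sum(row) / 3 - grand for row in Y4]  # genotypic main effects
# Row means of the GGE matrix are the genotypic main effects, which AMMI would remove.
print([round(sum(row) / 3, 6) for row in C], [round(g, 6) for g in G])
```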

The models discussed so far assumed that we do not have explicit information about the environments. While such models can be useful to explain GEI, the biological interpretation of their results is not always obvious. What do hypothetical environmental variables, as in AMMI, mean in terms of quantifiable environmental characteristics such as temperature, water, and nutrients? A straightforward approach is to correlate environmental scores with environmental covariables. However, if we do have explicit information about the environment, that information can be used directly in the model by including it as explanatory variables. GEI is then described as differential genotypic sensitivity to explicit environmental factors such as temperature, precipitation, and water availability. Such models are known as factorial regression models (Denis, …). In these models, Z_{j} represents an explicit environmental covariable and not a hypothetical environmental covariable as in models 3a and 4 (note that Z is capitalized to highlight this difference). This distinction is critical since the interpretation of the GEI in models 6a and 6b is automatically placed into a biological context. Instead of describing GEI as differential reactions to hypothetical environmental covariables, factorial regression models help to identify genotypes that are differentially sensitive to changes in identified environmental quality components, for example, in a particular nutrient, or in water availability.
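With an explicit covariable, the GEI part of a factorial regression can be sketched as a regression of each genotype's interaction residuals on the centred covariable. The covariable values `Z` and the table `Y4` are hypothetical. The resulting coefficients are genotype-specific deviations in sensitivity, which sum to zero because the average sensitivity is absorbed by E_{j}. This is a fixed-effects sketch for a complete table, not the GenStat procedure given in the Appendix.

```python
# Hypothetical complete table of means and one explicit covariable per environment.
Y4 = [
    [5.0, 5.0, 5.0],
    [4.0, 6.0, 8.0],
    [3.0, 4.0, 5.0],
]
Z = [10.0, 20.0, 30.0]


def interaction_residuals(y):
    """Double-centre the table: r_ij = y_ij - row mean - column mean + grand mean."""
    n_gen, n_env = len(y), len(y[0])
    mu = sum(map(sum, y)) / (n_gen * n_env)
    row_means = [sum(row) / n_env for row in y]
    col_means = [sum(y[i][j] for i in range(n_gen)) / n_gen for j in range(n_env)]
    return [[y[i][j] - row_means[i] - col_means[j] + mu for j in range(n_env)]
            for i in range(n_gen)]


def factorial_regression(r, z):
    """Per-genotype slopes of GEI residuals on the centred covariable, and their SS."""
    zbar = sum(z) / len(z)
    zc = [v - zbar for v in z]
    szz = sum(v * v for v in zc)
    betas = [sum(rij * zj for rij, zj in zip(row, zc)) / szz for row in r]
    ss_explained = szz * sum(b * b for b in betas)  # SS of the G x Z term
    return betas, ss_explained


R = interaction_residuals(Y4)
betas, ss_fr = factorial_regression(R, Z)
ss_gei = sum(x * x for row in R for x in row)
print([round(b, 3) for b in betas], round(ss_fr, 3), round(ss_gei, 3))
```

Here the made-up covariable happens to line up exactly with the built-in interaction, so it explains all of the GEI sum of squares. With real covariables, only part of the GEI is explained, as for G.minTF and G.radiationGF in the table below.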

Table

| Source | d.f. | SS | MS | F | P |
|---|---|---|---|---|---|
| E | 7 | 5679 | 811.2 | 1752.3 | <0.001 |
| G | 210 | 614 | 2.9 | 6.3 | <0.001 |
| G.minTF | 210 | 172 | 0.8 | 1.7 | <0.001 |
| G.radiationGF | 210 | 124 | 0.6 | 1.2 | ≤0.038 |
| Residual | 1050 | 517 | 0.5 | | |
| Total | 1687 | 7106 | 4.2 | | |

In the introduction, it was mentioned that GEI can be regarded both in terms of differential mean responses across environments and in terms of heterogeneity of genetic variation and covariation between environments. While the models considered so far focus on modeling the mean response, the models in this section focus on the modeling of GEI in terms of heterogeneity of variances and covariances. This section switches to the framework of so-called mixed models. We concentrate on the main characteristics of a few, relatively simple yet powerful, mixed models that can be used to model GEI in terms of heterogeneity of variance and covariance. A more detailed description of mixed models can be found in the literature elsewhere (Verbeke and Molenberghs,

The models discussed in the previous sections were all examples of fixed effects models, with all terms except the residual term fixed. However, genotypes can be regarded as a random sample from a larger population (especially appropriate when the number of genotypes is large, say more than 10), in which case genotypes are an extra source of random variation. This situation calls for a mixed model, with genotypes taken as a random term. A review of the use of mixed models to analyze complex data sets in plant breeding can be found in Smith et al. (…). Taking genotypes as random in the additive model gives model 7:

$$y_{ij} = \mu + E_j + \underline{G}_i + \varepsilon_{ij} \qquad (7)$$

where G_{i} is underlined to indicate that it is a random term. Its distribution needs to be specified, and is usually taken to be normal, with zero mean and a variance specific to the term. Model 7 contains two variance components: one corresponds to the random genotypic main effects, σ^{2}_{G}, and a second one, σ^{2}_{ε}, corresponds to the residual (which includes true GEI and error). An important consequence of including genotypes as random is that genetic covariances and correlations between performances in different environments are automatically imposed. The total variance for individual genotypic observations in a particular environment j, σ^{2}_{j}, is the sum of two sources of variation: σ^{2}_{j} = σ^{2}_{G} + σ^{2}_{ε}. The covariance between observations for a particular genotype in environments j and j^{*}, σ_{jj*}, following from model 7 is σ_{jj*} = σ^{2}_{G}. For observations on different genotypes, σ_{jj*} = 0. In model 7, similarities (or covariation, and therefore correlation) between observations made on the same genotype in different environments are assumed to be positive. Covariation between observations on different genotypes is assumed to be zero, regardless of whether the observations are made in the same or in different environments. Model 7 is referred to as the compound symmetry model (Verbeke and Molenberghs,

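The general definition of a correlation between two traits, or two environments j and j^{*} (for clarity, we write Env_{j} and Env_{j*} when referring to those environments), is the covariance divided by the product of the standard deviations:

$$\rho(\text{Env}_j, \text{Env}_{j^*}) = \frac{\sigma_{jj^*}}{\sqrt{\sigma^2_j \, \sigma^2_{j^*}}}$$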

| Model 7 parameter | Estimate | s.e. | Model 8 parameter | Estimate | s.e. | Model 9 parameter | Estimate | s.e. |
|---|---|---|---|---|---|---|---|---|
| E: test statistic (d.f.), P | 10265.3 (7) | <0.001 | E: test statistic (d.f.), P | 9759.4 (7) | <0.001 | E: test statistic (d.f.), P | 6268.8 (7) | <0.001 |
| σ^{2}_{G} | 0.297 | 0.036 | σ^{2}_{G} | 0.125 | 0.017 | σ^{2}_{C1} | 0.439 | 0.053 |
| σ^{2}_{ε} | 0.553 | 0.020 | σ^{2}_{ε1} | 0.551 | 0.057 | σ^{2}_{C2} | 1 | – |
| | | | σ^{2}_{ε2} | 0.692 | 0.071 | σ^{2}_{C3} | 0.042 | 0.013 |
| | | | σ^{2}_{ε3} | 1.399 | 0.140 | σ_{C1C2} | 0.551 | 0.077 |
| | | | σ^{2}_{ε4} | 0.672 | 0.069 | σ_{C1C3} | 0.109 | 0.019 |
| | | | σ^{2}_{ε5} | 0.704 | 0.072 | σ_{C2C3} | 0.115 | 0.032 |
| | | | σ^{2}_{ε6} | 0.135 | 0.018 | σ^{2}_{ε1} | 0.446 | 0.051 |
| | | | σ^{2}_{ε7} | 0.152 | 0.019 | σ^{2}_{ε2} | 0.445 | 0.052 |
| | | | σ^{2}_{ε8} | 0.761 | 0.078 | σ^{2}_{ε3} | 0.736 | 0.169 |
| | | | | | | σ^{2}_{ε4} | 0.428 | 0.050 |
| | | | | | | σ^{2}_{ε5} | 0.508 | 0.057 |
| | | | | | | σ^{2}_{ε6} | 0.145 | 0.018 |
| | | | | | | σ^{2}_{ε7} | 0.138 | 0.017 |
| | | | | | | σ^{2}_{ε8} | 0.740 | 0.080 |

Table

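The estimates of the two parameters associated with the random terms in the model, σ^{2}_{G} = 0.297 and σ^{2}_{ε} = 0.553, are given in the second part of Table …. Together, they show the size of the genetic variance (σ^{2}_{G}) in relation to the sum of GEI and error (σ^{2}_{ε}). The genetic correlation between any two environments is estimated as:

$$\rho_{jj^*} = \frac{\sigma^2_G}{\sigma^2_G + \sigma^2_\varepsilon} = \frac{0.297}{0.297 + 0.553} \approx 0.35$$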
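The covariance structure implied by model 7 can be written out directly. The sketch below uses the estimates from the table (σ^{2}_{G} = 0.297, σ^{2}_{ε} = 0.553) to build the 8 × 8 covariance matrix of one genotype's observations across the environments, and recovers the single correlation shared by every pair of environments. The function names are illustrative only.

```python
import math


def compound_symmetry(sigma2_g, sigma2_e, n_env):
    """Covariance matrix of one genotype's observations under model 7."""
    return [[sigma2_g + (sigma2_e if j == k else 0.0) for k in range(n_env)]
            for j in range(n_env)]


def correlation(v, j, k):
    """Covariance divided by the product of the standard deviations."""
    return v[j][k] / math.sqrt(v[j][j] * v[k][k])


V7 = compound_symmetry(0.297, 0.553, 8)  # REML estimates from the table
rho = correlation(V7, 0, 1)
print(round(rho, 3))
```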

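Model 7 assumes a constant genetic variance and correlation between pairs of environments. For METs, the assumption of constant genetic variance and genetic correlation across environments is unrealistic (Figure …). In model 8, the variance of the residual term ε_{ij}, which includes GEI and error, is assumed to depend on the environment (i.e., the variance component σ^{2}_{εj} is indexed by j):

$$y_{ij} = \mu + E_j + \underline{G}_i + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0, \sigma^2_{\varepsilon j}) \qquad (8)$$

Model 8 has nine variance parameters: one for the genotypic main effect (σ^{2}_{G} = 0.125), and eight corresponding to a form of GEI for each of the eight environments (for convenience, we assume constant errors). The heterogeneity of variance for ε_{ij} reflects that in some environments there is a larger variation (e.g., in environment 3, which is the high-yielding NS92a) than in other environments (e.g., in environments 6 and 7, which are low-yielding, LN96a and LN96b). The heterogeneity of variance leads to heterogeneous genetic correlations between environments. For example, the correlation between environments 6 and 7 is:

$$\rho_{67} = \frac{\sigma^2_G}{\sqrt{(\sigma^2_G + \sigma^2_{\varepsilon 6})(\sigma^2_G + \sigma^2_{\varepsilon 7})}} = \frac{0.125}{\sqrt{(0.125 + 0.135)(0.125 + 0.152)}} \approx 0.47$$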
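The same construction with environment-specific residual variances gives model 8. The sketch below uses the model 8 estimates from the table. Environment indices are 0-based in the code, so environments 6 and 7 are indices 5 and 6. The names are illustrative only.

```python
import math

sigma2_g = 0.125
# Environment-specific residual (GEI + error) variances, model 8 column of the table.
sigma2_e = [0.551, 0.692, 1.399, 0.672, 0.704, 0.135, 0.152, 0.761]

# Constant covariance sigma2_g off the diagonal, heterogeneous variances on it.
V8 = [[sigma2_g + (sigma2_e[j] if j == k else 0.0) for k in range(8)] for j in range(8)]


def correlation(v, j, k):
    """Covariance divided by the product of the standard deviations."""
    return v[j][k] / math.sqrt(v[j][j] * v[k][k])


rho_67 = correlation(V8, 5, 6)  # environments 6 and 7 (LN96a, LN96b)
rho_36 = correlation(V8, 2, 5)  # environment 3 (NS92a) with environment 6
print(round(rho_67, 3), round(rho_36, 3))
```

Because the covariance is the same for every pair, pairs involving a high-variance environment such as NS92a end up with lower correlations.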

The deviance for model 8 is 838.4 with 1671 DF, which is much lower than that for model 7 (deviance 1077.9 with 1678 DF). The deviance has dropped, but at the expense of having to estimate more parameters (nine instead of two). Is the decrease in deviance large enough to consider model 8 a significant improvement over model 7? Because models 7 and 8 are nested (model 7 is a special case of model 8 when the σ^{2}_{εj} are equal for all

In cases where the models are not nested, the comparison can be done by the Akaike Information Criterion (AIC) (Akaike,
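The likelihood ratio test and the AIC comparison can be sketched with the deviances reported above. Two assumptions apply. First, AIC is taken as deviance + 2 × (number of variance parameters); this is one common convention, and software may add a constant, so only differences between models are meaningful. Second, the χ² critical value is the tabulated 95% quantile for 7 degrees of freedom rather than an exact p-value. Both models share the same fixed part (μ + E_{j}), which is what makes their REML deviances comparable.

```python
def likelihood_ratio(dev_reduced, df_reduced, dev_full, df_full):
    """Deviance difference and its d.f. for nested REML models with the same fixed part."""
    return dev_reduced - dev_full, df_reduced - df_full


def aic(deviance, n_variance_params):
    """AIC under the convention deviance + 2 * (number of variance parameters)."""
    return deviance + 2 * n_variance_params


CHI2_95_DF7 = 14.067  # tabulated 95% quantile of chi-square with 7 d.f.

stat, df = likelihood_ratio(1077.9, 1678, 838.4, 1671)  # model 7 vs model 8
print(f"LR = {stat:.1f} on {df} d.f.; significant at 5%: {stat > CHI2_95_DF7}")
print(f"AIC model 7 = {aic(1077.9, 2):.1f}, model 8 = {aic(838.4, 9):.1f}")
```

The difference in residual DF (7) equals the number of extra variance parameters in model 8 (nine minus two).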

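Model 8 assumes heterogeneous variances across environments, in combination with a constant covariance between environments. This latter assumption can be relaxed by also allowing the genetic covariance between environments to be heterogeneous. A possibility is to estimate a covariance parameter for each pair of environments, producing a variance-covariance model that is referred to as the “unstructured model” (Verbeke and Molenberghs, …). … In model 9, the environments are grouped, and the genotypic effects have a covariance matrix Σ_{c}. This matrix holds the group-specific genetic variances σ^{2}_{cj} for group j on the diagonal, and the covariances σ_{cjj*} between groups j and j^{*} on the off-diagonals. Model 9 retains the residual heterogeneity of model 8, which means that environment-specific genotypic effects are added to group-specific genotypic effects. To illustrate model 9 using the maize example, and based on Figure …, the environments are classified into three groups. Σ_{C} therefore contains the genetic variances for groups 1, 2, and 3 on the diagonal (σ^{2}_{c1}, σ^{2}_{c2}, and σ^{2}_{c3}, respectively), and the covariances between the groups on the off-diagonals (σ_{c12}, σ_{c13}, and σ_{c23}). Writing c(j) for the group that contains environment j, the full covariance between observations on the same genotype in environments j and j^{*} can be written as:

$$\mathrm{Cov}(y_{ij}, y_{ij^*}) = \begin{cases} \sigma^2_{c(j)} + \sigma^2_{\varepsilon j} & \text{if } j = j^* \\ \sigma_{c(j)c(j^*)} & \text{if } j \neq j^* \end{cases}$$

where σ_{c(j)c(j*)} equals the group variance σ^{2}_{c(j)} when both environments belong to the same group. The estimates of the elements of Σ_{C} can be found in Table ….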
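Model 9 can be sketched by building the 8 × 8 covariance matrix of one genotype's observations from the group-level matrix Σ_{C} plus the environment-specific residual variances. Environments share their group's variance and the between-group covariances, and only the diagonal receives the residual variance. The Σ_{C} entries and residual variances are copied from the model 9 column of the table as printed. The grouping vector `GROUPS` is hypothetical: the text states only that environment 1 belongs to group 1 (the nitrogen stress group), so the other assignments are placeholders.

```python
# Group-level genetic covariance matrix, model 9 column of the table (groups 1..3 -> 0..2).
SIGMA_C = [
    [0.439, 0.551, 0.109],
    [0.551, 1.0, 0.115],
    [0.109, 0.115, 0.042],
]
# Environment-specific residual variances, model 9 column of the table.
RESID = [0.446, 0.445, 0.736, 0.428, 0.508, 0.145, 0.138, 0.740]
# HYPOTHETICAL group membership of the 8 environments; only environment 1 -> group 1
# is given in the text.
GROUPS = [0, 0, 1, 1, 1, 0, 0, 2]


def model9_cov(groups, sigma_c, resid):
    """Cov(y_ij, y_ij*) = sigma_c[c(j)][c(j*)] + resid[j] * (j == j*)."""
    n = len(groups)
    return [[sigma_c[groups[j]][groups[k]] + (resid[j] if j == k else 0.0)
             for k in range(n)] for j in range(n)]


V9 = model9_cov(GROUPS, SIGMA_C, RESID)
print(round(V9[0][0], 3))  # total variance of environment 1: 0.439 + 0.446
```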

The diagonals of Σ_{C} show that, on average, the genetic variation is lower in group 1 (the group of nitrogen stress environments) than in group 2. It should be noted that because group 3 is composed of a single environment, the genetic variation cannot be partitioned into a component due to the group and a residual, so σ^{2}_{c3} is not estimated but arbitrarily fixed to 1. The total variance in each of the environments is equal to the sum of the group's variance plus the environment-specific variance. For example, the variance in environment 1 is equal to 0.885, which is the sum of the variance of group 1, i.e., σ^{2}_{c1} = 0.439, and σ^{2}_{ε1} = 0.446. Recalling that the covariance between environments within the same group is given by σ^{2}_{c1}, σ^{2}_{c2} and σ^{2}_{c3}, and the covariance between environments in different groups by σ_{c1c2}, σ_{c1c3}, and σ_{c2c3}, the correlation between any pair of environments can be estimated. For example, the correlation between environments 1 and 2 is:

The deviance for model 9 is 619.9 with 1667 DF, and the difference in deviance with model 8 is 218.5, with four extra parameters. The associated

We have presented different mixed model formulations to model GEI in terms of heterogeneity of variance and covariance between environments. The compound symmetry model, which is the commonly used default model when fitting a mixed model to a two-way table of means, forces variances and covariances to be constant across environments. Two alternative models accommodated either heterogeneity of genetic variances across environments, or heterogeneity of genetic variances and covariances across environments. There are other useful variance-covariance models such as the factor analytic (Malosetti et al.,

The analysis of a data set is an iterative process consisting of fitting and comparing alternative models to identify a good model for the data under study. That process has been illustrated with a maize data set. The next section goes one step further in the modeling process by including molecular marker information, with the ultimate objective of identifying genomic regions, QTLs, that underlie genetic variation of quantitative traits. Within the context of METs, the use of such models is a powerful tool to identify and understand the genetic basis of GEI, that is, QEI.

So far, we discussed models that use either implicit or explicit environmental characterizations to understand GEI. We switch in this section to the use of explicit

Elaborating upon factorial regression ideas, the following section presents mixed models that can accommodate explicit genotypic information to describe GEI in terms of QTL and QEI effects (Malosetti et al.,

While in this paper we focus on mixed model QTL detection, this is certainly not the only method for multi-environment QTL mapping. A well-known and common alternative is to use mixture model approaches (Jiang and Zeng,

Most populations in QTL mapping originate from crosses between pairs of inbred lines. A segregating offspring population can be produced from an F_{1} hybrid after one generation of selfing (F_{2}), after several generations of self-pollination (recombinant inbred lines or RIL), or after crossing the F_{1} with one of the parental lines (backcross). In addition, by chromosome doubling of F_{1} gametes, a population of doubled haploid lines can be generated. In all of these cases, at most two alleles will segregate at each locus. For a locus *M*_{1}, an offspring individual can be of type M_{1}M_{1}, M_{1}m_{1}, or m_{1}m_{1}, with M_{1} the allele that comes from the paternal line, and m_{1} the allele that comes from the maternal line. By convention, locus names are given in italics (so, for example, *M*_{1} refers to locus 1, whereas M_{1} and m_{1} refer to the paternal and maternal alleles at locus 1, respectively). The relative frequency of the genotypes in the offspring population depends on the type of population; for example, in an F_{2} the expected frequencies are ¼, ½, and ¼ for M_{1}M_{1}, M_{1}m_{1}, and m_{1}m_{1}, respectively.

With the help of molecular markers, it can be revealed whether a particular individual is of the M_{1}M_{1}, M_{1}m_{1}, or m_{1}m_{1} type. To detect QTLs and estimate their effects, it is necessary to translate the marker information into explanatory variables or genetic predictors. A straightforward way of constructing genetic predictors is to create an explanatory variable that contains the number of copies of one of the alleles, for example, the M_{1} allele. The genetic predictor will then take the value 2 whenever an individual has two paternal alleles (M_{1}M_{1}), the value 1 when the offspring individual is M_{1}m_{1}, and 0 when it is m_{1}m_{1}. Using a simple regression model, the slope for the regression of the genotypic means on a genetic predictor defined by the number of M_{1} alleles corresponds to the effect of substituting an m_{1} allele by an M_{1} allele at the given locus (Lynch and Walsh, …). Similarly, a dominance genetic predictor can be defined that takes the value 0 whenever an individual is M_{1}M_{1} or m_{1}m_{1}, and the value 1 whenever it is M_{1}m_{1}.
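The coding of marker genotypes into additive and dominance predictors can be sketched as follows. The sketch also shows the regression slope recovering the allele substitution effect. The string encoding ("MM", "Mm", "mm"), the marker calls and the noiseless genotypic means (built with a substitution effect of 0.5) are all hypothetical.

```python
# Hypothetical encoding of observed codominant marker genotypes.
ADDITIVE = {"MM": 2, "Mm": 1, "mm": 0}   # number of paternal (M1) alleles
DOMINANCE = {"MM": 0, "Mm": 1, "mm": 0}  # heterozygote indicator


def ols_slope(x, y):
    """Least-squares slope of y on x (simple regression with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


markers = ["MM", "Mm", "mm", "Mm", "MM", "mm"]  # hypothetical marker calls
x_add = [ADDITIVE[m] for m in markers]
x_dom = [DOMINANCE[m] for m in markers]
# Hypothetical noiseless genotypic means: +0.5 for each M1 allele substituted for m1.
means = [10.0 + 0.5 * x for x in x_add]
slope = ols_slope(x_add, means)
print(x_add, x_dom, slope)
```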

With complete information on the marker genotypes, i.e., codominant markers without missing values, the construction of genetic predictors at marker positions consists simply of counting the number of alleles coming from a particular parent. For genomic positions in between marker loci (putative QTL positions), for dominant markers, and for markers with missing values, the construction of genetic predictors requires more effort. In a general formulation, the value of the additive genetic predictor, X^{add}, for an offspring individual can be defined as the expected number of alleles coming from the paternal line, i.e., the expected number of M_{1} alleles:

$$X^{add} = 2\cdot\Pr(M_1M_1 \mid \text{markers}) + 1\cdot\Pr(M_1m_1 \mid \text{markers}) + 0\cdot\Pr(m_1m_1 \mid \text{markers}) \tag{10a}$$

with Pr(M_{1}M_{1}|markers), Pr(M_{1}m_{1}|markers), and Pr(m_{1}m_{1}|markers) the conditional probabilities of the individual being of the M_{1}M_{1}, M_{1}m_{1}, or m_{1}m_{1} type, respectively, given the observed marker information. Note that in the case of complete information the individual's genotype is known, so one of these three probabilities equals 1 and the other two equal 0.

In the case of incomplete information, although the genotype of an individual at a locus may not be known with certainty, information from nearby markers can be used to estimate the probability of the individual being of a particular genotype. This probability is a function of the observed genotypes at neighboring markers and the expected recombination between those marker loci and the locus under evaluation (Lynch and Walsh, …).
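For a doubled haploid population, where only M_{1}M_{1} and m_{1}m_{1} genotypes occur, this conditional probability has a simple closed form for a position between two flanking markers. The sketch below assumes no crossover interference and the Haldane map function; the function names are hypothetical, and F_{2} populations or multipoint calculations that use all markers on a chromosome need more involved computations. In this population X^{add} is simply 2·Pr(M_{1}M_{1}|markers).

```python
import math

def haldane_r(d_morgan: float) -> float:
    """Recombination frequency for a map distance in Morgans (Haldane map function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgan))

def prob_paternal_dh(left_is_M: bool, right_is_M: bool,
                     d_left: float, d_right: float) -> float:
    """Pr(putative QTL carries the paternal allele M | flanking markers), DH population.

    d_left, d_right: distances (Morgans) from the putative QTL to the left and right
    flanking markers (assumed not both zero). Without interference, the recombination
    frequency between the flanking markers is r_l + r_r - 2 * r_l * r_r.
    """
    r_l, r_r = haldane_r(d_left), haldane_r(d_right)
    r_lr = r_l + r_r - 2.0 * r_l * r_r
    if left_is_M == right_is_M:
        # Non-recombinant flanks: the QTL matches both markers unless a double crossover occurred.
        p_match = (1 - r_l) * (1 - r_r) / (1 - r_lr)
        return p_match if left_is_M else 1 - p_match
    # Recombinant flanks: the QTL matches the left marker if the crossover lies on its right.
    p_match_left = (1 - r_l) * r_r / r_lr
    return p_match_left if left_is_M else 1 - p_match_left

print(round(prob_paternal_dh(True, False, 0.1, 0.1), 3))  # 0.5 (midpoint, recombinant flanks)
print(round(prob_paternal_dh(True, False, 0.0, 0.2), 3))  # 1.0 (on the left marker)
```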

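With the estimated conditional probabilities, the genetic predictors at positions where no or only partial marker information is available can be calculated by substituting these probabilities into expression 10a. Analogous reasoning holds for the dominance genetic predictor, which is the expected value of the heterozygote indicator:

$$X^{dom} = 0\cdot\Pr(M_1M_1 \mid \text{markers}) + 1\cdot\Pr(M_1m_1 \mid \text{markers}) + 0\cdot\Pr(m_1m_1 \mid \text{markers}) = \Pr(M_1m_1 \mid \text{markers}) \tag{10b}$$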
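Expressions 10a and 10b translate directly into code. The following minimal sketch takes the three conditional genotype probabilities (however they were obtained) and returns both predictors; the function name is hypothetical.

```python
def genetic_predictors(p_MM: float, p_Mm: float, p_mm: float) -> tuple:
    """Additive (10a) and dominance (10b) predictors from conditional genotype probabilities."""
    if abs(p_MM + p_Mm + p_mm - 1.0) > 1e-9:
        raise ValueError("genotype probabilities must sum to 1")
    x_add = 2 * p_MM + 1 * p_Mm + 0 * p_mm  # expected number of M alleles
    x_dom = p_Mm                             # expected value of the heterozygote indicator
    return x_add, x_dom

print(genetic_predictors(0.0, 1.0, 0.0))     # (1.0, 1.0): known heterozygote
print(genetic_predictors(0.25, 0.5, 0.25))   # (1.0, 0.5): no marker information in an F2
x_add, x_dom = genetic_predictors(0.7, 0.25, 0.05)
print(round(x_add, 2), x_dom)                # 1.65 0.25
```

With complete information the function reproduces the integer codes 2, 1, 0 and 0, 1, 0; with partial information the predictors become fractional values between those codes.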

The inclusion of genetic predictors in a GEI model allows testing the hypothesis that the DNA at a particular genome position has an effect on a phenotypic trait, and whether that effect is environment dependent. A basic GEI phenotypic model, such as those discussed in the previous sections, can be extended with two new terms, one for the additive genetic effect of a putative QTL (X^{add}_{i}α_{j}) and a second for the dominance effect of the same locus (X^{dom}_{i}δ_{j}):

$$y_{ij} = \mu + E_j + X^{add}_{i}\alpha_j + X^{dom}_{i}\delta_j + \varepsilon_{ij} \tag{11}$$

where y_{ij} is the phenotypic response of individual i in environment j, μ is the general mean, E_{j} is the main effect of environment j, X^{add}_{i} and X^{dom}_{i} stand for the values of the additive and dominance genetic predictors of individual i, and α_{j} and δ_{j} represent the additive and dominance effects of this QTL in environment j. In model 11, both types of QTL effects are indexed by j, i.e., they are environment specific, and residual GEI (GEI not explained by the QTL) contributes to ε_{ij}, which keeps the variance-covariance structure selected for the phenotypic model. The conclusion about the presence of a QTL at a particular position is based on a Wald test (Verbeke and Molenberghs, …) of the null hypotheses H_{0}: α_{j} = 0 and H_{0}: δ_{j} = 0 for all j.

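For the maize data, Table … gives the Wald tests for a QTL at one genome position. For further interpretation, the environment-specific additive effects (α_{j}) can be partitioned into an additive main effect (α^{Q}) and QEI effects (α^{QEI}_{j}), leading to the following model:

$$y_{ij} = \mu + E_j + X^{add}_{i}\left(\alpha^{Q} + \alpha^{QEI}_{j}\right) + X^{dom}_{i}\delta_j + \varepsilon_{ij} \tag{12}$$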

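Wald tests for the environment main effect (E) and for the environment-specific additive (α_{j}) and dominance (δ_{j}) QTL effects, with the additive effect partitioned into a QTL main effect (α^{Q}) and a QEI effect (α^{QEI}_{j})

Term | Wald statistic | d.f. | P
E | 10875.5 | 7 | <0.001
Additive effect (α_{j}) | 100.9 | 8 | <0.001
α^{Q} | 12.8 | 1 | <0.001
α^{QEI}_{j} | 88 | 7 | <0.001
Dominance effect (δ_{j}) | 13.5 | 8 | ≤0.097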

If required, a similar partitioning may be carried out for the dominance effects. As a result of partitioning the environment-specific QTL effects, there is a Wald test for the QTL main effect and a Wald test for QEI (Table …).
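Assuming the Wald statistics are referred to chi-square distributions with the stated degrees of freedom, the P values for the additive terms can be reproduced with a small standard-library helper. The sketch also shows that, for these data, the main-effect and QEI statistics add up (to within rounding) to the statistic for the environment-specific additive effect, as do their degrees of freedom. The helper name `chi2_sf` is our own.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for X ~ chi-square(df), integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{i < df/2} y^i / i!
        return sum(math.exp(i * math.log(y) - y - math.lgamma(i + 1))
                   for i in range(df // 2))
    # Odd df: start from df = 1 (erfc) and add the recurrence terms.
    tail = math.erfc(math.sqrt(y))
    tail += sum(math.exp((i + 0.5) * math.log(y) - y - math.lgamma(i + 1.5))
                for i in range((df - 1) // 2))
    return tail

# Wald statistics and d.f. for the additive QTL terms, copied from the table above.
wald = {
    "alpha_j (environment specific)": (100.9, 8),
    "alpha_Q (QTL main effect)": (12.8, 1),
    "alpha_QEI_j (QEI)": (88.0, 7),
}
for term, (w, df) in wald.items():
    print(f"{term}: W = {w}, d.f. = {df}, P = {chi2_sf(w, df):.1e}")

main_w, main_df = wald["alpha_Q (QTL main effect)"]
qei_w, qei_df = wald["alpha_QEI_j (QEI)"]
print(round(main_w + qei_w, 1), main_df + qei_df)  # 100.8 8
```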

The preceding section presented a number of models that can be useful for detecting QTLs in MET data. The present section discusses a strategy for a genome-wide QTL scan. QTL mapping can be regarded as a model selection process that aims to identify a model describing the phenotypic response in terms of QTL effects. Since neither the number of QTLs nor their effects are known a priori, a strategy is needed to explore the vast range of possible models. There is no unique way of performing this search, but an effective strategy consists of the following steps: (1) find a good model for the phenotypic data; (2) perform a genome-wide scan for QTLs by simple interval mapping (SIM); (3) perform one or more rounds of composite interval mapping (CIM), starting with cofactors selected in the SIM step; and (4) fit a final multi-QTL model to estimate QTL effects. Each step is illustrated using the maize example data. Example code that performs the different steps in GenStat® (VSN International, …) is presented in the Appendix.

Several models can be fitted (for example, models 7 to 9 plus the unstructured model) and compared on the basis of their AIC values. The selected mixed model will be the starting point from which to develop a QTL model. Table … compares these models for the maize data.

Model | Deviance | d.f. | Change in deviance | Change in d.f. | P | AIC
Model 7 | 1077.9 | 1678 | – | – | – | 4170
Model 8 | 838.4 | 1671 | 239.5 | 7 | <0.001 | 3944
Model 9 | 619.9 | 1667 | 218.5 | 4 | <0.001 | 3736
Unstructured | 548.7 | 1644 | 71.2 | 23 | <0.001 | 3708
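The comparison in the table can be retraced with a short sketch. It assumes the models are nested in the order listed and fitted by REML with the same fixed effects, so that a change in deviance can be read as a likelihood-ratio statistic; the AIC values are taken from the table as reported, and the helper names are hypothetical.

```python
# (model, deviance, residual d.f., AIC), copied from the table above.
models = [
    ("Model 7", 1077.9, 1678, 4170),
    ("Model 8", 838.4, 1671, 3944),
    ("Model 9", 619.9, 1667, 3736),
    ("Unstructured", 548.7, 1644, 3708),
]

def sequential_changes(rows):
    """Change in deviance and in d.f. between consecutive (assumed nested) models."""
    return [
        (cur[0], round(prev[1] - cur[1], 1), prev[2] - cur[2])
        for prev, cur in zip(rows, rows[1:])
    ]

def best_by_aic(rows):
    """Name of the model with the smallest AIC (smaller is better)."""
    return min(rows, key=lambda row: row[3])[0]

for name, d_dev, d_df in sequential_changes(models):
    print(f"{name}: change in deviance {d_dev} on {d_df} d.f.")
print("Selected:", best_by_aic(models))  # Selected: Unstructured
```

AIC is used for the final choice because it also allows comparing models that are not nested, whereas the deviance changes are only interpretable as tests for nested pairs.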

After choosing the phenotypic model, a genome-wide scan is performed by fitting single-QTL models across the genome at marker positions and in between them, i.e., SIM. To perform SIM, genetic predictors that cover the genome need to be estimated. For most population types, and for population sizes of a few hundred individuals, calculating genetic predictors every 5–10 cM is sufficient. Each genetic predictor is used to test for a QTL effect at its location. The unstructured model was selected for the maize data set, so the SIM scan can be done by fitting the following model at every genetic predictor position (only additive effects are tested, as a previous analysis showed little dominance):

$$y_{ij} = \mu + E_j + X^{add}_{i}\alpha_j + \varepsilon_{ij} \tag{13}$$

with an unstructured variance-covariance matrix for ε_{ij} across environments. The upper panel of Figure … shows the profile of the test statistic (on a −log_{10}(P) scale) for the effect of a QTL along the chromosomes. The horizontal line indicates a threshold value above which the null hypothesis of no QTL is rejected. The profile shows evidence of QTLs on chromosomes 1, 3, 4, 6, and 10, the two largest being on chromosomes 1 and 10. The lower panel indicates the magnitude of the QTL effects in each environment at each chromosome position. The color indicates the parent that contributes the high-value allele (blue = maternal line, red = paternal line), and the color intensity indicates the magnitude of the effect. QEI is reflected in this plot by changes in color at a particular chromosome position (crossover interaction) or by changes in color intensity (convergence-divergence). For example, the large QTL on chromosome 1 shows not only changes in the magnitude of the effects between environments (different color intensities) but also changes in color: while in HN96b the allele increasing yield comes from the mother (blue), in IS92a, IS94a, NS92a, SS92a, and SS94a it comes from the father (red). This is an example of crossover interaction. The large QTL on chromosome 10 shows only differences in the magnitude of the QTL effect (from largest in HN96b to no effect in LN96a, LN96b, and SS92a), with the allele from the father always contributing to higher yield.

Scanning the results across the full set of chromosomes produces a list of putative QTL positions that can be used as cofactors at the following stage of the QTL mapping.
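To make the SIM step concrete, here is a deliberately simplified single-environment sketch in Python. It ignores the multi-environment structure and the unstructured covariance of model 13, uses ordinary least squares with a large-sample 1-d.f. Wald test, and runs on simulated doubled haploid data with a known QTL. All function names and simulation settings are hypothetical.

```python
import math
import random

def simulate_dh(n_ind, n_pos, r, rng):
    """Simulate DH genotypes coded as additive predictors (0 = mm, 2 = MM) at
    n_pos equally spaced positions, with recombination r between neighbours."""
    genos = [[0] * n_ind for _ in range(n_pos)]
    for i in range(n_ind):
        g = rng.choice((0, 2))
        for k in range(n_pos):
            if k > 0 and rng.random() < r:
                g = 2 - g  # crossover: switch to the other parental allele
            genos[k][i] = g
    return genos

def sim_scan(predictors, y):
    """Single-QTL scan: OLS of y on each predictor, 1-d.f. Wald test, -log10(P)."""
    n = len(y)
    my = sum(y) / n
    profile = []
    for x in predictors:
        mx = sum(x) / n
        sxx = sum((a - mx) ** 2 for a in x)
        b = sum((a - mx) * (v - my) for a, v in zip(x, y)) / sxx
        rss = sum((v - my - b * (a - mx)) ** 2 for a, v in zip(x, y))
        se = math.sqrt(rss / (n - 2) / sxx)
        p = math.erfc(abs(b / se) / math.sqrt(2))  # two-sided normal P
        profile.append(-math.log10(max(p, 1e-300)))
    return profile

rng = random.Random(1)
X = simulate_dh(n_ind=200, n_pos=21, r=0.1, rng=rng)
# Simulated trait: QTL at position 10, effect 0.5 per paternal allele.
y = [5.0 + 0.5 * X[10][i] + rng.gauss(0.0, 0.5) for i in range(200)]

profile = sim_scan(X, y)
peak = max(range(len(profile)), key=profile.__getitem__)
threshold = -math.log10(0.05 / len(profile))  # Bonferroni, assuming alpha = 0.05
putative_qtls = [k for k, v in enumerate(profile) if v > threshold]
print(peak, round(threshold, 2))
```

Because neighboring predictors are correlated, positions near the true QTL also exceed the threshold; in practice only the local maxima of the profile are retained as putative QTL positions.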

SIM implies performing multiple tests along the genome, one test at each putative QTL position. For example, for the maize data, genetic predictors were calculated at 246 chromosome positions, which means that model 13 was fitted 246 times. When performing multiple tests, the probability of at least one false positive (i.e., falsely rejecting the null hypothesis) increases according to the expression 1 − (1 − α)^{n}, with α the test level for a single test and n the number of tests. A simple correction method is the Bonferroni correction, which uses α/n as the test level for each individual test.
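A quick calculation shows why a correction is needed for the 246 tests of the maize scan. The sketch assumes α = 0.05 (the text does not fix a value), and the expression 1 − (1 − α)^{n} holds exactly for independent tests.

```python
import math

def familywise_error(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive among n independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests

def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Bonferroni per-test level alpha / n, expressed on the -log10(P) scale."""
    return -math.log10(alpha / n_tests)

alpha = 0.05  # assumed genome-wide test level
print(round(familywise_error(alpha, 246), 4))      # 1.0
print(round(bonferroni_threshold(alpha, 246), 2))  # 3.69
```

Without correction a false positive is practically certain; with the Bonferroni correction a position is declared significant only if its −log_{10}(P) exceeds about 3.69.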

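Modifications to the Bonferroni correction in the context of QTL mapping have been proposed by Cheverud (…), who replaced the actual number of tests (n) by an effective number of tests (n^{*}) that accounts for the correlation between genetic predictors at nearby positions. For the maize data, n^{*} = 81 instead of n = 246, which gives a larger per-test significance level (α/n^{*}) and hence a less stringent threshold than the standard Bonferroni correction. The procedure estimates n^{*} from the data and uses it to set the corresponding significance threshold.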
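One common eigenvalue-based estimator attributed to Cheverud computes the effective number of tests as M_{eff} = 1 + (M − 1)(1 − Var(λ)/M), where λ are the eigenvalues of the M × M correlation matrix of the genetic predictors. The sketch below assumes this is the variant intended; the eigenvalues themselves would come from a linear algebra routine (e.g., `numpy.linalg.eigvalsh`), and here they are passed in directly. The final line uses the reported n^{*} = 81 with an assumed α = 0.05.

```python
import math

def effective_number_of_tests(eigenvalues):
    """M_eff = 1 + (M - 1) * (1 - Var(lambda) / M), lambda = eigenvalues of the
    correlation matrix of the genetic predictors (Cheverud-style estimator)."""
    m = len(eigenvalues)
    mean = sum(eigenvalues) / m  # equals 1 for a correlation matrix
    var = sum((lam - mean) ** 2 for lam in eigenvalues) / (m - 1)
    return 1 + (m - 1) * (1 - var / m)

# Two predictors with correlation 0.5 have eigenvalues 1.5 and 0.5.
print(round(effective_number_of_tests([1.5, 0.5]), 2))  # 1.75
# -log10(P) threshold with n* = 81, assuming alpha = 0.05.
print(round(-math.log10(0.05 / 81), 2))  # 3.21
```

Uncorrelated predictors (all eigenvalues equal to 1) give M_{eff} = M, while perfectly correlated predictors give M_{eff} = 1, which is the behavior the correction is designed to have.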

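The power of QTL detection can be improved by reducing the background noise caused by QTLs outside the region under test. This is the principle of the CIM approach, proposed simultaneously by Jansen and Stam (…) and by Zeng (…). In the present context, CIM extends model 13 with cofactors:

$$y_{ij} = \mu + E_j + X^{add}_{i}\alpha_j + \sum_{f} x_{if}c_{jf} + \varepsilon_{ij} \tag{14}$$

where x_{if} is the value of the genetic predictor for cofactor f in individual i and c_{jf} is its effect in environment j. The term Σ_{f} x_{if}c_{jf} accounts for the effects of QTLs outside the region being tested (X^{add}_{i}), reducing the error variation and thereby improving the power of QTL detection. Various strategies exist for selecting a set of cofactors, but a pragmatic approach is to use the results of the SIM scan, taking the positions identified as putative QTLs by SIM as cofactors.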

Another issue is that, when testing in a region close to a cofactor, that cofactor must be excluded from the model to avoid collinearity with the tested position. A popular solution is to define a window around the evaluation position and to exclude from the model any cofactor that falls inside that window. Window size affects the results of a CIM scan, and there are no clear-cut recommendations about which window size to use. For the present example, all cofactors on the chromosome being evaluated are excluded, a strategy known as restricted CIM.
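Both exclusion rules can be expressed as one small function. In this sketch, `window_cM=None` gives restricted CIM, while a numeric window drops only same-chromosome cofactors within that distance of the tested position; the function name and the cofactor positions are illustrative only.

```python
from typing import List, Optional, Tuple

Position = Tuple[int, float]  # (chromosome, position in cM)

def cofactors_for_test(test: Position, cofactors: List[Position],
                       window_cM: Optional[float] = None) -> List[Position]:
    """Cofactors kept in the model when testing position `test`.

    window_cM is None -> restricted CIM: drop all cofactors on the tested chromosome.
    window_cM = w     -> drop only same-chromosome cofactors within w cM of `test`.
    """
    chrom, pos = test
    kept = []
    for c_chrom, c_pos in cofactors:
        if c_chrom == chrom and (window_cM is None or abs(c_pos - pos) <= window_cM):
            continue  # excluded to avoid collinearity with the tested position
        kept.append((c_chrom, c_pos))
    return kept

# Illustrative cofactor set (chromosome, cM); testing chromosome 1 at 200 cM.
cofactors = [(1, 141.0), (1, 252.0), (3, 38.0), (10, 67.0)]
print(cofactors_for_test((1, 200.0), cofactors))                  # [(3, 38.0), (10, 67.0)]
print(cofactors_for_test((1, 200.0), cofactors, window_cM=55.0))  # [(1, 141.0), (3, 38.0), (10, 67.0)]
```

Restricted CIM trades some power on the tested chromosome for robustness: it avoids the choice of a window size altogether.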

The results of the restricted CIM scan for the maize data are presented in Figure ….

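In a subsequent modeling step, the QTLs at all positions found significant in the restricted CIM scan are included simultaneously in the mixed model:

$$y_{ij} = \mu + E_j + \sum_{q} X^{add}_{iq}\left(\alpha^{Q}_{q} + \alpha^{QEI}_{jq}\right) + \varepsilon_{ij} \tag{15}$$

where X^{add}_{iq} is the additive genetic predictor of individual i at QTL q. By partitioning the effect of each QTL into a main effect (α^{Q}_{q}) and QEI effects (α^{QEI}_{jq}), it was possible to investigate whether QTL effects were consistent across environments. All QTLs except the one at the end of chromosome 3 had significant QEI (Table …).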

The estimated QTL effects are given in Table … (significant effects are marked with ^{*}; significance was judged relative to s.e., the average standard error obtained from the REML analysis). The results for the large QTL on chromosome 1 (QTL_{1,141}) showed that this QTL had a significant effect of 0.469 ton·ha^{−1} in environment SS92a, which means that each replacement of the maternal allele by the paternal allele is expected to increase yield by about half a ton per hectare. The effect of the same QTL in environment HN96b had a negative sign (−0.232 ton·ha^{−1}), which means that the same allele substitution is expected to decrease rather than increase yield. The effects of QTL_{1,141} are therefore inconsistent across environments not only in size but also in sign. Inconsistency in the size and sign of QTL effects underlies crossover interactions, the most important case of GEI (recall Figure …). The QTL on chromosome 10 (QTL_{10,67}) showed changes in the size of the effects but not in their sign, indicating that the favorable allele always came from the paternal line. The QTL effect was largest in HN96b (0.564 ton·ha^{−1}), around 0.300 ton·ha^{−1} in IS92a, IS94a, NS92a, and SS94a, and not significant in LN96a, LN96b, and SS92a. Despite these changes in effect size, selection will in this case always favor the paternal allele. In contrast to these two QTLs, the QTL at 217 cM on chromosome 3 (QTL_{3,217}) showed a consistent effect across all environments (−0.129 ton·ha^{−1}), with the maternal allele as the yield-increasing allele. The other QTLs showed different degrees of interaction with the environment, involving crossovers (QTL_{2,36} and QTL_{6,125}) or only differences in the magnitude of effects (QTL_{1,252}, QTL_{3,38}, QTL_{4,136}, and QTL_{9,97}). This information on QTL effects is useful when selecting complementary parental lines whose crosses can combine the favorable alleles coming from the maternal and paternal lines.

Estimated QTL effects (ton·ha^{−1}) for individual environments (one column per environment; ^{*} significant)

QTL_{1,141} | 0.469^{*} | 0.351^{*} | 0.370^{*} | 0.370^{*} | 0.214^{*} | −0.005 | −0.002 | −0.232^{*}
QTL_{1,252} | −0.026 | −0.078 | −0.292^{*} | −0.061 | 0.182 | −0.05 | −0.106^{*} | 0.093
QTL_{2,36} | −0.123 | −0.304^{*} | −0.329^{*} | −0.026 | −0.091 | −0.003 | 0.131^{*} | 0.106
QTL_{3,38} | 0.224^{*} | 0.236^{*} | 0.035 | 0.323^{*} | 0.241^{*} | −0.007 | 0.152^{*} | 0.480^{*}
QTL_{3,217} | −0.129^{*} | −0.129^{*} | −0.129^{*} | −0.129^{*} | −0.129^{*} | −0.129^{*} | −0.129^{*} | −0.129^{*}
QTL_{4,136} | −0.272^{*} | −0.344^{*} | −0.456^{*} | −0.147 | −0.293^{*} | −0.093^{*} | −0.107^{*} | −0.262^{*}
QTL_{6,125} | −0.006 | 0.015 | −0.332^{*} | 0.061 | 0.004 | −0.096^{*} | 0.116^{*} | −0.155
QTL_{9,97} | 0.187^{*} | 0.251^{*} | 0.386^{*} | 0.016 | 0.023 | 0.026 | −0.018 | 0.021
QTL_{10,67} | 0.056 | 0.324^{*} | 0.258^{*} | 0.251^{*} | 0.322^{*} | 0.072 | 0.054 | 0.564^{*}
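The classification used in the text can be checked mechanically from the table: crossover when significant effects differ in sign, magnitude-only otherwise, and no QEI when the effects are constant. This minimal sketch also splits each row into a main effect and QEI deviations using an unweighted mean, which is only a rough stand-in for α^{Q}_{q} as estimated by REML in model 15; the function names are hypothetical.

```python
# Effects copied from the table above; "*" marks significant effects.
TABLE = {
    "QTL_1,141": "0.469* 0.351* 0.370* 0.370* 0.214* -0.005 -0.002 -0.232*",
    "QTL_1,252": "-0.026 -0.078 -0.292* -0.061 0.182 -0.05 -0.106* 0.093",
    "QTL_2,36": "-0.123 -0.304* -0.329* -0.026 -0.091 -0.003 0.131* 0.106",
    "QTL_3,38": "0.224* 0.236* 0.035 0.323* 0.241* -0.007 0.152* 0.480*",
    "QTL_3,217": "-0.129* -0.129* -0.129* -0.129* -0.129* -0.129* -0.129* -0.129*",
    "QTL_4,136": "-0.272* -0.344* -0.456* -0.147 -0.293* -0.093* -0.107* -0.262*",
    "QTL_6,125": "-0.006 0.015 -0.332* 0.061 0.004 -0.096* 0.116* -0.155",
    "QTL_9,97": "0.187* 0.251* 0.386* 0.016 0.023 0.026 -0.018 0.021",
    "QTL_10,67": "0.056 0.324* 0.258* 0.251* 0.322* 0.072 0.054 0.564*",
}

def parse(row):
    """Turn '0.469* -0.005 ...' into [(0.469, True), (-0.005, False), ...]."""
    return [(float(tok.rstrip("*")), tok.endswith("*")) for tok in row.split()]

def qei_pattern(effects):
    """'no QEI', 'crossover' (significant effects of both signs) or 'magnitude only'."""
    values = [v for v, _ in effects]
    if max(values) - min(values) < 1e-9:
        return "no QEI"
    signs = {v > 0 for v, significant in effects if significant}
    return "crossover" if len(signs) == 2 else "magnitude only"

def partition(effects):
    """Unweighted split into a main effect and QEI deviations."""
    values = [v for v, _ in effects]
    main = sum(values) / len(values)
    return main, [v - main for v in values]

for qtl, row in TABLE.items():
    effects = parse(row)
    main, _ = partition(effects)
    print(f"{qtl}: {qei_pattern(effects)}, mean effect {main:+.3f}")
```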

An interesting possibility with the QTL models presented here is that they allow the inclusion of environmental information to explain QTL effects in terms of sensitivities to environmental factors. Similarly to GEI models in which environmental information can be integrated to describe GEI effects, QEI models can integrate environmental information to describe QEI effects. Expressing QTL effects in terms of sensitivities to a particular environmental factor allows prediction of the effect of the QTL under any condition within the range of the original experiments. In addition, the inclusion of environmental information can help unravel the physiological mechanisms that are behind the action of a particular QTL.

The final QTL model for the maize example data consisted of nine QTLs. It can now be investigated whether the variation in the effects of those QTLs is related to changes in one or more external environmental variables, in strong analogy with the factorial regression models discussed for GEI (models 6a and 6b). Figure … shows the QTL_{1,141} effects across environments plotted against the minimum temperature during flowering. The plot shows a negative relationship between the QTL effect and temperature.

Assuming a simple linear relationship between the effect of a QTL and a given environmental covariable, that relationship can be tested with the following model:

$$y_{ij} = \mu + E_j + \sum_{q \neq q^{*}} X^{add}_{iq}\alpha_{jq} + X^{add}_{iq^{*}}\left(\alpha_{q^{*}} + \beta_{q^{*}}Z_j + \theta_{jq^{*}}\right) + \varepsilon_{ij} \tag{16}$$

with α_{jq} = α^{Q}_{q} + α^{QEI}_{jq} the environment-specific effects of the remaining QTLs, as in model 15. For simplicity, model 16 expresses the environment-specific effects of a single QTL (q^{*}) in terms of a covariable; however, the procedure can be applied equally well to other QTLs with environment-specific effects. In model 16, the effect of QTL q^{*} in environment j equals α_{jq*} = α_{q*} + β_{q*}Z_{j} + θ_{jq*}, where Z_{j} is the value of the covariable Z for environment j. When Z_{j} is centered around zero, the parameters can be interpreted as follows: α_{q*} corresponds to the effect of the QTL in the average environment (that is, when Z = 0); β_{q*} corresponds to the change in the QTL effect per unit change in the covariable; and the random term θ_{jq*} corresponds to the residual (unexplained) QTL effect, with θ_{jq*} ~ N(0, σ^{2}_{aq*}). For example, applying model 16 to QTL_{1,141} with the minimum temperature during flowering as covariable showed a significant response of QTL_{1,141} to changes in that temperature, with a β estimate of −0.040 ton·ha^{−1}·°C^{−1}. This result can be interpreted as follows: the yield change expected from replacing the maternal allele by the paternal allele decreases by 0.040 ton·ha^{−1} for each degree Celsius increase in the minimum temperature during flowering.
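The interpretation of α_{q*} and β_{q*} can be illustrated with a two-stage sketch: regress estimated environment-specific QTL effects on the centered covariable by ordinary least squares. This is only a rough approximation of model 16, which estimates these parameters jointly with the residual QTL variance in the mixed model. The temperatures and effects below are made up, constructed to lie exactly on a line with the reported slope of −0.040.

```python
def fit_sensitivity(z, effects):
    """Two-stage OLS fit of effect_j = a + b * (z_j - mean(z)).

    Returns (a, b): a is the effect in the average environment and b the change
    in the effect per unit of the covariable (the roles of alpha_q* and beta_q*).
    """
    n = len(z)
    z_mean = sum(z) / n
    zc = [v - z_mean for v in z]
    a = sum(effects) / n  # intercept equals the mean effect because zc sums to 0
    b = sum(c * e for c, e in zip(zc, effects)) / sum(c * c for c in zc)
    return a, b

# Hypothetical minimum temperatures (°C) and QTL effects (ton/ha).
z = [14.0, 16.0, 18.0, 20.0, 22.0]
effects = [0.2 - 0.040 * (v - 18.0) for v in z]
a, b = fit_sensitivity(z, effects)
print(round(a, 3), round(b, 3))  # 0.2 -0.04
# Predicted effect in an environment 3 °C warmer than average:
print(round(a + b * 3.0, 3))     # 0.08
```

Centering Z is what makes the intercept interpretable as the effect in the average environment; without centering it would be the effect at 0 °C, far outside the range of the trials.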

The example assumed a simple linear relationship between the QTL effect and a single environmental covariable, but more complex explanatory models can be constructed. For example, it is possible to include higher-order terms to model the response curve (e.g., a quadratic term), to use spline formulations, or to include more than one environmental covariable in the model. Close interaction with physiologists is crucial to explore and select biologically sound models.

We have discussed a suite of statistical models that are useful to plant breeding practitioners dealing with GEI. What all of these models have in common is that they attempt to replace the ANOVA GEI_{ij} term by product terms of genotypic parameters/covariates and environmental parameters/covariates, for example b_{i}z_{j} (FW, AMMI, and GGE), b_{i}Z_{j} (factorial regression), and X_{i}α_{j} (QTL mapping). Some models require no information other than the two-way table of means (FW, AMMI, and GGE), whereas others require explicit environmental (factorial regression) and/or genotypic information (QTL models). For exploring patterns of GEI, FW, AMMI, and GGE are very useful; for prediction and understanding, factorial regression and QTL models are more appropriate.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

We acknowledge the support of the Generation Challenge Program—Integrated Breeding Platform projects 2.2.1 and 3.2.4 for supporting the work presented here. We also thank three anonymous reviewers for helping us to improve the manuscript.

The Supplementary Material for this article can be found online at: