
Edited by: Jean-Baptiste Poline, University of California, Berkeley, United States

Reviewed by: Gang Chen, National Institutes of Health (NIH), United States; Martin A. Lindquist, Johns Hopkins University, United States

This article was submitted to Brain Imaging Methods, a section of the journal Frontiers in Neuroscience

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Group-level repeated measurements are common in neuroimaging, yet their analysis remains complex. Although a variety of specialized tools now exist, to date there has been no clear discussion of how repeated measurements can be analyzed appropriately using the standard general linear model (GLM) approach, as implemented in software such as SPM and FSL. This is particularly surprising given that these implementations necessitate the use of multiple models, even for seemingly conventional analyses, and that without care it is very easy to specify contrasts that do not correctly test the effects of interest. Despite this, interest in fitting these types of models using conventional tools has been growing in the neuroimaging community. As such, it has become even more important to elucidate the correct means of doing so. To begin, this paper will discuss the key concept of the expected mean squares (EMS) and how it is used to derive valid tests of the model effects.

The modeling of group-level repeated measurements is a common, yet complex, topic in neuroimaging. Although a variety of specialized tools are now available (e.g., Chen et al.)^{1}, this paper focuses on how these models can be implemented correctly using the standard GLM framework found in packages such as SPM and FSL.

For researchers who are less familiar with classical linear model theory, some of the requirements of repeated-measurement models can seem esoteric. However, these models are based on the key statistical concept of expected mean squares (EMS), which the following sections introduce.

In order to begin understanding the requirements of repeated measurement models when implemented in the GLM, the concept of EMS in ANOVA models must be understood. A basic aim of any ANOVA model is to split the data into different sources of variation. These sources of variation are formalized in terms of the calculation of sums-of-squares and their associated mean squares, whose long-run expectations are the EMS.

To understand the use of EMS in deriving suitable F-ratios, consider a standard 2-way between-subjects ANOVA model

y_{ijk} = μ + α_{i} + β_{j} + (αβ)_{ij} + ϵ_{k(ij)},     (1)

where μ is the overall mean, α_{i} is the effect of the ith level of factor A, β_{j} is the effect of the jth level of factor B, (αβ)_{ij} is the interaction of the ith level of A with the jth level of B, and ϵ_{k(ij)} is random error for the kth replicate (assumed independent and identically distributed N(0, σ^{2})).

For this basic ANOVA design, the correct error term for the omnibus main effects and interaction is the model variance σ^{2}. To see why this is the case, we can calculate the EMS. Although it is possible to derive the EMS formally through the use of the expectation operator, a more practical approach involves following some basic rules. In this paper, the rules given by Kutner et al. are used.

Arithmetic for the derivation of the EMS in a 2-way between-subjects ANOVA model, using the method of Kutner et al.

| Effect | EMS |
|---|---|
| α_{i} | σ^{2} + nb Σ_{i} α_{i}^{2} / (a − 1) |
| β_{j} | σ^{2} + na Σ_{j} β_{j}^{2} / (b − 1) |
| (αβ)_{ij} | σ^{2} + n Σ_{ij} (αβ)_{ij}^{2} / ((a − 1)(b − 1)) |
| ϵ_{k(ij)} | σ^{2} |

Here a and b are the numbers of levels of factors A and B, and n is the number of replicates per cell.

Using the EMS, we can see that under the null hypothesis of no effect of A, EMS_{A} reduces to σ^{2}, identical to EMS_{E}. As such, a test for the effect of A can be constructed using the ratio of the mean square of A (MS_{A}) and the mean square of the errors (MS_{E}). Continuing in this fashion, suitable F-ratios for all of the model effects are given in the table below.

The numerator and denominator mean squares from Equation (2) used to form appropriate F-ratios for the 2-way between-subjects ANOVA.

| Effect | F-ratio |
|---|---|
| Factor A | MS_{A}/MS_{E} |
| Factor B | MS_{B}/MS_{E} |
| A × B | MS_{AB}/MS_{E} |
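The F-ratios in the table above can be illustrated with a minimal numpy sketch that computes the sums of squares of a balanced 2-way between-subjects design by hand. The data and dimensions below are synthetic, chosen purely for illustration:

```python
import numpy as np

# Balanced 2x2 between-subjects design with n = 3 observations per cell.
# Synthetic data laid out as y[i, j, k] for levels of A, B and replicates.
rng = np.random.default_rng(0)
a, b, n = 2, 2, 3
y = rng.normal(size=(a, b, n)) + np.array([[0.0, 0.5], [1.0, 2.0]])[:, :, None]

grand = y.mean()
mean_a = y.mean(axis=(1, 2))          # marginal means of A
mean_b = y.mean(axis=(0, 2))          # marginal means of B
mean_ab = y.mean(axis=2)              # cell means

# Sums of squares for each source of variation
ss_a = n * b * np.sum((mean_a - grand) ** 2)
ss_b = n * a * np.sum((mean_b - grand) ** 2)
ss_ab = n * np.sum((mean_ab - mean_a[:, None] - mean_b[None, :] + grand) ** 2)
ss_e = np.sum((y - mean_ab[:, :, None]) ** 2)

# Mean squares and F-ratios: every effect is tested against the same MS_E
ms_a, ms_b = ss_a / (a - 1), ss_b / (b - 1)
ms_ab = ss_ab / ((a - 1) * (b - 1))
ms_e = ss_e / (a * b * (n - 1))
f_a, f_b, f_ab = ms_a / ms_e, ms_b / ms_e, ms_ab / ms_e

# Because the design is balanced, the four sums of squares partition the
# total variation exactly
ss_total = np.sum((y - grand) ** 2)
assert np.isclose(ss_a + ss_b + ss_ab + ss_e, ss_total)
```

This makes concrete the point that, in a purely fixed-effects design, a single error term (MS_{E}) serves as the denominator for every test.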

For between-subject designs containing only fixed-effects it is rarely necessary to calculate the EMS, as a suitable denominator for the F-ratio of every effect is always given by the overall error term, MS_{E}. The situation changes once random subject effects are introduced.

To begin, consider a basic mixed-measures design containing a single within-subject factor and a single between-subjects factor. The model is

y_{ijk} = μ + α_{i} + β_{j} + (αβ)_{ij} + S_{k(j)} + ϵ_{k(ij)},     (3)

where μ is the overall mean, α_{i} is the effect of the ith level of the within-subject factor A, β_{j} is the effect of the jth level of the between-subjects factor B, (αβ)_{ij} is the interaction effect, S_{k(j)} is the random effect of the kth subject, and ϵ_{k(ij)} is random error assumed independent and identically distributed N(0, σ^{2}). The notation k(j) indicates that the kth subject is nested within the jth level of the between-subjects factor.

One of the key differences between the model in Equation (1) and the model in Equation (3) is the inclusion of the random subject effects S_{k(j)}. Although it is possible to forego the subject effects and work with a pooled error term (Penny and Henson), the inclusion of S_{k(j)} allows one to partition the error variance into separate between-subject and within-subject components. The model can be rewritten as

where the splitting of the singular error term is now more explicit. The complication for the traditional neuroimaging GLM framework is that the error term used as the denominator for the test statistics is derived implicitly from the difference between the data and the model prediction. This means that for the model in Equation (4), only the final residual term is available as the denominator for all tests, irrespective of which error term the EMS actually demand.

As before, the breakdown of the arithmetic is presented in the table below, from which it can be seen that testing the effect of B requires MS_{S} as the denominator instead. As discussed above, only MS_{E} will be used as the denominator in neuroimaging software implementing the traditional GLM approach. Testing of the effect of B therefore requires specifying a separate model where the final error term is forced to become MS_{S}. This can be achieved by averaging the raw data over the levels of the within-subject factor, and will be discussed in more detail in the example analysis given at the end of this paper.

Arithmetic for the derivation of the EMS in a 2-way mixed ANOVA with a single within-subject and a single between-subjects factor.

| Effect | EMS |
|---|---|
| α_{i} | σ^{2} + nb Σ_{i} α_{i}^{2} / (a − 1) |
| β_{j} | σ^{2} + a σ_{S}^{2} + na Σ_{j} β_{j}^{2} / (b − 1) |
| (αβ)_{ij} | σ^{2} + n Σ_{ij} (αβ)_{ij}^{2} / ((a − 1)(b − 1)) |
| S_{k(j)} | σ^{2} + a σ_{S}^{2} |
| ϵ_{k(ij)} | σ^{2} |

Here a is the number of levels of the within-subject factor A, b the number of levels of the between-subjects factor B, and n the number of subjects per level of B.

The EMS ratios used to form appropriate F-ratios for the 2-way mixed-measures ANOVA.

| Effect | F-ratio |
|---|---|
| A | MS_{A}/MS_{E} |
| B | MS_{B}/MS_{S} |
| A × B | MS_{AB}/MS_{E} |
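The equivalence that motivates the multiple-model approach can be checked numerically: averaging the data over the levels of the within-subject factor and fitting a one-way between-subjects ANOVA reproduces the MS_{B}/MS_{S} ratio exactly. The following sketch uses synthetic data with arbitrary dimensions:

```python
import numpy as np

# Mixed design: within-subject factor A (a levels), between factor B (b groups),
# n subjects per group. y[i, j, k] is the measurement at level i of A for
# subject k in group j.
rng = np.random.default_rng(1)
a, b, n = 3, 2, 5
subj = rng.normal(scale=1.5, size=(b, n))          # random subject effects
y = (rng.normal(size=(a, b, n)) + subj[None, :, :]
     + np.array([0.0, 1.0])[None, :, None])        # group effect for B

# Full-model quantities: MS_B uses group means, MS_S uses subject means
grand = y.mean()
group_means = y.mean(axis=(0, 2))
subj_means = y.mean(axis=0)                        # one value per subject
ss_b = n * a * np.sum((group_means - grand) ** 2)
ss_s = a * np.sum((subj_means - group_means[:, None]) ** 2)
f_full = (ss_b / (b - 1)) / (ss_s / (b * (n - 1)))

# Equivalent test: average over A, then run a one-way between-subjects ANOVA
avg = y.mean(axis=0)                               # shape (b, n)
ss_between = n * np.sum((avg.mean(axis=1) - avg.mean()) ** 2)
ss_within = np.sum((avg - avg.mean(axis=1, keepdims=True)) ** 2)
f_avg = (ss_between / (b - 1)) / (ss_within / (b * (n - 1)))

assert np.isclose(f_full, f_avg)   # MS_B / MS_S is recovered exactly
```

The common factor of a (the number of within-subject levels) cancels in the ratio, which is why averaging the raw data forces the correct error term without changing the resulting F-statistic.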

The situation with multiple error terms becomes more complex as the number of within-subject factors increases. Consider adding another within-subject factor to the model in Equation (3). This produces a 3-way mixed-measures ANOVA model, which can be written as

y_{ijkl} = μ + α_{i} + β_{j} + γ_{k} + (αβ)_{ij} + (αγ)_{ik} + (βγ)_{jk} + (αβγ)_{ijk} + S_{l(k)} + (Sα)_{il(k)} + (Sβ)_{jl(k)} + ϵ_{l(ijk)},     (6)

where β_{j} is now the effect of the jth level of the second within-subject factor B, and γ_{k} is the effect of the kth level of the between-subjects factor C, with subjects (indexed by l) nested within the levels of C. The partitioned error is expressed through the random interaction of the subjects with the first within-subject factor (Sα)_{il(k)} and the interaction with the second within-subject factor (Sβ)_{jl(k)}. Because these are interactions with a random factor, these effects are also considered random-effects and thus represent a further partitioning of the error term. Although it may initially appear as though a 3-way interaction with subject could also be included, this is not possible as it would be perfectly collinear with the errors. This is a clue to the fact that the error term in this model is equivalent to the highest-order interaction between the subjects and the within-subject factors.

As this is a much larger model than the previous example, the derivation of the EMS is a lengthier process. As before, the arithmetic is presented in the table below, with the four error terms denoted MS_{S}, MS_{SA}, MS_{SB}, and MS_{E}, respectively. As indicated above, MS_{E} could equivalently be written as MS_{SAB} to denote the equivalence with the highest-order interaction between the subjects and within-subject factors. Suitable tests for the model effects are given in the subsequent table.

Arithmetic for the derivation of the EMS in the 3-way mixed-measures ANOVA with two within-subject and one between-subjects factor.

| Effect | EMS |
|---|---|
| α_{i} | σ^{2} + b σ_{SA}^{2} + nbc Σ_{i} α_{i}^{2} / (a − 1) |
| β_{j} | σ^{2} + a σ_{SB}^{2} + nac Σ_{j} β_{j}^{2} / (b − 1) |
| γ_{k} | σ^{2} + ab σ_{S}^{2} + nab Σ_{k} γ_{k}^{2} / (c − 1) |
| (αβ)_{ij} | σ^{2} + nc Σ_{ij} (αβ)_{ij}^{2} / ((a − 1)(b − 1)) |
| (αγ)_{ik} | σ^{2} + b σ_{SA}^{2} + nb Σ_{ik} (αγ)_{ik}^{2} / ((a − 1)(c − 1)) |
| (βγ)_{jk} | σ^{2} + a σ_{SB}^{2} + na Σ_{jk} (βγ)_{jk}^{2} / ((b − 1)(c − 1)) |
| (αβγ)_{ijk} | σ^{2} + n Σ_{ijk} (αβγ)_{ijk}^{2} / ((a − 1)(b − 1)(c − 1)) |
| S_{l(k)} | σ^{2} + ab σ_{S}^{2} |
| (Sα)_{il(k)} | σ^{2} + b σ_{SA}^{2} |
| (Sβ)_{jl(k)} | σ^{2} + a σ_{SB}^{2} |
| ϵ_{l(ijk)} | σ^{2} |

Here a, b, and c are the numbers of levels of A, B, and C, and n is the number of subjects per level of C.

The EMS ratios used to form appropriate F-ratios for the 3-way mixed-measures ANOVA.

| Effect | F-ratio |
|---|---|
| A | MS_{A}/MS_{SA} |
| B | MS_{B}/MS_{SB} |
| C | MS_{C}/MS_{S} |
| A × B | MS_{AB}/MS_{E} |
| A × C | MS_{AC}/MS_{SA} |
| B × C | MS_{BC}/MS_{SB} |
| A × B × C | MS_{ABC}/MS_{E} |

EMS are a necessary concept in ANOVA models in order to define suitable tests for the model effects. In purely fixed-effects models it is rarely necessary to explicitly calculate the EMS, as a suitable denominator for each test is always given by the overall error term. When random effects are included, such as in repeated-measurements models with partitioned errors, complications arise in the derivation of suitable tests. As a minimum, models with a single within-subject factor have a choice of two error terms to form tests, whereas those with multiple within-subject factors have multiple possibilities when forming tests. It is precisely this issue of specifying the correct error term that leads to problems when using neuroimaging software designed to only use a single error term. Unless the EMS are taken into consideration it is entirely possible to end up with invalid tests. For instance, testing β_{j} from Equation (3) using MS_{E} would not result in a test of the between-subject effect, but a test of the between-subject effect confounded with the subject effects. Only by using MS_{S} as the denominator of the F-ratio is a valid test of the between-subject effect obtained.

In the previous section we saw the importance of using EMS to derive suitable error terms in ANOVA models. Although this represents the core issue at the heart of implementing repeated measurement models in the GLM, it is also worth considering the practical question of how questions can be asked of these models in the form of contrast weights. Although the contrast framework is a well-established aspect of hypothesis testing in the neuroimaging GLM (e.g., Poline et al.), the inclusion of the random subject effects complicates the derivation of suitable weights.

Consider the overparameterized design matrix for a 2 × 2 mixed-measures ANOVA with a small number of subjects per group, given in Equation (9). Taking the first row of this matrix

which tells us the combination of parameters needed to calculate the prediction for the A1B1 cell for subject 1, defining an estimable function of the parameters (see McFarquhar). Writing this prediction out gives

which is a combination of both the fixed and random model effects. Although the subject effects are not of interest, their inclusion in the design matrix means we cannot simply give them a weight of 0 when calculating cell or marginal means as this would define a non-estimable function. As such, calculation of the ANOVA effects from cell and marginal means will include the subject terms. For certain ANOVA effects, the subject terms will cancel in the numerator, whereas for others they will not. For those where they do not, an error term must be selected such that the subject effects are also present in the denominator. This is simply a re-statement of the general approach to constructing F-ratios from the EMS, as described in the previous section.

As an example, consider deriving the weights for testing the main effect of the within-subject factor A. To do so, we can first average the rows in Equation (9) which code the first level of factor A and then average the rows which code the second level of factor A^{2}

providing the weights for calculating the marginal means of factor A. Notice that these weights are non-zero for the subject effects. The weights for the main effect are then formed from the subtraction of the weights for the marginal means, giving

where we can see that the subject effects have canceled. Now consider deriving the weights for testing the main effect of the between-subject factor B. Taking a similar approach to above we find

which provide the weights for the marginal means of factor B which again contain non-zero weights for the subject effects. Subtracting these weights gives

which notably is still non-zero for the subject terms. This has a direct correspondence with the definitions of the EMS from earlier, where EMS_{B} in Equation (3) contains the subject variance σ_{S}^{2} in addition to σ^{2}.
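The row-averaging procedure above can be sketched numerically. The following toy example (2 subjects per group, an assumption made purely for illustration) builds an overparameterized design matrix and confirms that the subject weights cancel for the within-subject main effect but not for the between-subjects one:

```python
import numpy as np

# Overparameterized design for a 2 (A, within) x 2 (B, between) design,
# 2 subjects per group. Columns: mean, A1, A2, B1, B2, the four A x B cells,
# then one column per subject (S1, S2 in group B1; S3, S4 in group B2).
cols = ["mu", "A1", "A2", "B1", "B2", "A1B1", "A1B2", "A2B1", "A2B2",
        "S1", "S2", "S3", "S4"]
rows = []
for s, grp in enumerate([0, 0, 1, 1]):          # subject index and group
    for lvl in [0, 1]:                          # level of within factor A
        x = np.zeros(len(cols))
        x[0] = 1                                # mean
        x[1 + lvl] = 1                          # A main effect
        x[3 + grp] = 1                          # B main effect
        x[5 + 2 * lvl + grp] = 1                # A x B cell
        x[9 + s] = 1                            # subject
        rows.append(x)
X = np.array(rows)

# Marginal-mean weights: average the rows coding each factor level
w_a = X[0::2].mean(axis=0) - X[1::2].mean(axis=0)     # A1 rows minus A2 rows
b1_rows = X[X[:, 3] == 1]
b2_rows = X[X[:, 4] == 1]
w_b = b1_rows.mean(axis=0) - b2_rows.mean(axis=0)     # B1 minus B2

assert np.allclose(w_a[9:13], 0)       # subject weights cancel for A
assert not np.allclose(w_b[9:13], 0)   # but not for B
```

This mirrors the EMS result: the effect of A can be tested against the overall error term, whereas the effect of B cannot.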

To see how the discussions in the previous section are readily applicable to non-overparameterized models (such as those used in FSL FEAT), consider the design matrices given in Equation (16). These are both constrained versions of the matrix from Equation (9), with X^{(B)} using “treatment” coding and X^{(C)} using “sigma-restricted” coding (see McFarquhar).

Application of the earlier approach to deriving contrasts leads to the weights for the effect of within-subject factor A in models B and C (Equation 17),

which, as before, do not contain weights for the subject effects. Similarly, the contrast for between-subjects factor B can be derived for both alternative codings as shown in Equation (18),

which again contain weights for the subject effects and are therefore inappropriate when dealing with software that only implements a single error term.

Another aspect of hypothesis testing in ANOVA models is the use of contrasts to test simple effects following a significant interaction. As an example, consider the weights for testing the effect of the within-subject factor A at the first level of the between-subjects factor B,

which contains no weights for the subject effects and so can be tested with the overall error term of the model. Alternatively, if we wanted to examine the effect of the between-subjects factor B at the first level of the within-subject factor A, the weights would be

which does contain non-zero values for the subject effects. This is perhaps not surprising given that this simple main effect is a between-subject comparison, constrained to only use the estimates from the first level of factor A. Nevertheless, it demonstrates that the error term for the omnibus test may not always be appropriate for testing the simple effects. If one did wish to test this effect, another between-subjects model would need to be specified containing only the data from the first level of the within-subject factor, adding further complication to the approach necessitated by the implementation of the GLM in common neuroimaging packages.

Although contrast weights are a familiar concept for hypothesis testing in the GLM, the inclusion of the random subject effects can make their derivation more difficult depending on the design matrix coding options available. A general approach has been given whereby weights can always be reliably derived using the rows of the design matrix. In addition, this section has shown how contrast weights can provide a complementary perspective on the issue of suitable error terms. In particular, weights that are derived correctly but contain non-zero values for the subject effects are not suitable for testing with the overall error-term of the model. This provides a useful rule-of-thumb for neuroimaging researchers, particularly when it comes to follow-up tests of interactions, where extra care must be taken given that a suitable error term is not necessarily the same as the error term used for the omnibus effect.

Now that the core theoretical concepts of repeated measurements models have been described, we turn to the practical aspect of specifying partitioned-error ANOVA models in neuroimaging software. Based on the discussions in the preceding sections, four generic steps for correctly specifying these models are:

Calculate the EMS for the complete model. The number of error terms corresponds to the number of separate models that need to be estimated.

For each model, identify which within-subject factors are absent from the associated error term. These are the factors whose levels must be averaged over.

Use contrasts at the 1st-level to average over the various factors identified above.

Specify the 2nd-level models using the 1st-level contrasts created in the previous step and then derive the contrast weights using the design matrices.

These steps can be used with any software implementing the mass-univariate GLM approach to modeling group-level neuroimaging datasets. To make these steps clear, an example will now be provided of specifying a 3-way mixed-measures ANOVA using the Flexible Factorial module in SPM12.
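As an illustration of the averaging step, 1st-level contrasts can be built conveniently with Kronecker products. The condition ordering below (Location varying slowest, Texture fastest) is an assumption made for the sketch, not a prescription of how any particular package orders its regressors:

```python
import numpy as np

# Hypothetical 1st-level contrasts over 6 condition regressors, ordered as
# (Location 1, Texture 1..3), (Location 2, Texture 1..3).
I_loc = np.eye(2)
I_tex = np.eye(3)
avg_loc = np.full((1, 2), 1 / 2)   # averages over Location
avg_tex = np.full((1, 3), 1 / 3)   # averages over Texture

# Model for Location effects: keep Location, average over Texture -> 2 contrasts
c_location = np.kron(I_loc, avg_tex)
# Model for Texture effects: keep Texture, average over Location -> 3 contrasts
c_texture = np.kron(avg_loc, I_tex)
# Model for the between-subjects effects: average over both within factors
c_between = np.kron(avg_loc, avg_tex)

assert c_location.shape == (2, 6) and c_texture.shape == (3, 6)
assert np.allclose(c_location.sum(axis=1), 1)   # each contrast is an average
assert np.allclose(c_texture.sum(axis=1), 1)
```

Each row of these matrices defines one 1st-level contrast image, and the resulting images form the inputs to the corresponding 2nd-level model.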

The example data set comes from a previously reported fMRI study by Trotter et al.

The model we wish to fit is given in Equation (6) and is repeated below

y_{ijkl} = μ + α_{i} + β_{j} + γ_{k} + (αβ)_{ij} + (αγ)_{ik} + (βγ)_{jk} + (αβγ)_{ijk} + S_{l(k)} + (Sα)_{il(k)} + (Sβ)_{jl(k)} + ϵ_{l(ijk)}.

In the context of the example dataset, α_{i} is the effect of the ith level of Location, β_{j} is the effect of the jth level of Texture, and γ_{k} is the effect of the kth level of Drink, with group sizes n_{k} where n_{1} = 14 and n_{2} = 16. As discussed earlier, the error in this model is partitioned into four separate terms.

The EMS for this design have already been derived using the arithmetic presented earlier. The resulting tests, together with their degrees of freedom, are summarized in the ANOVA table below.

ANOVA table for the example 3-way mixed-measures model.

| Source | df |
|---|---|
| Drink | 1 |
| Error: Subject(Drink) | 28 |
| Location | 1 |
| Location × Drink | 1 |
| Error: Subject(Drink) × Location | 28 |
| Texture | 2 |
| Texture × Drink | 2 |
| Error: Subject(Drink) × Texture | 56 |
| Texture × Location | 2 |
| Texture × Location × Drink | 2 |
| Error: Subject(Drink) × Location × Texture | 56 |
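The degrees of freedom in the table can be verified with some simple arithmetic:

```python
# Degrees-of-freedom arithmetic for the 2 (Location) x 3 (Texture) x 2 (Drink)
# example, with group sizes n1 = 14 and n2 = 16 (30 subjects in total).
loc, tex, drink = 2, 3, 2
n_subjects = 14 + 16

df_subject = n_subjects - drink                      # Subject(Drink)
df_subject_loc = (loc - 1) * df_subject              # Subject(Drink) x Location
df_subject_tex = (tex - 1) * df_subject              # Subject(Drink) x Texture
df_residual = (loc - 1) * (tex - 1) * df_subject     # remaining error

assert (df_subject, df_subject_loc, df_subject_tex, df_residual) == (28, 28, 56, 56)
```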

Based on the tests given in the table above, four separate models are required, one for each of the four error terms.

To understand why averaging over different effects produces the correct error term, consider the model where we wish to enforce MS_{SB} as the error term by averaging the data over the levels of the within-subject factor A.

Clearly we cannot include any of the terms containing α_{i} from the full model, but notice that we cannot include (Sβ)_{jl(k)} either. This is because this term would now be perfectly collinear with the errors, meaning the error term of the averaged model is equivalent to (Sβ)_{jl(k)}. This connects directly with the EMS from Equation (7), where the averaged model has effectively removed the σ^{2} term from EMS_{B} and EMS_{BC}, leaving only the subject-related variance and the fixed effects in the expectations.

The basic 1st-level models for this dataset contain boxcar regressors for each of the within-subject conditions formed by crossing the levels of Location and Texture.

Example design matrices for the different models, as specified in SPM12 using the Flexible Factorial module^{3}^{4}

where columns 1 and 2 code the levels of the between-subjects factor Drink, with the remaining columns coding the individual subjects.

Comparison of the design matrices produced by SPM12 for the different error terms.

The results from this model for a single voxel are shown in both SPM and SPSS version 23 for comparison.

Comparison of the results produced by SPM and SPSS 23 for data from a single voxel. Equivalence of the F-statistics is apparent across both implementations.

It is worth noting that although the F-statistics agree, the p-values reported by each package may differ depending on the corrections for non-sphericity that are applied, an issue returned to later in this paper.

In the previous sections we have seen how complex mixed-measures models can be specified using the GLM framework, as implemented in standard neuroimaging software packages. However, the discussion has so far neglected the formation of ANCOVA models by the inclusion of continuous covariates. Putting aside issues of whether it is meaningful to use certain covariates to “control” for concomitant factors in quasi-experimental situations (see Miller and Chapman), this section details how continuous covariates can be accommodated within the repeated-measures framework.

The extension of the basic mixed-measures ANOVA model from Equation (3) to a mixed-measures ANCOVA model is given by Federer and King as

where x_{ik(j)} is the raw covariate value for repeated measurement i on subject k in group j, x̄_{.k(j)} is the subject-mean covariate, β_{1} gives the between-subject regression slope, and β_{2} gives the within-subject regression slope. The inclusion of both regression coefficients is in-line with the recommendations of Federer and King, whereby the between-subject effects are adjusted for β_{1} and the within-subject effects are adjusted for β_{2}. The model in Equation (22) is therefore the basis for any mixed-measures model that contains a continuous covariate. Implementation will largely depend on whether the covariate in question is measured between-subjects or within-subject, as will be discussed below.

A between-subjects covariate (also known as a subject-level or time-invariant covariate) is constant across the repeated measurements. This renders the within-subject term involving x_{ik(j)} redundant, and the model can be simplified to only contain the subject-mean x̄_{.k(j)} term. Note that when using the multiple-model approach advocated in the previous section, the covariate must be tested within the same model as the other between-subject effects and interactions (i.e., the model whose error term is MS_{S}).

A within-subject covariate (also known as a time-varying covariate) changes across the repeated measurements and enters the model as x_{ik(j)}. In a similar vein to testing the traditional ANOVA effects, the parameter associated with x_{ik(j)} should be tested using the within-subject error term.

As a more involved example, consider an ANCOVA for the complete 2 × 3 × 2 mixed-measures design from section 4. Assuming there is a covariate value per-cell of the design, extension of the model in Equation (6) leads to four covariate terms, one per error stratum,

where x_{ijl(k)} is the raw covariate value, partitioned across the between-subject and within-subject strata in the same fashion as Equation (22). The corresponding ANOVA table is given below.

ANOVA table for the example 3-way mixed-measures model including a within-subject covariate.

| Source | df |
|---|---|
| Covariate (between-subject component) | 1 |
| Drink | 1 |
| Error: Subject(Drink) | 27 |
| Covariate (Location component) | 1 |
| Location | 1 |
| Location × Drink | 1 |
| Error: Subject(Drink) × Location | 27 |
| Covariate (Texture component) | 1 |
| Texture | 2 |
| Texture × Drink | 2 |
| Error: Subject(Drink) × Texture | 54 |
| Covariate (x_{ijl(k)}) | 1 |
| Texture × Location | 2 |
| Texture × Location × Drink | 2 |
| Error: Subject(Drink) × Location × Texture | 54 |

An additional complication arises when one of the covariates in Equation (24) is associated with one within-subject factor, but is constant over the other. In this situation there will be redundancies in the definitions of the four covariates. For instance, if one of the within-subject factors was time and a value was measured only once per-visit, the value would be constant across any other within-subject variables. Implementation of this design would then be similar in spirit to the use of between-subject covariates in Equation (22), insofar as all covariate terms would attempt to enter each model, but would then be dropped wherever redundancies are found.

One final important issue to discuss is the much-maligned sphericity assumption of the traditional repeated-measures ANOVA. In brief, the validity of the F-tests described above rests on the assumption of sphericity of the covariance matrix of the repeated measurements, expressed as

Var(y_{i} − y_{j}) = constant, for all pairs i ≠ j,

which indicates that for all pairs of measurements the variance of their differences is identical. This is a similar (but less restrictive) condition than compound symmetry (Davis), where the variances and the covariances are themselves each assumed equal.
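A small sketch makes the condition concrete: a covariance matrix is spherical when the variance of every pairwise difference, cov[i, i] + cov[j, j] − 2 cov[i, j], is constant. The matrices below are invented purely for illustration:

```python
import numpy as np

def is_spherical(cov, tol=1e-10):
    """Check sphericity: the variance of every pairwise difference of the
    repeated measurements, cov[i, i] + cov[j, j] - 2 * cov[i, j], is constant."""
    p = cov.shape[0]
    diffs = [cov[i, i] + cov[j, j] - 2 * cov[i, j]
             for i in range(p) for j in range(i + 1, p)]
    return np.ptp(diffs) < tol

# Compound symmetry (equal variances, equal covariances) implies sphericity
cs = np.full((3, 3), 0.4) + 0.6 * np.eye(3)
assert is_spherical(cs)

# An unstructured covariance matrix generally does not satisfy it
uns = np.array([[1.0, 0.8, 0.2],
                [0.8, 1.0, 0.5],
                [0.2, 0.5, 1.0]])
assert not is_spherical(uns)
```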

Traditionally, departures from sphericity are assessed using hypothesis tests, such as the test described by Mauchly, with corrections to the degrees of freedom of the F-test applied when violations are indicated.

This paper has discussed the modeling of group-level repeated measurements in neuroimaging, using the traditional GLM framework. The core statistical concept of the EMS has been discussed and from these discussions a set of steps for implementing these forms of models in software have been given. Additional considerations, such as covariates and the assumption of sphericity, have also been discussed. The main conclusion from this paper is that if one wishes to use traditional neuroimaging analysis tools for this purpose, great care must be taken to correctly derive the tests from the EMS and then to carefully implement the multiple models necessitated by the GLM framework. In doing so, it is important to carefully consider the contrasts used and the error-terms employed, especially for follow-up tests from interaction effects. To that end, the oft-quoted advice given by Gläscher and Gitelman on specifying contrasts for these designs should be revisited in light of the issues raised here.

In terms of a more general conclusion from this paper, it is important for developers of neuroimaging analysis packages to recognize that the onus of correctly specifying these models should not be placed on the users. Indeed, considering the methods outlined in this paper it would seem wholly unfair to expect that users would know to perform the given steps without any clear guidance. Instead, software developers should strive for improved usability and clarity in their implemented methods. Although usability is not always considered as carefully as it is in commercial software, it is hopefully clear that only by providing user-friendly and well-documented software can the neuroimaging community be confident in the accuracy of their methods. This is particularly true given recent concerns about the accuracy of common neuroimaging analysis approaches (Eklund et al.).

The author confirms being the sole contributor of this work and has approved it for publication.

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The author would like to thank Dr. Paula Trotter for kindly allowing the use of her dataset in this paper.

^{1}It is notable that AFNI provides specialized tools for these designs (e.g., 3dMVM; Chen et al.).

^{2}Be aware that this approach will not work with unbalanced design matrices. See section 4.2.4 for how this procedure can be adjusted for those cases.

^{3}For the current release of SPM12, it is necessary to alter the

^{4}Removal of the subject blocks allows for derivation of Type III sums-of-squares in unbalanced designs by reducing the design matrix to a balanced unique-row form. This is based on assuming that the effect in question is being tested within an appropriate model and thus the weights on the subject effects will be zero. If this is not the case, then a non-estimable contrast will be returned.