Edited by: Holmes Finch, Ball State University, United States

Reviewed by: Akihito Kamata, Southern Methodist University, United States; Ke-Hai Yuan, University of Notre Dame, United States

This article was submitted to Quantitative Psychology and Measurement, a section of the journal Frontiers in Psychology

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

When analyzing complex longitudinal data, especially data from different educational settings, researchers generally focus only on the mean part (i.e., the regression coefficients), ignoring the equally important random part (i.e., the random effect variances) of the model. By using Project English Language and Literacy Acquisition (ELLA) data, we demonstrated the importance of taking the complex data structure into account by carefully specifying the random part of the model, showing that not only can it affect the variance estimates, the standard errors, and the tests of significance of the regression coefficients, it also can offer different perspectives of the data, such as information related to the developmental process. We used xxM (Mehta,

Educational research has always involved complex data structures. For example, in cross-sectional studies, students are likely nested within classrooms and schools at a particular time point (i.e., a strictly hierarchical structure), and while they may come from different neighborhoods, neighborhoods and schools are not nested but crossed with each other (i.e., a cross-classified structure). Similarly, for longitudinal data, repeated measures (e.g., reading achievement test scores collected at different grade levels from the same student) are nested within students while the students are likely to change classrooms over the course of study. A change of classroom results in a structure that is not strictly hierarchical but cross-classified, with repeated measures now nested within both students and classrooms, while students and classrooms are crossed with each other (see Figure

Although most educational researchers realize the importance of taking into account the complex data structure when they analyze their data, they may not be aware of how to

The purpose of this paper was to demonstrate how to analyze this type of complex data structure with the use of data from the Project English Language and Literacy Acquisition (ELLA), a large-scale longitudinal study in which researchers intervened with and followed English language learners (ELLs) from kindergarten to third grade. The project was funded by the U.S. Department of Education (Grant Number:

We first provide a brief review of Project ELLA and the data derived from it. We then analyze the data with the commonly used hierarchical linear model (HLM) approach. We subsequently move from this HLM to the more complex cross-classified random effect model (CCREM), which addresses the complex data structure issue by taking into account the classroom effect. However, the CCREM has its own limitations and is unable to address some important features of longitudinal data such as those from Project ELLA, for example the potential carryover effect (i.e., the effect of a previous grade level on later time measures). To address this special feature, we used the xxM software (Mehta,

Project ELLA (Lara-Alecio,

Texas state law (Texas Education Code,

In the current study, we used a partial data set from the original data. This data set included scores on the English version of the Woodcock Language Proficiency Battery–Picture Vocabulary subtest (EWPV) of 876 students at five time points: Time 1 = beginning of kindergarten (2004), Time 2 = end of kindergarten (2005), Time 3 = end of first grade (2006), Time 4 = end of second grade (2007), and Time 5 = end of third grade (2008).

As shown in Table

Descriptive statistics.

Variable | Time 1 | Time 2 | Time 3 | Time 4 | Time 5 |
Male | 470 (53.65%) | 470 (53.71%) | 343 (53.34%) | 231 | 191 (51.21%) |
Female | 403 (46.00%) | 402 (45.94%) | 297 (46.19%) | 206 | 179 (47.99%) |
Age (months) | 59.72 (5.08) | 71.72 (5.08) | 83.84 (5.01) | 95.67 | 107.92 (4.64) |
Control | 390 (44.52%) | 390 (44.57%) | 295 (45.88%) | 222 (50.45%) | 192 (51.47%) |
Treatment | 486 (55.48%) | 485 (55.43%) | 348 (54.12%) | 218 (49.55%) | 181 (48.53%) |

We present three models, of which the first two are commonly used in educational research: the hierarchical linear model (HLM) and the cross-classified random effect model (CCREM). The third, the xxM-UN1 model, is a more advanced and flexible model, which not only takes into account the complex data structure but also provides a new modeling feature that allows researchers to examine such effects as potential carryover in longitudinal analysis. The results from these analytic approaches are compared, and the advantages and disadvantages of each model are discussed.

Even though the analyses have been conducted under both multilevel modeling (MLM; i.e., hierarchical linear modeling, HLM) and structural equation modeling (SEM) frameworks, we prefer using the multilevel modeling framework to present the models for our analyses, given its simplicity for comprehension and the equivalence between the two models (Curran,

Unlike the cross-sectional multilevel model, there is always an important predictor for longitudinal analysis: time. Researchers are particularly interested in examining the average trend of an outcome variable (in this paper, the Woodcock Language Proficiency Battery–Picture Vocabulary subtest; EWPV) over time. Nevertheless, many longitudinal and developmental phenomena are not linear in nature. In other words, the outcome variable does not change at a constant rate over time. Consider, by contrast, a simple linear time-predicted model, Math = B0 + B1 Time + e, where Math is the math achievement outcome variable, Time is the time predictor with grade year as the unit, and e is the error. B0 is the intercept and B1 (positive and significantly larger than zero) is the regression coefficient, interpreted as the expected change in the math achievement score (B1 points) for each one-unit change in time (i.e., each grade year that passes). More importantly, this model implies constant improvement in math achievement (B1 points per grade year, regardless of the grade the students are in). Because such a constant rate is often unrealistic, fitting a nonlinear model rather than assuming a linear trend is common in analyzing longitudinal data (Kwok et al.,

A relatively simple way to capture a nonlinear trend is to use a piecewise model (Bryk and Raudenbush,

For our current demonstration, given the data collection time frame, we chose a piecewise model containing two pieces (i.e., a two-piece model) to capture the potential nonlinear trend, with the first piece containing the first two time measures (i.e., beginning and end of kindergarten) and the second piece containing the remaining three time measures (i.e., end of first grade, end of second grade, and end of third grade). Using the traditional HLM, which assumes a strictly hierarchical structure, we analyzed our data as a three-level model with repeated measures (level 1) nested within students (level 2) and students further nested within their corresponding kindergarten classrooms (level 3), without considering their mobility (i.e., change of classroom at later time points). The corresponding model equations are presented as follows:

where EWPV is the target outcome variable for the t-th repeated measure from the i-th student of the j-th

We used the following coding scheme:

with piece1 coded as (0,1,1,1,1) and piece2 coded as (0,0,1,2,3) for the five repeated measures. π_{0ij} is the intercept (or the baseline/predicted EWPV score at the beginning of kindergarten) based on the repeated measures from the i-th student of the j-th kindergarten classroom. Similarly, π_{1ij} is the linear rate of change of the first piece (i.e., from the beginning of kindergarten to the end of kindergarten) while π_{2ij} is the linear rate of change of the second piece (i.e., from the end of first to the end of third grade) from the i-th student of the j-th kindergarten classroom. Given that we had 876 students in the data, and we used the repeated measures from each student to fit the above two-piece model, we should have 876 sets of regression coefficients (i.e., π_{0ij}, π_{1ij}, & π_{2ij}), which can be written into the following equations:

where β_{0j} is the average intercept coefficient across all the students within the j-th kindergarten classroom; β_{1j} is the average piece1 regression coefficient across all the students within the j-th kindergarten classroom, and β_{2j} is the average piece2 regression coefficient across all the students within the j-th kindergarten classroom.
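The piece1 and piece2 codes described above follow a simple rule based on the occasion index, which the following minimal Python sketch makes explicit. The function name `piecewise_codes` and its `knot` argument are illustrative assumptions and are not part of the original analysis.

```python
def piecewise_codes(n_occasions=5, knot=1):
    """Return (piece1, piece2) codes for a two-piece linear growth model.

    Occasions are indexed 0..n_occasions-1. `knot` is the occasion index at
    which the first piece ends (1 = end of kindergarten in this article).
    """
    occasions = range(n_occasions)
    # piece1 rises until the knot and then stays flat.
    piece1 = [min(t, knot) for t in occasions]
    # piece2 is zero up to the knot and then counts the units elapsed since it.
    piece2 = [max(t - knot, 0) for t in occasions]
    return piece1, piece2


piece1, piece2 = piecewise_codes()
labels = ["K start", "K end", "G1 end", "G2 end", "G3 end"]
for label, p1, p2 in zip(labels, piece1, piece2):
    print(f"{label:8s} piece1={p1} piece2={p2}")
```

With the defaults, the sketch reproduces the codes given in the text, so the intercept refers to the beginning of kindergarten and each piece has its own linear rate of change.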

We further obtained the corresponding average coefficient estimates across all kindergarten classrooms, as presented^{1}

where γ_{00}, γ_{10}, and γ_{20} are the average intercept, piece1 and piece2 coefficients across all kindergarten classrooms assuming a nonsignificant treatment effect.

As stated previously, one of the main purposes of Project ELLA was to examine the effectiveness of the enhanced practice setting (i.e., the treatment condition) on EWPV. To examine this treatment effect, we included the treatment variable in the level-3 equations, given that the randomization was at the classroom/school level. In other words, students from the same kindergarten classroom received exactly the same treatment or control materials. Here, treatment_{j} is a dummy-coded variable with the treatment condition coded as 1 and the control condition coded as 0. Hence, if there is a significant treatment effect on the intercept, we expect that γ_{01} will not be zero; the intercept will be γ_{00} for the control condition and (γ_{00} + γ_{01}) for the treatment condition. Similarly, if there are significant treatment effects on both piece1 and piece2, we would expect that both γ_{11} and γ_{21} will not be zero; the average piece1 coefficient will be γ_{10} for the control condition and γ_{10} + γ_{11} for the treatment condition, and the average piece2 coefficient will be γ_{20} for the control condition and γ_{20} + γ_{21} for the treatment condition.
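For reference, the three levels of Model 1 described above can be written out as follows. This is a reconstruction from the verbal definitions in the text rather than the typeset equations themselves; in particular, the letters used for the random effects (e for the within-student residual, r for the student-level effects, and u for the kindergarten classroom effect) are assumed here.

```latex
% Equation (1): level 1, repeated measures
\mathrm{EWPV}_{tij} = \pi_{0ij} + \pi_{1ij}\,\mathrm{piece1}_{tij}
                    + \pi_{2ij}\,\mathrm{piece2}_{tij} + e_{tij}

% Equation (2): level 2, students
\pi_{0ij} = \beta_{0j} + r_{0ij}, \qquad
\pi_{1ij} = \beta_{1j} + r_{1ij}, \qquad
\pi_{2ij} = \beta_{2j} + r_{2ij}

% Equation (3): level 3, kindergarten classrooms (one random effect only)
\beta_{0j} = \gamma_{00} + \gamma_{01}\,\mathrm{treatment}_{j} + u_{0j}, \qquad
\beta_{1j} = \gamma_{10} + \gamma_{11}\,\mathrm{treatment}_{j}, \qquad
\beta_{2j} = \gamma_{20} + \gamma_{21}\,\mathrm{treatment}_{j}
```

Only the intercept equation at level 3 carries a random effect, which matches the single classroom-level variance reported for this model.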

By substituting Equations (2) and (3) back into equation (1), we can get the following overall
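Carrying out that substitution term by term gives the combined form below. It is a reconstruction under the same assumed symbols for the random effects (e, r, and u), not a copy of the published equation.

```latex
\mathrm{EWPV}_{tij} =
    \gamma_{00} + \gamma_{01}\,\mathrm{treatment}_{j}
  + \gamma_{10}\,\mathrm{piece1}_{tij}
  + \gamma_{11}\,\mathrm{treatment}_{j}\,\mathrm{piece1}_{tij}
  + \gamma_{20}\,\mathrm{piece2}_{tij}
  + \gamma_{21}\,\mathrm{treatment}_{j}\,\mathrm{piece2}_{tij}
  + u_{0j} + r_{0ij}
  + r_{1ij}\,\mathrm{piece1}_{tij}
  + r_{2ij}\,\mathrm{piece2}_{tij}
  + e_{tij}
```

The first six terms form the mean part of the model; the remaining five form the random part.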

The corresponding random effect variances that capture the variation at different levels are as follows:

V(e_{tij}) = σ^{2} (within-student-level variance with the identity structure assumption)

V(r_{0ij}) = τ_{00} (between-student-level intercept variance)

V(r_{1ij}) = τ_{11} (between-student-level piece1 variance)

V(r_{2ij}) = τ_{22} (between-student-level piece2 variance)

V(u_{0j}) = θ^{2} (kindergarten classroom-level variance). We used the R package xxM (Mehta,

As presented in Table 2, all regression coefficients in Model 1 were statistically significant except γ_{01} (i.e., the treatment effect at the beginning of kindergarten). Hence, the overall average piecewise model for the control group (i.e., treatment_{j} = 0) was:

whereas the overall average piecewise model for the treatment group (i.e., treatment_{j} = 1) was:

which could be further reduced to:
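Using the Model 1 estimates reported in Table 2, these three average models can be written out as follows. The expressions are reconstructed from the reported coefficients and retain the nonsignificant γ_{01} estimate, consistent with the 2.43-point baseline difference discussed in the text.

```latex
% Control group (treatment_j = 0)
\widehat{\mathrm{EWPV}}_{tij} = 435.60 + 13.75\,\mathrm{piece1}_{tij} + 9.64\,\mathrm{piece2}_{tij}

% Treatment group (treatment_j = 1)
\widehat{\mathrm{EWPV}}_{tij} = (435.60 - 2.43) + (13.75 + 2.41)\,\mathrm{piece1}_{tij}
                              + (9.64 + 1.60)\,\mathrm{piece2}_{tij}

% Treatment group, reduced form
\widehat{\mathrm{EWPV}}_{tij} = 433.17 + 16.16\,\mathrm{piece1}_{tij} + 11.24\,\mathrm{piece2}_{tij}
```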

Summary of 3-Level HLM, CCREM, and xxM-UN1 model results.

Parameter | Model 1: 3-Level HLM | 95% CI | Model 2: CCREM | 95% CI | Model 3: xxM-UN1 | 95% CI |
Intercept (γ_{00}) | 435.60^{*} | [432.91, 438.28] | 436.99^{*} | [434.31, 439.67] | 437.07^{*} | [434.16, 440.00] |
Piece 1 (γ_{10}) | 13.75^{*} | [11.96, 15.54] | 13.15^{*} | [11.36, 14.94] | 13.12^{*} | [11.51, 14.72] |
Piece 2 (γ_{20}) | 9.64^{*} | [8.95, 10.34] | 9.47^{*} | [8.23, 10.71] | 9.66^{*} | [8.90, 10.44] |
Treatment (γ_{01}) | −2.43 | [−7.41, 2.61] | −3.12 | [−7.20, 1.03] | −7.06^{*} | [−11.96, −1.94] |
P1 × Treat (γ_{11}) | 2.41^{*} | [0.01, 4.80] | 3.42^{*} | [1.04, 5.81] | 3.46^{*} | [1.32, 5.63] |
P2 × Treat (γ_{21}) | 1.60^{*} | [0.59, 2.62] | 0.59 | [−1.36, 2.53] | 1.42^{*} | [0.23, 2.57] |
Student | | | | | | |
Intercept (τ_{00}) | 137.36 | | 157.04 | | 193.44 | |
P1 (τ_{11}) | 2.13 | | 3.52 | | 29.37 | |
Cov(Int, P1) | −17.10 | | −23.50 | | −39.80 | |
P2 (τ_{22}) | 0.08 | | 0.16 | | 3.66 | |
Cov(Int, P2) | −3.22 | | −5.02 | | −13.56 | |
Cov(P1, P2) | 0.40 | | 0.75 | | 8.27 | |
Class (θ^{2}/ψ^{2}) | 97.39 | | 64.49 | | – | |
K | – | | – | | 185.37 | |
Grade 1 | – | | – | | 12.32 | |
Grade 2 | – | | – | | 10.17 | |
Grade 3 | – | | – | | 5.63 | |
Within (σ^{2}) | 165.12 | | 145.92 | | – | |
Deviance | 25,959 | | 25,889 | | 25,423 | |
AIC | 25,987 | | 25,917 | | 25,479 | |
BIC | 26,071 | | 26,002 | | 25,648 | |

Based on the average models presented above and in Table 2, we found that the average EWPV for the control group at the beginning of kindergarten was 435.6, whereas the average EWPV score for the treatment group was slightly (but not significantly) lower (2.43 points lower). We also found that the average growth in EWPV did not follow a linear trend, given that the regression coefficients of the two pieces were quite different from each other for both the treatment and control groups (i.e., 13.75 piece1_{tij} + 9.64 piece2_{tij} for the control condition and 16.16 piece1_{tij} + 11.24 piece2_{tij} for the treatment condition). That is, we found a faster growth rate of EWPV within the kindergarten year and a slower growth rate of EWPV after kindergarten (i.e., from first to third grade) for both conditions. In addition, the students in the treatment condition, on average, showed greater improvement during kindergarten (16.16 points for the treatment condition vs. 13.75 points for the control condition) as well as per grade year from first to third grade (11.24 points for the treatment condition vs. 9.64 points for the control condition). These differences in growth rates suggest the effectiveness of the Project ELLA enhanced materials and practice in improving the students' EWPV over time.
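To make the two average trajectories concrete, the following minimal Python sketch evaluates the Model 1 mean structure at each of the five occasions. The coefficients are the Model 1 estimates from Table 2; the dictionary keys and the function name `predicted_means` are illustrative assumptions.

```python
# Model 1 (3-level HLM) fixed-effect estimates as reported in Table 2.
GAMMA = {
    "g00": 435.60, "g10": 13.75, "g20": 9.64,   # control-group intercept and slopes
    "g01": -2.43, "g11": 2.41, "g21": 1.60,     # treatment differences
}
PIECE1 = [0, 1, 1, 1, 1]
PIECE2 = [0, 0, 1, 2, 3]


def predicted_means(treatment):
    """Model-implied mean EWPV at each occasion (treatment: 0 = control, 1 = treatment)."""
    intercept = GAMMA["g00"] + GAMMA["g01"] * treatment
    slope1 = GAMMA["g10"] + GAMMA["g11"] * treatment
    slope2 = GAMMA["g20"] + GAMMA["g21"] * treatment
    return [intercept + slope1 * p1 + slope2 * p2 for p1, p2 in zip(PIECE1, PIECE2)]


control = predicted_means(0)
treated = predicted_means(1)
for t, (c, x) in enumerate(zip(control, treated)):
    print(f"occasion {t}: control={c:.2f} treatment={x:.2f} difference={x - c:+.2f}")
```

Under these estimates the treatment group starts slightly below the control group and overtakes it after kindergarten, which is the pattern described in the text.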

In general, researchers are more interested in the significance of the mean part (i.e., the regression coefficients) and pay less attention to the variance part of the model. Nevertheless, the variance part carries as much important information as the mean part (e.g., treatment effect is sometimes found in the variance part instead of the mean part of the model (Hedeker and Mermelstein,

Given that we analyzed the data as a three-level, strictly hierarchical model, the corresponding variance estimates for the different levels are presented in Table 2: σ^{2} = 165.12 (within-student-level variance with the identity structure assumption), τ_{00} = 137.36 (between-student-level intercept variance), τ_{11} = 2.13 (between-student-level piece1 variance), τ_{22} = 0.08 (between-student-level piece2 variance), and θ^{2} = 97.39 (

All these variances were statistically significant, which indicates a significant amount of variation within students across all the repeated measures and between students across all kindergarten classrooms. Consistent with many previous longitudinal studies using multilevel models, we found that the intercept variance (i.e., τ_{00} = 137.36) was substantially larger than the variances of the two growth pieces (i.e., τ_{11} = 2.13 and τ_{22} = 0.08).
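As a rough illustration of how these estimates partition the variation at the beginning of kindergarten, the following Python sketch computes the share of each component when piece1 = piece2 = 0, so that the piece1 and piece2 random effects drop out. This is illustrative arithmetic on the Model 1 estimates in Table 2, not a result reported by the authors, and the variable names are assumptions.

```python
# Model 1 variance estimates as reported in Table 2.
variances = {
    "within-student (sigma^2)": 165.12,
    "between-student intercept (tau_00)": 137.36,
    "kindergarten classroom (theta^2)": 97.39,
}

# With piece1 = piece2 = 0 the slope random effects contribute nothing,
# so the model-implied total variance is the sum of these three terms.
total = sum(variances.values())
shares = {name: value / total for name, value in variances.items()}

for name, share in shares.items():
    print(f"{name}: {share:.3f}")
```

The classroom share computed this way plays the role of an intraclass correlation at baseline; at later occasions the slope variances and covariances would also enter the total.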

There are a couple of limitations to this model. First, it only partially takes into account the classroom effect (i.e., only kindergarten), which may lead to biased estimation of both regression coefficients and the random effect variances. Moreover, only modeling the kindergarten effect restricts the possibility of modeling the other grade-level effects, such as the potential carryover effect from previous grade levels (e.g., first grade) to later EWPV score (e.g., measured at third grade).

Another way to analyze this longitudinal data set is to apply the cross-classified random effect model (CCREM; Luo and Kwok,

where EWPV is the target outcome variable for the t-th repeated measure from the i-th student of the j-th classroom, and piece1 and piece2 are the time variables with exactly the same coding scheme as before. The major difference between this model and Model 1 is the meaning of the "j" subscript. Unlike in Model 1, where the j subscript refers only to a particular kindergarten classroom, the j subscript in Model 2 represents a particular classroom at any grade level (i.e., from kindergarten to third grade). That is, the students are no longer nested only within the kindergarten classrooms, as the cross-classified structure figure illustrates. In that example, Student S1 has three repeated measures (O_{11}, O_{12} and O_{13}), as does Student S2 (O_{21}, O_{22} and O_{23}). Students S1 and S2 are in different kindergarten classrooms (KC_{1} for S1 and KC_{2} for S2), are in the same classroom in first grade (G1C_{1}), and are assigned to different classrooms in second grade (G2C_{1} for S1 and G2C_{2} for S2). Hence, the repeated measures (i.e., the Os) are nested within both students (S1 and S2) and classrooms (KC_{1}, KC_{2}, G1C_{1}, G2C_{1} and G2C_{2}), whereas students and classrooms are crossed instead of nested.
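The distinction between nested and crossed factors in this two-student example can be checked mechanically. The following is a minimal Python sketch; the `observations` list encodes the example from the text, and the helper `is_nested` is an illustrative assumption.

```python
from collections import defaultdict

# (observation, student, classroom) triples for the two-student example.
observations = [
    ("O11", "S1", "KC1"), ("O12", "S1", "G1C1"), ("O13", "S1", "G2C1"),
    ("O21", "S2", "KC2"), ("O22", "S2", "G1C1"), ("O23", "S2", "G2C2"),
]


def is_nested(pairs):
    """True if every lower-level unit co-occurs with exactly one upper-level unit."""
    seen = defaultdict(set)
    for lower, upper in pairs:
        seen[lower].add(upper)
    return all(len(uppers) == 1 for uppers in seen.values())


obs_in_students = is_nested([(o, s) for o, s, _ in observations])
obs_in_classrooms = is_nested([(o, c) for o, _, c in observations])
students_in_classrooms = is_nested([(s, c) for _, s, c in observations])
classrooms_in_students = is_nested([(c, s) for _, s, c in observations])

print("observations nested in students:  ", obs_in_students)
print("observations nested in classrooms:", obs_in_classrooms)
print("students nested in classrooms:    ", students_in_classrooms)
print("classrooms nested in students:    ", classrooms_in_students)
```

Neither students nor classrooms are nested in the other (each student visits several classrooms, and G1C1 holds both students), which is what makes the two factors crossed.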

Given that students and classrooms are crossed with each other, the level-2 model in CCREM includes both students and classrooms simultaneously, as presented below:

where γ_{00}, γ_{10}, and γ_{20} are the average intercept, piece1 and piece2 coefficients across all classrooms, assuming a nonsignificant treatment effect. Given that the randomization was at the classroom level, we included the dummy-coded treatment variable, treatment_{j}, in the level-2 equations. Hence, if there is a significant treatment effect on the intercept, γ_{00} will be the intercept for the control condition whereas γ_{00} + γ_{01} will be the intercept for the treatment condition. Similarly, if there are significant treatment effects on both piece1 and piece2, the average piece1 coefficient will be γ_{10} for the control condition and γ_{10} + γ_{11} for the treatment condition; the same holds for the average piece2 coefficient, with γ_{20} for the control condition and γ_{20} + γ_{21} for the treatment condition.

By substituting Equation (6) back into Equation (5), we obtained the following overall
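The CCREM equations referred to in this section can be written out as follows. This is a reconstruction from the verbal definitions and the list of variances in the text; the letters e, r, and u for the residual, student, and classroom random effects are assumed, and the parenthesized subscript (ij) marks the crossing of student i and classroom j.

```latex
% Equation (5): level 1, repeated measures
\mathrm{EWPV}_{t(ij)} = \pi_{0(ij)} + \pi_{1(ij)}\,\mathrm{piece1}_{t(ij)}
                      + \pi_{2(ij)}\,\mathrm{piece2}_{t(ij)} + e_{t(ij)}

% Equation (6): level 2, students crossed with classrooms
\pi_{0(ij)} = \gamma_{00} + \gamma_{01}\,\mathrm{treatment}_{j} + r_{0i} + u_{0j}, \qquad
\pi_{1(ij)} = \gamma_{10} + \gamma_{11}\,\mathrm{treatment}_{j} + r_{1i}, \qquad
\pi_{2(ij)} = \gamma_{20} + \gamma_{21}\,\mathrm{treatment}_{j} + r_{2i}

% Combined model after substitution
\mathrm{EWPV}_{t(ij)} =
    \gamma_{00} + \gamma_{01}\,\mathrm{treatment}_{j}
  + \gamma_{10}\,\mathrm{piece1}_{t(ij)}
  + \gamma_{11}\,\mathrm{treatment}_{j}\,\mathrm{piece1}_{t(ij)}
  + \gamma_{20}\,\mathrm{piece2}_{t(ij)}
  + \gamma_{21}\,\mathrm{treatment}_{j}\,\mathrm{piece2}_{t(ij)}
  + r_{0i} + u_{0j}
  + r_{1i}\,\mathrm{piece1}_{t(ij)}
  + r_{2i}\,\mathrm{piece2}_{t(ij)}
  + e_{t(ij)}
```

The mean part is identical in form to that of Model 1; only the indexing of the random effects changes.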

The corresponding random effect variances are as follows:

V(e_{t(ij)}) = σ^{2} (within-student-level variance with the identity structure assumption)

V(r_{0i}) = τ_{00} (between-student-level intercept variance)

V(r_{1i}) = τ_{11} (between-student-level piece1 variance)

V(r_{2i}) = τ_{22} (between-student-level piece2 variance)

V(u_{0j}) = ψ^{2} (between-classroom-level variance).

The major difference between this CCREM and Model 1 is with regard to the random effect part; specifically, the classroom effect u_{0j} with the corresponding variance equal to ψ^{2}. Even though it seems like only a slight change in the combined equation (from u_{0j} representing the kindergarten random effects in Model 1 to u_{0j} representing all classroom random effects in Model 2), the actual implication and the parameter estimates of Model 2 can be very different from those of Model 1 due to the variance redistribution mechanism (Luo and Kwok,

The results are presented in Table 2. Compared with Model 1, the treatment_{j} × piece2_{t(ij)} interaction effect is no longer significant in Model 2 (γ_{21} = .64, with the 95% CI covering zero). This nonsignificant interaction effect indicates that the rate of change or improvement in the EWPV did not differ significantly between the treatment and control groups after kindergarten.

In addition to the regression coefficient, some of the estimates of the random effect variances were quite different between the two models: Model 2 had a larger intercept variance (τ_{00} = 157.04 compared with Model 1 τ_{00} = 137.36), a smaller classroom variance (ψ^{2} = 64.49 compared with Model 1 θ^{2} = 97.39), and a smaller within-student variance (σ^{2} = 145.92 compared with Model 1 σ^{2} = 165.12). These differences in the variance estimates between the two models are likely the result of the variance redistribution mechanism (Luo and Kwok,

Regarding the limitations of this model, unlike Model 1, which only takes into account the kindergarten classroom effect, Model 2 is able to fully take the classroom effect into account. However, it does assume an acute classroom effect (i.e., one that does not carry over into later grades). In other words, once a student changes grade (i.e., classroom), he/she will get a new classroom effect. The classroom effect at kindergarten is independent of the classroom effect at grade 1, for example. Also, all classroom effects, regardless of the grade (or time), have exactly the same variance given that they are treated as a whole or a single crossed factor, even though conceptually the classrooms at different grades/times may have different effects on the EWPV scores.

Ideally, we wanted to analyze this data set with four classroom crossed factors but, in reality, the specification for this model is not straightforward, especially when using the common MLM packages. Moreover, the model estimates only the variance for the classroom factors, not the other effects, such as the potential carryover effect from the previous classrooms on later EWPV scores.

Whereas the nesting relationship holds in cross-sectional data, in longitudinal settings the relationship between students and classrooms is not purely nested. To make things more complicated, students' scores at a given time point, say second grade, are not only influenced by the classroom effect at second grade, but also potentially by the classroom effects at both kindergarten and first grade. Furthermore, the effect of a classroom may diminish over time, such that the first-grade classroom may have a stronger effect on second-grade scores than on third-grade scores. Such a model would include five crossed random effects (i.e., one at the student level and four at the classroom level, including kindergarten, first-, second-, and third-grade random effects) and would need to allow the classroom effects to vary across time. None of the default models from the standard statistical packages can fully capture the key features of this model.

Similar to Model 1, Model 3 (also see Figure

Path diagram for the model accommodating carryover classroom effects with five levels. y_{1}, beginning of kindergarten; y_{2}, end of kindergarten; y_{3}, end of first grade; y_{4}, end of second grade; y_{5}, end of third grade. The rounded-corner boxes: Student, Student level; K, Kindergarten classroom level; Grade 1, Grade 1 classroom level; Grade 2, Grade 2 classroom level; Grade 3, Grade 3 classroom level.

The first limitation can be addressed by specifying a different residual covariance structure than the default one (see Kwok et al., ^{2}

The model specification in xxM requires a combination of multilevel and SEM conventions. Due to its complexity, we only discuss the portions that are relevant to our model. First, it requires the longitudinal data to be in the wide rather than the long format (Kwok et al., ^{(1, 1)} is a fixed pattern matrix for our piecewise growth model, ^{(1)} and variance-covariance matrix ^{(1, 1)}, and ^{(1)} and ^{(1, 1)} denote a student-level model. At the classroom-level there is one latent variable ^{(2)} = 0 and variance ^{(2)}, and with direct paths on ^{(1, 2)} = [1, 1, 1, 1, 1]^{T}.

Because of the complexity associated with using xxM, we skip the model equations here to focus more on the conceptual formulation instead. The R code for fitting the model is presented in Appendix

At the student level, we have a piecewise latent growth model for the five EWPV measurement occasions, which is equivalent to the piecewise growth model with random intercept and random coefficients for both piece1 and piece2, as opposed to lme4 (Bates et al., ^{2} over time (i.e., an identity (ID) structure) as follows:

In xxM, we can model many other kinds of structure, such as freely estimating the residual variances for different time points (i.e., the first-order unstructured [UN1] structure), as presented here in which the residual variances vary across time measures.
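The two residual structures contrasted here can be written as 5 × 5 covariance matrices for the five EWPV occasions. This is a sketch in standard notation reconstructed from the verbal description; the symbol Σ for the residual covariance matrix and the occasion-specific labels σ_{1}^{2} to σ_{5}^{2} are assumptions.

```latex
% Identity (ID) structure: one common residual variance, no residual covariances
\Sigma_{\mathrm{ID}} = \sigma^{2}\,\mathbf{I}_{5}

% First-order unstructured (UN1): a separate residual variance per occasion,
% still with no residual covariances
\Sigma_{\mathrm{UN1}} = \operatorname{diag}\!\left(\sigma_{1}^{2},\ \sigma_{2}^{2},\ \sigma_{3}^{2},\ \sigma_{4}^{2},\ \sigma_{5}^{2}\right)
```

The ID structure is the special case of UN1 in which all five diagonal elements are constrained to be equal.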

This seems to be a more realistic choice than the ID structure. The treatment condition, which was assigned at kindergarten and is therefore a class-K level variable, predicts the intercept and the two piecewise growth factors (P1 and P2 in the path diagram); these effects correspond to γ_{01} (treatment_{j}), γ_{11} (treatment_{j} × piece1_{t(ij)}) and γ_{21} (treatment_{j} × piece2_{t(ij)}) in the previous two models.

At the class-K level, we have a random intercept factor η^{(K)} that accounts for the variance at all five time points due to clustering at kindergarten. We let the effect of such clustering differ across time points, which is achieved by allowing the direct paths (or factor loadings) from η^{(K)} to be different on the five measurement occasions. It is reasonable to expect that the effect will diminish across time, which means that the factor loadings should be decreasing. At the class-G1 level, we again have a random intercept factor η^{(G1)} that accounts for the clustering at first grade. Because the classroom effect at first grade cannot affect prior performance (i.e., at kindergarten), the factor loadings from η^{(G1)} to the two kindergarten measures are fixed to zero. The same logic applies to η^{(G2)} and η^{(G3)}, as shown in Figure
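The constraint that a classroom factor cannot influence earlier measures implies a simple pattern of free and fixed-to-zero loadings. The following Python sketch builds that pattern; it illustrates the logic only and is not xxM syntax. The names `FIRST_FREE` and `loading_pattern` are assumptions, and the patterns for the second- and third-grade factors are inferred by applying the rule stated for first grade.

```python
OCCASIONS = ["y1", "y2", "y3", "y4", "y5"]  # K start, K end, G1 end, G2 end, G3 end

# Index of the first occasion each classroom factor is allowed to influence.
# K affects all five occasions; G1 cannot affect y1 or y2; G2 and G3 are
# assumed to follow the same "no effect on earlier measures" rule.
FIRST_FREE = {"K": 0, "G1": 2, "G2": 3, "G3": 4}


def loading_pattern(first_free, n_occasions):
    """Return {factor: [1 if loading is free, 0 if fixed to zero]} per occasion."""
    return {
        factor: [1 if t >= start else 0 for t in range(n_occasions)]
        for factor, start in first_free.items()
    }


pattern = loading_pattern(FIRST_FREE, len(OCCASIONS))
print("factor " + " ".join(OCCASIONS))
for factor, row in pattern.items():
    print(f"{factor:6s} " + "  ".join(str(v) for v in row))
```

Each free loading to an occasion after the factor's own grade is a carryover path, so a pattern of decreasing estimates along a row would correspond to a diminishing classroom effect.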

Given that the interpretation of the coefficients of the average or mean model is exactly the same as in the previous two models, we will focus more on the differences between Model 3 and the other two models. First, as shown in Table 2, both Treatment (γ_{01} = −7.06) and Piece2 × Treatment (γ_{21} = 1.42) became significant.

Figure

Mean trajectories of EWPV scores by the two treatment conditions. Value labels for the time axis: 0, beginning of kindergarten; 1, end of kindergarten; 2, end of first grade; 3, end of second grade; 4, end of third grade.

Another major difference between Model 3 and the previous two models is found in the variance part of the model: not only does Model 3 contain more random effects (i.e., four different classroom effects for the four different grades), but the sizes of the variance estimates (i.e., τ_{00}, τ_{11}, and τ_{22}) are quite different from those in Models 1 and 2. As shown in Table

A closer analysis of these classroom variances reveals that the kindergarten variance was the largest whereas the third-grade variance was the smallest. This trend and the substantial differences across grades may partly be the result of missing data—the missing data rate increased as time passed, and with fewer students at the later time points or grades, it is not surprising to see the diminished variance estimates. Other potential reasons may include the developmental process (i.e., students learn more when they grow older) and plausible treatment effect (e.g., students become more homogeneous/similar to each other when they respond to the treatment materials). Further investigation of this issue is needed.

For the same random effect variances (i.e., τ_{00}, τ_{11}, and τ_{22}), Model 3 had substantially larger estimates than the other two models. This again may be the result of the variance redistribution mechanism described by Luo and Kwok. The significant coefficients (γ_{01} and γ_{21}) in Model 3 are likely the result of these different variance estimates, which can directly affect the tests of significance of these coefficients.

In addition to the fixed and random effect estimates commonly found in the traditional multilevel models and presented in Table

We found a similar pattern for the first-grade factor (i.e., larger direct path coefficient to the immediate post measure followed by smaller coefficient to later time measures), even though the direct path coefficients were not all significant, possibly as a result of the smaller sample sizes at this grade and the later grade levels. Similar non-significant direct effects were also found for the second-grade factor.

These significant and non-significant carryover effects at different grade levels had some important and practical implications. For example, the many significant carryover effects from kindergarten may reflect the importance of the timing (i.e., the start of the intervention) and the potential longitudinal effect of the intervention. In other words, we may not see the same treatment effect if the intervention starts at another grade level as opposed to the beginning of kindergarten. Moreover, the significant paths from kindergarten to later-grade EWPV scores may reveal the importance of the kindergarten classroom experience, which may relate to ELL students' reading performance in the later grades, and further examination of this will be needed.

We compared the three models by using information criteria; namely, the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). Certain guidelines apply to interpreting the absolute difference of the information criteria (i.e., ΔIC) between two competing models. For example, Burnham and Anderson (
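The information criteria reported in Table 2 can be compared directly by taking differences from the best (smallest) value. The following is a minimal Python sketch of that arithmetic; the dictionary layout and the helper name `delta_ic` are illustrative assumptions, and no interpretive thresholds are applied here.

```python
# AIC and BIC values as reported in Table 2.
AIC = {"Model 1 (HLM)": 25987, "Model 2 (CCREM)": 25917, "Model 3 (xxM-UN1)": 25479}
BIC = {"Model 1 (HLM)": 26071, "Model 2 (CCREM)": 26002, "Model 3 (xxM-UN1)": 25648}


def delta_ic(criteria):
    """Difference between each model's criterion and the smallest one."""
    best = min(criteria.values())
    return {model: value - best for model, value in criteria.items()}


delta_aic = delta_ic(AIC)
delta_bic = delta_ic(BIC)
for model in AIC:
    print(f"{model:18s} dAIC={delta_aic[model]:4d} dBIC={delta_bic[model]:4d}")
```

Both criteria are smallest for Model 3, and the differences from Models 1 and 2 are in the hundreds.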

In this study, we first described the complexity of educational data, especially in longitudinal settings, which can result in data with a structure that is not strictly hierarchical but more complex. With the use of the ELLA data, we demonstrated the importance of capturing the complex data structure by examining three different models with different random effect specifications.

As stated, researchers are generally interested in the overall average model (or the mean part of the model containing the regression coefficients), but they fail to pay close attention to the variance part of the model. Yet, the variance part also carries important information, such as the implication of the developmental process. We have discussed and shown the importance of carefully specifying the random part of the model, which could affect estimation of the random effect variances and further affect estimates of the standard errors of the regression coefficients and the corresponding significance tests of these coefficients. For example, we found that both Models 1 and 3 had significant treatment-by-piece interaction effects for both pieces, whereas Model 2 had a significant interaction effect only for piece1. This finding provides evidence that only partially addressing the complex data structure may result in lower statistical power and loss of some important findings, such as the treatment by growth piece interaction effect (i.e., for piece2, covering changes from the end of first grade to the end of third grade).

Another advantage of modeling the classroom effect by grade levels separately (i.e., Model 3) instead of as a whole (e.g., Model 2 using CCREM) is that it allows researchers to investigate interesting phenomena that cannot be captured by the mean part of the model. For example, the decreasing classroom or grade variances over time may reflect an important developmental process. In particular, the high heterogeneity (or variation) among students at the beginning of kindergarten may be the result of the diverse backgrounds and experiences the students had before they entered formal schooling. Once they are exposed to the formal grade-school curriculum in addition to their natural cognitive development, the variation among the students may become smaller, which in turn may lead to a reduction in grade-level variances over time.

This is a plausible explanation, but further systematic investigation of the change in the variances is needed to validate this interpretation. Again, researchers should not only focus on the mean part of the model (i.e., the significance of the regression coefficients) but should also examine different random effect structures, which may provide different perspectives and even lead to new research questions about the target phenomena.

Moreover, we have shown how to incorporate the carryover effect in the model via the xxM program. The pattern of the carryover effect has shed light on some important and practical design issues, such as the timing of the study and the potential longitudinal impact of the intervention. For example, the finding that the significant carryover effects to the later time measures came mainly from the kindergarten factor may suggest the importance of starting this type of intervention at kindergarten (rather than at other/later grade levels). In fact, such carryover impact was also supported by empirical evidence on Project ELLA students' subsequent learning as they matriculated to grade 5 (e.g., Tong et al.,

Despite the important results presented here, there are a few limitations to the study. First, even though xxM is very powerful software for very complex multilevel data, its lack of model-fit indices (e.g., RMSEA and CFI) restricts researchers to evaluating their models based only on the deviance statistic and the information criteria. Similarly, an appropriate standardized effect size measure for this type of complex data structure has not yet been developed. Another major limitation is that we only used real data for the demonstration. Thus, the actual impact of various factors such as the magnitude of the data dependency (or intra-class correlation) and the missing data rate over time can only be further examined by thoughtfully planned simulation studies. Moreover, the carry-over effects found in Model 3 (also see Figure

When analyzing complex longitudinal data, especially data from different educational settings, researchers generally focus only on the mean part (i.e., the regression coefficients) while ignoring the equally important random part (i.e., the random effect variances) of the model. Throughout this paper, we have addressed the importance of adequately taking the complex data structure into account by carefully specifying the random part of the model: not only can it affect the variance estimates, the standard errors, and the tests of significance of the regression coefficients, it can also offer additional information such as the potential developmental process and the carryover effect. We used xxM, which allowed us to estimate different grade level variances (i.e., from kindergarten to third grade, separately) and the potential carryover effect from each grade factor to the later time measures of the EWPV scores. In closing, we encourage researchers to look beyond the mean part of the model (i.e., the regression coefficients) and explore the variance part of the model, which may lead them to different perspectives or even new information about the phenomena they are studying.

O-MK and ML are the lead authors who wrote most of the manuscript and conducted all the analyses. Other coauthors contributed by providing the data and related information (FT, RL-A, and BI) and offering constructive feedback on the manuscript (FT, RL-A, BI, MY, and Y-CY).

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The Supplementary Material for this article can be found online at:

^{1}The reason for including only one random effect (i.e., u_{0j}) at the classroom level in equation (3) is to have a simpler model (in terms of the number of random effects) and so avoid potential convergence issues due to the large number of variance and covariance estimates of the random effects. Additionally, in our experience, the variance estimates of the higher-level non-intercept random effects are generally very small and non-significant, and trying to estimate these tiny (and possibly non-significant) random effect variances will likely lead to nonconvergence.

^{2}To our best knowledge, across all the commercial SEM related software, only M