
Edited by: Prathiba Natesan, University of North Texas, United States

Reviewed by: Lietta Marie Scott, Arizona Department of Education, United States; Ehri Ryu, Boston College, United States; Joshua N. Pritikin, Virginia Commonwealth University, United States

This article was submitted to Quantitative Psychology and Measurement, a section of the journal Frontiers in Psychology

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

In the social and behavioral sciences, it is recommended that effect sizes and their sampling variances be reported. Formulas for common effect sizes such as standardized and raw mean differences, correlation coefficients, and odds ratios are well known and have been well studied. However, the statistical properties of multivariate effect sizes have received less attention in the literature. This study shows how structural equation modeling (SEM) can be used to compute multivariate effect sizes and their sampling covariance matrices. We focus on the standardized mean difference (multiple-treatment and multiple-endpoint studies) with or without the assumption of the homogeneity of variances (or covariance matrices) in this study. Empirical examples were used to illustrate the procedures in R. Two computer simulation studies were used to evaluate the empirical performance of the SEM approach. The findings suggest that in multiple-treatment and multiple-endpoint studies, when the assumption of the homogeneity of variances (or covariance matrices) is questionable, it is preferable not to impose this assumption when estimating the effect sizes. Implications and further directions are discussed.

In the social and behavioral sciences, it is recommended that effect sizes and their sampling variances be reported (e.g., Cohen).

There are two key ingredients for a meta-analysis. The first one is the effect size, which quantifies the strength of the effect in the primary studies. Effect sizes can be either unstandardized or standardized (e.g., Kelley and Preacher).

Besides the effect sizes, we also need the standard error (SE) or the sampling variance of each effect size.

In applied research, however, more than one effect size may be involved. For example, there may be more than one treatment group compared with a control group. The use of multiple treatment groups allows researchers to study the phenomenon under different levels of manipulation. By using the same control group in the comparisons, researchers minimize the cost of collecting multiple control groups (Kim and Becker).

Since the effect sizes are not independent, researchers have to calculate the sampling covariances among the effect sizes. Gleser and Olkin provided analytic formulas for these sampling covariances.

Although Gleser and Olkin provided analytic solutions for these designs, a more flexible and automated way to obtain the effect sizes and their sampling covariance matrices is desirable.

Structural equation modeling (SEM) is a favorite tool to use in analyzing multivariate data. It has been used to calculate effect sizes and their sampling covariance matrices.

The SEM approach has several advantages. First, it provides a graphical model of the means, standard deviations, and correlations, and the effect sizes are defined as functions of these parameters, so readers can get a better understanding of what these effect sizes mean. Second, assumptions of the homogeneity of variances, covariances, or correlations can be imposed or relaxed by the use of equality constraints on the parameters. By using the delta method built into the SEM packages, appropriate sampling covariance matrices can be automatically derived. Third, it is feasible to extend the SEM approach to more complicated situations. For example, the SEM approach can be used to calculate the effect sizes and their sampling covariance matrix for a combination of multiple-treatment and multiple-endpoint studies.^{1}

The rest of this article is structured as follows. The next section contains a brief introduction on how to compute the effect sizes and their sampling covariance matrices for the multiple-treatment and multiple-endpoint designs in SEM. Two empirical examples are used to illustrate how to conduct the analyses using the metaSEM package (Cheung). Two computer simulation studies are then reported to evaluate the empirical performance of the SEM approach, followed by a discussion of the implications and further directions.

Cheung (

Finally, the effect sizes are defined as functions of the means and standard deviations.

Suppose that we measure the mathematics score in a control group and two treatment groups (x_{(C)}, x_{(T1)}, and x_{(T2)}). Figure 1 displays the model. The means of the variables in the control group, treatment 1, and treatment 2 are represented by μ_{(C)}, μ_{(T1)}, and μ_{(T2)}, respectively. The variances of the variables in the control and treatments 1 and 2 are represented by σ_{(C)}^{2}, σ_{(T1)}^{2}, and σ_{(T2)}^{2}.

The structural equation model for the multiple-treatment studies.

When no constraint is imposed, the above means and variances are the same as those of the sample statistics. Under the assumption of the homogeneity of variances, we may impose the constraint σ_{(C)}^{2} = σ_{(T1)}^{2} = σ_{(T2)}^{2} = σ_{Common}^{2} and use the common standard deviation σ_{Common} as the denominator:
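The display equations were lost in extraction; written out from the definitions above, the two standardized mean differences under the homogeneity assumption are:

```latex
\mathrm{SMD}_{\mathrm{MTS1}} = \frac{\mu_{(T1)} - \mu_{(C)}}{\sigma_{\mathrm{Common}}},
\qquad
\mathrm{SMD}_{\mathrm{MTS2}} = \frac{\mu_{(T2)} - \mu_{(C)}}{\sigma_{\mathrm{Common}}}.
```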

One unit of SMD indicates that the mean of the treatment group is one common standard deviation above the mean of the control group. Since SMD_{MTS1} and SMD_{MTS2} share the same parameters μ_{(C)} and σ_{Common}, they are correlated. Instead of using the analytic solutions provided by Gleser and Olkin, the SEM approach obtains the sampling covariance matrix numerically via the delta method built into the SEM packages.
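The delta method can be sketched outside of SEM software. The following is a minimal illustration with a numerical Jacobian; the parameter values, the sample size, and the normal-theory asymptotic variances are hypothetical and are not taken from the article.

```python
import numpy as np

def delta_method_cov(f, theta, acov, eps=1e-6):
    """Approximate Cov(f(theta)) ~= J @ acov @ J.T with a numerical Jacobian."""
    theta = np.asarray(theta, dtype=float)
    f0 = np.asarray(f(theta), dtype=float)
    J = np.zeros((f0.size, theta.size))
    for j in range(theta.size):
        t = theta.copy()
        t[j] += eps
        J[:, j] = (np.asarray(f(t), dtype=float) - f0) / eps
    return J @ acov @ J.T

# Hypothetical values (not from the article): control mean/SD, two
# treatment means, and a balanced design with n = 50 per group.
mu_c, sd_c, mu_t1, mu_t2, n = 0.0, 1.0, 0.5, 0.8, 50

def smd(theta):
    """Two SMDs standardized by the control-group SD."""
    mu_c, sd_c, mu_t1, mu_t2 = theta
    return np.array([(mu_t1 - mu_c) / sd_c, (mu_t2 - mu_c) / sd_c])

# Normal-theory asymptotic variances of the parameter estimates:
# Var(mean) = sigma^2/n and Var(SD) = sigma^2/(2n); the treatment-group
# variances are assumed to be 1 here, and the estimates are independent.
acov = np.diag([sd_c**2 / n, sd_c**2 / (2 * n), 1.0 / n, 1.0 / n])

V = delta_method_cov(smd, [mu_c, sd_c, mu_t1, mu_t2], acov)
# V[0, 1] is the sampling covariance between the two SMDs; it is nonzero
# because both effect sizes share the control-group mean and SD.
```

Because the two effect sizes share μ_{(C)} and the standardizer, the off-diagonal element of V is positive even though the three group samples are independent.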

When the assumption of the homogeneity of variances is questionable, it may not be appropriate to use σ_{Common} in the denominator. This is because σ_{Common} does not estimate any of the population standard deviations. An alternative is to use the standard deviation of the control group σ_{(C)} as the standardizer in calculating the effect sizes (Glass et al.):
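The display equation was lost in extraction; consistent with the surrounding definitions, it reads:

```latex
\mathrm{SMD}_{\mathrm{MTS}i} = \frac{\mu_{(Ti)} - \mu_{(C)}}{\sigma_{(C)}}, \quad i = 1, 2,
```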

which does not rely on the assumption of the homogeneity of variances. Now, one unit of SMD indicates that the mean of the treatment group is one control-group standard deviation above the mean of the control group.

Now suppose that there are two effect sizes on the mathematics and language scores x_{1} and x_{2}. Figure 2 displays the model. We use two latent variables η_{1} and η_{2}, with their variances fixed at one, to represent the standardized scores of x_{1} and x_{2}. The factor loadings σ_{1} and σ_{2} now represent the standard deviations of x_{1} and x_{2}. The same model representation is often used to standardize the variables in SEM (e.g., Cheung and Chan).

The structural equation model for the multiple-endpoint studies.

We may assume that the correlations are homogeneous by imposing the constraint H_{0}: ρ_{Common} = ρ_{(C)} = ρ_{(T)}. An even stronger assumption is the homogeneity of covariance matrices, which requires H_{0}: ρ_{Common} = ρ_{(C)} = ρ_{(T)}, H_{0}: σ_{1Common} = σ_{1(C)} = σ_{1(T)}, and H_{0}: σ_{2Common} = σ_{2(C)} = σ_{2(T)}. Under the null hypothesis, the test statistic on comparing the models with and without the constraints follows a chi-square distribution with 3 degrees of freedom.

Regardless of whether we have imposed the above constraints, the effect sizes for the multiple-endpoint study are defined as:
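The definitions, reconstructed to match Equation (3) as described in the following paragraph, are:

```latex
\mathrm{SMD}_{\mathrm{MES1}} = \frac{\mu_{1(T)} - \mu_{1(C)}}{\sigma_{1}},
\qquad
\mathrm{SMD}_{\mathrm{MES2}} = \frac{\mu_{2(T)} - \mu_{2(C)}}{\sigma_{2}}.
```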

where σ_{1} and σ_{2} are the standard deviations of x_{1} and x_{2}. We do not attach subscripts to σ_{1} and σ_{2} in the formulas because what they refer to depends on whether constraints have been imposed on them. If we impose the equality constraints on the standard deviations, σ_{1Common} and σ_{2Common} are used as the standardizers in Equation (3). If we do not assume that the covariance matrices are homogeneous, the control-group standard deviations (σ_{1(C)} and σ_{2(C)}) are used as the standardizers. Once we have defined the appropriate effect sizes, the sampling covariance matrix between SMD_{MES1} and SMD_{MES2} can be obtained from the SEM packages with numerical methods.

Gleser and Olkin provided illustrative data for both the multiple-treatment and the multiple-endpoint designs, which are reanalyzed here with the SEM approach.

Table 22.2 in Gleser and Olkin reports the data of a multiple-treatment study. With the assumption of the homogeneity of variances, the estimated SMD_{MTS} of the three treatment groups compared to the control group are −1.17, −1.90, and −2.00, respectively, with the sampling covariance matrix reported in the output. Without this assumption, the estimated SMD_{MTS} are −0.79, −1.29, and −1.36, respectively, with the corresponding sampling covariance matrix also reported in the output.

Table 22.4 in Gleser and Olkin reports the data of a multiple-endpoint study with Math and Verbal scores. With the assumption of the homogeneity of covariance matrices, the estimated SMD_{MES} on Math and Verbal are 1.19 and 0.61, respectively, with the sampling covariance matrix reported in the output. Without this assumption, the estimated SMD_{MES} on Math and Verbal are 1.30 and 0.56, respectively, with the corresponding sampling covariance matrix also reported in the output.

The above illustrations show that the effect sizes computed with and without the assumption of homogeneity may be very different when the assumption does not hold. It remains unclear how these effect size estimates perform empirically. The following computer simulations clarify the empirical performance of these estimators.

Two computer simulation studies were conducted to evaluate the empirical performance of the SEM approach. All of the simulations were performed with the metaSEM package (Cheung) in R.

Before moving on to details of the simulation studies, it is essential to clarify the meanings of “with and without the homogeneity of variances (or covariance matrices)” in the simulation studies. The data are generated from either equal or unequal population variances (see the conditions of the Population Variances). Regardless of whether or not the population variances are equal, two sets of effect sizes are calculated from the same set of data—one assumes the homogeneity of variances, and the other does not.

When the data are generated from populations with equal variances, the effect sizes both with and without the homogeneity assumption should be correct. By assuming that the variances are homogeneous, which is correct in the generated data, the sampling variances of the effect sizes with the homogeneity assumption are usually smaller than those effect sizes without the homogeneity assumption. When the data are generated from unequal population variances, the effect sizes without the homogeneity assumption should still be correct. However, the effect sizes with the homogeneity assumption are likely to be biased because the model is misspecified. The present simulation studies evaluated the empirical performance of the computed effect sizes with and without the homogeneity assumption.

For the multiple-treatment studies, multivariate normal data were generated from the known data structures with or without the assumption of the homogeneity of variances.

In this simulation study, there was a control group with two treatment groups. Several factors were manipulated in the simulation study:

The population mean of the control group was fixed at 0 for reference. Six levels were used for the simulation study. The population means for the two treatment groups were (0.2, 0.2), (0.2, 0.5), (0.2, 0.8), (0.5, 0.5), (0.5, 0.8), and (0.8, 0.8).

The population variance of the control group was fixed at 1 for reference. Three levels were selected for the simulation. The population variances for the two treatment groups were (1, 1), (0.75, 1.25), and (0.5, 1.5). When the population variances were (1, 1) in the two treatment groups, the homogeneity of variances held; in the other levels, the population variances were heterogeneous. As the population variance of the control group was fixed at 1, the population effect sizes were calculated as the differences in means between the treatment groups and the control group divided by 1. Thus, the effect sizes were 0.2, 0.5, and 0.8, which represent typical values observed in the social and behavioral sciences.

The design was assumed to be balanced. Three levels of sample sizes were selected, namely, 30, 50, and 100. These levels should be representative of typical research settings.

Thus, there were a total of 6 × 3 × 3 = 54 conditions. One thousand replications were repeated for each condition.
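The data-generation scheme just described can be sketched as follows. This is an illustrative reimplementation in Python rather than the article's metaSEM code, and the cell shown (n = 100, treatment means (0.5, 0.8), treatment variances (0.75, 1.25)) is one arbitrarily chosen condition of the 54.

```python
import numpy as np

rng = np.random.default_rng(2024)
n = 100                         # per-group sample size (one of 30/50/100)
mu_t = np.array([0.5, 0.8])     # treatment means; control mean fixed at 0
var_t = np.array([0.75, 1.25])  # heterogeneous treatment variances; control = 1

reps = 1000
est = np.empty((reps, 2))
for r in range(reps):
    ctrl = rng.normal(0.0, 1.0, n)
    sd_c = ctrl.std(ddof=1)     # control-group SD as the standardizer
    for i in range(2):
        trt = rng.normal(mu_t[i], np.sqrt(var_t[i]), n)
        # Effect size without the homogeneity assumption
        est[r, i] = (trt.mean() - ctrl.mean()) / sd_c

# Relative percentage bias of each effect size; the population effect
# sizes equal mu_t because the control-group SD is 1.
rel_bias = (est.mean(axis=0) - mu_t) / mu_t * 100
```

With 1,000 replications, the relative biases of both effect sizes stay small even though the treatment variances are heterogeneous, consistent with the findings reported below.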

Since the population mean and variance of the control group were set at 0 and 1, respectively, the population effect sizes were defined as the mean differences between treatment 1 (or 2) and the control group. The relative percentage bias of each effect size was computed as
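The display equation was lost in extraction; the standard definition consistent with the surrounding text is:

```latex
\text{Relative bias}\,(\%) = \frac{\hat{\theta} - \theta}{\theta} \times 100,
```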

where θ is the population effect size and θ̂ is the average of the effect size estimates across the 1,000 replications.

When there is only one effect size, we may quantify the accuracy of its uncertainty by the use of the relative bias of the sampling variance. Using the estimated sampling variance (SD²) and covariance as the measures of uncertainty,
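The corresponding display equation was lost in extraction; the standard definition is:

```latex
\text{Relative bias}\,(\%) = \frac{\bar{\hat v} - v}{v} \times 100,
```

where $\bar{\hat v}$ denotes the average estimated sampling variance (or covariance) and $v$ the corresponding empirical variance (or covariance) of the effect size estimates across replications.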

where the estimated uncertainty is averaged over the estimated sampling variance (SD²) or sampling covariance across the 1,000 replications, and the reference value is the corresponding empirical variance or covariance of the effect size estimates. Since there were three biases for the two effect sizes and their covariance, we reported the average of their absolute biases.

Note that the biases are defined on the sampling variances (SE²), not the SEs; a 10% bias in the SE translates into approximately (1.1² – 1) ≈ 20% bias in the sampling variance. We therefore used 20% as an indicator of good performance in estimating the sampling covariance matrix.

In the review process, one reviewer suggested displaying the individual parameter estimates; these results are reported in the Supplementary Material.

The results are summarized in heat maps, which provide an easy way to visualize the performance of the statistics. The x- and y-axes represent the population means and population variances, separated by the sample sizes. A lighter color indicates a smaller bias, and a darker color indicates a larger bias. When the bias is larger than the cutoff point (5% for the means and 20% for the sampling variances or covariances), the cell is colored gray.

Figures

Relative bias of the average of the parameter estimates for the multiple treatment studies with the assumption of homogeneity of variances.

Relative bias of the average of parameter estimates for the multiple treatment studies without the assumption of homogeneity of variances.

Figure

Relative bias of the average of the sampling variances and covariance for the multiple treatment studies with the assumption of homogeneity of variances.

Relative bias of the average of the sampling variances and covariance for the multiple treatment studies without the assumption of homogeneity of variances.

As a whole, the findings indicate that the effect sizes for the multiple-treatment studies are essentially unbiased regardless of whether or not the homogeneity of variances is assumed in the calculations, provided that the average of the treatment-group variances is similar to the control-group variance. However, the sampling variances and covariances are likely biased when the population variances are heterogeneous.

The patterns for the individual parameters in Supplementary Material

The design was similar to that of the multiple-treatment studies. Two effect sizes were used in the simulation study, with one control group and one intervention group.

The population means and variances of the control group were fixed at 0 and 1, respectively, for reference. The population correlation between these two outcomes was set at 0.3, which is considered moderate in psychological research.

Six levels were used in the simulation study. The means for the two outcome variables in the intervention group were (0.2, 0.2), (0.2, 0.5), (0.2, 0.8), (0.5, 0.5), (0.5, 0.8), and (0.8, 0.8).

Five levels of the intervention-group variances were selected for the simulation: (1, 1), (0.5, 0.5), (0.75, 0.75), (1.25, 1.25), and (1.5, 1.5). When the population variances of the intervention group were (1, 1), the covariance matrices of the control and intervention groups were homogeneous; in the other levels, the assumption of the homogeneity of covariance matrices did not hold in the population.

The design was assumed to be balanced. Three levels of sample sizes were selected, namely, 30, 50, and 100.

Therefore, there were a total of 6 × 5 × 3 = 90 conditions. One thousand replications were repeated for each condition.
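One cell of this design can be generated and analyzed as follows. This Python sketch is an illustrative reimplementation, not the article's metaSEM code; the condition shown (n = 100, intervention means (0.5, 0.8), intervention variances (1.25, 1.25)) is chosen arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100
mu_t = np.array([0.5, 0.8])     # intervention means; control means = 0
var_t = np.array([1.25, 1.25])  # intervention variances; control = 1
rho = 0.3                       # population correlation between the endpoints

# Covariance matrices implied by the design (control SDs fixed at 1)
cov_c = np.array([[1.0, rho], [rho, 1.0]])
sd_t = np.sqrt(var_t)
cov_t = np.outer(sd_t, sd_t) * cov_c

ctrl = rng.multivariate_normal([0.0, 0.0], cov_c, n)
trt = rng.multivariate_normal(mu_t, cov_t, n)

# Effect sizes without the homogeneity of covariance matrices:
# standardize each endpoint by the control-group SD.
sd_c = ctrl.std(axis=0, ddof=1)
es = (trt.mean(axis=0) - ctrl.mean(axis=0)) / sd_c
```

Repeating this over 1,000 replications per cell, as in the design above, yields the bias estimates summarized in the heat maps.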

The assessment was the same as that used in the multiple-treatment studies. The average of the relative percentage biases of the effect sizes was used to evaluate the bias of the effect sizes. The average of the relative percentage biases of the sampling variances and covariances was used to assess the bias of the sampling covariance matrices. In the heat maps, 5 and 20% were used as the cutoff points.

Similar to the simulation studies in the multiple-treatment studies, we followed the advice of one reviewer by displaying the results of the individual effect sizes. The results are shown in Supplementary Materials

Figure

Relative bias of the average of the parameter estimates for the multiple-endpoint studies with the assumption of homogeneity of covariance matrices.

Relative bias of the average of parameter estimates for the multiple-endpoint studies without the assumption of homogeneity of covariance matrices.

Figure

Relative bias of the average of the sampling variances and covariance for the multiple-endpoint studies with the assumption of homogeneity of covariance matrices.

Relative bias of the average of the sampling variances and covariance for the multiple-endpoint studies without the assumption of homogeneity of covariance matrices.

The patterns of the individual parameters displayed in Supplementary Materials

To summarize, the estimated effect sizes are quite sensitive to the assumption of the homogeneity of covariance matrices. If the data are not homogeneous in covariance matrices and we incorrectly assume that they are, the estimated effect sizes are likely to be biased. On the other hand, the sampling covariance matrices are generally similar regardless of whether or not we have imposed the assumption of the homogeneity of covariance matrices.

This study shows that multivariate effect sizes for multiple-treatment and multiple-endpoint studies can easily be obtained using the SEM approach. Researchers may impose equality constraints on the variances and covariances, and the SEM packages will report the effect sizes and their sampling covariance matrices.

For multiple-treatment studies, the estimated effect sizes are unbiased regardless of whether or not we assume that the variances are homogeneous when calculating the effect sizes, provided that the common standard deviation is similar to that of the control group.

For multiple-endpoint studies, the estimated effect sizes are biased when the covariance matrices are different but we mistakenly assume that they are homogeneous. On the other hand, the sampling covariance matrices are similar regardless of whether or not we have imposed the assumption of the homogeneity of covariance matrices when estimating the effect sizes.

The findings indicate that researchers should always check the assumptions before calculating the effect sizes. Researchers may also check the robustness of the findings by dropping these assumptions. By comparing the results with and without the assumption of the homogeneity of variances or covariance matrices, researchers may have a better idea of whether their substantive findings depend on these assumptions. Based on the simulation studies, it can be seen that the results are similar for the approaches with and without the assumption of the homogeneity of variances (or covariance matrices) when the data actually have the same variances (or covariance matrices). Therefore, the loss of efficiency from dropping the assumption of the homogeneity of variances (or covariance matrices) is small.

It should be noted that only a few factors were studied in the simulation studies. Further simulation studies may address the question of whether the findings are consistent in other conditions such as in those of unbalanced data and data with non-normal distributions. Another possible direction of research is to study how the assumption of the homogeneity of variances or covariance matrices impacts the actual parameter estimates in a meta-analysis. Such a study may provide stronger evidence to guide researchers on the issue of whether or not to report effect sizes with the assumption of homogeneity.

To conclude, it seems reasonable not to assume the homogeneity of variances (or covariance matrices) when calculating effect sizes for multiple-treatment and multiple-endpoint studies. The SEM approach provides a convenient device to calculate these effect sizes.

The author confirms being the sole contributor of this work and approved it for publication.

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The reviewer, JP, declared a past co-authorship with the author, MC, to the handling Editor.

The Supplementary Material for this article can be found online at:

^{1}