
Edited by: Christoph Koenig, Goethe University Frankfurt, Germany

Reviewed by: Esther Ulitzsch, University of Kiel, Germany; Alexander Naumann, Leibniz Institute for Research and Information in Education (DIPF), Germany

This article was submitted to Quantitative Psychology and Measurement, a section of the journal Frontiers in Psychology

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Bayesian approaches for estimating multilevel latent variable models can be beneficial in small samples. Prior distributions can be used to overcome small sample problems, for example, when priors that increase the accuracy of estimation are chosen. This article discusses two different but not mutually exclusive approaches for specifying priors. Both approaches aim at stabilizing estimators in such a way that the Mean Squared Error (MSE) of the estimator of the between-group slope will be small. In the first approach, the MSE is decreased by specifying a slightly informative prior for the group-level variance of the predictor variable, whereas in the second approach, the decrease is achieved directly by using a slightly informative prior for the slope. Mathematical and graphical inspections suggest that both approaches can be effective for reducing the MSE in small samples, thus rendering them attractive in these situations. The article also discusses how these approaches can be implemented in Mplus.

As van de Schoot et al. (

The choice of prior has received a lot of attention in the methodological literature (e.g., Natarajan and Kass). A natural criterion for judging an estimator is its Mean Squared Error (MSE), which decomposes into squared bias plus variability (MSE = bias^{2} + variability); priors can be chosen such that this quantity becomes small. One such approach was proposed by Zitzmann et al. (

In the present article, we mathematically work out the idea behind the direct approach for a simple multilevel latent variable model, and we contrast this approach with the indirect approach and with ML. Then, we graphically show the benefits that both approaches have over ML when the sample size is small. Finally, we discuss how these approaches can be implemented in Mplus.

Before we go into detail, we present an example model that we will use later to illustrate the different strategies. The model was suggested by Lüdtke et al. (

More specifically, the individual-level predictor X_{ij} is decomposed into a between-group part X_{b}, which is the latent group mean, and a within-group part X_{w}, which is the individual deviation from X_{b}. For a person i in group j:

$$X_{ij} = X_{b,j} + X_{w,ij}$$

X_{b,j} is distributed around μ_{X} with variance τ_{X}^{2} (the group-level variance of the predictor), and X_{w,ij} has variance σ_{X}^{2} (the individual-level variance).

Applying Raudenbush and Bryk's notation, the dependent variable is modeled at the individual level as:

$$Y_{ij} = \beta_{0j} + \beta_w X_{w,ij} + \varepsilon_{ij}$$

where β_{w} is the (fixed) within-group slope that describes the relationship between the predictor and the dependent variable at the individual level, and the ε_{ij} are normally distributed residuals with residual variance $\sigma_\varepsilon^2$. At the group level, the intercept β_{0j} is regressed on X_{b}:

$$\beta_{0j} = \alpha + \beta_b X_{b,j} + \delta_j$$

where α is the overall intercept, β_{b} is the between-group slope (i.e., the relationship between X_{b} and the dependent variable at the group level), and the δ_{j} are normally distributed residuals with variance $\tau_\delta^2$.

Here, we focus on the between-group slope (β_{b}), which is of great interest in many applications of multilevel models (e.g., in the analysis of contextual effects). When the data are balanced (i.e., equal numbers of persons per group), the ML estimator of β_{b} is given by:

$$\hat{\beta}_b = \frac{\hat{\tau}_{XY}}{\hat{\tau}_X^2} \quad (4)$$

where $\hat{\tau}_{XY}$ denotes the ML estimator of the group-level covariance between the predictor and the dependent variable, and $\hat{\tau}_X^2$ denotes the ML estimator of the group-level variance of the predictor.
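For balanced data, these group-level moments can be obtained from the between- and within-group mean squares. The following sketch uses standard ANOVA-type moment estimators as a stand-in for the ML estimators discussed here; the function names and the clipping floor are our own illustrative choices:

```python
import numpy as np

def group_level_moments(x, y):
    """ANOVA-type estimates of the group-level variance of x and the
    group-level covariance of x and y for balanced data.

    x, y : arrays of shape (J, n) -- J groups, n persons per group.
    """
    J, n = x.shape
    xm, ym = x.mean(axis=1), y.mean(axis=1)                 # group means
    msw_x = ((x - xm[:, None]) ** 2).sum() / (J * (n - 1))  # within MS
    msw_xy = ((x - xm[:, None]) * (y - ym[:, None])).sum() / (J * (n - 1))
    tau2_x = np.var(xm, ddof=1) - msw_x / n                 # between variance
    tau_xy = np.cov(xm, ym)[0, 1] - msw_xy / n              # between covariance
    return tau2_x, tau_xy

def slope_ml(x, y, floor=1e-3):
    """Ratio estimator of the between-group slope; the floor guards
    against a (near-)zero variance estimate in small samples."""
    tau2_x, tau_xy = group_level_moments(x, y)
    return tau_xy / max(tau2_x, floor)
```

The floor makes the instability explicit: whenever the group-level variance estimate falls near zero, the ratio blows up, which is exactly the small-sample problem the priors below are meant to tame.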

Some statistical properties of the ML estimator in Equation 4 need to be discussed first to be able to compare this estimator with the Bayesian estimators later on. First, the bias can be approximated by using a first-order Taylor expansion (e.g., Casella and Berger).

The approximation involves the intraclass correlation (ICC) of the predictor.^{1}

where

this measure will be small as well. Taken together, the more information the data provide, the more the overall accuracy of the estimator improves.

Whereas the asymptotic properties are favorable, the ML estimator tends to be biased in small samples, and it has high variability and thus a large MSE in these situations (e.g., McNeish,

We refer to the first strategy as the direct strategy of specifying the prior, in which the prior is specified directly for the between-group slope (β_{b}). To illustrate, we assume a normal prior for β_{b},

which should be read as “β_{b} is normally distributed with mean β_{0} and a variance that is governed by the prior sample size ν_{0}” (the larger ν_{0}, the smaller the prior variance).

As we will show, β_{0} and ν_{0} can be meaningfully interpreted.

One way of expressing the likelihood for the slope is:

where

which is also a normal distribution. The mean of this distribution defines the Bayesian Expected A Posteriori (EAP) estimator, which is the standard choice for a point estimator in Bayesian estimation (note that the Bayes module in Mplus reports the posterior median rather than the posterior mean by default).

As can be seen from the equation, the estimator is simply a weighted average of the mean of the prior (β_{0}) and the ML estimate, which suggests interpreting β_{0} as the prior guess and ν_{0} as the prior sample size. The larger ν_{0} is relative to the amount of data, the more weight the prior receives. Less technically speaking, when we are more confident in β_{0}, the prior will gain more weight, and the posterior will shift toward the mean of the prior. However, when we choose ν_{0} to be very small, the estimator will be close to the ML estimator. Note that the prior guess β_{0} need not equal the true β_{b}. Rather, it could be set to a value that is much smaller than what previous studies have suggested and also much smaller than the parameter in the population. However, such an “incorrect” prior guess might still be beneficial, particularly when the sample size is small.
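This weighted-average reading can be illustrated numerically. In the sketch below, the data weight is taken to be the number of groups J; this specific weighting and the function name are our illustrative assumptions, not the article's exact formula:

```python
def shrink_slope(beta_ml, beta0, nu0, n_groups):
    """Posterior-mean-style weighted average of the prior guess (beta0,
    weighted by the prior sample size nu0) and the ML estimate (beta_ml,
    weighted here by the number of groups)."""
    return (nu0 * beta0 + n_groups * beta_ml) / (nu0 + n_groups)

# With a slightly informative prior (nu0 = 1), the ML estimate 0.9 is
# pulled only mildly toward the prior guess 0; with nu0 = 5 the pull
# toward 0 is noticeably stronger.
print(shrink_slope(0.9, 0.0, 1, 20))
print(shrink_slope(0.9, 0.0, 5, 20))
```

Setting ν_{0} = 0 gives the prior no weight and returns the ML estimate unchanged, mirroring the statement above that a very small ν_{0} makes the Bayesian estimator behave like ML.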

To be able to compare the properties of the Bayesian estimator with the ML estimator and with the Bayesian estimator from the second strategy of specifying the prior, we again use the Taylor expansion, and we ignore terms involving higher-order factors, which yields a rough approximation of the bias of this Bayesian estimator.

Similar to the ML estimator, the Bayesian estimator is biased. However, when ν_{0} is set to a value close to 0, the bias will become similar to the bias of the ML estimator.

The variability of the Bayesian estimator can be approximated in a similar way.

With a very large ν_{0}, the variability will be very small. But how do bias and variability play out in the MSE for different choices of (β_{0}, ν_{0}) when the sample size is small, and how must they be chosen such that the MSE will be smaller than the MSE of ML? Before we compare the different choices for (β_{0}, ν_{0}), we present another strategy for specifying the prior. Alternatively to specifying the prior directly for the between-group slope, one can also specify a prior for the group-level variance of the predictor, thereby also modifying the estimator of the slope. We call this strategy the indirect strategy of specifying the prior.

The principle that underlies the indirect strategy was discovered in the early years of Structural Equation Modeling (SEM), where models were fit on the basis of the variances and covariances of variables. One observation was that when the sample size was small, covariance matrices tended to be on the border of positive definiteness (e.g., a variance estimate close to 0, correlations close to −1 or 1; e.g., van Driel,

Rather than beginning with the assumption of a normal prior for the between-group slope, we begin with a gamma prior for the inverse of the group-level variance of the predictor variable ($1/\tau_X^2$):

$$\frac{1}{\tau_X^2} \sim \mathcal{G}\!\left(\frac{\nu_0}{2},\; \frac{\nu_0\,\tau_0^2}{2}\right)$$

The inverse of a variance is also referred to as the precision in the statistical literature.^{2}

As we will show, the parameters τ_{0}^{2} and ν_{0} have interpretations similar to those of the parameters of the (reparameterized) normal prior.

The likelihood for the inverse of the group-level variance can be written as:

where

As Zitzmann et al. showed, the resulting EAP estimator of the group-level variance can be approximated by a weighted average of a prior guess and the ML estimate (with J denoting the number of groups):

$$\tilde{\tau}_X^2 \approx \frac{\nu_0\,\tau_0^2 + J\,\hat{\tau}_X^2}{\nu_0 + J}$$

where τ_{0}^{2} and ν_{0} can be thought of as the prior guess and the prior sample size, respectively (see Hoff).

Adding a prior for the inverse of the group-level variance thus indirectly modifies the estimator of the between-group slope (β_{b}). Replacing the denominator in Equation (4) ($\hat{\tau}_X^2$) with the Bayesian estimator $\tilde{\tau}_X^2$ yields:

$$\tilde{\beta}_b = \frac{\hat{\tau}_{XY}}{\tilde{\tau}_X^2}$$

This new estimator is indicated by a tilde (~) in order to better differentiate it from the ML estimator and from the Bayesian estimator that results from the direct strategy of specifying the prior (Equation 12).
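The same kind of numerical sketch works for the indirect strategy. Again, the weighted-average form of the variance estimator (with the number of groups J as the data weight) and the function names are our illustrative assumptions:

```python
def shrink_variance(tau2_ml, tau2_0, nu0, n_groups):
    """Approximate EAP of the group-level variance: weighted average of
    the prior guess tau2_0 and the ML estimate tau2_ml."""
    return (nu0 * tau2_0 + n_groups * tau2_ml) / (nu0 + n_groups)

def slope_indirect(cov_xy, tau2_ml, tau2_0, nu0, n_groups):
    """Between-group slope with the stabilized variance in the denominator."""
    return cov_xy / shrink_variance(tau2_ml, tau2_0, nu0, n_groups)

# A small ML variance estimate (here 0.05) inflates the raw ratio
# cov_xy / tau2_ml; pulling the variance toward a larger prior guess
# (tau2_0 = 1) shrinks the slope estimate instead.
print(slope_indirect(0.07, 0.05, 1.0, 1, 20))
print(0.07 / 0.05)  # the unstabilized ratio, for comparison
```

Because the prior guess for the variance is chosen above the parameter, the denominator is pulled away from zero, which is precisely the mechanism by which the indirect strategy shrinks the slope.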

To derive some properties of $\tilde{\beta}_b$, we again use the first-order Taylor expansion and approximate the bias.

The bias will become similar to the bias of the ML estimator when the prior parameters τ_{0}^{2} and ν_{0} become very small.^{3}

The variability of $\tilde{\beta}_b$ can be approximated in a similar way.

Similar to the previous equation, we can easily infer that the variability becomes small when ν_{0} is large. The question is again how τ_{0}^{2} and ν_{0} must be chosen such that the MSE will be reduced in comparison with ML in small samples.

In this section, we investigate the MSE of the different strategies for specifying priors in small samples for different choices of the prior parameters, using the example model from above to simulate data that are typical in psychology. Because it is difficult to infer from the equations how the MSEs compare with each other, they were plotted against the sample size to allow for graphical comparisons.

In accordance with Lüdtke et al., the between-group slope (β_{b}) was 0.7 in the population. Moreover, we fixed the number of persons per group and varied the sample size at the group level (i.e., the number of groups).

The first estimator was the ML estimator, which served as the reference. The second estimator resulted from the direct strategy with a correct prior for β_{b} (i.e., the prior guess, β_{0}, equals the parameter in the population). Because β_{b} was 0.7 in the population, a correct prior for β_{b} was specified by setting β_{0} equal to this value. The third estimator (blue dotted line) also resulted from the direct strategy. However, β_{0} was set to 0 (and thus well below 0.7) in order to shrink estimates that were too large toward zero. The fourth estimator (red dashed line) resulted from the indirect strategy with a correct prior for the group-level variance of the predictor.^{4} The fifth estimator resulted from the indirect strategy with an incorrect prior: τ_{0}^{2} was set to 1, which was above the parameter in the population. Thus, estimates of the variance were pulled away from zero, and, therefore, the estimates of the slope were shrunken. The three panels of the figure show the RMSE for different values of ν_{0}: 0.1 (upper left), 1.0 (upper right), and 5.0 (lower left). The first two values can be considered choices that are only slightly informative, whereas the latter is more informative and was used here to illustrate what happens to the RMSE when the priors become more informative.

The analytically derived Root Mean Squared Error (RMSE) in estimating the between-group slope for the direct and the indirect approach as a function of the sample size at the group level (i.e., the number of groups). ν_{0}, prior sample size.

As can be seen in the figure, for the direct strategy, the RMSE was smaller with an incorrect prior (i.e., β_{0} = 0) than with a correct prior (i.e., β_{0} = 0.7). Moreover, the choice of a larger ν_{0} was associated with a smaller RMSE. However, the smallest RMSEs emerged when the indirect strategy was used with an incorrect prior (i.e., τ_{0}^{2} = 1). With ν_{0} = 1, the RMSE was reduced relative to a ν_{0} of 0.1. However, setting ν_{0} to 5 did not yield an RMSE that was even smaller. Rather, the RMSE was slightly larger than with a ν_{0} of 1 because the bias induced by the prior outweighed the variability in the computation of the RMSE. Additional results are presented in the Supplementary Material.

To sum up, both strategies for specifying the prior offer attractive ways to obtain more accurate estimators of the between-group slope in small samples when used with slightly informative priors. Especially when no previous knowledge exists about the parameters, the choice of a relatively small prior guess for the between-group slope or a relatively large prior guess for the group-level variance of the predictor could be useful when these choices are combined with a small ν_{0} in the low one-digit range. Although somewhat biased, the resulting Bayesian estimators of the slope were found to be more accurate than ML when the sample size was small.
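To convey how such a comparison can be set up in practice, the following Monte Carlo sketch contrasts a simple (clipped) ratio estimator of the between-group slope with directly and indirectly shrunken versions. The data-generating values mimic the setting described above (β_{b} = 0.7, ICC = 0.1, 5 persons per group, 20 groups), but the moment estimators, the clipping floor, and the prior guesses (0 for the slope, 1 for the variance, ν_{0} = 1) are our illustrative choices, not the article's exact simulation design:

```python
import numpy as np

rng = np.random.default_rng(1)
J, n, reps = 20, 5, 500            # groups, persons per group, replications
beta_b, beta_w = 0.7, 0.3          # between- and within-group slopes
tau2_x, sigma2_x = 0.1, 0.9        # ICC of the predictor = 0.1

est_ml, est_direct, est_indirect = [], [], []
for _ in range(reps):
    xb = rng.normal(0.0, np.sqrt(tau2_x), J)              # latent group means
    x = xb[:, None] + rng.normal(0.0, np.sqrt(sigma2_x), (J, n))
    y = (beta_b * xb)[:, None] + beta_w * (x - xb[:, None]) \
        + rng.normal(0.0, 1.0, (J, n))
    xm, ym = x.mean(axis=1), y.mean(axis=1)
    msw_x = ((x - xm[:, None]) ** 2).sum() / (J * (n - 1))
    msw_xy = ((x - xm[:, None]) * (y - ym[:, None])).sum() / (J * (n - 1))
    t2 = max(np.var(xm, ddof=1) - msw_x / n, 1e-3)        # clipped variance
    txy = np.cov(xm, ym)[0, 1] - msw_xy / n
    b_ml = txy / t2
    est_ml.append(b_ml)
    est_direct.append((1 * 0.0 + J * b_ml) / (1 + J))     # guess 0, nu0 = 1
    est_indirect.append(txy / ((1 * 1.0 + J * t2) / (1 + J)))  # guess 1, nu0 = 1

def rmse(est):
    return float(np.sqrt(np.mean((np.array(est) - beta_b) ** 2)))

print(f"RMSE  ML: {rmse(est_ml):.2f}  direct: {rmse(est_direct):.2f}  "
      f"indirect: {rmse(est_indirect):.2f}")
```

The pattern this sketch produces matches the qualitative message above: the unshrunken ratio occasionally explodes when the variance estimate lands near zero, and both shrinkage variants reduce the RMSE.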

It has been argued that Bayesian approaches can be beneficial when the sample size is small because prior distributions can be used to increase estimation accuracy. In the present article, we focused on the between-group slope because this parameter is often of interest in multilevel latent variable modeling. Two approaches for specifying priors can be distinguished, both of which are aimed at reducing the MSE of the estimator of the between-group slope: In the first approach, a slightly informative prior is specified directly for the slope, whereas in the indirect approach, the MSE is reduced by using a slightly informative prior for the group-level variance of the predictor variable. In the present article, we worked out the former approach mathematically and compared it with the indirect approach and with ML. Graphical inspections suggested that both approaches can be very effective in reducing the MSE compared with ML in small samples, rendering them attractive for researchers. We would like to add that these approaches are not mutually exclusive and that researchers can also apply them simultaneously by specifying slightly informative priors for the slope as well as for the group-level variance of the predictor variable. To provide initial information about how such a simultaneous application of the two approaches performs, we conducted an additional simulation study with 20 to 60 groups, 5 persons per group, and an ICC of the predictor variable of 0.1.

The simulated Root Mean Squared Error (RMSE) in estimating the between-group slope for the combined approach as a function of the sample size at the group level (i.e., the number of groups). ν_{0}, prior sample size.

Although our findings were generally favorable and could be considered a successful “proof of concept,” a word of caution is nevertheless needed. Our demonstrations were very limited. For example, the specific conditions we studied do not completely reflect real data. Future research should consider a wider range of conditions for more conclusive findings. Moreover, the example model we used was overly simple. Realistic models typically involve more than one predictor and also multiple indicators per construct. However, one can derive the Bayesian estimators analogously in this more general multivariate case. Zitzmann (

Before we come to Mplus, a few practical remarks are in order.

Mplus does not allow the normal prior for the slope to be specified directly in terms of the prior guess and the prior sample size. To specify, for example, a prior with the prior guess (β_{0}) and the prior sample size (ν_{0}) equaling 0 and 1, respectively, users must compute the mean and variance of the normal prior from β_{0} and ν_{0}.

Our findings suggest that this prior increases the accuracy of estimation in small samples. Choosing an even smaller prior variance would make the prior even more informative. For the indirect strategy, setting the prior guess τ_{0}^{2} and the prior sample size ν_{0} to 1 results in a gamma prior for the precision with both parameters equal to 0.5 (equivalently, an inverse-gamma prior for the group-level variance).

For a standardized predictor, this prior is quite effective when the sample size is small. Specifying somewhat larger values (e.g., by setting ν_{0} = 2) might increase estimation accuracy even further (Depaoli and Clifton,
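Under the reparameterization assumed above (shape ν_{0}/2 and rate ν_{0}τ_{0}^{2}/2 for the precision), translating a prior guess and a prior sample size into the two parameters that software typically expects can be sketched as follows; the helper name is our own:

```python
def inverse_gamma_params(tau2_0, nu0):
    """Map the prior guess for the group-level variance (tau2_0) and the
    prior sample size (nu0) to the shape and scale of the corresponding
    inverse-gamma prior on the variance (equivalently, the shape and rate
    of a gamma prior on the precision)."""
    return nu0 / 2.0, nu0 * tau2_0 / 2.0

print(inverse_gamma_params(1.0, 1))  # -> (0.5, 0.5)
```

For example, the slightly informative choice discussed in the text (prior guess 1, ν_{0} = 1) maps to shape 0.5 and scale 0.5, and doubling ν_{0} to 2 doubles both parameters.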

To conclude, we worked out and discussed Bayesian approaches that perform better than ML in small samples, and we offered some practical guidance on how to implement these approaches with Mplus.

The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding author/s.

SZ: writing, mathematical derivations, and graphic design. CH: writing. MH: writing and lead. All authors contributed to the article and approved the submitted version.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

We acknowledge support by the Open Access Publishing Fund of the University of Tübingen.


^{1}The ICC quantifies the amount of the total variance that can be attributed to differences between the groups (e.g., Snijders and Bosker,

^{2}The inverse of a variance is sometimes also referred to as the precision in the statistical literature (e.g., Hoff,

^{3}We would like to state that we recognized a typo in the bias formula of Zitzmann et al.'s (

^{4}Because of the standardization, the group-level variance of the predictor equals its ICC.