Edited by: Alessandro Giuliani, Istituto Superiore di Sanità (ISS), Italy

Reviewed by: Francesco Bartolucci, University of Perugia, Italy; Holmes Finch, Ball State University, United States

This article was submitted to Quantitative Psychology and Measurement, a section of the journal Frontiers in Psychology

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Many clinical and psychological constructs are conceptualized to have multivariate higher-order constructs that give rise to multidimensional lower-order traits. Although recent measurement models and computing algorithms can accommodate item response data with a higher-order structure, there are few measurement models and computing techniques that can be employed in the context of complex research synthesis, such as meta-analysis of individual participant data or integrative data analysis. The current study was aimed at modeling complex item responses that can arise when underlying domain-specific, lower-order traits are hierarchically related to multiple higher-order traits for individual participant data from multiple studies. We formulated a multi-group, multivariate higher-order item response theory (HO-IRT) model from a Bayesian perspective and developed a new Markov chain Monte Carlo (MCMC) algorithm to simultaneously estimate the (a) structural parameters of the first- and second-order latent traits across multiple groups and (b) item parameters of the model. Results from a simulation study support the feasibility of the MCMC algorithm. From the analysis of real data, we found that a bivariate HO-IRT model with different correlation/covariance structures for different studies fit the data best, compared to a univariate HO-IRT model or other alternate models with unreasonable assumptions (i.e., the same means and covariances across studies). Although more work is needed to further develop the method and to disseminate it, the multi-group multivariate HO-IRT model holds promise to derive a common metric for individual participant data from multiple studies in research synthesis studies for robust inference and for new discoveries.

Item response theory (IRT; Hambleton and Swaminathan,

Prospectively, it is possible to link different items from multiple surveys and questionnaires by testing them using the single group design or the anchor test design (Streiner et al.,

Traditionally, IRT involves a unidimensional underlying trait, denoted as θ. The two most basic and common unidimensional IRT models are the Rasch model and the two-parameter logistic (2PL) model. There are two major assumptions involved in IRT—unidimensionality and local independence. The unidimensionality assumption of IRT requires that a single latent dimension θ substantially accounts for the way participants respond to items. The local independence assumption describes that items should be conditionally independent given θ. The IRT models for binary item response can be extended to accommodate polytomous item response in a number of related models (Bacci et al.,

Further, as an extension of the unidimensional IRT model, multidimensional IRT (MIRT) models can accommodate joint modeling of multiple dimensions, typically expressed as

More broadly, there are a number of new IRT models and factor analysis (FA) models and software programs that can account for multidimensional data with a higher-order structure (e.g., Sheng and Wikle,

Despite the advances noted above, it is challenging to apply existing IRT or FA models and associated computing algorithms to item-level IPD obtained from independently conducted studies (see Huo et al.,

In the present study, we focus on psychological constructs with a multivariate higher-order structure, which give rise to multidimensional lower-order traits. A multivariate higher-order item response theory (HO-IRT) model is developed to estimate trait scores of participants from multiple studies and tested using a Markov chain Monte Carlo (MCMC) estimation approach. We use the term “multidimensional” to indicate that multiple, possibly related traits give rise to observed item response data. A “hierarchical structure” or “higher-order structure” is said to exist when multiple lower-order traits can be expressed as a function of an overall, higher-order trait. When there are two or more higher-order traits, a “multivariate” higher-order structure exists.

It has been noted that a lack of new methods and application examples to address the challenges of analyzing data from multiple studies has hindered the broader adoption of IDA by applied researchers despite its promise (Curran et al.,

In sum, the present study aims to address the aforementioned gaps in available methods by developing a multivariate HO-IRT model and an associated computing algorithm. Our rationale for developing the multivariate HO-IRT model is twofold. First, many clinical and psychological constructs have been conceptualized to have multivariate higher-order constructs that give rise to multidimensional lower-order traits; yet most of the available measurement models for analyzing existing data from multiple studies are unidimensional and non-hierarchical. Second, the multivariate HO-IRT model may be appealing for certain research applications because it estimates multiple trait scores at both the higher-order and lower-order levels, thereby achieving data reduction. Having more options in our methodological toolbox can empower researchers in psychology to make full use of available data for new discoveries. In the following sections, we describe the multivariate HO-IRT model, MCMC estimation, results from a simulation study, and application results from a motivating data example.

In univariate HO-IRT models (de la Torre and Song, _{(d)} is the domain-specific, first-order latent trait for the _{i(d)} = λ_{(d)}ω_{i}+ε_{i(d)}, where the subscript [_{(d)} is the regression coefficient, and ε_{i(d)} is the residual term conditioned on ω_{i}. However,

The higher-order structure of the bivariate HO-IRT model for multiple groups. The second-order latent trait for group _{(g)}, is bivariate with the correlation matrix _{ω(g)}, whereas the first-order latent trait, _{(g)}, is _{(g)}_{(1)}, are related to the first _{(g)}_{(2)}, to the remaining (_{(g)} are assumed to be independent conditional on the second-order traits _{(g)}.

In this paper, we use the hierarchical, multi-unidimensional two-parameter logistic item response theory (2PL-MUIRT) model as the item response function (Huo et al.,

where _{i(g)j(d)} is the response of respondent _{i(g)}_{(d)} is the _{ig} = {θ_{i(g)}_{(d)}}; α_{j(d)} and β_{j(d)} are the discrimination and difficulty parameters, respectively, of the _{g} (
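As a minimal sketch of how a discrimination parameter α and a difficulty parameter β shape a binary item response, the function below evaluates a two-parameter logistic curve. It assumes the common form logit P = α(θ − β); the model's own equation may be parameterized differently, and the function name `p_endorse_2pl` is hypothetical.

```python
import math


def p_endorse_2pl(theta, alpha, beta):
    """Probability of a positive (coded 1) response under a 2PL model.

    Assumption: logit P = alpha * (theta - beta). This is a common
    parameterization, not necessarily the one used in the paper.
    """
    return 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))


# Illustrative values: first row of the simulation item table,
# read here as discrimination 1.288 and difficulty 0.193.
alpha_1, beta_1 = 1.288, 0.193
p_at_difficulty = p_endorse_2pl(beta_1, alpha_1, beta_1)  # trait equals difficulty
p_above = p_endorse_2pl(beta_1 + 1.0, alpha_1, beta_1)    # trait one unit higher
```

Under this parameterization, a respondent whose trait equals the item difficulty has an endorsement probability of one half, and a larger α makes the curve steeper around that point.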

To take multiple groups into account, the following function can be used to connect _{i(g)}_{(d)} = λ_{(g)}_{(d)}ω_{i(g)}_{(h)}+ε_{i(g)}_{(d)}. The subscript

The parameters of the bivariate HO-IRT model for multiple groups. Lower-order trait scores _{(g)}_{(d)} can be seen as a direct function of higher-order latent traits _{(g)}_{(h)} and characterized by regression coefficients _{(g)}_{(d)} that relate _{(g)}_{(d)} to _{(g)}_{(h)} as well as by the mean vector and covariance matrices _{(g)}_{(d)} and _{j(d)} and _{j(d)} are also displayed.

We use the hierarchical Bayesian formulation with the following prior distributions:

where _{H} was set to _{H}

where 4_{ω(g)} needs to be transformed from _{ω(g)}, which we describe later in this section. Here,

The priors were chosen based on previous studies (e.g., de la Torre and Patz,

The joint posterior distribution of interest is as follows:

which cannot be fully simplified into an explicit distribution from which samples can be drawn directly. Therefore, we decompose the joint posterior distribution into several full conditional distributions for samples to be drawn more easily by using either the direct sampling approach (i.e., Gibbs sampling; Casella and George, _{(g)} is

The full conditional distribution of _{(g)} is

The full conditional distribution of _{(g)} is _{1(g)}, _{1(g)}), where the parameters can explicitly be expressed as

Samples can be drawn directly from this distribution using Gibbs sampling. The full conditional distribution of

and similarly the full conditional distribution of _{(g)} is

As previously noted, _{ω(g)} needs to be drawn from _{ω(g)}, which has a full conditional distribution of

and can be directly sampled using Gibbs sampling. To draw _{ω(g)}, first, samples are drawn from the conditional distribution of _{ω(g)}, as in the Inverse Wishart distribution (see Gelman et al., _{ω(g)} is then transformed into the corresponding provisional _{ω(g)}, which is evaluated by the M-H acceptance criteria. This method was originally developed by Liu (
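The transformation of a provisional covariance draw into a provisional correlation matrix can be sketched as a simple rescaling. The snippet below shows only that rescaling step; the inverse-Wishart draw and the M-H acceptance check are omitted, and the function name and input values are hypothetical.

```python
import math


def cov_to_corr(cov):
    """Rescale a covariance matrix (list of lists) to a correlation matrix."""
    sd = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(len(cov))]
            for i in range(len(cov))]


# Hypothetical provisional covariance draw for two second-order traits.
sigma_draw = [[4.0, 1.0], [1.0, 1.0]]
r_draw = cov_to_corr(sigma_draw)
```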

For the full conditional distributions for _{(g)}, the M-H algorithm can be used to indirectly draw samples from the corresponding distributions. The multivariate HO-IRT model with multiple groups formulated above is a full model containing the multivariate higher-order latent traits for multiple groups. Details of the MCMC algorithms are shown in the
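To illustrate the kind of M-H update involved, the sketch below performs random-walk Metropolis-Hastings steps for a single respondent's trait score. It is a simplified stand-in rather than the algorithm developed in this study: it assumes a 2PL likelihood with logit α(θ − β), and `prior_mean` and `prior_sd` are hypothetical placeholders for the conditional distribution of θ given the higher-order trait.

```python
import math
import random


def log_posterior_theta(theta, responses, items, prior_mean, prior_sd):
    """Log of a normal prior times a 2PL likelihood, up to a constant."""
    lp = -0.5 * ((theta - prior_mean) / prior_sd) ** 2
    for x, (alpha, beta) in zip(responses, items):
        eta = alpha * (theta - beta)
        # Bernoulli log-likelihood with logit eta
        lp += x * eta - math.log1p(math.exp(eta))
    return lp


def mh_step_theta(theta, responses, items, prior_mean, prior_sd, step_sd, rng):
    """One random-walk M-H update; returns (new_theta, accepted)."""
    proposal = theta + rng.gauss(0.0, step_sd)
    log_ratio = (
        log_posterior_theta(proposal, responses, items, prior_mean, prior_sd)
        - log_posterior_theta(theta, responses, items, prior_mean, prior_sd)
    )
    # Accept with probability min(1, posterior ratio)
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return proposal, True
    return theta, False


rng = random.Random(2024)
items = [(1.2, 0.0)] * 10   # hypothetical (alpha, beta) pairs
responses = [1] * 10        # a respondent who endorses every item
theta, accepted, chain = 0.0, 0, []
for _ in range(3000):
    theta, ok = mh_step_theta(theta, responses, items, 0.0, 1.0, 1.0, rng)
    accepted += ok
    chain.append(theta)
posterior_mean = sum(chain[500:]) / len(chain[500:])
acceptance_rate = accepted / len(chain)
```

Working on the log scale avoids underflow when many items are multiplied together; the proposal standard deviation (`step_sd`) would normally be tuned to reach a reasonable acceptance rate.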

We note two unique challenges to address when estimating a multivariate HO-IRT model for multiple groups. The first challenge involves addressing estimation indeterminacy. We set constraints on the latent distributions by selecting an anchor group, setting its mean to be

The second challenge is in constructing the correlation/covariance structures of _{(d)}|_{(d)}. Given that the covariance matrix of the anchor group is constrained, the covariance matrix θ_{(d)} for the anchor group is equivalent to the correlation matrix because _{d} and _{(d)}σ_{(d′)}λ_{(d)}λ_{(d′)}. If two θs do not share the same ω, the covariances can be estimated as σ_{(d)}σ_{(d′)}λ_{(d)}λ_{(d′)}ρ_{ω}.
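The covariance rules stated above can be sketched as a small function that builds the model-implied covariance matrix of the first-order traits. The function name, the argument layout, and the numeric inputs are hypothetical; the diagonal is set to σ² for each trait, which is an assumption consistent with σ being the standard deviation of θ.

```python
def implied_covariance(lam, sigma, higher_of, rho):
    """Model-implied covariance matrix of the first-order traits.

    lam[d]: regression coefficient of domain d on its second-order trait.
    sigma[d]: standard deviation of theta_d (1.0 for the anchor group).
    higher_of[d]: index (0 or 1) of the second-order trait for domain d.
    rho: correlation between the two second-order traits.
    """
    n = len(lam)
    cov = [[0.0] * n for _ in range(n)]
    for d in range(n):
        for e in range(n):
            if d == e:
                cov[d][e] = sigma[d] ** 2
            else:
                # Same omega: no extra factor; different omegas: multiply by rho
                link = 1.0 if higher_of[d] == higher_of[e] else rho
                cov[d][e] = sigma[d] * sigma[e] * lam[d] * lam[e] * link
    return cov


# Illustrative anchor-group inputs (not estimates from the study).
cov_anchor = implied_covariance(
    lam=[0.8] * 5, sigma=[1.0] * 5, higher_of=[0, 0, 0, 1, 1], rho=0.5
)
```

With unit standard deviations, as for the anchor group, the result is also the implied correlation matrix.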

A simulation study was conducted to evaluate the feasibility of the MCMC algorithms for the bivariate HO-IRT model with multiple groups. We examined the model in its saturated form, that is, different means and covariance structures for bivariate second-order latent traits across groups. The saturated (full) model was chosen because any reduced model is a special case of it with added constraints, so it exercises the most comprehensive version of the MCMC algorithm. For the simulation design, three groups of 1,000 participants each were specified. The second-order latent trait, ω_{(g)}, was set to be identical across the three groups. Specifically, the underlying distribution for the second-order latent trait was bivariate normal with _{i(g)}_{(d)} were generated from ^{2}, and covariance matrices are presented in

True parameter values of ^{2} for the first-order traits _{(d)} in the simulation study.


The complete item responses were generated based on

Item parameters used in the simulation study (First 30 items).

Item | α | β | Item | α | β |
---|---|---|---|---|---|
1 | 1.288 | 0.193 | 16 | 1.554 | 0.693 |
2 | 1.320 | −0.080 | 17 | 1.390 | 1.076 |
3 | 1.260 | 0.881 | 18 | 0.930 | −0.668 |
4 | 1.092 | 1.300 | 19 | 0.906 | −0.028 |
5 | 1.120 | 0.164 | 20 | 1.366 | 1.852 |
6 | 0.995 | 1.096 | 21 | 1.258 | 1.821 |
7 | 1.010 | 0.562 | 22 | 0.919 | 1.797 |
8 | 1.366 | 1.488 | 23 | 0.944 | 1.751 |
9 | 1.110 | −1.351 | 24 | 1.253 | −0.654 |
10 | 0.956 | 1.557 | 25 | 0.910 | −1.013 |
11 | 1.050 | 0.134 | 26 | 0.977 | −0.942 |
12 | 0.937 | −0.408 | 27 | 0.974 | −0.244 |
13 | 0.682 | 1.503 | 28 | 1.231 | −0.604 |
14 | 1.125 | 1.504 | 29 | 0.780 | −1.236 |
15 | 1.105 | 1.746 | 30 | 1.099 | −1.162 |
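A data-generating process of the kind described for the simulation can be sketched as follows. This is an illustration under stated assumptions, not the study's generator: it assumes θ = μ + σ(λω + √(1 − λ²)e) with standard normal e, a 2PL response function with logit α(θ − β), and that the first `n_first` domains load on the first second-order trait. All numeric inputs are illustrative, and the three items are assumed to belong to the first domain.

```python
import math
import random


def simulate_group(n, lam, mu, sigma2, rho, n_first, items, rng):
    """Simulate second-order traits, first-order traits, and 0/1 responses.

    lam, mu, sigma2: one value per first-order domain.
    items: list of (domain_index, alpha, beta) tuples.
    Returns a list of (omega, theta, responses) tuples, one per person.
    """
    people = []
    for _ in range(n):
        # Bivariate standard normal second-order traits with correlation rho
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        omega = (z1, rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        theta = []
        for d in range(len(lam)):
            h = 0 if d < n_first else 1
            unit = lam[d] * omega[h] + math.sqrt(1.0 - lam[d] ** 2) * rng.gauss(0.0, 1.0)
            theta.append(mu[d] + math.sqrt(sigma2[d]) * unit)
        responses = []
        for d, alpha, beta in items:
            p = 1.0 / (1.0 + math.exp(-alpha * (theta[d] - beta)))
            responses.append(1 if rng.random() < p else 0)
        people.append((omega, theta, responses))
    return people


rng = random.Random(7)
# First three rows of the item table, assumed to belong to domain 0.
items = [(0, 1.288, 0.193), (0, 1.320, -0.080), (0, 1.260, 0.881)]
group2 = simulate_group(
    n=2000, lam=[0.8] * 5, mu=[0.3, 0.4, 0.5, 0.6, 0.7],
    sigma2=[0.75] * 5, rho=0.5, n_first=3, items=items, rng=rng,
)
```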

Four MCMC chains were simultaneously run to monitor their convergence. Each chain had 25,000 iterations and the first 10,000 iterations were discarded as burn-in. The Gelman–Rubin (G–R) diagnostic statistics (Gelman and Rubin, ^{2}s were also quite close to their respective true values, and the estimated correlations between the two second-order traits were very close to the true correlation (0.5) in all three groups.
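For readers unfamiliar with the G–R diagnostic, the sketch below computes the basic potential scale reduction factor for one parameter from several chains of post-burn-in draws. It follows the standard textbook formula and is not taken from the study's code; the function name and the toy chains are hypothetical.

```python
from statistics import mean, variance


def gelman_rubin(chains):
    """Potential scale reduction factor for a single parameter.

    chains: list of equal-length lists of post-burn-in draws.
    """
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])   # W: mean within-chain variance
    between = n * variance(chain_means)            # B: between-chain variance
    var_hat = (n - 1) / n * within + between / n   # pooled variance estimate
    return (var_hat / within) ** 0.5


# Toy chains: identical chains versus chains stuck in different regions.
r_same = gelman_rubin([[0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 3.0]])
r_shifted = gelman_rubin([[0.0, 1.0, 2.0, 3.0], [10.0, 11.0, 12.0, 13.0]])
```

Values close to 1 are usually read as evidence that the chains have mixed; chains that disagree about the location of the posterior produce values well above 1.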

Estimated parameters and

Parameter | Domain | Group 1 | | Group 2 | | Group 3 | |
---|---|---|---|---|---|---|---|
ρ_{ω} | | 0.500 | 0.013 | 0.496 | 0.014 | 0.497 | 0.009 |
λ | 1 | 0.831 | 0.010 | 0.835 | 0.012 | 0.833 | 0.014 |
λ | 2 | 0.838 | 0.012 | 0.833 | 0.012 | 0.840 | 0.010 |
λ | 3 | 0.833 | 0.010 | 0.831 | 0.014 | 0.836 | 0.011 |
λ | 4 | 0.827 | 0.015 | 0.837 | 0.012 | 0.834 | 0.019 |
λ | 5 | 0.838 | 0.012 | 0.834 | 0.017 | 0.836 | 0.015 |
μ | 1 | NA | NA | 0.300 | 0.014 | −0.305 | 0.016 |
μ | 2 | NA | NA | 0.403 | 0.016 | −0.404 | 0.021 |
μ | 3 | NA | NA | 0.505 | 0.015 | −0.498 | 0.021 |
μ | 4 | NA | NA | 0.605 | 0.013 | −0.600 | 0.021 |
μ | 5 | NA | NA | 0.704 | 0.013 | −0.702 | 0.019 |
σ^{2} | 1 | NA | NA | 0.759 | 0.027 | 1.283 | 0.047 |
σ^{2} | 2 | NA | NA | 0.768 | 0.027 | 1.278 | 0.051 |
σ^{2} | 3 | NA | NA | 0.757 | 0.025 | 1.273 | 0.044 |
σ^{2} | 4 | NA | NA | 0.742 | 0.026 | 1.270 | 0.055 |
σ^{2} | 5 | NA | NA | 0.758 | 0.027 | 1.267 | 0.042 |

Bias and RMSE of the discrimination and difficulty parameter estimates of the 150 items.

Domain | Bias (α) | Bias (β) | RMSE (α) | RMSE (β) |
---|---|---|---|---|
1 | −0.002 | 0.003 | 0.013 | 0.013 |
2 | −0.008 | 0.004 | 0.014 | 0.014 |
3 | −0.005 | 0.005 | 0.012 | 0.011 |
4 | 0.003 | 0.005 | 0.013 | 0.009 |
5 | −0.001 | 0.004 | 0.010 | 0.012 |
Overall | −0.002 | 0.004 | 0.012 | 0.012 |
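Bias and RMSE summaries of this kind can be computed as in the sketch below, which averages signed and squared errors over a set of parameters. Whether the reported values were averaged over items, replications, or both is not stated here, so the averaging over items is an assumption, and the example numbers are made up.

```python
import math


def bias_and_rmse(estimates, truths):
    """Mean signed error and root mean squared error across parameters."""
    errors = [est - true for est, true in zip(estimates, truths)]
    bias = sum(errors) / len(errors)
    rmse = math.sqrt(sum(err ** 2 for err in errors) / len(errors))
    return bias, rmse


# Made-up estimates for two items whose true discrimination is 1.0.
bias_alpha, rmse_alpha = bias_and_rmse([1.1, 0.9], [1.0, 1.0])
```

Because positive and negative errors cancel in the bias but not in the RMSE, the RMSE of a set of estimates is never smaller than the absolute value of their bias.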

Scatter plots of the true and estimated first-order latent trait scores from the simulation study.

Scatter plots of the true and estimated second-order latent trait scores from the simulation study.

Although the simulation study illustrated the MCMC algorithms for the bivariate HO-IRT model in the saturated form, the algorithms can be flexibly adapted to fit reduced (i.e., simpler) models. For example, a reduced model may be a bivariate HO-IRT model for multiple groups with different means and a common covariance structure across groups. This reduced model may be needed for studies with small samples or sparse data, because estimating a separate covariance structure for each group may yield unreliable estimates under such challenging data conditions. When a common covariance structure is assumed across groups, the reduced model is straightforward to estimate. In this case, only one set of the common ^{2} needs to be specified in the model. Moreover, additional model constraints can be imposed on the anchor group to avoid estimation indeterminacy, which renders the common covariance matrix equivalent to the common correlation matrix. The implied correlation between any two θs sharing the same second-order ω simplifies to λ_{(d)} × λ_{(d′)}, and between any two θs stemming from two different ωs to λ_{(d)} × λ_{(d′)} ×

We focus on the Alcohol Expectancies and Drinking Motives constructs from Project INTEGRATE (Mun et al.,

The entire item data pool consisted of a total of 126 items assessed in 19 studies from several questionnaires and items. These items were originally from the Comprehensive Effects of Alcohol Questionnaire (CEOA; Fromme et al.,

Analyzing the entire data set in a single analysis was challenging because the combined data were very sparse (i.e., a high rate of missing data at the item level due to different study designs), which is typical in the analysis of existing IPD from multiple studies. We followed the same strategies used in Huo et al. (

We analyzed the data using four different HO-IRT models: a univariate HO-IRT model with a common correlation structure across groups (model 1); a univariate HO-IRT model with different correlation/covariance structures across groups (model 2); a bivariate HO-IRT model with a common correlation structure across groups (model 3); and a bivariate HO-IRT model with different correlation/covariance structures across groups (model 4). In each model estimation, Group 1 was set as the anchor group because its mean responses were moderate relative to those of the other two groups. Four chains with different starting values were run to monitor convergence. Each chain had 75,000 draws, and the initial 10,000 iterations were discarded as burn-in. The G-R statistics (Gelman and Rubin,

We compared the model fit of the four models based on the deviance information criterion (DIC; Spiegelhalter et al.,
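As a reminder of how the DIC is assembled from MCMC output, the sketch below uses the standard definition: the posterior mean deviance plus the effective number of parameters, where the latter is the mean deviance minus the deviance at the posterior mean of the parameters. The function name and the deviance values are hypothetical.

```python
def dic(deviance_draws, deviance_at_posterior_mean):
    """Return (DIC, p_D) from per-draw deviances and the plug-in deviance."""
    d_bar = sum(deviance_draws) / len(deviance_draws)  # posterior mean deviance
    p_d = d_bar - deviance_at_posterior_mean           # effective number of parameters
    return d_bar + p_d, p_d


# Made-up deviances for three retained draws of one model.
dic_value, p_d = dic([100.0, 102.0, 104.0], 98.0)
```

Smaller DIC values indicate a better trade-off between fit and complexity when candidate models are fitted to the same data.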

Derived correlation matrices from the bivariate HO-IRT model (lower off-diagonal) and from the univariate HO-IRT model (upper off-diagonal).

Group | Trait | θ_{1} | θ_{2} | θ_{3} | θ_{4} | θ_{5} |
---|---|---|---|---|---|---|
Group 1 | θ_{1} | 1 | 0.813 | 0.847 | 0.442 | 0.364 |
Group 1 | θ_{2} | | 1 | 0.827 | 0.431 | 0.356 |
Group 1 | θ_{3} | | | 1 | 0.450 | 0.371 |
Group 1 | θ_{4} | 0.447 | 0.436 | 0.452 | 1 | 0.193 |
Group 1 | θ_{5} | 0.372 | 0.362 | 0.375 | | 1 |
Group 2 | θ_{1} | 1 | 0.783 | 0.645 | 0.615 | 0.506 |
Group 2 | θ_{2} | | 1 | 0.646 | 0.616 | 0.507 |
Group 2 | θ_{3} | | | 1 | 0.507 | 0.417 |
Group 2 | θ_{4} | 0.482 | 0.480 | 0.410 | 1 | 0.398 |
Group 2 | θ_{5} | 0.455 | 0.454 | 0.387 | | 1 |
Group 3 | θ_{1} | 1 | 0.864 | 0.859 | 0.737 | 0.685 |
Group 3 | θ_{2} | | 1 | 0.836 | 0.718 | 0.667 |
Group 3 | θ_{3} | | | 1 | 0.713 | 0.663 |
Group 3 | θ_{4} | 0.723 | 0.705 | 0.702 | 1 | 0.569 |
Group 3 | θ_{5} | 0.689 | 0.671 | 0.668 | | 1 |

The first three first-order trait scores (θ_{1}, θ_{2}, θ_{3}) share the same second-order trait ω_{1}, and the remaining two first-order trait scores θ_{4} and θ_{5} share ω_{2}. Underlined numbers indicate the higher-order relationships. Group-specific correlations between the two second-order latent traits were 0.961, 0.624, and 0.879 for Groups 1–3, respectively.

Finally, we implemented the posterior predictive model check (PPMC) procedure to evaluate the model fit. We used the proportion correct (i.e., the proportion of items endorsing the “correct,” “agree” or “true” response) as a discrepancy measure for this procedure. The posterior predictive
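A posterior predictive check with proportion correct as the discrepancy measure can be sketched as follows. This is a simplified illustration, not the study's procedure: it assumes one trait score per respondent, a 2PL response function with logit α(θ − β), and a discrepancy computed per item; the function names and all inputs are hypothetical.

```python
import math
import random


def proportion_correct(responses):
    """Share of positive (coded 1) responses per item; rows are respondents."""
    n = len(responses)
    return [sum(row[j] for row in responses) / n for j in range(len(responses[0]))]


def ppp_values(observed, posterior_draws, rng):
    """Posterior predictive p-value per item.

    posterior_draws: list of (thetas, items) pairs, one per retained draw,
    with thetas holding one value per respondent and items holding
    (alpha, beta) pairs.
    """
    obs = proportion_correct(observed)
    exceed = [0] * len(obs)
    for thetas, items in posterior_draws:
        # Replicate a full data set from this draw of the parameters
        replicated = [
            [1 if rng.random() < 1.0 / (1.0 + math.exp(-a * (t - b))) else 0
             for a, b in items]
            for t in thetas
        ]
        rep = proportion_correct(replicated)
        for j in range(len(obs)):
            if rep[j] >= obs[j]:
                exceed[j] += 1
    return [count / len(posterior_draws) for count in exceed]


rng = random.Random(11)
items = [(1.0, 0.0), (1.5, 0.5)]
thetas = [-1.0, 0.0, 1.0, 0.5]
# The same draw repeated 200 times stands in for real MCMC output.
draws = [(thetas, items)] * 200
observed = [[1, 0], [0, 0], [1, 1], [1, 0]]
ppp = ppp_values(observed, draws, rng)
ppp_all_zero = ppp_values([[0, 0]] * 4, draws, rng)
```

Posterior predictive p-values near 0 or 1 suggest that the model reproduces the observed proportion poorly for that item, whereas values in the middle of the range indicate no evidence of misfit on this measure.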

The current study provides findings from a simulation study as well as from real data analysis, which demonstrate the feasibility of the MCMC algorithms and the potential utility of the multivariate HO-IRT model for multiple groups in connection with analysis of IPD from multiple studies. In recent years, a number of flexible IRT and FA programs have emerged for estimating unidimensional or multidimensional 1PL, 2PL, 3PL, graded response, partial credit, higher-order IRT, and bifactor models, including BMIRT II, a component of a Bayesian multivariate IRT (BMIRT) toolkit by Yao (

Despite the advances, there is an unmet need for additional tools to help address the challenges of analyzing item response data to combine and synthesize IPD from multiple studies. The existing approaches to establishing commensurate scores across studies have been limited to unidimensional and first-order item response data. For example, a two-parameter logistic IRT (2PL-IRT) model (Curran et al.,

The multivariate HO-IRT model reported in the current study may be appealing as more investigators attempt to combine IPD from different studies and to establish equivalent scores at the participant level for complex item response data. The multivariate HO-IRT model can correctly reflect a theoretical higher-order model for a given construct while estimating latent traits at different hierarchical levels across studies, providing greater flexibility. We demonstrated that the MCMC algorithm accurately estimated the item parameters, and first-order and second-order latent trait scores, as well as the parameters of the hierarchical structures (i.e., the means and regression coefficients) in the simulation study. From the real data application, we found that a bivariate HO-IRT model with different correlation/covariance structures for studies fit the data best as expected, compared with its counterpart univariate HO-IRT model or with the bivariate HO-IRT model with unreasonable constraints (i.e., the same means and covariances across studies).

Note that although we used a multi-group approach to reflect study-level differences, other approaches also exist, such as adding individual-level and study-by-individual level covariates into measurement models when deriving commensurate scores as in MNLFA (Bauer and Hussong,

The multivariate HO-IRT model and the MCMC algorithms in the present study were developed to address the measurement and computational challenges in the original project. However, the algorithms can be adapted to accommodate new features and converted into program codes (e.g., Stan; Bürkner,

With regard to specific areas for model refinement in future studies, first, the current study assumed that the means of the higher-order latent traits are 0 and their variances are 1 for model simplicity. Setting such model constraints is not the only way to handle estimation indeterminacy. Depending on specific research requirements, different constraints can be imposed. In the future, different mean levels of higher-order latent traits, not necessarily 0, may be estimated to better understand how higher-order structural parameters function in multiple-group applications. Similarly, for even more complex situations, such as third-order latent traits subsuming second-order latent traits, more constraints may be considered to model data parsimoniously.

Second, we assumed that the same items administered in different studies had the same item parameters. This is the same assumption made for the hierarchical 2PL-MUIRT model (Huo et al.,

Finally, the simulation study was rather limited in scope: we focused on demonstrating the feasibility of the MCMC algorithms. A carefully designed simulation study would help identify the key data conditions under which the MCMC algorithms perform well. In sum, the multivariate HO-IRT model for multiple groups has room for further improvement.

Having described the areas to improve, we now highlight the promise of this new method in broader terms. In the field of clinical research, it is increasingly important to share and link data across different systems measured in different time scales for data-driven discoveries to deliver faster and more individualized treatment decisions (i.e., the Precision Medicine Initiative; Collins and Varmus,

The North Texas Regional IRB reviewed and approved this study.

E-YM developed ideas and secured funding for the parent project, oversaw the entire analytical plan, and drafted the manuscript. YH developed the MCMC, analyzed data, drafted technical sections, and contributed to the writing of the manuscript. HRW contributed to the preparation of real data for analysis and edited the manuscript. SS edited the manuscript. JdlT developed and oversaw the analytical plan for this paper and contributed to the writing of this manuscript.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

We would like to thank the following data contributors in alphabetical order: John S. Baer, Department of Psychology, The University of Washington, and Veterans' Affairs Puget Sound Health Care System; Nancy P. Barnett, Center for Alcohol and Addiction Studies, Brown University; M. Dolores Cimini, University Counseling Center, The University at Albany, State University of New York; William R. Corbin, Department of Psychology, Arizona State University; Kim Fromme, Department of Psychology, The University of Texas, Austin; Joseph W. LaBrie, Department of Psychology, Loyola Marymount University; Mary E. Larimer, Department of Psychiatry and Behavioral Sciences, The University of Washington; Matthew P. Martens, Department of Educational, School, and Counseling Psychology, The University of Missouri; James G. Murphy, Department of Psychology, The University of Memphis; Scott T. Walters, Department of Health Behavior and Health Systems, The University of North Texas Health Science Center; and the late Mark D. Wood, Department of Psychology, The University of Rhode Island.

The Supplementary Material for this article can be found online at: