
Edited by: Hong Jiao, University of Maryland, College Park, United States

Reviewed by: Lietta Marie Scott, Arizona Department of Education, United States; Gongjun Xu, University of Michigan, United States

This article was submitted to Quantitative Psychology and Measurement, a section of the journal Frontiers in Psychology

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Longitudinal diagnostic classification models (DCMs) with hierarchical attributes can characterize learning trajectories in terms of the transition between attribute profiles for formative assessment. A longitudinal DCM for hierarchical attributes was proposed by imposing model constraints on the transition DCM. To facilitate the applications of longitudinal DCMs, this paper explored the critical topic of the Q-matrix design with a simulation study. The results suggest that including the transpose of the R-matrix in the Q-matrix improved the classification accuracy. Moreover, 10-item tests measuring three linear attributes across three time points provided satisfactory classification accuracy for low-stakes assessment; lower classification rates were observed with independent or divergent attributes. Q-matrix design recommendations were provided for the short-test situation. Implications and future directions were discussed.

Diagnostic classification models (DCMs; or cognitive diagnostic models, CDMs) have received increasing attention because the latent variable modeling approach to diagnostic assessment can shed light on the learning process (Rupp et al.,

The transition DCM (TDCM), proposed by Madison and Bradshaw (

The Q-matrix design, as a core element of the DCM-based test design, has not been adequately addressed in the context of longitudinal DCMs, since existing research focuses on model development and applications of longitudinal DCMs (e.g., Kaya and Leite,

The identifiability conditions need to be satisfied for consistent estimation of the model parameters. Gu and Xu (

However, the Q-matrices that lead to identification may provide varying classification accuracy rates (DeCarlo,

When attribute hierarchies are involved, there has not been a consensus on the Q-matrix design regarding whether all q-vectors are eligible (Templin and Bradshaw, 2^{K} − 1 distinct q-vectors. Consider a linear hierarchy with three attributes: α_{1} → α_{2} → α_{3}. Attribute α_{2} has direct relationships with the other two attributes, while attributes α_{1} and α_{3} have an indirect relationship. The reachability matrix, or R-matrix, can be used to capture both direct and indirect relationships (Tatsuoka, 2^{K} − 1 = 7 q-vectors in the Q-matrix as in an independent-attribute situation (Liu and Huggins-Manley,

Example of R-matrix and Q-matrix for three linear attributes.
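As an illustration of how an R-matrix captures both direct and indirect relationships, the following minimal sketch derives it from a matrix of direct prerequisite relations for the linear hierarchy α_{1} → α_{2} → α_{3}. The function names and the row/column orientation (rows are prerequisites, columns are the attributes they lead to) are assumptions made for this example, not taken from the original study.

```python
def reachability_matrix(adjacency):
    """Return the reachability (R) matrix for a matrix of direct prerequisites.

    adjacency[j][k] == 1 means attribute j is a direct prerequisite of attribute k.
    R[j][k] == 1 means k is reachable from j (directly, indirectly, or j == k).
    Orientation is an assumption of this sketch.
    """
    n = len(adjacency)
    # Start from the direct relations plus the identity (each attribute reaches itself).
    reach = [[1 if j == k or adjacency[j][k] else 0 for k in range(n)] for j in range(n)]
    # Warshall's algorithm: add indirect paths through each intermediate attribute.
    for mid in range(n):
        for j in range(n):
            for k in range(n):
                if reach[j][mid] and reach[mid][k]:
                    reach[j][k] = 1
    return reach


def transpose(matrix):
    """Return the transpose of a list-of-lists matrix."""
    return [list(row) for row in zip(*matrix)]


# Direct relations only: attribute 1 -> attribute 2 -> attribute 3.
linear_adjacency = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
R_linear = reachability_matrix(linear_adjacency)
RT_linear = transpose(R_linear)
```

Under these assumptions, the rows of `RT_linear` are the q-vectors (100), (110), and (111), which are the only q-vectors allowed under the restricted Q-matrix approach for this hierarchy.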

Tu et al. (^{T}. Liu et al. (

To sum up, the purposes of the current study are twofold. First, the H-TDCM was defined to incorporate hierarchical attributes in the longitudinal DCM. Second, different Q-matrix designs were explored for the TDCM and H-TDCM with a Monte Carlo simulation study. Both longitudinal models are based on the LCDM, which is a general framework without limitations of the model fit assumptions. The rest of the paper is organized as follows. The next section briefly introduces the LCDM, HDCM, and TDCM before defining the H-TDCM. Then, previous studies on the Q-matrix design are reviewed, followed by a simulation study on Q-matrix designs for the TDCM and H-TDCM. The paper concludes with a discussion of the limitations and educational implications.

The LCDM (Henson et al.,

Examinee attribute profiles are denoted by vectors α_{c} = (α_{c1}, …, α_{ck}, …, α_{cK}), where α_{ck} takes the value of 0 or 1, indicating non-mastery or mastery, respectively, of the kth attribute. There are 2^{K} attribute profiles assuming independent attributes. The number of attribute profiles decreases accordingly with hierarchical attributes.

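For each item on a test, the LCDM item response function models the effects of attribute mastery on the item response in terms of an intercept, a main effect for each attribute measured by the item, and interaction term(s) for each possible combination of multiple attributes measured by the item. The general form of the LCDM item response function can be expressed as

$$P(X_{ic} = 1 \mid \boldsymbol{\alpha}_{c}) = \frac{\exp\left(\lambda_{i,0} + \boldsymbol{\lambda}_{i}^{T}\mathbf{h}(\boldsymbol{\alpha}_{c}, \mathbf{q}_{i})\right)}{1 + \exp\left(\lambda_{i,0} + \boldsymbol{\lambda}_{i}^{T}\mathbf{h}(\boldsymbol{\alpha}_{c}, \mathbf{q}_{i})\right)} \tag{1}$$

where λ_{i,0} is the intercept parameter of item i; the vector λ_{i} contains all other item parameters, including the main effects and interaction terms for item i; q_{i} denotes the q-vector of item i; and h(α_{c}, q_{i}) is a vector of linear combinations of α_{c} and q_{i}.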
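To make the structure of the item response function concrete, the sketch below evaluates an LCDM-style probability for a single item. It is only an illustration under stated assumptions: the function name, the dictionary layout of the effects, and the example item with q = (1, 1, 0) are hypothetical, while the parameter values (intercept −1, main effects 2, interaction 1) mirror those fixed in the simulation study.

```python
from itertools import combinations
from math import exp


def lcdm_probability(alpha, q, intercept, effects):
    """Probability of a correct response under an LCDM-style log-linear kernel.

    alpha, q : tuples of 0/1 (attribute profile and q-vector of the item).
    effects  : dict mapping a tuple of attribute indices (0-based) to the
               corresponding main effect or interaction effect (hypothetical layout).
    """
    measured = [k for k, q_k in enumerate(q) if q_k == 1]
    logit = intercept
    for order in range(1, len(measured) + 1):
        for subset in combinations(measured, order):
            # An effect contributes only when every attribute in the subset is mastered.
            if all(alpha[k] == 1 for k in subset):
                logit += effects.get(subset, 0.0)
    return 1.0 / (1.0 + exp(-logit))


# Hypothetical two-attribute item: main effects of 2 and a two-way interaction of 1.
item_effects = {(0,): 2.0, (1,): 2.0, (0, 1): 1.0}
p_none = lcdm_probability((0, 0, 0), (1, 1, 0), -1.0, item_effects)
p_first = lcdm_probability((1, 0, 0), (1, 1, 0), -1.0, item_effects)
p_both = lcdm_probability((1, 1, 0), (1, 1, 0), -1.0, item_effects)
```

With these values the logits are −1, 1, and 4 for examinees mastering none, one, and both of the measured attributes, so the response probability rises from about 0.27 to about 0.98.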

Templin and Bradshaw (_{c} in Equations (1) and (2) is replaced by ^{K} to

Madison and Bradshaw (

The proposed H-TDCM combines the features of the HDCM and TDCM to deal with hierarchical attributes in longitudinal data. The attribute hierarchy is imposed on the TDCM by constraining the corresponding item parameters in the measurement model, as in the HDCM, together with the structural parameters that are specific to the TDCM. Specifically, model parameters for the main effects of nested attributes and some interaction terms are constrained to zero in light of the prerequisite relationships among them. Similar constraints are set on the transition parameters and prevalence parameters.

Given the expression of LTA (Collins and Lanza,

where _{i} response categories; _{i, t} is the examinee's response to item _{i, t} = _{i, t}) is an indicator function that is equal to 1 when the response is _{i, t}, and equal to 0 otherwise; each sum ranges over each of the _{t} =

There are three types of parameters to be estimated (similar to the case of TDCM) in Equation (3). The first type includes HDCM item parameters λ_{i, 0} and λ_{i}. The second type is the probability of membership in attribute profile α_{c} at the first time point, π_{αc1}; and the third is the probability of transitioning between different attribute profiles (from α_{ct−1} to α_{ct}) from time point t−1 to time point t, denoted as τ_{αct|αct−1}, usually expressed as a multinomial regression model (e.g., Reboussin et al.,

We take for example a test measuring three linear attributes (α_{1} → α_{2} → α_{3}). The

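Four item parameters are to be estimated, including the intercept effect λ_{i,0}, the main effect λ_{i,1,(1)}, the second-order interaction effect λ_{i,2,(2(1))}, and the third-order interaction effect λ_{i,3,(3(2, 1))}:

$$\operatorname{logit} P(X_{ic} = 1 \mid \boldsymbol{\alpha}_{c}) = \lambda_{i,0} + \lambda_{i,1,(1)}\alpha_{c1} + \lambda_{i,2,(2(1))}\alpha_{c1}\alpha_{c2} + \lambda_{i,3,(3(2,1))}\alpha_{c1}\alpha_{c2}\alpha_{c3}$$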
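The nested parameterization for a linear hierarchy can be sketched numerically as follows. This is an illustrative example, not the authors' code: the function name is hypothetical, the item is assumed to measure all three attributes, and the parameter values (intercept −1, main effect 2, interaction effects 1) are borrowed from the simulation settings described later in the paper.

```python
from math import exp


def hdcm_linear_probability(alpha, intercept, main, two_way, three_way):
    """Response probability for an item measuring three linearly ordered attributes.

    The effect of attribute 2 is nested within attribute 1, and the effect of
    attribute 3 is nested within attributes 1 and 2, so only four parameters remain.
    """
    a1, a2, a3 = alpha
    logit = intercept + main * a1 + two_way * a1 * a2 + three_way * a1 * a2 * a3
    return 1.0 / (1.0 + exp(-logit))


# The four attribute profiles permitted by the linear hierarchy.
linear_profiles = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
probabilities = {
    profile: round(hdcm_linear_probability(profile, -1.0, 2.0, 1.0, 1.0), 3)
    for profile in linear_profiles
}
```

Under these assumed values the response probability increases monotonically along the hierarchy, from about 0.27 for examinees with no mastered attribute to about 0.95 for examinees who have mastered all three.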

Note that Equation (3) is a general form of the H-TDCM. The combination of LTA and any other specific hierarchical CDM can be realized by imposing parameter constraints. The H-TDCM, in turn, can be seen as a special case of TDCM, and the two models can be compared with a likelihood-ratio difference test (Collins and Lanza,

The simulation study aimed to explore the effects of different Q-matrices on the classifications of the TDCM with or without an attribute hierarchy. There has been a need for short tests that measure a few fine-grained attributes in the classroom setting. The simulation conditions approximated a practical formative assessment over a learning period of 2–4 weeks. Only a limited number of attributes can be covered within such a short period, and testing time is also limited, so short sessions are preferred. The short test is assumed to be administered three times: at the beginning, in the middle, and near the end of the learning period. Therefore, the simulations only consider three-attribute tests administered over three time points. Three attribute hierarchies (independent, divergent, and linear) are considered. The three attribute hierarchies with three attributes and the associated R-matrices are presented in

Three attribute hierarchies with three attributes and their R-matrix.

As mentioned earlier, there are two general approaches to Q-matrix design with hierarchical attributes: the restricted and the unstructured Q-matrix approaches. The restricted Q-matrix approach only allows q-vectors in the transpose of the R-matrix, denoted as R^{T} (Leighton et al., R^{T}s in the Q-matrix to obtain acceptable classification accuracy (Tu et al., R^{T}s in the Q-matrix even though the unstructured approach was adopted. For each attribute hierarchy, three Q-matrix designs were used. The first Q-matrix design, denoted as Q_{1}, does not contain R^{T}. The second and third Q-matrix designs include one and two R^{T}s, and are denoted as Q_{2} and Q_{3}, respectively. Crossing two factors (i.e., attribute hierarchy and Q-matrix design) led to a total of nine conditions. The simulation study focused on the Q-matrix design; thus, all Q-matrices were assumed to be correctly specified.
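The distinction between the three designs rests on how many complete copies of the R-matrix transpose appear among the rows of the Q-matrix. A minimal sketch of such a check is given below; the function name and the 10-item Q-matrix are hypothetical and serve only to illustrate the counting, not to reproduce the Q-matrices used in the study.

```python
from collections import Counter


def count_rt_copies(q_matrix, r_transpose):
    """Count complete, non-overlapping copies of the rows of R^T among the rows of Q."""
    available = Counter(tuple(row) for row in q_matrix)
    needed = Counter(tuple(row) for row in r_transpose)
    # Each copy of R^T uses up one occurrence of every row it contains.
    return min(available[q_vector] // times for q_vector, times in needed.items())


# R^T for the linear hierarchy (attribute 1 -> attribute 2 -> attribute 3).
rt_linear = [(1, 0, 0), (1, 1, 0), (1, 1, 1)]

# Hypothetical 10-item Q-matrix under the unstructured approach.
example_q = [
    (1, 0, 0), (1, 1, 0), (1, 1, 1), (1, 0, 0), (0, 1, 0),
    (0, 0, 1), (0, 1, 1), (1, 0, 1), (0, 1, 0), (0, 0, 1),
]
n_copies = count_rt_copies(example_q, rt_linear)
```

In this hypothetical example the Q-matrix contains exactly one copy of R^{T}, so it would correspond to a Q_{2}-type design in the terminology above.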

The item parameters are assumed to be time-invariant for the attribute profiles to retain the same meaning over time. Previous studies have shown that the examinee sample size barely has an impact on the classification rates of DCMs (de la Torre et al.,

To avoid the effects of item quality, we fixed the item parameters over all conditions: The intercept effect was −1, the main effect was 2, and the interaction effect was 1. The independent-attribute condition had 2^{3} = 8 attribute profiles: α_{1} (0, 0, 0), α_{2} (0, 0, 1), α_{3} (0, 1, 0), α_{4} (0, 1, 1), α_{5} (1, 0, 0), α_{6} (1, 0, 1), α_{7} (1, 1, 0), and α_{8} (1, 1, 1). The divergent hierarchy condition had five attribute profiles: α_{1} (0, 0, 0), α_{5} (1, 0, 0), α_{6} (1, 0, 1), α_{7} (1, 1, 0), and α_{8} (1, 1, 1). Three linear attributes led to four attribute profiles: α_{1} (0, 0, 0), α_{5} (1, 0, 0), α_{7} (1, 1, 0), and α_{8} (1, 1, 1).
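The profile counts listed above follow directly from the prerequisite relations. The sketch below enumerates the permissible profiles for each hierarchy; the function name and the encoding of prerequisites as 0-based index pairs are assumptions of this example, and the divergent hierarchy is taken to be α_{1} → α_{2} and α_{1} → α_{3}, which is the structure implied by the five profiles listed.

```python
from itertools import product


def permissible_profiles(n_attributes, prerequisites):
    """Enumerate attribute profiles consistent with a set of prerequisite relations.

    prerequisites : list of (j, k) pairs (0-based) meaning attribute j must be
                    mastered before attribute k can be mastered.
    """
    profiles = []
    for alpha in product((0, 1), repeat=n_attributes):
        # A profile is permissible if no attribute is mastered without its prerequisite.
        if all(alpha[j] >= alpha[k] for j, k in prerequisites):
            profiles.append(alpha)
    return profiles


hierarchies = {
    "independent": [],
    "divergent": [(0, 1), (0, 2)],
    "linear": [(0, 1), (1, 2)],
}
profile_sets = {name: permissible_profiles(3, pre) for name, pre in hierarchies.items()}
profile_counts = {name: len(profiles) for name, profiles in profile_sets.items()}
```

The resulting counts are eight, five, and four profiles for the independent, divergent, and linear hierarchies, respectively, in agreement with the profiles listed in the text.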

Mplus 7.4 (Muthén and Muthén,

The correct classification rates (CCRs) are presented in the table of classification rates. Including one transpose of the R-matrix in the Q-matrix (i.e., Q_{2}) increased the profile CCRs and marginal CCRs at each time point for independent, divergent, and linear hierarchies. Including one more transpose of the R-matrix (i.e., Q_{3}) further slightly increased the CCRs except for the linear hierarchy. Another interesting finding is that the profile CCRs tended to increase with time. The CCRs at Time 3 were the highest. This trend was found under each combination of attribute hierarchy and Q-matrix design. The increase with time was not found in the marginal CCRs for independent attributes. Within the divergent or linear hierarchy, the marginal CCRs of the highest-level attributes (i.e., α_{2} and α_{3} under the divergent hierarchy and α_{3} under the linear hierarchy) increased with time while the lowest-level attribute (i.e., α_{1}) had decreasing CCRs with time.

Classification rates of three Q-matrix designs.

| CCR | Independent Q_{1} | Independent Q_{2} | Independent Q_{3} | Divergent Q_{1} | Divergent Q_{2} | Divergent Q_{3} | Linear Q_{1} | Linear Q_{2} | Linear Q_{3} |
|---|---|---|---|---|---|---|---|---|---|
| Profile, Time 1 | 0.517 | 0.550 | 0.557 | 0.582 | 0.651 | 0.671 | 0.710 | 0.731 | 0.725 |
| Profile, Time 2 | 0.522 | 0.553 | 0.556 | 0.595 | 0.667 | 0.681 | 0.725 | 0.749 | 0.736 |
| Profile, Time 3 | 0.536 | 0.577 | 0.577 | 0.606 | 0.680 | 0.693 | 0.734 | 0.761 | 0.744 |
| Profile, mean | 0.525 | 0.560 | 0.563 | 0.594 | 0.666 | 0.682 | 0.723 | 0.747 | 0.735 |
| Marginal, Time 1 α_{1} | 0.723 | 0.784 | 0.821 | 0.938 | 0.937 | 0.917 | 0.931 | 0.929 | 0.901 |
| Marginal, Time 1 α_{2} | 0.833 | 0.838 | 0.809 | 0.714 | 0.795 | 0.840 | 0.864 | 0.887 | 0.904 |
| Marginal, Time 1 α_{3} | 0.831 | 0.807 | 0.810 | 0.855 | 0.858 | 0.860 | 0.864 | 0.872 | 0.885 |
| Marginal, Time 1 mean | 0.796 | 0.810 | 0.813 | 0.836 | 0.863 | 0.872 | 0.886 | 0.896 | 0.897 |
| Marginal, Time 2 α_{1} | 0.704 | 0.774 | 0.809 | 0.929 | 0.925 | 0.903 | 0.915 | 0.913 | 0.881 |
| Marginal, Time 2 α_{2} | 0.827 | 0.835 | 0.803 | 0.716 | 0.804 | 0.845 | 0.848 | 0.875 | 0.890 |
| Marginal, Time 2 α_{3} | 0.828 | 0.796 | 0.798 | 0.857 | 0.859 | 0.864 | 0.927 | 0.932 | 0.937 |
| Marginal, Time 2 mean | 0.787 | 0.802 | 0.804 | 0.834 | 0.863 | 0.871 | 0.897 | 0.907 | 0.903 |
| Marginal, Time 3 α_{1} | 0.713 | 0.784 | 0.816 | 0.927 | 0.924 | 0.901 | 0.912 | 0.912 | 0.877 |
| Marginal, Time 3 α_{2} | 0.835 | 0.840 | 0.806 | 0.724 | 0.811 | 0.852 | 0.849 | 0.876 | 0.890 |
| Marginal, Time 3 α_{3} | 0.834 | 0.807 | 0.808 | 0.864 | 0.864 | 0.868 | 0.944 | 0.947 | 0.950 |
| Marginal, Time 3 mean | 0.794 | 0.810 | 0.810 | 0.838 | 0.866 | 0.874 | 0.902 | 0.912 | 0.906 |

Q_{1}, Q_{2}, and Q_{3} included zero, one, and two R-matrix transposes, respectively.

Comparing the three attribute hierarchies revealed that the CCRs generally increased as the relationship between attributes became stronger and, correspondingly, the number of attribute profiles became smaller. Under the linear hierarchy with 10-item tests, the profile CCRs were above 0.7, and the marginal CCRs were approximately 0.85 or higher. The classifications for the independent attributes were the most difficult.
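For readers who wish to compute comparable summaries from their own simulations, the sketch below shows one common way to obtain the two kinds of rates. It assumes that the profile CCR is the proportion of examinees whose entire estimated profile matches the true profile and that the marginal CCR is the attribute-wise agreement rate; the function names and the toy data are hypothetical.

```python
def profile_ccr(true_profiles, estimated_profiles):
    """Proportion of examinees whose whole attribute profile is classified correctly."""
    matches = sum(t == e for t, e in zip(true_profiles, estimated_profiles))
    return matches / len(true_profiles)


def marginal_ccr(true_profiles, estimated_profiles):
    """Per-attribute proportion of examinees whose mastery status is classified correctly."""
    n_examinees = len(true_profiles)
    n_attributes = len(true_profiles[0])
    return [
        sum(t[k] == e[k] for t, e in zip(true_profiles, estimated_profiles)) / n_examinees
        for k in range(n_attributes)
    ]


# Toy data: four examinees, one of whom is misclassified on the second attribute.
true_alpha = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
estimated_alpha = [(0, 0, 0), (1, 1, 0), (1, 1, 0), (1, 1, 1)]
toy_profile_ccr = profile_ccr(true_alpha, estimated_alpha)
toy_marginal_ccr = marginal_ccr(true_alpha, estimated_alpha)
```

Because a profile is correct only when every attribute is classified correctly, the profile CCR can never exceed the smallest marginal CCR, which is consistent with the pattern in the reported rates.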

This paper proposed the H-TDCM for hierarchical attributes in the longitudinal DCM by imposing model constraints on the TDCM. The simulation study explored Q-matrix designs with different numbers of R-matrix transposes. The CCRs generally increased with stronger dependencies between attributes, which is consistent with the findings of Templin and Bradshaw (

Regarding the Q-matrix design, we took the unstructured Q-matrix approach (Liu and Huggins-Manley, R^{T}. Simulation results showed that including one R-matrix transpose in the Q-matrix increased the CCRs in the case of independent attributes. Note that although the identification issue of CDMs and the Q-matrix design are usually treated as two separate research areas, the identification requirement may not always be satisfied in the Q-matrix design studies, especially for more complicated models and shorter tests.

First, we looked at the results for independent attributes. A closer look at the Q-matrices revealed that the first Q-matrix design (Q_{1}) did not measure α_{1} in isolation; the second Q-matrix design (Q_{2}) contained only one identity matrix and measured α_{1} in isolation only once. This explained the much lower classification rates for α_{1} compared with other attributes. This finding with the TDCM agrees with the results of conventional DCMs (DeCarlo, Q_{1} and Q_{2} for independent attributes, the model parameters suffered from the non-identifiability issue and the consequence was reflected in the lower profile CCRs with Q_{1} and Q_{2} than with Q_{3} in the table of classification rates. The marginal CCRs of α_{1} under Q_{1} and Q_{2} were substantially lower than those under Q_{3}, while the marginal CCRs of the other two attributes did not differ much between Q-matrix designs.

Including R^{T} in the Q-matrix also increases the classification rates for the hierarchical cases in this study, which is consistent with the empirical findings from Tu et al. (R^{T} as a submatrix in the Q-matrix ensures a separable Γ-matrix in R^{T} is in the form of

after some row permutation, in which * takes the value of 0 or 1 and K is the number of attributes. Two R^{T}s were contained in Q_{3}, which led to a separable Γ-matrix. As a result, Q_{3} always ensures the identification of the model, while the first design may lead to non-identification issues (Gu and Xu, forthcoming). In contrast, Q_{2} contained one R^{T} and at least one identity matrix instead of two R^{T}s, which does not affect the model identification. Therefore, Q_{2} and Q_{3} showed similar classification rates. One major difference between the two designs is that Q_{2} contains more single-attribute items and fewer multiple-attribute items. Under the linear hierarchy, for example, Q_{3} has at least two items with q = (111), each of which has seven item parameters to be estimated. The parameter recovery of such items may be more difficult than that of single-attribute items, and the classification rate may suffer. As a result, Q_{2} performed better than Q_{3} for the linear hierarchy.

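R^{T} as a submatrix in the Q-matrix ensures a separable Γ-matrix.

| q-vector | Profile (000) | Profile (100) | Profile (110) | Profile (111) |
|---|---|---|---|---|
| 100 | 0 | 1 | 1 | 1 |
| 110 | 0 | 0 | 1 | 1 |
| 111 | 0 | 0 | 0 | 1 |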
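The entries of this table can be reproduced with a short computation. The sketch below assumes that an entry of the Γ-matrix equals 1 when the attribute profile includes every attribute required by the q-vector and 0 otherwise; the function name is hypothetical, and the column order follows the four profiles permitted by the linear hierarchy.

```python
def gamma_matrix(q_vectors, profiles):
    """Build a Gamma-matrix: rows are q-vectors, columns are attribute profiles.

    An entry is 1 when the profile masters every attribute the q-vector requires.
    """
    return [
        [int(all(a >= q for a, q in zip(alpha, q_vector))) for alpha in profiles]
        for q_vector in q_vectors
    ]


# Rows of R^T for the linear hierarchy and the four permissible profiles.
rt_rows = [(1, 0, 0), (1, 1, 0), (1, 1, 1)]
linear_profile_list = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
gamma_linear = gamma_matrix(rt_rows, linear_profile_list)
```

Every column of `gamma_linear` is distinct, which is the sense in which the permissible profiles are separated by the items in R^{T}.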

This study aimed to demonstrate the classification performance of the H-TDCM with a short test and provide practical guidelines for the applications of this longitudinal model for formative classroom assessment. For the current setting of short tests and only a few attributes, we recommend that the Q-matrix contain (1) two identity matrices for independent attributes, (2) two R^{T}s for a divergent hierarchy, and (3) one R^{T} and one identity matrix for a linear hierarchy. In addition, each attribute should be probed by at least three items. However, it should be noted that the current simulation study assumes that items with all types of q-vectors can be developed with equal ease, which may not be true for certain subject areas. For example, it may be more difficult to develop items that measure each attribute in isolation.

Formative classroom assessment has received renewed attention recently with the development of curriculum reform. The fusion of curriculum, instruction, and assessment requires timely and constructive feedback that is closely connected to the curriculum and is based on students' learning history (e.g., Bennett,

The datasets generated for this study are available on request to the corresponding author.

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The Supplementary Material for this article can be found online at: