
Edited by: Holmes Finch, Ball State University, United States

Reviewed by: Hong Jiao, University of Maryland, College Park, United States; Richard James Neufeld, University of Western Ontario, Canada

*Correspondence: Purya Baghaei

This article was submitted to Quantitative Psychology and Measurement, a section of the journal Frontiers in Psychology

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

The linear logistic test model (LLTM) is a well-recognized psychometric model for examining the components of difficulty in cognitive tests and validating construct theories. The plausibility of the construct model, summarized in a matrix of weights known as the Q-matrix or weight matrix, is tested by (1) comparing the fit of the LLTM with the fit of the Rasch model (RM) using the likelihood ratio (LR) test and (2) examining the correlation between the Rasch model item parameters and the LLTM-reconstructed item parameters. The problem with the LR test is that it is almost always significant and, consequently, the LLTM is rejected. The drawback of examining the correlation coefficient is that there is no cut-off value or lower bound for the magnitude of the correlation coefficient. In this article we suggest a simulation method to set a minimum benchmark for the correlation between item parameters from the Rasch model and those reconstructed by the LLTM. If the cognitive model is valid, then the correlation between the RM-based item parameters and the LLTM-reconstructed item parameters derived from the theoretical weight matrix should be greater than the correlations derived from the simulated matrices.

The linear logistic test model (LLTM; Fischer, 1973) assumes that the difficulty parameter β_{i} of item i is a weighted sum of the basic parameters η_{j}. The item response function for the standard dichotomous Rasch model is expressed as follows:

P(X_{vi} = 1) = exp(θ_{v} − β_{i}) / [1 + exp(θ_{v} − β_{i})],

where θ_{v} is the ability parameter of person v and β_{i} is the difficulty parameter of item i. LLTM imposes the following linear constraint on the difficulty parameter β_{i}:

β_{i} = Σ_{j} q_{ij}η_{j} + c,

where q_{ij} is the given weight of basic parameter j on item i, η_{j} is the estimated difficulty of basic parameter (cognitive operation) j, and c is a normalization constant.^{1}
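As a small numeric illustration of this constraint (the Q-matrix, basic parameters, and normalization constant below are hypothetical, not taken from the article's data), the LLTM reconstruction of item difficulties is simply a matrix product:

```python
import numpy as np

# Hypothetical Q-matrix: 4 items x 3 cognitive operations (1 = operation involved)
Q = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1],
              [1, 1, 1]])

# Hypothetical basic parameters: difficulties of the three operations
eta = np.array([0.5, -0.2, 0.8])

c = 0.0  # normalization constant

# LLTM constraint: beta_i = sum_j q_ij * eta_j + c
beta = Q @ eta + c
print(beta.round(2))  # [0.3 1.3 0.6 1.1]
```

Note how the last item, which involves all three operations, is reconstructed as the sum of all three basic parameters.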

Under the LLTM, theoretically, the difficulty parameters of the processes hypothesized to be involved in solving the items, η_{j}, add up and constitute the Rasch model item difficulty parameters β_{i}. In other words, item difficulty is an additive function of the basic parameters η_{j}. That is, if the construct theory suggests that two cognitive operations with difficulty parameters η_{1} and η_{2} are needed to solve item i, then β_{i} = η_{1} + η_{2}. And if solving item j additionally requires a third operation with difficulty η_{3}, then β_{j} = η_{1} + η_{2} + η_{3} and β_{j} − β_{i} = η_{3} (Fischer, 1973).

LLTM can also be used to investigate the impact of construct-irrelevant factors, such as test method, item position, and length of the item text, on item difficulty, as well as the impact of experimental conditions such as training and therapy (for applications of LLTM see Kubinger,

To estimate the LLTM, a matrix of weights (the Q-matrix) must be specified, which lists the hypothesized basic parameters η_{j} along with their weights q_{ij} on each item i.

One common approach to testing the validity of the hypothesized theory reflected in the weight matrix is to compare the fit of the LLTM with the fit of the Rasch model. The LLTM is a more restricted model than the Rasch model; therefore, the LLTM and the Rasch model are hierarchically related, or nested, models. The likelihood ratio (LR) test can be used to compare the fit of nested models. The difference between the −2 log-likelihoods (deviances) of the two models is approximately chi-square distributed with degrees of freedom equal to the difference between the numbers of parameters in the models (Fischer,
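As a sketch of this comparison, the LR test can be computed directly from the deviances and numbers of estimated parameters reported for the listening test later in this article (the use of scipy here is an illustrative choice, not the authors' implementation):

```python
from scipy.stats import chi2

# Deviances (-2 log-likelihood) and numbers of estimated parameters
# for the listening test, as reported in the article's model-comparison table
dev_rm, k_rm = 4931.68, 22      # Rasch model
dev_lltm, k_lltm = 5512.87, 6   # LLTM

lr = dev_lltm - dev_rm  # LR statistic: difference of the deviances
df = k_rm - k_lltm      # df: difference in the numbers of parameters
p = chi2.sf(lr, df)     # upper-tail chi-square probability

print(f"LR = {lr:.2f}, df = {df}, significant at 0.001: {p < 0.001}")
```

With an LR statistic this large on 16 degrees of freedom, the test is overwhelmingly significant, illustrating why the LLTM is almost always rejected against the Rasch model.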

A condition for the LR test is that the superior model, in this case the Rasch model, should fit the data (Fischer, ^{2}

The problem with the LR test is that it is almost always significant and, consequently, LLTM is rejected (Fischer and Formann,

For this reason, researchers mostly rely on the correlation between the difficulty parameters resulting from fitting the Rasch model and those reconstructed by the LLTM. If the weight matrix is specified correctly, i.e., if the construct theory defined in terms of the basic parameters and their weights is valid, the item parameters from the Rasch model should be the same as those reproduced by the LLTM, except for random error. Nevertheless, there is no cut-off value for the magnitude of the correlation coefficient to ascertain the validity of the cognitive model. Correlations between 0.75 (Baghaei and Ravand,

Horn (

Since it is very difficult to attain good fit for the LLTM, researchers almost always rely on the correlation between the Rasch model (RM) item parameters and the LLTM-reconstructed item parameters to evaluate the usefulness of their cognitive theory in accounting for variation in item parameters. However, there is no recommended cut-off value in the literature for how large the correlation should be to confirm the validity of the weight matrix and, by implication, the explanatory usefulness of the postulated cognitive model. Furthermore, there is no statistical significance test for this correlation; as Fischer and Formann noted, the null hypothesis H_{0} that the LLTM holds cannot be expressed in terms of this correlation.

The empirical validity of the weight matrix is in fact evidence for the validity of the postulated cognitive model (Baghaei and Kubinger,

In short, we suggest that, to evaluate the substantive plausibility of the cognitive model, researchers can simulate random weight matrices with the same numbers of items and operations as in the actual theoretical weight matrix, feed them into the LLTM analysis, and compute the distribution of the correlations yielded by these "fake" weight matrices. We expect the correlation from the theoretical weight matrix to be greater than 95% of these correlations. If it is, there is evidence for the usefulness and plausibility of the cognitive model in accounting for variance in item parameters.
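The suggested procedure can be sketched as follows. For brevity, this sketch replaces full conditional maximum likelihood LLTM estimation with an ordinary least-squares reconstruction of the Rasch item parameters from each weight matrix, and all quantities (item parameters, Q-matrix dimensions) are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(42)

def reconstruction_corr(beta, Q):
    """Correlate RM item parameters with their best linear reconstruction
    from the weight matrix Q (an OLS surrogate for CML LLTM estimation)."""
    X = np.column_stack([Q, np.ones(len(beta))])  # weights plus normalization constant
    eta, *_ = np.linalg.lstsq(X, beta, rcond=None)
    return np.corrcoef(beta, X @ eta)[0, 1]

# Hypothetical setup: 20 items, 4 cognitive operations
n_items, n_ops = 20, 4
Q_theory = rng.integers(0, 2, size=(n_items, n_ops))  # "theoretical" Q-matrix
beta = Q_theory @ rng.normal(size=n_ops) + rng.normal(scale=0.3, size=n_items)

r_theory = reconstruction_corr(beta, Q_theory)

# Benchmark: correlations from 1,000 randomly generated weight matrices
r_random = [reconstruction_corr(beta, rng.integers(0, 2, size=(n_items, n_ops)))
            for _ in range(1000)]
cutoff = np.percentile(r_random, 95)

print(f"theoretical r = {r_theory:.3f}, 95th percentile of random r = {cutoff:.3f}")
```

The cognitive model would be supported if the theoretical correlation exceeds the 95th percentile of the random-matrix correlations.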

To study the appropriateness and the feasibility of our approach we ran simulations based on two empirical data sets to which LLTM had been fitted before. Ghahramanlou et al. (

Baghaei and Ravand (^{3}

Deviance, number of estimated parameters and information criteria AIC and BIC for the RM and the LLTM of the two real data sets.

Model | Deviance | Number of estimated parameters | AIC | BIC
RM Listening | 4931.68 | 22 | 4975.68 | 5057.81
LLTM Listening | 5512.87 | 6 | 5524.87 | 5547.27
RM Reading | 5532.41 | 16 | 5564.41 | 5628.24
LLTM Reading | 5839.56 | 5 | 5849.56 | 5869.51
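As a quick consistency check, the AIC values in this table follow directly from the reported deviances and numbers of estimated parameters via AIC = deviance + 2k:

```python
# Deviance and number of estimated parameters, as reported above
models = {
    "RM Listening":   (4931.68, 22),
    "LLTM Listening": (5512.87, 6),
    "RM Reading":     (5532.41, 16),
    "LLTM Reading":   (5839.56, 5),
}

for name, (deviance, k) in models.items():
    aic = deviance + 2 * k  # AIC = deviance + 2 * (number of parameters)
    print(f"{name}: AIC = {aic:.2f}")
```

The BIC values additionally depend on the sample size, which is why they are not recomputed here.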

For the simulation study, three different scenarios were implemented. In the first two scenarios, weight matrices were intentionally misspecified with the aim of checking the impact on the correlation between the RM-based and LLTM-reconstructed item parameters. In Scenario 1, the weight matrix was misspecified to a high degree by simulating the design matrices almost completely at random. In Scenario 2, another approach was taken: the empirical weight matrix was taken as a starting point and perturbations were imposed gradually. Thus, Scenario 1 serves to establish the lowest possible benchmark. In practice, a theoretically derived weight matrix should show a better fit and produce a higher correlation than a completely random matrix. Scenario 2 sets a low benchmark (but higher than that in Scenario 1). Finally, Scenario 3 serves to establish an upper benchmark for the fit of an empirical weight matrix.

The proportions of 0′s and 1′s in the original weight matrix Q_{L} of the listening test were 61.59% and 38.41%, respectively. For the reading test, the proportions of 0′s and 1′s in the weight matrix Q_{R} were 65.88% and 34.12%, respectively. To create the random weight matrices, each entry q_{ij} ∈ {0;1} was sampled at random, and the proportion of 1′s was varied between 30% and 70%. The reason for limiting the proportion of 1′s to between 0.3 and 0.7 was that, in applications of the LLTM, the proportion of 1′s in design matrices is smaller than the proportion of 0′s and rarely exceeds 0.7 (for a typical example of an application of the LLTM see Freund et al., ). One thousand such weight matrices Q_{S} were generated and the original empirical data sets were analyzed using these 1,000 matrices. The correlations between the item parameters from the Rasch model of the empirical data and those reconstructed from the LLTM basic parameters using the simulated weight matrices were calculated. In addition, the LR test and the information criteria AIC and BIC comparing the LLTM with the Rasch model were computed. If the empirical weight matrices Q_{R} and Q_{L} are substantively valid, we expect the majority of the Q_{S}′s to lead to correlations lower than those based on the actual matrices Q_{R} and Q_{L}, because they do not rest on a theoretical rationale but are only randomly generated.
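Generating such random weight matrices is straightforward; in this sketch the matrix dimensions are illustrative stand-ins, not the exact dimensions of Q_{L} or Q_{R}:

```python
import numpy as np

rng = np.random.default_rng(1)

def random_weight_matrix(n_items, n_ops, p_ones):
    """Sample a random Q-matrix whose entries are 1 with probability p_ones."""
    return (rng.random((n_items, n_ops)) < p_ones).astype(int)

# 1,000 matrices per condition, proportion of 1s varied from 30% to 70%
conditions = {p: [random_weight_matrix(23, 5, p) for _ in range(1000)]
              for p in (0.3, 0.4, 0.5, 0.6, 0.7)}

# Average empirical proportion of 1s in one condition, close to the 0.5 target
print(np.mean([m.mean() for m in conditions[0.5]]))
```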

Weight matrices created completely at random are a rather low benchmark for validity. Therefore, a second scenario was implemented: the new weight matrices were not sampled randomly; instead, the original matrices Q_{R} and Q_{L} were taken and modified to introduce some amount of "randomness," or noise, into them. For this purpose, a varying number of rows (representing the items) of Q_{R} and Q_{L} were misplaced.

Misplacing the rows of the listening weight matrix Q_{L} allowed for 23! ≈ 2.585202e+22 permutations, while the reading test weight matrix Q_{R} allowed 17! ≈ 3.556874e+14 permutations. Since it was impossible to analyze all the permutations, a random sample of 1,000 permutations was selected to form 1,000 modified weight matrices Q_{M}. As in Scenario 1, these matrices were used for an LLTM analysis of the original empirical data sets, and the correlations of the item parameters as well as the LR tests and information criteria were computed.
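One simple way to misplace a fixed number of rows is to cyclically shift a randomly chosen subset of rows among themselves, so that every chosen row ends up in a new position (the matrix below is a stand-in with distinct rows, not the actual Q_{L}):

```python
import numpy as np

rng = np.random.default_rng(7)

def misplace_rows(Q, n_misplaced):
    """Return a copy of Q with n_misplaced randomly chosen rows
    cyclically shifted among themselves."""
    Q_new = Q.copy()
    rows = rng.choice(len(Q), size=n_misplaced, replace=False)
    Q_new[rows] = Q[np.roll(rows, 1)]  # each chosen row receives its neighbor's row
    return Q_new

Q = np.arange(23 * 5).reshape(23, 5)  # stand-in matrix with 23 distinct rows
Q_small = misplace_rows(Q, 3)         # a "small" perturbation for a 23-row matrix

# Number of rows that actually changed
print(int((Q_small != Q).any(axis=1).sum()))  # 3
```

Repeating this for varying numbers of misplaced rows yields the small, medium, and large perturbation conditions described below.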

Table

Scenarios 1 and 2 aimed to provide a "lower benchmark," i.e., to show how high the correlations between item parameters can be with merely "randomized" or "partly randomized" weight matrices. With Scenario 3, we wanted to get an impression of the upper benchmark. Therefore, we studied how high the correlations could be if the weight matrix is "perfectly" specified.

The starting point was again the original weight matrices Q_{R} and Q_{L}. The estimated basic parameters of the empirical analyses were taken and the item parameters were reconstructed. Subsequently, 1,000 data sets (with dimensions equal to those of the empirical data sets) were simulated on the basis of these reconstructed item parameters. That is, the data sets were generated on the basis of the LLTM, which means that they had a "perfect" fit (apart from sampling error) to the given weight matrices. Again, these 1,000 data sets were analyzed using the LLTM with Q_{R} and Q_{L}. The correlations between the item parameters, the LR tests, AIC, and BIC were computed. The estimation method for the Rasch models and the LLTM was the conditional maximum likelihood (CML) method.
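Scenario 3's data generation can be sketched like this; the person sample size, matrix dimensions, and parameter values are hypothetical placeholders:

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate_rasch(theta, beta):
    """Simulate a persons-by-items 0/1 response matrix under the Rasch model."""
    p = 1 / (1 + np.exp(-(theta[:, None] - beta[None, :])))
    return (rng.random(p.shape) < p).astype(int)

# Item parameters implied by a weight matrix and basic parameters (hypothetical)
Q = rng.integers(0, 2, size=(23, 5))
eta = rng.normal(size=5)
beta = Q @ eta                      # LLTM-reconstructed item difficulties

theta = rng.normal(size=500)        # person abilities
data = simulate_rasch(theta, beta)  # one data set with "perfect" LLTM fit

print(data.shape)  # (500, 23)
```

Refitting the LLTM to 1,000 such data sets shows how high the item parameter correlations can get when the weight matrix is correctly specified.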

As mentioned in the previous section, two empirical examples were chosen for which LLTM analyses had already been conducted. The results in both cases showed a significant LR test. The correlation between the RM item parameters and the item parameters reconstructed by the LLTM was

The proportion of 1′s in the simulated weight matrices was varied from 30% to 70%. The descriptive statistics for the correlations between the item parameters from the Rasch model and the LLTM are shown in Table

Descriptive statistics for the correlations obtained from simulated weight matrices.

Condition | Min | 5% | 25% | Median | Mean | 75% | 95% | Max
Listening 30 | 0.1503 | 0.2892 | 0.4142 | 0.5031 | 0.5013 | 0.5890 | 0.7111 | 0.8690
Listening 40 | 0.1532 | 0.2905 | 0.4158 | 0.5075 | 0.5020 | 0.5921 | 0.7073 | 0.8819
Listening 50 | 0.1506 | 0.2960 | 0.4420 | 0.5176 | 0.5117 | 0.6054 | 0.7030 | 0.8870
Listening 60 | 0.0647 | 0.2949 | 0.4107 | 0.5037 | 0.5016 | 0.5880 | 0.7096 | 0.8252
Listening 70 | 0.1255 | 0.2891 | 0.4161 | 0.5036 | 0.5042 | 0.5925 | 0.7071 | 0.8377
Reading 30 | 0.1100 | 0.3098 | 0.4403 | 0.5507 | 0.5448 | 0.6495 | 0.7670 | 0.8955
Reading 40 | 0.1275 | 0.2879 | 0.4355 | 0.5409 | 0.5386 | 0.6467 | 0.7696 | 0.8965
Reading 50 | 0.1183 | 0.2823 | 0.4288 | 0.5434 | 0.5350 | 0.6375 | 0.7874 | 0.9292
Reading 60 | 0.1400 | 0.2882 | 0.4368 | 0.5448 | 0.5367 | 0.6329 | 0.7606 | 0.9105
Reading 70 | 0.0716 | 0.2762 | 0.4383 | 0.5337 | 0.5347 | 0.6417 | 0.7674 | 0.8919

Besides the correlations, the LR test comparing the likelihood of the LLTM to that of the Rasch model was performed for each of the randomized weight matrices. As expected, all LR tests were significant (

Descriptive statistics of the information criteria AIC and BIC for the LLTM analysis based on the simulated weight matrices.

Condition | AIC min | AIC mean | AIC max | AIC favors LLTM | BIC min | BIC mean | BIC max | BIC favors LLTM
Listening 30 | 5,456 | 6,399 | 6,829 | 0 | 5,478 | 6,421 | 6,852 | 0
Listening 40 | 5,432 | 6,400 | 6,836 | 0 | 5,454 | 6,423 | 6,859 | 0
Listening 50 | 5,418 | 6,383 | 6,827 | 0 | 5,441 | 6,406 | 6,850 | 0
Listening 60 | 5,628 | 6,400 | 6,866 | 0 | 5,651 | 6,422 | 6,888 | 0
Listening 70 | 5,557 | 6,397 | 6,849 | 0 | 5,580 | 6,419 | 6,871 | 0
Reading 30 | 5,673 | 5,976 | 6,169 | 0 | 5,693 | 5,996 | 6,189 | 0
Reading 40 | 5,674 | 5,979 | 6,162 | 0 | 5,694 | 5,999 | 6,182 | 0
Reading 50 | 5,686 | 5,986 | 6,161 | 0 | 5,706 | 6,006 | 6,181 | 0
Reading 60 | 5,668 | 6,002 | 6,182 | 0 | 5,668 | 6,002 | 6,182 | 0
Reading 70 | 5,679 | 5,982 | 6,170 | 0 | 5,699 | 6,002 | 6,190 | 0

For Scenario 2, the rows of the empirical weight matrices Q_{L} and Q_{R} were misplaced, and the results are presented by the amount of perturbation imposed. Three conditions of small, medium, and large perturbation, depending on the number of rows misplaced, were simulated. A small perturbation means that the new weight matrix is very similar to the original one, whereas a large perturbation means that the weight matrix resembles a completely randomized weight matrix, as in Scenario 1. For the listening test with 23 rows, displacements of 2 to 7 rows were considered "small," 8 to 15 "medium," and 16 to 23 "large." For the reading test with 17 rows, displacements of 2 to 6 rows were considered "small," 7 to 12 "medium," and 13 to 17 "large." The item parameter correlations of Scenario 2 are shown in Table

Descriptive statistics for the correlations obtained from the perturbed weight matrices.

Condition | n | Min | 5% | 25% | Median | Mean | 75% | 95% | Max
Listening small | 91 | 0.6610 | 0.7572 | 0.7780 | 0.8337 | 0.8177 | 0.8487 | 0.8626 | 0.8734
Listening medium | 336 | 0.3414 | 0.3814 | 0.4906 | 0.5649 | 0.5790 | 0.6746 | 0.7604 | 0.8138
Listening large | 573 | 0.3414 | 0.3630 | 0.4229 | 0.4942 | 0.4805 | 0.5276 | 0.5744 | 0.6359
Reading small | 137 | 0.5239 | 0.5360 | 0.5853 | 0.6570 | 0.6529 | 0.7111 | 0.7681 | 0.7966
Reading medium | 419 | 0.2956 | 0.4547 | 0.5327 | 0.5837 | 0.5880 | 0.6382 | 0.7398 | 0.8195
Reading large | 444 | 0.1634 | 0.3117 | 0.3800 | 0.4493 | 0.4470 | 0.5171 | 0.5926 | 0.6480

Descriptive statistics for AIC and BIC for the LLTM analysis based on the perturbed weight matrices.

Condition | AIC min | AIC mean | AIC max | AIC favors LLTM | BIC min | BIC mean | BIC max | BIC favors LLTM
Listening small | 5,461 | 5,722 | 6,172 | 0 | 5,483 | 5,744 | 6,195 | 0
Listening medium | 5,772 | 6,328 | 6,659 | 0 | 5,795 | 6,350 | 6,681 | 0
Listening large | 6,146 | 6,407 | 6,650 | 0 | 6,168 | 6,429 | 6,672 | 0
Reading small | 5,779 | 5,907 | 6,009 | 0 | 5,799 | 5,927 | 6,029 | 0
Reading medium | 5,753 | 5,959 | 6,120 | 0 | 5,773 | 5,979 | 6,140 | 0
Reading large | 5,921 | 6,049 | 6,158 | 0 | 5,941 | 6,068 | 6,178 | 0

For the LLTM-fitting data sets generated on the basis of the empirical weight matrices, the results of the item parameter correlations are displayed in Table

Descriptive statistics for the correlations for the simulated LLTM-fitting data sets based on the theoretical weight matrices Q_{L} and Q_{R}.

Condition | Min | 5% | 25% | Median | Mean | 75% | 95% | Max
Listening | 0.9875 | 0.9917 | 0.9938 | 0.9951 | 0.9949 | 0.9962 | 0.9975 | 0.9984
Reading | 0.9472 | 0.9736 | 0.9815 | 0.9860 | 0.9851 | 0.9898 | 0.9939 | 0.9972

Table

In addition, the information criteria AIC and BIC were calculated for each of the simulated data sets. Because the comparison of AICs and BICs for model selection is only possible for identical data sets, the AICs and BICs of the Rasch model and the LLTM were compared for each data set separately. That is, for each data set, the Rasch model and the LLTM were estimated and their AICs and BICs were compared. For the data sets simulated on the basis of the empirical weight matrix of the Listening data, the AIC favored the LLTM in 6 of 1,000 data sets, while the BIC always preferred the RM. For the six data sets where the AIC favored the LLTM, the difference between the AICs was very small (with a maximum difference of 4.64). For the data sets simulated according to the empirical weight matrix of the Reading test, very similar results were found: the BIC of the LLTM was never smaller than that of the Rasch model, that is, the BIC always preferred the Rasch model. According to the AIC, in 22 of 1,000 data sets the AIC was smaller for the LLTM than for the Rasch model, with a maximum difference of 12.47 between the two AIC values.

In this article a method for evaluating the weight matrix in the linear logistic test model was proposed. The method is based on parallel analysis suggested by Horn (

PB developed the idea and wrote the theoretical part of the manuscript. CH performed the simulation studies.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The Supplementary Material for this article can be found online at:

^{1}IRT models, in general, are not designed to reflect the cognitive processes which underlie the responses. These models are commonly based on mathematical or measurement properties, such as sufficiency, separation, and additivity, and are not linked with the item response generation mechanisms (van der Maas et al.,

^{2}For the Likelihood-Ratio test formula, the typical notation in the LLTM literature is used here (see Fischer,

^{3}To perform our method we have written an R package that can be downloaded under