Edited by: Jason W. Osborne, Old Dominion University, USA
Reviewed by: Andrew Jones, American Board of Surgery, USA; Evgueni Borokhovski, Concordia University, Canada
*Correspondence: Ali Ünlü, Chair for Methods in Empirical Educational Research, TUM School of Education and Centre for International Student Assessment, Technische Universität München, Lothstrasse 17, 80335 Munich, Germany. e-mail:
This article was submitted to Frontiers in Quantitative Psychology and Measurement, a specialty of Frontiers in Psychology.
This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.
A personal trait, for example a person’s cognitive ability, represents a theoretical concept postulated to explain behavior. Interesting constructs are latent, that is, they cannot be observed. Latent variable modeling constitutes a methodology to deal with hypothetical constructs. Constructs are modeled as random variables and become components of a statistical model. As random variables, they possess a probability distribution in the population of reference. In applications, this distribution is typically assumed to be the normal distribution. The normality assumption may be reasonable in many cases, but there are situations where it cannot be justified. For example, this is true for criterion-referenced tests or for background characteristics of students in large scale assessment studies. Nevertheless, the normal procedures in combination with the classical factor analytic methods are frequently pursued, although the effects of violating this “implicit” assumption are not clear in general. In a simulation study, we investigate whether classical factor analytic approaches can be instrumental in estimating the factorial structure and properties of the population distribution of a latent personal trait from educational test data, when violations of classical assumptions such as the aforementioned are present. The results indicate that having a latent non-normal distribution clearly affects the estimation of the distribution of the factor scores and properties thereof. Thus, when the population distribution of a personal trait is assumed to be non-symmetric, we recommend avoiding those factor analytic approaches for estimation of a person’s factor score, even though the number of extracted factors and the estimated loading matrix may not be strongly affected. An application to the Progress in International Reading Literacy Study (PIRLS) is given. Comments on possible implications for the Programme for International Student Assessment (PISA) complete the presentation.
Educational research is concerned with the study of processes of learning and teaching. Typically, the investigated processes are not observable, and to unveil these, manifest human behavior in test situations is recorded. According to Lienert and Raatz (
In this paper we deal with factor analytic methods for assessing construct validity of a test, in the sense of its factorial validity (e.g., Cronbach and Meehl,
A second objective of this paper is to examine the scope of these classical methods for estimating the probability distribution of latent ability values or properties thereof postulated in a population under investigation, especially when this distribution is skewed (and not normal). In applied educational contexts, for instance, this is not an uncommon practice. A critical evaluation of this use of classical factor analytic methods for estimating distributional properties of ability is therefore important, and we provide one with the simulation study in this paper, in which metric scale (i.e., at least interval scale; not dichotomous) items are used.
The results of the simulation study indicate that having a non-normal distribution for latent variables does not strongly affect the number of extracted factors and the estimation of the loading matrix. However, as shown in this paper, it clearly affects the estimation of the latent factor score distribution and properties thereof (e.g., skewness).
More precisely, the “estimation accuracy” for factorial structure of these models is shown to be worse when the assumption of interval-scaled data is not met or item statistics are skewed. This corroborates related findings published in other works, which we briefly review later in this paper. More importantly, the empirical distribution of estimated latent ability values is biased compared to the true distribution (i.e., estimates deviate from the true values) when population abilities follow a skewed distribution. It seems therefore that classical factor analytic procedures, even though they are performed with metric (instead of non-metric) scale indicator variables, are not appropriate approaches to ability estimation when population ability values with a skewed distribution are to be estimated.
Why should that be of interest? In large scale assessment studies such as the Programme for International Student Assessment (PISA)
The paper is structured as follows. We introduce the considered classical factor analysis models in Section
We consider the method of principal component analysis on the one hand, and the method of exploratory factor and principal axis analysis on the other. At this point recall Footnote 1, where we clarified that, strictly speaking, principal component analysis is not factor analysis and that principal axis analysis is a specific method for estimating the exploratory factor analysis model. Despite this, for the sake of simplicity and for our purposes and analyses, we call these approaches collectively factor analysis/analytic methods or even models. For a more detailed discussion of these methods, see Bartholomew et al. (
Our study shows, among other things, that the purely computational dimensionality reduction method PCA performs surprisingly well compared to the results obtained with the latent variable models EFA and PAA. This is important, because applied researchers often use PCA in situations where factor analysis more closely matches their purpose of analysis. In general, such computational procedures as PCA are easy to use. Moreover, the comparison of EFA (based on ML) with PAA (eigenstructure of the reduced correlation matrix based on communality estimates) in this paper represents an evaluation of different estimation procedures for the classical factor analysis model. This comparison of the two estimation procedures seems to be justified and interesting, as the (manifest) normality assumption in the observed indicators for the ML procedure is violated, both in the simulation study and in the empirical large scale assessment PIRLS application. At this point, see also Footnote 1.
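The distinction between PCA (eigenstructure of the full correlation matrix) and PAA (eigenstructure of the reduced correlation matrix) can be illustrated with a minimal sketch. The function names, the toy loading matrix, and the use of squared multiple correlations as communality estimates are assumptions made for illustration only; the sketch performs a single, non-iterated extraction step and is not meant to reproduce the SAS PROC FACTOR runs used in this paper.

```python
import numpy as np

def pca_loadings(R, k):
    """Loadings of the first k principal components of a correlation matrix R."""
    eigvals, eigvecs = np.linalg.eigh(R)           # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]          # indices of the k largest
    return eigvecs[:, order] * np.sqrt(eigvals[order])

def paa_loadings(R, k):
    """One-step principal axis loadings from the reduced correlation matrix."""
    # Assumption: squared multiple correlations serve as communality estimates.
    smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    R_reduced = R.copy()
    np.fill_diagonal(R_reduced, smc)               # replace unities by communalities
    eigvals, eigvecs = np.linalg.eigh(R_reduced)
    order = np.argsort(eigvals)[::-1][:k]
    # A reduced matrix can have small negative eigenvalues; clip them at zero.
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))

# Hypothetical population correlation matrix: 6 variables, 2 orthogonal factors.
L_true = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                   [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
R = L_true @ L_true.T
np.fill_diagonal(R, 1.0)

L_pca = pca_loadings(R, 2)
L_paa = paa_loadings(R, 2)
print("PCA sum of squared loadings:", round(float(np.sum(L_pca**2)), 3))
print("PAA sum of squared loadings:", round(float(np.sum(L_paa**2)), 3))
```

Because PCA analyzes the total variance (unities on the diagonal) whereas PAA analyzes only the estimated common variance, the PCA loadings account for more variance in this sketch than the PAA loadings.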
The model of principal component analysis (PCA) is
In principal component analysis we assume that
The relevance of the assumption of interval-scaled variables for classical factor analytic approaches is the subject matter of various research works, which we briefly discuss later in this paper.
The model of exploratory factor analysis (EFA) is
In exploratory factor analysis, we assume that
Under this orthogonal factor model,
This decomposition is utilized by the methods of unweighted least squares (ULS), generalized least squares (GLS), or maximum likelihood (ML) for the estimation of
When applying this exploratory factor analysis,
Another possibility of estimation for the EFA model is principal axis analysis (PAA). The model of PAA is
The assumptions of principal axis analysis are
Two remarks are important before we discuss the assumptions associated with the classical factor models in the next section.
First, it can be shown that
Second, the criterion used to determine the number of factors extracted from the data must be distinguished as well. In practice, not all
The three models described in the previous section in particular assume interval-scaled data and full rank covariance or correlation matrices for the manifest variables. Typically in the exploratory factor analysis model, the manifest variables
The question now arises whether these assumptions are critical for educational tests or survey data.
From the perspective of applying these models to data of criterion-referenced tests, the last three of the above mentioned assumptions are less problematic. For a criterion-referenced test, it is important that all items of the test are valid for the investigated content. As such, the usual way of excluding items from the analysis when the covariance or correlation matrices are not of full rank does not work for criterion-referenced tests, because this can reduce content validity of a test. A similar argument applies to the assumption of substantially large variances of the manifest variables. As Klauer (
The assumption of interval-scaled data and the normality assumption for the manifest test and latent ability scores may also be crucial for the scaling of cognitive data in PISA (OECD,
The primary aim is to review results of previous studies focusing on the impact of violations of model assumptions. To our knowledge, such studies did not systematically vary the distributions of the factors (in the case of continuous data as well) and primarily investigated the impact of categorical data (however, not varying the latent distributions for the factors). Reviewing results of previous simulation studies based on continuous indicator variables that have compared different estimation methods (including PCA) and different methods for determining the number of factors would, to our knowledge, not have constituted a review of the relevant literature focusing primarily on violations of the assumptions associated with those models.
Literature on classical factor models has in particular investigated violations of the assumption of interval-scaled data. In classical factor analysis, Green (
Carroll (
Clarification for findings in Green (
Muthén (
We add to and extend this literature by investigating whether the classical factor analysis models can reasonably unveil the factorial structure or properties of the population latent ability distribution in educational test data (e.g., obtained from criterion-referenced tests) when the assumption of normality of the latent variables may not be justified. None of the studies mentioned above has investigated the “true distribution impact” in these problems.
A simulation study is used to evaluate the performances of the classical factor analytic approaches when the latent variables are not normally distributed.
True factorial structures under the exploratory factor analysis model are simulated, that is, the values of
Note that in the simulation study metric scale, not dichotomous, items are analyzed. This can be viewed as a baseline informative for the dichotomous indicator case as well (cf. Section
The present simulation study particularly aims at analyzing and answering such questions as:
To what extent does the estimation accuracy for factorial structure of the classical factor analysis models depend on the skewness of the population latent ability distribution?
Are there specific aspects of the factorial structure or latent ability distribution with respect to which the classical factor analysis models are more or less robust in estimation when true ability values are skewed?
Given a skewed population ability distribution does the estimation accuracy for factorial structure of the classical factor analysis models depend on the extraction criterion applied for determining the number of factors from the data?
Can person ability scores estimated under classical factor analytic approaches be representative of the true ability distribution or properties thereof when this distribution is skewed?
Or equivalently,
Hence the univariate skewness
In the simulation study, the exploratory factor analysis model with orthogonal factors (cov(
Mattson’s method is used to specify such settings for the simulation study as they may be observed in large scale assessment data. The next section describes this in detail.
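To illustrate how skewness at the latent level carries over to the manifest level under an orthogonal factor model, the following sketch uses the standard identity that third cumulants add over independent summands. It is not necessarily the exact expression used in this paper; the function names, the toy loadings, the symmetric unique factors, and the negated gamma distribution for the factors are all assumptions made for illustration.

```python
import numpy as np

def manifest_skewness(loadings, factor_skew, unique_var, unique_skew=None):
    """Skewness of y_i = sum_j l_ij * w_j + e_i for independent, unit-variance w_j."""
    loadings = np.asarray(loadings, dtype=float)
    factor_skew = np.asarray(factor_skew, dtype=float)
    unique_var = np.asarray(unique_var, dtype=float)
    if unique_skew is None:
        unique_skew = np.zeros_like(unique_var)    # assumption: symmetric errors
    # Third cumulants of independent summands add up.
    third = loadings**3 @ factor_skew + unique_skew * unique_var**1.5
    variance = np.sum(loadings**2, axis=1) + unique_var
    return third / variance**1.5

def sample_skewness(x):
    """Column-wise sample skewness (third standardized moment)."""
    z = (x - x.mean(axis=0)) / x.std(axis=0)
    return (z**3).mean(axis=0)

# Hypothetical setting: 4 manifest variables, 2 orthogonal factors.
loadings = np.array([[0.7, 0.0], [0.5, 0.3], [0.0, 0.6], [0.2, 0.4]])
unique_var = 1.0 - np.sum(loadings**2, axis=1)     # yields unit-variance y_i
factor_skew = np.array([-1.0, -1.0])
theory = manifest_skewness(loadings, factor_skew, unique_var)

# Monte Carlo check with standardized, negated gamma factors,
# whose skewness is -2 / sqrt(shape), i.e., -1 for shape = 4.
rng = np.random.default_rng(0)
n, shape = 200_000, 4.0
omega = -(rng.gamma(shape, 1.0, size=(n, 2)) - shape) / np.sqrt(shape)
errors = rng.standard_normal((n, 4)) * np.sqrt(unique_var)
y = omega @ loadings.T + errors
empirical = sample_skewness(y)
print(np.round(theory, 3), np.round(empirical, 3))
```

In this sketch the manifest skewness values are much closer to zero than the factor skewness of −1, which is in line with the small manifest skewness values reported for the simulation conditions.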
The number of manifest variables was fixed to
We decided to analyze released items of the PIRLS 2006 study (IEA,
We decided to simulate under three conditions for the distributions of ω. Under the first condition, ω
Normal: skewness | Normal: kurtosis β_{2i} | Slightly skewed: skewness | Slightly skewed: kurtosis β_{2i} | Strongly skewed: skewness | Strongly skewed: kurtosis β_{2i} |
---|---|---|---|---|---|
0 | 3 | −0.060 | 3 | −0.599 | 4.202 |
0 | 3 | −0.049 | 3 | −0.488 | 3.914 |
0 | 3 | −0.038 | 3 | −0.377 | 3.649 |
0 | 3 | −0.027 | 3 | −0.272 | 3.420 |
0 | 3 | −0.018 | 3 | −0.179 | 3.240 |
0 | 3 | −0.010 | 3 | −0.102 | 3.114 |
0 | 3 | −0.049 | 3 | −0.488 | 3.914 |
0 | 3 | −0.038 | 3 | −0.377 | 3.649 |
0 | 3 | −0.027 | 3 | −0.272 | 3.420 |
0 | 3 | −0.018 | 3 | −0.179 | 3.240 |
0 | 3 | −0.010 | 3 | −0.102 | 3.114 |
0 | 3 | −0.005 | 3 | −0.047 | 3.041 |
0 | 3 | −0.027 | 3 | −0.272 | 3.420 |
0 | 3 | −0.027 | 3 | −0.272 | 3.420 |
0 | 3 | −0.018 | 3 | −0.179 | 3.240 |
0 | 3 | −0.010 | 3 | −0.102 | 3.114 |
0 | 3 | −0.010 | 3 | −0.102 | 3.114 |
0 | 3 | −0.005 | 3 | −0.047 | 3.041 |
0 | 3 | −0.027 | 3 | −0.272 | 3.420 |
0 | 3 | −0.018 | 3 | −0.179 | 3.240 |
0 | 3 | −0.018 | 3 | −0.179 | 3.240 |
0 | 3 | −0.010 | 3 | −0.102 | 3.114 |
0 | 3 | −0.005 | 3 | −0.047 | 3.041 |
0 | 3 | −0.005 | 3 | −0.047 | 3.041 |
Normal: skewness | Normal: kurtosis β_{2i} | Slightly skewed: skewness | Slightly skewed: kurtosis β_{2i} | Strongly skewed: skewness | Strongly skewed: kurtosis β_{2i} |
---|---|---|---|---|---|
0 | 3 | −0.060 | 3 | −0.599 | 4.202 |
0 | 3 | −0.049 | 3 | −0.488 | 3.914 |
0 | 3 | −0.038 | 3 | −0.377 | 3.649 |
0 | 3 | −0.049 | 3 | −0.488 | 3.914 |
0 | 3 | −0.049 | 3 | −0.488 | 3.914 |
0 | 3 | −0.038 | 3 | −0.377 | 3.649 |
0 | 3 | −0.049 | 3 | −0.488 | 3.914 |
0 | 3 | −0.038 | 3 | −0.377 | 3.649 |
0 | 3 | −0.027 | 3 | −0.272 | 3.420 |
0 | 3 | −0.038 | 3 | −0.377 | 3.649 |
0 | 3 | −0.038 | 3 | −0.377 | 3.649 |
0 | 3 | −0.038 | 3 | −0.377 | 3.649 |
0 | 3 | −0.038 | 3 | −0.377 | 3.649 |
0 | 3 | −0.027 | 3 | −0.272 | 3.420 |
0 | 3 | −0.027 | 3 | −0.272 | 3.420 |
0 | 3 | −0.027 | 3 | −0.272 | 3.420 |
0 | 3 | −0.027 | 3 | −0.272 | 3.420 |
0 | 3 | −0.018 | 3 | −0.179 | 3.240 |
0 | 3 | −0.018 | 3 | −0.179 | 3.240 |
0 | 3 | −0.010 | 3 | −0.102 | 3.114 |
0 | 3 | −0.010 | 3 | −0.102 | 3.114 |
0 | 3 | −0.010 | 3 | −0.102 | 3.114 |
0 | 3 | −0.010 | 3 | −0.102 | 3.114 |
0 | 3 | −0.005 | 3 | −0.047 | 3.041 |
Under the slightly skewed distribution condition, the theoretical values of skewness for the manifest variables range between −0.060 and −0.005, a condition that captured approximately 20% of the considered PIRLS test items. Under the strongly skewed distribution condition, the theoretical values of skewness lie between −0.599 and −0.047, a condition that covered circa 30% of the PIRLS items (cf. Figure
How to generate variates ω
Besides the number of factors and the distributions of the latent variables, sample size was varied. In the small sample case, every
Sample size | Number of factors | Normal | Slightly skewed | Strongly skewed |
---|---|---|---|---|
200 | 4 | 100 | 100 | 100 |
200 | 8 | 100 | 100 | 100 |
600 | 4 | 100 | 100 | 100 |
600 | 8 | 100 | 100 | 100 |
Each of the 1,200 generated data sets was analyzed using all of the models of principal component analysis, exploratory factor analysis (ML estimation), and principal axis analysis, each together with a varimax rotation (Kaiser,
The criteria for evaluating the performance of the classical factor models are the number of extracted factors (as compared to true dimensionality), the skewness of the estimated latent ability distribution, and the discrepancy between the estimated and the true loading matrix. The latter two criteria are computed using the true number of factors. Furthermore, Shapiro-Wilk tests for assessing normality of the ability estimates are presented and distributions of the estimated and true factor scores are compared.
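The skewness and normality criteria can be sketched as follows for a single vector of estimated factor scores. The helper name `summarize_scores`, the significance level of 0.05, and the two toy score vectors are assumptions for illustration; `scipy.stats.skew` and `scipy.stats.shapiro` are used here in place of the software actually employed in this paper.

```python
import numpy as np
from scipy import stats

def summarize_scores(scores, alpha=0.05):
    """Empirical skewness and Shapiro-Wilk test for one vector of factor scores."""
    scores = np.asarray(scores, dtype=float)
    w_stat, p_value = stats.shapiro(scores)        # W statistic and p-value
    return {
        "skewness": float(stats.skew(scores)),
        "W": float(w_stat),
        "p_value": float(p_value),
        # alpha is a hypothetical choice, not taken from the paper
        "reject_normality": bool(p_value < alpha),
    }

rng = np.random.default_rng(1)
# Toy "estimated scores": one normal sample, one negatively skewed sample.
normal_scores = rng.standard_normal(600)
skewed_scores = -(rng.gamma(shape=2.0, scale=1.0, size=600) - 2.0) / np.sqrt(2.0)

normal_summary = summarize_scores(normal_scores)
skewed_summary = summarize_scores(skewed_scores)
print(normal_summary)
print(skewed_summary)
```

In a simulation of the kind described here, such a summary would be computed per data set and per factor, and the resulting skewness values and rejection rates would then be compared across conditions.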
For the skewness criterion, under a factor model and a simulation condition, the factor scores on a factor were computed for each data set, and their empirical skewness was the value used and plotted for this data set. For the discrepancy criterion, under a factor model and a simulation condition, for any data set
In addition to calculating estimated factor score skewness values, we also tested for univariate normality of the estimated factor scores. We used the Shapiro-Wilk test statistic
We present the results of our simulation study.
Figure
When sample size is increased to
Figure
Increasing sample size from
To sum up, we suppose that the “number of factors extracted” is relatively robust to the extent to which the latent ability values are skewed. Another observation is that the parallel analysis method seems to outperform the scree test and the Kaiser-Guttman criterion when it comes to detecting the number of underlying factors.
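Two of the extraction criteria compared above can be sketched in a few lines. The version of parallel analysis shown here (eigenvalues of the full correlation matrix, normal random data, 95% quantile threshold) is one common variant and an assumption on our part; the function names, defaults, and the toy data are likewise hypothetical and need not match the settings used in this paper.

```python
import numpy as np

def sorted_eigenvalues(data):
    """Eigenvalues of the sample correlation matrix in descending order."""
    R = np.corrcoef(data, rowvar=False)
    return np.sort(np.linalg.eigvalsh(R))[::-1]

def kaiser_guttman(data):
    """Number of eigenvalues of the correlation matrix greater than one."""
    return int(np.sum(sorted_eigenvalues(data) > 1.0))

def parallel_analysis(data, n_sims=100, quantile=0.95, seed=0):
    """Retain leading eigenvalues that exceed those of random normal data."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    observed = sorted_eigenvalues(data)
    simulated = np.empty((n_sims, p))
    for s in range(n_sims):
        simulated[s] = sorted_eigenvalues(rng.standard_normal((n, p)))
    threshold = np.quantile(simulated, quantile, axis=0)
    exceeds = observed > threshold
    # Count leading eigenvalues up to the first one that fails the threshold.
    return int(p if exceeds.all() else np.argmin(exceeds))

# Toy data: 600 cases, 12 variables, 2 orthogonal factors with loadings 0.7.
rng = np.random.default_rng(42)
n, p, k = 600, 12, 2
loadings = np.zeros((p, k))
loadings[:6, 0] = 0.7
loadings[6:, 1] = 0.7
factors = rng.standard_normal((n, k))
errors = rng.standard_normal((n, p)) * np.sqrt(1.0 - 0.7**2)
data = factors @ loadings.T + errors

n_kg = kaiser_guttman(data)
n_pa = parallel_analysis(data)
print(n_kg, n_pa)
```

The toy structure is deliberately clean, so both criteria agree; differences between the criteria are expected to show up with weaker loadings, smaller samples, or more factors.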
Figure
When true latent ability values are slightly negative skewed,
If true latent ability values are strongly negative skewed,
To sum up, under the classical factor models, the “skewness of the estimated latent ability distribution” seems to be sensitive to the extent to which the latent ability values are skewed. It seems that the more the true latent ability values are skewed, the greater the overestimation of true skewness. In other words, strongly negatively skewed distributions may not be estimated without bias based on the classical factor models. Increasing sample size, for example from
We performed Shapiro-Wilk tests for univariate normality of the estimated factor scores. As can be seen from Figure
A similar conclusion can be drawn when the true latent ability values are not normally distributed but instead follow a slightly skewed distribution (Figure
The case of a strongly skewed factor score distribution is depicted in Figure
Finally, Figure
In Table
Sample size | Number of factors | Model | Normal | | Slightly skewed | | Strongly skewed | |
---|---|---|---|---|---|---|---|---|
200 | 4 | PCA^{a} | 0.143 | 0.156 | 0.143 | 0.156 | 0.158 | 0.173 |
 | | EFA^{b} | 0.129 | 0.142 | 0.124 | 0.136 | 0.141 | 0.154 |
 | | PAA^{c} | 0.128 | 0.139 | 0.124 | 0.136 | 0.137 | 0.150 |
600 | 4 | PCA | 0.076 | 0.087 | 0.075 | 0.086 | 0.091 | 0.106 |
 | | EFA | 0.063 | 0.072 | 0.062 | 0.072 | 0.080 | 0.095 |
 | | PAA | 0.066 | 0.075 | 0.064 | 0.074 | 0.082 | 0.096 |
200 | 8 | PCA | 0.165 | 0.169 | 0.162 | 0.166 | 0.172 | 0.176 |
 | | EFA | 0.154 | 0.157 | 0.152 | 0.155 | 0.156 | 0.159 |
 | | PAA | 0.135 | 0.138 | 0.134 | 0.138 | 0.143 | 0.146 |
600 | 8 | PCA | 0.119 | 0.123 | 0.118 | 0.123 | 0.125 | 0.130 |
 | | EFA | 0.106 | 0.112 | 0.107 | 0.112 | 0.115 | 0.120 |
 | | PAA | 0.097 | 0.101 | 0.095 | 0.099 | 0.102 | 0.105 |
Deviations of the estimated loading matrix from the true loading matrix can also be quantified and visualized at the level of individual absolute differences
The majority of the absolute differences lies in the range from 0 to circa 0.20. Larger absolute differences between the estimated and true factor loadings occurred rather rarely. It is also apparent that the 36 distributions hardly differ. This observation suggests that the effects or impacts of sample size, true number of factors, and the latent ability distribution on the accuracy of the classical factor models for estimating the factor loadings are rather weak. In that sense, estimation of the loading matrix seems to be robust overall. In our simulation study, we were not able to see a clear relationship between the distribution of the latent ability values and the discrepancy between the estimated and the true loading matrix.
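Element-wise absolute differences between an estimated and a true loading matrix can be computed as in the following sketch. Because loading matrices are only determined up to rotation, the sketch first aligns the estimate with the true matrix by an orthogonal Procrustes rotation; this alignment step is a technique swapped in here for illustration and is not claimed to be the procedure used in this paper. All names and the toy matrices are hypothetical.

```python
import numpy as np

def align_loadings(L_est, L_true):
    """Orthogonally rotate L_est as close as possible to L_true (Frobenius norm)."""
    # Orthogonal Procrustes solution: rotation U @ Vt from the SVD of L_est' L_true.
    U, _, Vt = np.linalg.svd(L_est.T @ L_true)
    return L_est @ (U @ Vt)

def abs_differences(L_est, L_true):
    """Absolute differences between aligned estimated and true loadings."""
    return np.abs(align_loadings(L_est, L_true) - L_true)

# Toy example: a true loading matrix, arbitrarily rotated and slightly perturbed.
rng = np.random.default_rng(3)
L_true = np.array([[0.8, 0.1], [0.7, 0.0], [0.6, 0.2],
                   [0.1, 0.8], [0.0, 0.7], [0.2, 0.6]])
theta = 0.6
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
L_est = L_true @ Q + rng.normal(scale=0.05, size=L_true.shape)

diffs = abs_differences(L_est, L_true)
print("mean absolute difference:", round(float(diffs.mean()), 3))
print("max absolute difference:", round(float(diffs.max()), 3))
```

Without the alignment step, the rotated estimate would appear to deviate strongly from the true loadings even though it describes the same factor space.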
In addition to the simulation study, the classical factor analytic approaches are also compared on the part of PIRLS 2006 data that we presented in Section
Note that in the empirical application dichotomized multi-category items are analyzed. In practice, large scale assessment data are discrete and not continuous. Yet, the metric scale indicator case considered in the simulation study can serve as an informative baseline; for instance (issue of polychoric approximation) to the extent that a product-moment correlation is a valid representation of bivariate relationships among interval-scaled variables (e.g., Flora et al.,
In PIRLS 2006, four sorts of items were constructed and used for assigning “plausible values” to students (for details, see Martin et al.,
A total of
Extraction method | PCA^{a} | EFA^{b} | PAA^{c} |
---|---|---|---|
Kaiser–Guttman criterion | 6 | 6 | 6 |
Scree test | 1 | 1 | 1 |
Parallel analysis method | 1 | 4 | 4 |
The situation at this point is comparable to what we have reported in simulation in Figure
The varimax rotated loading matrices for the exploratory factor analysis and principal axis analysis models with four factors are reported in Tables
Item | Factor |
|||
---|---|---|---|---|
1 | 2 | 3 | 4 | |
R011A01C |
0.15 | 0.26 | −0.05 | |
R011A02M |
0.14 | 0.28 | 0.19 | |
R011A03C |
0.16 | 0.24 | 0.09 | 0.03 |
R011A04C |
0.19 | 0.10 | 0.06 | |
R011A05M |
0.22 | 0.08 | 0.19 | 0.21 |
R011A06M |
0.20 | 0.03 | 0.14 | 0.06 |
R011A07C |
0.20 | 0.22 | 0.15 | |
R011A08C |
0.04 | −0.09 | ||
R011A09C |
0.18 | 0.11 | 0.00 | |
R011A10M |
0.28 | 0.27 | 0.22 | 0.11 |
R011A11C |
0.06 | 0.14 | 0.02 | |
R021E01M |
0.08 | 0.19 | −0.06 | |
R021E02M |
0.02 | 0.09 | 0.24 | |
R021E03M |
0.14 | −0.02 | 0.02 | |
R021E04M |
0.17 | 0.28 | 0.15 | 0.02 |
R021E05C |
0.22 | 0.23 | 0.12 | |
R021E06M |
0.17 | 0.09 | 0.28 | |
R021E07C |
0.13 | 0.06 | 0.22 | |
R021E08M |
0.23 | 0.04 | ||
R021E09C |
0.24 | 0.02 | 0.20 | |
R021E10C |
0.27 | 0.23 | 0.17 | 0.07 |
R021E11M |
0.00 | 0.01 | 0.06 | |
R021E12C |
0.17 | 0.22 |
Item | Factor |
|||
---|---|---|---|---|
1 | 2 | 3 | 4 | |
R011A01C |
0.15 | 0.26 | −0.06 | |
R011A02M |
0.14 | 0.29 | 0.18 | |
R011A03C |
0.16 | 0.24 | 0.09 | 0.02 |
R011A04C |
0.20 | 0.10 | 0.06 | |
R011A05M |
0.22 | 0.07 | 0.19 | 0.24 |
R011A06M |
0.19 | 0.02 | 0.14 | 0.07 |
R011A07C |
0.20 | 0.22 | 0.16 | |
R011A08C |
0.03 | −0.08 | ||
R011A09C |
0.19 | 0.12 | 0.00 | |
R011A10M |
0.28 | 0.27 | 0.22 | 0.11 |
R011A11C |
0.07 | 0.13 | 0.02 | |
R021E01M |
0.07 | 0.19 | −0.06 | |
R021E02M |
0.03 | 0.09 | 0.24 | |
R021E03M |
0.14 | −0.02 | 0.02 | |
R021E04M |
0.17 | 0.26 | 0.16 | 0.04 |
R021E05C |
0.21 | 0.23 | 0.12 | |
R021E06M |
0.17 | 0.08 | 0.27 | |
R021E07C |
0.13 | 0.06 | 0.23 | |
R021E08M |
0.24 | 0.05 | ||
R021E09C |
0.24 | 0.02 | 0.19 | |
R021E10C |
0.27 | 0.24 | 0.17 | 0.06 |
R021E11M |
0.00 | 0.02 | 0.05 | |
R021E12C |
0.17 | 0.22 |
Assessing construct validity of a test in the sense of its factorial structure is important. For example, we have addressed possible implications for the analysis of criterion-referenced tests or for such large scale assessment studies as the PISA or PIRLS. There are a number of latent variable models that may be used to analyze the factorial structure of a test. This paper has focused on the following classical factor analytic approaches: principal component analysis, exploratory factor analysis, and principal axis analysis. We have investigated how accurately the factorial structure of test data can be estimated with these approaches, when assumptions associated with the procedures are not satisfied. We have examined the scope of those methods for estimating properties of the population latent ability distribution, especially when that distribution is slightly or strongly skewed (and not normal).
The estimation accuracy of the classical factor analytic approaches has been investigated in a simulation study. The study has in particular shown that the estimation of the true number of factors and of the underlying factor loadings seems to be relatively robust against a skewed population ability or factor score distribution (see Sections
A primary aim of our work is to develop some basic understanding for how and to what extent the results of classical factor analyses (in the present paper, PCA, EFA, and PAA) may be affected by a non-normal latent factor score distribution. This has to be distinguished from non-normality in the manifest variables, which has been largely studied in the literature on the factor analysis of items (cf. Section
We have discussed possible implications of the findings for criterion-referenced tests and large scale educational assessment. The assumptions of the classical factor models have been seen to be crucial in these application fields. We suggest, for instance, that the presented classical procedures should be used only with special caution, if at all, to examine the factorial structure of dichotomously scored criterion-referenced tests. Instead, if model violations of the “sensitive” type are present, better suited or more sophisticated latent variable models can be used (see Skrondal and Rabe-Hesketh,
As with factor analysis a general problem (e.g., Maraun,
The results of this paper have implications for popular research practices in the empirical educational research field. The methods that we have utilized are traditional and often applied in practice (e.g., by educational scientists), for instance to determine the factorial validity of criterion-referenced tests or to study large scale assessment measurement instruments. In addition, considering other, more sophisticated fit statistics can be interesting and valuable. For example, such model fit statistics as the root mean square residual, comparative fit index, or the root mean squared error of approximation may be investigated. Although these fit statistics are well-known and applied in the confirmatory factor analysis (CFA) context, they could be produced for exploratory factor analysis (given that CFA and EFA are based on the same common factor model).
We conclude with important research questions related to the PISA study. In the context of PISA, principal component analysis is used, in the purely computational sense. Other distributional, inferential, or confirmatory factor models, especially those for the verification of the factorial validity of the PISA context questionnaires, have not been considered. Interesting questions arise: are there other approaches to dimensionality reduction that can perform at least as well as the principal component analysis method in PISA data (e.g., multidimensional scaling; Borg and Groenen,
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The authors wish to thank Sabine Felbinger for her critical reading and helpful comments. In particular, we are deeply indebted to Jason W. Osborne, Chief Editor, and four reviewers. Their critical and valuable comments and suggestions have improved the manuscript greatly.
^{†}The research reported in this paper is based on the dissertation thesis by Kasper (
^{1}For the sake of simplicity and for the purpose and analysis of this paper, we want to refer to all of these approaches (PCA, EFA, PAA) collectively as classical factor analysis/analytic methods. We do so although it is known that PCA differs from factor analysis in important aspects, and that PAA rather represents an alternative estimation procedure for EFA. PCA and EFA are different technically and conceptually. PCA seeks to create composite scores of observed variables while EFA assumes latent variables. There is no latent variable in PCA. PCA is not a model and instead is simply a re-expression of variables based on the eigenstructure of their correlation matrix. A statistical model, as is the case for EFA, is a simplification of observed data that necessarily does not reproduce the data perfectly, leading to the inclusion of an error term. This point is well-established in the methodological literature (e.g., Velicer and Jackson,
^{2}PISA is an international large scale assessment study funded by the Organisation for Economic Co-operation and Development (OECD), which aims to evaluate education systems worldwide by assessing 15-year-old students’ competencies in reading, mathematics, and science. For comprehensive and detailed information, see
^{3}For the sake of simplicity and without ambiguity, in this paper we want to refer to component scores from PCA as “factor scores” or “ability values,” albeit components conceptually may not be viewed as latent variables or factors. See also Footnote 1.
^{4}Note that, at the latent level, there is no formal assumption that the latent factors (what we synonymously also want to call “person abilities”) are normally distributed. At the manifest level, maximum likelihood estimation (EFA) assumes that the observed variables are normal; ULS and GLS (EFA), PAA (EFA), and PCA do not. The latter two methods only require a non-singular correlation matrix (e.g., see MacCallum,
^{5}Obviously, PCA as introduced in this paper cannot be used as a data generating probability model underlying the population. However, the simulation study shows that PCA results can approximate a factor analysis (cf. also Footnote 1).
^{6}All figures of this paper were produced using the R statistical computing environment (R Development Core Team,
^{7}For the factor analyses in this paper, we used the SAS program and its PROC FACTOR implementation of the methods PCA, EFA, and PAA. More precisely, variation of the PROC FACTOR statements, run in their default settings, yields the performed procedures PCA, EFA, and PAA (e.g., EFA if METHOD = ML).
^{8}Because of rotational indeterminacy in the factor analysis approach (e.g., Maraun,
^{9}The Kaiser-Guttman criterion is a poor way to determine the number of factors. However, due to the fact that none of the existing studies has investigated the estimation accuracy of this criterion when the latent ability distribution is skewed, we have decided to include the Kaiser-Guttman criterion in our study. This criterion may also be viewed as a “worst performing” baseline criterion, which other extraction methods need to outperform, as best as possible.