
Edited by: Erik D. Thiessen, Carnegie Mellon University, USA

Reviewed by: Konstantin G. Arbeev, Duke University, USA

*Correspondence: Julia Moeller,

This article was submitted to Developmental Psychology, a section of the journal Frontiers in Psychology

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.


This article discusses the risks of standardization and ipsatization in longitudinal studies. First, it summarizes some common purposes of standardization in psychological studies. Second, it explains why and when standardization and ipsatization are problematic in the analysis of longitudinal data and profiles. Third, it shows alternative ways to achieve similar purposes while avoiding the risks.

Z-standardization and ipsatization are procedures to transform absolute values, or ratings (e.g., 1 =

Standardization and ipsatization are applied for the following purposes:

Standardization is used to bring variables with different response scales (e.g., a scale from 1 =

Z-standardized scores are displayed in graphs to accentuate the mean-level differences between groups or profiles of observations.

Ipsatization is used to account for uniform response biases, such as acquiescence (=tendency to affirm all items). For instance, in cross-cultural comparisons, items are often ipsatized to account for culture-specific response biases (Tweed and DeLongis,

While standardization and ipsatization are easy and widely accepted, there are many constellations in which these procedures are not useful or are even misleading. For cross-sectional studies, these issues have long been discussed (e.g., Fischer and Milfont,

Standardizing repeated measures within individuals impedes examining mean-level differences between individuals, because each individual's mean score becomes zero. The standardized means therefore do not indicate whether the individuals differed in their original experiences.
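
The following minimal sketch illustrates this point. The names and ratings are hypothetical (two people rating their interest four times on a 1-7 scale), and the helper `standardize` is an illustrative function that uses the population standard deviation, not a routine from any cited study.

```python
from statistics import mean, pstdev

def standardize(values):
    """Z-standardize a list of numbers using the population SD."""
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]

# Hypothetical repeated interest ratings (1-7 scale) for two people.
ratings = {
    "Anna": [6, 7, 5, 6],  # consistently high interest
    "Ben": [2, 3, 1, 2],   # consistently low interest
}

# Standardize each person's repeated measures within that person.
within_person_z = {person: standardize(xs) for person, xs in ratings.items()}

raw_means = {person: mean(xs) for person, xs in ratings.items()}
z_means = {person: mean(zs) for person, zs in within_person_z.items()}

print(raw_means)  # the raw means differ clearly between the two people
print(z_means)    # both standardized means are zero (up to rounding)
```

In this made-up example the two people even end up with identical standardized profiles, although one of them reported much higher interest on the original scale.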

Standardization across individuals within measurement time points impedes examining mean level changes from one time point to another, because all means at all time points become zero, whereas the raw-score means might have shown a decrease in the measured variable, such as interest (see e.g., Denissen et al.,
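
As a minimal sketch of this problem, assume four hypothetical students whose interest ratings (1-7 scale) all drop by two points between two time points. The data and the helper `standardize` are illustrative assumptions only.

```python
from statistics import mean, pstdev

def standardize(values):
    """Z-standardize a list of numbers using the population SD."""
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]

# Hypothetical ratings of the same four students at two time points.
interest_t1 = [6, 5, 7, 4]
interest_t2 = [4, 3, 5, 2]  # everyone's interest dropped by 2 points

# Standardize across individuals, separately within each time point.
z_t1 = standardize(interest_t1)
z_t2 = standardize(interest_t2)

raw_change = mean(interest_t2) - mean(interest_t1)
z_change = mean(z_t2) - mean(z_t1)

print(raw_change)  # mean-level decrease visible in the raw scores
print(z_change)    # no change visible in the standardized scores
```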

Standardization across individuals across time points obfuscates the information about the relative rank of an individual at given time points, and impedes disentangling rank-order and mean-level stability. For instance, Anna might have had relatively high interest in grade 1 and grade 3, compared to others at the same time point. However, since interest often decreases with time, Anna's absolute interest was much lower in grade 3 than in grade 1, as was everybody else's. With standardization across time points and individuals, the information about the time-point-specific relative rank-order gets mixed with the mean-level change, and it will look like Anna had high interest at the first time point but somewhat low or medium interest at the second.
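
The Anna example can be sketched with made-up numbers. The ratings below are hypothetical (four students, 1-7 scale, Anna listed first), and `standardize` is an illustrative helper based on the population standard deviation.

```python
from statistics import mean, pstdev

def standardize(values):
    """Z-standardize a list of numbers using the population SD."""
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]

# Hypothetical interest ratings; index 0 is Anna in both lists.
grade1 = [7, 5, 6, 4]
grade3 = [4, 2, 3, 1]  # everyone dropped by 3 points, ranks unchanged

# Standardization within each time point keeps Anna's relative rank.
anna_within_g1 = standardize(grade1)[0]
anna_within_g3 = standardize(grade3)[0]

# Standardization across time points and individuals pools all ratings.
pooled_z = standardize(grade1 + grade3)
anna_pooled_g1 = pooled_z[0]
anna_pooled_g3 = pooled_z[len(grade1)]

print(anna_within_g1, anna_within_g3)  # same high relative position twice
print(anna_pooled_g1, anna_pooled_g3)  # high in grade 1, average in grade 3
```

In this sketch Anna is the highest-ranked student at both time points, yet her pooled z-score in grade 3 equals the pooled mean, because the sample-wide decrease is mixed into her score.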

Standardization across individuals within age groups/cohorts impedes studying age differences at given time points. For instance, in a study that examined three cohorts (6th, 8th, and 10th grade) in 3 years (1992, 1995, 1997; see Csikszentmihalyi and Schneider,

Misinterpretation of differences between profiles and groups is likely when z-standardized scores are used to compare these profiles, particularly if the variables differed in their means and variances prior to the transformation. Two problems complicate interpreting group differences based on z-scores: First, the z-scores represent ranks in relation to other individuals, but not the degree to which an item was affirmed by a given individual. If an item had a low sample mean score, then a “high” z-score above 0 (above the sample mean) can represent a “rather not” statement below the midpoint of the original response scale (see Moeller et al.,
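
The first problem can be illustrated with a minimal sketch. The item ratings below are hypothetical (eight respondents, 1-7 scale, low sample mean), and the population standard deviation is an arbitrary choice for the illustration.

```python
from statistics import mean, pstdev

SCALE_MIN, SCALE_MAX = 1, 7
SCALE_MIDPOINT = (SCALE_MIN + SCALE_MAX) / 2

# Hypothetical ratings of an item that most respondents reject.
item_ratings = [1, 1, 2, 1, 3, 1, 2, 1]

sample_mean = mean(item_ratings)
sample_sd = pstdev(item_ratings)

# The respondent who answered 3 ("rather not" on a 1-7 scale).
raw_score = 3
z_score = (raw_score - sample_mean) / sample_sd

print(raw_score < SCALE_MIDPOINT)  # the raw answer is below the scale midpoint
print(z_score)                     # yet the z-score is far above the sample mean
```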

Standardization across individuals should not be done with ipsatized scores, because that entangles the intra-individual frame of reference (ipsatization) and the inter-individual frame (standardization) and is hard to interpret.

Ipsatization changes the covariance matrix in a way that makes the data unsuitable for correlational techniques like exploratory and confirmatory factor analysis, structural equation modeling, and multivariate techniques like multiple regression and multivariate analysis of variance (Cornwell and Dunlap,
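
One way to see the change in the covariance matrix is the following sketch. It assumes that ipsatization means subtracting each person's own mean across items (other variants additionally divide by the person's standard deviation), and it uses hypothetical ratings of three items by five persons. After this transformation every person's scores sum to zero, so each row of the item covariance matrix sums to zero and the matrix is singular.

```python
from statistics import mean

# Hypothetical ratings: rows = persons, columns = three items (1-7 scale).
ratings = [
    [6, 5, 7],
    [2, 4, 3],
    [5, 5, 2],
    [3, 6, 4],
    [7, 2, 5],
]

def ipsatize(row):
    """Subtract the person's own mean from each of that person's scores."""
    m = mean(row)
    return [x - m for x in row]

def covariance_matrix(rows):
    """Sample covariance matrix of the columns (items)."""
    n = len(rows)
    columns = list(zip(*rows))
    col_means = [mean(col) for col in columns]
    return [
        [sum((a - ma) * (b - mb) for a, b in zip(ca, cb)) / (n - 1)
         for cb, mb in zip(columns, col_means)]
        for ca, ma in zip(columns, col_means)
    ]

cov_raw = covariance_matrix(ratings)
cov_ipsatized = covariance_matrix([ipsatize(row) for row in ratings])

raw_row_sums = [sum(row) for row in cov_raw]
ipsatized_row_sums = [sum(row) for row in cov_ipsatized]

print(raw_row_sums)        # not zero in general
print(ipsatized_row_sums)  # all zero (up to rounding): the items are linearly dependent
```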

Due to the complexity of longitudinal data and analyses, the above-described problems often co-occur. For instance, standardizing situation-specific repeated measures across individuals increases the risk of misinterpreting mean differences of z-scores between situation-level profiles of state measures, because the z-standardized situation-specific measures are at the same time determined by the intra-individual distribution of these variables (see problem no. 5), and the inter-individual distribution of these variables (see problems 2–5). This makes it almost impossible to interpret whether a relatively high rank (z-score) represents a variable that was rated as “high” on the original response scale by a specific person in a specific situation. For an example of intertwined standardization problems, see Denissen et al. (

To bring differently measured items to the same metric, several simple alternative monotonic scale transformations are available, which, unlike standardization, do not change the multivariate distribution and covariance matrix of the transformed variables. One solution is the proportion of maximum scaling (“POMS”) method (Little,

For instance, for a scale that originally ranged from 1 to 7, first the value 1 is subtracted from each observation to make the scale go from 0 to 6, and then each score is divided by 6 to make the scale go from 0 to 1. Contrary to standardization, this maintains the proportions of the absolute distances between the observed response options.
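
The two steps described above can be sketched as a small function. The function name `poms` and its parameter names are illustrative assumptions, not taken from the cited source.

```python
def poms(score, scale_min, scale_max):
    """Proportion of maximum scaling: map [scale_min, scale_max] onto [0, 1]."""
    return (score - scale_min) / (scale_max - scale_min)

# A 1-7 scale and a 1-5 scale end up on the same 0-1 metric.
seven_point = [poms(x, 1, 7) for x in [1, 4, 7]]
five_point = [poms(x, 1, 5) for x in [1, 3, 5]]

print(seven_point)  # [0.0, 0.5, 1.0]
print(five_point)   # [0.0, 0.5, 1.0]
```

Because the transformation is linear, equal distances between response options on the original scale remain equal after rescaling.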

Another possibility is the percent of maximum possible (“POMP”) method (Cohen et al.,
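
As a minimal sketch, POMP scores are commonly computed as 100 × (observed − minimum) / (maximum − minimum), where minimum and maximum are the lowest and highest possible scores of the scale. The function name `pomp` below is an illustrative assumption.

```python
def pomp(score, scale_min, scale_max):
    """Percent of maximum possible: map [scale_min, scale_max] onto [0, 100]."""
    return 100 * (score - scale_min) / (scale_max - scale_min)

# The midpoint of a 1-7 scale and the second option of a 1-5 scale.
midpoint_seven = pomp(4, 1, 7)
second_of_five = pomp(2, 1, 5)

print(midpoint_seven)  # 50.0
print(second_of_five)  # 25.0
```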

For examining mean-level differences between profiles and groups, raw scores or scales transformed with the POMS or POMP method can be used. This has the advantages that the scores reflect the individual's degree of affirmation/rejection of the items, and that group differences are displayed in the correct proportions. For a discussion of further advantages and alternative transformations, see Little (

To account for uniform response bias such as acquiescence, a common-method factor can be modeled in structural equation models (Billiet and McClendon,

With both ipsatization and method factors, it remains difficult to disentangle biased response styles from genuine experiences. For instance, some individuals really are interested in a broad variety of topics (=affirm all interest items) and do not show a clear interest profile with high interest in some and low interest in other topics (Rounds and Tracey,

Z-standardization is a widely used procedure, applied for getting rid of acquiescence and other response biases, bringing variables of different metrics to the same metric, and emphasizing differences between groups in graphs.

In longitudinal data and analyses of subgroups of observations, z-standardization leads to a number of problems. It changes, often in undesirable ways, the distances between observations and the multivariate distributions of cross-sectional and longitudinal data. The psychological literature is rich in examples of misinterpreted z-scores, some of which were described in this article. While many pitfalls are known for cross-sectional studies, longitudinal studies add further problems, due to confounded frames of reference (the original response scale, the intra-individual distribution, the inter-individual distribution within given time points, the inter-individual distribution across different time points, the variation within vs. between cohorts, and any combinations of these). Generally, it is not insightful to first standardize variables within units (individuals, cohorts, states, organizations) and then compare mean scores across the same units that provided the reference frame for standardization. This should be trivial, but it can often be observed in current research, and it is easily overlooked or mishandled as more units and reference frames are added to the data structure.

Modeling common-method factors is a useful alternative to account for response biases while avoiding the downsides of ipsatization. Simple alternative monotonic scale transformations are available to bring items with different response scales to the same metric (Cohen et al.,

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

I thank Katariina Salmela-Aro for her support, and Jacquelynne S. Eccles, Anna-Lena Dicke, Melanie Keiner, and Julia Dietrich for suggestions and encouragement and Arielle White for proof-reading. This work was supported by a grant from the Jacobs Foundation, through the post-doc program “Pathways to Adulthood”. This article was written while the author worked at the University of Helsinki, and was revised and resubmitted after the author's change to Yale University.