Edited by: Renerio Fraguas, University of São Paulo, Brazil

Reviewed by: Gianluca Serafini, University of Genoa, Italy; Nefize Yalin, King’s College London, United Kingdom

Specialty section: This article was submitted to Mood and Anxiety Disorders, a section of the journal Frontiers in Psychiatry

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Recent studies have shown that item responses on the Center for Epidemiologic Studies Depression Scale (CES-D) and the Kessler Screening Scale for Psychological Distress (K6) exhibit the same characteristic item response patterns in the general population. However, the distributional patterns of responses on the Patient Health Questionnaire-8 (PHQ-8) in the general population have not been adequately studied. We therefore conducted a pattern analysis of PHQ-8 item responses among US adults. Data for 18,446 individuals were obtained from the 2015 Behavioral Risk Factor Surveillance System (BRFSS). Item responses on the BRFSS version of the PHQ-8 were scored using the number of days response set and then converted to the original 4-point scale. The patterns of item responses were analyzed graphically. On a semi-logarithmic scale, the lines of item responses scored using the number of days response set showed the same pattern for all eight items, characterized by crossing at a single point between “0 days” and “1 day” and by parallel fluctuation from “1 day” to “14 days.” The lines of item responses converted to the 4-point scale also showed the same characteristic pattern across the eight items. These results demonstrate that item responses on the PHQ-8 follow the same characteristic pattern across items, consistent with the CES-D and the K6.

Major depression is a common but serious psychological disorder that affects more than 300 million people around the world (

Epidemiological studies of depressive symptoms have been conducted intensively using a variety of depression screening scales (

In general, large sample sizes allow researchers to better identify the mathematical pattern of a sample distribution. Analyzing data from about 32,000 respondents in a national survey of the Japanese population, we first observed that responses to the Center for Epidemiologic Studies Depression Scale (CES-D) exhibited a common mathematical pattern among the 16 depressive symptom items (Figure

Item responses of the Center for Epidemiologic Studies Depression Scale. Item responses for the 16 items are presented using a normal scale

To confirm the reproducibility of such findings for another depression screening scale, we investigated the item responses on the Kessler Screening Scale for Psychological Distress (K6) in representative US studies. Although the K6 is a broad measure of psychological distress, it has been used as a screening tool for depression (

The Patient Health Questionnaire-9 (PHQ-9) is one of the most commonly used measures for depression screening worldwide (

The original PHQ-8 allows individuals to self-rate the frequency of various depressive symptoms over the past 2 weeks using a 4-point verbal scale: “not at all,” “several days,” “more than half the days,” and “nearly every day.” However, on the BRFSS version of the PHQ-8, respondents were asked to self-rate the number of days of each depressive symptom during the past 14 days; then, this number was converted to the original 4-point scale of the PHQ-8 (
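The day-count conversion described above can be sketched as a small Python function. The cut-points (0–1, 2–6, 7–11, and 12–14 days) are taken from the column headings of the converted-score table in this article, and the 0–3 coding follows the usual PHQ item scoring; the helper names are hypothetical and only illustrate the mapping.

```python
# Hypothetical helpers: convert a BRFSS-style "number of days in the past
# 14 days" answer into the original 4-point PHQ-8 category. Cut-points follow
# the converted-score table in this article (0-1, 2-6, 7-11, 12-14 days).

CATEGORIES = (
    "not at all",               # score 0
    "several days",             # score 1
    "more than half the days",  # score 2
    "nearly every day",         # score 3
)


def days_to_phq_score(days: int) -> int:
    """Return the 0-3 PHQ-8 item score for a count of days (0-14)."""
    if not 0 <= days <= 14:
        raise ValueError(f"days must be between 0 and 14, got {days}")
    if days <= 1:
        return 0
    if days <= 6:
        return 1
    if days <= 11:
        return 2
    return 3


def days_to_phq_label(days: int) -> str:
    """Return the verbal 4-point label for a count of days."""
    return CATEGORIES[days_to_phq_score(days)]
```

For example, `days_to_phq_score(7)` returns 2, and `days_to_phq_label(1)` returns "not at all".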

In the present study, we examined the characteristics of item responses on the PHQ-8 using the number of days response set and determined whether they exhibited similar patterns across all items. After confirming that item responses using the number of days response set followed the characteristic pattern, we analyzed the pattern of item responses converted to the original 4-point response set.

Data were obtained from the 2015 BRFSS (

The BRFSS questionnaire comprises three parts: (1) a standard set of questions asked in all 50 states and the District of Columbia, consisting of queries about health-related perceptions, conditions, and behaviors; (2) optional modules, which are sets of questions on specific topics (e.g., anxiety and depression, excess sun exposure, cancer survivorship); and (3) state-added questions. The demographic variables of the BRFSS questionnaire include age, sex, education, marital status, employment, income, and race/ethnicity. All BRFSS data are available on the website (

In 2015, four states conducted the optional Anxiety and Depression Module (ADM): Mississippi, North Dakota, Tennessee, and West Virginia. Therefore, the analyses in this study are limited to data from those four states. The ADM includes the PHQ-8. The total 2015 ADM sample comprised 22,943 respondents, including 6,035, 4,972, 5,979, and 5,957 respondents for Mississippi, North Dakota, Tennessee, and West Virginia, respectively. The response rates for Mississippi, North Dakota, Tennessee, and West Virginia were 49.9, 58.9, 38.6, and 48.9%, respectively. Detailed descriptions of the sociodemographic characteristics of the BRFSS respondents are reported elsewhere (

Our institutional review board does not consider the analysis of publicly available data as research involving human subjects. Since this study used a de-identified, publicly available data set, institutional review board approval was not required.

As noted in the Section “

First, we analyzed the distributions of item responses using the number of days response set. The pattern of item responses for the eight items was visualized with histograms on both normal and semi-logarithmic scales. Mathematically, if the ratios between two consecutive response options are the same among all items, all lines for the item responses will exhibit a parallel pattern on a semi-logarithmic scale (
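The parallel-lines argument can be made explicit in a short derivation. Here p_{i,k} denotes the proportion of respondents choosing response option k on item i, and r_k the ratio between consecutive options; this notation is ours, introduced only for illustration.

```latex
\frac{p_{i,k+1}}{p_{i,k}} = r_k \quad \text{for every item } i
\;\Longrightarrow\;
\log p_{i,k} = \log p_{i,0} + \sum_{j=0}^{k-1} \log r_j
\;\Longrightarrow\;
\log p_{i,k} - \log p_{i',k} = \log p_{i,0} - \log p_{i',0}.
```

The difference between any two items on the log scale does not depend on k, so their lines are vertical translations of one another, that is, parallel on a semi-logarithmic plot. The same reasoning applies over any sub-range of options (for example, 1 to 14 days) on which the ratios are shared; where the ratio differs between items at some step, such as from “0 days” to “1 day,” the lines need not be parallel there and may cross.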

After confirming that the item responses using the number of days response set followed the same pattern among the eight items, consistent with previous studies, we analyzed the distributions of item responses converted to the original 4-point response set. Analyses were conducted using JMP version 11 for Windows (SAS Institute, Inc., Cary, NC, USA).

Of the 22,943 respondents, those who did not report the number of days during the past 14 days for all eight items (

Table

Item responses scored by the number of days response set.

Item (number of days, %) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Loss of interest | 70.0 | 4.3 | 7.0 | 3.2 | 2.2 | 2.2 | 0.7 | 2.2 | 0.5 | 0.2 | 1.7 | 0.0 | 0.4 | 0.1 | 5.2
Feeling depressed | 75.1 | 4.6 | 5.6 | 2.6 | 1.5 | 1.8 | 0.4 | 1.9 | 0.5 | 0.1 | 1.2 | 0.0 | 0.3 | 0.0 | 4.4
Sleep problems | 54.4 | 3.9 | 7.0 | 4.7 | 3.4 | 3.1 | 1.3 | 2.9 | 0.9 | 0.2 | 2.4 | 0.1 | 0.7 | 0.2 | 15.0
Loss of energy | 40.1 | 6.1 | 12.1 | 6.5 | 4.3 | 4.5 | 1.4 | 3.7 | 1.0 | 0.2 | 2.9 | 0.1 | 0.7 | 0.2 | 16.2
Appetite problems | 65.4 | 4.2 | 6.5 | 4.1 | 2.5 | 2.8 | 0.9 | 2.4 | 0.5 | 0.1 | 1.7 | 0.0 | 0.3 | 0.1 | 8.5
Self-blame | 83.3 | 2.9 | 3.2 | 1.5 | 0.9 | 1.1 | 0.3 | 1.0 | 0.4 | 0.0 | 0.8 | 0.0 | 0.2 | 0.0 | 4.4
Concentration problems | 85.6 | 1.5 | 2.8 | 1.3 | 1.1 | 1.1 | 0.3 | 1.2 | 0.2 | 0.1 | 0.6 | 0.0 | 0.2 | 0.0 | 3.8
Agitation/retardation | 89.1 | 1.2 | 1.9 | 1.3 | 0.8 | 0.9 | 0.3 | 0.9 | 0.2 | 0.1 | 0.5 | 0.0 | 0.1 | 0.0 | 2.8

To assess the pattern of item responses using the number of days, all eight item response frequencies were plotted on the same scale (Figure

Item responses scored using the number of days response set from 0 to 14 days.

To further examine the patterns of item responses for the eight items, a graph was constructed for 1–14 days (Figure

Item responses scored using the number of days response set from 1 to 14 days.

Using a semi-logarithmic scale, lines of the item responses showed parallel fluctuation from 1 to 14 days (Figure

Ratios of frequencies between adjacent numbers of days.

Item | 2/1 days | 3/2 days | 4/3 days | 5/4 days | 6/5 days | 7/6 days | 8/7 days | 9/8 days | 10/9 days | 11/10 days | 12/11 days | 13/12 days | 14/13 days
---|---|---|---|---|---|---|---|---|---|---|---|---|---
Loss of interest | 1.6 | 0.5 | 0.7 | 1.0 | 0.3 | 3.3 | 0.2 | 0.3 | 10.9 | 0.02 | 13.8 | 0.2 | 60.1
Feeling depressed | 1.2 | 0.5 | 0.6 | 1.2 | 0.2 | 4.5 | 0.2 | 0.1 | 22.4 | 0.01 | 17.7 | 0.1 | 116.6
Sleep problems | 1.8 | 0.7 | 0.7 | 0.9 | 0.4 | 2.2 | 0.3 | 0.2 | 14.6 | 0.03 | 11.3 | 0.4 | 60.1
Loss of energy | 2.0 | 0.5 | 0.7 | 1.0 | 0.3 | 2.6 | 0.3 | 0.2 | 15.9 | 0.02 | 13.2 | 0.3 | 90.3
Appetite problems | 1.5 | 0.6 | 0.6 | 1.1 | 0.3 | 2.6 | 0.2 | 0.2 | 20.7 | 0.01 | 11.8 | 0.2 | 143.2
Self-blame | 1.1 | 0.5 | 0.6 | 1.2 | 0.3 | 3.0 | 0.3 | 0.1 | 24.8 | 0.03 | 7.5 | 0.2 | 134.5
Concentration problems | 1.9 | 0.5 | 0.8 | 1.1 | 0.3 | 3.6 | 0.2 | 0.3 | 9.7 | 0.02 | 17.5 | 0.2 | 88.0
Agitation/retardation | 1.6 | 0.7 | 0.6 | 1.2 | 0.3 | 3.1 | 0.2 | 0.3 | 8.6 | 0.02 | 12.0 | 0.3 | 85.0
Average | 1.6 | 0.6 | 0.7 | 1.1 | 0.3 | 2.8 | 0.3 | 0.2 | 14.9 | 0.02 | 12.6 | 0.3 | 83.6
SD | 0.3 | 0.1 | 0.1 | 0.1 | 0.1 | 0.7 | 0.1 | 0.1 | 6.1 | 0.01 | 3.3 | 0.1 | 31.4
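The ratios of frequencies between adjacent numbers of days, and the Average and SD rows, can be computed in outline with the Python sketch below. It assumes per-item frequencies for 1 to 14 days are available as plain lists; the helper names are hypothetical. Zero denominators return NaN instead of raising, and the SD is taken as the sample SD (`statistics.stdev`), which is an assumption because the article does not state which SD definition was used.

```python
from __future__ import annotations

import math
import statistics
from typing import Sequence


def adjacent_ratios(freqs: Sequence[float]) -> list[float]:
    """Return freqs[k + 1] / freqs[k] for each pair of adjacent day counts.

    A zero denominator yields NaN rather than raising, because some day
    counts (e.g., 11 days) are almost never reported.
    """
    ratios = []
    for lower, upper in zip(freqs[:-1], freqs[1:]):
        ratios.append(upper / lower if lower != 0 else math.nan)
    return ratios


def summarize_columns(
    rows: Sequence[Sequence[float]],
) -> tuple[list[float], list[float]]:
    """Column-wise mean and sample SD across items (one row per item).

    Requires at least two rows, since the sample SD is undefined for one.
    """
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]
    return means, sds
```

Applied to the rounded “Loss of interest” percentages for 1 to 14 days from the number-of-days table, the first ratio is 7.0 / 4.3 ≈ 1.6, which matches the reported value. Later ratios drift, however. For example, the 0.0 entry at 11 days makes the 12/11 ratio undefined, so exact reproduction would require the unrounded BRFSS data.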

Table

Item responses converted to the 4-point response set.

Item | Not at all (0–1 days), % | Several days (2–6 days), % | More than half the days (7–11 days), % | Nearly every day (12–14 days), % | Ratio of “more than half the days” to “several days” | Ratio of “nearly every day” to “more than half the days”
---|---|---|---|---|---|---
Loss of interest | 74.2 | 15.3 | 4.7 | 5.8 | 0.31 | 1.23
Feeling depressed | 79.7 | 12.0 | 3.6 | 4.7 | 0.30 | 1.32
Sleep problems | 58.2 | 19.5 | 6.4 | 15.9 | 0.33 | 2.50
Loss of energy | 46.2 | 28.9 | 7.9 | 17.0 | 0.27 | 2.16
Appetite problems | 69.6 | 16.9 | 4.7 | 8.9 | 0.28 | 1.88
Self-blame | 86.2 | 7.0 | 2.2 | 4.6 | 0.32 | 2.03
Concentration problems | 87.1 | 6.7 | 2.2 | 4.0 | 0.32 | 1.87
Agitation/retardation | 90.3 | 5.2 | 1.6 | 2.9 | 0.31 | 1.82
Average | 73.9 | 13.9 | 4.2 | 8.0 | 0.30 ± 0.02 | 1.85 ± 0.42
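The converted percentages and the two ratio columns follow from the per-day percentages by summing over the day ranges given in the column headings. A minimal Python sketch, with hypothetical helper names:

```python
from __future__ import annotations

from typing import Sequence

# Day ranges taken from the column headings of the converted-score table:
# 0-1, 2-6, 7-11, and 12-14 days (inclusive on both ends).
BINS = ((0, 1), (2, 6), (7, 11), (12, 14))


def collapse_to_four_point(day_pcts: Sequence[float]) -> list[float]:
    """Sum per-day percentages (index = number of days, 0..14) into 4 categories."""
    if len(day_pcts) != 15:
        raise ValueError("expected 15 values, one for each of 0..14 days")
    return [sum(day_pcts[lo:hi + 1]) for lo, hi in BINS]


def category_rates(cat_pcts: Sequence[float]) -> tuple[float, float]:
    """Return the two ratio columns:
    ("more than half the days" / "several days",
     "nearly every day" / "more than half the days").
    """
    _, several, more_than_half, nearly_every = cat_pcts
    return more_than_half / several, nearly_every / more_than_half
```

Fed the rounded “Loss of interest” row from the number-of-days table, this gives category percentages of about 74.3, 15.3, 4.6, and 5.7 and ratios of about 0.30 and 1.24. These are close to, but not identical with, the reported 74.2, 15.3, 4.7, 5.8, 0.31, and 1.23, which were presumably computed from unrounded data.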

To identify the patterns of item responses, all eight item response rates were plotted on the same scale (Figure

Item responses converted to the original 4-point response set. Item responses for the eight items of depressive symptoms exhibited a common mathematical pattern among the eight items on a normal scale

Using a semi-logarithmic scale, the lines of the item responses showed a parallel V-shaped pattern from “several days” to “nearly every day” (Figure

This study’s main finding is that the item responses on the PHQ-8 using the number of days response set showed a common pattern among the eight items. On a semi-logarithmic scale, the pattern was characterized by the lines crossing at a single point between “0 days” and “1 day” and by parallel fluctuation from “1 day” to “14 days.” Although the pattern from “1 day” to “14 days” on the semi-logarithmic scale was complicated, the fluctuation appeared to be a series of parallel V-shaped patterns. This is consistent with the item responses using the 4-point response set, as well as with the results of other studies using the CES-D and K6 (

As noted in the Section “

Item responses using the number of days response set showed a similar pattern of peaks and valleys among the eight items. Peaks at 5, 7, 10, and 12 days, and adjacent valleys, suggest that some of the peaks and valleys were subject to end-digit preference bias (
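One simple way to make the “peaks” concrete is to flag each day whose frequency is strictly higher than on both neighboring days. This strict-local-maximum rule is our own illustrative choice, not a procedure described by the authors. On its own, it cannot show whether a peak reflects end-digit preference.

```python
from __future__ import annotations

from typing import Sequence


def interior_peaks(values: Sequence[float]) -> list[int]:
    """Return indices k (excluding both ends) where values[k] is strictly
    greater than both neighbours. With index = number of days (0..14),
    these are the candidate 'peak' days."""
    return [
        k
        for k in range(1, len(values) - 1)
        if values[k] > values[k - 1] and values[k] > values[k + 1]
    ]
```

For the “Feeling depressed” row of the number-of-days table, it returns days 2, 5, 7, 10, and 12. This includes the 5-, 7-, 10-, and 12-day peaks discussed above.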

Although the PHQ-8, CES-D, and K6 differ in terms of item content and response set (

In line with previous studies, the present results do not indicate that the latent variable of depressive symptom scores follows a normal distribution. However, normality-assuming statistics (e.g., the Pearson correlation coefficient) are widely used in population studies of depressive symptoms (

This research has some limitations. First, although we investigated the pattern of item responses using graphical analysis, we could not quantify the degree of similarity among item responses. Because the pattern was complicated (especially for item responses using the number of days response set), it was difficult to fit a single regression model to the item responses. In general, similarity is harder to quantify for more complex patterns. For a simple, unitary pattern, existing distribution models (normal, linear, exponential, etc.) can be used, and the goodness of fit can be calculated easily with a single regression analysis. In contrast, because the pattern of item responses is a collection of several sub-patterns, the similarity of every sub-pattern must be assessed and these results then considered together. To our knowledge, no standard statistical procedure integrates the similarity results for all parts of a complex pattern. Quantifying the similarity of the complex patterns observed in this study therefore remains an issue for future research.

Second, the PHQ-8 omits the ninth, suicide-related item of the PHQ-9. It is therefore unclear whether the finding that item responses on depression screening scales follow a common pattern generalizes to suicide-related items. Further studies employing assessment tools such as the Suicide History Self-Rating Screening Scale are needed to determine whether suicide-related items follow the same item response patterns as other items (

Our research nevertheless has methodological advantages. First, although the methods of this study were simple (visualization using histograms), they allowed us to observe a complex pattern of item responses that might have gone undetected without such visualization. Generally speaking, graphical analysis is crucial for exploratory data analysis involving complex patterns (

Finally, we consider the significance and potential applications of our study. First, with regard to descriptive statistics, if the mathematical patterns of item responses on the PHQ-8 are established, the distributions of item responses can be described using mathematical models. The patterns of item responses can then be evaluated easily through the estimated model parameters. Previous epidemiological studies have paid little attention to the patterns of item responses in depression screening scales, partly because no models existed to describe them. Since researchers understand the characteristics of data through models, the use of mathematical models may lead to further findings (

Second, regarding inferential statistics, the distributional patterns of item responses matter because statistical hypothesis tests and estimators are derived from statistical models that are assumed to adequately approximate the empirical distribution. As noted in the Introduction, if the empirical distributions of item responses are non-normal, statistical models that assume normally distributed variables will require reconsideration.

Third, evidence that the mathematical patterns of item responses on the PHQ-8, CES-D, and K6 are the same would provide further insight into the mechanism of depressive symptoms. Given the same mathematical pattern across all depressive symptoms, we assumed that all depressive symptom items shared one latent trait. In clinical settings, depressive symptoms are assessed using depression rating scales, and the item scores are summed to construct total scores. These scores are used as a proxy for depression severity. To allow for such interpretations, depression rating scales must measure a single construct (

ST carried out the design of the study and the statistical analysis and wrote the manuscript. YK, KI, MA, and HY contributed to the analysis of the data. OY and TF contributed to the acquisition of data. YK, KI, MA, HY, OY, and TF interpreted the data and wrote the manuscript. All authors read and approved the final manuscript.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The authors would like to thank the 2015 Behavioral Risk Factor Surveillance System for providing the data for this study.