
Edited by: James Gaskin, Brigham Young University, United States

Reviewed by: Rink Hoekstra, University of Groningen, Netherlands; Juan Jose Fernandez Muñoz, Universidad Rey Juan Carlos, Spain

This article was submitted to Organizational Psychology, a section of the journal Frontiers in Psychology

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

One challenge when communicating science to practitioners and the general public is accurately representing statistical results. In particular, describing the meaning of statistical significance to a non-scientific audience is difficult given the technical nature of a correct definition. Correct interpretations of statistical significance can be unintuitive, nuanced, and reliant on unfamiliar technical language. As a result, when researchers are tasked with providing short and understandable interpretations of statistical significance, it can be tempting to default to convenient but incorrect interpretations. In the current paper, we offer a concise, simple, and correct interpretation of statistical significance that is suitable for communications targeting a general audience.

For researchers in applied fields like industrial/organizational (I/O) psychology that follow the scientist-practitioner model, it is important to be able to disseminate knowledge and communicate science to non-scientific audiences. One challenge often faced by researchers is effectively communicating what statistical significance means. Imagine that you submit an article about your latest study to a popular press publication and the editor returns some edits. One sentence has been changed from, “All of the results were statistically significant” to, “All of the results were statistically significant (indicating that the results were not likely due to chance).”

Do you approve, reject, or modify the edit? Approving it means you sign off on adding an incorrect interpretation of statistical significance. Rejecting it means that you leave it up to readers to know or figure out for themselves what statistical significance means. Modifying it means that you have the difficult task of providing an easy-to-read, but correct, definition of statistical significance for a general audience. When faced with this trilemma, it may be easy to default to a correct-sounding, albeit incorrect, interpretation of statistical significance. Our goal is to help researchers who need to communicate science to non-scientific audiences by providing a concise, easy-to-understand, and correct interpretation of statistical significance.

In order to effectively disseminate research findings to a general audience, researchers are tasked with simplifying and succinctly describing their results and conclusions. Given the ubiquity of statistical significance, the dissemination process may involve explaining what statistical significance means to a general audience – including managers, executives, lawyers, and journalists. Providing an intelligible and concise explanation of statistical significance can be hard to do without falling prey to common fallacies and misinterpretations (see

Accurately interpreting statistical significance is not easy – history and research show that significance testing is notorious for being misunderstood (e.g.,

One implication of these issues is that if a researcher is tasked with providing an understandable definition of statistical significance it can be easy to default to inaccurate definitions and commonly used fallacies. Notably, commonly used fallacies and misinterpretations (

Since its introduction nearly 90 years ago, null hypothesis significance testing (NHST) has been the most widely used method for statistical analysis in psychology (

For as long as it has been used, NHST has been criticized for being defined or interpreted incorrectly.

After a decade or so passed since Bakan’s paper,

As the years passed, misinterpretations of significance testing continued and ^{1}. Ultimately, changes were made to the APA publication manual that were recently re-affirmed in the APA’s journal article reporting standards (

But what does it mean for something to be statistically significant? Many researchers who have been formally educated on the subject, and some textbooks, will (incorrectly) tell you that statistical significance means that the odds that a result happened due to chance are small – specifically, in most cases, that the odds are less than five percent (

Statistical significance refers to the conditional probability of hypothetical data. In the vast majority of cases where significance testing is used, a researcher starts with the assumption that there is NO effect, relation, or difference between what is being investigated. This is known as the null hypothesis. Next, the researcher evaluates the probability of data, given this null hypothesis. Consider a research team investigating the relation between drinking coffee and hating one’s boss. The team begins with the null hypothesis, that coffee has NO effect on how much someone hates their boss. Then the team determines the probability of data (or more extreme data), assuming the null hypothesis is true.

More technically, significance testing uses an index called the

With its many technicalities, significance testing is not inherently ready for public consumption. It involves conditional probability, hypothetical results (whatever those are), and the null hypothesis (a peculiar starting assumption given that researchers are often examining relations for the very reason that they expect them to be non-zero). Is there a way to bypass the technical details and hypotheticals, but still accurately convey what statistical significance means? We think that there is. To do so, we consider the end utility of significance testing and leverage this deduction rather than trying to parse the technical aspects of its definition into something palatable and easily digestible.

According to the correct definition of statistical significance, what is the end utility of concluding that a result is statistically significant? We propose that the utility may be seen as follows: Given that there seems to be a low probability of getting results as extreme as, or more extreme than, what was observed when I assume the actual effect is zero (i.e., the data are unlikely, given the null), perhaps my starting assumption that there is no relation is incorrect. In other words, concluding that something is “statistically significant” is not dissimilar from saying that there is now some reason to believe that the effect is non-zero. I cannot say what it is, it just may not be zero. Effect sizes and confidence intervals can give information about what the effect may be, but statistical significance alone does not provide information about how large an effect may be – it just MAY not be zero.
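To make the contrast between significance and effect size concrete, here is a small sketch. It assumes a Pearson correlation and the Fisher z approximation for its confidence interval (which presumes roughly bivariate normal data). The two scenarios and the function name are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist


def correlation_ci(r, n, confidence=0.95):
    """Approximate confidence interval for a Pearson correlation r
    from a sample of size n, using the Fisher z transformation."""
    z = math.atanh(r)                      # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)            # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# Two made-up scenarios (correlation, sample size).
scenarios = {
    "small sample, large correlation": (0.80, 10),
    "huge sample, tiny correlation": (0.02, 50_000),
}

intervals = {}
for label, (r, n) in scenarios.items():
    intervals[label] = correlation_ci(r, n)
    low, high = intervals[label]
    print(f"{label}: r = {r:.2f}, 95% CI [{low:.3f}, {high:.3f}]")
```

In both hypothetical scenarios the interval excludes zero, so both results would typically be called statistically significant at the conventional five percent level, yet the plausible sizes of the two effects differ greatly. The label "significant" is the same; only the effect size and interval reveal the difference.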

We suggest that this, “may not be zero,” interpretation is a simple, concise, and not incorrect interpretation of statistical significance. We can put this interpretation into practice by applying it to the opening paragraph’s trilemma:

“All of the results were statistically significant (indicating that results were not likely due to chance).”

“All of the results were statistically significant (indicating that the true effects may not be zero).”

Or

“All of the results were statistically significant (which suggests that there is reason to doubt that the true effects are zero).”

What is clear from this interpretation is that it is uninformative, bordering on meaningless. This is true and this is the nature of significance testing. Attempts to get more interpretational juice from the proverbial squeeze when interpreting statistical significance are likely to lead to interpretational overreach and predictable mistakes. If information beyond “may not be zero” is desired, researchers should supplement

What if a result was not statistically significant? Does that at least tell us that the null hypothesis is true? Sadly, no. Because significance testing assumes the null is true,

Researchers in applied fields like I/O psychology are often required to communicate and interpret what statistical significance means to non-scientific audiences. Relying on a technically accurate formal definition of statistical significance is not always productive because it is not meaningful or intuitive for general audiences. Properly understanding technically correct definitions is challenging even for trained researchers, as it is well documented that statistical significance is frequently misunderstood and misinterpreted by researchers who rely on it (

Significance testing can be a helpful tool for making inferences from data. However, as is the case with other useful tools, mistakes and accidents sometimes happen when using the tool. This is why so many useful tools have safety features added to them over time to prevent accidents from mistakes or probable habits of misuse (e.g., firearms have safeties, chainsaws have chain brakes, etc.). Statistical significance has been used for a long time without the aid of safety features to deter inappropriate use and avoid accidents. The shorthand interpretation we provide (i.e., interpreting statistical significance as “may not be zero”) can be viewed as a safety feature that may reduce science communication accidents when significance testing results are communicated to the general public. Our shorthand interpretation also has the clear advantage of making it readily apparent how uninformative significance testing is on its own. This makes it hard to oversell and overstate the importance of single research findings and allows practitioners and consumers of research to have an honest accounting of what research is telling them.

JS and DS wrote the paper.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.