
Statistical Significance
Are research findings the result of a particular intervention or due to chance?

An important concept in education research is statistical significance. The idea is to judge whether a particular outcome in a study, such as higher student math achievement, could plausibly be explained by chance alone or instead reflects a real association with a specific intervention. Statistical significance is measured by what is known as a “p value”: the probability of getting results at least as extreme as those observed if chance alone were at work. Big values of p mean the results are consistent with chance, although they do not prove that chance was responsible. Small values of p are evidence against the idea that chance explains the outcome.
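
One way to see what “at least as extreme if chance alone were at work” means is to simulate chance directly. The sketch below is a hypothetical illustration, not a method taken from any particular study: it assumes 60 of 100 students improved after an intervention, and that under “chance alone” each student was equally likely to improve or not, like a fair coin flip. It replays that coin-flip world many times and counts how often chance produces a result at least as far from 50 as the one observed. The function name and parameters are invented for this example.

```python
import random


def simulated_p_value(improved: int, n: int, trials: int = 100_000, seed: int = 0) -> float:
    """Estimate a two-sided p value by simulating "chance alone".

    Hypothetical null hypothesis: each of the n students is equally likely
    to improve or not, so every student is a fair coin flip.
    """
    rng = random.Random(seed)
    # How far the observed count sits from what chance predicts (n / 2).
    observed_gap = abs(improved - n / 2)
    at_least_as_extreme = 0
    for _ in range(trials):
        # getrandbits(n) returns n independent fair random bits; counting the
        # 1s gives one simulated number of "improvers" under chance alone.
        simulated = bin(rng.getrandbits(n)).count("1")
        if abs(simulated - n / 2) >= observed_gap:
            at_least_as_extreme += 1
    # Share of chance-only replays that were at least as extreme as the data.
    return at_least_as_extreme / trials


p = simulated_p_value(improved=60, n=100)
print(f"Estimated p value: {p:.3f}")
```

With these made-up numbers the estimate should come out a little above 0.05: chance alone produces a split at least this lopsided roughly one time in 18.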

As you might imagine, researchers usually hope to obtain small p values. In social science research, it is common for scholars to set the threshold for significance at 5 percent, that is, a p value of 0.05. If the p value is less than or equal to that threshold, the result is considered statistically significant. (For those interested in the technical details, California State University, Long Beach offers an online explainer that describes how and why researchers choose this threshold.) Note that scholars often refer to a statistically significant result simply as “significant.”

The p value is not only among the most widely used statistical terms; it is also among the most misused. One common misuse involves reporting p values for studies in which no chance process occurred, as when a test is given to every single student in a school district instead of to a random selection of students. The p value cannot tell you whether your results were affected by factors unrelated to chance, such as the fact that some students in the district were absent on the day of the test. After all, those students probably weren’t randomly selected to miss school.

Another common misuse is to assume that statistical significance means the results are big, meaningful or important. Unfortunately, a p value tells you nothing about the magnitude or real-world importance of the results. It is important to keep in mind, for instance, that p values are very sensitive to sample size: for an effect of a given size, the bigger the sample, the more likely the findings are to be statistically significant. Studies with many thousands of participants may yield highly statistically significant results even when the actual effect of the program or intervention is marginal.
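
The sample-size effect can be made concrete with a small calculation. The sketch below uses an exact two-sided sign test, chosen here only as a simple stand-in for whatever test a real study might use, and assumes hypothetical studies in which the same modest share of students, 52 percent, improve while only the number of students changes. It also assumes no student’s score stays exactly the same (no ties). The function name is invented for this example, and the 5 percent threshold is the common convention described above.

```python
from math import comb


def sign_test_p_value(improved: int, n: int) -> float:
    """Exact two-sided sign test p value (hypothetical helper).

    Null hypothesis ("chance alone"): each student is equally likely to
    improve or not, so the number of improvers follows Binomial(n, 0.5).
    """
    # Count at least as far above n/2 as the observed result.
    k = max(improved, n - improved)
    # Upper-tail count of outcomes, using exact integers so large n does not underflow.
    upper_tail = sum(comb(n, i) for i in range(k, n + 1))
    # The distribution is symmetric, so double the tail for the lower side;
    # cap at 1 because the middle outcome is counted twice when k == n / 2.
    return min(2 * upper_tail / 2**n, 1.0)


# Hypothetical studies: the same 52 percent of students improve each time.
for n in (100, 1_000, 10_000):
    improved = n * 52 // 100
    p = sign_test_p_value(improved, n)
    verdict = "significant" if p <= 0.05 else "not significant"
    print(f"n={n:>6}: improved={improved}, p={p:.2g} ({verdict} at 5 percent)")
```

The effect is identical in all three hypothetical studies, two percentage points above what chance predicts, yet only the largest study clears the 5 percent threshold, and it does so by a wide margin. The p value shrank because the sample grew, not because the intervention worked any better.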

– Denise-Marie Ordway and Holly Yettick