What Is a p-Value?
A p-value is the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true. That definition is precise but not always intuitive, so consider this analogy.
Imagine you suspect a coin is unfair. You flip it 20 times and get 15 heads. The p-value answers the question: "If the coin were perfectly fair, how likely would I be to see 15 or more heads in 20 flips?" If that probability is very low (say, p = .021), you have reason to doubt the coin is fair. If it is relatively high (say, p = .41), the result is easily explained by normal chance.
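The coin example can be checked directly with an exact binomial calculation. The sketch below uses only the Python standard library; `binomial_tail` is an illustrative helper name, not a library function.

```python
from math import comb

def binomial_tail(n, k, prob=0.5):
    """P(X >= k) for X ~ Binomial(n, prob): the one-tailed p-value."""
    return sum(comb(n, i) * prob**i * (1 - prob)**(n - i) for i in range(k, n + 1))

# 15 or more heads in 20 flips of a fair coin
p = binomial_tail(20, 15)
print(round(p, 3))  # 0.021, matching the value in the text
```

Summing the exact binomial probabilities for 15 through 20 heads gives about .021, which is where the p-value in the example comes from.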
The p-value does not tell you whether your hypothesis is correct. It tells you how surprising your data would be if nothing were actually going on. This distinction is critical, and misunderstanding it is the source of most p-value misinterpretations.
How to Interpret p-Values
The Basic Logic
Every hypothesis test begins with a null hypothesis (H0), which typically states there is no effect, no difference, or no relationship. The p-value quantifies how compatible your observed data are with that null hypothesis.
- A small p-value means your data are unlikely under H0. This gives you grounds to reject H0.
- A large p-value means your data are consistent with H0. You fail to reject H0 (but this does not prove H0 is true).
Interpretation Reference Table
| p-value range | Conventional label | Typical interpretation |
|---------------|--------------------|------------------------|
| p < .001 | Highly significant | Very strong evidence against H0 |
| p < .01 | Significant | Strong evidence against H0 |
| p < .05 | Significant | Sufficient evidence against H0 at the conventional threshold |
| .05 < p < .10 | Marginally significant | Weak evidence; sometimes discussed but not conclusive |
| p > .10 | Not significant | Insufficient evidence to reject H0 |

A Worked Example
Suppose you conduct an independent samples t-test comparing exam scores between a study-group condition (M = 78.4, SD = 9.2, n = 35) and a solo-study condition (M = 73.1, SD = 10.5, n = 35). The test yields t(68) = 2.25, p = .028.
Here is how to interpret this step by step:
- State the null hypothesis: There is no difference in exam scores between the two study conditions.
- Check the p-value against the threshold: p = .028 is less than .05.
- Make a decision: Reject the null hypothesis.
- Interpret in context: Students in the study-group condition scored significantly higher on the exam than those who studied alone.
The p-value of .028 means that if there were truly no difference between conditions, you would observe a difference this large or larger only about 2.8% of the time by chance alone.
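The worked example above can be reproduced from the summary statistics alone. The sketch below is a pooled-variance independent-samples t-test in pure Python; the tail probability is obtained by numerically integrating the t density, and the function names (`t_sf`, `t_test_from_stats`) are illustrative, not from any library.

```python
import math

def t_sf(t, df, steps=200_000):
    """P(T > t) for Student's t with df degrees of freedom,
    computed by numerically integrating the density (trapezoid rule)."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    f = lambda x: c * (1 + x * x / df) ** (-(df + 1) / 2)
    hi = t + 50  # far enough into the tail that the remainder is negligible
    h = (hi - t) / steps
    return h * (sum(f(t + i * h) for i in range(1, steps)) + (f(t) + f(hi)) / 2)

def t_test_from_stats(m1, sd1, n1, m2, sd2, n2):
    """Independent-samples t-test (pooled variance) from summary statistics."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df  # pooled variance
    t = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df, 2 * t_sf(abs(t), df)  # two-sided p

t, df, p = t_test_from_stats(78.4, 9.2, 35, 73.1, 10.5, 35)
print(f"t({df}) = {t:.2f}, p = {p:.3f}")  # t(68) = 2.25, p = 0.028
```

In practice you would use a statistics library rather than hand-rolled integration, but spelling the computation out makes clear that the p-value is just the two-sided tail area of the t distribution beyond the observed statistic.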
The .05 Threshold: Why and When
The convention of using alpha = .05 as the significance threshold traces back to Ronald Fisher in the 1920s. Fisher suggested .05 as a convenient reference point, not as a rigid boundary. Over decades, however, it became treated as an absolute cutoff, which Fisher himself never intended.
When .05 Makes Sense
For most exploratory research in the social and behavioral sciences, alpha = .05 provides a reasonable balance between detecting real effects (power) and avoiding false positives (Type I error). It means you accept a 5% chance of concluding an effect exists when it actually does not.
When to Use a Different Threshold
Some situations call for stricter or more lenient thresholds:
- Multiple comparisons: When testing many hypotheses simultaneously, the family-wise error rate inflates. Bonferroni correction or false discovery rate adjustments lower the per-test alpha.
- High-stakes decisions: Clinical trials, drug approvals, and genomics studies often use p < .01 or p < .001 because the consequences of a false positive are severe.
- Exploratory research: Some fields accept p < .10 for preliminary findings that warrant further investigation.
The key point is that .05 is a convention, not a law of nature. Always consider the context and consequences of your decision.
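The Bonferroni adjustment mentioned above is simple enough to sketch: with m tests, each p-value is compared against alpha / m instead of alpha. The helper name `bonferroni` and the example p-values below are illustrative.

```python
def bonferroni(p_values, alpha=0.05):
    """Flag which tests survive a Bonferroni-corrected threshold.

    With m tests, each p-value is compared against alpha / m, which
    keeps the family-wise error rate at or below alpha."""
    threshold = alpha / len(p_values)
    return [(p, p < threshold) for p in p_values]

# Five tests at alpha = .05: the corrected per-test threshold is .01,
# so only the smallest p-value survives
print(bonferroni([0.003, 0.020, 0.041, 0.150, 0.049]))
```

Note how p = .041 and p = .049, both "significant" in isolation, no longer clear the corrected threshold once the number of tests is accounted for.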
Common Misinterpretations of p-Values
This section addresses the most widespread errors in p-value interpretation. They are worth studying closely: most researchers have held at least one of these misconceptions at some point.
Mistake 1: "p = .03 Means There Is a 97% Chance the Result Is True"
This is perhaps the single most common misinterpretation. The p-value is not the probability that your research hypothesis is true. It is the probability of obtaining your data (or more extreme data) given that the null hypothesis is true. These are fundamentally different statements.
The probability that a hypothesis is true given the data requires Bayesian analysis with prior probabilities. A frequentist p-value simply cannot answer that question.
Mistake 2: "Non-significant Means No Effect"
A result of p = .12 does not prove that no effect exists. It means you did not find sufficient evidence to reject the null hypothesis at your chosen alpha level. The study may have been underpowered (too few participants), the effect may be real but small, or measurement error may have obscured it.
Absence of evidence is not evidence of absence. This is especially important in studies with small sample sizes, where non-significant results are common even when real effects exist.
Mistake 3: "The p-Value Tells You the Size of the Effect"
A very small p-value (say, p < .001) does not mean the effect is large or important. With a large enough sample, even trivially small differences become statistically significant. A study with 50,000 participants might find a 0.5-point difference on a 100-point scale with p < .001. The effect is statistically significant but practically meaningless.
Always report and interpret an effect size alongside the p-value. Common effect size measures include Cohen's d, eta squared (partial eta squared), and R squared.
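To make the large-sample point concrete, here is a sketch of Cohen's d (standardized mean difference using the pooled SD) applied to hypothetical numbers matching the scenario above: a 0.5-point difference on a 100-point scale, 25,000 participants per group.

```python
import math

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (m1 - m2) / sp

# Hypothetical: 0.5-point difference on a 100-point scale, SD = 10, n = 25,000 per group
d = cohens_d(50.5, 10, 25_000, 50.0, 10, 25_000)
t = d * math.sqrt(25_000 / 2)  # t for two equal-n groups: d * sqrt(n/2)
print(f"d = {d:.2f}, t = {t:.1f}")  # d = 0.05, t = 5.6
```

A d of 0.05 is a trivially small effect, yet the implied t statistic of roughly 5.6 is well past the ~3.3 needed for p < .001: statistical significance without practical meaning.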
Mistake 4: "Smaller p = More Important Result"
A result with p = .001 is not necessarily more important or more replicable than one with p = .04. The p-value is influenced by sample size, variance, and the magnitude of the effect. Two studies examining the same phenomenon can yield different p-values simply because they used different sample sizes.
Importance should be judged by effect size, practical significance, and how well the finding replicates, not by comparing p-values.
Mistake 5: "p = .049 and p = .051 Are Fundamentally Different"
Treating p = .049 as "significant" and p = .051 as "not significant" implies a sharp qualitative boundary that does not exist. The evidence against the null hypothesis is nearly identical for both values. Reporting one as a discovery and the other as a null result is an artifact of dichotomous thinking, not a reflection of the underlying data.
Many statisticians and journal editors now advocate for reporting exact p-values and interpreting them on a continuum rather than relying on pass/fail cutoffs.
Mistake 6: "A Significant p-Value Means the Results Will Replicate"
Statistical significance in a single study does not guarantee that the finding will replicate. A p = .04 result has a meaningful chance of failing to reach significance in an exact replication, particularly if the original study was underpowered or if the true effect is small.
Replication depends on effect size, sample size, and study design. The p-value from a single study is one piece of evidence, not proof.
How to Report p-Values in APA Format
APA 7th edition has specific rules for reporting p-values. Following these conventions signals methodological rigor and helps readers interpret your results consistently.
Rule 1: Report Exact p-Values
Report the exact p-value to two or three decimal places. Do not simply write "p < .05" when you have a more precise value.
- Correct: p = .034
- Correct: p = .007
- Avoid: p < .05 (when you know the exact value)
Rule 2: Use p < .001 for Very Small Values
When the p-value is less than .001, report it as p < .001 rather than writing out many decimal places. Do not write p = .000, as a p-value is never exactly zero.
- Correct: p < .001
- Incorrect: p = .000
- Incorrect: p = .0003
Rule 3: No Leading Zero
Because p-values cannot exceed 1.0, APA style omits the leading zero. The same rule applies to other statistics bounded by 1, such as r and R squared.
- Correct: p = .034
- Incorrect: p = 0.034
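The three formatting rules above are mechanical enough to automate. The sketch below shows one way to do it in Python; `format_p` is an illustrative helper, not a standard function.

```python
def format_p(p):
    """Format a p-value per APA 7th edition: exact value to three
    decimal places, no leading zero, and 'p < .001' for very small values."""
    if p < 0.001:
        return "p < .001"
    return f"p = {p:.3f}".replace("0.", ".", 1)  # drop the leading zero

print(format_p(0.034))   # p = .034
print(format_p(0.007))   # p = .007
print(format_p(0.0003))  # p < .001
print(format_p(0.175))   # p = .175
```

This mirrors the rules in order: very small values collapse to p < .001, everything else is reported exactly with the leading zero removed.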
APA Reporting Examples by Test
Independent samples t-test:
The treatment group (M = 24.50, SD = 4.80) scored significantly higher than the control group (M = 20.10, SD = 5.30), t(58) = 3.45, p = .001, d = 0.89.
One-way ANOVA:
There was a statistically significant difference in satisfaction ratings across the three conditions, F(2, 87) = 4.92, p = .009, partial eta squared = .10.
Pearson correlation:
Study hours and GPA were positively correlated, r(98) = .37, p < .001.
Chi-square test of independence:
There was a significant association between department and turnover status, chi-square(3, N = 240) = 11.85, p = .008, V = .22.
Non-significant result (still report the exact p-value):
The difference between groups was not statistically significant, t(44) = 1.38, p = .175, d = 0.41.
Note that even when results are not significant, you still report the exact p-value and effect size. This information is valuable for meta-analyses and future power analyses.
p-Value vs Effect Size: Why Both Matter
The p-value and effect size answer different questions. The p-value asks: "Is there evidence that an effect exists?" The effect size asks: "How large is that effect?"
| | p-value | Effect size |
|---|---------|-------------|
| Question answered | Is the effect likely real? | How large is the effect? |
| Influenced by sample size | Heavily | Minimally |
| Can be misleading alone | Yes | Yes |
| APA 7th edition requirement | Yes | Yes |
Consider two studies on a new teaching method:
- Study A (N = 500): t(498) = 2.10, p = .036, d = 0.19
- Study B (N = 40): t(38) = 2.85, p = .007, d = 0.90
Study A has a significant result but a tiny effect size. The teaching method produces a barely noticeable improvement. Study B has a smaller p-value and a large effect size, suggesting a substantial and meaningful improvement. Reporting only p-values would obscure this important distinction.
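The relationship driving the Study A vs Study B contrast can be seen directly. For a pooled t-test with two equal groups of size n, the t statistic equals d multiplied by the square root of n/2, so the same effect size yields very different t values (and p-values) at different sample sizes. The helper name below is illustrative.

```python
import math

def t_for_effect(d, n_per_group):
    """t statistic implied by Cohen's d for two equal-n groups: d * sqrt(n/2)."""
    return d * math.sqrt(n_per_group / 2)

# The same modest effect (d = 0.30) at three different sample sizes
for n in (20, 80, 500):
    print(f"n = {n:>3} per group: t = {t_for_effect(0.30, n):.2f}")
```

The effect is identical in all three cases, but only the larger samples push t past conventional significance cutoffs, which is why comparing raw p-values across studies of different sizes is misleading.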
APA 7th edition requires both for good reason. Together, they give a complete picture of your findings.
Statistical Significance vs Practical Significance
Statistical significance means the result is unlikely under the null hypothesis. Practical significance means the result matters in the real world. These are not the same thing.
A pharmaceutical trial might find that a new drug lowers blood pressure by 0.5 mmHg more than a placebo, with p < .001 and N = 20,000. Statistically significant? Yes. Clinically meaningful? Probably not, since doctors consider a change of at least 5 mmHg necessary for practical benefit.
When interpreting your results, always ask three questions:
- Is the effect statistically significant? (Check the p-value against your alpha level.)
- How large is the effect? (Check the effect size against benchmarks and prior research.)
- Does the effect matter in practice? (Consider the real-world implications in your specific domain.)
A finding that satisfies all three is the strongest kind of evidence. A finding that satisfies only the first is the weakest.
Try StatMate's Free Calculators
Every one of StatMate's 20 free calculators automatically computes p-values and formats them in APA 7th edition style. You do not need to look up formatting rules or worry about leading zeros, decimal places, or when to use p < .001. The output is ready to paste into your manuscript.
Here are a few calculators particularly relevant to the concepts in this guide:
- StatMate's free t-test calculator reports t, df, exact p, and Cohen's d in a single output.
- StatMate's free ANOVA calculator provides F, p, and both eta squared and partial eta squared.
- StatMate's free correlation calculator outputs r, p, and R squared together.
- StatMate's free chi-square calculator computes the chi-square statistic, exact p, and Cramer's V automatically.
- StatMate's free sample size calculator helps you plan studies with adequate power so your p-values are meaningful.
All results include both significance testing and effect sizes, so you never have to report one without the other.