When to Use a One-Sample t-Test
A one-sample t-test compares the mean of a single sample to a known or hypothesized population value. It answers one straightforward question: does this group differ from a specific standard?
You would use a one-sample t-test when:
- Testing against a population norm. A researcher wants to know whether students at a particular university score differently on a standardized IQ test compared to the national average of 100.
- Comparing to a benchmark or criterion. A quality control engineer measures whether the average weight of cereal boxes differs from the labeled weight of 500 g.
- Evaluating change from a baseline. A psychologist assesses whether average reaction times in a treatment group differ from a known baseline of 250 ms.
The key requirement is that you have a single continuous variable measured on one group, and a fixed reference value to compare it against. The data should be approximately normally distributed, especially with small samples.
The APA Reporting Template
APA 7th edition requires a specific format for reporting t-test results. For a one-sample t-test, the standard template is:
t(df) = X.XX, p = .XXX, d = X.XX
Where:
| Symbol | Meaning |
|--------|---------|
| t | The t-statistic (italicized) |
| df | Degrees of freedom, calculated as N - 1 |
| p | The p-value (no leading zero) |
| d | Cohen's d effect size (with leading zero) |
Formatting rules to remember:
- Use italics for statistical symbols: t, p, d, M, SD, N
- Report p-values without a leading zero (.034, not 0.034) because p cannot exceed 1.0
- Report effect sizes with a leading zero (0.75, not .75) because d can exceed 1.0
- Use two decimal places for t and d, three for p
- For very small p-values, write p < .001 rather than the exact value
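These rules are mechanical enough to script. The sketch below shows a hypothetical helper (`format_apa` is illustrative, not part of any statistics package) that applies them:

```python
# Hypothetical helper (not from any statistics package): applies the APA 7
# formatting rules listed above to one-sample t-test results.
def format_apa(t, df, p, d):
    """Return e.g. 't(44) = 2.87, p = .006, d = 0.43'."""
    # p: three decimals, no leading zero; 'p < .001' for very small values
    p_str = "p < .001" if p < 0.001 else f"p = {p:.3f}".replace("0.", ".", 1)
    # t and d: two decimals; d keeps its leading zero
    return f"t({df}) = {t:.2f}, {p_str}, d = {d:.2f}"

print(format_apa(2.87, 44, 0.006, 0.43))   # t(44) = 2.87, p = .006, d = 0.43
```

Remember to italicize the symbols when you paste the string into a manuscript; plain text cannot carry the italics for you.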
Step 1: Report Descriptive Statistics
Before presenting inferential results, APA style requires you to report the descriptive statistics of your sample and clearly state the test value.
The sample (N = 45) had a mean IQ score of M = 105.30 (SD = 12.40). Scores were compared to the population mean of 100.
Key elements to include:
| Element | What to report | Example |
|---------|----------------|---------|
| Sample size | N = | N = 45 |
| Sample mean | M = | M = 105.30 |
| Standard deviation | SD = | SD = 12.40 |
| Test value | The known/hypothesized value | Population mean of 100 |
Always specify what the test value represents. Simply writing "compared to 100" is insufficient. State the source: a population parameter, a published norm, a regulatory standard, or a theoretical expectation.
Step 2: Report the t-Test Results
After the descriptives, report the inferential statistics. Include the t-statistic, degrees of freedom, exact p-value, and a confidence interval around the mean difference.
A one-sample t-test revealed that participants' IQ scores were significantly higher than the population mean of 100, t(44) = 2.87, p = .006, 95% CI [1.58, 9.02].
Breaking this down:
- df = 44 because N - 1 = 45 - 1 = 44
- 95% CI [1.58, 9.02] is the confidence interval of the mean difference (sample mean minus test value). The interval does not include zero, which is consistent with the significant result: a 95% CI that excludes zero corresponds to p < .05, two-tailed.
- The direction of the effect is stated in words ("significantly higher than") rather than relying solely on the sign of the t-statistic.
If the p-value is extremely small:
t(44) = 4.52, p < .001
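If you run the analysis in Python, SciPy's `ttest_1samp` produces every number reported above. Note that `confidence_interval()` requires SciPy 1.10+ and is centered on the population mean, so subtract the test value to express it as a mean difference. The data here are simulated for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
iq = rng.normal(105.3, 12.4, size=45)   # simulated IQ scores (illustrative only)

res = stats.ttest_1samp(iq, popmean=100)              # two-tailed by default
ci = res.confidence_interval(confidence_level=0.95)   # SciPy >= 1.10; CI around the mean

print(f"M = {iq.mean():.2f}, SD = {iq.std(ddof=1):.2f}, N = {iq.size}")
print(f"t({res.df}) = {res.statistic:.2f}, p = {res.pvalue:.3f}")
# Subtract the test value to report the CI of the mean difference:
print(f"95% CI of the difference [{ci.low - 100:.2f}, {ci.high - 100:.2f}]")
```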
Step 3: Report Effect Size (Cohen's d)
APA 7th edition strongly recommends reporting effect sizes alongside significance tests. For a one-sample t-test, Cohen's d is calculated as:
d = (M - mu) / SD
Where M is the sample mean, mu is the test value, and SD is the sample standard deviation.
Interpretation guidelines (Cohen, 1988):
| Cohen's d | Interpretation |
|-----------|----------------|
| 0.20 | Small effect |
| 0.50 | Medium effect |
| 0.80 | Large effect |
For the IQ example: d = (105.30 - 100) / 12.40 = 0.43, indicating a small-to-medium effect.
The effect size was small-to-medium, d = 0.43.
Always report the effect size even when the result is not significant. A non-significant p-value with a medium effect size tells a different story than a non-significant p-value with a negligible effect.
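The calculation and the verbal label are easy to script. A minimal sketch (the `cohens_d` and `label` helpers are illustrative; the cutoffs are Cohen's defaults, which context should override):

```python
# Cohen's d for a one-sample t-test: d = (M - mu) / SD.
# The cutoffs below are Cohen's (1988) defaults, not universal rules.
def cohens_d(m, mu, sd):
    return (m - mu) / sd

def label(d):
    d = abs(d)
    if d < 0.20:
        return "negligible"
    if d < 0.50:
        return "small"
    if d < 0.80:
        return "medium"
    return "large"

d = cohens_d(105.30, 100, 12.40)
print(f"d = {d:.2f} ({label(d)})")   # d = 0.43 (small)
```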
Complete APA Reporting Example
Here is a full paragraph combining all elements for a one-sample t-test, suitable for the Results section of a manuscript.
Scenario: A researcher measured IQ scores in a sample of 45 university students to determine whether their cognitive ability differed from the general population mean of 100.
A one-sample t-test was conducted to determine whether the mean IQ score of the sample differed from the population mean of 100. The sample mean (M = 105.30, SD = 12.40) was significantly higher than the test value, t(44) = 2.87, p = .006, d = 0.43, 95% CI of the difference [1.58, 9.02]. The effect size indicated a small-to-medium difference between the sample and the population norm.
This paragraph includes every element a reviewer expects: the purpose, descriptive statistics, test results, effect size, confidence interval, and a brief interpretation.
Reporting Non-Significant Results
When the one-sample t-test is not significant, you still report all of the same statistics. The key difference is in the language: avoid saying the groups "are equal" or that "there was no difference." Instead, state that no statistically significant difference was found.
Scenario: A nutritionist measured the daily caloric intake of 30 participants and compared it to the recommended 2,000 calories.
A one-sample t-test indicated that the mean daily caloric intake (M = 2,045.00, SD = 180.50) did not differ significantly from the recommended value of 2,000 calories, t(29) = 1.37, p = .182, d = 0.25, 95% CI [-22.40, 112.40]. The small effect size suggests that any deviation from the recommendation was minimal.
Notice that the confidence interval includes zero, which is consistent with the non-significant result. Also note that the effect size is still reported and interpreted.
One-Sample t-Test vs. Other Tests: When to Use Each
Choosing the correct test requires understanding what each test is designed for. The one-sample t-test occupies a specific niche: comparing one group's mean to a fixed, known value. The following comparisons clarify when to use alternative tests instead.
vs. Independent-Samples t-Test
These tests answer fundamentally different questions. A one-sample t-test compares one group to a fixed value. An independent-samples t-test compares two separate groups to each other. If you are comparing exam scores from two different classes, that is an independent-samples test. If you are comparing one class against a national standard, that is a one-sample test.
The critical distinction is the nature of the comparison value. In a one-sample test, the reference value (e.g., population mean = 100) is a known constant, not estimated from data. In an independent-samples test, both means are estimated from data and carry sampling error.
One-sample: t(44) = 2.87, p = .006 — comparing sample mean to a known population value of 100.
Independent-samples: t(88) = 3.14, p = .002 — comparing means of Group A and Group B, both estimated from data.
vs. Paired-Samples t-Test
A common confusion: the paired-samples t-test also involves a single group, but it compares two related measurements (e.g., pre-test vs. post-test). The one-sample t-test compares one measurement to a fixed constant, not to another measurement from the same participants.
If your "comparison value" comes from the same participants measured at a different time point, you need a paired-samples t-test. If it is a fixed standard that does not vary with your sample, you need a one-sample t-test.
Example of misuse: A researcher measures blood pressure before and after an intervention and compares the post-intervention mean to 120 mmHg using a one-sample t-test. This ignores individual-level change and loses statistical power. The correct approach is a paired-samples t-test comparing pre and post measurements.
vs. One-Sample Wilcoxon Signed-Rank Test
When the normality assumption is violated and the sample is small (typically N < 30), the one-sample Wilcoxon signed-rank test is the non-parametric alternative. It tests whether the median (rather than the mean) differs from the test value.
A one-sample Wilcoxon signed-rank test indicated that median response times (Mdn = 260.50 ms) were significantly higher than the baseline of 250 ms, T = 312, z = 2.15, p = .032, r = .34.
Use the Wilcoxon alternative when your data are heavily skewed, contain outliers, or are measured on an ordinal scale. For samples larger than approximately 30, the one-sample t-test is robust to moderate departures from normality due to the Central Limit Theorem.
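In SciPy, the one-sample version of the Wilcoxon test is obtained by subtracting the test value before calling `scipy.stats.wilcoxon`, which tests the differences against zero. A sketch on simulated response times:

```python
import numpy as np
from scipy import stats

baseline = 250.0
rng = np.random.default_rng(1)
rt = baseline + rng.normal(8, 20, size=20)   # simulated response times (illustrative only)

# One-sample use: subtract the test value, then test the differences against zero
stat, p = stats.wilcoxon(rt - baseline)
print(f"Mdn = {np.median(rt):.2f} ms, T = {stat:.0f}, p = {p:.3f}")
```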
Decision Flowchart
- How many groups? One group → continue. Two or more groups → use independent-samples t-test or ANOVA.
- What is the comparison value? A fixed known constant → one-sample t-test. Another measurement from the same participants → paired-samples t-test.
- Are the data approximately normal? Yes, or N > 30 → one-sample t-test. No and N < 30 → one-sample Wilcoxon signed-rank test.
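The flowchart fits in a few lines of code. A hypothetical helper (`choose_test` is not a library function) encoding the three questions above:

```python
# Hypothetical helper (not a library function) encoding the decision flowchart.
def choose_test(n_groups, comparison, normal, n):
    if n_groups >= 2:
        return "independent-samples t-test or ANOVA"
    if comparison == "same participants":
        return "paired-samples t-test"
    if normal or n > 30:
        return "one-sample t-test"
    return "one-sample Wilcoxon signed-rank test"

print(choose_test(1, "fixed constant", normal=False, n=18))
# one-sample Wilcoxon signed-rank test
```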
Effect Size for One-Sample t-Test: Cohen's d in Depth
Effect size quantifies the practical magnitude of a finding, independent of sample size. For the one-sample t-test, Cohen's d is the standard measure, expressing the difference between the sample mean and the test value in standard deviation units.
Calculation Methods
The direct formula is:
d = (M - mu) / SD
You can also compute d from the t-statistic and degrees of freedom:
d = t / sqrt(N)
This is useful when you are reading published results that report t and N but not the raw means and standard deviations. For the IQ example: d = 2.87 / sqrt(45) = 2.87 / 6.71 = 0.43, which matches the direct calculation.
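A quick check of that conversion in Python:

```python
import math

# Recover the effect size from a published t and N: d = t / sqrt(N)
t, n = 2.87, 45
d = t / math.sqrt(n)
print(round(d, 2))   # 0.43
```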
Interpretation Benchmarks
Cohen (1988) proposed the following benchmarks, but they should be applied cautiously and in context:
| Cohen's d | Label | Practical meaning |
|-----------|-------|-------------------|
| 0.20 | Small | Detectable only with careful measurement; little practical impact in most contexts |
| 0.50 | Medium | Noticeable to observers; likely to have practical consequences |
| 0.80 | Large | Obvious difference; substantial practical importance |
| 1.20+ | Very large | Rare in social sciences; common in medical or educational interventions |
These benchmarks are defaults, not universal truths. A d of 0.20 might be highly meaningful in a field where effects are typically small (e.g., public health interventions applied to millions), while a d of 0.80 might be unremarkable in a context where large effects are routine.
Confidence Intervals for d
Reporting a confidence interval around d communicates the precision of your effect size estimate. APA 7th edition recommends confidence intervals for all effect sizes. The formula for the CI uses the noncentral t-distribution and is computationally intensive, which is why calculators are recommended.
The effect was small-to-medium, d = 0.43, 95% CI [0.12, 0.73].
A confidence interval for d that includes zero indicates the effect size is not significantly different from zero, which aligns with a non-significant p-value. When the CI is wide, the estimate is imprecise and should be interpreted with caution.
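If you prefer not to work with the noncentral t-distribution directly, a percentile bootstrap gives a serviceable approximation. This is a sketch on simulated data, not the exact noncentral-t interval that dedicated calculators report:

```python
import numpy as np

rng = np.random.default_rng(2)
scores = rng.normal(105.3, 12.4, size=45)   # illustrative data
mu = 100

def d_stat(x):
    return (x.mean() - mu) / x.std(ddof=1)

# Percentile bootstrap: resample with replacement, recompute d each time,
# then take the middle 95% of the bootstrap distribution
boot = np.array([d_stat(rng.choice(scores, size=scores.size, replace=True))
                 for _ in range(5000)])
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"d = {d_stat(scores):.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```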
APA Format for Effect Size
The sample scored significantly above the population norm, t(44) = 2.87, p = .006, d = 0.43, 95% CI [0.12, 0.73], indicating a small-to-medium effect.
Note that d uses a leading zero (0.43, not .43) because it can exceed 1.0. Always pair the numeric value with a verbal interpretation.
Assumptions and How to Check Them
The one-sample t-test relies on several assumptions. Violating these assumptions can produce misleading results, particularly with small samples.
1. Continuous Dependent Variable
The dependent variable must be measured on a continuous scale (interval or ratio). Common examples include test scores, reaction times, weights, and blood pressure readings. If your variable is ordinal (e.g., a single Likert item), the one-sample t-test is not appropriate. Use the one-sample Wilcoxon signed-rank test instead, or use the t-test only on composite scale scores formed from multiple items.
2. Independence of Observations
Each data point must be independent of every other data point. This means one participant's score should not influence another's. Independence is violated when data have a hierarchical structure (e.g., students nested within classrooms) or a temporal dependency (e.g., repeated measurements from the same participant). Independence is ensured by proper study design, not by statistical testing after the fact.
3. Normality of the Dependent Variable
The one-sample t-test assumes the data are drawn from a normally distributed population. For small samples (N < 30), this assumption is critical. For larger samples, the Central Limit Theorem ensures the sampling distribution of the mean is approximately normal even if the raw data are not.
How to check normality:
- Shapiro-Wilk test. The most powerful test for normality with small to moderate samples. A significant result (p < .05) indicates non-normality. Report as: "The Shapiro-Wilk test indicated the data were approximately normally distributed, W = 0.97, p = .342."
- Q-Q plot. Plot quantiles of your data against quantiles of a normal distribution. Points falling approximately along the diagonal line suggest normality. Systematic deviations (S-curves, heavy tails) indicate violations.
- Skewness and kurtosis. Values between -2 and +2 are generally acceptable. Report these values when justifying the use of a parametric test.
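All three numeric checks are available in SciPy (note that `scipy.stats.kurtosis` reports excess kurtosis, so 0 is the normal reference point). A sketch on simulated data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(100, 15, size=40)   # illustrative data

w, p = stats.shapiro(x)            # Shapiro-Wilk: p < .05 suggests non-normality
skew = stats.skew(x)
kurt = stats.kurtosis(x)           # excess kurtosis (0 for a normal distribution)
print(f"W = {w:.2f}, p = {p:.3f}, skewness = {skew:.2f}, kurtosis = {kurt:.2f}")

# For the visual check, a Q-Q plot:
# import matplotlib.pyplot as plt; stats.probplot(x, plot=plt); plt.show()
```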
4. What to Do When Assumptions Are Violated
If normality is violated with a small sample, you have several options:
- Use the Wilcoxon signed-rank test — the non-parametric alternative that does not assume normality.
- Transform the data — log, square root, or reciprocal transformations can normalize skewed distributions. Report the transformation and back-transform results for interpretation.
- Bootstrap the confidence interval — resampling methods do not require normality assumptions and provide robust confidence intervals.
- Proceed with caution — if N > 30, the t-test is typically robust to moderate violations. Note the violation and justify your decision in the manuscript.
The Shapiro-Wilk test indicated a significant departure from normality, W = 0.89, p = .004. Given the sample size (N = 50) and the robustness of the t-test to non-normality with larger samples, the parametric test was retained.
Common Applications in Research
Understanding when to apply a one-sample t-test helps researchers frame their hypotheses correctly. Here are the most common applications with reporting examples.
Comparing a Sample to a Population Norm
This is the classic use case. Standardized tests (IQ, GRE, SAT, depression inventories) have established population norms. Researchers test whether a specific subgroup differs from these norms.
A one-sample t-test was conducted to determine whether nursing students' scores on the Beck Depression Inventory (M = 14.20, SD = 6.80, N = 62) differed from the general population norm of 10.0. The sample scored significantly higher than the norm, t(61) = 4.86, p < .001, d = 0.62, 95% CI [2.48, 5.92], suggesting elevated depressive symptomatology in this group.
Scale Validation Against a Midpoint
Researchers developing or validating scales often test whether responses differ from the scale midpoint. This establishes whether respondents tend toward agreement or disagreement.
The mean score on the 7-point satisfaction scale (M = 5.32, SD = 1.15, N = 200) was compared to the scale midpoint of 4.0. A one-sample t-test indicated significantly above-midpoint satisfaction, t(199) = 16.22, p < .001, d = 1.15, 95% CI [1.16, 1.48].
Quality Control Against a Standard
In industrial and manufacturing contexts, the one-sample t-test verifies whether a production process meets specifications.
The mean fill volume of sampled bottles (M = 502.30 mL, SD = 3.80, N = 40) was compared to the target of 500 mL. The one-sample t-test indicated a significant overfill, t(39) = 3.83, p < .001, d = 0.61, 95% CI [1.09, 3.51].
Pre-Registered Hypotheses
When a specific directional prediction is stated in a pre-registration document, the one-sample t-test can be used with a one-tailed test. This is particularly common in replication studies where the expected direction of the effect is well-established.
Consistent with the pre-registered hypothesis, participants' average score (M = 78.50, SD = 10.20, N = 35) exceeded the passing criterion of 75, t(34) = 2.03, p = .025, one-tailed, d = 0.34, 90% CI [0.58, 6.42].
Reporting One-Tailed vs. Two-Tailed Tests
By default, a one-sample t-test is two-tailed, testing whether the sample mean differs from the test value in either direction. A one-tailed test is appropriate only when you have a strong directional hypothesis established before data collection.
When to Use Each
Two-tailed (default — use this unless you have a specific reason not to):
- You want to detect a difference in either direction
- You are conducting exploratory research
- You have no strong theoretical basis for predicting direction
- Your pre-registration does not specify a direction
One-tailed:
- You have a strong, theoretically justified directional prediction
- The direction was specified before data collection (ideally in a pre-registration)
- Only one direction of the effect is meaningful for your research question
- You are willing to ignore effects in the opposite direction
APA Format Differences
Two-tailed (no special notation needed):
A one-sample t-test was conducted to determine whether scores differed from the national average of 75. The results were not statistically significant, t(39) = 1.92, p = .062, d = 0.30, 95% CI [-0.21, 8.21].
One-tailed (must label explicitly):
A one-sample t-test was conducted to determine whether scores exceeded the national average of 75. The results were significant, t(39) = 1.92, p = .031, one-tailed, d = 0.30, 90% CI [0.49, 7.51]. (A two-sided 90% CI corresponds to a one-tailed test at alpha = .05.)
The key differences: (1) state the direction in the research question, (2) label the p-value as one-tailed, and (3) describe the effect direction in the interpretation.
Justification Requirements
Reviewers scrutinize one-tailed tests carefully because they halve the p-value, making it easier to reach significance. You must provide justification in the Method section:
Based on prior research demonstrating that mindfulness training consistently improves attention scores (Smith et al., 2020; Jones & Brown, 2022), a one-tailed test was used to evaluate whether the trained group's scores exceeded the population mean.
Controversy Around One-Tailed Tests
Many methodologists advise against one-tailed tests in most research contexts. The primary concern is that they are often used post-hoc to achieve significance when a two-tailed test yields a non-significant result. This constitutes p-hacking. If you did not pre-register the direction, use a two-tailed test. If a reviewer questions your one-tailed test, you should be able to point to a pre-registration or a clear theoretical basis documented before data collection.
Common Mistakes in One-Sample t-Test Reporting
1. Testing Against the Wrong Population Value
The validity of a one-sample t-test depends entirely on the appropriateness of the comparison value. Using an outdated norm, a norm from a different population, or an arbitrary benchmark can render the test meaningless.
Incorrect: Comparing current students' scores to a norm published in 1985 without acknowledging that population scores may have shifted (the Flynn effect for IQ, for instance).
Correct: "Scores were compared to the most recent national norm of 100, based on the 2020 standardization sample (Wechsler, 2020)."
2. Ignoring Normality for Small Samples
With N < 30, the normality assumption is critical. Many researchers skip this check entirely. Always inspect distributions visually and run a Shapiro-Wilk test for small samples. If the data are markedly non-normal, the Wilcoxon signed-rank test is the appropriate alternative.
3. Not Reporting the Confidence Interval
Some researchers report t, p, and d but omit the confidence interval. The CI provides information about the precision and range of the estimated mean difference that the p-value alone cannot convey. APA 7th edition recommends CIs for all inferential tests.
Incomplete: t(44) = 2.87, p = .006, d = 0.43.
Complete: t(44) = 2.87, p = .006, d = 0.43, 95% CI [1.58, 9.02].
4. Confusing Statistical Significance With Practical Significance
A statistically significant result does not necessarily mean the difference is practically meaningful. With a large enough sample, even trivially small differences become significant. Always interpret the effect size alongside the p-value.
Example: With N = 1,000, a mean IQ of 101.2 compared to a norm of 100 (SD = 15) could yield p = .011, but d = 0.08 indicates a negligible effect. Reporting this as a "significant difference" without acknowledging the tiny effect size is misleading.
5. Missing Effect Size Entirely
Reporting t and p alone is no longer considered sufficient. APA 7th edition requires or strongly recommends an effect size measure. Cohen's d takes minimal effort to calculate and adds meaningful context.
6. Failing to State the Direction
Always describe whether the sample mean was above or below the test value. The sign of the t-statistic alone is not enough for readers to understand the practical meaning of the result.
Unclear: "The result was significant, t(44) = 2.87, p = .006."
Clear: "The sample mean was significantly higher than the population norm, t(44) = 2.87, p = .006."
One-Sample t-Test APA Checklist
Before submitting your manuscript, verify that your one-sample t-test reporting includes all required elements:
- [ ] Purpose of the test clearly stated
- [ ] Test value specified with its source
- [ ] Sample size (N) reported
- [ ] Descriptive statistics: M and SD
- [ ] t-statistic with degrees of freedom: t(df) = X.XX
- [ ] Exact p-value (or p < .001): p = .XXX
- [ ] Effect size with interpretation: d = X.XX
- [ ] 95% confidence interval of the mean difference
- [ ] Direction of effect described in words
- [ ] Normality assumption addressed
- [ ] Italics used for all statistical symbols
Frequently Asked Questions
What is the difference between a one-sample t-test and a z-test?
Both compare a sample mean to a known value, but the z-test requires the population standard deviation to be known, which is rare in practice. The one-sample t-test uses the sample standard deviation instead, making it suitable for virtually all real-world applications. When the sample size is large (typically N > 30), the t-distribution closely approximates the normal distribution, and the two tests yield nearly identical results. In APA format, you would report either test the same way, replacing the t symbol with z if appropriate: z = 2.45, p = .014.
Can I use a one-sample t-test with a small sample (N < 10)?
Technically yes, but several considerations apply. The normality assumption becomes critical with very small samples because the Central Limit Theorem provides minimal protection. Verify normality with a Shapiro-Wilk test and Q-Q plot. Even if the data appear normal, the t-test will have low statistical power, meaning you may fail to detect real effects. Consider whether a non-parametric alternative (Wilcoxon signed-rank test) is more appropriate, and report the power analysis to contextualize your findings.
How do I choose the test value (mu) for a one-sample t-test?
The test value must be theoretically or practically justified. Common sources include published population norms (e.g., IQ mean of 100), regulatory standards (e.g., maximum allowable 500 mg), scale midpoints (e.g., 4.0 on a 7-point scale), or values from prior research. Avoid selecting the test value based on your data, as this invalidates the test. If you do not have a clear justification for a specific test value, you may need a different research design entirely.
What sample size do I need for a one-sample t-test?
For a two-tailed test with alpha = .05 and power = .80, the required sample sizes by effect size are approximately: d = 0.20 (small) requires N = 199; d = 0.50 (medium) requires N = 34; d = 0.80 (large) requires N = 15. These are guidelines. Use a formal power analysis to determine the exact sample size for your expected effect size and desired power level.
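Those sample sizes come from a noncentral-t power calculation, which you can reproduce directly. A sketch using a brute-force search over N for a two-tailed test:

```python
from scipy import stats

def required_n(d, alpha=0.05, power=0.80):
    """Smallest N for a two-tailed one-sample t-test, via the noncentral t-distribution."""
    n = 3
    while True:
        df = n - 1
        t_crit = stats.t.ppf(1 - alpha / 2, df)   # two-tailed critical value
        nc = d * n ** 0.5                         # noncentrality parameter
        achieved = (1 - stats.nct.cdf(t_crit, df, nc)) + stats.nct.cdf(-t_crit, df, nc)
        if achieved >= power:
            return n
        n += 1

for d in (0.20, 0.50, 0.80):
    print(f"d = {d:.2f}: N = {required_n(d)}")
```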
Is the one-sample t-test robust to non-normality?
The one-sample t-test is moderately robust to non-normality, especially as sample size increases. With N > 30, the Central Limit Theorem ensures the sampling distribution of the mean approaches normality regardless of the population distribution shape. However, with small samples and severely skewed or heavy-tailed distributions, the test can produce misleading p-values. In such cases, the Wilcoxon signed-rank test or bootstrapped confidence intervals are preferable.
Should I report Cohen's d or Hedges' g for a one-sample t-test?
For a one-sample t-test, Cohen's d is the standard and most widely recognized effect size measure. Hedges' g applies a small-sample correction that becomes negligible with N > 20. If your sample is very small (N < 20), Hedges' g provides a less biased estimate. In most cases, Cohen's d is sufficient and expected by reviewers. Report whichever you use and name it explicitly.
Can I use a one-sample t-test for pre-post comparisons?
No. If you measure the same participants at two time points, use a paired-samples t-test. A one-sample t-test on post-test scores compared to the pre-test mean ignores the paired structure of the data, discards information about individual change, and typically has lower power. The only scenario where a one-sample t-test is appropriate for change is when you compute difference scores and test whether the mean difference differs from zero — but this is mathematically equivalent to the paired-samples t-test.
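That equivalence is easy to verify numerically with SciPy on simulated pre/post data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
pre = rng.normal(130, 12, size=25)       # simulated pre-intervention scores
post = pre - rng.normal(5, 8, size=25)   # simulated post-intervention scores

paired = stats.ttest_rel(pre, post)
one_sample = stats.ttest_1samp(pre - post, popmean=0)

# Identical t and p: the paired test IS a one-sample test on difference scores
print(np.isclose(paired.statistic, one_sample.statistic),
      np.isclose(paired.pvalue, one_sample.pvalue))   # True True
```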
How do I report a one-sample t-test in a table?
When reporting multiple one-sample t-tests (e.g., comparing several subscale means to a norm), use a table format:
| Variable | M | SD | t(df) | p | d | 95% CI |
|----------|-----|------|--------------|--------|-------|---------------|
| Subscale A | 52.30 | 8.40 | t(49) = 1.94 | .058 | 0.27 | [-0.09, 4.69] |
| Subscale B | 55.10 | 7.20 | t(49) = 5.01 | < .001 | 0.71 | [3.05, 7.15] |
| Subscale C | 48.80 | 9.10 | t(49) = -0.93 | .356 | -0.13 | [-3.79, 1.39] |
Note: All subscale scores compared to the published norm of 50. Include a note below the table specifying the test value and its source.
Try StatMate's Free One-Sample t-Test Calculator
Formatting one-sample t-test results by hand is tedious and error-prone. StatMate's One-Sample t-Test Calculator automatically generates publication-ready APA output with Cohen's d, confidence intervals, and assumption checks.
What you get for free:
- APA 7th edition formatted results, ready to copy into your manuscript
- Cohen's d effect size with interpretation
- 95% confidence interval of the mean difference
- Shapiro-Wilk normality test
- Visual distribution chart
- PDF export of complete results
Pro features:
- AI-powered plain-language interpretation of your results
- Word (.docx) export with APA formatting preserved
- No ads
Paste your data, enter the test value, and get your APA results in seconds at statmate.org/calculators/one-sample-t.