When should I use a t-test?

Use a t-test to compare means of two groups. An independent samples t-test compares two different groups (e.g., treatment vs control), while a paired t-test compares the same group at two time points (e.g., pre-test vs post-test).

What is the difference between Welch's and Student's t-test?

Student's t-test assumes equal variances between groups, while Welch's t-test does not. StatMate uses Welch's t-test by default, a choice widely recommended in the methodological literature because it remains reliable when group variances differ.

How do I interpret Cohen's d effect size?

By convention, Cohen's d = 0.2 is a small effect, 0.5 is medium, and 0.8 or above is large. Effect size tells you the practical magnitude of the difference, whereas a p-value only indicates whether the difference is statistically significant.

Can I use a t-test with small sample sizes?

Yes, a t-test can be computed with as few as 2 observations per group, although statistical power is very low with samples that small. With fewer than 30 observations per group, check the normality assumption. If normality is violated, consider the Mann-Whitney U test instead.

T-Test Calculator

Compare means between two groups using an independent or paired samples t-test. Results are formatted in APA 7th edition style.

What is a T-Test?

A t-test is a statistical test used to compare the means of two groups and determine whether they differ significantly. Developed by William Sealy Gosset in 1908 under the pseudonym "Student," the t-test is one of the most commonly used statistical tests in social science, psychology, medicine, and education research. It answers a simple question: is the difference between two group means larger than you would expect from random sampling variation alone?

Independent Samples T-Test

Use an independent samples t-test when comparing means from two different, unrelated groups. For example, you might compare test scores between a treatment group and a control group, or salaries between male and female employees. This calculator uses Welch's t-test by default, which does not assume equal variances and is widely recommended in the methodological literature as the default approach.

Paired Samples T-Test

Use a paired samples t-test when comparing means from the same group at two different times (pre-test vs post-test) or when participants are matched on key variables. The paired t-test accounts for the correlation between measurements, making it more powerful than an independent samples test when the design allows it. Common examples include before/after intervention studies and within-subjects experimental designs.
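The paired calculation can be sketched in a few lines of Python. This is a minimal illustration, not StatMate's own code: the pre/post scores are hypothetical, the function name paired_t is made up for this example, and the effect size shown is the mean difference divided by the SD of the differences, which is only one of several conventions for paired designs.

```python
import math
from statistics import mean, stdev

# Hypothetical pre/post scores for the same five participants (illustration only)
pre = [70, 68, 75, 80, 72]
post = [74, 71, 79, 82, 78]


def paired_t(before, after):
    """Return (t, df, d) for a paired samples t-test on matched observations."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    # The test works on the per-participant differences, not on the raw scores
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)                      # sample SD (n - 1 denominator)
    t = mean_diff / (sd_diff / math.sqrt(n))    # mean difference over its standard error
    d = mean_diff / sd_diff                     # assumed effect size convention
    return t, n - 1, d


t, df, d = paired_t(pre, post)
print(f"t({df}) = {t:.2f}, d = {d:.2f}")
```

Because each participant serves as their own control, between-person variability drops out of the differences, which is where the extra power of the paired design comes from.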

Worked Example: Independent Samples T-Test

A researcher wants to test whether a new teaching method improves exam scores. 15 students use the new method (experimental group) and 15 use the traditional method (control group).

Experimental Group (n=15)

85, 90, 78, 92, 88, 95, 82, 91, 87, 93, 86, 89, 94, 80, 91

M = 88.07, SD = 5.09

Control Group (n=15)

78, 82, 75, 80, 77, 83, 79, 81, 76, 84, 73, 80, 82, 77, 79

M = 79.07, SD = 3.10

Results

t(23.15) = 5.85, p < .001, d = 2.13, 95% CI [5.82, 12.18]

The experimental group scored significantly higher than the control group, with a very large effect size (Cohen's d = 2.13).
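To check the worked example by hand, the following Python sketch recomputes the t statistic, the Welch-Satterthwaite degrees of freedom, and Cohen's d from the raw scores listed above. It is an illustration under stated assumptions rather than StatMate's implementation: the function name welch_t is made up here, and Cohen's d is computed from the average of the two sample variances, which equals the pooled SD only because the groups are the same size. The p-value and confidence interval are omitted because they need the t distribution, which the Python standard library does not provide.

```python
import math
from statistics import mean, variance

experimental = [85, 90, 78, 92, 88, 95, 82, 91, 87, 93, 86, 89, 94, 80, 91]
control = [78, 82, 75, 80, 77, 83, 79, 81, 76, 84, 73, 80, 82, 77, 79]


def welch_t(a, b):
    """Return (t, df, d) for two independent samples using Welch's method."""
    n1, n2 = len(a), len(b)
    v1, v2 = variance(a), variance(b)       # sample variances (n - 1 denominator)
    se1, se2 = v1 / n1, v2 / n2             # squared standard error of each mean
    diff = mean(a) - mean(b)
    t = diff / math.sqrt(se1 + se2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    # Cohen's d with the root mean of the two variances (assumes equal group sizes)
    d = diff / math.sqrt((v1 + v2) / 2)
    return t, df, d


t, df, d = welch_t(experimental, control)
print(f"t({df:.2f}) = {t:.2f}, d = {d:.2f}")
```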

When to Use a T-Test vs. Other Tests

Situation	Recommended Test
Comparing 2 independent group means	Independent samples t-test
Comparing pre/post scores (same group)	Paired samples t-test
Comparing 3+ group means	One-way ANOVA
Non-normal data, 2 groups	Mann-Whitney U test
Non-normal paired data	Wilcoxon signed-rank test
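The table above can also be read as a small decision rule. The sketch below encodes it in Python; the function name recommend_test and its parameters are hypothetical, and because the table has a single row for 3+ groups, the sketch returns one-way ANOVA for that case regardless of the other inputs.

```python
def recommend_test(n_groups: int, paired: bool = False, normal: bool = True) -> str:
    """Map a study design to the test suggested in the table above."""
    if n_groups < 2:
        raise ValueError("need at least two groups or measurements")
    if n_groups >= 3:
        # The table lists only one option for three or more groups
        return "One-way ANOVA"
    if paired:
        return "Paired samples t-test" if normal else "Wilcoxon signed-rank test"
    return "Independent samples t-test" if normal else "Mann-Whitney U test"


print(recommend_test(2, paired=True, normal=False))
```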

Assumptions of the T-Test

Before interpreting your results, verify these assumptions are met:

1. Scale of Measurement

The dependent variable must be continuous (interval or ratio scale). If your data are ordinal (e.g., Likert scales), consider a non-parametric alternative.

2. Random Sampling

Data should be collected from a representative, randomly selected portion of the population.

3. Normality

Each group's data should be approximately normally distributed. With more than about 30 observations per group, the Central Limit Theorem makes the t-test reasonably robust to departures from normality. For smaller samples, check normality using the Shapiro-Wilk test.

4. Homogeneity of Variance (for Student's t)

The two groups should have approximately equal variances. StatMate uses Welch's t-test by default, which does not require this assumption and is recommended for general use.
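To see why the equal-variance assumption matters, the sketch below runs both versions of the test on the same data. The two samples are hypothetical and deliberately extreme (a small, widely spread group against a larger, tightly clustered one), and the function name student_and_welch is made up for this illustration.

```python
import math
from statistics import mean, variance

# Hypothetical samples with unequal sizes and spreads (illustration only)
small_noisy = [10, 30, 50, 70]
large_tight = [18, 19, 20, 20, 21, 21, 22, 23]


def student_and_welch(a, b):
    """Return ((t, df) for Student's test, (t, df) for Welch's test)."""
    n1, n2 = len(a), len(b)
    v1, v2 = variance(a), variance(b)
    diff = mean(a) - mean(b)
    # Student: pool the two variances, weighting each by its degrees of freedom
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_student = diff / math.sqrt(pooled * (1 / n1 + 1 / n2))
    df_student = n1 + n2 - 2
    # Welch: keep the variances separate and approximate the degrees of freedom
    se_sq = v1 / n1 + v2 / n2
    t_welch = diff / math.sqrt(se_sq)
    df_welch = se_sq ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return (t_student, df_student), (t_welch, df_welch)


(t_s, df_s), (t_w, df_w) = student_and_welch(small_noisy, large_tight)
print(f"Student: t({df_s}) = {t_s:.2f}")
print(f"Welch:   t({df_w:.2f}) = {t_w:.2f}")
```

On these data Student's version gives a larger t on 10 degrees of freedom, while Welch's gives a smaller t on roughly 3. The pooled variance is dominated by the larger, tighter group, so Student's test understates the uncertainty contributed by the small, noisy one.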

Understanding Cohen's d Effect Size

While p-values tell you whether a difference is statistically significant, Cohen's d tells you how large the difference is in practical terms. This is critical because with large sample sizes, even tiny, meaningless differences can be "significant."

Cohen's d	Interpretation	Practical Meaning
0.2	Small	Difference noticeable only with careful measurement
0.5	Medium	Difference visible to the naked eye
0.8	Large	Substantial, obvious difference
1.2+	Very Large	Very strong effect, hard to miss
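The benchmarks in the table can be applied mechanically, as in this short Python sketch. The function name interpret_cohens_d is hypothetical. Treating each table value as the lower bound of its band, and labeling anything below 0.2 as negligible, are assumptions made for the example, since the table does not define ranges.

```python
def interpret_cohens_d(d: float) -> str:
    """Label the magnitude of Cohen's d using the benchmarks in the table above."""
    size = abs(d)                # the sign only shows which group scored higher
    if size >= 1.2:
        return "Very Large"
    if size >= 0.8:
        return "Large"
    if size >= 0.5:
        return "Medium"
    if size >= 0.2:
        return "Small"
    return "Negligible"          # assumption: the table has no row below 0.2


print(interpret_cohens_d(-0.6))
```

These labels are conventions rather than rules, so an effect's practical importance still depends on the research context.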

How to Report T-Test Results in APA Format

According to APA 7th edition guidelines, t-test results should include the t-statistic, degrees of freedom, p-value, effect size, and confidence interval. Here are templates you can use:

Independent Samples

An independent-samples t-test revealed that the experimental group (M = 88.07, SD = 5.09) scored significantly higher than the control group (M = 79.07, SD = 3.10), t(23.15) = 5.85, p < .001, d = 2.13, 95% CI [5.82, 12.18].

Paired Samples

A paired-samples t-test indicated that post-test scores (M = 82.40, SD = 6.12) were significantly higher than pre-test scores (M = 75.60, SD = 7.35), t(24) = 4.32, p < .001, d = 0.86.

Note: Report t-values and degrees of freedom to two decimal places. Report p-values to three decimal places, except use p < .001 when the value is below .001. Always include an effect size measure.
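The formatting rules in the note can be automated. The Python sketch below is one possible approach: the function names format_p and format_t_result are hypothetical, and printing whole-number degrees of freedom without decimals is an assumption based on the paired template above, which shows t(24).

```python
def format_p(p: float) -> str:
    """Format a p-value: three decimals, no leading zero, floored at '< .001'."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if p < 0.001:
        return "p < .001"
    return "p = " + f"{p:.3f}".lstrip("0")


def format_t_result(t: float, df: float, p: float, d: float) -> str:
    """Assemble a report string from t, degrees of freedom, p, and Cohen's d."""
    # Whole-number df (Student's or paired tests) is printed without decimals
    df_text = f"{df:.0f}" if float(df).is_integer() else f"{df:.2f}"
    return f"t({df_text}) = {t:.2f}, {format_p(p)}, d = {d:.2f}"


print(format_t_result(4.32, 24, 0.0002, 0.86))
```

Stripping the leading zero applies only to values that cannot exceed 1, such as p, which is why t and d keep theirs.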

Common Mistakes to Avoid

Reporting p = .000: Statistical software sometimes displays p = .000, but you should report this as p < .001. A p-value is never exactly zero.
Ignoring effect size: A statistically significant result with d = 0.1 may not be practically meaningful. Always report and interpret the effect size.
Using a t-test for 3+ groups: If you have three or more groups, use ANOVA instead. Running multiple t-tests inflates your Type I error rate.
Assuming equal variances: Unless you have strong reason to assume equal variances, use Welch's t-test (the default in StatMate).
Confusing statistical significance with practical importance: A p < .05 result does not automatically mean the finding is important or clinically relevant.
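The inflation of the Type I error rate from running multiple t-tests can be estimated with a short calculation. This Python sketch uses the standard approximation 1 - (1 - alpha)^m for m tests. The function name familywise_error_rate is hypothetical, and the formula assumes the tests are independent, which pairwise comparisons on shared groups are not, so treat the result as a rough guide.

```python
from math import comb


def familywise_error_rate(n_groups: int, alpha: float = 0.05) -> float:
    """Approximate chance of at least one false positive across all pairwise t-tests."""
    n_tests = comb(n_groups, 2)              # number of distinct group pairs
    return 1 - (1 - alpha) ** n_tests        # assumes the tests are independent


for k in (3, 4, 5):
    print(k, "groups:", round(familywise_error_rate(k), 3))
```

With three groups the approximate risk of at least one false positive is already about 14%, and with five groups it is about 40%, far above the nominal 5%.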

Calculation Accuracy

StatMate's t-test calculations have been validated against R (the t.test function) and SPSS output. We use the jStat library for probability distributions and implement Welch's t-test with the Welch-Satterthwaite approximation for degrees of freedom. All results match R output to at least 4 decimal places.
