Non-parametric alternative to the paired samples t-test. Compare two related measurements without assuming normally distributed differences.
The Wilcoxon signed-rank test is a non-parametric statistical test used to compare two related samples, matched samples, or repeated measurements on a single sample. Developed by Frank Wilcoxon in 1945, it serves as the non-parametric alternative to the paired samples t-test. Instead of comparing means (which requires normally distributed data), the Wilcoxon test works with the ranks of the differences between paired observations, making it appropriate when your data violate normality assumptions or when you are working with ordinal data.
Use this test when you have paired or repeated-measures data and cannot assume normality. Common scenarios include pre-test/post-test designs where scores are not normally distributed, Likert scale data from surveys (ordinal data), small sample sizes where normality is difficult to verify, and any before/after study where you want a more robust analysis that is less sensitive to outliers.
The key difference between these two tests lies in their assumptions. The paired t-test assumes that the differences between pairs are normally distributed, while the Wilcoxon signed-rank test only assumes that the distribution of differences is symmetric. This makes the Wilcoxon test more versatile, though when normality holds, the paired t-test is slightly more powerful (i.e., better at detecting real differences). As a rule of thumb: if your data are clearly normal, use the paired t-test; if there is any doubt about normality or your data are ordinal, use the Wilcoxon test.
A therapist measures anxiety scores (on a 1–100 scale) for 10 patients before and after a 6-week treatment program. Because the sample is small and the distribution is unknown, the Wilcoxon signed-rank test is chosen over the paired t-test.
Pre-Treatment (n=10)
72, 85, 91, 68, 77, 83, 95, 88, 74, 79
Mdn = 81.00
Post-Treatment (n=10)
78, 89, 95, 73, 82, 87, 98, 92, 79, 83
Mdn = 85.00
Results
W = 0.0, z = −2.80, p = .005, rank-biserial r = 1.00
The Wilcoxon signed-rank test indicated that post-treatment scores were significantly higher than pre-treatment scores, with a large effect size. All 10 patients' scores increased after the 6-week program, with each difference between 3 and 6 points.
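To make the calculation behind these results concrete, here is a minimal sketch in plain Python (standard library only). It is not StatMate's code. It assumes the conventions described in the validation note further down: W is the smaller of the two signed-rank sums, tied absolute differences get average ranks, and z uses the normal approximation with tie and continuity corrections. Dropping zero differences is an assumed convention (this data set has none), and the function names are hypothetical.

```python
import math
import statistics

# Example data from the anxiety-score study above.
pre = [72, 85, 91, 68, 77, 83, 95, 88, 74, 79]
post = [78, 89, 95, 73, 82, 87, 98, 92, 79, 83]


def average_ranks(values):
    """Rank values 1..n in ascending order; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last position of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x, y):
    """Return (W, z, two-sided p, rank-biserial r) for paired samples x and y."""
    # Difference scores y - x; zero differences are dropped (assumed convention).
    d = [b - a for a, b in zip(x, y) if b != a]
    n = len(d)
    if n == 0:
        raise ValueError("all differences are zero")
    ranks = average_ranks([abs(v) for v in d])
    w_plus = sum(rk for rk, v in zip(ranks, d) if v > 0)
    w_minus = sum(rk for rk, v in zip(ranks, d) if v < 0)
    w = min(w_plus, w_minus)

    # Mean and tie-corrected standard deviation of W under the null hypothesis.
    mean = n * (n + 1) / 4
    tie_sizes = {}
    for v in d:
        tie_sizes[abs(v)] = tie_sizes.get(abs(v), 0) + 1
    tie_term = sum(t**3 - t for t in tie_sizes.values()) / 48
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24 - tie_term)

    # Continuity correction moves W half a unit toward the mean (W <= mean by construction).
    z = (w - mean + 0.5) / sd if w < mean else 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    r = (w_plus - w_minus) / (w_plus + w_minus)
    return w, z, p, r


w, z, p, r = wilcoxon_signed_rank(pre, post)
print(statistics.median(pre), statistics.median(post))  # 81.0 85.0
print(f"W = {w:.1f}, z = {z:.2f}, p = {p:.3f}, r = {r:.2f}")
# W = 0.0, z = -2.80, p = 0.005, r = 1.00
```

Because every difference is positive, W− = 0 and W+ = 55 (the full rank total for n = 10), so W = 0 and r = 1.00. R's `wilcox.test(post, pre, paired = TRUE)` reports V = W+ (55 here) instead of the smaller sum; the two-sided z magnitude and p-value are the same either way.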
| Situation | Recommended Test |
|---|---|
| Paired data, normal differences | Paired samples t-test |
| Paired data, non-normal or ordinal | Wilcoxon signed-rank test |
| Two independent groups, normal data | Independent samples t-test |
| Two independent groups, non-normal | Mann-Whitney U test |
| 3+ related groups, non-normal | Friedman test |
| 3+ independent groups, non-normal | Kruskal-Wallis test |
While the Wilcoxon test has fewer assumptions than the paired t-test, it still requires certain conditions to be met:
1. Paired Observations
Data must consist of paired observations — either repeated measures on the same subjects (pre/post) or matched pairs. Each pair produces one difference score.
2. Ordinal or Continuous Scale
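The dependent variable must be measured on at least an ordinal scale, so that differences can be meaningfully ranked. The test does not require the interval or ratio data that the paired t-test relies on. However, because it ranks the absolute differences, it assumes that a larger difference really is larger; if only the direction of each change can be trusted, the sign test is the safer choice.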
3. Symmetric Distribution of Differences
The distribution of the differences between pairs should be approximately symmetric around the median. This is a weaker assumption than normality. If the distribution of differences is highly skewed, consider the sign test instead, which makes no symmetry assumption at all.
4. Independence Between Pairs
Each pair of observations must be independent of every other pair. The measurements within a pair are related (that is the whole point), but different pairs should not influence each other.
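Assumption 3 points to the sign test when the differences are strongly skewed. The sign test discards magnitudes and asks only whether positive and negative changes are equally likely. Below is a minimal sketch with an exact binomial p-value, assuming zero differences are dropped (the usual convention); `sign_test` is a hypothetical name, not a StatMate function.

```python
import math

pre = [72, 85, 91, 68, 77, 83, 95, 88, 74, 79]
post = [78, 89, 95, 73, 82, 87, 98, 92, 79, 83]


def sign_test(x, y):
    """Exact two-sided sign test for paired samples; returns (positives, n, p)."""
    # Keep only the direction of each nonzero difference y - x.
    d = [b - a for a, b in zip(x, y) if b != a]
    n = len(d)
    k = sum(1 for v in d if v > 0)
    # Under H0, k ~ Binomial(n, 0.5). That distribution is symmetric,
    # so the two-sided p-value is twice the smaller tail, capped at 1.
    tail = min(k, n - k)
    p = 2 * sum(math.comb(n, i) for i in range(tail + 1)) / 2**n
    return k, n, min(p, 1.0)


print(sign_test(pre, post))  # (10, 10, 0.001953125)
```

For the anxiety example all ten differences are positive, so p = 2/1024 ≈ .002.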
The rank-biserial correlation (r) is the recommended effect size measure for the Wilcoxon signed-rank test. It ranges from −1 to +1, where values near ±1 indicate that nearly all pairs changed in the same direction, and values near 0 indicate no consistent direction of change. It is calculated as (W+ − W−) / (W+ + W−).
| \|r\| | Interpretation | Practical Meaning |
|---|---|---|
| < 0.1 | Negligible | No meaningful directional trend |
| 0.1 – 0.3 | Small | Slight tendency in one direction |
| 0.3 – 0.5 | Medium | Noticeable directional pattern |
| ≥ 0.5 | Large | Strong, consistent directional change |
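The rank-biserial formula and the thresholds in this table can be sketched as two small functions. This is an illustration only: the function names are hypothetical, and a value exactly on a boundary (for example 0.3) is assigned to the higher band, which the table leaves ambiguous.

```python
def rank_biserial(w_plus, w_minus):
    """Rank-biserial correlation r = (W+ - W-) / (W+ + W-)."""
    total = w_plus + w_minus
    if total == 0:
        raise ValueError("r is undefined when every difference is zero")
    return (w_plus - w_minus) / total


def interpret_r(r):
    """Label |r| with the table's cut-offs (boundary values go to the higher band)."""
    size = abs(r)
    if size < 0.1:
        return "Negligible"
    if size < 0.3:
        return "Small"
    if size < 0.5:
        return "Medium"
    return "Large"


# Example study: W+ = 55, W- = 0.
print(rank_biserial(55, 0), interpret_r(rank_biserial(55, 0)))  # 1.0 Large
# Hypothetical split of the same rank total (W+ = 40, W- = 15).
print(round(rank_biserial(40, 15), 2), interpret_r(rank_biserial(40, 15)))  # 0.45 Medium
```

The sign of r carries the direction: positive when the positive differences dominate the rank sum, negative when the negative ones do, while the label depends only on |r|.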
According to APA 7th edition guidelines, Wilcoxon signed-rank test results should include the test statistic (W or T), the z approximation, the p-value, an effect size, and relevant descriptive statistics (medians). Here is a template you can use, followed by a worked example:
Template
A Wilcoxon signed-rank test indicated that post-test scores (Mdn = [value]) were [significantly/not significantly] different from pre-test scores (Mdn = [value]), W = [value], z = [value], p = [value], r = [value].
Real Example
A Wilcoxon signed-rank test indicated that post-treatment anxiety scores (Mdn = 85.00) were significantly higher than pre-treatment scores (Mdn = 81.00), W = 0.0, z = −2.80, p = .005, r = 1.00. The large effect size reflects a fully consistent change: every patient's score increased after treatment.
Note: Report W to one decimal place and z to two decimal places. Report p-values to three decimal places, except use p < .001 when the value is below .001. Always include an effect size measure (rank-biserial r).
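The rounding rules in this note can be applied mechanically. The sketch below builds only the statistics clause of the template; `apa_wilcoxon` and `num` are hypothetical helpers, not part of StatMate. It also drops the leading zero for r and p, following the APA rule for statistics that cannot exceed 1, and uses a typographic minus sign to match the results shown earlier.

```python
MINUS = "\u2212"  # typographic minus sign, as used in the example results


def num(value, digits, leading_zero=True):
    """Format to `digits` decimals; optionally drop the leading zero (APA style for r and p)."""
    text = f"{abs(value):.{digits}f}"
    if not leading_zero and text.startswith("0."):
        text = text[1:]
    sign = MINUS if round(value, digits) < 0 else ""
    return sign + text


def apa_wilcoxon(w, z, p, r):
    """Build the statistics clause of an APA-style Wilcoxon signed-rank report."""
    # A p-value below .001 is reported as a bound rather than an exact value.
    p_text = "p < .001" if p < 0.001 else f"p = {num(p, 3, leading_zero=False)}"
    return (
        f"W = {num(w, 1)}, z = {num(z, 2)}, {p_text}, "
        f"r = {num(r, 2, leading_zero=False)}"
    )


print(apa_wilcoxon(0, -2.796, 0.00517, 1.0))
# W = 0.0, z = −2.80, p = .005, r = 1.00
```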
StatMate's Wilcoxon signed-rank test calculations have been validated against R's wilcox.test() function and SPSS output. We use the normal approximation with continuity correction, proper tie handling via average ranks, and the jstat library for normal distribution probabilities. The rank-biserial correlation is computed following Kerby (2014). All results match R output to at least 4 decimal places.