Non-parametric alternative to the independent samples t-test. Compare two independent groups without assuming normal distribution.
The Mann-Whitney U test (also known as the Wilcoxon rank-sum test) is a non-parametric statistical test used to compare the distributions of two independent groups. Unlike the independent samples t-test, the Mann-Whitney U test does not assume that the data are normally distributed, making it ideal for ordinal data, skewed distributions, or small sample sizes where normality cannot be verified. It was developed by Henry B. Mann and Donald R. Whitney in 1947 and is one of the most widely used non-parametric tests in behavioral science, medicine, and social research.
Use the Mann-Whitney U test when one or more of the following conditions apply: your data are measured on an ordinal scale (e.g., Likert-type items), the assumption of normality is violated, your sample sizes are very small (e.g., n < 15 per group), or your data contain outliers that would distort parametric results. It is especially common in clinical trials, quality-of-life research, and educational studies where rating scales are used.
| Feature | Mann-Whitney U | Independent T-Test |
|---|---|---|
| Type | Non-parametric | Parametric |
| Data level | Ordinal or continuous | Continuous (interval/ratio) |
| Normality required | No | Yes (or large n) |
| Compares | Rank distributions | Means |
| Effect size | Rank-biserial r | Cohen's d |
| Robustness to outliers | High | Low |
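The robustness row can be illustrated with a minimal sketch. The data below are hypothetical values chosen only for illustration, and `u_statistic` is an assumed helper name, not part of any library. It counts how often a value in one group exceeds a value in the other, which is all the test uses, so replacing one value with an extreme outlier leaves U unchanged while the mean difference shifts sharply.

```python
from statistics import mean


def u_statistic(a, b):
    """Count pairs (x from a, y from b) where x > y; a tie counts as one half."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


# Hypothetical data, for illustration only
group_a = [12, 15, 17, 20, 22]
group_b = [10, 11, 14, 16, 18]
group_a_outlier = [12, 15, 17, 20, 220]  # last value replaced by an extreme one

u_clean = u_statistic(group_a, group_b)
u_outlier = u_statistic(group_a_outlier, group_b)

mean_diff_clean = mean(group_a) - mean(group_b)
mean_diff_outlier = mean(group_a_outlier) - mean(group_b)

print("U without / with outlier:", u_clean, u_outlier)
print("Mean difference without / with outlier:", mean_diff_clean, mean_diff_outlier)
```

Because only the ordering of values matters, any value larger than every observation in the other group contributes the same amount to U, no matter how extreme it is.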
A researcher compares scores on a rating scale between patients receiving a new treatment (Group 1) and a placebo (Group 2). Because the scores are ordinal and the sample is small, a Mann-Whitney U test is appropriate.
Group 1 — Treatment (n=8)
85, 72, 91, 68, 77, 95, 83, 89
Mdn = 84.0
Group 2 — Placebo (n=8)
65, 78, 71, 62, 73, 69, 75, 67
Mdn = 70.0
Results
U = 9.0, z = -2.36, p = .018, r = 0.72
The treatment group had significantly higher scores than the placebo group, with a large effect size (rank-biserial r = 0.72).
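The statistics for this example can be recomputed from the two groups listed above. The following is a minimal sketch, not StatMate's actual code: the function names are hypothetical, U is taken as the smaller of the two U values, z uses the normal approximation with continuity correction, and the standard deviation carries no tie correction (the example data contain no ties).

```python
import math

treatment = [85, 72, 91, 68, 77, 95, 83, 89]
placebo = [65, 78, 71, 62, 73, 69, 75, 67]


def average_ranks(values):
    """Rank values 1..N; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based ranks start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def mann_whitney_u(group1, group2):
    """Return U, z (continuity-corrected), two-sided p, and rank-biserial r."""
    n1, n2 = len(group1), len(group2)
    ranks = average_ranks(list(group1) + list(group2))
    rank_sum_1 = sum(ranks[:n1])
    u1 = rank_sum_1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    u = min(u1, u2)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # assumption: no tie correction
    z = min(0.0, (u - mean_u + 0.5) / sd_u)  # u <= mean_u, so z is never positive
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    r = (u1 - u2) / (n1 * n2)  # positive when group1 tends to rank higher
    return {"U": u, "z": z, "p": p, "r": r}


result = mann_whitney_u(treatment, placebo)
print(f"U = {result['U']:.1f}, z = {result['z']:.2f}, "
      f"p = {result['p']:.3f}, r = {result['r']:.2f}")
```

Software that computes an exact p-value for small samples without ties (R's `wilcox.test` does this by default) will report a slightly different p than the normal approximation used here.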
While the Mann-Whitney U test is less restrictive than the t-test, it still has assumptions that should be verified:
1. Ordinal or Continuous Data
The dependent variable must be measured on at least an ordinal scale (i.e., values can be meaningfully ranked). This includes Likert scales, test scores, reaction times, and any continuous measure.
2. Independent Groups
The two groups must be independent of each other. Each observation belongs to only one group, and participants in one group do not influence participants in the other. For paired/matched data, use the Wilcoxon signed-rank test instead.
3. Independent Observations
Observations within each group must be independent. Repeated measures or clustered data violate this assumption and require different analytical approaches.
4. Similar Distribution Shape (for median comparison)
If you want to interpret the test as comparing medians, the two groups should have similarly shaped distributions (differing only in location). Without this assumption, the test compares the overall rank distributions rather than medians specifically.
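The rank-biserial correlation (r) is the recommended effect size measure for the Mann-Whitney U test. It ranges from -1 to +1 and equals the proportion of between-group pairs in which one group's value is higher minus the proportion in which the other group's value is higher. Equivalently, |r| = 1 - 2U / (n1 × n2) when U is the smaller of the two U values.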
| \|r\| | Interpretation | Practical Meaning |
|---|---|---|
| < 0.1 | Negligible | Groups are nearly identical in rank |
| 0.1 - 0.3 | Small | Slight tendency for one group to rank higher |
| 0.3 - 0.5 | Medium | Noticeable separation between groups |
| > 0.5 | Large | Strong separation, most of one group outranks the other |
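As a rough sketch of how the effect size and the table above fit together, the code below computes the rank-biserial r directly from pairwise comparisons and maps |r| to a label. The function names and the data are hypothetical. The table does not say which label applies exactly at 0.1, 0.3, or 0.5, so the boundary handling here is an assumption.

```python
def rank_biserial(group1, group2):
    """Proportion of pairs favoring group1 minus proportion favoring group2."""
    favorable = sum(1 for x in group1 for y in group2 if x > y)
    unfavorable = sum(1 for x in group1 for y in group2 if x < y)
    return (favorable - unfavorable) / (len(group1) * len(group2))


def interpret_r(r):
    """Map |r| to the labels in the table (boundary handling is an assumption)."""
    size = abs(r)
    if size < 0.1:
        return "Negligible"
    if size < 0.3:
        return "Small"
    if size <= 0.5:
        return "Medium"
    return "Large"


# Hypothetical data, for illustration only
group_1 = [3, 5, 8]
group_2 = [1, 4, 6]

r = rank_biserial(group_1, group_2)
print(f"r = {r:.2f} ({interpret_r(r)})")
```

Tied pairs count as neither favorable nor unfavorable, so they pull r toward zero.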
According to APA 7th edition guidelines, report the U statistic, z value, p-value, effect size, and descriptive statistics (medians and sample sizes) for each group:
Example Report
A Mann-Whitney U test indicated that scores for the treatment group (Mdn = 84.0, n = 8) were significantly higher than for the placebo group (Mdn = 70.0, n = 8), U = 9.0, z = -2.36, p = .018, r = .72.
Note: Report U to one decimal place, z to two decimal places, and p to three decimal places. Use p < .001 when the value is below .001. Always include the rank-biserial r as the effect size measure.
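The formatting rules in the note can be sketched as a small helper. The function names are hypothetical and the input values are illustrative only. The leading zero is dropped for p and r, matching the style of the example report.

```python
def drop_leading_zero(text):
    """Turn '0.050' into '.050' and '-0.43' into '-.43'."""
    if text.startswith("0."):
        return text[1:]
    if text.startswith("-0."):
        return "-" + text[2:]
    return text


def format_p(p):
    """Three decimals, or 'p < .001' when the value is below .001."""
    if p < 0.001:
        return "p < .001"
    return "p = " + drop_leading_zero(f"{p:.3f}")


def apa_report(u, z, p, r):
    """Build the statistics portion of an APA-style results sentence."""
    r_text = drop_leading_zero(f"{r:.2f}")
    return f"U = {u:.1f}, z = {z:.2f}, {format_p(p)}, r = {r_text}"


# Illustrative values only, not results from any real data set
print(apa_report(12.5, -1.96, 0.0499, 0.43))
print(format_p(0.0004))
```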
StatMate's Mann-Whitney U calculations have been validated against R (wilcox.test function) and SPSS output. The implementation uses the normal approximation with continuity correction for the z-score and the jstat library for probability distributions. Tied ranks are handled using the average rank method. All results match R output to at least 4 decimal places.
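When tied values receive average ranks, the variance of U under the null hypothesis shrinks slightly. The text above does not say whether StatMate adjusts for this, so the sketch below is an assumption about one common approach: the standard tie correction to the standard deviation used in the normal approximation. The function name and data are hypothetical.

```python
import math
from collections import Counter


def tie_corrected_sd(group1, group2):
    """Standard deviation of U under the null, adjusted for tied values."""
    n1, n2 = len(group1), len(group2)
    n = n1 + n2
    # Size of each group of identical values in the combined sample
    tie_sizes = Counter(list(group1) + list(group2)).values()
    tie_term = sum(t ** 3 - t for t in tie_sizes)  # zero when there are no ties
    return math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))


# Hypothetical data with ties (3 appears twice, 5 appears three times)
group_1 = [3, 5, 5, 8]
group_2 = [1, 3, 4, 5]

sd_tied = tie_corrected_sd(group_1, group_2)
sd_untied = math.sqrt(len(group_1) * len(group_2) * (len(group_1) + len(group_2) + 1) / 12)
print(f"sd with tie correction: {sd_tied:.4f}, without: {sd_untied:.4f}")
```

With no ties the correction term is zero and the result reduces to the usual sqrt(n1 × n2 × (n1 + n2 + 1) / 12).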