What is the difference between the Wilcoxon signed-rank test and the Wilcoxon rank-sum test?

The Wilcoxon signed-rank test is for paired (related) samples, such as pre-test and post-test measurements from the same participants. The Wilcoxon rank-sum test (also called the Mann-Whitney U test) is for two independent groups. Despite sharing the Wilcoxon name, they test different hypotheses: the signed-rank test evaluates whether the median of paired differences is zero, while the rank-sum test evaluates whether one group tends to have larger values than the other.

Can I use the Wilcoxon signed-rank test with Likert scale data?

Yes. The Wilcoxon signed-rank test is appropriate for ordinal data, including individual Likert-type items. Because it operates on ranks rather than raw values, it does not require the equal-interval assumption that the paired t-test needs. However, if you have a composite scale computed from multiple Likert items (which approximates a continuous distribution), a paired t-test may be acceptable if differences are approximately normal.

What sample size do I need for the Wilcoxon signed-rank test?

There is no strict minimum, but a two-sided exact test needs at least 6 non-zero pairs to reach significance at alpha = .05 (5 pairs suffice for a one-sided test), because the smallest attainable two-sided p-value with n pairs is 2/2^n. For adequate power to detect a medium effect (r = .30), aim for at least 25-30 pairs. The Z approximation becomes reliable with approximately 20 or more pairs. Always conduct a power analysis for your specific effect size and desired power level.
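The lower bound on the number of pairs follows from counting sign patterns. As a minimal sketch (the function name is illustrative, and it assumes no ties and no zero differences), the smallest two-sided exact p-value a given number of pairs can produce is:

```python
def min_two_sided_exact_p(n_pairs: int) -> float:
    """Smallest two-sided exact p-value attainable with n untied, non-zero pairs.

    Under the null hypothesis each of the 2**n sign patterns is equally likely.
    The most extreme outcome (every difference has the same sign) therefore has
    a two-sided probability of 2 / 2**n.
    """
    return 2 / 2 ** n_pairs


for n in range(4, 8):
    print(n, min_two_sided_exact_p(n))
```

With 5 pairs the floor is .0625, which cannot fall below .05; with 6 pairs it drops to .03125.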

Should I report the exact or asymptotic p-value?

For small samples (fewer than approximately 20-25 pairs), report the exact p-value because the normal approximation may not be accurate. For larger samples, the asymptotic (Z-based) p-value is acceptable and is what most software outputs by default. If your software provides both, report the exact value for small samples and note which method was used.

How do I handle zero differences (ties with zero)?

Pairs with zero differences contribute no information about the direction of change and are excluded from the analysis by most software. Report the number of excluded pairs. The effective sample size for computing the effect size should reflect the number of non-zero pairs, though practices vary across sources.

Can I use the Wilcoxon test for more than two time points?

Not directly. The Wilcoxon signed-rank test compares exactly two related conditions. For three or more time points, use the Friedman test as the omnibus test, followed by pairwise Wilcoxon signed-rank tests with Bonferroni correction as post-hoc comparisons. Alternatively, conduct pairwise comparisons directly with an adjusted significance level.
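A small sketch of the bookkeeping for the post-hoc step, assuming a plain Bonferroni correction across all pairwise comparisons. The time-point labels and the function name are hypothetical:

```python
from itertools import combinations


def bonferroni_pairs(time_points, alpha=0.05):
    """List every pairwise comparison and the Bonferroni-adjusted alpha."""
    pairs = list(combinations(time_points, 2))
    return pairs, alpha / len(pairs)


pairs, adjusted_alpha = bonferroni_pairs(["baseline", "week_6", "week_12"])
for first, second in pairs:
    print(f"{first} vs {second}: compare p against {adjusted_alpha:.4f}")
```

Three time points give three comparisons, so each pairwise Wilcoxon p-value is compared against .05 / 3.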

What is the Hodges-Lehmann estimator and should I report it?

The Hodges-Lehmann estimator is the nonparametric analog of the mean difference. For paired data, it equals the median of all Walsh averages of the difference scores, that is, the pairwise averages (d_i + d_j)/2 taken over all i ≤ j. Reporting it with a confidence interval is recommended because it provides a robust point estimate of the typical shift between conditions, supplementing the median difference and effect size with a measure of precision.
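To make the definition concrete, here is a minimal sketch of the point estimate only (no confidence interval). The function name and the toy differences are illustrative:

```python
from statistics import median


def hodges_lehmann(differences):
    """Median of all Walsh averages (d_i + d_j) / 2 for i <= j."""
    n = len(differences)
    walsh = [
        (differences[i] + differences[j]) / 2
        for i in range(n)
        for j in range(i, n)
    ]
    return median(walsh)


# Toy differences 1, 2, 3 give Walsh averages 1, 1.5, 2, 2, 2.5, 3.
print(hodges_lehmann([1, 2, 3]))  # 2.0
```

The number of Walsh averages grows as n(n + 1)/2, so this brute-force version is only meant for small samples.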

Is the Wilcoxon test assumption-free?

No. While the Wilcoxon signed-rank test does not assume normality of differences, it does assume that the paired differences are independent of each other, the differences are measured on at least an ordinal scale, and the distribution of differences is symmetric around the median (though this assumption is debated and the test is fairly robust to mild asymmetry). Violations of independence are more problematic than violations of symmetry.

How to Report the Wilcoxon Signed-Rank Test in APA 7th Edition: Effect Sizes and Reporting Examples

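Why Reporting the Wilcoxon Signed-Rank Test Correctly Matters

The Wilcoxon signed-rank test is the most commonly used nonparametric alternative to the paired t-test. Developed by Frank Wilcoxon (1945), it evaluates whether the distribution of differences between two related measurements is symmetric around zero, without requiring those differences to be normally distributed.

Despite its widespread adoption in clinical trials, educational interventions, and behavioral research, the Wilcoxon signed-rank test is one of the most inconsistently reported statistics in the published literature. Common errors include reporting means instead of medians, omitting the effect size entirely, confusing the signed-rank test with the rank-sum test, and failing to state whether an exact or asymptotic p-value was used.

Try it with the Wilcoxon signed-rank test calculator.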

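When to Use the Wilcoxon Test Versus the Paired t-Test

The paired t-test assumes that the differences between paired observations are normally distributed. When that assumption is not met, the Wilcoxon signed-rank test is the appropriate alternative. Use it when any of the following applies:

Ordinal dependent variable. The outcome is measured on an ordinal scale, such as Likert-type items, pain severity rankings, or satisfaction ratings. Means are not meaningful for ordinal data.
Non-normal paired differences. A Shapiro-Wilk test on the difference scores gives p < .05, or a Q-Q plot shows heavy tails, skewness, or outliers.
Small sample size. With fewer than about 20-25 pairs, the central limit theorem may not sufficiently normalize the sampling distribution of the mean difference.
Floor/ceiling effects. Scores cluster at the extremes of the measurement range, producing distributions that the t-test cannot handle reliably.

Under perfect normality the paired t-test has slightly higher statistical power (the asymptotic relative efficiency of the Wilcoxon test is about 0.955), but when normality is violated the Wilcoxon test often outperforms the t-test because it is not distorted by outliers or skewness.

| Decision factor | Choose paired t-test | Choose Wilcoxon |
|---------|-------------------|--------------|
| Differences are normally distributed | Yes | - |
| Differences are skewed or heavy-tailed | - | Yes |
| Ordinal measurement scale | - | Yes |
| Continuous interval/ratio scale | Yes | - |
| Outliers in the differences | - | Yes |
| Sample size > 30 pairs, mild non-normality | Yes (robust) | Either |
| Sample size < 20 pairs, normality uncertain | - | Yes |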

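Understanding the Test Statistic: T, W, and Z

One of the most confusing aspects of Wilcoxon reporting is the inconsistent notation across software packages and textbooks.

| Symbol | Convention | Used by |
|------|------|----------|
| T | Sum of positive ranks (or the smaller of the two rank sums) | Many statistics textbooks |
| W | Sum of signed ranks | Some textbooks |
| V | Sum of positive ranks | R (wilcox.test with paired = TRUE) |
| T+ | Specifically the sum of positive ranks | Siegel & Castellan notation |

For small samples (typically n < 20), the exact test statistic T (or W) is reported. For large samples, software converts the rank sum to a Z statistic using a normal approximation.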
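As a rough illustration of how these quantities relate, the sketch below computes the positive and negative rank sums and then converts a rank sum to Z. The function names and data are hypothetical, zero differences are dropped, tied absolute differences receive average ranks, and no tie or continuity correction is applied to Z, so values from statistical software may differ slightly.

```python
import math


def signed_rank_sums(before, after):
    """Return (T_plus, T_minus, n) for paired data, dropping zero differences."""
    diffs = [a - b for b, a in zip(before, after) if a != b]
    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    t_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return t_plus, t_minus, len(diffs)


def z_from_rank_sum(t, n):
    """Normal approximation to a rank sum (no tie or continuity correction)."""
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (t - mean) / sd


# Hypothetical scores for six participants; one pair has a zero difference.
before = [3, 4, 2, 5, 4, 3]
after = [5, 4, 5, 4, 6, 7]
t_plus, t_minus, n = signed_rank_sums(before, after)
print(t_plus, t_minus, n)  # 14.0 1.0 5
print(round(z_from_rank_sum(min(t_plus, t_minus), n), 2))
```

Because T+ and T− always sum to n(n + 1)/2, reporting either one (with n) lets a reader recover the other, which is why the differing conventions in the table are interchangeable in principle.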

APA Reporting Templates

Small samples (exact test)

A Wilcoxon signed-rank test indicated that post-intervention scores (Mdn = 4.50) were significantly higher than pre-intervention scores (Mdn = 3.00), T = 45, p = .012, r = .48.

Large samples (Z approximation)

A Wilcoxon signed-rank test showed a statistically significant change in pain ratings from baseline (Mdn = 7.00, IQR = 5.00-8.00) to follow-up (Mdn = 4.00, IQR = 3.00-6.00), Z = -3.41, p < .001, r = .54.
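The statistical portion of these templates can be assembled programmatically. The helpers below are a hypothetical sketch (the names are not from any library) that applies two APA conventions used in the examples above: no leading zero for p and r, and p < .001 for very small values.

```python
def apa_p(p: float) -> str:
    """Format a p-value APA style: no leading zero, floor at p < .001."""
    if p < 0.001:
        return "p < .001"
    return "p = " + f"{p:.3f}".lstrip("0")


def apa_r(r: float) -> str:
    """Format an effect size r to two decimals without a leading zero."""
    return "r = " + f"{abs(r):.2f}".lstrip("0")


def apa_stat_line(label: str, statistic: float, p: float, r: float) -> str:
    """Join the test statistic, p-value, and effect size into one clause."""
    return f"{label} = {statistic:.2f}, {apa_p(p)}, {apa_r(r)}"


print(apa_stat_line("Z", -3.41, 0.00065, 0.54))  # Z = -3.41, p < .001, r = .54
```

The medians, IQRs, and the direction of the difference still have to be written into the sentence by hand.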

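Required Elements Checklist

Every APA report of a Wilcoxon test should include the following:

Full test name: written out on first mention (Wilcoxon signed-rank test)
Descriptive statistics: the median and interquartile range for each condition, not the mean
Test statistic: T, W, or Z, depending on sample size and software
Exact p-value (or p < .001 for very small values)
Effect size: rank-biserial correlation (r)
An explicit statement of the direction of the difference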

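Effect Size: Rank-Biserial Correlation (r)

The standard effect size for the Wilcoxon signed-rank test is the rank-biserial correlation, denoted r. Note that the Z-based formula below is a widely reported approximation under this name; strictly speaking, the matched-pairs rank-biserial correlation is computed from the rank sums themselves.

Method 1: From the Z statistic

The most widely used formula:

r = |Z| / sqrt(N)

where N is the number of paired observations (pairs, not individual measurements).

Example: for Z = -3.41 and N = 40 pairs:

r = |-3.41| / sqrt(40) = 3.41 / 6.32 = 0.54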
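The same arithmetic as a short sketch (the function name is illustrative):

```python
import math


def effect_size_r(z: float, n_pairs: int) -> float:
    """Effect size r = |Z| / sqrt(N), where N is the number of pairs."""
    return abs(z) / math.sqrt(n_pairs)


print(round(effect_size_r(-3.41, 40), 2))  # 0.54
```

Passing the number of observations (80) instead of the number of pairs (40) would shrink the result to about .38, which is the miscalculation this guide warns against.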

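Interpretation Benchmarks

| r value | Interpretation |
|--------|------|
| .10 | Small effect |
| .30 | Medium effect |
| .50 | Large effect |

Always interpret the effect size in context. In clinical research, r = .20 may represent a clinically meaningful change.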
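If you want to attach a label automatically, a sketch along these lines applies the benchmarks as lower bounds. Treating values under .10 as "below the small benchmark" is an assumption of this example, not something the table specifies:

```python
def interpret_r(r: float) -> str:
    """Map |r| onto the .10 / .30 / .50 benchmarks, treated as lower bounds."""
    magnitude = abs(r)
    if magnitude >= 0.50:
        return "large"
    if magnitude >= 0.30:
        return "medium"
    if magnitude >= 0.10:
        return "small"
    return "below the small benchmark"


print(interpret_r(0.54))  # large
```

The label is a starting point only; the substantive interpretation still depends on the research context.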

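Step-by-Step Reporting Example: Pre-Post Intervention (N = 20)

Scenario

A health psychologist measures sleep quality (1-10 ordinal scale) in 20 patients before and after a 6-week cognitive behavioral therapy for insomnia (CBT-I) program.

Step 1: Report descriptive statistics

Median sleep quality was 4.00 (IQR = 3.00-5.00) before the intervention and 7.00 (IQR = 5.75-8.00) after the intervention.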
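A minimal sketch of computing these descriptives with Python's standard library follows. The ratings are hypothetical (not the study data), and the "inclusive" quartile method is an assumption; statistical packages use differing quartile definitions, so Q1 and Q3 may not match other software exactly.

```python
from statistics import median, quantiles


def median_iqr(scores):
    """Return (median, Q1, Q3) using the 'inclusive' quartile method."""
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return median(scores), q1, q3


# Hypothetical ratings for illustration only.
ratings = [1, 2, 3, 4, 5]
mdn, q1, q3 = median_iqr(ratings)
print(f"Mdn = {mdn:.2f}, IQR = {q1:.2f}-{q3:.2f}")  # Mdn = 3.00, IQR = 2.00-4.00
```

Run it once per condition so that each median is reported with its own IQR.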

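Step 2: Justify the choice of a nonparametric test

Because sleep quality was measured on an ordinal scale and a Shapiro-Wilk test indicated that the distribution of paired differences deviated significantly from normality (W = 0.88, p = .021), a Wilcoxon signed-rank test was used instead of a paired t-test.

Step 3: Report the test result

A Wilcoxon signed-rank test indicated that sleep quality scores after CBT-I (Mdn = 7.00, IQR = 5.75-8.00) were significantly higher than at baseline (Mdn = 4.00, IQR = 3.00-5.00), Z = -3.72, p < .001, r = .83. This represents a large effect.

Complete APA paragraph

A Wilcoxon signed-rank test was used to evaluate the effect of a 6-week CBT-I program on self-reported sleep quality (N = 20). A nonparametric test was chosen because sleep quality was measured on an ordinal scale and the paired differences were not normally distributed (Shapiro-Wilk W = 0.88, p = .021). Median sleep quality was 4.00 (IQR = 3.00-5.00) before the intervention and 7.00 (IQR = 5.75-8.00) after the intervention. The Wilcoxon signed-rank test indicated a statistically significant improvement in sleep quality, Z = -3.72, p < .001, r = .83. Of the 20 participants, 17 showed improved scores, 2 declined, and 1 showed no change. The effect size indicates a large practical effect of the intervention.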
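The improved/declined/unchanged counts in the paragraph above come straight from the signs of the paired differences. A small sketch with hypothetical scores, assuming that higher scores mean improvement:

```python
def direction_counts(before, after):
    """Count pairs that improved, declined, or did not change."""
    improved = sum(a > b for b, a in zip(before, after))
    declined = sum(a < b for b, a in zip(before, after))
    unchanged = sum(a == b for b, a in zip(before, after))
    return improved, declined, unchanged


# Hypothetical scores for four participants.
improved, declined, unchanged = direction_counts([3, 4, 5, 6], [5, 4, 4, 8])
print(improved, declined, unchanged)  # 2 1 1
```

The unchanged count is also the number of zero-difference pairs, so the effective number of pairs in the test is the total minus that count.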

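Reporting Non-Significant Results

A Wilcoxon signed-rank test was conducted to compare self-efficacy ratings before (Mdn = 5.00, IQR = 4.00-6.00) and after (Mdn = 5.00, IQR = 4.00-7.00) a training workshop. The test did not indicate a statistically significant change, Z = -1.34, p = .180, r = .21. The small effect size suggests that the workshop had little impact on participants' self-efficacy.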

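Exact Versus Asymptotic p-Values: When to Use Each

For small samples (typically n < 20-25 pairs), report the exact p-value because the normal approximation may not be accurate. For large samples, the asymptotic (Z-based) p-value is acceptable.

Small sample (exact test):

T = 12, p_exact = .023

Large sample (Z approximation):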

Z = -2.87, p = .004
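To show where an exact p-value comes from, the sketch below enumerates the null distribution of the positive rank sum with a small dynamic program. It is illustrative only: the function name is hypothetical, and it assumes integer ranks, meaning no ties among the absolute differences and zero differences already removed.

```python
def exact_two_sided_p(t_plus: int, n: int) -> float:
    """Exact two-sided p-value for T+ with n untied, non-zero pairs."""
    max_sum = n * (n + 1) // 2
    # counts[s] = number of sign patterns whose positive ranks sum to s.
    counts = [1] + [0] * max_sum
    for rank in range(1, n + 1):
        # Iterate downward so each rank is used at most once per pattern.
        for s in range(max_sum, rank - 1, -1):
            counts[s] += counts[s - rank]
    # The null distribution is symmetric, so double the smaller tail.
    t_small = min(t_plus, max_sum - t_plus)
    lower_tail = sum(counts[: t_small + 1]) / 2 ** n
    return min(1.0, 2 * lower_tail)


print(exact_two_sided_p(0, 6))  # 0.03125
```

For large n this distribution approaches a normal curve, which is why the Z approximation becomes acceptable as the number of pairs grows.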

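Confidence Intervals: The Hodges-Lehmann Estimator

APA 7th edition increasingly encourages confidence intervals. For the Wilcoxon test, the relevant confidence interval is constructed around the Hodges-Lehmann estimator (the nonparametric analog of the mean difference).

A Wilcoxon signed-rank test indicated a statistically significant reduction in pain scores from baseline (Mdn = 7.00) to post-treatment (Mdn = 4.00), Z = -3.41, p < .001, r = .54. The Hodges-Lehmann estimate of the median difference was -2.50, 95% CI [-3.50, -1.75].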

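Handling Ties and Zero Differences

When a paired difference equals zero, the observation is usually excluded from the analysis, which reduces the effective sample size. Report the number of excluded zero-difference pairs:

Of the 40 pairs, 3 showed a zero difference and were excluded, leaving 37 pairs in the analysis.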

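Common Mistakes and How to Avoid Them

1. Reporting means instead of medians

This is the most frequent error. Because the Wilcoxon test operates on ranks, the median and IQR are the appropriate descriptive statistics.

2. Confusing the signed-rank test with the rank-sum test

The Wilcoxon signed-rank test is for paired samples. The Wilcoxon rank-sum test (Mann-Whitney U) is for independent groups. Always give the full name on first mention.

3. Miscalculating the effect size

Using the total number of observations (N = 60) instead of the number of pairs (N = 30) when computing r = Z/sqrt(N)
Reporting Cohen's d instead of the rank-biserial correlation r
Forgetting to take the absolute value of Z when interpreting magnitude

4. Ignoring ties and zero differences

Report the pairs excluded because of zero differences, and acknowledge extensive ties when they occur.

5. Failing to distinguish exact from asymptotic tests

Use exact p-values for small samples (n < 20-25). The Z approximation is acceptable for large samples. Always state which method was used.

6. Omitting the effect size

APA 7th edition calls for an effect size alongside every inferential test. The standard measure for the Wilcoxon signed-rank test is the rank-biserial correlation r.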

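Wilcoxon APA Checklist

Before submitting your manuscript, confirm that your Wilcoxon results section includes the following:

Full test name on first mention (Wilcoxon signed-rank test)
Sample size (N or number of pairs)
Median for each condition (not the mean)
Interquartile range (IQR) for each condition
Clearly labeled test statistic (T, W, or Z)
Exact p-value (or p < .001)
Effect size: rank-biserial correlation (r)
Interpretation of the effect size (small, medium, large)
An explicit statement of the direction of the difference
Justification for choosing a nonparametric test
How ties were handled, if there were many
Confidence interval for the Hodges-Lehmann estimator (where applicable)
Number of participants who improved, declined, or showed no change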

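Frequently Asked Questions

What is the difference between the Wilcoxon signed-rank test and the Wilcoxon rank-sum test?

The Wilcoxon signed-rank test is for paired (related) samples, such as pre- and post-intervention measurements from the same participants. The Wilcoxon rank-sum test (Mann-Whitney U test) is for two independent groups.

Can I use the Wilcoxon signed-rank test with Likert scale data?

Yes. Because the Wilcoxon signed-rank test operates on ranks, it is appropriate for ordinal data, including individual Likert-type items.

What sample size do I need for the Wilcoxon signed-rank test?

A two-sided exact test needs at least 6 non-zero pairs to produce a significant result at alpha = .05 (5 pairs suffice for a one-sided test). For adequate power to detect a medium effect (r = .30), aim for 25-30 pairs.

Should I report the exact or asymptotic p-value?

Report the exact p-value for small samples (fewer than 20-25 pairs). The Z approximation is acceptable for large samples.

Can I use the Wilcoxon test for more than two time points?

Not directly. For three or more related conditions, use the Friedman test as the omnibus test, followed by pairwise Wilcoxon signed-rank tests with Bonferroni correction as post-hoc comparisons.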

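StatMate's Free Wilcoxon Calculator

Formatting Wilcoxon results by hand is tedious and error-prone. StatMate's Wilcoxon signed-rank test calculator automates the entire process:

Instant APA output. Enter your paired data and get a publication-ready results paragraph with Z, p, and r values formatted to APA 7th edition standards.
Automatic effect size. The rank-biserial correlation is computed and interpreted.
Assumption check. A Shapiro-Wilk normality test of the paired differences with a clear pass/fail indicator.
Visual output. A paired-difference chart showing the direction and magnitude of change.
One-click export. Copy to clipboard, PDF, or an APA-formatted Word document (Pro).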
