How many predictors can I include?

A common rule of thumb is to maintain a sample-size-to-predictor ratio (N/k) of at least 10. For example, with 100 observations you can include up to 10 predictors. Including too many predictors relative to your sample size leads to overfitting and unreliable coefficient estimates.

What is multicollinearity and how do I check it?

Multicollinearity occurs when predictor variables are highly correlated with each other, inflating standard errors and making coefficient estimates unstable. Check the Variance Inflation Factor (VIF) provided by StatMate — values above 10 indicate problematic multicollinearity that should be addressed by removing or combining correlated predictors.

How do I report multiple regression results in APA format?

APA 7th edition requires reporting the F-statistic, degrees of freedom, p-value, R², adjusted R², and each predictor's B, β, t, and p values. StatMate automatically generates APA-formatted results that you can copy directly into your research paper or thesis.

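Multiple Regression Calculator

Predict an outcome variable from multiple predictors using OLS regression. Results include R², coefficients with VIF, an ANOVA table, and APA-formatted output.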

What is Multiple Regression Analysis?

Multiple regression analysis is a statistical technique used to examine the relationship between two or more independent variables (predictors) and a single continuous dependent variable (outcome). While simple regression models the effect of a single predictor, multiple regression incorporates several predictors simultaneously—allowing researchers to assess each variable's unique contribution while controlling for the others. The method uses Ordinary Least Squares (OLS) estimation, which finds the set of coefficients that minimizes the sum of squared residuals between observed and predicted values.

The general equation is Y = b₀ + b₁X₁ + b₂X₂ + … + bₖXₖ + e, where b₀ is the intercept, b₁, …, bₖ are the unstandardized regression coefficients, and e is the residual error. Multiple regression is appropriate when you want to predict an outcome from multiple factors, understand the relative importance of different predictors, or estimate a variable's effect while holding others constant.
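As a rough illustration of what OLS estimation does, the sketch below solves the normal equations for the coefficients b₀, …, bₖ in plain Python. The function name `ols_fit` and the small data set are made up for this example; this is not StatMate's implementation, and a production tool would normally rely on a numerical library instead.

```python
def ols_fit(X, y):
    """Return [b0, b1, ..., bk] minimizing the sum of squared residuals.

    X is a list of rows (k predictor values per observation), y the outcomes.
    Solves the normal equations (A'A) b = A'y, where A is X plus an
    intercept column. Hypothetical helper for illustration only.
    """
    A = [[1.0] + [float(v) for v in row] for row in X]  # prepend intercept
    p = len(A[0])
    # Augmented matrix [A'A | A'y].
    M = [[sum(r[i] * r[j] for r in A) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(A, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]


# Made-up data generated exactly from y = 1 + 2*x1 + 3*x2 (no noise),
# so the fitted coefficients should be very close to 1, 2 and 3.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in X]
coefs = ols_fit(X, y)
print([round(b, 6) for b in coefs])
```

Solving the normal equations directly keeps the example short, but it is numerically less stable than the QR-based approach that statistical packages typically use.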

Key Statistics

R² and Adjusted R²

R² represents the proportion of variance in the dependent variable explained by the model. However, it never decreases when predictors are added, even irrelevant ones, so it tends to favor larger models. Adjusted R² penalizes for the number of predictors, making it more suitable for comparing models with different numbers of variables.
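The two quantities can be sketched in a few lines of Python. The function names and the four observed/predicted values are illustrative assumptions; the second call uses the R² = .72 model with 3 predictors from this page's APA example, taking N = 30 as implied by its residual degrees of freedom (26 = N − 3 − 1).

```python
def r_squared(y, y_hat):
    """Proportion of variance in y explained by the predictions y_hat."""
    mean_y = sum(y) / len(y)
    ss_res = sum((obs - pred) ** 2 for obs, pred in zip(y, y_hat))
    ss_tot = sum((obs - mean_y) ** 2 for obs in y)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, k):
    """Penalize R² for using k predictors with n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Made-up observed and predicted values.
r2_demo = r_squared([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8])

# R² = .72, N = 30, k = 3 gives an adjusted R² of about .69.
adj_demo = adjusted_r_squared(0.72, 30, 3)
print(round(r2_demo, 2), round(adj_demo, 2))
```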

F-Test and Individual t-Tests

The F-test evaluates whether the overall model is significant (i.e., whether at least one predictor has a non-zero effect). Individual t-tests then assess each predictor's unique contribution while controlling for all other predictors in the model.
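The overall F-statistic can be recovered from R², the sample size, and the number of predictors. The sketch below assumes a hypothetical helper named `f_statistic` and reuses the values from this page's APA example (R² = .72, 3 predictors, N = 30). It does not compute the p-value, which requires the F distribution and is not available in the Python standard library.

```python
def f_statistic(r2, n, k):
    """Overall F for a model with k predictors fitted to n observations.

    Returns (F, df_model, df_residual).
    """
    df_model = k
    df_resid = n - k - 1
    f = (r2 / df_model) / ((1.0 - r2) / df_resid)
    return f, df_model, df_resid


f, df1, df2 = f_statistic(0.72, 30, 3)
print(f"F({df1}, {df2}) = {f:.2f}")  # F(3, 26) = 22.29
```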

Standardized Coefficients (β) and VIF

Unstandardized coefficients (B) are interpreted in the original units. Standardized coefficients (β) allow direct comparison of relative predictor importance. The Variance Inflation Factor (VIF) detects multicollinearity—values above 10 indicate problematic collinearity that inflates standard errors and destabilizes coefficient estimates.
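The link between B and β is a rescaling by standard deviations: β = B × (SD of the predictor) / (SD of the outcome). A minimal sketch, assuming a hypothetical helper `standardized_beta` and made-up data in which y is exactly 2x:

```python
from statistics import stdev


def standardized_beta(b, x, y):
    """Convert an unstandardized coefficient B into a standardized β."""
    return b * stdev(x) / stdev(y)


# Made-up single-predictor data where y = 2x, so B = 2 and the fit is perfect.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
beta = standardized_beta(2.0, x, y)
print(round(beta, 3))
```

With a single predictor, β equals the Pearson correlation between x and y, which is why this perfectly linear example yields a β of 1. With several predictors the same rescaling applies to each B in turn.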

Durbin-Watson Statistic

The Durbin-Watson statistic tests for autocorrelation in residuals, ranging from 0 to 4. Values near 2 indicate no autocorrelation. Values near 0 suggest positive autocorrelation; values near 4 suggest negative autocorrelation. The acceptable range is typically 1.5–2.5.
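The statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. A short sketch, using a hypothetical function name and made-up residual sequences:

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time order."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Residuals that flip sign every step: negative autocorrelation, DW above 2.
dw_alternating = durbin_watson([1, -1, 1, -1])

# Residuals that stay on one side for a while: positive autocorrelation, DW below 2.
dw_runs = durbin_watson([1, 1, -1, -1])
print(dw_alternating, dw_runs)
```

For long series the statistic is approximately 2(1 − r), where r is the lag-1 autocorrelation of the residuals, which explains why values near 2 correspond to no autocorrelation.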

Multiple Regression vs. Other Methods

Method	Predictors	Outcome	Use Case
Simple Regression	1 continuous	Continuous	Single predictor-outcome relationship
Multiple Regression	2+ continuous	Continuous	Simultaneous effects of multiple predictors
Logistic Regression	Continuous / categorical	Binary (0/1)	Predicting binary outcomes (pass/fail)
ANOVA	Categorical (groups)	Continuous	Comparing means across 3+ groups

Assumptions

1. Linearity

The relationship between each predictor and the outcome must be linear. Check residual-vs-predicted plots for curvilinear patterns.

2. Independence

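Observations must be independent. Check the Durbin-Watson statistic, which detects serial correlation in the residuals (values of roughly 1.5–2.5 are typically considered acceptable). Violations are common in time-series and clustered data.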

3. Normality of Residuals

Residuals should be approximately normally distributed. With larger samples (N ≥ 30), inference on the coefficients is reasonably robust to violations of this assumption because of the Central Limit Theorem.

4. Homoscedasticity

Residual variance should be constant across all predicted values. A "funnel shape" in residual plots indicates heteroscedasticity.

5. No Multicollinearity (VIF < 10)

Predictors should not be highly correlated with each other. Check VIF values and correlation matrices. High multicollinearity inflates standard errors and makes coefficient estimates unstable.
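For predictor j, VIF is 1 / (1 − R²ⱼ), where R²ⱼ comes from regressing that predictor on all the other predictors. In the special case of exactly two predictors, R²ⱼ is simply the squared Pearson correlation between them, which makes a compact sketch possible. The helper names and data below are made up, and the general case with three or more predictors needs a full auxiliary regression per predictor.

```python
def pearson_r(a, b):
    """Pearson correlation between two equally long lists of numbers."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    ss_a = sum((u - mean_a) ** 2 for u in a)
    ss_b = sum((v - mean_b) ** 2 for v in b)
    return cov / (ss_a * ss_b) ** 0.5


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor model."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Made-up predictors that correlate at r = 0.8.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
vif = vif_two_predictors(x1, x2)
print(round(vif, 2))  # 2.78
```

Even a correlation of 0.8 between two predictors gives a VIF well below the cutoff of 10 used on this page; in the two-predictor case the cutoff is only reached at a correlation of about 0.95.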

6. No Autocorrelation (Durbin-Watson ≈ 2)

Residuals should not be correlated with each other. This is particularly important for time-series data. If the assumption is violated, consider generalized least squares (GLS) or adding lagged variables to the model.

APA Reporting Format

Example

A multiple regression analysis was conducted to predict GPA from study hours, sleep hours, and attendance rate. The model was statistically significant, F(3, 26) = 22.29, p < .001, R² = .72, adjusted R² = .69, explaining approximately 72% of the variance in GPA. Study hours (B = 0.055, β = .49, t = 5.50, p < .001), attendance rate (B = 0.018, β = .33, t = 4.50, p < .001), and sleep hours (B = 0.112, β = .21, t = 2.95, p = .007) were all significant predictors.
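The numeric conventions in this example (no leading zero for statistics that cannot exceed 1, and "p < .001" for very small p-values) can be sketched as small formatting helpers. The function names are hypothetical and this is not StatMate's own formatter; the exact p-value passed in is a placeholder, since the example only reports p < .001.

```python
def apa_decimal(value, places=2):
    """Format a statistic bounded by 1 (p, R², β) without a leading zero."""
    text = f"{value:.{places}f}"
    if text.startswith("0."):
        return text[1:]
    if text.startswith("-0."):
        return "-" + text[2:]
    return text


def apa_p(p):
    """Report exact p to three decimals, or 'p < .001' when smaller."""
    return "p < .001" if p < 0.001 else f"p = {apa_decimal(p, 3)}"


def apa_model_summary(f, df1, df2, p, r2, adj_r2):
    """Build the overall-model portion of an APA regression report."""
    return (f"F({df1}, {df2}) = {f:.2f}, {apa_p(p)}, "
            f"R² = {apa_decimal(r2)}, adjusted R² = {apa_decimal(adj_r2)}")


# Values from the example above; 0.0000002 is a placeholder p-value.
summary = apa_model_summary(22.29, 3, 26, 0.0000002, 0.72, 0.69)
print(summary)
print(apa_p(0.007))
```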

Common Mistakes to Avoid

Overfitting: Including too many predictors relative to sample size. As a rule of thumb, maintain a sample-size-to-predictor ratio of N/k ≥ 10. With 5 predictors, you need at least 50 observations.
Ignoring multicollinearity: Failing to check VIF values can lead to sign reversals and inflated standard errors in coefficients.
Confusing B and β: Use B for unit-based interpretation and β for comparing relative importance across predictors.
Stepwise regression pitfalls: Automated variable selection produces sample-specific results with low cross-validity. Prefer theory-driven variable selection.
Causal over-interpretation: Regression shows association, not causation. Use "predicts" rather than "causes" unless your design supports causal claims.

Calculation Accuracy

StatMate's multiple regression calculations have been validated against R's lm() function and SPSS regression output. We use OLS estimation with the jstat library for F- and t-distributions. All coefficients, standard errors, t-statistics, p-values, R², adjusted R², F-statistics, VIF, and Durbin-Watson values match R and SPSS output to at least 4 decimal places.

