Predict an outcome from multiple predictors using OLS regression. Results include R², coefficients with VIF, ANOVA table, and APA-formatted output.
Multiple regression analysis is a statistical technique used to examine the relationship between two or more independent variables (predictors) and a single continuous dependent variable (outcome). While simple regression models the effect of a single predictor, multiple regression incorporates several predictors simultaneously—allowing researchers to assess each variable's unique contribution while controlling for the others. The method uses Ordinary Least Squares (OLS) estimation, which finds the set of coefficients that minimizes the sum of squared residuals between observed and predicted values.
The general equation is Y = b₀ + b₁X₁ + b₂X₂ + … + bₖXₖ + e, where b₀ is the intercept, b₁…bₖ are the unstandardized regression coefficients, and e is the residual error. Multiple regression is appropriate when you want to predict an outcome from multiple factors, understand the relative importance of different predictors, or estimate a variable's effect while holding others constant.
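As a minimal sketch of what OLS estimation does, the snippet below solves for the coefficients that minimize the sum of squared residuals. The data are randomly generated and purely illustrative; this is not StatMate's implementation.

```python
# Minimal OLS sketch: find b that minimizes the sum of squared residuals ||y - Xb||^2.
# Data are randomly generated for illustration only.
import numpy as np

rng = np.random.default_rng(42)
n, k = 30, 3                               # 30 observations, 3 predictors
X = rng.normal(size=(n, k))                # hypothetical predictor matrix
y = 1.0 + X @ np.array([0.5, -0.3, 0.8]) + rng.normal(scale=0.5, size=n)

X_design = np.column_stack([np.ones(n), X])        # prepend intercept column for b0
b, *_ = np.linalg.lstsq(X_design, y, rcond=None)   # OLS estimates b0..bk

residuals = y - X_design @ b
print("b0..b3:", np.round(b, 3))
print("Sum of squared residuals:", round(float(residuals @ residuals), 3))
```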
R² and Adjusted R²
R² represents the proportion of variance in the dependent variable explained by the model. However, it always increases when predictors are added—even irrelevant ones. Adjusted R² penalizes for the number of predictors, making it more suitable for comparing models with different numbers of variables.
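For reference, adjusted R² can be computed directly from R², the sample size n, and the number of predictors k. The short sketch below plugs in the figures from the worked example further down.

```python
# Adjusted R² penalizes R² for the number of predictors k relative to sample size n:
# adjusted R² = 1 - (1 - R²) * (n - 1) / (n - k - 1)
def adjusted_r_squared(r_squared: float, n: int, k: int) -> float:
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# R² = .72 with n = 30 observations and k = 3 predictors (the worked example below) gives ≈ .69
print(round(adjusted_r_squared(0.72, n=30, k=3), 2))
```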
F-Test and Individual t-Tests
The F-test evaluates whether the overall model is significant (i.e., whether at least one predictor has a non-zero effect). Individual t-tests then assess each predictor's unique contribution while controlling for all other predictors in the model.
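The overall F-statistic can be expressed in terms of R², the number of predictors k, and the sample size n. The sketch below uses the figures from the worked example further down and reproduces its F value.

```python
# Overall model F-test: F = (R² / k) / ((1 - R²) / (n - k - 1)),
# with df1 = k and df2 = n - k - 1.
def model_f(r_squared: float, n: int, k: int) -> float:
    return (r_squared / k) / ((1 - r_squared) / (n - k - 1))

# R² = .72, n = 30, k = 3 gives F(3, 26) ≈ 22.29, matching the worked example below
print(round(model_f(0.72, n=30, k=3), 2))
```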
Standardized Coefficients (β) and VIF
Unstandardized coefficients (B) are interpreted in the original units. Standardized coefficients (β) allow direct comparison of relative predictor importance. The Variance Inflation Factor (VIF) detects multicollinearity—values above 10 indicate problematic collinearity that inflates standard errors and destabilizes coefficient estimates.
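A hedged sketch of the VIF computation: each predictor is regressed on the remaining predictors, and VIFⱼ = 1 / (1 − R²ⱼ). The data below are random and purely illustrative.

```python
# VIF sketch: regress each predictor on the others; VIF_j = 1 / (1 - R²_j).
# Values above ~10 flag problematic multicollinearity.
import numpy as np

def vif(X: np.ndarray) -> np.ndarray:
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        r2_j = 1 - (resid @ resid) / ((target - target.mean()) ** 2).sum()
        out[j] = 1 / (1 - r2_j)
    return out

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))     # three roughly independent predictors
X[:, 2] += 2.0 * X[:, 0]          # make the third predictor collinear with the first
print(np.round(vif(X), 2))        # the first and third VIFs rise together
```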
Durbin-Watson Statistic
The Durbin-Watson statistic tests for autocorrelation in residuals, ranging from 0 to 4. Values near 2 indicate no autocorrelation. Values near 0 suggest positive autocorrelation; values near 4 suggest negative autocorrelation. The acceptable range is typically 1.5–2.5.
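A short sketch of the statistic itself, run on simulated residuals for illustration only:

```python
# Durbin-Watson: DW = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2, ranging from 0 to 4.
# Independent residuals give values near 2; positively autocorrelated residuals push DW toward 0.
import numpy as np

def durbin_watson(residuals: np.ndarray) -> float:
    diffs = np.diff(residuals)
    return float((diffs @ diffs) / (residuals @ residuals))

rng = np.random.default_rng(0)
e_independent = rng.normal(size=500)                  # no autocorrelation -> DW near 2
e_positive = np.cumsum(rng.normal(size=500)) * 0.1    # strong positive autocorrelation -> DW near 0
print(round(durbin_watson(e_independent), 2))
print(round(durbin_watson(e_positive), 2))
```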
| Method | Predictors | Outcome | Use Case |
|---|---|---|---|
| Simple Regression | 1 continuous | Continuous | Single predictor-outcome relationship |
| Multiple Regression | 2+ continuous | Continuous | Simultaneous effects of multiple predictors |
| Logistic Regression | Continuous / categorical | Binary (0/1) | Predicting binary outcomes (pass/fail) |
| ANOVA | Categorical (groups) | Continuous | Comparing means across 3+ groups |
1. Linearity
The relationship between each predictor and the outcome must be linear. Check residual-vs-predicted plots for curvilinear patterns.
2. Independence
Observations must be independent. Verify with Durbin-Watson (1.5–2.5). Violations are common in time-series and clustered data.
3. Normality of Residuals
Residuals should be approximately normally distributed. With larger samples (N ≥ 30), the analysis is robust to violations of this assumption because of the Central Limit Theorem.
4. Homoscedasticity
Residual variance should be constant across all predicted values. A "funnel shape" in residual plots indicates heteroscedasticity; see the diagnostic sketch after this list.
5. No Multicollinearity (VIF < 10)
Predictors should not be highly correlated with each other. Check VIF values and correlation matrices. High multicollinearity inflates standard errors and makes coefficient estimates unstable.
6. No Autocorrelation (Durbin-Watson ≈ 2)
Residuals should not be correlated with each other. Particularly important for time-series data. Use GLS or add lagged variables if violated.
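As a hedged diagnostic sketch of how several of these assumptions can be checked in code, the snippet below uses scipy and statsmodels on synthetic data: Shapiro-Wilk for residual normality, Breusch-Pagan for homoscedasticity, and Durbin-Watson for autocorrelation. This is an illustration under those assumptions, not StatMate's implementation.

```python
# Assumption checks on synthetic data: normality of residuals (Shapiro-Wilk),
# homoscedasticity (Breusch-Pagan), and autocorrelation (Durbin-Watson).
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.stattools import durbin_watson
from scipy.stats import shapiro

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = 1.0 + X @ np.array([0.5, -0.3, 0.8]) + rng.normal(scale=0.5, size=100)

model = sm.OLS(y, sm.add_constant(X)).fit()
resid = model.resid

print("Shapiro-Wilk p:", round(shapiro(resid).pvalue, 3))                           # > .05 -> normality plausible
print("Breusch-Pagan p:", round(het_breuschpagan(resid, model.model.exog)[1], 3))   # > .05 -> homoscedastic
print("Durbin-Watson:", round(durbin_watson(resid), 2))                             # ≈ 2 -> no autocorrelation
```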
Example
A multiple regression analysis was conducted to predict GPA from study hours, sleep hours, and attendance rate. The model was statistically significant, F(3, 26) = 22.29, p < .001, R² = .72, adjusted R² = .69, explaining approximately 72% of the variance in GPA. Study hours (B = 0.055, β = .49, t = 5.50, p < .001), attendance rate (B = 0.018, β = .33, t = 4.50, p < .001), and sleep hours (B = 0.112, β = .21, t = 2.95, p = .007) were all significant predictors.
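As a hedged illustration of how the quantities in such a write-up are obtained, the sketch below fits an OLS model with statsmodels on simulated data and prints the pieces of an APA-style summary. Because the data are synthetic, the numbers will not match the example above, and the variable names are placeholders.

```python
# Sketch: fit an OLS model and collect the quantities reported in an APA-style
# write-up (F, df, p, R², adjusted R², plus B, β, t, p per predictor).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 30
X = rng.normal(size=(n, 3))                               # placeholders for e.g. study hours, sleep, attendance
y = 3.0 + X @ np.array([0.4, 0.2, 0.3]) + rng.normal(scale=0.3, size=n)

model = sm.OLS(y, sm.add_constant(X)).fit()

print(f"F({int(model.df_model)}, {int(model.df_resid)}) = {model.fvalue:.2f}, "
      f"p = {model.f_pvalue:.3f}, R² = {model.rsquared:.2f}, "
      f"adj. R² = {model.rsquared_adj:.2f}")

# Standardized coefficients (β): B scaled by SD(X_j) / SD(Y)
betas = model.params[1:] * X.std(axis=0, ddof=1) / y.std(ddof=1)
for name, B, beta, t, p in zip(["x1", "x2", "x3"], model.params[1:], betas,
                               model.tvalues[1:], model.pvalues[1:]):
    print(f"{name}: B = {B:.3f}, β = {beta:.2f}, t = {t:.2f}, p = {p:.3f}")
```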
StatMate's multiple regression calculations have been validated against R's lm() function and SPSS regression output. Coefficients are estimated with OLS, and p-values are obtained from the F- and t-distributions via the jstat library. All coefficients, standard errors, t-statistics, p-values, R², adjusted R², F-statistics, VIF, and Durbin-Watson values match R and SPSS output to at least 4 decimal places.