Fit a linear model to your data. Results include R², an F-test, regression coefficients, a scatter plot, and APA-style output.
Simple linear regression is a statistical method used to model the relationship between a single independent variable (X) and a dependent variable (Y) by fitting a straight line to the observed data. The regression equation takes the form ŷ = b₀ + b₁x, where b₀ is the y-intercept and b₁ is the slope of the regression line. This method estimates the parameters using ordinary least squares (OLS), which minimizes the sum of squared differences between observed and predicted values.
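To make the OLS estimates concrete, here is a minimal Python sketch (not StatMate's own code; NumPy and the small x/y arrays are assumptions for illustration) that computes b₀ and b₁ from the closed-form formulas:

```python
import numpy as np

# Hypothetical sample data: any paired x/y observations will do.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

x_mean, y_mean = x.mean(), y.mean()

# OLS closed-form estimates: b1 = Sxy / Sxx, b0 = y_mean - b1 * x_mean
b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b0 = y_mean - b1 * x_mean

y_hat = b0 + b1 * x          # predicted values
residuals = y - y_hat        # OLS minimizes the sum of these squared errors
print(f"y-hat = {b0:.2f} + {b1:.2f}x")
```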
Regression analysis was pioneered by Sir Francis Galton in the 1880s during his studies of hereditary stature, where he observed that children's heights tended to "regress" toward the population mean. The mathematical framework was later formalized by Karl Pearson and Ronald Fisher, who developed the inferential statistics (F-test, t-tests for coefficients) used in modern regression analysis. Today, simple linear regression is one of the most fundamental tools in statistics, serving as the foundation for multiple regression, ANOVA, and many machine learning algorithms.
Slope (b₁)
The slope represents the expected change in Y for a one-unit increase in X. A positive slope indicates a positive relationship (as X increases, Y increases), while a negative slope indicates an inverse relationship. The slope is tested for significance using a t-test with n - 2 degrees of freedom.
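As an illustration, the following sketch (again assuming NumPy/SciPy and hypothetical data, not StatMate's implementation) computes the standard error of the slope and the corresponding t-test with n - 2 degrees of freedom:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])
n = len(x)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
residuals = y - (b0 + b1 * x)

# Residual variance uses n - 2 df (two estimated parameters: b0 and b1)
s2 = np.sum(residuals ** 2) / (n - 2)
se_b1 = np.sqrt(s2 / np.sum((x - x.mean()) ** 2))

t_stat = b1 / se_b1
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)   # two-tailed p-value
print(f"t({n - 2}) = {t_stat:.2f}, p = {p_value:.3g}")
```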
Intercept (b₀)
The intercept is the predicted value of Y when X equals zero. In many practical situations, X = 0 may not be meaningful (e.g., predicting weight from height), so the intercept should be interpreted cautiously. Its primary role is to position the regression line correctly.
Standard Error of the Estimate
The standard error of the estimate (SEE) measures the average distance between observed values and the regression line. Smaller values indicate that the data points cluster more tightly around the line, suggesting better prediction accuracy.
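A minimal sketch of the SEE calculation, repeating the hypothetical fit from the previous examples so it stands alone:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])
n = len(x)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
residuals = y - (b0 + b1 * x)

# Standard error of the estimate: typical distance of points from the line
see = np.sqrt(np.sum(residuals ** 2) / (n - 2))
print(f"SEE = {see:.3f}")
```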
R² represents the proportion of variance in the dependent variable that is explained by the independent variable. It ranges from 0 to 1, where 0 means the model explains none of the variability and 1 means it explains all of the variability. Adjusted R² accounts for the number of predictors and is particularly useful when comparing models.
| R² Value | Interpretation | Practical Meaning |
|---|---|---|
| < 0.10 | Very Weak | Model explains very little variance; X is a poor predictor |
| 0.10 – 0.30 | Weak | Small but potentially meaningful predictive power |
| 0.30 – 0.50 | Moderate | Meaningful prediction; useful for many social science applications |
| 0.50 – 0.70 | Strong | Substantial predictive accuracy; good model fit |
| > 0.70 | Very Strong | Excellent model fit; X is a strong predictor of Y |
Note: These thresholds are general guidelines. In fields like physics or engineering, R² values above 0.90 are common. In psychology and social sciences, R² values of 0.20–0.40 are often considered meaningful.
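The R² and adjusted R² described above can be computed directly from the residual and total sums of squares. The sketch below uses the same hypothetical data as the earlier examples (with one predictor, p = 1):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])
n, p = len(x), 1   # p = number of predictors

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

ss_res = np.sum((y - (b0 + b1 * x)) ** 2)   # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)        # total sum of squares

r_squared = 1 - ss_res / ss_tot
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - p - 1)
print(f"R² = {r_squared:.3f}, adjusted R² = {adj_r_squared:.3f}")
```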
A researcher examines whether the number of hours spent studying predicts exam performance in a sample of 10 university students.
Study Hours (X): 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Exam Score (Y): 2.1, 4.0, 5.8, 8.2, 9.8, 12.1, 14.0, 15.9, 18.2, 19.8
Results
F(1, 8) = 2854.88, p < .001, R² = .997
ŷ = 0.04 + 1.97x
The model is statistically significant and explains 99.7% of the variance in exam scores. For each additional hour of study, the predicted exam score increases by approximately 1.97 points.
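If you want to run this kind of analysis outside StatMate, one option (an assumption on our part, not a description of StatMate's internals) is SciPy's linregress, which returns the slope, intercept, correlation, p-value, and slope standard error:

```python
import numpy as np
from scipy import stats

hours = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
scores = np.array([2.1, 4.0, 5.8, 8.2, 9.8, 12.1, 14.0, 15.9, 18.2, 19.8])

result = stats.linregress(hours, scores)

n = len(hours)
t_stat = result.slope / result.stderr
f_stat = t_stat ** 2            # with one predictor, F = t²
r_squared = result.rvalue ** 2

print(f"y-hat = {result.intercept:.2f} + {result.slope:.2f}x")
print(f"F(1, {n - 2}) = {f_stat:.2f}, p = {result.pvalue:.3g}, R² = {r_squared:.3f}")
```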
Before interpreting your regression results, verify that these assumptions are met. Violating assumptions can lead to biased estimates, incorrect standard errors, and invalid inference.
1. Linearity
The relationship between X and Y must be linear. Inspect a scatter plot of the data. If the relationship is curved (e.g., quadratic, logarithmic), consider transforming your variables or using polynomial regression. A residual plot showing a random scatter around zero supports linearity.
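A quick way to produce such a residual plot in Python (a sketch assuming NumPy and matplotlib with hypothetical data; the same plot also supports the homoscedasticity check in assumption 4):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 13.9, 16.2])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
fitted = b0 + b1 * x
residuals = y - fitted

# A random, patternless scatter around zero supports linearity;
# a curved pattern suggests a transformation or polynomial term.
plt.scatter(fitted, residuals)
plt.axhline(0, linestyle="--")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()
```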
2. Independence of Errors
The residuals (errors) must be independent of each other. This is especially important with time-series data, where successive observations may be correlated (autocorrelation). The Durbin-Watson test can detect autocorrelation. Values near 2 indicate no autocorrelation.
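The Durbin-Watson statistic itself is straightforward to compute from the residuals in observation (time) order; here is a sketch using a hypothetical residual series:

```python
import numpy as np

# Residuals in observation (time) order; hypothetical values.
residuals = np.array([0.2, -0.1, 0.3, -0.2, 0.1, -0.3, 0.2, 0.0, -0.1, 0.1])

# Durbin-Watson: sum of squared successive differences over sum of squares.
dw = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)
print(f"Durbin-Watson = {dw:.2f}")   # values near 2 suggest no autocorrelation
```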
3. Normality of Residuals
The residuals should be approximately normally distributed. This assumption is important for hypothesis testing and confidence intervals. Check normality using a Q-Q plot or the Shapiro-Wilk test. With large samples (n > 30), the Central Limit Theorem makes regression robust to mild non-normality.
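Both checks are available in SciPy; this sketch assumes a hypothetical residuals array from an earlier fit:

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

residuals = np.array([0.2, -0.1, 0.3, -0.2, 0.1, -0.3, 0.2, 0.0, -0.1, 0.1])

# Shapiro-Wilk test: a small p-value (< .05) suggests non-normal residuals.
w_stat, p_value = stats.shapiro(residuals)
print(f"Shapiro-Wilk W = {w_stat:.3f}, p = {p_value:.3f}")

# Q-Q plot: points should fall near the reference line if residuals are normal.
stats.probplot(residuals, dist="norm", plot=plt)
plt.show()
```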
4. Homoscedasticity (Constant Variance)
The variance of residuals should be approximately constant across all levels of X. In a residual vs. fitted values plot, the spread of residuals should remain roughly the same. If the spread fans out (heteroscedasticity), consider using weighted least squares or robust standard errors.
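Beyond eyeballing the plot, a rough numeric check, sketched here as a simplified Goldfeld-Quandt-style comparison on hypothetical values, is to compare the residual variance in the lower and upper halves of the fitted values:

```python
import numpy as np

fitted = np.array([2.1, 4.0, 6.0, 7.9, 9.9, 11.8, 13.8, 15.7])
residuals = np.array([0.1, -0.2, 0.2, -0.1, 0.3, -0.3, 0.4, -0.4])

# Order residuals by fitted value and split into lower/upper halves.
order = np.argsort(fitted)
half = len(order) // 2
var_low = np.var(residuals[order[:half]], ddof=1)
var_high = np.var(residuals[order[half:]], ddof=1)

# A ratio far from 1 (e.g. > 3 or < 1/3) hints at heteroscedasticity.
print(f"variance ratio (upper/lower) = {var_high / var_low:.2f}")
```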
According to APA 7th edition guidelines, regression results should include the F-statistic with degrees of freedom, the p-value, R², the regression equation, and individual coefficient statistics. Here is a template you can adapt:
Simple Linear Regression
A simple linear regression was conducted to predict exam scores from study hours. The model was statistically significant, F(1, 8) = 2854.88, p < .001, R² = .997. Study hours significantly predicted exam scores, b = 1.97, t(8) = 53.43, p < .001, 95% CI [1.88, 2.05]. For each additional hour of study, exam scores increased by an average of 1.97 points.
Non-significant Result
A simple linear regression was conducted to predict happiness scores from daily screen time. The model was not statistically significant, F(1, 48) = 1.23, p = .274, R² = .025. Screen time did not significantly predict happiness scores, b = -0.15, t(48) = -1.11, p = .274, 95% CI [-0.42, 0.12].
Note: Report regression coefficients, t-values, and F-values to two decimal places. Report p-values to three decimal places, except use p < .001 when the value is below .001. Always include R² and the 95% confidence interval for key coefficients.
| Situation | Recommended Test |
|---|---|
| One predictor, one continuous outcome | Simple linear regression |
| Multiple predictors, one continuous outcome | Multiple linear regression |
| Relationship strength only (no prediction) | Pearson / Spearman correlation |
| Binary outcome variable | Logistic regression |
| Non-linear relationship | Polynomial regression or data transformation |
| Comparing group means (categorical predictor) | t-test or ANOVA |
StatMate's regression calculations have been validated against R's lm() and summary.lm() functions. We compute the OLS regression using the standard normal equations and derive F-statistics, t-statistics, and confidence intervals using the jstat library for probability distributions. All results match R output to at least 4 decimal places.
t-Test
Compare the means of two groups
ANOVA
Compare means across three or more groups
Chi-Square Test
Test the association between categorical variables
Correlation
Measure the strength of a relationship
Descriptive Statistics
Summarize your data
Sample Size
Power analysis and sampling plans
One-Sample t-Test
Compare against a known value
Mann-Whitney U
Nonparametric comparison of two groups
Wilcoxon Test
Nonparametric paired test
Multiple Regression
Multiple predictor variables
Cronbach's Alpha
Scale reliability
Logistic Regression
Predict binary outcomes
Factor Analysis
Explore latent factor structure
Kruskal-Wallis
Nonparametric comparison of three or more groups
Repeated Measures
Within-subjects ANOVA
Two-Way ANOVA
Analyze factorial designs
Friedman Test
Nonparametric repeated measures
Fisher's Exact Test
Exact test for 2×2 tables
McNemar's Test
Test for paired nominal data