Residuals
Definition
The residual for each observation is the difference between the observed value of $y$ (the dependent variable) and the value of $y$ predicted by the regression line. \begin{align} \text{Residual}&=\text{actual } y \text{ value} - \text{predicted }y \text{ value} \text{,}\\ r_i&=y_i-\hat{y_i} . \end{align}
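A negative residual means that the predicted value was too high; similarly, a positive residual means that the predicted value was too low. The aim of a (least squares) regression line is to minimise the sum of the squared residuals. The residuals are squared because positive and negative residuals would otherwise cancel each other out.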
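The sign convention can be illustrated with a small Python sketch. The observed and predicted values below are made up for illustration and do not come from this page. The sketch also computes the plain sum and the sum of squares of the residuals, to show that signed residuals can cancel when added while their squares cannot.

```python
# Hypothetical observed and predicted values, chosen only for illustration.
observed = [10.0, 12.0, 9.0]
predicted = [11.0, 10.0, 10.0]

# Residual = actual y value - predicted y value.
residuals = [y - y_hat for y, y_hat in zip(observed, predicted)]


def describe(r):
    """Interpret the sign of a residual."""
    if r < 0:
        return "prediction too high"
    if r > 0:
        return "prediction too low"
    return "prediction exact"


labels = [describe(r) for r in residuals]

# Signed residuals can cancel in a plain sum; squared residuals cannot.
plain_sum = sum(residuals)
sum_of_squares = sum(r ** 2 for r in residuals)

for r, label in zip(residuals, labels):
    print(r, label)
print("sum:", plain_sum, "sum of squares:", sum_of_squares)
```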
Calculating Residuals
Knowing that \[r_i=y_i-\hat{y_i}\] and knowing that the regression line has the equation \[\displaystyle \hat{y_i}=a+b{x_i}\] we calculate the residual of an observation as follows: \[r_i=y_i-\hat{y_i}=y_i-(a+bx_i).\]
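As a minimal sketch, the formula above can be written as two short Python functions. The function names `predict` and `residual`, and the example line and observation, are hypothetical and chosen only for illustration.

```python
def predict(x, a, b):
    """Predicted value on the regression line y_hat = a + b*x."""
    return a + b * x


def residual(x, y, a, b):
    """Residual r = y - y_hat = y - (a + b*x)."""
    return y - predict(x, a, b)


# Hypothetical line y_hat = 1 + 2x and a hypothetical observation (x, y) = (3, 8).
r = residual(3, 8, a=1, b=2)  # 8 - (1 + 2*3)
print(r)
```

Here the intercept `a` and slope `b` are passed in explicitly, matching the notation $\hat{y_i}=a+bx_i$ used on this page.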
Worked Example
To see how students' physical ability has increased over a four-year period, ten students completed an obstacle course and then four years later they took the same course again. Here are their times:
Student | Debbie | Edna | Jerry | Norman | Joseph | Betty | Susan | Marilyn | Bert | Alice |
---|---|---|---|---|---|---|---|---|---|---|
First Test, $x$, (seconds) | $67$ | $53$ | $68$ | $57$ | $71$ | $74$ | $63$ | $75$ | $66$ | $66$ |
Second Test, $y$, (seconds) | $46$ | $29$ | $37$ | $44$ | $41$ | $35$ | $41$ | $43$ | $33$ | $36$ |
The equation of our regression line is $\hat{y}=23.91+0.22x$. What is the predicted time to complete the second course for Betty and what is the residual value?
Solution
Using our regression line equation we can calculate the predicted value, $\hat{y}$, by substituting in our value for $x$ (Betty's time in the first test, $74$ seconds).
\begin{align} \hat{y_i}&=a+b{x_i}\\ &=23.91+0.22x_i\\ &=23.91+0.22\times74\\ &=40.19 \end{align}
The residual is the observed time in the second test minus the predicted time:
\begin{align} r_i&=y_i-\hat{y_i}\\ &=35-40.19\\ &=-5.19 \end{align}
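The same calculation can be repeated for every student in the table. The following Python sketch uses the times from the table and the regression line $\hat{y}=23.91+0.22x$ given in the question; the variable names are assumptions made for illustration.

```python
# Times in seconds, taken from the table in the worked example.
students = ["Debbie", "Edna", "Jerry", "Norman", "Joseph",
            "Betty", "Susan", "Marilyn", "Bert", "Alice"]
first_test = [67, 53, 68, 57, 71, 74, 63, 75, 66, 66]    # x
second_test = [46, 29, 37, 44, 41, 35, 41, 43, 33, 36]   # y

# Coefficients of the regression line y_hat = a + b*x given in the question.
a, b = 23.91, 0.22

predictions = {}
residuals = {}
for name, x, y in zip(students, first_test, second_test):
    y_hat = a + b * x
    predictions[name] = y_hat
    residuals[name] = y - y_hat  # actual minus predicted

# Round for display only; floating-point arithmetic is not exact.
for name in students:
    print(f"{name}: predicted {predictions[name]:.2f}, residual {residuals[name]:.2f}")
```

For Betty this reproduces the predicted time of $40.19$ seconds and the residual of $-5.19$ found above. The negative sign shows that the line predicted a slower time than she actually achieved.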
Video Example
This video example, produced by Alissa Grant-Walker, shows how to calculate residuals.