Regression (line of best fit)
Same idea, four ways to study it. Find the one that clicks for you. 📺 Prefer to watch? Videos are on the lesson page.
📈 Formula book: the least-squares line is ŷ = mx + c, with m = r·(sy/sx) and c = ȳ − mx̄. Your calculator finds m and c; your job is to interpret and predict.

What it is

A regression line is the line of best fit through a scatterplot. It gives you an equation, so you can predict y from a given x. You won't draw it by hand: the calculator finds it, and your job is to read what it means and use it.

The equation (formula book)

ŷ = mx + c

m (gradient): how much the predicted y changes for each 1 extra x.
c (y-intercept): the predicted y when x = 0.

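ŷ ("y-hat") = the predicted value. The book gives the formulas the calculator uses: m = r·(sy/sx) and c = ȳ − mx̄, where r is the correlation coefficient, sx and sy are the standard deviations of x and y, and x̄ and ȳ are their means.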
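To see the formula-book expressions at work, here is a minimal Python sketch that computes m and c from r, sx, sy and the two means. The function name `least_squares_line` and the five data points are made up for illustration; they are not the lesson's data, and this is not how any particular calculator is implemented.

```python
from math import sqrt

def least_squares_line(xs, ys):
    """Return (m, c, r) using m = r * (sy / sx) and c = y_bar - m * x_bar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of squared deviations and cross-products about the means
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    r = sxy / sqrt(sxx * syy)      # correlation coefficient
    s_x = sqrt(sxx / (n - 1))      # sample standard deviation of x
    s_y = sqrt(syy / (n - 1))      # sample standard deviation of y
    m = r * (s_y / s_x)            # gradient
    c = y_bar - m * x_bar          # y-intercept
    return m, c, r

# Hypothetical data (illustration only): study hours and test scores
hours = [2, 4, 6, 8, 10]
scores = [48, 55, 66, 73, 83]
m, c, r = least_squares_line(hours, scores)
print(f"y-hat = {m:.2f}x + {c:.2f}, r = {r:.3f}")
```

For these made-up points the gradient comes out at about 4.4 and the intercept at about 38.6. The ratio sy/sx is the same whether you divide by n or n − 1, so the choice of standard deviation does not change m.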

Worked example

Study hours (x) vs test score % (y) gives ŷ = 4.5x + 38, data range x = 2 to 10.

  1. Gradient m = 4.5: each extra hour of study per week → predicted score up by 4.5 percentage points.
  2. Intercept c = 38: a student who does no study is predicted to score 38% (but x = 0 is outside the data, so treat with care).
  3. Predict for x = 7 (inside 2 to 10): ŷ = 4.5(7) + 38 = 31.5 + 38 = 69.5%.

Two more must-knows

Interpolation = predicting inside the data range → reliable. Extrapolation = predicting outside it → risky, because the pattern may not hold.

r² (coefficient of determination) = just square r. "r² = 0.98" means 98% of the variation in y is explained by x.
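As a rough check on what "variation explained" means, the sketch below squares r and compares it with the fraction of variation in y that the fitted line accounts for. The data points are hypothetical, chosen only for illustration, and the variable names are my own.

```python
# Hypothetical data (illustration only)
xs = [2, 4, 6, 8, 10]
ys = [48, 55, 66, 73, 83]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)   # total variation in y
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

r = sxy / (sxx * syy) ** 0.5
r_squared = r ** 2                        # "just square r"

# Cross-check: fit the line, then see how much variation it leaves unexplained
m = sxy / sxx                             # equivalent to r * (sy / sx)
c = y_bar - m * x_bar
ss_res = sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))
explained = 1 - ss_res / syy

print(f"r = {r:.3f}, r squared = {r_squared:.3f}, explained = {explained:.1%}")
```

The two numbers agree, which is why squaring r gives the percentage of variation in y explained by x.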

Watch out

• Always interpret in context: name the variables and units. "Gradient is 4.5" gets no marks on its own.
• Check whether x = 0 is realistic before trusting the intercept.
• Predicting outside the data range is extrapolation; flag it as less reliable.
• r² is the percentage of variation explained; it is not the same as r.

The line through the points

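[Figure: scatterplot with x (explanatory) on the horizontal axis, y (response) on the vertical axis, and the line ŷ = mx + c drawn through the points.]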

The line sits as close as possible to all the points. The little orange gaps are residuals (actual minus predicted); the line minimises the total of their squares.
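One way to convince yourself of the "minimises the total of their squares" claim is to compare the least-squares line with a couple of nudged lines. This is a small sketch on hypothetical data; the helper name `sum_squared_residuals` and the nudge sizes are arbitrary choices for illustration.

```python
def sum_squared_residuals(m, c, xs, ys):
    """Total of (actual - predicted)^2 for the line y-hat = m*x + c."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data (illustration only)
xs = [2, 4, 6, 8, 10]
ys = [48, 55, 66, 73, 83]

# Least-squares gradient and intercept
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
m = sxy / sxx
c = y_bar - m * x_bar

best = sum_squared_residuals(m, c, xs, ys)
steeper = sum_squared_residuals(m + 0.5, c, xs, ys)   # tilt the line
higher = sum_squared_residuals(m, c + 2, xs, ys)      # shift the line up

print(f"least-squares line: {best:.1f}")
print(f"steeper line:       {steeper:.1f}")
print(f"higher line:        {higher:.1f}")
```

Either nudge makes the total larger, so the fitted line is the one with the smallest total of squared residuals.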

Inside the data vs outside

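[Figure: the x-axis split into three bands: extrapolation on the left, the data range (interpolation) in the middle, extrapolation on the right.]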

Predicting inside the green band is interpolation (reliable). Outside, in red, is extrapolation (less reliable).

Warm up first

Don't read the answers yet; just have a go in your head:

Q: In ŷ = 4.5x + 38, what does the 4.5 tell you?
A: For every 1 extra unit of x, predicted y goes up by 4.5. It's the gradient.

Q: What does the 38 tell you?
A: The predicted y when x = 0. It's the y-intercept c.

Q: r = 0.9. What is r²?
A: 0.9² = 0.81 → 81% of the variation in y is explained by x.

Faded example: ŷ = 4.5x + 38

Rung 1 · watch one done fully

Predict for x = 7: ŷ = 4.5(7) + 38 = 31.5 + 38 = 69.5. (x = 7 is inside 2 to 10, so it's reliable interpolation.)

Rung 2 · you fill the gaps

Predict for x = 7: ŷ = 4.5(7) + 38 = ? + 38 = ?

Answer: 31.5, then 69.5.

Rung 3 · all you

Using ŷ = 4.5x + 38, how many study hours are predicted to give a score of 80? Solve for x. Check below.

Answer: 80 = 4.5x + 38 → 4.5x = 42 → x = 42 ÷ 4.5 ≈ 9.3 hours (inside the data range, so reliable).
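The rearrangement above can be sketched in Python as a quick self-check. The function name `hours_for_score` is hypothetical; the gradient, intercept and data range are the ones from this example.

```python
def hours_for_score(target, m=4.5, c=38.0):
    """Rearrange target = m*x + c to get x = (target - c) / m."""
    return (target - c) / m

x = hours_for_score(80)
in_range = 2 <= x <= 10          # data range from the example: x = 2 to 10

print(round(x, 1), "hours, inside data range:", in_range)
```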

Exam-style stretch: extrapolation

The data covers x = 2 to 10 hours. A teacher uses ŷ = 4.5x + 38 to predict the score for someone studying 15 hours. Should they trust it?

Answer: No. x = 15 is outside the data range (2 to 10), so this is extrapolation. The linear pattern may not continue, so the prediction is unreliable.
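The same range check can be written as a short Python sketch that returns the prediction together with a label. The function name `predict_score` and the label strings are assumptions for illustration; the equation and data range come from this example.

```python
def predict_score(hours, m=4.5, c=38.0, x_min=2, x_max=10):
    """Return (predicted score, 'interpolation' or 'extrapolation')."""
    y_hat = m * hours + c
    kind = "interpolation" if x_min <= hours <= x_max else "extrapolation"
    return y_hat, kind

for h in (7, 15):
    score, kind = predict_score(h)
    print(f"x = {h}: y-hat = {score}% ({kind})")
```

For x = 15 the equation predicts 105.5%, which is not a possible test score: a concrete sign that the linear pattern cannot continue that far beyond the data.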

Say it back

In one sentence, out loud: what do the gradient and intercept each tell you in context? If you can say it, you've got it.

⚡ Regression, one look

Line: ŷ = mx + c (formula book: y = mx + c)
Gradient m: change in y per 1 extra x (in context)
Intercept c: predicted y when x = 0
Predict: substitute x into the equation
Interpolation: predict inside data range → reliable
Extrapolation: predict outside → less reliable
Coefficient of determination: r² → % of variation in y explained by x
Trap: always interpret in context · check x = 0 makes sense