Scene Setup
You're Josh, an independent data consultant. It's 11pm and you're in Riviera Studios' screening room, laptop open, surrounded by cold coffee cups. At 9am tomorrow, Riviera will sign a $50 million streaming deal based on their analyst's claim that their linear model perfectly predicts IMDb scores from marketing budget.
Your job: verify the residuals before morning. The contract clause is simple: if the model isn't appropriate, the deal doesn't go through. You have 6 pages of the analytics report to sign off on.
Riviera Studios Dataset: Budget vs IMDb Score
Film | Budget (x, $M) | Actual IMDb Score (y)
1 | 5 | 5.8
2 | 10 | 5.8
3 | 15 | 7.4
4 | 20 | 7.7
5 | 25 | 8.3
Riviera's Model: ŷ = 0.2x + 4
1
Page 1 of 6: Verify the Prediction
Using Riviera's model ŷ = 0.2x + 4, calculate the predicted IMDb score for Film 3 (budget = $15M).
The analyst's report says 7.0. Do the maths and confirm.
💡 Substitute x = 15 into ŷ = 0.2x + 4. Multiply first, then add.
✅ Page 1 signed off. The prediction checks out: Film 3's predicted score is 7.0.
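The substitution on this page can also be checked in a couple of lines of Python. This is only a sketch: the function name `predict` and the variable `film3_prediction` are illustrative choices, not part of Riviera's report.

```python
def predict(budget_musd: float) -> float:
    """Riviera's model from the report: y_hat = 0.2 * x + 4 (x in $M)."""
    return 0.2 * budget_musd + 4


# Film 3 has a budget of $15M.
film3_prediction = predict(15)

# Round to one decimal place to match the precision used in the report.
print(round(film3_prediction, 1))  # 7.0
```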
2
Page 2 of 6: First Residual
Film 3 has an actual IMDb score of 7.4. The model predicted 7.0. Calculate the residual for Film 3.
Residual = actual − predicted. A positive result means the model under-predicted.
💡 Residual = y − ŷ = actual − predicted = 7.4 − 7.0
✅ Page 2 signed off. Residual = +0.4. Film 3's actual score beat the prediction.
3
Page 3 of 6: Negative Residual Alert
Film 2 has an actual score of 5.8. The model predicts ŷ = 0.2(10) + 4 = 6.0. Calculate the residual for Film 2.
Watch the sign: if the actual value is LESS than the predicted value, the residual is negative. Enter a negative number if needed.
💡 Residual = actual − predicted = 5.8 − 6.0. Is the actual higher or lower? Include the sign!
✅ Page 3 signed off. Residual = −0.2. The model over-predicted Film 2 by 0.2 points.
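The same actual − predicted calculation can be run over the whole dataset at once. The sketch below assumes the five rows from the Riviera table and the model ŷ = 0.2x + 4; the variable names are illustrative. Results are rounded to one decimal place because floating-point subtraction such as 7.4 − 7.0 does not give exactly 0.4.

```python
# Budgets ($M) and actual IMDb scores for Films 1 to 5, from the dataset table.
budgets = [5, 10, 15, 20, 25]
actual_scores = [5.8, 5.8, 7.4, 7.7, 8.3]


def predict(budget_musd: float) -> float:
    """Riviera's model: y_hat = 0.2 * x + 4."""
    return 0.2 * budget_musd + 4


# residual = actual - predicted, rounded to hide floating-point noise.
residuals = [round(y - predict(x), 1) for x, y in zip(budgets, actual_scores)]

for film, r in enumerate(residuals, start=1):
    # Positive: model under-predicted. Negative: model over-predicted.
    print(f"Film {film}: {r:+.1f}")
```

A negative value, as for Film 2, means the actual score fell below the line.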
4
Page 4 of 6: Spot the Outlier
Here are all 5 residuals:
Film 1: +0.8 | Film 2: −0.2 | Film 3: +0.4 | Film 4: −0.3 | Film 5: −0.7
Which film number has the largest absolute residual? (i.e., the point furthest from the regression line, regardless of sign)
Compare the absolute values: |+0.8| = 0.8, |−0.2| = 0.2, etc. Enter the film number only.
💡 List the absolute values: 0.8, 0.2, 0.4, 0.3, 0.7. Which film has the biggest number? Enter just the film number.
✅ Page 4 signed off. Film 1 is the outlier: its residual of +0.8 has the largest absolute value. Worth flagging in the report.
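Finding the largest absolute residual is a one-liner once the residuals are keyed by film number. A minimal sketch, assuming the five residuals listed on this page; the names `residuals` and `furthest_film` are illustrative.

```python
# Residuals by film number, as listed on this page.
residuals = {1: 0.8, 2: -0.2, 3: 0.4, 4: -0.3, 5: -0.7}

# Pick the film whose residual is largest in absolute value,
# i.e. the point furthest from the regression line regardless of sign.
furthest_film = max(residuals, key=lambda film: abs(residuals[film]))

print(furthest_film, abs(residuals[furthest_film]))  # 1 0.8
```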
5
Page 5 of 6: Positive Residuals
Residuals: Film 1: +0.8 | Film 2: −0.2 | Film 3: +0.4 | Film 4: −0.3 | Film 5: −0.7
How many of the 5 films have a positive residual? (i.e., the actual score was higher than predicted)
Count only the ones with a + sign in front.
💡 Go through the list: +0.8 ✓, −0.2 ✗, +0.4 ✓, −0.3 ✗, −0.7 ✗. How many checkmarks?
✅ Page 5 signed off. 2 films have positive residuals (Films 1 and 3). Now for the final judgement call...
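The count can be confirmed by filtering on the sign of each residual. Again a sketch using the residuals from this page, with illustrative variable names.

```python
# Residuals by film number, as listed on this page.
residuals = {1: 0.8, 2: -0.2, 3: 0.4, 4: -0.3, 5: -0.7}

# A positive residual means the actual score was higher than predicted.
positive_films = [film for film, r in residuals.items() if r > 0]

print(len(positive_films), positive_films)  # 2 [1, 3]
```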
6
Page 6 of 6: The Final Verdict
You check the residual plots for two studios competing for the deal:
Studio A (Riviera): Residual plot shows points scattered randomly above and below zero, with no discernible pattern and small residuals throughout.
Studio B (rival): Residual plot shows a clear U-shaped curve: residuals are strongly positive at low x values, negative in the middle, and positive again at high x values.
For which studio is a linear model appropriate? Enter 1 for Studio A or 2 for Studio B.
Remember: random scatter = linear model is appropriate. A clear pattern = non-linear relationship, so a linear model is NOT appropriate.
💡 Ask yourself: which residual plot has NO pattern? Random scatter above and below zero means the linear model IS a good fit. A curved or systematic pattern means it ISN'T.
✅ Page 6 signed off. Studio A (Riviera) shows random scatter, so the linear model is appropriate!
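Reading a residual plot is a judgement made by eye, but a rough numeric check can back it up. The sketch below is an assumption-laden illustration, not a method from the report: the residual lists for both studios are made-up values chosen only to mimic the two plot descriptions, and `curvature_gap` is a hypothetical helper that compares the mean residual at the low and high ends of x with the mean in the middle. A value near zero is consistent with random scatter; a clearly positive value suggests a U shape.

```python
def curvature_gap(residuals_by_x: list[float]) -> float:
    """Mean residual of the outer thirds minus mean of the middle third.

    Expects residuals ordered by increasing x and at least 3 values.
    This is a crude heuristic, not a formal test for non-linearity.
    """
    n = len(residuals_by_x)
    k = n // 3
    outer = residuals_by_x[:k] + residuals_by_x[n - k:]
    middle = residuals_by_x[k:n - k]
    return sum(outer) / len(outer) - sum(middle) / len(middle)


# HYPOTHETICAL residuals, ordered by x. These are not from the report.
studio_a = [0.2, -0.3, 0.1, -0.1, -0.2, 0.3]   # random-looking scatter
studio_b = [0.9, 0.3, -0.5, -0.6, 0.2, 0.8]    # positive, negative, positive

print(round(curvature_gap(studio_a), 2))  # close to 0
print(round(curvature_gap(studio_b), 2))  # clearly positive
```

A check like this only supplements the plot; it would miss other systematic patterns, such as residuals that fan out as x grows.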
🎬 Director's Cut: Complete
You've signed off on all 6 pages. Riviera's residual plot shows random scatter, so the linear model IS appropriate. The contract stands. The $50M deal goes through.
Film 1 was flagged as an outlier (residual +0.8), but the overall model fits the data well. Studio B's U-shaped residual plot? That model's going straight to the cutting room floor. 🗑️
residual = actual − predicted · random scatter → linear model appropriate · pattern → non-linear model needed