What even is this?
Sometimes you have two sets of data, like training hours and goals scored, and you want to know: does one thing affect the other? Scatterplots let you see the relationship visually. Pearson's r gives you a number to measure how strong that relationship is.
In the exam you'll be asked to describe what you see in a scatterplot, interpret an r value, and sometimes make predictions. This page covers all of that.
In any bivariate (two-variable) study, one variable explains or predicts the other. You need to know which is which, because it determines which axis each one goes on.
📌 Explanatory Variable
The one doing the explaining. Goes on the x-axis. Also called the independent variable.
Example: training hours per week
🎯 Response Variable
The one responding to changes. Goes on the y-axis. Also called the dependent variable.
Example: goals scored per season
When you describe an association in a scatterplot, cover these four things:
Direction
Positive: as x increases, y increases
Negative: as x increases, y decreases
No correlation: no clear pattern
Form
Linear: points follow a straight line
Non-linear: points follow a curve
(If non-linear, a regression line may not be appropriate)
Strength
Strong: points cluster tightly around the line
Moderate: noticeable trend but spread out
Weak: barely any pattern visible
Outliers
Any points that don't fit the overall pattern. Mention them if they're clearly visible, and note that they can affect the correlation coefficient.
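To see how much a single outlier can move the correlation coefficient, here's a rough Python sketch. The data is made up for illustration (five points exactly on the line y = 2x, plus one stray point), and `pearson_r` is just a helper name chosen here, not something from your calculator or the exam.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Made-up data: five points lying exactly on the line y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
r_without = pearson_r(xs, ys)

# Add one point (6, 0) that sits far away from the pattern.
r_with = pearson_r(xs + [6], ys + [0])

print(f"without outlier: r = {r_without:.2f}")  # without outlier: r = 1.00
print(f"with outlier:    r = {r_with:.2f}")     # with outlier:    r = 0.14
```

One stray point drags r from a perfect 1 down to about 0.14, which is why outliers are worth mentioning whenever you describe a scatterplot.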
Examples of different associations:
Pearson's r is a single number that measures the strength and direction of a linear association. Your calculator does the hard work, so you just need to know how to read it.
| r value | Strength | Direction |
|---|---|---|
| r = 1 | Perfect | Positive |
| 0.75 ≤ r < 1 | Strong | Positive |
| 0.5 ≤ r < 0.75 | Moderate | Positive |
| 0 < r < 0.5 | Weak | Positive |
| r = 0 | No linear correlation | |
| −0.5 < r < 0 | Weak | Negative |
| −0.75 < r ≤ −0.5 | Moderate | Negative |
| −1 < r ≤ −0.75 | Strong | Negative |
| r = −1 | Perfect | Negative |
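If it helps to see the table as a set of rules, here's a minimal Python sketch of it. The function name `classify_r` is made up for this page, and the cut-offs are simply the ones from the table above.

```python
def classify_r(r):
    """Return (strength, direction) using the cut-offs in the table above."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)  # strength depends only on the size of r, not its sign
    if size == 1:
        strength = "perfect"
    elif size >= 0.75:
        strength = "strong"
    elif size >= 0.5:
        strength = "moderate"
    elif size > 0:
        strength = "weak"
    else:
        return ("no linear correlation", None)
    direction = "positive" if r > 0 else "negative"
    return (strength, direction)

print(classify_r(0.99))   # ('strong', 'positive')
print(classify_r(-0.83))  # ('strong', 'negative')
print(classify_r(-0.35))  # ('weak', 'negative')
```

Notice that the sign is only used at the very end, to pick the direction.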
| Player | A | B | C | D | E | F | G | H |
|---|---|---|---|---|---|---|---|---|
| Training sessions/week (x) | 2 | 3 | 4 | 4 | 5 | 6 | 7 | 8 |
| Goals per season (y) | 7 | 9 | 12 | 15 | 16 | 20 | 22 | 25 |
The scatterplot shows a clear upward trend. The calculator gives r = 0.99. Describe and interpret the association.
Identify the variables
The analyst is using training sessions to predict goals scored.
→ Explanatory (x): training sessions per week
→ Response (y): goals per season
Describe the scatterplot
Looking at the plot: the points go upward left to right (positive), they follow a straight-line pattern (linear), and they cluster tightly around that line (strong). No obvious outliers.
Interpret r = 0.99
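The value is very close to +1. Using the classification table, 0.99 falls in the range 0.75 ≤ r < 1, so there is a strong, positive linear association between training sessions per week and goals per season.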
Is this causation?
Not necessarily. The strong correlation suggests a relationship, but other factors (player skill, fitness, team quality) also affect goals. We can say the variables are associated, not that one causes the other.
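You won't need to calculate r by hand in the exam, but if you're curious where the calculator's 0.99 comes from, the sketch below works it out from the table of players in this worked example. It uses the standard formula r = Sxy / √(Sxx × Syy). The variable names are just chosen for this illustration.

```python
from math import sqrt

sessions = [2, 3, 4, 4, 5, 6, 7, 8]      # x: training sessions per week
goals = [7, 9, 12, 15, 16, 20, 22, 25]   # y: goals per season

n = len(sessions)
mean_x = sum(sessions) / n
mean_y = sum(goals) / n

# Sums of squared and cross deviations from the means
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sessions, goals))
s_xx = sum((x - mean_x) ** 2 for x in sessions)
s_yy = sum((y - mean_y) ** 2 for y in goals)

r = s_xy / sqrt(s_xx * s_yy)
print(f"r = {r:.2f}")  # r = 0.99
```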
Try to answer each question yourself first, then check the answers below.
Explanatory variable (x): amount of sleep
Response variable (y): exam score
The researcher is using sleep to explain or predict exam performance, so sleep is the explanatory variable. Exam score responds to the amount of sleep.
• Direction: positive (as temperature increases, sales also increase)
• Form: linear (follows a roughly straight-line pattern)
• Strength: moderate (the points are loosely spread around the line)
• Outliers: none mentioned
• r = −0.83 → |r| = 0.83, which falls in the range 0.75 to 1 → strong
• The negative sign → negative direction: as TV hours increase, exam scores tend to decrease
Note: this doesn't mean watching TV causes lower scores. There could be other explanations (less study time, tiredness, etc.).
Strongest: Dataset B (r = −0.91)
Weakest: Dataset D (r = −0.35)
Strength is determined by the absolute value of r (how close it is to 1 or −1, ignoring the sign):
• A: |r| = 0.42 → weak
• B: |r| = 0.91 → strong ← strongest
• C: |r| = 0.78 → strong
• D: |r| = 0.35 → weak ← weakest
The negative sign on B only tells you the direction is negative. It doesn't make the association weaker.
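The same ranking can be sketched in a few lines of Python by sorting on the absolute value of r. One assumption to flag: the signs of A and C aren't stated in the answer above, so they're taken as positive here purely for illustration. That doesn't change the ranking, since only the size of r matters.

```python
# r values for the four datasets. B and D are negative as stated above.
# The signs of A and C are NOT given, so positive values are assumed here.
r_values = {"A": 0.42, "B": -0.91, "C": 0.78, "D": -0.35}

# Sort dataset names from strongest to weakest using |r|
ranked = sorted(r_values, key=lambda name: abs(r_values[name]), reverse=True)

strongest = ranked[0]
weakest = ranked[-1]

print(ranked)                # ['B', 'C', 'A', 'D']
print(strongest, weakest)    # B D
```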
While the correlation is strong, bigger feet don't cause better reading. The real explanation is a lurking variable: age.
Older children have both bigger feet and better reading ability, so age is driving both. This is a classic example of two variables being correlated simply because they're both related to a third variable.
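Here's a small simulation that shows the idea, written as a sketch rather than real research. Everything in it is invented: the ages, the foot lengths, the reading scores and the formulas that generate them. In this made-up world, foot size and reading score each depend on age plus random noise, and never on each other.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

random.seed(1)  # fixed seed so the run is repeatable

ages, feet, reading = [], [], []
for age in range(5, 13):        # ages 5 to 12
    for _ in range(250):        # 250 simulated children per age
        ages.append(age)
        # Both variables depend on age only (invented formulas).
        feet.append(14 + 1.2 * age + random.gauss(0, 1))   # foot length in cm
        reading.append(10 * age + random.gauss(0, 8))      # reading score

# Correlation across all children, ignoring age
r_all = pearson_r(feet, reading)

# Hold age fixed by looking only at the 8-year-olds
feet_8 = [f for a, f in zip(ages, feet) if a == 8]
reading_8 = [s for a, s in zip(ages, reading) if a == 8]
r_age_8 = pearson_r(feet_8, reading_8)

print(f"all ages:         r = {r_all:.2f}")
print(f"8-year-olds only: r = {r_age_8:.2f}")
```

Across all ages the correlation comes out strong, but among children of the same age it should sit close to zero. Once the lurking variable is held fixed, the association between foot size and reading disappears.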