Correlation & Linear Regression
Time Allocation: 15 minutes total (5 min reading, 10 min individual work)
Part 1: Conceptual Understanding (3 minutes)
Instructions: Answer the following questions.
- What does the correlation coefficient (\(r\)) measure and what is its range?
- How is the slope coefficient (\(b_1\)) interpreted in a linear regression equation?
- What is the difference between correlation and causation?
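The first two questions can be checked numerically. The sketch below uses a small made-up data set (the vectors `x` and `y` are assumptions, not worksheet data). It computes \(r\) from its definition, compares it with R's built-in `cor()`, and illustrates the identity \(b_1 = r \cdot s_y / s_x\) that holds in simple linear regression.

```r
# Hypothetical toy data, chosen only for illustration
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Pearson's r from its definition: co-variation divided by
# the product of the two spreads, so it always lies in [-1, 1]
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

# Built-in equivalent
r_builtin <- cor(x, y)
all.equal(r_manual, r_builtin)

# In simple linear regression the slope is r rescaled by the
# ratio of the standard deviations
b1_from_r  <- r_manual * sd(y) / sd(x)
b1_from_lm <- unname(coef(lm(y ~ x))["x"])
all.equal(b1_from_r, b1_from_lm)
```

Because the slope carries the units of \(y\) per unit of \(x\) while \(r\) is unitless, the two numbers answer different questions even though they always share the same sign.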
Part 2: Correlation Analysis from R Output (4 minutes)
Run each code chunk in R and analyze the correlation coefficient it returns:
Example 1: Study Time vs. Exam Scores
# Simulated data: Study hours vs. Exam scores
study_data <- data.frame(
  study_hours = c(2, 4, 6, 8, 10, 12, 14, 16, 18, 20),
  exam_score = c(65, 72, 78, 82, 85, 88, 90, 92, 94, 95)
)
cor(study_data$study_hours, study_data$exam_score)
Questions:
- What is the correlation coefficient?
- How would you describe the strength and direction of this relationship?
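As a side note on reading R output, `cor()` also accepts a whole data frame of numeric columns and returns a correlation matrix. The sketch below repeats the Example 1 data and pulls the single coefficient of interest out of that matrix by name; the object name `r_matrix` is an assumption made for illustration.

```r
# Data repeated from Example 1 so the block runs on its own
study_data <- data.frame(
  study_hours = c(2, 4, 6, 8, 10, 12, 14, 16, 18, 20),
  exam_score = c(65, 72, 78, 82, 85, 88, 90, 92, 94, 95)
)

# Correlation matrix: 1s on the diagonal, r in the off-diagonal cells
r_matrix <- cor(study_data)
r_matrix

# The same coefficient as cor(study_hours, exam_score), selected by name
r_study <- r_matrix["study_hours", "exam_score"]
r_study
```

The matrix is symmetric because the correlation of \(x\) with \(y\) equals the correlation of \(y\) with \(x\).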
Example 2: Temperature vs. Ice Cream Sales
# Simulated data: Temperature vs. Ice cream sales
temp_data <- data.frame(
  temperature = c(65, 70, 75, 80, 85, 90, 95, 100),
  ice_cream_sales = c(50, 65, 80, 95, 110, 125, 140, 155)
)
cor(temp_data$temperature, temp_data$ice_cream_sales)
Questions:
- What is the correlation coefficient?
- Interpret what this correlation means in context:
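When interpreting a correlation in context, it may help to remember that \(r\) has no units. The sketch below assumes the temperatures in Example 2 are in degrees Fahrenheit (the worksheet does not state the unit) and shows that converting them to Celsius leaves the correlation unchanged.

```r
# Data copied from Example 2; the Fahrenheit unit is an assumption
temperature_f <- c(65, 70, 75, 80, 85, 90, 95, 100)
ice_cream_sales <- c(50, 65, 80, 95, 110, 125, 140, 155)

# Linear change of units: Fahrenheit to Celsius
temperature_c <- (temperature_f - 32) * 5 / 9

r_f <- cor(temperature_f, ice_cream_sales)
r_c <- cor(temperature_c, ice_cream_sales)

# The two coefficients agree up to floating-point rounding
all.equal(r_f, r_c)
```

Any rescaling of the form \(a + bx\) with \(b > 0\) behaves the same way, which is why a correlation describes the pattern of association rather than the size of the change in sales per degree.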
Example 3: Age vs. Reaction Time
# Simulated data: Age vs. Reaction time
age_data <- data.frame(
  age = c(20, 25, 30, 35, 40, 45, 50, 55, 60, 65),
  reaction_time = c(0.25, 0.28, 0.32, 0.35, 0.38, 0.42, 0.46, 0.51, 0.57, 0.64)
)
cor(age_data$age, age_data$reaction_time)
Questions:
- What is the correlation coefficient?
- What does the sign of the correlation tell you about this relationship?
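The strength and direction questions in this part follow the same two-step reading of \(r\): the sign gives the direction and the absolute value gives the strength. A minimal sketch of that reading is below. The helper `describe_r` and its cutoffs of 0.3 and 0.7 are assumptions for illustration only; textbooks use different thresholds, so use whichever your course specifies.

```r
# Hypothetical helper: turns a correlation into a verbal description.
# The cutoffs `weak` and `strong` are illustrative conventions, not rules.
describe_r <- function(r, weak = 0.3, strong = 0.7) {
  stopifnot(is.numeric(r), length(r) == 1, !is.na(r), abs(r) <= 1 + 1e-8)
  if (r == 0) return("no linear relationship")
  # Strength depends only on the absolute value of r
  strength <- if (abs(r) >= strong) {
    "strong"
  } else if (abs(r) >= weak) {
    "moderate"
  } else {
    "weak"
  }
  # Direction depends only on the sign of r
  direction <- if (r > 0) "positive" else "negative"
  paste(strength, direction)
}

describe_r(0.85)   # "strong positive"
describe_r(-0.45)  # "moderate negative"
```

A verbal label like this summarizes only the linear pattern; a scatterplot is still needed to see curvature or outliers.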
Part 3: Slope and Intercept Interpretation from R Output (3 minutes)
Analyze the slope and y-intercept reported in each linear regression output:
Example 1: Height vs. Weight
# Simulated data: Height (inches) vs. Weight (pounds)
height_weight <- data.frame(
  height = c(60, 62, 64, 66, 68, 70, 72, 74, 76, 78),
  weight = c(120, 130, 140, 150, 160, 170, 180, 190, 200, 210)
)
model1 <- lm(weight ~ height, data = height_weight)
summary(model1)
Call:
lm(formula = weight ~ height, data = height_weight)

Residuals:
       Min         1Q     Median         3Q        Max
-1.095e-13 -8.997e-15  4.136e-15  2.205e-14  6.561e-14

Coefficients:
              Estimate Std. Error    t value Pr(>|t|)
(Intercept) -1.800e+02  1.848e-13 -9.742e+14   <2e-16 ***
height       5.000e+00  2.669e-15  1.874e+15   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.848e-14 on 8 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 3.511e+30 on 1 and 8 DF, p-value: < 2.2e-16
Questions:
- What is the slope coefficient?
- Interpret the y-intercept in context:
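Before interpreting an intercept, it can be useful to check whether \(x = 0\) lies inside the observed data. The sketch below refits the Example 1 model, extracts the coefficients with `coef()`, and tests whether a height of 0 inches falls within the observed range; the names `b`, `x_range`, `zero_in_range`, and `pred_70` are assumptions introduced for illustration.

```r
# Data and model repeated from Example 1 so the block runs on its own
height_weight <- data.frame(
  height = c(60, 62, 64, 66, 68, 70, 72, 74, 76, 78),
  weight = c(120, 130, 140, 150, 160, 170, 180, 190, 200, 210)
)
model1 <- lm(weight ~ height, data = height_weight)

# Named vector holding the intercept and the slope
b <- coef(model1)
b

# Is height = 0 inside the range of observed heights?
x_range <- range(height_weight$height)
zero_in_range <- x_range[1] <= 0 && 0 <= x_range[2]
zero_in_range

# A prediction inside the observed range, where the line is supported by data
pred_70 <- unname(predict(model1, newdata = data.frame(height = 70)))
pred_70
```

When the check returns `FALSE`, the intercept is an extrapolation: it anchors the line mathematically but need not describe a realistic case. Separately, the residuals near `1e-14` in the output above are floating-point noise from data that lie exactly on a line, and `summary()` may warn that such an essentially perfect fit makes its statistics unreliable.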
Example 2: Advertising vs. Sales
# Simulated data: Advertising budget ($1000s) vs. Sales ($1000s)
advertising_data <- data.frame(
  advertising = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  sales = c(50, 65, 75, 85, 95, 105, 115, 125, 135, 145)
)
model2 <- lm(sales ~ advertising, data = advertising_data)
summary(model2)
Call:
lm(formula = sales ~ advertising, data = advertising_data)

Residuals:
    Min      1Q  Median      3Q     Max
-3.2727 -0.3864  0.2273  0.8409  1.4545

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  43.0000     0.9770   44.01 7.84e-11 ***
advertising  10.2727     0.1575   65.24 3.39e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.43 on 8 degrees of freedom
Multiple R-squared: 0.9981, Adjusted R-squared: 0.9979
F-statistic: 4256 on 1 and 8 DF, p-value: 3.39e-12
Questions:
- What is the slope coefficient?
- What does the y-intercept represent in this context?
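To see where the numbers in this output come from, the sketch below recomputes the Example 2 slope and intercept from the least-squares formulas \(b_1 = S_{xy} / S_{xx}\) and \(b_0 = \bar{y} - b_1 \bar{x}\), and checks that the reported R-squared equals the squared correlation. The names `sxy`, `sxx`, `b0`, `b1`, and `r_squared` are assumptions introduced here.

```r
# Data repeated from Example 2 so the block runs on its own
advertising <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
sales <- c(50, 65, 75, 85, 95, 105, 115, 125, 135, 145)

# Sums of cross-products and squares around the means
sxy <- sum((advertising - mean(advertising)) * (sales - mean(sales)))
sxx <- sum((advertising - mean(advertising))^2)

# Least-squares slope and intercept computed by hand
b1 <- sxy / sxx
b0 <- mean(sales) - b1 * mean(advertising)

# Compare with the coefficients lm() reports
model2 <- lm(sales ~ advertising)
all.equal(unname(coef(model2)), c(b0, b1))

# With a single predictor, R-squared is the squared correlation
r_squared <- cor(advertising, sales)^2
all.equal(r_squared, summary(model2)$r.squared)
```

Both variables are recorded in thousands of dollars, so the slope reads as thousands of dollars of sales per additional thousand dollars of advertising.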
Example 3: Years of Experience vs. Salary
# Simulated data: Years of experience vs. Salary ($1000s)
experience_data <- data.frame(
  experience = c(0, 2, 4, 6, 8, 10, 12, 14, 16, 18),
  salary = c(40, 45, 50, 55, 60, 65, 70, 75, 80, 85)
)
model3 <- lm(salary ~ experience, data = experience_data)
summary(model3)
Call:
lm(formula = salary ~ experience, data = experience_data)

Residuals:
       Min         1Q     Median         3Q        Max
-6.656e-15 -1.437e-15 -2.510e-16  1.319e-15  7.838e-15

Coefficients:
             Estimate Std. Error   t value Pr(>|t|)
(Intercept) 4.000e+01  2.292e-15 1.745e+16   <2e-16 ***
experience  2.500e+00  2.147e-16 1.164e+16   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.9e-15 on 8 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 1.356e+32 on 1 and 8 DF, p-value: < 2.2e-16
Questions:
- What is the slope coefficient?
- Interpret the y-intercept:
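One way to connect the coefficients to their interpretations is through predictions. The sketch below refits the Example 3 model and uses `predict()` at 0, 5, and 6 years of experience. The prediction at 0 reproduces the intercept, and the difference between the predictions at 6 and 5 years reproduces the slope. The names `new_points` and `pred` are assumptions introduced for illustration.

```r
# Data and model repeated from Example 3 so the block runs on its own
experience_data <- data.frame(
  experience = c(0, 2, 4, 6, 8, 10, 12, 14, 16, 18),
  salary = c(40, 45, 50, 55, 60, 65, 70, 75, 80, 85)
)
model3 <- lm(salary ~ experience, data = experience_data)

# Predicted salary ($1000s) at 0, 5, and 6 years of experience
new_points <- data.frame(experience = c(0, 5, 6))
pred <- unname(predict(model3, newdata = new_points))
pred

# Prediction at experience = 0 is the intercept
pred[1]

# Change in predicted salary for one additional year is the slope
pred[3] - pred[2]
```

Unlike the height example, 0 years of experience appears in the data, so the intercept here describes an observed situation rather than an extrapolation.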