Activity 35

MATH 216: Statistical Thinking

Correlation & Linear Regression

Time Allocation: 15 minutes total (5 min reading, 10 min individual work)

Part 1: Conceptual Understanding (3 minutes)

Instructions: Answer the following questions.

  1. What does the correlation coefficient (\(r\)) measure, and what is its range?
  2. How is the slope coefficient (\(b_1\)) interpreted in a linear regression equation?
  3. What is the difference between correlation and causation?
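
As a notation reference for the questions above, and assuming the usual textbook conventions (the symbols \(b_0\), \(\hat{y}\), \(\bar{x}\), and \(\bar{y}\) are not defined elsewhere in this activity), the fitted regression line and the sample correlation coefficient can be written as:

\[
\hat{y} = b_0 + b_1 x,
\qquad
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}
\]

Here \(b_0\) is the y-intercept, \(b_1\) is the slope, and \(\hat{y}\) is the predicted value of the response for a given \(x\).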

Part 2: Correlation Analysis from R Output (4 minutes)

For each example, read the correlation coefficient from the R output and answer the questions that follow:

Example 1: Study Time vs. Exam Scores

# Simulated data: Study hours vs. Exam scores
study_data <- data.frame(
  study_hours = c(2, 4, 6, 8, 10, 12, 14, 16, 18, 20),
  exam_score = c(65, 72, 78, 82, 85, 88, 90, 92, 94, 95)
)

cor(study_data$study_hours, study_data$exam_score)
[1] 0.9652171

Questions:

  1. What is the correlation coefficient?
  2. How would you describe the strength and direction of this relationship?
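
To see where the value returned by cor() comes from, here is a minimal sketch that recomputes it from the definition of Pearson's correlation coefficient. It assumes the same simulated study_data as above; the names x, y, dx, dy, and r_by_hand are illustrative and do not appear in the original activity.

```r
# Same simulated data as in Example 1
study_data <- data.frame(
  study_hours = c(2, 4, 6, 8, 10, 12, 14, 16, 18, 20),
  exam_score = c(65, 72, 78, 82, 85, 88, 90, 92, 94, 95)
)

x <- study_data$study_hours
y <- study_data$exam_score

# Deviation of each observation from its variable's mean
dx <- x - mean(x)
dy <- y - mean(y)

# Pearson's r: sum of cross-products divided by the square root of
# the product of the two sums of squared deviations
r_by_hand <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
r_by_hand

# Compare with the built-in function (equal up to rounding error)
all.equal(r_by_hand, cor(x, y))
```

The hand computation and cor() should agree, which is a quick way to check that the formula has been applied correctly.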

Example 2: Temperature vs. Ice Cream Sales

# Simulated data: Temperature vs. Ice cream sales
temp_data <- data.frame(
  temperature = c(65, 70, 75, 80, 85, 90, 95, 100),
  ice_cream_sales = c(50, 65, 80, 95, 110, 125, 140, 155)
)

cor(temp_data$temperature, temp_data$ice_cream_sales)
[1] 1

Questions:

  1. What is the correlation coefficient?
  2. Interpret what this correlation means in context:

Example 3: Age vs. Reaction Time

# Simulated data: Age vs. Reaction time
age_data <- data.frame(
  age = c(20, 25, 30, 35, 40, 45, 50, 55, 60, 65),
  reaction_time = c(0.25, 0.28, 0.32, 0.35, 0.38, 0.42, 0.46, 0.51, 0.57, 0.64)
)

cor(age_data$age, age_data$reaction_time)
[1] 0.989797

Questions:

  1. What is the correlation coefficient?
  2. What does the sign of the correlation tell you about this relationship?
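
One property worth checking when reading a correlation is that it does not depend on the units of measurement. The sketch below assumes, purely for illustration, that reaction_time is recorded in seconds (the activity does not state the unit) and converts it to milliseconds; reaction_ms, r_original, and r_rescaled are hypothetical names.

```r
# Same simulated data as in Example 3
age_data <- data.frame(
  age = c(20, 25, 30, 35, 40, 45, 50, 55, 60, 65),
  reaction_time = c(0.25, 0.28, 0.32, 0.35, 0.38, 0.42, 0.46, 0.51, 0.57, 0.64)
)

# Assumption: reaction_time is in seconds; convert to milliseconds
reaction_ms <- age_data$reaction_time * 1000

# Correlation on the original scale and on the rescaled variable
r_original <- cor(age_data$age, age_data$reaction_time)
r_rescaled <- cor(age_data$age, reaction_ms)

c(original = r_original, rescaled = r_rescaled)

# Multiplying a variable by a positive constant leaves r unchanged
all.equal(r_original, r_rescaled)
```

Both the size and the sign of the coefficient stay the same after the rescaling, so the interpretation does not change with the choice of unit.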

Part 3: Slope and Intercept Interpretation from R Output (3 minutes)

For each example, read the slope and y-intercept from the linear regression output and answer the questions that follow:

Example 1: Height vs. Weight

# Simulated data: Height (inches) vs. Weight (pounds)
height_weight <- data.frame(
  height = c(60, 62, 64, 66, 68, 70, 72, 74, 76, 78),
  weight = c(120, 130, 140, 150, 160, 170, 180, 190, 200, 210)
)

model1 <- lm(weight ~ height, data = height_weight)
summary(model1)

Call:
lm(formula = weight ~ height, data = height_weight)

Residuals:
       Min         1Q     Median         3Q        Max 
-1.095e-13 -8.997e-15  4.136e-15  2.205e-14  6.561e-14 

Coefficients:
              Estimate Std. Error    t value Pr(>|t|)    
(Intercept) -1.800e+02  1.848e-13 -9.742e+14   <2e-16 ***
height       5.000e+00  2.669e-15  1.874e+15   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.848e-14 on 8 degrees of freedom
Multiple R-squared:      1, Adjusted R-squared:      1 
F-statistic: 3.511e+30 on 1 and 8 DF,  p-value: < 2.2e-16

Questions:

  1. What is the slope coefficient?
  2. Interpret the y-intercept in context:
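
The estimates in this output are printed in scientific notation (for example, 5.000e+00 means 5.000 times 10 to the power 0). A minimal sketch of how one might extract the coefficients in a more readable form, assuming the same height_weight data and model1 object as above; the name b is illustrative.

```r
# Same simulated data and model as in Example 1
height_weight <- data.frame(
  height = c(60, 62, 64, 66, 68, 70, 72, 74, 76, 78),
  weight = c(120, 130, 140, 150, 160, 170, 180, 190, 200, 210)
)
model1 <- lm(weight ~ height, data = height_weight)

# coef() returns the estimates as a named numeric vector
b <- coef(model1)

# Rounding removes floating-point noise and the scientific notation
round(b, 6)

# The tiny residuals in the summary are rounding error around zero,
# because these simulated points fall exactly on a straight line
max(abs(residuals(model1))) < 1e-8
```

Reading the rounded vector is often easier than reading the summary table when the fit is exact, as it is for this simulated data set.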

Example 2: Advertising vs. Sales

# Simulated data: Advertising budget ($1000s) vs. Sales ($1000s)
advertising_data <- data.frame(
  advertising = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  sales = c(50, 65, 75, 85, 95, 105, 115, 125, 135, 145)
)

model2 <- lm(sales ~ advertising, data = advertising_data)
summary(model2)

Call:
lm(formula = sales ~ advertising, data = advertising_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.2727 -0.3864  0.2273  0.8409  1.4545 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  43.0000     0.9770   44.01 7.84e-11 ***
advertising  10.2727     0.1575   65.24 3.39e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.43 on 8 degrees of freedom
Multiple R-squared:  0.9981,    Adjusted R-squared:  0.9979 
F-statistic:  4256 on 1 and 8 DF,  p-value: 3.39e-12

Questions:

  1. What is the slope coefficient?
  2. What does the y-intercept represent in this context?
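
The regression output connects directly to the correlation coefficient from Part 2. As a sketch under the standard simple linear regression identities (slope equals r times the ratio of standard deviations, the line passes through the point of means, and Multiple R-squared equals r squared), the estimates above can be reproduced as follows. The names x, y, r, b1_from_r, and b0_from_means are illustrative.

```r
# Same simulated data and model as in Example 2
advertising_data <- data.frame(
  advertising = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  sales = c(50, 65, 75, 85, 95, 105, 115, 125, 135, 145)
)
model2 <- lm(sales ~ advertising, data = advertising_data)

x <- advertising_data$advertising
y <- advertising_data$sales
r <- cor(x, y)

# Slope from the correlation and the two standard deviations
b1_from_r <- r * sd(y) / sd(x)

# Intercept: the fitted line passes through (mean(x), mean(y))
b0_from_means <- mean(y) - b1_from_r * mean(x)

# Compare the hand-computed values with the lm() estimates
c(intercept = b0_from_means, slope = b1_from_r)
coef(model2)

# In simple linear regression, Multiple R-squared equals r squared
r^2
summary(model2)$r.squared
```

The two pairs of numbers should match, which shows that a strong correlation and a well-fitting line are two views of the same linear relationship.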

Example 3: Years of Experience vs. Salary

# Simulated data: Years of experience vs. Salary ($1000s)
experience_data <- data.frame(
  experience = c(0, 2, 4, 6, 8, 10, 12, 14, 16, 18),
  salary = c(40, 45, 50, 55, 60, 65, 70, 75, 80, 85)
)

model3 <- lm(salary ~ experience, data = experience_data)
summary(model3)

Call:
lm(formula = salary ~ experience, data = experience_data)

Residuals:
       Min         1Q     Median         3Q        Max 
-6.656e-15 -1.437e-15 -2.510e-16  1.319e-15  7.838e-15 

Coefficients:
             Estimate Std. Error   t value Pr(>|t|)    
(Intercept) 4.000e+01  2.292e-15 1.745e+16   <2e-16 ***
experience  2.500e+00  2.147e-16 1.164e+16   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.9e-15 on 8 degrees of freedom
Multiple R-squared:      1, Adjusted R-squared:      1 
F-statistic: 1.356e+32 on 1 and 8 DF,  p-value: < 2.2e-16

Questions:

  1. What is the slope coefficient?
  2. Interpret the y-intercept:
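
Once the slope and intercept are known, the fitted line can be used for prediction. The sketch below assumes the same experience_data and model3 as above and predicts salary for two hypothetical employees with 5 and 11 years of experience, values chosen to fall inside the observed range of 0 to 18 years. The names new_employees, predicted_salary, b, and by_equation are illustrative.

```r
# Same simulated data and model as in Example 3
experience_data <- data.frame(
  experience = c(0, 2, 4, 6, 8, 10, 12, 14, 16, 18),
  salary = c(40, 45, 50, 55, 60, 65, 70, 75, 80, 85)
)
model3 <- lm(salary ~ experience, data = experience_data)

# Hypothetical new observations (not part of the original data)
new_employees <- data.frame(experience = c(5, 11))

# Predicted salaries, in $1000s, from the fitted model
predicted_salary <- predict(model3, newdata = new_employees)
predicted_salary

# The same predictions computed directly from intercept + slope * x
b <- coef(model3)
by_equation <- b[["(Intercept)"]] + b[["experience"]] * new_employees$experience
by_equation
```

Both approaches give the same values, since predict() simply evaluates the fitted equation at the new experience values.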