Activity 4

MATH 216: Statistical Thinking

Distribution Analysis & Z-Scores

Time Allocation: 15 minutes total

Case Study: Use graphical and numerical summaries of the Student Survey dataset to compare students' GPAs across class years (the Year variable).

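library(dplyr)    # group_by(), summarize(), n()
library(ggplot2)  # ggplot(), geom_histogram(), facet_wrap()
survey_data <- read.csv("https://raw.githubusercontent.com/deepbas/datasets/main/StudentSurvey.csv") |>
  tidyr::drop_na()  # keep only rows with no missing values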

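# One histogram panel per class year; bins is set explicitly (20 is an arbitrary choice)
ggplot(survey_data, aes(x = GPA, fill = Year)) +
  geom_histogram(bins = 20) +
  facet_wrap(~Year)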

survey_data |> 
  group_by(Year) |> 
  summarize(mean = mean(GPA),
            sd = sd(GPA),
            n = n()) |> 
  knitr::kable(caption = "Summary Statistics of GPA for all Years")

Summary Statistics of GPA for all Years
Year        mean      sd         n
FirstYear   3.070759  0.4702584   79
Junior      3.273636  0.3688057   33
Senior      3.171667  0.3754274   36
Sophomore   3.173224  0.3690737  183
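
The summary table can also be used to see how the same GPA compares with each year's distribution. The sketch below is only an illustration: it copies the means and standard deviations from the table above, and the GPA of 3.5 is a hypothetical value chosen for the example, not one taken from the dataset.

```r
# Means and SDs copied from the summary table above
gpa_summary <- data.frame(
  Year = c("FirstYear", "Sophomore", "Junior", "Senior"),
  mean = c(3.070759, 3.173224, 3.273636, 3.171667),
  sd   = c(0.4702584, 0.3690737, 0.3688057, 0.3754274)
)

# Hypothetical GPA, used only for illustration
example_gpa <- 3.5

# Standardize: z = (value - group mean) / group SD, one z-score per year
gpa_summary$z <- (example_gpa - gpa_summary$mean) / gpa_summary$sd

print(gpa_summary)
```

Because the mean and spread differ from year to year, the same GPA sits at a different relative position in each group, which is what the z-score column makes visible.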

Part 1: Distribution Concepts (5 minutes)

Instructions: Provide examples and explanations for each concept:

Symmetric Distribution Example: _________________________
- Why it’s symmetric: ____________________________________
Real-world skewed distribution: _________________________
- Direction of skew: □ Left □ Right
- Reason for skew: ______________________________________
Normal distribution characteristic: ______________________

Part 2: Z-Score Interpretation (5 minutes)

Scenario: A student’s GPA has a z-score of 2.

What does this mean?
- □ GPA is 2 points above the mean
- □ GPA is 2 standard deviations above the mean
- □ GPA is 2 standard deviations below the mean
- □ GPA is 2 points below the mean
Explanation: __________________________________________
If the mean GPA is 3.0 and the SD is 0.5, calculate the student's actual GPA:
- Formula: _________________________
- Calculation: ______________________
- Actual GPA: ______________________

Part 3: Data Analysis (5 minutes)

Analysis Questions:

Which year has the highest average GPA? __________________
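Potential explanation for this pattern: __________________
Two potential biases in this dataset:
- ______________________________________
- ______________________________________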