Activity 1

MATH 216: Statistical Thinking

Name: ___________________________ Date: _________________

Time Allocation: 15 minutes total

High School Student Data Analysis

Dataset Description

The High School and Beyond (hsb2) dataset contains information about 200 high school students, including demographic details and academic performance. This dataset is commonly used in statistics education to explore relationships between socioeconomic factors and test scores.

Key Variables:

  • ID (id): Student identifier (unique number)
  • Gender (gender): Self-reported gender (Male/Female)
  • Race (race): Ethnicity (White/Black/Asian/Other)
  • SES (ses): Socioeconomic status (Low/Middle/High)
  • Program (prog): Enrollment track (General/Vocational/Academic)
  • Reading, Math, Science (read, math, science): Standardized test scores (0-100)

Data Preview

First 6 observations from hsb2 dataset
id   gender  race   ses     schtyp  prog        read  write  math  science  socst
70   male    white  low     public  general     57    52     41    47       57
121  female  white  middle  public  vocational  68    59     53    63       61
86   male    white  high    public  general     44    33     54    58       31
141  male    white  high    public  vocational  63    44     47    53       56
172  male    white  middle  public  academic    47    52     57    53       61
113  male    white  middle  public  academic    44    52     51    63       61
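
For readers who want to explore the preview on a computer, the following is a minimal Python sketch (standard library only) that reports whether each column of the six previewed rows is stored as numbers or as text. The names `PREVIEW`, `stored_as_number`, and `storage` are made up for this illustration and are not part of the dataset.

```python
import csv
import io

# The six previewed hsb2 observations, copied from the Data Preview table.
PREVIEW = """\
id gender race ses schtyp prog read write math science socst
70 male white low public general 57 52 41 47 57
121 female white middle public vocational 68 59 53 63 61
86 male white high public general 44 33 54 58 31
141 male white high public vocational 63 44 47 53 56
172 male white middle public academic 47 52 57 53 61
113 male white middle public academic 44 52 51 63 61
"""

# Each row becomes a dict keyed by column name; all values start as strings.
rows = list(csv.DictReader(io.StringIO(PREVIEW), delimiter=" "))


def stored_as_number(column):
    """Return True if every previewed value in the column is a whole number."""
    return all(row[column].isdigit() for row in rows)


# Map each column name to how its values are stored in the preview.
storage = {
    col: ("number" if stored_as_number(col) else "text") for col in rows[0]
}

for col, kind in storage.items():
    print(f"{col:8s} {kind}")
```

How a value is stored is only a starting point. A column such as id holds digits, yet it labels students rather than measuring anything, so storage type alone does not settle the classification asked for in Part 1.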

Part 1: Variable Classification

Instructions: Classify each variable as either Qualitative or Quantitative. For starred (*) items, also give a brief justification.

Variable         Classification     Justification (for starred items)
Gender           _______________
Math Score*      _______________    _______________________________
SES              _______________
Program*         _______________    _______________________________
Science Score    _______________

Part 2: Critical Thinking Questions

Question 1: If researchers want to analyze gender differences in math performance:

  • Current data type of Gender: _________________________
  • Conversion method: Assign Male = ____, Female = ____
  • Reason for these values: _______________________________

Question 2: Suppose Gender is coded as 0 (Male) and 1 (Female):

  • Is the “average gender” meaningful? □ Yes □ No
  • Explanation: __________________________________________


  • Why keep Gender as a categorical variable? _______________________________
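
As a numeric check on Question 2, the sketch below applies the 0 (Male) / 1 (Female) coding to the six students in the Data Preview and compares the mean of the codes with the share of female students. It is only an illustration on six rows, not an analysis of all 200 students, and the variable names are chosen for this example.

```python
from statistics import mean

# Gender values of the six previewed students, in row order.
genders = ["male", "female", "male", "male", "male", "male"]

# Coding from Question 2: 0 = Male, 1 = Female.
codes = [1 if g == "female" else 0 for g in genders]

# Arithmetic mean of the 0/1 codes.
average_code = mean(codes)

# Fraction of previewed students who are female.
share_female = genders.count("female") / len(genders)

print(f"mean of 0/1 codes: {average_code:.3f}")
print(f"share of females:  {share_female:.3f}")
```

The two printed numbers agree, which suggests what the mean of a 0/1 code describes: a property of the group, not the gender of any individual student.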


Part 3: Real-World Application

Scenario: A local school district wants to analyze student performance data.

  • Potential source of bias in the data: _________________________________


  • Suggested improvement to data collection: ________________________________


  • How the improvement addresses the bias: ________________________________



Student Reflection: What was the most challenging part of this activity?