Day 19

Math 216: Statistical Thinking

Bastola

Statistical Estimation: Bridging Samples and Populations

Key Question: How can we use sample data to make reliable inferences about unknown population parameters? Statistical estimation provides the mathematical framework for quantifying uncertainty in our conclusions!

Real-World Applications:

  • Medical Research: Estimate treatment effects from clinical trial data
  • Market Analysis: Determine consumer preferences from survey samples
  • Quality Control: Monitor production processes using sample measurements
  • Environmental Science: Assess pollution levels from limited monitoring stations

Target Parameters


Determining the Target Parameter

Parameter | Key Words or Phrases | Type of Data
\(\mu\) | Mean; average | Quantitative
\(p\) | Proportion; percentage; fraction; rate | Qualitative
\(\sigma^2\) (optional) | Variance; variability; spread | Quantitative

Types of Statistical Estimators

Statistical Estimation Framework

Point Estimator: A single value that provides our best guess for the population parameter

  • Example: \(\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i\) estimates the population mean \(\mu\)
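  • Desirable properties: unbiased (\(E(\bar{x}) = \mu\)), consistent (concentrates around \(\mu\) as \(n\) grows), and efficient (small variance compared with other unbiased estimators)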
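A quick R simulation can make unbiasedness concrete: the average of many sample means should sit very close to \(\mu\), and their spread should be close to \(\sigma/\sqrt{n}\). This is a minimal sketch; the Normal population (\(\mu = 10\), \(\sigma = 2\)), the sample size, the number of repetitions, and the seed are hypothetical choices for illustration only.

```r
# Hypothetical population: Normal(mu = 10, sigma = 2)
set.seed(216)          # arbitrary seed for reproducibility
mu <- 10
sigma <- 2
n <- 25                # size of each sample
reps <- 10000          # number of repeated samples

# Draw `reps` samples of size n and compute x-bar for each one
xbars <- replicate(reps, mean(rnorm(n, mean = mu, sd = sigma)))

# Unbiasedness: the average of the sample means should be close to mu
mean(xbars)

# Spread of x-bar should be close to sigma / sqrt(n) = 0.4
sd(xbars)
```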

Interval Estimator (Confidence Interval): A range of plausible values that likely contains the true parameter

  • Mathematical Form: \(\text{Estimate} \pm \text{Margin of Error}\)
  • Interpretation: Quantifies uncertainty in our estimation process

Key Insight: While point estimates give us a single “best guess,” confidence intervals provide the precision and reliability of that guess!

The Power of Confidence Intervals: Quantifying Uncertainty

Why Confidence Intervals Matter

Beyond Point Estimates: Confidence intervals provide more information than single values—they quantify the precision and reliability of our estimates!

Key Benefits:

  • Uncertainty Quantification: Express the range of plausible values for the parameter
  • Statistical Precision: Wider intervals indicate more uncertainty, narrower intervals indicate greater precision
  • Decision Support: Help determine if effects are practically significant
  • Method Reliability: The confidence level indicates how often the method produces intervals containing the true parameter

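Statistical Significance: Confidence intervals are closely tied to hypothesis testing. For the large-sample z procedures, a two-sided test at level \(\alpha\) rejects a hypothesized value \(\mu_0\) exactly when \(\mu_0\) falls outside the \(100(1-\alpha)\%\) confidence interval.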

Calculating a Confidence Interval

  • Scenario: Estimating average hospital stay length.
  • Sample Data: Sample mean \(\bar{x}\) from 100 patient records.
  • Central Limit Theorem: Assures that \(\bar{x}\) is approximately normally distributed for large samples.
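The CLT claim can be checked by simulation. A minimal sketch, assuming a hypothetical exponential population with mean 4.5 (a stand-in for right-skewed stay lengths, not data from the scenario) and the scenario's sample size of 100: individual values are strongly skewed, but the sample means are nearly symmetric.

```r
set.seed(62)                      # arbitrary seed
pop_mean <- 4.5                   # hypothetical population mean stay
n <- 100                          # records per sample, as in the scenario
reps <- 10000                     # number of repeated samples

# Simple moment-based skewness (0 for a symmetric distribution)
skew <- function(v) mean((v - mean(v))^3) / sd(v)^3

# Exponential population: strongly right-skewed (true skewness = 2)
pop_sample <- rexp(reps, rate = 1 / pop_mean)

# Sampling distribution of x-bar for samples of size n
xbars <- replicate(reps, mean(rexp(n, rate = 1 / pop_mean)))

skew(pop_sample)   # large: individual stays are skewed
skew(xbars)        # near 2 / sqrt(100) = 0.2: x-bar is close to symmetric
# hist(xbars, breaks = 50) shows the approximately bell-shaped curve
```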

Confidence Interval Formula

  • 95% Confidence Interval for \(\mu\): \[ 95\% \text{ C.I.} = \left(\bar{x} - 1.96 \frac{\sigma}{\sqrt{n}}, \quad \bar{x} + 1.96 \frac{\sigma}{\sqrt{n}}\right) \]
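  • Note: \(\sigma\) is the population standard deviation and \(n\) is the sample size. In practice \(\sigma\) is usually unknown; for large samples it is replaced by the sample standard deviation \(s\).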
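A minimal R sketch of the 95% formula above. Only \(n = 100\) comes from the scenario; the sample mean 4.53 and standard deviation 3.68 are hypothetical values, chosen so that the result matches the interval (3.81, 5.25) discussed next.

```r
# Hypothetical summary statistics for the hospital-stay scenario
xbar  <- 4.53    # assumed sample mean
sigma <- 3.68    # assumed population SD
n     <- 100     # number of patient records

se <- sigma / sqrt(n)                 # standard error of x-bar
ci_95 <- xbar + c(-1, 1) * 1.96 * se  # lower and upper limits
round(ci_95, 2)
#> [1] 3.81 5.25
```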

Understanding Confidence Intervals

  • Question: Suppose the 95% formula applied to the hospital sample gives the interval (3.81, 5.25). Is the true mean \(\mu\) between 3.81 and 5.25?
  • Confidence Understanding:
    • No certainty that \(\mu\) lies within this specific interval from a single sample.
    • If repeated samples are taken, about 95% of such intervals would contain \(\mu\).
  • Correct Interpretation:
    • We don’t say \(\mu\) is definitely in this interval based on one sample; the 95% level reflects how often these intervals capture \(\mu\) across many samples.
  • Terminology:
    • Confidence Coefficient (.95): Long-run proportion of intervals that contain \(\mu\) over repeated sampling.
    • Confidence Level (95%): The confidence coefficient expressed as a percentage; it describes the method's reliability over many samples.
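The repeated-sampling interpretation can be checked directly: draw many samples, build a 95% interval from each, and count how often the interval captures \(\mu\). A minimal sketch, assuming a hypothetical Normal population with \(\mu = 50\) and \(\sigma = 8\) (treated as known) and samples of size 36.

```r
set.seed(19)                       # arbitrary seed
mu <- 50; sigma <- 8; n <- 36      # hypothetical population and sample size
reps <- 10000                      # number of repeated samples
z <- qnorm(0.975)                  # 95% critical value, about 1.96

# For each sample: build the interval and record whether it contains mu
covers <- replicate(reps, {
  x  <- rnorm(n, mean = mu, sd = sigma)
  me <- z * sigma / sqrt(n)
  (mean(x) - me <= mu) && (mu <= mean(x) + me)
})

mean(covers)   # proportion of intervals that captured mu; close to 0.95
```

Any single interval either contains \(\mu\) or it does not; the 95% describes the long-run hit rate of the procedure.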

Understanding CIs

[Figure: Confidence Intervals, with panels for a \(100(1-\alpha)\%\) CI and a 90% CI]

Commonly used values of \(z_{\alpha}\)

The value \(z_\alpha\) is defined as the value of the standard normal random variable \(z\) such that the area \(\alpha\) will lie to its right. In other words, \(P\left(z>z_\alpha\right)=\alpha\).
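In R, \(z_\alpha\) is the upper-tail quantile qnorm(alpha, lower.tail = FALSE), equivalently qnorm(1 - alpha). The sketch below computes it for a few commonly used \(\alpha\) values (the list of \(\alpha\) values is an illustrative choice) and checks the defining property \(P(z > z_\alpha) = \alpha\) with pnorm.

```r
# Right-tail critical values: P(z > z_alpha) = alpha
alpha <- c(0.10, 0.05, 0.025, 0.01, 0.005)
z_alpha <- qnorm(alpha, lower.tail = FALSE)   # same as qnorm(1 - alpha)

# Check the defining property: area to the right of z_alpha equals alpha
right_area <- pnorm(z_alpha, lower.tail = FALSE)

data.frame(alpha, z_alpha = round(z_alpha, 3), right_area)
```

For a two-sided \(100(1-\alpha)\%\) interval, the value needed is \(z_{\alpha/2}\), e.g. \(\alpha = 0.05\) uses the \(z_{0.025}\) row.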

Large Sample Confidence Interval for \(\mu\)

[Figure: Confidence Interval Visualization]

Theoretical Calculations: pnorm and qnorm Applications

R Functions for Confidence Interval Calculations

Critical Value Calculations:

  • 90% confidence: qnorm(0.95) ≈ 1.645
  • 95% confidence: qnorm(0.975) ≈ 1.960
  • 99% confidence: qnorm(0.995) ≈ 2.576

Probability Calculations:

  • Probability within 1 SD: pnorm(1) - pnorm(-1) ≈ 0.6827
  • Probability within 2 SD: pnorm(2) - pnorm(-2) ≈ 0.9545
  • Probability within 3 SD: pnorm(3) - pnorm(-3) ≈ 0.9973

Confidence Interval Formula (confidence coefficient \(1-\alpha\); \(z_{\alpha/2}\) has area \(\alpha/2\) to its right): \[\text{CI} = \bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}\]

Margin of Error: \[\text{ME} = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}\]
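The formula and margin of error can be wrapped in a small R function. z_interval is a hypothetical helper name, not a built-in; it assumes \(\sigma\) is known (or that a large-sample \(s\) is plugged in). Applied to the Exercise 1 numbers, it reproduces (82.06, 87.94).

```r
# Hypothetical helper: large-sample z interval for mu
z_interval <- function(xbar, sigma, n, conf = 0.95) {
  alpha <- 1 - conf
  z  <- qnorm(1 - alpha / 2)        # z_{alpha/2}
  me <- z * sigma / sqrt(n)         # margin of error
  c(lower = xbar - me, upper = xbar + me, me = me)
}

# Exercise 1 values: mean 85, SD 12, n = 64
round(z_interval(85, 12, 64), 2)    # lower 82.06, upper 87.94, me 2.94

# Same function with a different confidence level
round(z_interval(50, 8, 36, conf = 0.90), 2)
```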

Confidence Interval Exercises

Practical Exercises Using R

Exercise 1: Basic Confidence Interval Calculation

  • Sample mean: 85, population SD: 12, sample size: 64
  • Calculate 95% CI: 85 ± 1.96 * (12/sqrt(64)) = (82.06, 87.94)
  • R code: 85 + c(-1,1) * qnorm(0.975) * 12/sqrt(64)

Exercise 2: Sample Size Determination

  • Desired margin of error: 2, population SD: 10, 95% confidence
  • Required sample size: n = (1.96 * 10 / 2)^2 = 96.04, rounded up to 97 (always round up so the margin of error does not exceed 2)
  • R code: ceiling((qnorm(0.975) * 10 / 2)^2)
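Solving \(\text{ME} = z_{\alpha/2}\,\sigma/\sqrt{n}\) for \(n\) gives \(n = (z_{\alpha/2}\,\sigma/\text{ME})^2\), rounded up to a whole number. A minimal sketch as a function; n_required is a hypothetical helper name, and the 99% call is an extra illustration beyond the exercise.

```r
# Hypothetical helper: smallest n whose margin of error is at most `me`
n_required <- function(me, sigma, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)   # z_{alpha/2}
  ceiling((z * sigma / me)^2)      # round UP; rounding down would overshoot me
}

n_required(me = 2, sigma = 10)               # Exercise 2: 97
n_required(me = 2, sigma = 10, conf = 0.99)  # higher confidence needs a larger n
```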


Exercise 3: Confidence Level Impact

  • Same data: sample mean = 50, population SD = 8, n = 36 (so \(\sigma/\sqrt{n} = 8/6\))
  • 90% CI: 50 ± 1.645*(8/6) = (47.81, 52.19)
  • 95% CI: 50 ± 1.96*(8/6) = (47.39, 52.61)
  • 99% CI: 50 ± 2.576*(8/6) = (46.57, 53.43)
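The three intervals can be computed in one vectorized step, which makes the trade-off visible: higher confidence means a larger \(z_{\alpha/2}\) and therefore a wider interval. A minimal sketch using the exercise's summary values.

```r
xbar <- 50; sigma <- 8; n <- 36     # Exercise 3 summary values
conf <- c(0.90, 0.95, 0.99)

z  <- qnorm(1 - (1 - conf) / 2)     # about 1.645, 1.960, 2.576
me <- z * sigma / sqrt(n)           # margin of error for each level

ci_table <- data.frame(
  conf,
  lower = round(xbar - me, 2),
  upper = round(xbar + me, 2),
  width = round(2 * me, 2)
)
ci_table
```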

Exercise 4: Real-World Interpretation

  • Medical study: Treatment reduces blood pressure by 8 mmHg (95% CI: 5 to 11 mmHg)
  • Interpretation: We’re 95% confident the true reduction is between 5 and 11 mmHg
  • Statistical significance: The interval excludes 0, so the effect is statistically significant at the 5% level (two-sided)
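A sketch of the interval/test connection for this example. It rests on an assumption not stated in the study summary: that the reported interval is a symmetric, normal-based 95% interval, so its half-width equals \(1.96 \times \text{SE}\). Backing out SE then gives an approximate two-sided p-value for \(H_0\): reduction = 0.

```r
# Reported 95% CI for the mean reduction (mmHg)
est <- 8; lower <- 5; upper <- 11

# Assumption: symmetric normal-based interval, half-width = z_{0.025} * SE
se <- (upper - lower) / 2 / qnorm(0.975)
z  <- est / se                       # test statistic for H0: reduction = 0
p  <- 2 * pnorm(-abs(z))             # two-sided p-value

contains_zero <- lower <= 0 && 0 <= upper
c(se = se, z = z, p = p)
contains_zero   # FALSE, consistent with p < 0.05
```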