Day 28

Math 216: Statistical Thinking

Bastola

Effectiveness of two training programs: Paired Data

Pair	Method A	Method B
1	85	83
2	88	89
3	90	87
4	92	84
5	91	92
6	89	90
7	93	85
8	95	91
9	96	98
10	97	94
11	98	100
12	99	101
13	100	99
14	101	111
15	102	111
16	103	106
17	104	109
18	105	103
19	106	111
20	107	114

Comprehensive Paired t-Test Framework

Null Hypothesis (\(H_0\)): No difference between paired measurements \[H_0: \mu_d = 0\] where \(\mu_d = \mu_A - \mu_B\) is the population mean difference
Alternative Hypothesis (\(H_a\)): Statement we want to find evidence for
- Two-tailed test: \(H_a: \mu_d \neq 0\)
- Right-tailed test: \(H_a: \mu_d > 0\)
- Left-tailed test: \(H_a: \mu_d < 0\)
Test Statistic: \(t = \frac{\bar{d} - \mu_{d0}}{s_d/\sqrt{n}}\) where \(\bar{d} = \frac{\sum d_i}{n}\), \(s_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n-1}}\), \(\mu_{d0}\) is the hypothesized mean difference (0 under \(H_0\)), and \(df = n-1\)
Decision Rule: Reject \(H_0\) if \(|t| > t_{\alpha/2,\, n-1}\) (two-tailed) or \(t > t_{\alpha,\, n-1}\) (right-tailed) or \(t < -t_{\alpha,\, n-1}\) (left-tailed)
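The decision rule can be sketched in R with `qt()`, which returns t-distribution quantiles. The helper name `reject_h0` and its arguments are illustrative assumptions, not part of the course code. The example values are the statistics that `t.test` reports elsewhere in these notes.

```r
# Hypothetical helper: apply the critical-value decision rule to a t statistic.
reject_h0 <- function(t_stat, df, alpha = 0.05,
                      tail = c("two", "right", "left")) {
  tail <- match.arg(tail)
  switch(tail,
    two   = abs(t_stat) > qt(1 - alpha / 2, df),  # |t| > t_{alpha/2}
    right = t_stat > qt(1 - alpha, df),           # t > t_alpha
    left  = t_stat < -qt(1 - alpha, df)           # t < -t_alpha
  )
}

# Training-program data (t = -0.7721, df = 19), right-tailed test
reject_h0(-0.7721, df = 19, tail = "right")

# Worked Example 1 (t = 16.261, df = 14), right-tailed test
reject_h0(16.261, df = 14, tail = "right")
```

The first call returns `FALSE` and the second `TRUE`, matching the fail-to-reject and reject decisions reported for those two data sets.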

Comparing Two Population Means: Paired Difference

Goal: Judge two programs by the within-pair change, \(d_i = A_i - B_i\), canceling out person-to-person noise.
Design: Same participant (or matched pair) supplies both measures; analysis is done on the single column of differences, not the raw scores.
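A minimal sketch of why the analysis is done on the differences: using the training-program scores from the table at the start of these notes, it compares the standard error of the mean difference under pairing with the standard error an unpaired two-sample analysis would use. The variable names `se_paired` and `se_unpaired` are illustrative.

```r
# Scores from the training-program table (pairs 1-20)
methodA <- c(85, 88, 90, 92, 91, 89, 93, 95, 96, 97, 98,
             99, 100, 101, 102, 103, 104, 105, 106, 107)
methodB <- c(83, 89, 87, 84, 92, 90, 85, 91, 98, 94, 100,
             101, 99, 111, 111, 106, 109, 103, 111, 114)

n <- length(methodA)
d <- methodA - methodB

# Paired analysis: one column of differences
se_paired <- sd(d) / sqrt(n)

# Unpaired (two-sample) analysis: person-to-person variation stays in the SE
se_unpaired <- sqrt(var(methodA) / n + var(methodB) / n)

c(paired = se_paired, unpaired = se_unpaired)
```

For these data the paired standard error is the smaller of the two, because subtracting within each pair removes the variation shared by both scores.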

General Testing Procedure

Calculate Differences: \(d_i = A_i - B_i\) for each pair
Compute Summary Statistics:
- Mean difference: \(\bar{d} = \frac{\sum d_i}{n}\)
- Standard deviation: \(s_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n-1}}\)
Check Conditions:
- Normality of differences (QQ plot, or a formal test such as Shapiro-Wilk or Anderson-Darling)
- Random sampling/assignment
Select Test Statistic:

\(t = \frac{\bar{d} - \mu_{d0}}{s_d/\sqrt{n}} \quad \text{with } df = n-1\)

Where \(\mu_{d0}\) is the hypothesized mean difference (0 under \(H_0\))
Make Decision:
- Compare p-value to \(\alpha\) (typically 0.05)
- Interpret confidence interval for \(\mu_d\)
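The steps above can be traced by hand in base R. This is a sketch under the assumption of a right-tailed alternative at \(\alpha = 0.05\), applied to the training-program scores. `shapiro.test()` and `pt()` are base R functions, and the object names are illustrative.

```r
methodA <- c(85, 88, 90, 92, 91, 89, 93, 95, 96, 97, 98,
             99, 100, 101, 102, 103, 104, 105, 106, 107)
methodB <- c(83, 89, 87, 84, 92, 90, 85, 91, 98, 94, 100,
             101, 99, 111, 111, 106, 109, 103, 111, 114)

# Step 1: differences
d <- methodA - methodB
n <- length(d)

# Step 2: summary statistics
d_bar <- mean(d)
s_d   <- sd(d)

# Step 3: normality check on the differences
shapiro.test(d)

# Step 4: test statistic with mu_d0 = 0
t_stat <- (d_bar - 0) / (s_d / sqrt(n))

# Step 5: right-tailed p-value and decision at alpha = 0.05
p_right <- pt(t_stat, df = n - 1, lower.tail = FALSE)
p_right < 0.05   # TRUE would mean "reject H0"
```

The values of `t_stat` and `p_right` should agree with what `t.test(methodA, methodB, paired = TRUE, alternative = "greater")` reports.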

Worked Example 1: Right-Tailed Paired t-Test

Context: Medical study testing whether a new drug increases blood pressure (n=15), with \(d_i = \text{after}_i - \text{before}_i\)

\(H_0\): \(\mu_d = 0\) (no change in blood pressure)
\(H_a\): \(\mu_d > 0\) (drug increases blood pressure)
Sample: n=15 paired measurements
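Test Statistic: \[t = \frac{\bar{d}}{s_d/\sqrt{n}} = \frac{5.47}{1.30/\sqrt{15}} \approx 16.26\] (computed from unrounded \(\bar{d}\) and \(s_d\); the rounded values shown give about 16.30)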
Critical Value: \(t_{0.05, 14} = 1.761\)
Decision: Since \(16.26 > 1.761\), reject \(H_0\)
Conclusion: Strong evidence that drug increases blood pressure
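The arithmetic in this example can be checked from the summary statistics alone. The sketch below assumes only the rounded values quoted above (\(\bar{d} = 5.47\), \(s_d = 1.30\), \(n = 15\)), so the statistic comes out slightly different from the one computed on the raw data.

```r
# Rounded summary statistics from Worked Example 1
d_bar <- 5.47
s_d   <- 1.30
n     <- 15

# Test statistic under H0: mu_d = 0
t_stat <- d_bar / (s_d / sqrt(n))

# Right-tailed critical value at alpha = 0.05 with df = 14
t_crit <- qt(0.95, df = n - 1)

# Right-tailed p-value
p_value <- pt(t_stat, df = n - 1, lower.tail = FALSE)

c(t = t_stat, critical = t_crit, p = p_value)
t_stat > t_crit   # reject H0 when TRUE
```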

Worked Example 1: R Verification

# Simulated verification
t.test(after, before, paired = TRUE, alternative = "greater")


    Paired t-test

data:  after and before
t = 16.261, df = 14, p-value = 8.705e-11
alternative hypothesis: true mean difference is greater than 0
95 percent confidence interval:
 4.874552      Inf
sample estimates:
mean difference 
       5.466667

Worked Example 2: Two-Tailed Paired t-Test

Context: Educational study testing if teaching method affects test scores (n=12)

\(H_0\): \(\mu_d = 0\) (no difference between methods)
\(H_a\): \(\mu_d \neq 0\) (methods differ)
Sample: n=12 paired measurements
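Test Statistic: \[t = \frac{\bar{d}}{s_d/\sqrt{n}} = \frac{5.08}{1.16/\sqrt{12}} \approx 15.12\] (computed from unrounded \(\bar{d}\) and \(s_d\); the rounded values shown give about 15.17)
Critical Value: \(t_{0.025, 11} = 2.201\)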
Decision: Since \(|15.12| > 2.201\), reject \(H_0\)
Conclusion: Strong evidence that teaching methods differ
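For the two-tailed case, the p-value doubles the upper-tail area beyond \(|t|\). A short sketch, again assuming only the rounded summary statistics quoted above (\(\bar{d} = 5.08\), \(s_d = 1.16\), \(n = 12\)):

```r
# Rounded summary statistics from Worked Example 2
d_bar <- 5.08
s_d   <- 1.16
n     <- 12
alpha <- 0.05

t_stat <- d_bar / (s_d / sqrt(n))

# Two-tailed critical value: alpha is split across both tails
t_crit <- qt(1 - alpha / 2, df = n - 1)

# Two-tailed p-value: twice the area beyond |t|
p_two_sided <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)

c(t = t_stat, critical = t_crit, p = p_two_sided)
abs(t_stat) > t_crit   # reject H0 when TRUE
```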

Worked Example 2: R Verification

# Simulated verification
t.test(new_method, traditional, paired = TRUE, alternative = "two.sided")


    Paired t-test

data:  new_method and traditional
t = 15.122, df = 11, p-value = 1.047e-08
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 4.343445 5.823221
sample estimates:
mean difference 
       5.083333

Interpretation: The 95% confidence interval (4.34, 5.82) lies entirely above 0, so scores under the new method are significantly higher, by about 5 points on average.

Interpretation Guidance

Significant Result: Reject \(H_0\) if p-value < \(\alpha\)
- “Evidence suggests the teaching methods differ (t(11)=15.12, p<0.001)”
Nonsignificant Result: Fail to reject \(H_0\)
- “No statistically significant evidence that Method A outperforms Method B (t(19)=-0.77, p=0.775)”
Always Report:
- Effect size (mean difference)
- Confidence interval
- Practical significance

Selecting Appropriate Statistical Tests

For Normal Distributions: If the differences are approximately normal, apply the paired \(t\)-test.
For Non-Normal Distributions: Use a non-parametric method that does not assume normality, such as the Wilcoxon signed-rank test.
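One common non-parametric alternative for paired data is the Wilcoxon signed-rank test, available in base R as `wilcox.test()`. The sketch below applies it to the training-program scores. The choice of this particular test is an assumption, since the notes do not name one. `exact = FALSE` requests the normal approximation, because the differences contain tied absolute values.

```r
methodA <- c(85, 88, 90, 92, 91, 89, 93, 95, 96, 97, 98,
             99, 100, 101, 102, 103, 104, 105, 106, 107)
methodB <- c(83, 89, 87, 84, 92, 90, 85, 91, 98, 94, 100,
             101, 99, 111, 111, 106, 109, 103, 111, 114)

# Paired Wilcoxon signed-rank test, right-tailed to mirror the t-test setup
res <- wilcox.test(methodA, methodB, paired = TRUE,
                   alternative = "greater", exact = FALSE)
res

# V is the sum of the ranks of the positive differences
res$statistic
res$p.value
```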

Connection to Confidence Intervals

A 95% CI for \(\mu_d\) is constructed as:

\[\bar{d} \pm t^*_{\alpha/2} \frac{s_d}{\sqrt{n}}\]

Interpretation: “We are 95% confident the true mean difference lies between X and Y”
Decision Rule: If the CI excludes 0, reject \(H_0\) in favor of the two-tailed \(H_a: \mu_d \neq 0\) at \(\alpha=0.05\)
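As a quick check of the formula, the interval can be built from summary statistics. This sketch assumes the rounded values from Worked Example 2 (\(\bar{d} = 5.08\), \(s_d = 1.16\), \(n = 12\)), so the endpoints differ slightly from the `t.test` interval computed on the raw data.

```r
# Rounded summary statistics from Worked Example 2
d_bar <- 5.08
s_d   <- 1.16
n     <- 12
conf  <- 0.95

# Critical value t*_{alpha/2} with df = n - 1
t_star <- qt(1 - (1 - conf) / 2, df = n - 1)

# Margin of error and interval endpoints
margin <- t_star * s_d / sqrt(n)
ci <- c(lower = d_bar - margin, upper = d_bar + margin)
ci

# Two-tailed decision at alpha = 0.05: reject H0 if 0 is outside the interval
ci["lower"] > 0 || ci["upper"] < 0
```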

# Load plotting and data-frame helpers
library(ggplot2)
library(tibble)

# Define the scores for Method A and Method B
methodA <- c(85, 88, 90, 92, 91, 89, 93, 95, 96, 97, 98,
             99, 100, 101, 102, 103, 104, 105, 106, 107)
methodB <- c(83, 89, 87, 84, 92, 90, 85, 91, 98, 94, 100,
             101, 99, 111, 111, 106, 109, 103, 111, 114)

# Calculate differences
differences <- methodA - methodB

# Generate a QQ plot for normality check
qq_norm <- ggplot(data = tibble(differences), aes(sample = differences)) +
  stat_qq() + stat_qq_line() +
  ggtitle("QQ Plot of Differences")

# Generate a histogram for normality check
histogram <- ggplot(data = tibble(differences), aes(x = differences)) +
  geom_histogram(bins = 10, color = "maroon", fill = "gold") +
  ggtitle("Histogram of Differences")

# Display both plots
qq_norm
histogram

Preliminary Tests in R

# Perform the Anderson-Darling test for normality
library(nortest)
ad.test(differences)


    Anderson-Darling normality test

data:  differences
A = 0.2269, p-value = 0.787

# Calculate standard deviation of differences
sd(differences) # s_d

[1] 4.92336

# Calculate critical value for t-distribution
qt(0.975, df = 20 - 1) # critical value

[1] 2.093024

`t.test` for paired samples

# Perform the paired t-test
t.test(methodA, methodB, paired = TRUE, alternative = "greater")


    Paired t-test

data:  methodA and methodB
t = -0.7721, df = 19, p-value = 0.7752
alternative hypothesis: true mean difference is greater than 0
95 percent confidence interval:
 -2.753597       Inf
sample estimates:
mean difference 
          -0.85

t.test(differences ~ 1, alternative = "greater", data = tibble(differences)) # equivalent one-sample t-test on the differences


    One Sample t-test

data:  differences
t = -0.7721, df = 19, p-value = 0.7752
alternative hypothesis: true mean is greater than 0
95 percent confidence interval:
 -2.753597       Inf
sample estimates:
mean of x 
    -0.85
