Chapter 7 Inferences for Two Samples

7.1 Large-Sample Inferences on the Difference Between Two Population Means

We don't need any new R to do section 7.1 calculations. e.g. Here's problem 7.1.3, on alloy melting points, from the 7.1 notes.

n.X = 35       # size, mean, and SD of the first sample
x.bar = 517
s.X = 2.4
n.Y = 47       # size, mean, and SD of the second sample
y.bar = 510.1
s.Y = 2.1
alpha = 1 - 0.99       # for a 99% confidence level
(z = -qnorm(alpha/2))  # critical value z_(alpha/2)
## [1] 2.576
(point.estimate = x.bar - y.bar)
## [1] 6.9
(error.margin = z * sqrt(s.X^2/n.X + s.Y^2/n.Y))
## [1] 1.309
low = point.estimate - error.margin
high = point.estimate + error.margin
(interval = c(low, high))
## [1] 5.591 8.209
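
Since we'll reuse this recipe, one option is to wrap it in a function. Here's a minimal sketch (the name z.interval.2means is mine, not a standard R function):

z.interval.2means = function(x.bar, s.X, n.X, y.bar, s.Y, n.Y, conf.level = 0.95) {
    z = -qnorm((1 - conf.level)/2)  # critical value
    error.margin = z * sqrt(s.X^2/n.X + s.Y^2/n.Y)
    (x.bar - y.bar) + c(-1, 1) * error.margin
}
z.interval.2means(517, 2.4, 35, 510.1, 2.1, 47, conf.level = 0.99)
## [1] 5.591 8.209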

e.g. Here's problem 7.1.14.a from the 7.1 notes.

n.X = 40
x.bar = 2.6
s.X = 1.4  # X = A crayon strength
n.Y = 40
y.bar = 3.8
s.Y = 1.2  # Y = B crayon strength
delta.0 = 0  # null hypothesis difference of means
(point.estimate = x.bar - y.bar)
## [1] -1.2
(s = sqrt(s.X^2/n.X + s.Y^2/n.Y))  # standard error of x.bar - y.bar
## [1] 0.2915
(z = (point.estimate - delta.0)/s)  # test statistic
## [1] -4.116
(p.value = pnorm(z))  # lower-tail P-value for H_1: mu_X - mu_Y < 0
## [1] 1.928e-05
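
For reference, if the alternative hypothesis had been two-sided, we would double the tail area (the standard adjustment, not part of this problem):

(p.value.2sided = 2 * pnorm(-abs(z)))
## [1] 3.856e-05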

7.2 Inferences on the Difference Between Two Proportions

We don't need any new R to solve 7.2 problems. (I'll save you some reading by not solving examples here.)
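
(If you'd like a template anyway, here's a minimal sketch of the traditional large-sample confidence interval for \( p_X - p_Y \), using made-up counts; it's just the 7.1-style z recipe with sample proportions in place of means, so check your textbook in case it prefers an adjusted estimator.)

x = 40; n.X = 100  # hypothetical: 40 successes in 100 trials
y = 30; n.Y = 100  # hypothetical: 30 successes in 100 trials
p.X = x/n.X
p.Y = y/n.Y
z = -qnorm((1 - 0.95)/2)  # for a 95% confidence level
(interval = (p.X - p.Y) + c(-1, 1) * z * sqrt(p.X*(1 - p.X)/n.X + p.Y*(1 - p.Y)/n.Y))
## [1] -0.03148  0.23148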

7.3 Small-Sample Inferences on the Difference Between Two Means

We used t.test() for a small-sample test of one mean in section 6.4. If we give it an additional y argument, it can handle inference on the difference between two means: t.test(x, y, alternative="two.sided", mu=0, conf.level=.95) tests \( H_0: \mu_X - \mu_Y = \mu_0 \), where \( \mu_0 \) is the mu argument (which defaults to 0), for samples x and y from two normal populations, while also giving a conf.level (which defaults to .95) confidence interval for \( \mu_X - \mu_Y \). e.g. Here is the Borneo logging example from the 7.3 notes:

unlogged = c(22, 18, 22, 20, 15, 21, 13, 13, 19, 13, 19, 15)
logged = c(17, 4, 18, 14, 18, 15, 15, 10, 12)
t.test(unlogged, logged, alternative = "greater", conf.level = 0.9)
## 
##  Welch Two Sample t-test
## 
## data:  unlogged and logged
## t = 2.114, df = 14.79, p-value = 0.02596
## alternative hypothesis: true difference in means is greater than 0
## 90 percent confidence interval:
##  1.401   Inf
## sample estimates:
## mean of x mean of y 
##     17.50     13.67

While the test statistic, degrees of freedom, and P-value match our work in the 7.3 notes, the confidence interval doesn't. It's a “one-sided” interval (1.401, Inf) that we didn't discuss. To get the two-sided interval we discussed, use alternative="two.sided":

t.test(unlogged, logged, alternative = "two.sided", conf.level = 0.9)
## 
##  Welch Two Sample t-test
## 
## data:  unlogged and logged
## t = 2.114, df = 14.79, p-value = 0.05192
## alternative hypothesis: true difference in means is not equal to 0
## 90 percent confidence interval:
##  0.6517 7.0150
## sample estimates:
## mean of x mean of y 
##     17.50     13.67

Now the interval is right, but the reported P-value belongs to a two-sided test, which isn't the test we want: take the P-value from the one-sided call above and the confidence interval from this one.
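
Since each call gets one piece right, we can save both results and extract just the parts we need; t.test() returns a list whose components include p.value and conf.int:

test = t.test(unlogged, logged, alternative = "greater")
interval = t.test(unlogged, logged, alternative = "two.sided", conf.level = 0.9)
test$p.value  # the one-sided test's P-value
## [1] 0.02596
interval$conf.int  # the two-sided 90% confidence interval
## [1] 0.6517 7.0150
## attr(,"conf.level")
## [1] 0.9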

7.4 Inferences Using Paired Data

We don't need any new R to do section 7.4 calculations, unless subtracting vectors counts as new:

x = c(5, 6, 7)
y = c(3, 2, 1)
(d = x - y)  # a vector of differences: d[i] is x[i] - y[i]
## [1] 2 4 6

Once we have a vector of differences, we can proceed with one-sample methods.
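
By the way, t.test() can do the subtraction itself: t.test(x, y, paired = TRUE) runs the one-sample test on the differences, so it gives the same t, degrees of freedom, P-value, and confidence interval as t.test(d). For example:

t.test(x, y, paired = TRUE)$p.value  # same as t.test(d)$p.value
## [1] 0.07418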

7.5 The F Test for Equality of Variance

R's function var.test(x, y, ratio = 1, alternative = "two.sided", conf.level = .95) tests \( H_0: \frac{\sigma^2_X}{\sigma^2_Y} = \) ratio for two samples x and y from normal populations. e.g. Here is the target-shooting example from the 7.5 notes:

molly = c(31, 40, -37, -30, 13)
jenny = c(-10, 38, 35, 9, 28)
var.test(molly, jenny, ratio = 1, alternative = "two.sided")
## 
##  F test to compare two variances
## 
## data:  molly and jenny
## F = 3.024, num df = 4, denom df = 4, p-value = 0.3092
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##   0.3149 29.0440
## sample estimates:
## ratio of variances 
##              3.024
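
As a sanity check, the F statistic is just the ratio of the two sample variances:

var(molly)/var(jenny)
## [1] 3.024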

F probabilities

For \( X \sim F_{\nu_1, \nu_2} \), the cumulative distribution function \( F(x) = P(X \le x) \) is given by pf(x, nu1, nu2). e.g. To get the textbook's version of the P-value for the target-shooting example (calculated as twice the area to the right of the observed \( f = 3.02 \)), we could have used

(p.value = 2 * (1 - pf(3.02, length(molly) - 1, length(jenny) - 1)))
## [1] 0.3097
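
Equivalently, pf() takes a lower.tail argument, so we can ask for the upper-tail area directly:

(p.value = 2 * pf(3.02, length(molly) - 1, length(jenny) - 1, lower.tail = FALSE))
## [1] 0.3097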