We don't need any new R to do section 7.1 calculations. e.g. Here are the numbers from problem 7.1.3 on alloy melting points in the 7.1 notes.
n.X = 35
x.bar = 517
s.X = 2.4
n.Y = 47
y.bar = 510.1
s.Y = 2.1
alpha = 1 - 0.99 # for a 99% confidence level
(z = -qnorm(alpha/2))
## [1] 2.576
(point.estimate = x.bar - y.bar)
## [1] 6.9
(error.margin = z * sqrt(s.X^2/n.X + s.Y^2/n.Y))
## [1] 1.309
low = point.estimate - error.margin
high = point.estimate + error.margin
(interval = c(low, high))
## [1] 5.591 8.209
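For reference, the interval this code computes is the usual large-sample one,
\[ \bar{x} - \bar{y} \pm z_{\alpha/2} \sqrt{\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}}, \]
where \( z_{\alpha/2} \) is -qnorm(alpha/2).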
e.g. Here's problem 7.1.14.a from the 7.1 notes.
n.X = 40
x.bar = 2.6
s.X = 1.4 # X = A crayon strength
n.Y = 40
y.bar = 3.8
s.Y = 1.2 # Y = B crayon strength
delta.0 = 0 # null hypothesis difference of means
(point.estimate = x.bar - y.bar)
## [1] -1.2
(s = sqrt(s.X^2/n.X + s.Y^2/n.Y))
## [1] 0.2915
(z = (point.estimate - delta.0)/s)
## [1] -4.116
(p.value = pnorm(z))
## [1] 1.928e-05
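For reference, the statistic computed above is
\[ z = \frac{(\bar{x} - \bar{y}) - \Delta_0}{\sqrt{s_X^2/n_X + s_Y^2/n_Y}}, \]
and pnorm(z) is the area to the left of z, which is the P-value for the lower-tailed alternative \( H_a: \mu_X - \mu_Y < \Delta_0 \).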
We don't need any new R to solve 7.2 problems. (I'll save you some reading by not solving examples here.)
We used t.test() for a small-sample test of one mean in section 6.4. If we give it an additional y argument, it can handle inference on the difference between two means. t.test(x, y, alternative="two.sided", mu=0, conf.level=.95) tests \( H_0: \mu_X - \mu_Y = \mu_0 \), where \( \mu_0 \) is the mu argument (which defaults to 0), for samples x and y from two normal populations, while also giving a conf.level (which defaults to .95) confidence interval for \( \mu_X - \mu_Y \). e.g. Here is the Borneo logging example from the 7.3 notes:
unlogged = c(22, 18, 22, 20, 15, 21, 13, 13, 19, 13, 19, 15)
logged = c(17, 4, 18, 14, 18, 15, 15, 10, 12)
t.test(unlogged, logged, alternative = "greater", conf.level = 0.9)
##
## Welch Two Sample t-test
##
## data: unlogged and logged
## t = 2.114, df = 14.79, p-value = 0.02596
## alternative hypothesis: true difference in means is greater than 0
## 90 percent confidence interval:
## 1.401 Inf
## sample estimates:
## mean of x mean of y
## 17.50 13.67
While the test statistic, degrees of freedom, and P-value match our work in the 7.3 notes, the confidence interval doesn't. It's a “one-sided” interval (1.401, Inf) that we didn't discuss. To get the two-sided interval we discussed, use alternative="two.sided":
t.test(unlogged, logged, alternative = "two.sided", conf.level = 0.9)
##
## Welch Two Sample t-test
##
## data: unlogged and logged
## t = 2.114, df = 14.79, p-value = 0.05192
## alternative hypothesis: true difference in means is not equal to 0
## 90 percent confidence interval:
## 0.6517 7.0150
## sample estimates:
## mean of x mean of y
## 17.50 13.67
Now the interval is right (but the test isn't the one we want, since our alternative is one-sided).
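If we want the one-sided P-value and the two-sided interval together, one option (a sketch, not something from the notes; the object names below are mine) is to pull each piece out of the list that t.test() returns:
greater = t.test(unlogged, logged, alternative = "greater", conf.level = 0.9)
two.sided = t.test(unlogged, logged, alternative = "two.sided", conf.level = 0.9)
greater$p.value    # the P-value for the one-sided test we want
two.sided$conf.int # the two-sided interval we want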
We don't need any new R to do section 7.4 calculations, unless subtracting vectors counts as new:
x = c(5, 6, 7)
y = c(3, 2, 1)
(d = x - y) # a vector of differences: d[i] is x[i] - y[i]
## [1] 2 4 6
Once we have a vector of differences, we can proceed with one-sample methods.
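e.g. With the toy vectors above, a paired analysis is just the one-sample t.test() from section 6.4 applied to d; t.test() also has a paired argument that does the subtraction itself. A quick sketch (the mu = 0 null here is just for illustration):
t.test(d, mu = 0)           # one-sample t test on the differences
t.test(x, y, paired = TRUE) # equivalent: t.test() forms x - y and tests it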
R's function var.test(x, y, ratio = 1, alternative = "two.sided", conf.level = .95) tests \( H_0: \frac{\sigma^2_X}{\sigma^2_Y} = \) ratio (which defaults to 1) for two samples x and y from normal populations, while also giving a conf.level confidence interval for \( \frac{\sigma^2_X}{\sigma^2_Y} \). e.g. Here is the target-shooting example from the 7.5 notes:
molly = c(31, 40, -37, -30, 13)
jenny = c(-10, 38, 35, 9, 28)
var.test(molly, jenny, ratio = 1, alternative = "two.sided")
##
## F test to compare two variances
##
## data: molly and jenny
## F = 3.024, num df = 4, denom df = 4, p-value = 0.3092
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.3149 29.0440
## sample estimates:
## ratio of variances
## 3.024
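As a quick check (not a line from the notes), the F statistic above is just the ratio of the two sample variances:
var(molly)/var(jenny) # should match F = 3.024 reported by var.test()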
For \( X \sim F_{\nu_1, \nu_2} \), the cumulative distribution function F(x) = P(X ≤ x) is given by pf(x, nu1, nu2). e.g. to get the textbook's version of the P-value for the target-shooting example (calculated as twice the area to the right of the observed statistic \( f \)), we could have used
(p.value = 2 * (1 - pf(3.02, length(molly) - 1, length(jenny) - 1)))
## [1] 0.3097
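The quantile function qf(p, nu1, nu2) is the inverse of pf(). A sketch (not from the notes) of how it could be used: the estimated variance ratio divided by the upper and lower F quantiles should reproduce var.test()'s 95 percent interval above.
f = var(molly)/var(jenny)
c(f/qf(0.975, 4, 4), f/qf(0.025, 4, 4)) # should match the interval (0.3149, 29.0440)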