Since I don't have data handy, I'll make up some men's heights by
getting a random sample of size 30 from a normal distribution with
mean 70 and standard deviation 3 (inches). (Recall from the chapter 4
R notes that we can generate \( n \) random numbers from N(\( \mu \),
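\( \sigma^2 \)) via rnorm(n, mu, sigma). Note that the third argument
is the standard deviation \( \sigma \), not the variance \( \sigma^2 \).)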
> mens.heights = rnorm(30, 70, 3)
> mens.heights
[1] 69.86 72.09 70.63 66.22 71.48 74.19 64.97 69.50 73.04 74.41 75.36
[12] 71.10 66.18 69.06 67.80 74.88 69.53 72.20 66.83 74.56 67.63 67.57
[23] 72.37 71.03 66.41 75.38 73.29 70.59 70.07 71.79
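Because rnorm() draws random numbers, your sample (and every number computed from it below) will differ from mine. If you want output that comes out the same on every run, a minimal sketch is to fix R's random-number seed with set.seed() before sampling. The seed 123 is an arbitrary choice, so this generally won't reproduce the particular numbers shown above.

```r
# Fix the random-number seed so the same "random" sample comes out every run.
# The seed 123 is arbitrary; any integer works.
set.seed(123)
mens.heights = rnorm(30, 70, 3)   # 30 draws from N(70, 3^2)
length(mens.heights)              # 30
```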
Here's one way to find a 95% confidence interval for the unknown mean \( \mu \). (Well, we know \( \mu \) is 70 since I just generated the data! But suppose all we have is the sample, so we don't know \( \mu \).)
> # The parentheses around an expression, as in the next line of code, force
> # R to print its value. This saves me from adding a display line
> # consisting only of 'n'.
> (n = length(mens.heights))
[1] 30
> (x.bar = mean(mens.heights))
[1] 70.67
> (s = sd(mens.heights))
[1] 3.018
> # Note that there aren't parentheses around the next line of code, so the
> # value of alpha isn't displayed. I didn't display it because it's easy to
> # figure out that alpha is .05.
> alpha = 1 - 0.95
> (z = -qnorm(alpha/2))
[1] 1.96
> (error.margin = z * s/sqrt(n))
[1] 1.08
> low = x.bar - error.margin
> high = x.bar + error.margin
> (interval = c(low, high))
[1] 69.59 71.75
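The steps above can be bundled into a small function so they are easy to reuse on other data. This is only a sketch: the name z.interval is hypothetical (it is not a built-in R function), and, like the code above, it plugs the sample standard deviation s in for \( \sigma \).

```r
# Hypothetical helper (not built into R): z confidence interval for a mean,
# using the sample standard deviation in place of sigma, as in the steps above.
z.interval = function(x, conf.level = 0.95) {
  n = length(x)
  x.bar = mean(x)
  s = sd(x)
  alpha = 1 - conf.level
  z = -qnorm(alpha/2)              # same as qnorm(1 - alpha/2)
  error.margin = z * s/sqrt(n)
  c(x.bar - error.margin, x.bar + error.margin)
}
```

Calling z.interval(mens.heights) carries out exactly the same arithmetic as the step-by-step version, so it should give the same interval.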
e.g. Here's one way to find a 95% “plus-four” confidence interval for the unknown proportion, p, of leaky gas tanks, based on an SRS that shows 13 leaky tanks out of 87 tested.
> n = 87
> X = 13
> (n.tilde = n + 4)
[1] 91
> (p.tilde = (X + 2)/n.tilde)
[1] 0.1648
> alpha = 1 - 0.95
> (z = -qnorm(alpha/2))
[1] 1.96
> (error.margin = z * sqrt(p.tilde * (1 - p.tilde)/n.tilde))
[1] 0.07623
> low = p.tilde - error.margin
> high = p.tilde + error.margin
> (interval = c(low, high))
[1] 0.0886 0.2411
(This is the answer we got in the 5.3 notes.)
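The plus-four recipe (add 4 observations, 2 of them successes, then compute the usual interval) can likewise be wrapped in a function. plus.four.interval is a hypothetical name used only for this sketch; it takes the count X of successes and the sample size n.

```r
# Hypothetical helper (not built into R): plus-four confidence interval for p.
plus.four.interval = function(X, n, conf.level = 0.95) {
  n.tilde = n + 4                     # add 4 observations...
  p.tilde = (X + 2)/n.tilde           # ...2 of which are successes
  z = qnorm(1 - (1 - conf.level)/2)
  error.margin = z * sqrt(p.tilde * (1 - p.tilde)/n.tilde)
  c(p.tilde - error.margin, p.tilde + error.margin)
}

plus.four.interval(13, 87)   # leaky gas tanks: about 0.0886 to 0.2411
```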
e.g. Here's one way to find the 90% Student's t confidence interval we made for nitrogen in ancient air.
> nitrogen = c(63.4, 65, 64.4, 63.3, 54.8, 64.5, 60.8, 49.1, 51)
> (n = length(nitrogen))
[1] 9
> (x.bar = mean(nitrogen))
[1] 59.59
> (s = sd(nitrogen))
[1] 6.255
> alpha = 1 - 0.9
> (t = -qt(alpha/2, n - 1))
[1] 1.86
> (error.margin = t * s/sqrt(n))
[1] 3.877
> low = x.bar - error.margin
> high = x.bar + error.margin
> (interval = c(low, high))
[1] 55.71 63.47
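As with the z interval, these t steps can be collected into a reusable function. t.interval is a hypothetical name used only for this sketch; the only change from a z version is that the critical value comes from qt() with n - 1 degrees of freedom instead of from qnorm().

```r
# Hypothetical helper (not built into R): Student's t confidence interval
# for a mean.
t.interval = function(x, conf.level = 0.95) {
  n = length(x)
  alpha = 1 - conf.level
  t.star = -qt(alpha/2, n - 1)     # critical value with n - 1 df
  error.margin = t.star * sd(x)/sqrt(n)
  c(mean(x) - error.margin, mean(x) + error.margin)
}

nitrogen = c(63.4, 65, 64.4, 63.3, 54.8, 64.5, 60.8, 49.1, 51)
t.interval(nitrogen, conf.level = 0.9)   # about 55.71 to 63.47
```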
The function qt(), used above, is new. Recall (from section 4.3)
that we used the four functions dnorm(), pnorm(), qnorm(), and
rnorm(). The “norm” suffix refers to the normal distribution. The
d, p, q, and r prefixes refer to “density,” “probability
(cumulative),” “quantile,” and “random.” R has these four d, p, q,
and r functions for each distribution we'll encounter.
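A quick way to see how the four prefixes relate is to try them on the standard normal (R's defaults are mean 0 and sd 1). In particular, qnorm() undoes pnorm(), and by the symmetry of the normal curve -qnorm(alpha/2), used earlier, equals qnorm(1 - alpha/2).

```r
# d: density (height of the curve); at 0 it is 1/sqrt(2*pi)
dnorm(0)                      # 0.3989423
# p: cumulative probability (left-tail area); half the area lies left of 0
pnorm(0)                      # 0.5
# q: quantile, the inverse of p
qnorm(pnorm(1.5))             # 1.5
# symmetry: minus the z cutting off 2.5% on the left
# equals the z cutting off 2.5% on the right
-qnorm(0.025)                 # 1.959964
qnorm(0.975)                  # 1.959964
# r: random draws; three values from N(0, 1)
rnorm(3)
```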
In particular, for the Student's t distribution with df degrees of
freedom (here df = n - 1):
pt(t, df) gives the cumulative probability (left-tail area) up to t
under the \( t_{df} \) distribution.
qt(p, df) gives the t corresponding to probability p, i.e. the t
cutting off left-tail area p.
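For example, with df = 8 (the nine nitrogen measurements), the checks below confirm that pt() and qt() undo each other and reproduce the critical value 1.86 used above. The last line also illustrates that the t critical value shrinks toward the normal value 1.96 as the degrees of freedom grow.

```r
df = 8
qt(0.95, df)            # about 1.86: left-tail area 0.95, so 0.05 on the right
pt(qt(0.05, df), df)    # 0.05: pt() undoes qt()
-qt(0.05, df)           # same as qt(0.95, df), by symmetry of the t curve
# More degrees of freedom -> critical values approach qnorm(0.975), about 1.96
qt(0.975, c(5, 30, 1000))
```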
Another way to find a Student's t confidence interval for data in
vector x is the function call t.test(x, conf.level=.95). Change the
.95 to the required confidence level. Much of the output won't make
sense until we've studied chapter 6, but, for now, notice that the
confidence interval is in there: e.g.
> t.test(nitrogen, conf.level = 0.9)
One Sample t-test
data: nitrogen
t = 28.58, df = 8, p-value = 2.43e-09
alternative hypothesis: true mean is not equal to 0
90 percent confidence interval:
55.71 63.47
sample estimates:
mean of x
59.59
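Rather than reading the interval off the printed output, you can pull it out of the object that t.test() returns: its conf.int component holds the interval and its estimate component holds the sample mean.

```r
nitrogen = c(63.4, 65, 64.4, 63.3, 54.8, 64.5, 60.8, 49.1, 51)
result = t.test(nitrogen, conf.level = 0.9)
result$conf.int              # the 90% interval, with a conf.level attribute
result$estimate              # the sample mean, about 59.59
as.numeric(result$conf.int)  # just the two endpoints, attributes dropped
```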