Chapter 5

5.1 Point Estimation

(There's nothing to see here, folks. Move along.)

5.2 Large-Sample Confidence Intervals for a Population Mean

Since I don't have data handy, I'll make up some men's heights by getting a random sample of size 30 from a normal distribution with mean 70 and standard deviation 3 (inches). (Recall from the chapter 4 R notes that we can generate \( n \) random numbers from N(\( \mu \), \( \sigma^2 \)) via rnorm(n, mu, sigma).)

> mens.heights = rnorm(30, 70, 3)
> mens.heights

 [1] 69.86 72.09 70.63 66.22 71.48 74.19 64.97 69.50 73.04 74.41 75.36
[12] 71.10 66.18 69.06 67.80 74.88 69.53 72.20 66.83 74.56 67.63 67.57
[23] 72.37 71.03 66.41 75.38 73.29 70.59 70.07 71.79
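Because rnorm() produces new random numbers on every call, running the line above yourself will give heights different from the ones printed here, and so slightly different results below. If you want repeatable results, one option (a minimal sketch added to these notes; the seed value 1 is an arbitrary choice) is to call set.seed() before rnorm():

```r
# Fix the random number generator's starting state so the draws repeat.
# The seed value 1 is arbitrary; any integer works.
set.seed(1)
heights.a = rnorm(30, 70, 3)

# Resetting the same seed reproduces exactly the same sample.
set.seed(1)
heights.b = rnorm(30, 70, 3)

identical(heights.a, heights.b)  # TRUE
```

Your sample still won't match the one printed above, since we don't know what seed (if any) was used to generate it.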

Here's one way to find a 95% confidence interval for the unknown mean \( \mu \). (Well, we know \( \mu \) is 70 since I just generated the data! But suppose all we have is the sample, so we don't know \( \mu \).)

> # The parentheses around an expression, as in the next line of code, force
> # R to print its value. This saves me from adding a display line
> # consisting only of 'n'.
> (n = length(mens.heights))

[1] 30

> (x.bar = mean(mens.heights))

[1] 70.67

> (s = sd(mens.heights))

[1] 3.018

> # Note that there aren't parentheses around the next line of code, so the
> # value of alpha isn't displayed. I didn't display it because it's easy to
> # figure out that alpha is .05.
> alpha = 1 - 0.95
> (z = -qnorm(alpha/2))

[1] 1.96

> (error.margin = z * s/sqrt(n))

[1] 1.08

> low = x.bar - error.margin
> high = x.bar + error.margin
> (interval = c(low, high))

[1] 69.59 71.75
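The steps above can be collected into a small function so the whole calculation is one call. This is a sketch of our own, not a built-in R function; the name z.interval and its arguments are made up for illustration. It computes \( \bar{x} \pm z \, s/\sqrt{n} \) exactly as above.

```r
# Hypothetical helper (not part of base R): large-sample z confidence
# interval for a population mean, x.bar +/- z * s/sqrt(n).
z.interval = function(x, conf.level = 0.95) {
  n = length(x)
  x.bar = mean(x)
  s = sd(x)
  alpha = 1 - conf.level
  z = -qnorm(alpha/2)            # critical value, about 1.96 for 95%
  error.margin = z * s/sqrt(n)
  c(x.bar - error.margin, x.bar + error.margin)
}

# The 30 simulated heights printed above (as displayed, rounded to 2 decimals).
mens.heights = c(69.86, 72.09, 70.63, 66.22, 71.48, 74.19, 64.97, 69.50,
                 73.04, 74.41, 75.36, 71.10, 66.18, 69.06, 67.80, 74.88,
                 69.53, 72.20, 66.83, 74.56, 67.63, 67.57, 72.37, 71.03,
                 66.41, 75.38, 73.29, 70.59, 70.07, 71.79)
z.interval(mens.heights)         # about 69.59 to 71.75, matching the interval above
z.interval(mens.heights, 0.99)   # a wider 99% interval
```

Because the heights are entered as displayed (rounded), the endpoints may differ from the transcript in the third decimal place.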

5.3 Confidence Intervals for Proportions

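e.g. Here's one way to find a 95% “plus-four” confidence interval for the unknown proportion, p, of leaky gas tanks, based on an SRS that shows 13 leaky tanks out of 87 tested. The plus-four method adds two successes and two failures to the sample, so \( \tilde{n} = n + 4 \) and \( \tilde{p} = (X + 2)/\tilde{n} \).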

> n = 87
> X = 13
> (n.tilde = n + 4)

[1] 91

> (p.tilde = (X + 2)/n.tilde)

[1] 0.1648

> alpha = 1 - 0.95
> (z = -qnorm(alpha/2))

[1] 1.96

> (error.margin = z * sqrt(p.tilde * (1 - p.tilde)/n.tilde))

[1] 0.07623

> low = p.tilde - error.margin
> high = p.tilde + error.margin
> (interval = c(low, high))

[1] 0.0886 0.2411

(This is the answer we got in the 5.3 notes.)
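As with the mean, the plus-four steps can be wrapped in a function. This is a hypothetical helper written for these notes (the name plus.four.interval is made up; it is not part of base R), and it simply repeats the calculation above.

```r
# Hypothetical helper (not part of base R): "plus-four" confidence interval
# for a proportion, given X successes in n trials.
plus.four.interval = function(X, n, conf.level = 0.95) {
  n.tilde = n + 4                 # add four observations...
  p.tilde = (X + 2)/n.tilde       # ...two successes and two failures
  alpha = 1 - conf.level
  z = -qnorm(alpha/2)
  error.margin = z * sqrt(p.tilde * (1 - p.tilde)/n.tilde)
  c(p.tilde - error.margin, p.tilde + error.margin)
}

# The leaky gas tank example: 13 leaky tanks out of 87 tested.
plus.four.interval(13, 87)   # about 0.0886 to 0.2411, as above
```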

5.4 Small-Sample Confidence Intervals for a Population Mean

e.g. Here's one way to find the 90% interval we made for nitrogen in ancient air.

> nitrogen = c(63.4, 65, 64.4, 63.3, 54.8, 64.5, 60.8, 49.1, 51)
> (n = length(nitrogen))

[1] 9

> (x.bar = mean(nitrogen))

[1] 59.59

> (s = sd(nitrogen))

[1] 6.255

> alpha = 1 - 0.9
> (t = -qt(alpha/2, n - 1))

[1] 1.86

> (error.margin = t * s/sqrt(n))

[1] 3.877

> low = x.bar - error.margin
> high = x.bar + error.margin
> (interval = c(low, high))

[1] 55.71 63.47
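Here is the same calculation as a function, a sketch of our own (t.interval is a made-up name, not part of base R). It computes \( \bar{x} \pm t \, s/\sqrt{n} \) with n - 1 degrees of freedom, which should agree with the interval reported by R's built-in t.test().

```r
# Hypothetical helper (not part of base R): small-sample Student's t
# confidence interval for a mean, x.bar +/- t * s/sqrt(n) with n - 1 df.
t.interval = function(x, conf.level = 0.95) {
  n = length(x)
  alpha = 1 - conf.level
  t.star = -qt(alpha/2, n - 1)    # critical value with n - 1 degrees of freedom
  error.margin = t.star * sd(x)/sqrt(n)
  c(mean(x) - error.margin, mean(x) + error.margin)
}

nitrogen = c(63.4, 65, 64.4, 63.3, 54.8, 64.5, 60.8, 49.1, 51)
t.interval(nitrogen, conf.level = 0.9)   # about 55.71 to 63.47
```

The critical value is named t.star rather than t only to avoid reusing the name of R's transpose function t(); R would cope either way.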

The function qt(), used above, is new. Recall (from section 4.3) that we used the four functions dnorm(), pnorm(), qnorm(), and rnorm(). The “norm” suffix refers to the normal distribution. The d, p, q, and r prefixes refer to “density,” “probability (cumulative),” “quantile,” and “random.” R has these four d, p, q, and r functions for each distribution we'll encounter. In particular, for the Student's t distribution with df degrees of freedom,

pt(t, df) gives the cumulative probability (left tail area) up to t from the \( t_{df} \) distribution (where df is the “degrees of freedom,” which is n - 1 for the one-sample intervals in this section)
qt(p, df) gives the t corresponding to probability p (i.e. the t cutting off left tail area p)
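A short illustration of how qt() and pt() relate, using the 8 degrees of freedom from the nitrogen example. The degrees of freedom 5, 30, and 1000 in the last lines are arbitrary choices for illustration.

```r
# qt() and pt() are inverses: qt() turns a left-tail area into a t value,
# and pt() turns that t value back into the left-tail area.
t.cut = qt(0.05, 8)    # t cutting off left-tail area 0.05 with 8 df (about -1.86)
pt(t.cut, 8)           # 0.05

# By symmetry of the t distribution, -qt(alpha/2, df) equals qt(1 - alpha/2, df),
# which is why the code above negates qt() to get a positive critical value.
-qt(0.05, 8)
qt(0.95, 8)

# As df grows, t critical values shrink toward the normal one (about 1.96).
qt(0.975, c(5, 30, 1000))
qnorm(0.975)
```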

Another way to find a Student's t confidence interval for data in a vector x is the function call t.test(x, conf.level = .95); change the .95 to the required confidence level. Much of the output won't make sense until we've studied chapter 6, but for now, notice that the confidence interval appears in it, e.g.:

> t.test(nitrogen, conf.level = 0.9)


    One Sample t-test

data:  nitrogen
t = 28.58, df = 8, p-value = 2.43e-09
alternative hypothesis: true mean is not equal to 0
90 percent confidence interval:
 55.71 63.47
sample estimates:
mean of x 
    59.59
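If you only want the interval, you can save the result of t.test() and pull out its conf.int component rather than reading it off the printed output. This is a minimal sketch; conf.int and estimate are standard components of the object t.test() returns.

```r
nitrogen = c(63.4, 65, 64.4, 63.3, 54.8, 64.5, 60.8, 49.1, 51)

# Save the result instead of just printing it.
result = t.test(nitrogen, conf.level = 0.9)

# The interval is stored in the conf.int component (with a conf.level attribute).
result$conf.int

# The sample mean is stored in the estimate component.
result$estimate

# as.numeric() drops the attribute, leaving just the two endpoints.
as.numeric(result$conf.int)
```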