Take-home Final Exam

You should work entirely independently on this final exam. For questions that ask you to write an S-PLUS function, include the function itself; for questions that ask you to compute something, tell me the S-PLUS expression you used. You may put your solutions in a file and e-mail them to me, or write them on paper and turn them in. I would like to receive this take-home portion of the final exam by noon on Wednesday, April 7, at the latest.

You may ask me clarifying questions, but you should not expect the extensive assistance you might get on a homework problem.

The Problems

  1. The file ~larget/496/cereal contains nutritional information on several brands of cold cereal. Read this data into S-PLUS and answer the following questions:
    (a) Find the median sodium content.
    (b) Among cereals with more than the median sodium content, find the mean number of calories.
    (c) For each shelf position, find the mean amount of sugar.
  2. Write a function called std.dev which calculates the sample standard deviation of an array x according to this algorithm:
    1. Find the mean of five randomly chosen elements of x.
    2. Subtract the mean from each element of x.
    3. Find the sample standard deviation of these shifted values by the desk calculator algorithm.
    This function should not use the S-PLUS function var. Use your function to find the standard deviation of x <- 2^(0:31).
  3. Write a function which generates a sequence of uniform pseudo-random numbers by a linear congruential generator,
    $x_{n+1} = (a x_n + c) \bmod m$,  $u_n = x_n / m$,
    where $a = 71071$, $c = 0$, $m = 2^{31}$, and the initial seed is $x_0 = 1$.

    Write another function that tests the uniformity of this generator. It should produce a sequence of 10,000 pseudo-random numbers, count how many of them fall in each of 512 equal-sized bins, apply the chi-square goodness-of-fit test, and return the p-value. (The p-value is the area to the right of the test statistic under a chi-square distribution with 511 degrees of freedom.)

  4. For the cereal data, fit a multiple regression model of calories as a function of protein, fat, and carbohydrates (the variable carbo in the data set). Report a summary of the fit, including the estimated regression coefficients and their standard errors. On a single page, plot two diagnostic graphs: the residuals versus the fitted values, and the absolute values of the residuals versus the fitted values. (See the on-line notes for more information.)
  5. The 5% trimmed mean of a set of data is the mean of the middle 90% of the data after the lowest 5% and the highest 5% are excluded. The file ~larget/496/final5 contains a sample of 100 data points from some distribution. Find the 5% trimmed mean of this sample. (There is a built-in S-PLUS function that computes a trimmed mean. Use on-line help to find it.)

    Write a function that applies the bootstrap to this data, using 200 bootstrap samples, to estimate the standard error of the sample 5% trimmed mean as an estimator of the population 5% trimmed mean.

  6. For extra credit, type
    faces(as.matrix(cereal[1:9,3:10]), labels=row.names(cereal)[1:9])
    and print out the resulting plot.
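For problem 1, here is a minimal Python sketch of the three summaries. It is not S-PLUS and only illustrates the logic. It assumes the sodium, calories, shelf, and sugar columns have already been read into parallel lists, one entry per cereal. The function name and argument layout are hypothetical, and the actual column names in ~larget/496/cereal may differ. Part (b) excludes cereals whose sodium exactly equals the median, which matches "more than the median".

```python
from collections import defaultdict
from statistics import mean, median

def cereal_summaries(sodium, calories, shelf, sugar):
    """Parts (a)-(c) from parallel lists (hypothetical layout: one entry per cereal)."""
    # (a) median sodium content
    med_sodium = median(sodium)
    # (b) mean calories among cereals with sodium strictly above the median
    high = [cal for na, cal in zip(sodium, calories) if na > med_sodium]
    mean_cal_high = mean(high)
    # (c) mean sugar within each shelf position
    by_shelf = defaultdict(list)
    for s, sg in zip(shelf, sugar):
        by_shelf[s].append(sg)
    sugar_by_shelf = {s: mean(v) for s, v in sorted(by_shelf.items())}
    return med_sodium, mean_cal_high, sugar_by_shelf
```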
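For problem 2, here is a Python sketch of the three steps. It is not S-PLUS, and the name std_dev and the optional rng argument are hypothetical. The five elements are drawn without replacement, which is an assumption, since the problem does not say. Subtracting a rough center first leaves the sample variance unchanged. It also keeps sum(y^2) and sum(y)^2 / n from being huge, nearly equal numbers, and that near-cancellation is where the one-pass desk calculator formula loses accuracy.

```python
import math
import random

def std_dev(x, rng=None):
    """Sample SD via shift-then-desk-calculator, following the steps in problem 2."""
    if rng is None:
        rng = random.Random()
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    # Step 1: mean of five randomly chosen elements (without replacement: an assumption).
    k = min(5, n)
    shift = sum(rng.sample(list(x), k)) / k
    # Step 2: subtract that value from every element.
    y = [xi - shift for xi in x]
    # Step 3: desk calculator formula s^2 = (sum(y^2) - sum(y)^2 / n) / (n - 1).
    total = sum(y)
    total_sq = sum(yi * yi for yi in y)
    var = (total_sq - total * total / n) / (n - 1)
    # Round-off can push a near-zero variance slightly negative; clamp before sqrt.
    return math.sqrt(max(var, 0.0))
```

For the data in the problem, call `std_dev([2.0 ** k for k in range(32)])`.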
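For the first part of problem 3, here is a minimal Python sketch of the generator with the stated constants. The function name lcg is hypothetical. It returns u_1, ..., u_n and does not include the seed x_0 itself.

```python
def lcg(n, a=71071, c=0, m=2**31, seed=1):
    """Return u_1..u_n from x_{k+1} = (a*x_k + c) mod m, u_k = x_k / m."""
    x = seed
    out = []
    for _ in range(n):
        # Python ints have arbitrary precision, so a * x never overflows here.
        x = (a * x + c) % m
        out.append(x / m)
    return out
```

In a double-precision implementation the product a x_n stays below 71071 × 2^31 ≈ 1.5 × 10^14, which is under 2^53, so it is still computed exactly.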
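For the second part of problem 3, here is a Python sketch of the uniformity check. S-PLUS has its own chi-square distribution functions. So that the sketch needs no statistics library, it computes the upper-tail area directly from the identity P(chi-square with df degrees of freedom > s) = Q(df/2, s/2), where Q is the regularized upper incomplete gamma function. Q is evaluated with the standard power series when x < a + 1 and with a continued fraction (modified Lentz method) otherwise. The names lcg, chi2_sf, and uniformity_pvalue are hypothetical.

```python
import math

EPS = 1e-15
TINY = 1e-300
MAX_ITER = 100_000

def lcg(n, a=71071, c=0, m=2**31, seed=1):
    """u_1..u_n from x_{k+1} = (a*x_k + c) mod m, u_k = x_k / m."""
    x, out = seed, []
    for _ in range(n):
        x = (a * x + c) % m
        out.append(x / m)
    return out

def _lower_gamma_series(a, x):
    """Regularized lower incomplete gamma P(a, x) by its power series (for x < a + 1)."""
    ap, term = a, 1.0 / a
    total = term
    for _ in range(MAX_ITER):
        ap += 1.0
        term *= x / ap
        total += term
        if abs(term) < abs(total) * EPS:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))

def _upper_gamma_cf(a, x):
    """Regularized upper incomplete gamma Q(a, x) by continued fraction (for x >= a + 1)."""
    b = x + 1.0 - a
    c = 1.0 / TINY
    d = 1.0 / b
    h = d
    for i in range(1, MAX_ITER):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < TINY:
            d = TINY
        c = b + an / c
        if abs(c) < TINY:
            c = TINY
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < EPS:
            break
    return h * math.exp(-x + a * math.log(x) - math.lgamma(a))

def chi2_sf(stat, df):
    """Area to the right of stat under a chi-square density with df degrees of freedom."""
    if stat <= 0:
        return 1.0
    a, x = df / 2.0, stat / 2.0
    if x < a + 1.0:
        return 1.0 - _lower_gamma_series(a, x)
    return _upper_gamma_cf(a, x)

def uniformity_pvalue(n=10_000, bins=512):
    """Chi-square goodness-of-fit p-value for n LCG draws counted in equal-sized bins."""
    counts = [0] * bins
    for u in lcg(n):
        counts[int(u * bins)] += 1  # u lies in [0, 1), so the index is 0..bins-1
    expected = n / bins
    stat = sum((o - expected) ** 2 / expected for o in counts)
    return chi2_sf(stat, bins - 1)
```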
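For problem 4, here is a Python sketch of what a regression summary contains. The coefficients come from the normal equations (X'X) b = X'y. The standard errors are the square roots of the diagonal of sigma^2 (X'X)^{-1}, with sigma^2 = RSS / (n - p). The sketch also returns the fitted values and residuals that the two diagnostic plots need. Solving the normal equations directly is simpler but numerically less stable than the QR decomposition that regression software commonly uses. The names ols and invert are hypothetical, and the predictor columns are assumed to be plain lists.

```python
import math

def invert(a):
    """Gauss-Jordan inverse with partial pivoting (adequate for a small, well-conditioned X'X)."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("X'X is singular or nearly so")
        m[col], m[piv] = m[piv], m[col]
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]

def ols(x_cols, y):
    """Fit y on an intercept plus the predictor columns; return (beta, se, fitted, resid)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in x_cols] for i in range(n)]
    p = len(X[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)] for i in range(p)]
    xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(p)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    fitted = [sum(X[r][j] * beta[j] for j in range(p)) for r in range(n)]
    resid = [y[r] - fitted[r] for r in range(n)]
    # Residual variance with n - p degrees of freedom, then SE_i = sqrt(sigma^2 * inv[i][i])
    sigma2 = sum(e * e for e in resid) / (n - p)
    se = [math.sqrt(sigma2 * inv[i][i]) for i in range(p)]
    return beta, se, fitted, resid
```

Given `beta, se, fitted, resid = ols([protein, fat, carbo], calories)`, the two diagnostic plots are `resid` against `fitted` and `[abs(e) for e in resid]` against `fitted`.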
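For problem 5, here is a Python sketch of the trimmed mean and the bootstrap standard error. It is not the built-in S-PLUS function the problem refers to. It assumes floor(n * 0.05) points are dropped from each end, which is exactly 5 points for the 100-point sample. The names trimmed_mean and bootstrap_se are hypothetical. Each bootstrap sample draws n points with replacement, and the standard error is the sample standard deviation of the 200 bootstrap trimmed means.

```python
import random
import statistics

def trimmed_mean(x, trim=0.05):
    """Mean after dropping floor(n * trim) observations from each end (assumed convention)."""
    xs = sorted(x)
    k = int(len(xs) * trim)
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)

def bootstrap_se(x, stat=trimmed_mean, b=200, rng=None):
    """Bootstrap standard error of stat(x) from b resamples drawn with replacement."""
    if rng is None:
        rng = random.Random()
    n = len(x)
    reps = [stat(rng.choices(x, k=n)) for _ in range(b)]
    return statistics.stdev(reps)
```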

Last modified: April 1, 1997

Bret Larget, larget@mathcs.duq.edu