Practice problems for R: set 1

Summer Institute for Training in Biostatistics (SIBS)

Here are some exercises to help sharpen your R skills. I'll be adding more problems later. Feel free to discuss your solutions with me (and amongst yourselves), either in class or through e-mail.

Basics

  1. Figure out a way to insert a value at a given position in a vector. Implement this using a function. For example, the function call might look like this:
    insert(x, where, what)
    ## x: initial vector
    ## where: which position to insert in
    ## what: what to insert
    
    A possible test case:
    > x <- 1:10
    > y <- insert(x, 5, 0)
    > y
     [1]  1  2  3  4  0  5  6  7  8  9 10
    
    How would you extend this idea to insert rows in a data frame?
  2. How would you check whether two numeric vectors (possibly containing NAs) are the same? The logical comparison == is not enough, because comparing anything with an NA produces an NA. (Hint: use is.na.)
  3. Create a factor with 5 levels, and then change the levels so that two of the existing levels now have the same name. How does the factor change? (Hint: look at the numeric codes of the result.) What happens if you add a level that does not exist?
  4. Load the juul dataset from the ISwR package and read the corresponding help page.
    • Extract the subset of the data that corresponds to girls between ages 7 and 14 years
    • Plot igf1 vs age for both boys and girls. From a visual inspection, does this relationship seem different for boys and girls?
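
As a starting point for problems 1 and 2, here is one possible sketch, not the only solution. It assumes that "insert at position where" means the new value ends up at index where, as in the test case above. The name same_values is a hypothetical helper introduced here for illustration; insert matches the call suggested in problem 1.

```r
# One possible insert(): place `what` so that it lands at index `where` of `x`.
insert <- function(x, where, what) {
  stopifnot(where >= 1, where <= length(x) + 1)
  # append() inserts *after* a given position, so shift the index by one
  append(x, what, after = where - 1)
}

x <- 1:10
y <- insert(x, 5, 0)
y

# Hypothetical helper: TRUE if `a` and `b` have the same length, have NAs in
# the same positions, and agree on every non-missing value.
same_values <- function(a, b) {
  if (length(a) != length(b)) return(FALSE)
  na_a <- is.na(a)
  na_b <- is.na(b)
  # && short-circuits, so the second comparison only runs when the NA
  # patterns match and both subsets therefore have equal length
  all(na_a == na_b) && all(a[!na_a] == b[!na_b])
}

same_values(c(1, NA, 3), c(1, NA, 3))  # NAs in matching positions
same_values(c(1, NA, 3), c(1, 2, 3))   # NA against a number
```

Note that is.na also returns TRUE for NaN, so this sketch treats NA and NaN as interchangeable; whether that is what you want depends on your data.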

Probability through simulation

For statisticians, one common use of computers is to approximate, by simulation, probabilities that are difficult to compute theoretically. The probabilities in the following example are not difficult to compute exactly, but let us try to use simulation anyway.

The example is taken from your 541 course notes, page 3-13:

Suppose that two balls are randomly drawn, one after the other, from a container holding four red and two green balls. Define the following events:

We wish to find (approximately) the following probabilities:

The idea here is to (virtually) perform this experiment a large number of times, and compute the proportion of cases in which a particular event occurs. If the number of times the experiment has been repeated is large enough, this proportion should closely approximate the probability of the event (this is known, loosely speaking, as the Law of Large Numbers).

  1. Figure out how to simulate one run of this experiment in R (hint: use the sample function). The result should be a character vector of length 2, e.g. c("red", "green").
  2. Given the colors of the two balls, figure out how to detect if the events A and B have occurred.
  3. Repeat this experiment a large number of times (say 500, but this number should be easy to change). The replicate function can be very useful here.
  4. Use the results to compute approximate values for the probabilities above. The last two are slightly tricky.
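
The four steps might be sketched as follows. The definitions of events A and B are not reproduced in this text, so the sketch ASSUMES, purely for illustration, that A is "the first ball is red" and B is "the second ball is red"; substitute the definitions from the course notes. The names balls, draw_two, n_runs and draws are likewise hypothetical, and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, only so that the run is reproducible

# The container: four red and two green balls
balls <- rep(c("red", "green"), times = c(4, 2))

# Step 1: one run of the experiment, two balls drawn without replacement
draw_two <- function() sample(balls, 2)

# Step 3: repeat the experiment; replicate() collects the runs into a
# character matrix with 2 rows and n_runs columns
n_runs <- 500
draws <- replicate(n_runs, draw_two())

# Step 2: detect the events in every run.
# ASSUMED definitions (not taken from the course notes):
A <- draws[1, ] == "red"   # first ball is red
B <- draws[2, ] == "red"   # second ball is red

# Step 4: proportions approximate probabilities
p_A <- mean(A)
p_B <- mean(B)
p_A_and_B <- mean(A & B)
# A conditional probability uses only the runs in which A occurred
p_B_given_A <- mean(B[A])

c(p_A, p_B, p_A_and_B, p_B_given_A)
```

Under these assumed definitions the exact values are 2/3, 2/3, 2/5 and 3/5, which gives something to compare the simulated proportions against. Increasing n_runs should bring the estimates closer.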

Area of a circle

Use the same ideas to approximate the area of a circle of radius 1. Note that this is related to P(X^2 + Y^2 < 1), where X and Y are chosen independently and uniformly from the square [-1, 1] x [-1, 1], which has area 4. Uniform random numbers can be generated by runif.
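
A minimal sketch of this idea, assuming 10000 random points (the number, the seed and the variable names are arbitrary choices made here): the proportion of points falling inside the circle estimates the probability, and multiplying by the area of the square converts it into an area.

```r
set.seed(1)  # arbitrary seed, only so that the run is reproducible

n_points <- 10000

# Independent uniform coordinates in the square [-1, 1] x [-1, 1]
x <- runif(n_points, min = -1, max = 1)
y <- runif(n_points, min = -1, max = 1)

# TRUE for points that fall inside the circle of radius 1
inside <- x^2 + y^2 < 1

# The square has area 4, so area of circle = 4 * P(X^2 + Y^2 < 1)
area_estimate <- 4 * mean(inside)
area_estimate
```

The result can be compared with the built-in constant pi.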

deepayan@stat.wisc.edu
Last modified: Tue Jun 21 09:24:07 CDT 2005