Here are some exercises to help sharpen your R skills. I'll be adding more problems later. Feel free to discuss your solutions with me (and amongst yourselves), either in class or through e-mail.
Basics
- Figure out a way to insert a value at a given position in a vector. Implement this using a function. For example, the function call might look like this:

      insert(x, where, what)
      ## x: initial vector
      ## where: which position to insert in
      ## what: what to insert

  A possible test case:

      > x <- 1:10
      > y <- insert(x, 5, 0)
      > y
       [1]  1  2  3  4  0  5  6  7  8  9 10

  How would you extend this idea to insert rows in a data frame?
- How would you check if two numeric vectors (possibly containing NAs) are the same? The logical comparison == is not enough, because comparing anything with an NA will produce an NA. (Hint: use is.na.)
- Create a factor with 5 levels, and then change the levels so that two of the existing levels now have the same name. How does the factor change? (Hint: look at the numeric codes of the result.) What happens if you add a level that does not exist?
- Load the juul dataset from the ISwR package and read the
corresponding help page.
- Extract the subset of the data that corresponds to girls between 7 and 14 years of age.
- Plot igf1 vs age for both boys and girls. From a visual inspection, does this relationship seem different for boys and girls?
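For the first two exercises above, one possible approach can be sketched as follows. This is only a sketch, not the intended solution: it assumes that "insert at position where" means the new value ends up at index where, and the name same_vectors is a hypothetical helper introduced here for illustration.

```r
# Insert `what` into `x` so that it ends up at position `where`.
# append() places the new value AFTER the index given in `after`,
# hence the `where - 1`.
insert <- function(x, where, what) {
  append(x, what, after = where - 1)
}

x <- 1:10
y <- insert(x, 5, 0)
y

# Compare two vectors, treating NAs in matching positions as equal.
# (hypothetical helper name; note that is.na() is also TRUE for NaN)
same_vectors <- function(a, b) {
  if (length(a) != length(b)) return(FALSE)
  na_a <- is.na(a)
  na_b <- is.na(b)
  # NAs must sit in the same positions, and all other values must match
  all(na_a == na_b) && all(a[!na_a] == b[!na_b])
}

same_vectors(c(1, NA, 3), c(1, NA, 3))
same_vectors(c(1, NA, 3), c(1, 2, 3))
```

The comparison uses exact equality; for values that come out of floating-point arithmetic, a tolerance-based check such as all.equal may be more appropriate.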
Probability through simulation
For statisticians, one of the common uses of computers is to approximate (using simulation) probabilities that are difficult to compute theoretically. This is not true in the following example, but let us try to use simulation anyway.
The example is taken from your 541 course notes, page 3-13:
Suppose that two balls are randomly drawn, one after the other, from a container holding four red and two green balls. Define the following events:
- A = { the first ball is red }
- B = { the second ball is red }
We wish to find (approximately) the following probabilities:
- P(A)
- P(B)
- P(A and B)
- P(A | B)
- P(B | A)
The idea here is to (virtually) perform this experiment a large number of times, and compute the proportion of cases in which a particular event occurs. If the number of times the experiment has been repeated is large enough, this proportion should closely approximate the probability of the event (this is known, loosely speaking, as the Law of Large Numbers).
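To see this idea in a simpler setting first, the sketch below tracks the running proportion of "successes" for an event whose probability is known. The value 0.3, the seed, and the variable names are assumptions made purely for illustration; they are not part of the exercise.

```r
set.seed(42)                # arbitrary seed, only for reproducibility
p_true <- 0.3               # hypothetical known probability of the event
n_runs <- 100000            # number of repetitions of the experiment

# Each run is TRUE (the event occurred) with probability p_true
outcomes <- runif(n_runs) < p_true

# Proportion of runs in which the event occurred, after 1, 2, ..., n_runs runs
running_prop <- cumsum(outcomes) / seq_len(n_runs)

# The proportion should settle near p_true as the number of runs grows
running_prop[c(10, 100, 1000, 10000, 100000)]
```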
- Figure out how to simulate one run of this experiment in R (hint: use the sample function). The result should be a character vector of length 2, e.g. c("red", "green").
- Given the colors of the two balls, figure out how to detect if the events A and B have occurred.
- Repeat this experiment a large number of times (say 500, but this number should be easy to change). The replicate function can be very useful here.
- Use the results to compute approximate values for the probabilities above. The last two are slightly tricky.
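The steps above could be put together roughly as follows. This is one possible sketch rather than the expected solution; the names urn, draw_two, and n_runs are made up for illustration, and the seed is arbitrary.

```r
set.seed(1)                 # arbitrary seed, only for reproducibility

# The container: four red and two green balls
urn <- rep(c("red", "green"), times = c(4, 2))

# One run of the experiment: draw two balls without replacement
# (sample() does not replace by default)
draw_two <- function() sample(urn, 2)

n_runs <- 500               # easy to change

# replicate() collects the runs into a 2 x n_runs character matrix:
# row 1 holds the first ball, row 2 the second ball
draws <- replicate(n_runs, draw_two())

A <- draws[1, ] == "red"    # first ball is red
B <- draws[2, ] == "red"    # second ball is red

p_A  <- mean(A)
p_B  <- mean(B)
p_AB <- mean(A & B)

# Conditional probabilities: restrict attention to the runs
# in which the conditioning event occurred
p_A_given_B <- mean(A[B])
p_B_given_A <- mean(B[A])

c(p_A, p_B, p_AB, p_A_given_B, p_B_given_A)
```

For comparison, the exact values are P(A) = P(B) = 2/3, P(A and B) = 2/5, and P(A | B) = P(B | A) = 3/5. With only 500 runs the estimates will typically differ from these in the second decimal place.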
Area of a circle
Use the same ideas to approximately calculate the area of a circle of radius 1. Note that this is related to P(X^2 + Y^2 < 1), where X and Y are chosen independently and uniformly at random from the square [-1, 1] x [-1, 1] (which has area 4). Uniform random numbers can be generated by runif.
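A minimal sketch of this calculation is given below. It assumes the estimate is formed as the area of the enclosing square times the proportion of random points that fall inside the circle; the number of points and the seed are arbitrary choices.

```r
set.seed(1)                 # arbitrary seed, only for reproducibility
n_points <- 100000          # number of random points; easy to change

# Points chosen uniformly from the square [-1, 1] x [-1, 1]
x <- runif(n_points, min = -1, max = 1)
y <- runif(n_points, min = -1, max = 1)

# TRUE for points that fall inside the circle of radius 1
inside <- x^2 + y^2 < 1

# The square has area 4, so the circle's area is about 4 * P(inside)
area_estimate <- 4 * mean(inside)
area_estimate
```

The result can be compared with the built-in constant pi, which is the exact area of a circle of radius 1.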
Summer Institute for Training in Biostatistics (SIBS)