R for STAT 224, by Example

Reassurance

R has a reputation for being powerful but difficult to learn. You don't need to learn R for STAT 224! You will need to use R by copying and pasting my code examples (after the “>” prompts, below) and making minor changes to them. Please work through these notes by copying and pasting their R code into RStudio's Console. If you can't reproduce my output, please seek help from your TA (or me). Then return to these notes when you need R for a homework problem. Please let me know of errors.

Getting Started

Chapter 1

Use R to find the sum 3+4. Type it in the Console pane and press Enter:

> 3 + 4
[1] 7

The [1] in the output is a label giving the position of the first value shown on that line. The output is a vector and we're seeing its first and only value, which is 7.

Create a variable by assigning a value to it. Try

> x = 5

To see a variable's value, type its name:

> x
[1] 5

Enter an expression to see its value:

> x + 6
[1] 11
> sqrt(x)
[1] 2.236068

Create a vector via the function c():

> c(10, 20, 30)
[1] 10 20 30

Save it in a variable:

> y = c(2, 3, 4, 7)
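
Once a vector is saved in a variable, you can ask for its length or pick out individual values by position. Here is a minimal sketch using the same y; length() and square brackets are standard R, and the comments show the values I expect:

```r
y = c(2, 3, 4, 7)   # same vector as above

length(y)           # number of values: 4
y[1]                # first value: 2
y[2:3]              # second and third values: 3 4
y + 1               # arithmetic applies to every value: 3 4 5 8
```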

1.1 Sampling

(There's nothing to see here, folks. Move along.)

1.2 Summary Statistics

Get summary statistics via the functions mean(), sd(), median(), and summary():

> mean(y)
[1] 4
> sd(y)  # standard deviation
[1] 2.160247
> median(y)
[1] 3.5
> summary(y)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   2.00    2.75    3.50    4.00    4.75    7.00 
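
To see what mean() and sd() are doing, here is a sketch that computes the same two numbers from the usual formulas (the sample standard deviation divides by n - 1). The names n, y_mean, and y_sd are mine, chosen only for illustration:

```r
y = c(2, 3, 4, 7)
n = length(y)

y_mean = sum(y) / n                          # same value as mean(y): 4
y_sd = sqrt(sum((y - y_mean)^2) / (n - 1))   # same value as sd(y): about 2.16

y_mean
y_sd
```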

1.3 Graphical Summaries

stem()

First put the data from our section 1.3 lecture example in a vector:

> x = c(15.2, 15.4, 16.5, 16.9, 17.5, 17.5, 18.1, 18.9, 19.1, 19.4, 19.5, 19.7, 
+     19.9, 20.3, 20.4, 21, 21.2, 21.6, 21.7, 21.8, 21.8, 22.1, 22.1, 22.5, 22.6, 
+     22.7, 22.7, 22.9, 23, 23.2, 23.3, 23.3, 23.4, 23.4, 23.6, 23.7, 23.8, 24.5, 
+     24.6, 24.7, 24.8, 24.8, 25.8, 26, 26.1, 26.5, 27, 28.4, 28.5, 30.2, 30.4)
> x
 [1] 15.2 15.4 16.5 16.9 17.5 17.5 18.1 18.9 19.1 19.4 19.5 19.7 19.9 20.3
[15] 20.4 21.0 21.2 21.6 21.7 21.8 21.8 22.1 22.1 22.5 22.6 22.7 22.7 22.9
[29] 23.0 23.2 23.3 23.3 23.4 23.4 23.6 23.7 23.8 24.5 24.6 24.7 24.8 24.8
[43] 25.8 26.0 26.1 26.5 27.0 28.4 28.5 30.2 30.4

The + at the start of each continuation line in the code block above is a prompt indicating that the command continues from the previous line. You can paste the entire command, but you'll need to delete the + characters (and the leading >) before pressing Enter.
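
For convenience, here is the same command with the prompts removed, so it should paste into the Console as is. The length() check at the end is my addition, not part of the lecture example:

```r
x = c(15.2, 15.4, 16.5, 16.9, 17.5, 17.5, 18.1, 18.9, 19.1, 19.4, 19.5, 19.7,
      19.9, 20.3, 20.4, 21, 21.2, 21.6, 21.7, 21.8, 21.8, 22.1, 22.1, 22.5, 22.6,
      22.7, 22.7, 22.9, 23, 23.2, 23.3, 23.3, 23.4, 23.4, 23.6, 23.7, 23.8, 24.5,
      24.6, 24.7, 24.8, 24.8, 25.8, 26, 26.1, 26.5, 27, 28.4, 28.5, 30.2, 30.4)

length(x)   # number of data points: 51
```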

The [43] label on the last output line indicates that data starting at index 43 follow (that is, points in positions 43 through 51). Now make the plot:

> stem(x)

  The decimal point is at the |

  14 | 24
  16 | 5955
  18 | 1914579
  20 | 34026788
  22 | 1156779023344678
  24 | 567888
  26 | 0150
  28 | 45
  30 | 24
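
If the default stems look too coarse, stem() accepts an optional scale argument that stretches the plot over more lines. This is only a sketch, assuming you want roughly twice as many lines; the stems R picks depend on the data, so I don't show the output here:

```r
# x repeated from above so that this block runs on its own
x = c(15.2, 15.4, 16.5, 16.9, 17.5, 17.5, 18.1, 18.9, 19.1, 19.4, 19.5, 19.7,
      19.9, 20.3, 20.4, 21, 21.2, 21.6, 21.7, 21.8, 21.8, 22.1, 22.1, 22.5, 22.6,
      22.7, 22.7, 22.9, 23, 23.2, 23.3, 23.3, 23.4, 23.4, 23.6, 23.7, 23.8, 24.5,
      24.6, 24.7, 24.8, 24.8, 25.8, 26, 26.1, 26.5, 27, 28.4, 28.5, 30.2, 30.4)

stem(x, scale = 2)   # scale controls the length of the plot (default is 1)
```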

dotplot()

The dotplot we want comes from an R graphics package called lattice. It is included with most R installations; if yours lacks it, download it (do this only once per R installation):

> install.packages("lattice")

Load the lattice package into the current R session via require() (do this once per session):

> require(lattice)
Loading required package: lattice
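
If you'd rather not think about whether lattice is already installed, the two steps can be combined. A sketch, assuming an internet connection in case the install is needed; requireNamespace() and library() are standard R functions:

```r
# Install lattice only if it is not already available on this computer.
if (!requireNamespace("lattice", quietly = TRUE)) {
    install.packages("lattice")
}

# Load it for the current session. library() stops with an error if loading
# fails, whereas require() only returns FALSE with a warning.
library(lattice)
```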

Now make the dotplot:

> dotplot(x)

[Figure: dotplot of x]

histogram()

> histogram(x, type = "count")

[Figure: histogram of x, counts]

To plot density (bar heights scaled so that the bar areas sum to 1) instead of counts, use type="density":

> histogram(x, type = "density")

[Figure: histogram of x, density]
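
lattice's histogram() also accepts type = "percent", which puts each bin's percentage of the total on the vertical axis. The sketch below draws that version. The breaks argument is optional, and the bin edges I chose (14 to 32 in steps of 2) are just an assumption that happens to cover these data:

```r
library(lattice)

# x repeated from above so that this block runs on its own
x = c(15.2, 15.4, 16.5, 16.9, 17.5, 17.5, 18.1, 18.9, 19.1, 19.4, 19.5, 19.7,
      19.9, 20.3, 20.4, 21, 21.2, 21.6, 21.7, 21.8, 21.8, 22.1, 22.1, 22.5, 22.6,
      22.7, 22.7, 22.9, 23, 23.2, 23.3, 23.3, 23.4, 23.4, 23.6, 23.7, 23.8, 24.5,
      24.6, 24.7, 24.8, 24.8, 25.8, 26, 26.1, 26.5, 27, 28.4, 28.5, 30.2, 30.4)

# Percent of observations per bin, with explicit (illustrative) bin edges.
p = histogram(x, type = "percent", breaks = seq(14, 32, by = 2))
print(p)   # a lattice plot saved in a variable is drawn when printed
```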

boxplot()

> boxplot(x)

[Figure: boxplot of x]
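
If you want the numbers behind the picture, boxplot.stats() reports them without drawing anything. A small sketch; the variable name bs is mine:

```r
# x repeated from above so that this block runs on its own
x = c(15.2, 15.4, 16.5, 16.9, 17.5, 17.5, 18.1, 18.9, 19.1, 19.4, 19.5, 19.7,
      19.9, 20.3, 20.4, 21, 21.2, 21.6, 21.7, 21.8, 21.8, 22.1, 22.1, 22.5, 22.6,
      22.7, 22.7, 22.9, 23, 23.2, 23.3, 23.3, 23.4, 23.4, 23.6, 23.7, 23.8, 24.5,
      24.6, 24.7, 24.8, 24.8, 25.8, 26, 26.1, 26.5, 27, 28.4, 28.5, 30.2, 30.4)

bs = boxplot.stats(x)
bs$stats   # lower whisker end, lower hinge, median, upper hinge, upper whisker end
bs$out     # any points that would be drawn individually as outliers
```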

Adding labels

Labels can be added to any of the preceding graphs (other than the text-only stem plot) via the optional parameters main (main title), xlab (x axis label), and ylab (y axis label). For example,

> histogram(x, type = "count", main = "Average Commute by State", xlab = "Time (minutes)", 
+     ylab = "#States")

[Figure: labeled histogram of x]

Printing a graph

After you see a graph in the Plots tab of the “Files, Plots, Packages, Help” pane, you can use the Plots tab's “Export” menu to choose “Save Plot as PDF …” and then print the “.pdf” file.
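
A graph can also be saved to a PDF file from code, which may help if the Export menu isn't handy. This is a sketch using R's pdf() device. The file name commute-histogram.pdf is hypothetical, and I use tempdir() only so the example doesn't clutter your working folder:

```r
library(lattice)

# x repeated from above so that this block runs on its own
x = c(15.2, 15.4, 16.5, 16.9, 17.5, 17.5, 18.1, 18.9, 19.1, 19.4, 19.5, 19.7,
      19.9, 20.3, 20.4, 21, 21.2, 21.6, 21.7, 21.8, 21.8, 22.1, 22.1, 22.5, 22.6,
      22.7, 22.7, 22.9, 23, 23.2, 23.3, 23.3, 23.4, 23.4, 23.6, 23.7, 23.8, 24.5,
      24.6, 24.7, 24.8, 24.8, 25.8, 26, 26.1, 26.5, 27, 28.4, 28.5, 30.2, 30.4)

out_file = file.path(tempdir(), "commute-histogram.pdf")   # hypothetical file name

pdf(out_file)                         # open a PDF file as the plotting device
print(histogram(x, type = "count"))   # print() draws the lattice plot even inside a script
dev.off()                             # close the file so it can be opened and printed

out_file                              # shows where the file was written
```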

densityplot()

A density plot isn't covered in our book or lecture, but it's often more helpful than a histogram:

> densityplot(x)

[Figure: density plot of x]
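
densityplot() takes the same label parameters as the other lattice graphs, plus a plot.points option that controls how the individual data points are shown beneath the curve. A sketch, assuming you'd like short tick marks (a "rug") instead of the default points:

```r
library(lattice)

# x repeated from above so that this block runs on its own
x = c(15.2, 15.4, 16.5, 16.9, 17.5, 17.5, 18.1, 18.9, 19.1, 19.4, 19.5, 19.7,
      19.9, 20.3, 20.4, 21, 21.2, 21.6, 21.7, 21.8, 21.8, 22.1, 22.1, 22.5, 22.6,
      22.7, 22.7, 22.9, 23, 23.2, 23.3, 23.3, 23.4, 23.4, 23.6, 23.7, 23.8, 24.5,
      24.6, 24.7, 24.8, 24.8, 25.8, 26, 26.1, 26.5, 27, 28.4, 28.5, 30.2, 30.4)

# Density curve with a rug of tick marks for the data and a labeled x axis.
p = densityplot(x, plot.points = "rug", xlab = "Time (minutes)")
print(p)
```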