R has a reputation for being powerful but difficult to learn. You don't need to learn R for STAT 224! You will need to use R by copying and pasting my code examples (after the “>” prompts, below) and making minor changes to them. Please work through these notes by copying and pasting their R code into RStudio's Console. If you can't reproduce my output, please seek help from your TA (or me). Then return to these notes when you need R for a homework problem. Please let me know of errors.
Use R to find the sum 3+4. Type it in the Console pane and press Enter:
> 3 + 4
[1] 7
The [1] in the output is a label. The output is a vector and we're seeing its first and only value, which is 7.
Create a variable by assigning a value to it. Try
> x = 5
To see a variable's value, type its name:
> x
[1] 5
Enter an expression to see its value:
> x + 6
[1] 11
> sqrt(x)
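[1] 2.236
(With R's default settings you will see more digits, such as 2.236068. The output in these notes appears to be rounded to four significant digits.)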
Create a vector via the function c():
> c(10, 20, 30)
[1] 10 20 30
Save it in a variable:
> y = c(2, 3, 4, 7)
(An assignment prints no output, so there's nothing to see here, folks. Move along.)
Get summary statistics via the functions mean(), sd(), median(), and summary():
> mean(y)
[1] 4
> sd(y) # standard deviation
[1] 2.16
> median(y)
[1] 3.5
> summary(y)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   2.00    2.75    3.50    4.00    4.75    7.00
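As a rough check on what mean() and sd() compute, the sketch below rebuilds both from their formulas for the same vector y. The names n, y_bar, and s are chosen only for illustration. Note that sd() divides by n - 1, not n.

```r
# Reproduce mean() and sd() by hand for the vector used above.
y = c(2, 3, 4, 7)

n = length(y)                           # number of values
y_bar = sum(y) / n                      # sample mean
s = sqrt(sum((y - y_bar)^2) / (n - 1))  # sample standard deviation (divides by n - 1)

# Both comparisons should print TRUE.
all.equal(y_bar, mean(y))
all.equal(s, sd(y))
```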
stem()
First put the data from our section 1.3 lecture example in a vector:
> x = c(15.2, 15.4, 16.5, 16.9, 17.5, 17.5, 18.1, 18.9, 19.1, 19.4, 19.5, 19.7,
+ 19.9, 20.3, 20.4, 21, 21.2, 21.6, 21.7, 21.8, 21.8, 22.1, 22.1, 22.5, 22.6,
+ 22.7, 22.7, 22.9, 23, 23.2, 23.3, 23.3, 23.4, 23.4, 23.6, 23.7, 23.8, 24.5,
+ 24.6, 24.7, 24.8, 24.8, 25.8, 26, 26.1, 26.5, 27, 28.4, 28.5, 30.2, 30.4)
> x
[1] 15.2 15.4 16.5 16.9 17.5 17.5 18.1 18.9 19.1 19.4 19.5 19.7 19.9 20.3
[15] 20.4 21.0 21.2 21.6 21.7 21.8 21.8 22.1 22.1 22.5 22.6 22.7 22.7 22.9
[29] 23.0 23.2 23.3 23.3 23.4 23.4 23.6 23.7 23.8 24.5 24.6 24.7 24.8 24.8
[43] 25.8 26.0 26.1 26.5 27.0 28.4 28.5 30.2 30.4
The + in the code block above that produced this vector is a prompt indicating that a command was continued onto another line. You can paste the entire command, but you'll need to delete the + characters before pressing Enter.
The [43] label on the last output line indicates that data starting at index 43 follow (that is, points in positions 43 through 51). Now make the plot:
> stem(x)
The decimal point is at the |
14 | 24
16 | 5955
18 | 1914579
20 | 34026788
22 | 1156779023344678
24 | 567888
26 | 0150
28 | 45
30 | 24
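In this plot each row covers two units, so the row labeled 14 holds 15.2 and 15.4 (leaves 2 and 4). If the rows feel too crowded, stem() accepts an optional scale argument that stretches the plot over more, narrower rows. The sketch below repeats the data so it can be pasted on its own; scale = 2 is an illustrative choice, not something the lecture example requires.

```r
# Same data as above, repeated so this block can be pasted on its own.
x = c(15.2, 15.4, 16.5, 16.9, 17.5, 17.5, 18.1, 18.9, 19.1, 19.4, 19.5, 19.7,
      19.9, 20.3, 20.4, 21, 21.2, 21.6, 21.7, 21.8, 21.8, 22.1, 22.1, 22.5, 22.6,
      22.7, 22.7, 22.9, 23, 23.2, 23.3, 23.3, 23.4, 23.4, 23.6, 23.7, 23.8, 24.5,
      24.6, 24.7, 24.8, 24.8, 25.8, 26, 26.1, 26.5, 27, 28.4, 28.5, 30.2, 30.4)

# scale controls the plot length; a larger value spreads the data over more rows.
stem(x, scale = 2)
```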
dotplot()
The dotplot we want requires an R graphics package called lattice. Download it first (do this only once per R installation):
> install.packages("lattice")
Load the lattice package into the current R session via require() (do this once per session):
> require(lattice)
Loading required package: lattice
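If you aren't sure whether lattice is already installed, one possible pattern is to install it only when it's missing. This is a sketch, not a required step. requireNamespace() and library() are standard R functions; library() is used here as an alternative to require() that stops with an error if the package can't be loaded.

```r
# Install lattice only if it is not already available, then load it.
if (!requireNamespace("lattice", quietly = TRUE)) {
  install.packages("lattice")
}
library(lattice)  # like require(), but stops with an error if loading fails
```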
Now make the dotplot:
> dotplot(x)
histogram()
> histogram(x, type = "count")
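To plot on a density scale (where the bar areas sum to 1) instead of counts, use type="density". For relative frequencies shown as percentages, use type="percent" instead.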
> histogram(x, type = "density")
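To make the difference between counts, relative frequencies, and densities concrete, this sketch uses base R's hist() with plot = FALSE to compute the numbers without drawing anything. It's an illustration only: hist() may choose different bins than lattice's histogram(), and the vector times holds just the first 12 values of the lecture data. The names times, rel_freq, dens, and bin_width are hypothetical.

```r
# First 12 values of the lecture data, to keep the example short.
times = c(15.2, 15.4, 16.5, 16.9, 17.5, 17.5, 18.1, 18.9, 19.1, 19.4, 19.5, 19.7)

h = hist(times, plot = FALSE)   # compute the bins without drawing a plot

counts = h$counts               # number of values in each bin
rel_freq = counts / sum(counts) # fraction of values in each bin
dens = h$density                # relative frequency divided by bin width
bin_width = diff(h$breaks)      # width of each bin

sum(rel_freq)          # relative frequencies sum to 1
sum(dens * bin_width)  # bar areas on the density scale sum to 1
```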
boxplot()
> boxplot(x)
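To see the numbers behind a box plot, one option is boxplot.stats(). It returns the five values that boxplot() draws (lower whisker end, lower hinge, median, upper hinge, upper whisker end), plus any points flagged as outliers. The sketch below uses the small vector y from earlier so the results are easy to check by eye. The hinges can differ slightly from the quartiles that summary() reports.

```r
# Small example vector from earlier in these notes.
y = c(2, 3, 4, 7)

b = boxplot.stats(y)
b$stats  # lower whisker end, lower hinge, median, upper hinge, upper whisker end
b$out    # values drawn as individual outlier points (none for this vector)
```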
Labels can be added to any of the preceding graphs (but not to the text-based stem plot) via the optional parameters main (main title), xlab (x axis label), and ylab (y axis label). For example,
> histogram(x, type = "count", main = "Average Commute by State", xlab = "Time (minutes)",
+ ylab = "#States")
After you see a graph in the Plots tab of the “Files, Plots, Packages, Help” pane, you can use the Plots tab's “Export” menu to choose “Save Plot as PDF …” and then print the “.pdf” file.
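If you'd rather save a graph from code than from the Export menu, one approach is R's pdf() graphics device. In this sketch the file name commute-histogram.pdf and the short times vector are placeholders. The file is written to R's temporary directory, so change out_file to save it somewhere permanent.

```r
library(lattice)

# Placeholder data: the first 12 values of the lecture example.
times = c(15.2, 15.4, 16.5, 16.9, 17.5, 17.5, 18.1, 18.9, 19.1, 19.4, 19.5, 19.7)

# Hypothetical file name, placed in R's temporary directory.
out_file = file.path(tempdir(), "commute-histogram.pdf")

pdf(out_file, width = 6, height = 4)  # open a PDF file as the plotting device (size in inches)
# print() makes sure a lattice plot is drawn even inside a script or function.
print(histogram(times, type = "count", xlab = "Time (minutes)"))
dev.off()                             # close the device so the file is completely written

file.exists(out_file)                 # TRUE if the file was created
```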
densityplot()
A density plot isn't covered in our book or lecture, but it's often more helpful than a histogram:
> densityplot(x)