R is a powerful software application for interacting with data. It is freely available worldwide. With R, you can create sophisticated graphs, carry out statistical analyses, and create and run simulations. [R is also a programming language with an extensive set of built-in functions so you can write your own code to build your own statistical tools. Advanced users can even incorporate functions written in other languages, such as C, C++, and Fortran.] Further information about R specific to Stat 571 course material can be found in the rewritten R Appendices for the Stat/For/Hort 571 Course Notes.
(a) Download for Windows (advisable only if you have a fast internet connection, such as a direct campus connection, cable modem, or DSL, since the file is about 20 megabytes): Go to the CRAN homepage at http://cran.us.r-project.org. Click on the link Windows (95 and later), then click on the link base, and finally click on Setup program, which is named something like rw1071.exe. This begins the download. After the download is complete, double-click on the downloaded file and follow the installation instructions. [Mac OS X and Linux versions are also available at CRAN. Just follow the appropriate link and read the ReadMe.txt files.]
(b) From prepared CD (available through the Instructors, TAs or CALS Lab): Insert the CD into the drive, open the CD, and change to the folder for your system (windows, macosx or linux). Double-click on the rw1071.exe icon to begin installation (or read the instructions for non-Windows systems). Follow the installation instructions.
(c) The CALS Computer Lab: The CALS Lab is in the basement of the Animal Sciences Building and has its own entrance on the south side of the building. There is always a student consultant on duty. Managers, Peter Crump and Tom Tabone, can be found in Room 148 and Room 152, respectively, between 8am and 4:30pm. Do not hesitate to ask any of these individuals for assistance. Although some of the consultants will not know much about the details of R, all will be able to help you use the machines, print results, etc. The CALS Computer Lab is usually open M-R 8am-10pm, F 8am-5pm, Sat 10am-5pm, Sun 12pm-9pm. [Check holiday schedules at http://www.cals.wisc.edu/calslab.]
We demonstrate a few key R commands using some milk yield data. In this case, you need to enter the data yourself. First, give the data a working name; here, let us choose myield. Then type
> myield = c(44, 55, 37, 32, 37, 26, 23, 41, 34, 19, 30, 39, 46, 44)
The symbol '=' creates an object named 'myield' whose value is the result of evaluating the command on the right hand side. In this case, the command c catenates a comma-separated set of numbers together as a vector. [Warning: if you type a '(' and then do not complete the command by typing a ')', R will continue to wait for the command to be completed and show a string of '+' prompts even if you continue to press the 'Enter' key. If you get in trouble, press the 'Esc' key to get back to the prompt.]
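As a small sketch of the assignment step, the lines below build the same vector with the arrow operator '<-', which R also accepts for assignment, and then check what was stored. The name myield2 is hypothetical and is used only so that myield itself is not overwritten.

```r
# Same milk yield values as above; myield2 is a hypothetical name for illustration.
myield2 <- c(44, 55, 37, 32, 37, 26, 23, 41, 34, 19, 30, 39, 46, 44)

length(myield2)   # number of observations in the vector
myield2[1:3]      # the first three values
range(myield2)    # the smallest and largest values
```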
Now you can perform some manipulations. To see the vector of data, just type
> myield
See a stem-leaf display of the data by typing
> stem(myield)
For the mean and standard deviation, respectively, type
> mean(myield)
> sd(myield)
Remember to quit your session when you are done. You can quit from the 'File' pulldown menu or by typing
> q()
in the command window.
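To see what mean and sd compute, the following sketch reproduces both from their defining formulas. It re-enters the myield values so that the lines run on their own; the names n, ybar and s are chosen here for illustration. Note that sd divides by n - 1, not n.

```r
myield <- c(44, 55, 37, 32, 37, 26, 23, 41, 34, 19, 30, 39, 46, 44)

n <- length(myield)
ybar <- sum(myield) / n                        # sample mean by hand
s <- sqrt(sum((myield - ybar)^2) / (n - 1))    # sample standard deviation by hand

all.equal(ybar, mean(myield))   # TRUE
all.equal(s, sd(myield))        # TRUE
```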
> myield = read.table(file.choose())
This allows you to choose the file using a menu. On Windows or MacOS, select your file and click "Open" (on Unix, you may only have TAB completion of folder and file names). Another way is to use the R "File" menu and select "Change dir ..." to change to the directory where your data are stored. Then type (including the quotes around the file name):
> myield = read.table("myield.dat")
Data we provide can be found at http://www.stat.wisc.edu/~st571-1/data. You can save the data locally on your computer (see comments below on Saving Data Files for Use in R) and use one of the above commands to read it as a table. Alternatively, while connected to the Internet you can read the table directly:
> myield = read.table("http://www.stat.wisc.edu/~st571-1/data/myield.dat")
[Again, quotes are important!] Of course, you may want a local copy if you are working at home.
Note: The read.table command reads in a table of values and, when the file has no column names, labels the columns V1, V2, and so on. For the milk yield data, we have only one column, and we can replace the table with a vector:
> myield = myield$V1
Data sets later in the course have multiple columns, which will be used during data analysis.
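The read-then-extract pattern can be tried without any course file. This sketch writes a few values to a temporary file (the temporary path and the names tmp, tab and yield are stand-ins, not part of the course materials), reads the file back with read.table, and pulls out the single column.

```r
# Write a one-column data file to a temporary location (a stand-in for myield.dat).
tmp <- tempfile(fileext = ".dat")
writeLines(c("44", "55", "37", "32"), tmp)

tab <- read.table(tmp)   # a data frame with one column, named V1
str(tab)                 # show the structure of the table
yield <- tab$V1          # extract the column as a plain vector
mean(yield)              # summary commands now work on the vector
```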
If you capture data using a Web browser and simply "Save" it, the file may be changed along the way. For instance, Internet Explorer assumes you want to save as an HTML file. If you just click OK, you will save a file called "myield_dat.htm", which adds lots of junk at the top of the data file (see below). If you instead scroll the "Save as type:" box down to "Text File (*.txt)", you get only what you see. Another way to save data from the Web that usually works is to right-click on the name of the file. If you go ahead and try to read the saved "myield_dat.htm" into R, you get the following message:
> myield=read.table(file.choose())
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
  line 3 did not have 5 elements
Not very informative, eh? Sorry about that. Perhaps there is a lesson here: always check your data using a simple text editor. Here is the start of "myield_dat.htm", which you can see using a simple text editor such as WordPad, NotePad or SimpleText (or Emacs):
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<!-- saved from url=(0054)http://www.stat.wisc.edu/~yandell/st571/data/myield.dat -->
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=windows-1252">
<META content="MSHTML 6.00.2800.1226" name=GENERATOR></HEAD>
<BODY><PRE>
44
55
...
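One way to follow the advice about checking your data, without leaving R, is to look at the first few lines of a file before calling read.table. The sketch below uses readLines on a temporary file that imitates a browser-saved page; the file contents and the names saved, first and looks_like_html are made up for illustration, and the check itself is deliberately crude.

```r
# Imitate a data file that a browser saved as HTML (contents are illustrative).
saved <- tempfile(fileext = ".htm")
writeLines(c("<HTML><HEAD>", "</HEAD>", "<BODY><PRE>", "44", "55"), saved)

first <- readLines(saved, n = 3)   # peek at the first three lines only
print(first)

# Crude check: a line starting with '<' suggests HTML markup rather than data.
looks_like_html <- any(grepl("^[[:space:]]*<", first))
looks_like_html
```

If the check comes back TRUE, save the file again as plain text before reading it as a table.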
> ?hist
provides many more details on how to use hist using all of its available options.
However, you may not know the name of the R function. What can you do? Try typing
> help.search("histogram")
to find out about commands pertinent to the word "histogram" (quotes are important!). After a short pause, you will get something like this (... for lines removed to save space here):
Help files with alias or title matching 'histogram' using fuzzy matching:

hist.POSIXt(base)       Histogram of a Date-Time Object
hist(base)              Histograms
nclass.Sturges(base)    Compute the Number of Classes for a Histogram
plot.histogram(base)    Plot Histograms
n.bins(car)             Number of Bins for Histogram
...

Type 'help(FOO, package = PKG)' to inspect entry 'FOO(PKG) TITLE'.
There is an extensive set of help pages that you can browse. You can access these help pages by typing the command
> help.start()
or by pulling down the Help menu and selecting the HTML version.
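If you remember only part of a function name, a couple of other built-in helpers can narrow things down. This is a brief sketch; the name found is chosen here for illustration, and the exact list returned depends on which packages are loaded in your session.

```r
found <- apropos("hist")   # names of objects on the search path containing "hist"
print(found)

args(hist)                 # the arguments that hist accepts
help("hist")               # same help page as typing ?hist
```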
Hint: Use the Courier font for equal-width characters so things line up properly!