Regression in S-PLUS

Chapter 10 of the textbook gives examples of fitting many statistical models in S-PLUS, including regression and analysis of variance. The basic technique is much the same for every kind of model. In general, you should:

Put your data into a data frame with each column correctly specified as a quantitative variable, factor, or ordered factor. Section 10.1 in the textbook gives more details on the syntax for doing these tasks in S-PLUS.
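
As a rough sketch of this first step, the commands below build a small data frame and mark its categorical columns as a factor and an ordered factor. The data frame name cars.df, the columns origin and size, and all of the values are made up for illustration; they are not from the textbook. The code is written to run in R as well, and should behave the same way in S-PLUS.

```r
# Hypothetical data; every value below is invented for illustration.
cars.df <- data.frame(
  mpg    = c(21.0, 33.5, 18.2, 27.4, 24.1, 16.9),
  weight = c(2900, 1950, 3600, 2300, 2650, 3900),
  origin = c("US", "Japan", "US", "Japan", "Europe", "US"),
  size   = c("medium", "small", "large", "small", "medium", "large")
)

# A categorical variable with no natural order is stored as a factor.
cars.df$origin <- factor(cars.df$origin)

# A categorical variable with a natural order is stored as an ordered
# factor; the levels are listed from lowest to highest.
cars.df$size <- ordered(cars.df$size, levels = c("small", "medium", "large"))

# Check how each column is stored before fitting any model.
is.numeric(cars.df$mpg)
is.factor(cars.df$origin)
is.ordered(cars.df$size)
levels(cars.df$size)
```

Checking the column types first matters because the model-fitting functions treat factors and quantitative variables differently.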

Fit a model using the appropriate function. Regression is an example of a linear model, and linear models are fit with the S-PLUS function lm. The following example is closely related to one in Section 10.5 of the textbook. Suppose that you have a data frame named auto with quantitative variables mpg (miles per gallon), weight, len (length), and disp (displacement). To predict miles per gallon from the other variables:

> fit <- lm(mpg ~ weight + len + disp, data=auto)
This creates an object named fit which contains all necessary information about the fit. The textbook gives more details on changing the formula statement to fit alternative models.
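
To illustrate what changing the formula can look like, here is a minimal sketch. Because the real auto data are not reproduced here, it first simulates a stand-in data frame with the same column names; those values are invented and mean nothing. The object names fit2, fit3, and fit4 are also just illustrative.

```r
# Simulated stand-in for the auto data frame (NOT real measurements).
set.seed(1)
n      <- 40
weight <- runif(n, 1800, 4000)
len    <- 150 + 0.015 * weight + rnorm(n, sd = 5)
disp   <- 0.08 * weight + rnorm(n, sd = 20)
mpg    <- 50 - 0.008 * weight + rnorm(n, sd = 2)
auto   <- data.frame(mpg, weight, len, disp)

# The model from the text.
fit <- lm(mpg ~ weight + len + disp, data = auto)

# Drop a term. In update, "." stands for the matching side of the
# old formula, so this refits mpg ~ weight + disp.
fit2 <- update(fit, . ~ . - len)

# Add a squared term. I() protects arithmetic inside a formula.
fit3 <- update(fit, . ~ . + I(weight^2))

# Transform the response by writing a new formula directly.
fit4 <- lm(log(mpg) ~ weight + disp, data = auto)

# F test comparing the smaller model fit2 with the larger model fit.
anova(fit2, fit)
```

Using update rather than retyping the whole lm call keeps the data and other settings the same, so only the formula changes between fits.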

Use diagnostic tools to examine the quality of the fitted model. Other functions may then be used to extract information about the model. For example,

> summary(fit)
produces a summary of the fit, including: (1) a five-number summary of the residuals; (2) a table of the estimated regression coefficients with their standard errors, t statistics, and p-values from two-sided tests of the hypothesis that each regression coefficient equals zero; (3) other numerical measures of the fit, such as the R-squared statistic; and (4) a table showing the correlations of the coefficient estimates. If you wish to extract only the coefficients,
> coef(fit)
does the job. If you want to work with the fitted values or residuals,
> fv <- fitted.values(fit)
> r <- residuals(fit)
will do the trick. The function plot applied to fit will produce six separate diagnostic plots in rapid succession. Setting par(mfrow=c(3,2)) prior to the plot call allows you to see all of these plots on one screen or page. You may exercise more control over what you wish to view by working with the residuals, fitted values, and data directly. For example,
> plot(fitted.values(fit),residuals(fit),xlab="Fitted Values",
+ ylab="Residuals")
> abline(0,0)
produces a plot of the residuals versus the fitted values.
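
Working with these pieces directly can also be done numerically. The sketch below again uses a simulated stand-in for the auto data frame (the values are invented). It checks that fitted values plus residuals reproduce the response, and it pulls individual numbers out of the summary object. The component names coefficients, r.squared, and sigma are the ones R uses; it is assumed here that your version of S-PLUS uses the same names, and names(s) will show what is actually available.

```r
# Simulated stand-in for the auto data frame (NOT real measurements).
set.seed(1)
n      <- 40
weight <- runif(n, 1800, 4000)
len    <- 150 + 0.015 * weight + rnorm(n, sd = 5)
disp   <- 0.08 * weight + rnorm(n, sd = 20)
mpg    <- 50 - 0.008 * weight + rnorm(n, sd = 2)
auto   <- data.frame(mpg, weight, len, disp)

fit <- lm(mpg ~ weight + len + disp, data = auto)
fv  <- fitted.values(fit)
r   <- residuals(fit)

# Each residual is the observed response minus the fitted value,
# so this difference should be zero up to rounding error.
max(abs(auto$mpg - (fv + r)))

# When the model has an intercept, least-squares residuals sum to
# zero up to rounding error.
sum(r)

# Store the summary and extract single pieces from it.
s <- summary(fit)
names(s)          # lists the available components
s$coefficients    # the coefficient table
s$r.squared       # the R-squared statistic
s$sigma           # the residual standard error
```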

In applied statistics courses, you learn how to examine residual plots for patterns and how to use the numerical summaries to build effective models. S-PLUS is well suited to this work because it lets you fit and update different models rapidly and check each one with a variety of diagnostic tools. The book Modern Applied Statistics with S-Plus by Venables and Ripley and the book Statistical Models in S edited by Chambers and Hastie (of which I have given you an excerpt) include much more detailed information on practical model fitting using S-PLUS.


Last modified: May 2, 1997

Bret Larget, larget@mathcs.duq.edu