In your final project you will analyze a dataset. The dataset contains a number of explanatory variables (some of them qualitative) and a
response variable (you may need to identify which variable is the response). Your goal is to find an appropriate model, which means
identifying the variables that are significant and can be used for future predictions. There are a number of model selection methods,
and different methods may result in different “best” models. Use these methods to arrive at a model that is parsimonious but still has
good predictive power. You also need to do a residual analysis to check whether the model assumptions hold and whether there are outliers
or influential observations. Finally, check for multicollinearity: a good model should not have multicollinearity problems.
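One standard multicollinearity check is the variance inflation factor, VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R^2 from regressing predictor x_j on all the other predictors. The sketch below computes it in plain Python rather than SAS, purely to show the arithmetic; in SAS the same numbers come from the VIF option on the PROC REG MODEL statement (e.g. `model y = x1 x2 / vif;`). The helper names `solve`, `r_squared` and `vif` and the toy data are hypothetical, not part of the project. A common rule of thumb treats VIF values above 10 as a sign of serious multicollinearity.

```python
# Pure-Python sketch of variance inflation factors (VIFs).
# VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R^2 from regressing
# predictor x_j on all the other predictors (with an intercept).


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are exactly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def r_squared(y, predictors):
    """R^2 of an ordinary least-squares fit of y on the predictors plus an intercept."""
    n = len(y)
    design = [[1.0] + [float(col[i]) for col in predictors] for i in range(n)]
    p = len(design[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(design, y)) for j in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in design]
    ybar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - sse / sst


def vif(columns):
    """Map each predictor name to its VIF; columns is {name: list of values}."""
    result = {}
    for name, xj in columns.items():
        others = [col for other, col in columns.items() if other != name]
        rj2 = r_squared(xj, others)
        result[name] = float("inf") if rj2 >= 1.0 else 1.0 / (1.0 - rj2)
    return result


# Hypothetical toy data (not the rut depth data): x2 is close to 2 * x1,
# so both VIFs come out far above 10.
print(vif({"x1": [1, 2, 3, 4, 5, 6], "x2": [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]}))
# Orthogonal, mean-zero columns: both VIFs are exactly 1.0.
print(vif({"x1": [-1, 1, -1, 1], "x2": [-1, -1, 1, 1]}))
```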
You need to submit a report. In it, describe the dataset, your goal, your findings and your assessment of the “best” model you selected
(how good it is, what problems you see with it, etc.). Include relevant plots, but the report itself should not contain any SAS output.
Submit your SAS code and the relevant SAS output as an appendix or supplemental material.
In the Rut Depth project, the goal is to determine whether viscosity, surface, base, run, fines and voids are significant predictors of rut
depth. Viscosity, surface, base, run, fines and voids are the explanatory variables, and rut depth is the response variable.
In this project there are six explanatory variables. One of them, run, is qualitative and is represented by indicator variables: if run
equals 1, the first group equals 1, otherwise the first group equals 0; if run equals 0, the second group equals 1, otherwise the second
group equals 0. The other five variables (viscosity, surface, base, fines and voids) are quantitative.
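The coding of run described above can be sketched as follows. The function name `dummy_code` and the six run values are hypothetical. One consequence worth checking: the two groups always add up to 1, which is exactly the intercept column, so a regression with an intercept can include only one of them (including both gives perfect collinearity, often called the dummy variable trap). Since run is already coded 0/1, the first group is simply run itself.

```python
def dummy_code(run):
    """Return (group1, group2) 0/1 indicators for the two levels of run."""
    if any(r not in (0, 1) for r in run):
        raise ValueError("run must be coded 0 or 1")
    group1 = [1 if r == 1 else 0 for r in run]  # 1 when run == 1, else 0
    group2 = [1 if r == 0 else 0 for r in run]  # 1 when run == 0, else 0
    return group1, group2


# Hypothetical run values for six observations.
g1, g2 = dummy_code([1, 1, 0, 0, 1, 0])
# Every row has g1 + g2 == 1, i.e. the two columns add up to the intercept column,
# so only one of them should enter a model that already has an intercept.
assert all(a + b == 1 for a, b in zip(g1, g2))
print(g1, g2)
```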
Because the relationships between Y and the individual X's were not clear, we used the transformation y = log(rutdepth) to obtain clearer
scatterplots that show the relationship between each explanatory variable and the response variable.
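A minimal sketch of why a log transform can straighten a scatterplot, assuming (for illustration only) that rut depth grows roughly exponentially with a predictor. The data are hypothetical, and the helpers `log_response` and `pearson_r` are illustrative names; the correlation coefficient is used here only as a numeric stand-in for "how linear the scatterplot looks". Whether log(rutdepth) is the right choice for the actual data should still be judged from the scatterplots and residual plots. In SAS the transformed response would be created in a DATA step, e.g. `logrut = log(rutdepth);` (SAS `LOG` is the natural log).

```python
import math


def log_response(rut_depth):
    """Natural-log transform y = log(rutdepth); every rut depth must be positive."""
    if any(d <= 0 for d in rut_depth):
        raise ValueError("log(rutdepth) is undefined for non-positive values")
    return [math.log(d) for d in rut_depth]


def pearson_r(x, y):
    """Sample correlation coefficient, a numeric summary of how linear a scatterplot is."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data in which rut depth grows exponentially with x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
depth = [math.exp(0.5 * v) for v in x]
print(pearson_r(x, depth))                # below 1: the raw relationship is curved
print(pearson_r(x, log_response(depth)))  # 1 up to rounding: log(depth) is linear in x
```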