Best subsets regression is an exploratory model-building technique. It compares all possible models that can be
created from an identified set of predictors. By default, Minitab's best subsets output shows the two best
models with one predictor, two predictors, three predictors, and so on, up to the number of predictors that
were entered into the best subsets regression.
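To make "all possible models" concrete, the following is a minimal sketch in Python (not Minitab) that enumerates every candidate subset. The predictor names `x1`, `x2`, `x3` and the helper `all_subsets` are hypothetical and used only for illustration. It only lists the candidates and does not fit any model.

```python
from itertools import combinations


def all_subsets(predictors):
    """Yield every non-empty subset of the predictor names, smallest first."""
    for size in range(1, len(predictors) + 1):
        for subset in combinations(predictors, size):
            yield subset


# Hypothetical predictor names, for illustration only.
predictors = ["x1", "x2", "x3"]
candidates = list(all_subsets(predictors))

for subset in candidates:
    # Number of predictors in the candidate model, then the predictors themselves.
    print(len(subset), subset)

# With k predictors there are 2**k - 1 non-empty candidate models.
print(len(candidates))
```

Because the number of candidates roughly doubles with each added predictor, reporting only the best few models of each size keeps the output readable.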
The Minitab output presents R², adjusted R², Mallows' Cp, and S. These model fit statistics are used in
conjunction with one another to determine the best model. R² and adjusted R² measure the coefficient of
multiple determination, that is, the proportion of variation in the criterion variable that is accounted for
by the set of predictor variables. Mallows' Cp is a measure of bias or prediction error. S is the square root
of the mean square error (MSE).
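As a rough illustration of how these four statistics relate, here is a minimal Python sketch that computes them from sums of squares using the standard textbook definitions. The function name `fit_statistics`, its argument names, and the numbers passed to it are assumptions made for this example, not Minitab output.

```python
import math


def fit_statistics(sse, sst, n, p, mse_full):
    """Return R2, adjusted R2, Mallows' Cp, and S for one candidate model.

    sse      : error sum of squares of the candidate model
    sst      : total sum of squares of the response
    n        : number of observations
    p        : number of parameters (predictors plus the intercept)
    mse_full : mean square error of the model containing all predictors
    """
    r2 = 1.0 - sse / sst
    # Adjusted R2 penalizes each extra parameter through the degrees of freedom.
    adj_r2 = 1.0 - (sse / (n - p)) / (sst / (n - 1))
    # Mallows' Cp compares the candidate's error to the full model's MSE.
    cp = sse / mse_full - (n - 2 * p)
    # S is the square root of the candidate model's mean square error.
    s = math.sqrt(sse / (n - p))
    return {"r2": r2, "adj_r2": adj_r2, "cp": cp, "s": s}


# Made-up sums of squares, for illustration only.
stats = fit_statistics(sse=40.0, sst=200.0, n=25, p=3, mse_full=1.5)
print(stats)
```

Note that S and adjusted R² both depend on the same quantity, SSE / (n - p), so for a fixed data set the model with the smallest S is also the model with the highest adjusted R².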
The decision is not always clear, so the researcher must use all the tools available to make the most informed
choice. When selecting the best subset, we are looking for the highest adjusted R². Adding a predictor never
decreases R² and almost always increases it, so when comparing models with different numbers of predictors it
is more reasonable to use adjusted R², which increases only if the added predictor improves the model more
than would be expected by chance alone. With regard to Mallows' Cp, where p indicates the number of parameters
in the model, we are looking for a value equal to or less than p. The number of parameters in each model is
equal to the number of predictors plus one, where the one is the intercept parameter. So if the output lists
two variables, we know that the model has three parameters. There are a few things to note when analyzing
Mallows' Cp:
· The model with the maximum number of predictors always shows Cp = p, so Mallows' Cp is not a good selection tool for the full model.
· If all models but the full model display a large Cp, then the models are lacking important predictors that must be identified before going forward.
· When several models show a Cp near p, the model with the smallest Cp should be selected to be certain the bias is small.
· Further, when several models show a Cp near p, the model with the fewest predictors should be selected.
In addition to these guidelines, we are also looking for the model with the smallest S. Taking these factors
into account should allow the researcher to select the most appropriate, best-fitting regression model.
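The note above that the full model always shows Cp = p can be checked with a short derivation. This sketch assumes the standard definition of Mallows' Cp, in which the candidate model's error sum of squares is compared with the mean square error of the full model. The symbols n (number of observations), SSE (error sum of squares), and the subscript "full" are notation introduced here for illustration.

```latex
% Standard definition, for a candidate model with p parameters:
C_p = \frac{SSE_p}{MSE_{\text{full}}} - (n - 2p),
\qquad
MSE_{\text{full}} = \frac{SSE_{\text{full}}}{n - p_{\text{full}}}

% For the full model, SSE_p = SSE_full and p = p_full, so
C_{p_{\text{full}}}
  = \frac{SSE_{\text{full}}}{SSE_{\text{full}} / (n - p_{\text{full}})} - (n - 2p_{\text{full}})
  = (n - p_{\text{full}}) - n + 2p_{\text{full}}
  = p_{\text{full}}
```

Because this equality holds for any data set, a Cp equal to p says nothing about how well the full model fits, which is why Cp is not a useful selection tool for that model.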
Additional reading/reference:
https://onlinecourses.science.psu.edu/stat501/node/89