Friday, September 14, 2012

Best Subsets Regression


Best subsets regression is an exploratory model-building regression analysis. It compares all possible models that can be created from an identified set of predictors. By default, Minitab's best subsets output shows the two best models for one predictor, two predictors, three predictors, and so on, up to the number of predictors entered into the analysis. The output presents R², adjusted R², Mallows' Cp, and S. To determine the best model, these fit statistics are used in conjunction with one another. R² and adjusted R² measure the coefficient of multiple determination and indicate how much of the variability in the criterion variable is predictable from the set of predictor variables. Mallows' Cp is a measure of bias or prediction error. S is the square root of the mean square error (MSE).
The decision is not always clear, so the researcher must use all of the available tools to make the most informed choice. When selecting the best subset, we are looking for the highest adjusted R². Every added predictor increases the R² value; therefore, when choosing among models with different numbers of predictors, it is more reasonable to use the adjusted R², which increases only if the added predictors improve the model more than chance alone would. For Mallows' Cp, where p indicates the number of parameters in the model, we are looking for a value less than or equal to p. The number of parameters in each model equals the number of predictors plus one, the one being the intercept parameter. So if the output reads two variables, we know that the number of parameters in that model is three. There are a few things to note when analyzing Mallows' Cp:
  • The model with the maximum number of predictors always shows Cp = p, so Mallows' Cp is not a good selection tool for the full model.
  • If every model except the full model displays a large Cp, the models are lacking important predictors that must be identified before going forward.
  • When several models show a Cp near p, the model with the smallest Cp should be selected to be certain the bias is small.
  • Further, when several models show a Cp near p, the model with the fewest predictors should be selected.
In addition to these guidelines, we are also looking for the model with the smallest S. Taking these factors into account should allow the researcher to select the most appropriate, best-fitting regression model.
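For readers who want to reproduce this procedure outside Minitab, here is a minimal Python sketch, assuming a pandas DataFrame df with a criterion column and a list of predictor column names (all names here are hypothetical). It fits every subset of predictors with statsmodels and reports the four fit statistics discussed above; sorting by adjusted R² while scanning for Cp at or below p and a small S mirrors the selection guidelines.

```python
from itertools import combinations

import numpy as np
import pandas as pd
import statsmodels.api as sm


def best_subsets(df, response, predictors):
    """Fit every subset of predictors; report R-squared, adjusted
    R-squared, Mallows' Cp, and S for each candidate model."""
    y = df[response]
    n = len(df)
    # The MSE of the full model is the reference for Mallows' Cp.
    full = sm.OLS(y, sm.add_constant(df[list(predictors)])).fit()
    mse_full = full.mse_resid
    rows = []
    for k in range(1, len(predictors) + 1):
        for subset in combinations(predictors, k):
            model = sm.OLS(y, sm.add_constant(df[list(subset)])).fit()
            p = k + 1  # parameters = predictors plus the intercept
            cp = model.ssr / mse_full - (n - 2 * p)
            rows.append({"predictors": subset,
                         "R2": model.rsquared,
                         "adj_R2": model.rsquared_adj,
                         "Cp": cp,
                         "S": np.sqrt(model.mse_resid)})
    # Highest adjusted R-squared first; also scan for Cp <= p and small S.
    return pd.DataFrame(rows).sort_values("adj_R2", ascending=False)


# Example call with hypothetical column names:
# best_subsets(df, "y", ["x1", "x2", "x3"])
```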
 
Additional reading/reference
https://onlinecourses.science.psu.edu/stat501/node/89

Monday, September 10, 2012

Binary Logistic Regression


  • Logistic regression is an extension of simple linear regression.
  • When the dependent variable is dichotomous or binary in nature, we cannot use simple linear regression. Logistic regression is the statistical technique used to predict the relationship between predictors (our independent variables) and a predicted variable (the dependent variable) when the dependent variable is binary (e.g., sex [male vs. female], response [yes vs. no], score [high vs. low], etc.).
  • A logistic regression can include one or more independent variables, or predictors. The IVs, or predictors, can be continuous (interval/ratio) or categorical (ordinal/nominal).
  • All predictor variables are tested in one block to assess their predictive ability while controlling for the effects of other predictors in the model.
  • Assumptions for a logistic regression:
  1. adequate sample size (too few participants for too many predictors is bad!);
  2. absence of multicollinearity (multicollinearity = high intercorrelations among the predictors; see the VIF sketch after this list);
  3. no outliers.
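As a quick check on assumption 2, here is a minimal Python sketch that computes variance inflation factors (VIFs) with statsmodels; the predictor names and data are hypothetical.

```python
# A minimal sketch checking the multicollinearity assumption with
# variance inflation factors (VIFs); names and data are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
X = pd.DataFrame({"age": rng.normal(40, 10, 200),
                  "income": rng.normal(50000, 15000, 200)})
X = sm.add_constant(X)

# A common rule of thumb flags VIF values above roughly 5-10 as signs
# of high intercorrelation among the predictors.
for i, name in enumerate(X.columns):
    print(f"{name}: VIF = {variance_inflation_factor(X.values, i):.2f}")
```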

  • The statistic -2LogL (minus 2 times the log of the likelihood) is a badness-of-fit indicator, that is, large numbers mean poor fit of the model to the data.
  • When taken from large samples, the difference between two values of -2LogL is distributed as chi-square:

χ² = (-2LogL of the restricted model) - (-2LogL of the full model) = -2 log(likelihoodR / likelihoodF),

where likelihoodR is for a restricted, or smaller, model and likelihoodF is for a full, or larger, model.
  • LikelihoodF has all the parameters of interest.
  • LikelihoodR is nested in the larger model. (nested = all terms occur in the larger model; necessary condition for model comparison tests).
  • A nested model cannot include, even as a single IV, any categorical or continuous variable that is not contained in the full model. If it does, it is no longer nested, and we cannot compare the two values of -2LogL to get a chi-square value.
  • The chi-square is used to statistically test whether including a variable reduces the badness-of-fit measure.
  • If the chi-square is significant, the variable is considered a significant predictor in the equation; the sketch below illustrates the test.
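To make the model-comparison test concrete, here is a minimal Python sketch using simulated data with a binary outcome y and two hypothetical predictors x1 and x2; it fits nested logistic models with statsmodels and compares their -2LogL values via a chi-square test.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy.stats import chi2

# Simulated data: binary outcome y, two hypothetical predictors x1, x2.
rng = np.random.default_rng(42)
df = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
df["y"] = (df["x1"] + 0.5 * df["x2"] + rng.normal(size=200) > 0).astype(int)

# The restricted (smaller) model is nested in the full (larger) model.
restricted = sm.Logit(df["y"], sm.add_constant(df[["x1"]])).fit(disp=0)
full = sm.Logit(df["y"], sm.add_constant(df[["x1", "x2"]])).fit(disp=0)

# -2LogL is a badness-of-fit indicator: larger means worse fit.
neg2ll_r = -2 * restricted.llf
neg2ll_f = -2 * full.llf

# The drop in -2LogL is chi-square distributed, with degrees of freedom
# equal to the number of extra parameters in the full model.
chi_sq = neg2ll_r - neg2ll_f
df_diff = full.df_model - restricted.df_model
p_value = chi2.sf(chi_sq, df_diff)
print(f"chi-square = {chi_sq:.3f}, df = {df_diff:.0f}, p = {p_value:.4f}")
```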

Tuesday, September 4, 2012

Creating and Validating an Instrument


To determine if an appropriate instrument is available, a researcher can search the literature and commercially available databases to find something suitable to the study. If it is determined that no available instruments measure the variables in a study, there are four rigorous phases for developing an instrument that accurately measures the variables of interest (Creswell, 2005). Those four phases are planning, construction, quantitative evaluation, and validation. Each phase consists of several steps that must be completed to fully satisfy the requirements of that phase.
            The first phase is planning, and its first step includes identifying the purpose of the test and the target group. In this step, the researcher should identify the purpose of the test, specify the content area to be studied, and identify the target group. The second step of phase one is to, again, review the literature to be certain no instruments already exist for the evaluation of the variables of interest. Several places to look for existing instruments include the ERIC website (www.eric.ed.gov), the Mental Measurements Yearbook (Impara & Plake, 1999), and Tests in Print (Murphy, Impara, & Plake, 1999). Once the researcher is certain no other instruments exist, the researcher should review the literature to determine the operational definitions of the constructs to be measured. This can be an arduous task because operationalizing a variable does not automatically produce good measurement, so the researcher must review the literature from multiple sources to arrive at an accurate and meaningful construct. From this information, the researcher should develop open-ended questions to present to a sample that is representative of the target group. The open-ended questions help the researcher identify areas of concern around the constructs to be measured. The responses to the open-ended questions and the review of the literature should then be used together to create and refine accurate measures of the constructs.
            The second phase is construction, and it begins with identifying the objectives of the instrument and developing a table of specifications. Those specifications should narrow the purpose and identify the content areas. In the specification process, each variable should be associated with a concept and an overarching theme (Ford, http://www.blaiseusers.org/2007/papers/Z1%20-%20Survey%20Specifications%20Mgmt%20at%20Stats%20Canada.pdf). Once the table of specifications is completed, the researcher can write the items in the instrument. The researcher must determine the format to be used (e.g., Likert scale, multiple choice); the format of the questions should be determined by the type of data that needs to be collected. Depending on the financial resources of the research project, experts within the field may be hired to write the items. Once the items are written, they need to be reviewed for clarity, formatting, acceptable response options, and wording. After several rounds of review, the questions should be presented to peers and colleagues in the format in which the instrument is to be administered. The peers and colleagues should match the items to the specification table, and if there are not exact matches, revisions must be made. An instrument is content valid when the items adequately reflect the process and content dimensions of the objectives of the instrument (Benson & Clark, 1982). Again, the instrument should be distributed to a sample that is representative of the target group. This time the group should take the survey and critique the quality of the individual items and the overall instrument.
            Phase three is quantitative evaluation and includes administration of a pilot study to a representative sample. It may be helpful to ask the participants for feedback to allow for further refinement of the instrument. The pilot study provides quantitative data that the researcher can test for internal consistency by computing Cronbach's alpha, as sketched below. The reliability coefficient can range from 0.00 to 1.00, with values of 0.70 or higher indicating acceptable reliability (George & Mallery, 2003). If the instrument is going to be used to predict future behavior, scores from one administration need to be correlated with outcome measures collected at a later time period to determine whether there is predictive validity. These measurements can be examined to aid the researcher in making informed decisions about revisions to the instrument.
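Here is a minimal Python sketch of that internal-consistency check, assuming items is a two-dimensional array of pilot responses (rows = respondents, columns = scale items); the data below are simulated for illustration.

```python
# A minimal sketch of Cronbach's alpha for a pilot-study item matrix.
import numpy as np


def cronbach_alpha(items):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                            # number of items
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of per-item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of summed scores
    return (k / (k - 1)) * (1 - item_vars / total_var)


# Simulated example: five items driven by one latent trait; values of
# 0.70 or higher are commonly treated as acceptable reliability.
rng = np.random.default_rng(1)
latent = rng.normal(size=(100, 1))
items = latent + rng.normal(scale=0.8, size=(100, 5))
print(f"alpha = {cronbach_alpha(items):.2f}")
```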
            Phase four is validation. To establish validity, the researcher must determine which type of validity is important. The three types of validity are content, criterion-related, and construct. Content validity is the extent to which the questions on a survey are representative of the questions that could be asked to assess a particular construct. To examine content validity, the researcher should consult two to three experts. Criterion-related validity is used when the researcher wants to determine whether the scores from an instrument are a good predictor of an expected outcome. In order to assess this type of validity, the researcher must be able to define the expected outcome. A correlation coefficient of .60 or above indicates a significant, positive relationship (Creswell, 2005). Construct validity is established by determining whether the scores recorded by an instrument are meaningful, significant, useful, and have a purpose. To determine whether construct validity has been achieved, the scores need to be assessed statistically and practically. This can be done by comparing the relationship of a question from the scale to the overall scale, testing a theory to determine whether the outcome supports the theory, and correlating the scores with other similar or dissimilar measures. The use of similar instruments is referred to as convergent validity, and the use of dissimilar instruments as divergent validity.
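As a small illustration of the criterion-related correlation check, here is a minimal Python sketch with hypothetical instrument scores and a hypothetical outcome measure; the correlation is judged against the .60 benchmark noted above.

```python
# A minimal sketch of a criterion-related validity check: correlating
# instrument scores with an expected-outcome measure (hypothetical data).
import numpy as np

rng = np.random.default_rng(7)
scores = rng.normal(size=80)                              # instrument totals
outcome = 0.7 * scores + rng.normal(scale=0.6, size=80)   # criterion measure

# Per Creswell (2005), a coefficient of .60 or above indicates a
# significant, positive relationship.
r = np.corrcoef(scores, outcome)[0, 1]
print(f"r = {r:.2f}")
```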
References
Creswell, J. W. (2005). Educational research: Planning, conducting, and evaluating quantitative and qualitative research (2nd ed.). Upper Saddle River, NJ: Pearson Education.
George, D., & Mallery, P. (2003). SPSS for Windows step by step: A simple guide and reference, 11.0 update (4th ed.). Boston, MA: Allyn and Bacon.
Murphy, L. L., Impara, J. C., & Plake, B. S. (Eds.). (1999). Tests in print V. Lincoln, NE: Buros Institute of Mental Measurements.