Request

To request a blog post on a specific topic, please email James@StatisticsSolutions.com with your suggestion. Thank you!

Friday, September 14, 2012

Best Subsets Regression


Best subsets regression is an exploratory model-building technique. It compares all possible models that can be created from an identified set of predictors. By default, Minitab's best subsets output shows the two best models with one predictor, the two best with two predictors, the two best with three predictors, and so on, up to the number of candidate predictors entered into the analysis. For each model, Minitab reports R², adjusted R², Mallows' Cp, and S, and these fit statistics are used in conjunction with one another to determine the best model. R² is the coefficient of multiple determination; together with adjusted R², it indicates how much of the variance in the criterion variable is explained by the set of predictor variables. Mallows' Cp is a measure of bias, or total prediction error, in a model. S is the square root of the mean square error (MSE).
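For reference, the four statistics can be written with their standard textbook definitions. The symbols below are our own notation, not Minitab's output labels: n is the number of observations, p the number of parameters (predictors plus the intercept), SSE_p the residual sum of squares of the candidate model, SST the total sum of squares, and MSE_full the mean square error of the model that contains all candidate predictors. The last line shows why the full model always lands exactly on Cp = p.

```latex
R^2 = 1 - \frac{SSE_p}{SST}
\qquad
R^2_{adj} = 1 - \frac{SSE_p/(n-p)}{SST/(n-1)}
\qquad
S = \sqrt{\frac{SSE_p}{n-p}}
\qquad
C_p = \frac{SSE_p}{MSE_{full}} - (n - 2p)

% Full model: MSE_full = SSE_full / (n - p), so
C_p^{full} = \frac{SSE_{full}}{SSE_{full}/(n-p)} - (n - 2p) = (n - p) - (n - 2p) = p
```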
The decision is not always clear, so the researcher must use all the available tools to make the most informed choice. When selecting the best subset, we are looking for the highest adjusted R². Adding a predictor never decreases R² (it almost always increases it), so when comparing models with different numbers of predictors it is more reasonable to use adjusted R², which increases only if the added predictor improves the model more than would be expected by chance. Regarding Mallows' Cp, where p is the number of parameters in the model, we are looking for a value equal to or less than p. The number of parameters in each model equals the number of predictors plus one, where the one is the intercept. So if a model contains two predictor variables, it has three parameters (p = 3). There are a few things to note when analyzing Mallows' Cp:
· The model with the maximum number of predictors always has Cp = p, so Mallows' Cp is not useful for evaluating the full model.
· If every model except the full model has a large Cp, the models are missing important predictors, which must be identified before going forward.
· When several models have a Cp near p, the model with the smallest Cp should be selected to keep the bias small.
· Likewise, when several models have a Cp near p, the model with the fewest predictors should be preferred.
In addition to these guidelines, we are also looking for the model with the smallest S. Taking all of these factors into account should allow the researcher to select the most appropriate, best-fitting regression model.
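The comparison can be sketched in Python with NumPy. This is a minimal illustration on simulated data, not Minitab's own algorithm: the function names `fit_sse` and `best_subsets`, the predictor names x1 to x4, and the simulated coefficients are all hypothetical. Like Minitab's default output, it keeps the two highest-R² models of each size and reports R², adjusted R², Mallows' Cp, and S for each.

```python
import itertools

import numpy as np


def fit_sse(X, y):
    """Residual sum of squares from an OLS fit that includes an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


def best_subsets(X, y, names, keep=2):
    """Fit every subset of predictors; keep the `keep` best (by R^2) per size."""
    n, k = X.shape
    sst = float(((y - y.mean()) ** 2).sum())
    # Mallows' Cp uses the full model's MSE as the estimate of sigma^2.
    mse_full = fit_sse(X, y) / (n - (k + 1))
    results = []
    for size in range(1, k + 1):
        models = []
        for cols in itertools.combinations(range(k), size):
            p = size + 1  # predictors plus the intercept
            sse = fit_sse(X[:, list(cols)], y)
            models.append({
                "vars": [names[c] for c in cols],
                "p": p,
                "r2": 1 - sse / sst,
                "adj_r2": 1 - (sse / (n - p)) / (sst / (n - 1)),
                "cp": sse / mse_full - (n - 2 * p),
                "s": (sse / (n - p)) ** 0.5,
            })
        models.sort(key=lambda m: m["r2"], reverse=True)
        results.extend(models[:keep])
    return results


# Hypothetical data: y depends on x1 and x3 only.
rng = np.random.default_rng(0)
n = 50
X = rng.normal(size=(n, 4))
y = 2 + 1.5 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(size=n)

rows = best_subsets(X, y, ["x1", "x2", "x3", "x4"])
for m in rows:
    print(f"{m['p'] - 1} vars  R2={m['r2']:.3f}  adjR2={m['adj_r2']:.3f}  "
          f"Cp={m['cp']:.2f}  S={m['s']:.3f}  {', '.join(m['vars'])}")
```

Because the simulated response depends only on x1 and x3, the {x1, x3} model should head the two-predictor rows with a Cp near 3, while the four-predictor model shows Cp = 5 exactly. Choosing among the rows still means weighing adjusted R², Cp, and S together, as described above.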
 
Additional reading/reference
https://onlinecourses.science.psu.edu/stat501/node/89

Monday, September 10, 2012

Binary Logistic Regression


  • Logistic regression is an extension of linear regression.
  • When the dependent variable is dichotomous (binary), we cannot use ordinary linear regression. Logistic regression is the statistical technique used to model the relationship between predictors (our independent variables) and a predicted variable (the dependent variable) when the dependent variable is binary (e.g., sex [male vs. female], response [yes vs. no], score [high vs. low]).
  • A logistic regression can include one or more independent variables, or predictors. The IVs can be continuous (interval/ratio) or categorical (ordinal/nominal).
  • When all predictors are entered in one block (direct entry), each predictor's predictive ability is assessed while controlling for the effects of the other predictors in the model.
  • Assumptions for a logistic regression:
1. adequate sample size (too few participants for too many predictors is bad!);
2. absence of multicollinearity (multicollinearity = high intercorrelations among the predictors);
3. no outliers.

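  • The statistic -2LogL (minus 2 times the log of the likelihood) is a badness-of-fit indicator; that is, larger values mean poorer fit of the model to the data.
  • In large samples, the difference between two values of -2LogL is distributed approximately as chi-square:

$$\chi^2 = -2\log(\text{likelihood}_R) - \left[-2\log(\text{likelihood}_F)\right] = -2\log\left(\frac{\text{likelihood}_R}{\text{likelihood}_F}\right)$$

where likelihoodR is for a restricted, or smaller, model and likelihoodF is for a full, or larger, model. The degrees of freedom equal the number of parameters in the full model minus the number in the restricted model.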
  • LikelihoodF has all the parameters of interest.
  • The restricted model is nested in the larger model (nested = all of its terms also occur in the larger model; this is a necessary condition for model comparison tests).
  • A nested model cannot include any IV, categorical or continuous, that is not contained in the full model. If it does, it is no longer nested, and the two values of -2LogL cannot be compared to get a chi-square value.
  • The chi-square value is used to test statistically whether including a variable reduces the badness-of-fit measure.
  • If chi-square is significant, the variable is considered to be a significant predictor in the equation.
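The likelihood-ratio comparison described above can be sketched in Python with NumPy. This is a minimal illustration, assuming simulated data and a hand-written Newton-Raphson fit; the helper names `fit_logit` and `lr_test` are hypothetical, not a library API. Because each full/restricted pair here differs by one variable, the test has 1 degree of freedom, and the p-value uses the identity P(χ² with 1 df > x) = erfc(√(x/2)). For other degrees of freedom, a chi-square table or `scipy.stats.chi2.sf` would be needed instead.

```python
import math

import numpy as np


def fit_logit(X, y, iters=25):
    """Fit a logistic regression (with intercept) by Newton-Raphson.

    Returns the coefficients and the maximized log-likelihood.
    """
    design = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(design.shape[1])
    for _ in range(iters):
        mu = 1.0 / (1.0 + np.exp(-(design @ beta)))  # predicted probabilities
        grad = design.T @ (y - mu)
        hess = design.T @ (design * (mu * (1 - mu))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    eta = design @ beta
    # log L = sum[y * eta - log(1 + e^eta)], computed stably with logaddexp
    loglik = float(np.sum(y * eta - np.logaddexp(0.0, eta)))
    return beta, loglik


def lr_test(X_full, X_restricted, y):
    """Chi-square = (-2LogL restricted) - (-2LogL full), assuming 1 df."""
    _, ll_full = fit_logit(X_full, y)
    _, ll_restricted = fit_logit(X_restricted, y)
    chi2 = (-2 * ll_restricted) - (-2 * ll_full)
    p_value = math.erfc(math.sqrt(max(chi2, 0.0) / 2))
    return chi2, p_value


# Hypothetical data: the outcome depends on x1 but not on x2.
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
prob = 1 / (1 + np.exp(-(-0.5 + 1.2 * x1)))
y = (rng.random(n) < prob).astype(float)

X_full = np.column_stack([x1, x2])
chi2_x1, p_x1 = lr_test(X_full, x2[:, None], y)  # does adding x1 reduce -2LogL?
chi2_x2, p_x2 = lr_test(X_full, x1[:, None], y)  # does adding x2 reduce -2LogL?
print(f"x1: chi2={chi2_x1:.2f}, p={p_x1:.4g}")
print(f"x2: chi2={chi2_x2:.2f}, p={p_x2:.4g}")
```

In each call the restricted model is nested in the full model, as the test requires. With this simulated data, x1 should produce a large, significant chi-square, while x2 usually will not.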

Tuesday, September 4, 2012

Creating and Validating an Instrument


To determine whether an appropriate instrument is available, a researcher can search the literature and commercially available databases for one suited to the study. If no available instrument measures the variables in the study, there are four rigorous phases for developing an instrument that accurately measures the variables of interest (Creswell, 2005): planning, construction, quantitative evaluation, and validation. Each phase consists of several steps that must be completed to satisfy the requirements of that phase.
            The first phase is planning, and its first step is to identify the purpose of the test, specify the content area to be studied, and identify the target group. The second step of phase one is to review the literature again to be certain that no instruments already exist for evaluating the variables of interest. Places to look for existing instruments include the ERIC website (www.eric.ed.gov), the Mental Measurements Yearbook (Impara & Plake, 1999), and Tests in Print (Murphy, Impara, & Plake, 1999). Once the researcher is certain no other instruments exist, the researcher should review the literature to determine the operational definitions of the constructs to be measured. This can be an arduous task because operationalizing a variable does not automatically produce good measurement; therefore, the researcher must review several bodies of literature to arrive at an accurate and meaningful definition of each construct. From this information, the researcher should develop open-ended questions to present to a sample that is representative of the target group. The open-ended questions help the researcher identify areas of concern surrounding the constructs to be measured. The responses to the open-ended questions and the review of the literature should be used together to create and refine accurate measures of the constructs.
            The second phase is construction, and it begins with identifying the objectives of the instrument and developing a table of specifications. The specifications should narrow the purpose and identify the content areas. In the specification process, each variable should be associated with a concept and an overarching theme (Ford, http://www.blaiseusers.org/2007/papers/Z1%20-%20Survey%20Specifications%20Mgmt%20at%20Stats%20Canada.pdf). Once the table of specifications is complete, the researcher can write the items for the instrument. The researcher must choose the item format (e.g., Likert scale or multiple choice), and that format should be determined by the type of data that needs to be collected. Depending on the financial resources of the research project, experts in the field may be hired to write the items. Once the items are written, they need to be reviewed for clarity, formatting, acceptable response options, and wording. After several reviews, the items should be presented to peers and colleagues in the format in which the instrument will be administered. The peers and colleagues should match the items to the table of specifications, and if the matches are not exact, revisions must be made. An instrument is content valid when the items adequately reflect the process and content dimensions of the instrument's objectives (Benson & Clark, 1982). Again, the instrument should be distributed to a sample that is representative of the target group. This time, the group should take the survey and critique the quality of the individual items and of the overall instrument.
            Phase three is quantitative evaluation, which includes administering a pilot study to a representative sample. It may be helpful to ask the participants for feedback to allow further refinement of the instrument. The pilot study provides quantitative data that the researcher can test for internal consistency by computing Cronbach's alpha. The reliability coefficient typically ranges from 0.00 to 1.00, with values of 0.70 or higher indicating acceptable reliability (George & Mallery, 2003). If the instrument is going to be used to predict future behavior, its scores also need to be stable over time: the instrument should be administered to the same sample at two different time points, and the two sets of responses correlated to assess test-retest reliability. These results help the researcher make informed decisions about revisions to the instrument.
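Cronbach's alpha for the pilot data can be computed with the usual formula α = k/(k − 1) × (1 − Σ item variances / variance of the total scores), where k is the number of items. The sketch below assumes NumPy and a small, made-up set of pilot responses (six respondents, four Likert-type items); `cronbach_alpha` is a hypothetical helper, not a function from SPSS or any particular package.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha; rows are respondents, columns are scale items."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)         # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)     # variance of total scores
    return k / (k - 1) * (1 - item_vars.sum() / total_var)


# Hypothetical pilot responses on a 1-5 scale.
pilot = [
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [4, 4, 5, 4],
    [1, 2, 1, 2],
]
alpha = cronbach_alpha(pilot)
print(f"alpha = {alpha:.2f}", "acceptable" if alpha >= 0.70 else "below .70")
```

Sample variances (`ddof=1`) are used for both the items and the total so that the ratio is consistent; using population variances for both would give the same alpha.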
            Phase four is validation. In this phase the researcher should conduct a quantitative pilot study and analyze the data. To establish validity, the researcher must determine which type of validity is important. The three types of validity are content, criterion-related, and construct validity. Content validity is the extent to which the questions on a survey are representative of the questions that could be asked to assess a particular construct. To examine content validity, the researcher should consult two to three experts. Criterion-related validity is used when the researcher wants to determine whether the scores from an instrument are a good predictor of an expected outcome. To assess this type of validity, the researcher must be able to define the expected outcome. A correlation coefficient of .60 or above indicates a strong, positive relationship (Creswell, 2005). Construct validity is established by determining whether the scores recorded by an instrument are meaningful, significant, useful, and purposeful. To determine whether construct validity has been achieved, the scores need to be assessed statistically and practically. This can be done by examining the relationship of each question to the overall scale, by testing a theory to determine whether the outcome supports it, and by correlating the scores with other similar or dissimilar variables. Correlations with similar instruments provide evidence of convergent validity, and correlations with dissimilar instruments provide evidence of divergent validity.
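Two of the construct-validity checks described above, relating each question to the rest of the scale and correlating scale scores with similar and dissimilar measures, can be sketched with NumPy correlations. Everything here is simulated and hypothetical: the latent `trait`, the five items, and the "similar" and "dissimilar" comparison measures are invented for illustration. The item-to-scale check uses the corrected item-total correlation (each item against the sum of the other items), which is one common way to make that comparison.

```python
import numpy as np


def corrected_item_total(items):
    """Correlate each item with the sum of the *other* items."""
    items = np.asarray(items, dtype=float)
    total = items.sum(axis=1)
    return np.array([np.corrcoef(items[:, j], total - items[:, j])[0, 1]
                     for j in range(items.shape[1])])


rng = np.random.default_rng(42)
n = 200
trait = rng.normal(size=n)                                 # hypothetical construct
items = trait[:, None] + rng.normal(scale=0.8, size=(n, 5))  # five noisy items
scores = items.sum(axis=1)                                 # total scale score

similar = trait + rng.normal(scale=0.5, size=n)  # hypothetical similar instrument
dissimilar = rng.normal(size=n)                  # hypothetical unrelated measure

item_total = corrected_item_total(items)
r_convergent = np.corrcoef(scores, similar)[0, 1]
r_divergent = np.corrcoef(scores, dissimilar)[0, 1]

print("corrected item-total r:", np.round(item_total, 2))
print(f"convergent r = {r_convergent:.2f}, divergent r = {r_divergent:.2f}")
```

A high correlation with the similar measure and a near-zero correlation with the dissimilar one is the pattern expected of a valid scale. The same `np.corrcoef` call applies to criterion-related validity, with the expected outcome in place of the comparison instrument.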
References
Creswell, J. W. (2005). Educational research: Planning, conducting, and evaluating quantitative and qualitative research (2nd ed.). Upper Saddle River, NJ: Pearson Education, Inc.
George, D., & Mallery, P. (2003). SPSS for Windows step by step: A simple guide and reference, 11.0 update (4th ed.). Boston, MA: Allyn and Bacon.
Murphy, L. L., Impara, J. C., & Plake, B. S. (Eds.). (1999)

Wednesday, April 13, 2011

Dissertation Statistics Help

Dissertation statistics help is a click away! Free online resources, video tutorials, free dissertation templates, SPSS tutoring, research design help, statistical analyses, dissertation newsletters, and much more. Another semester is coming to an end, and you're not quite there yet. Most students could really use help with the proposal (especially Chapter 3) and the results (Chapter 4). Let's talk about both.

Dissertation Proposal
For the proposal, the main sticking points are the research questions, data analysis plan, sample size justification, and research design. For the research design, I've found http://www.socialresearchmethods.net/kb/ to be a great free resource, and if you don't have Creswell's book, Research Design: Qualitative, Quantitative, and Mixed Methods Approaches, get it. When it comes to dissertation statistics help, students often don't realize that these pieces follow a sequence:

Clear research questions - data analysis plan - sample size justification (or power analysis)

Research questions need to be written in statistical language. For some current news examples: (1) Is there a relationship between party affiliation (Republican vs. Democrat) and the government shutdown (yes vs. no)? (2) Does the use of Twitter predict anger in Libya? (3) Are there differences in gold prices by debt fears? The words relationship, predict, and differences imply a data analysis plan with correlations/chi-square, regression analysis, and ANOVA, respectively. The data analysis plan also needs to address the assumptions of these analyses and justify why they are the appropriate analyses. Based on the statistical analysis, the sample size can be determined, and each analysis has its own sample size justification. A great free sample size calculator is G*Power, or if you want a quick write-up you can go to http://www.statisticssolutions.com/products-services/login/standard-membership/sample-sizepower-analysis-calculator-with-write-up, where you pick the analysis and the justification is written for you (it's cheap, quick, and you're not spending a month figuring it out or paying someone $1000 for it).
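To see what such a calculation involves, the sample size for one of these analyses, a two-tailed test of a Pearson correlation, can be approximated by hand with Fisher's z transformation: N ≈ ((z for 1 − α/2 + z for the desired power) / atanh(r))² + 3. The sketch below uses only Python's standard library; `n_for_correlation` is a hypothetical helper, and because G*Power uses an exact method, its answer may differ from this approximation by a participant or two.

```python
import math
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate N to detect a population correlation r (two-tailed test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = .05
    z_power = NormalDist().inv_cdf(power)          # 0.84 for power = .80
    fisher_z = math.atanh(r)                       # Fisher z transform of r
    return math.ceil(((z_alpha + z_power) / fisher_z) ** 2 + 3)


print(n_for_correlation(0.30))  # 85
```

Smaller expected effects require larger samples, which is why the research question and analysis plan have to be settled before the sample size can be justified.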

Chapter 4: Getting the Dissertation Statistics Help You Need
Statistical help for a dissertation means that graduate students get help selecting the correct statistical tests and checking their assumptions, conducting the right analyses, interpreting them correctly, and presenting the results in the right format (usually APA 6th edition). Statistics Solutions (the company I have operated for 18 years) has the right research design experience (our Ph.D.s are in Clinical Psychology and Statistics), expertise in SPSS (we even sell SPSS for about $100), and the formatting and teaching experience to assist you. The company has online video tutorials that show you how to conduct, interpret, and report the analyses. We have APA editors, or you can visit sites like Purdue University's excellent website for APA formatting. We also consult on qualitative analyses as well as quantitative analyses.

Final thoughts on Dissertation Statistics Help
Dissertation statistics can be tricky (especially time-series analysis, cluster analysis, SEM, and CFA). As with anything you read or purchase, check the company out. For example, we have Ph.D.s (I earned mine from Miami University in Ohio) and have been through the rigorous process ourselves, we do our work in-house, we have references from previous students, and we are registered with the BBB (with an A+ rating).

I hope this post was helpful and that you pass it along to colleagues who might find it interesting. If you'd like more information about our services, you can contact us at http://www.statisticssolutions.com/contact.

Happy Learning!

Dr. James Lani, Ph.D.
CEO, Statistics Solutions

Wednesday, February 3, 2010

Level of measurement

Levels of measurement are classified into four basic categories. It is important for the researcher to understand a variable's level of measurement because it determines which arithmetic and statistical operations can meaningfully be performed on the data.

Statistics Solutions is the country's leader in dissertation statistics, including questions about levels of measurement. Contact Statistics Solutions today for a free 30-minute consultation.

Ordered from least to most precise, the four levels of measurement are the nominal, ordinal, interval, and ratio scales.

The first of the four levels of measurement is the nominal level. At this level, numbers, words, or letters are used only as labels to classify the data into categories. Suppose the data contain two categories of students, weak students and strong students. Using this level of measurement, the researcher can label the weak students with the letter ‘W’ and the strong students with the letter ‘S.’ This assignment of labels to distinguish the categories is the nominal level of measurement.

The second level of measurement is the ordinal level. This level involves measurements that indicate an ordered relationship among the items. If four teams compete in a tournament, the team that beats all three other teams wins and is assigned the first rank. The team performing right below the first team is assigned the second rank, and so on. Thus, this level of measurement indicates the proper ordering of the measurements, but not how far apart they are. The researcher should note that at this level, the difference between any two adjacent ranks does not necessarily remain the same along the scale: the first-place team may be far ahead of the second, while the second and third are nearly tied.

The next level of measurement is the interval level. At this level, the measurements are categorized and ordered, and in addition, the distances between adjacent points on the scale are equal along the entire scale, from low to high. For example, on an anxiety scale treated as interval-level, the difference between scores of 10 and 11 is the same as the difference between scores of 40 and 41. Another appropriate example is temperature measured in degrees centigrade (Celsius): the distance between 94°C and 96°C is the same as the distance between 100°C and 102°C.

The last level of measurement is the ratio level. At this level, the scale has a true zero point: a value of zero indicates a complete absence of the quantity being measured, which is unlike the other levels of measurement. Otherwise, the ratio level has the same properties as the interval level: the points on the scale are equally spaced, and the items are ordered according to their size. Because of the true zero, ratios are meaningful at this level; a weight of 20 kg is twice a weight of 10 kg.

The researcher should note that among these levels of measurement, the nominal level is used only to classify the data, whereas the interval and ratio levels are much more precise.
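The practical consequence of these levels is which summary statistics make sense. The sketch below is a minimal illustration using Python's standard library and the common textbook convention (mode for nominal data, median from the ordinal level up, mean from the interval level up); the helpers `summarize` and `celsius_to_kelvin` are hypothetical. It also shows why ratio statements need a true zero: a ratio of Celsius temperatures is not meaningful, while a ratio of Kelvin temperatures is.

```python
from statistics import mean, median, mode

# Ordered from least to most precise, as in the text above.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]


def summarize(values, level):
    """Return the measures of central tendency that are meaningful at `level`."""
    rank = LEVELS.index(level)
    summary = {"mode": mode(values)}          # any level: most frequent category
    if rank >= LEVELS.index("ordinal"):
        summary["median"] = median(values)    # requires a meaningful order
    if rank >= LEVELS.index("interval"):
        summary["mean"] = mean(values)        # requires equal intervals
    return summary


def celsius_to_kelvin(c):
    """Kelvin has a true zero; Celsius does not."""
    return c + 273.15


print(summarize(["W", "S", "S"], "nominal"))  # {'mode': 'S'}
print(summarize([1, 2, 2, 3], "ordinal"))     # {'mode': 2, 'median': 2.0}
print(summarize([10, 11, 11, 40], "interval"))

# 20 °C is not "twice as hot" as 10 °C; the Kelvin ratio is about 1.035.
print(celsius_to_kelvin(20) / celsius_to_kelvin(10))
```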

Monday, February 1, 2010

Dissertation Statistics Help

If you are a doctoral student who has started your dissertation, you know that the road ahead of you is long and difficult. In fact, chances are that you have already been overwhelmed by the mere thought of working on and finishing your dissertation. You are not alone, however: most students who must write a dissertation have had the same feelings of dread, anxiety, and panic.

There is, of course, a way to deal with this dread, anxiety, and panic: getting dissertation statistics help. Dissertation statistics help is a service provided by dissertation consultants, and it can make every step of your dissertation easier and more manageable. It offers each student individual, personalized help and ensures that the student finishes with accuracy, timeliness, and success.

Statistics Solutions is the country's leader in dissertation statistics help. Contact Statistics Solutions today for a free 30-minute consultation.

Dissertation statistics help is becoming more and more popular as more people pursue doctoral degrees. It has also become popular because it has become apparent that many students are not trained in the difficult field of statistics. Students are trained in their own area of study, yet they often have not received proper training in statistics. Dissertation statistics help steps into this void by instructing students in statistics, one-on-one, for every student who seeks it. This training is quite possibly the most valuable service that dissertation statistics help provides, because the student will eventually have to defend his or her dissertation, and a student who does not understand its statistics will not pass the oral defense. Thus, dissertation statistics help provides all the training the student needs to pass the oral defense.

Before the oral defense, of course, the student must complete the dissertation, and dissertation statistics help makes this easier and more understandable by providing insight into every step of the statistics involved. The first step is to make sure that the topic is valid and can be studied. Dissertation statistics help will confirm that the student's topic can indeed be studied statistically, and will assist the student if the topic cannot be studied and measured statistically. Dissertation statistics help will then guide the student through every step of the dissertation statistics: collecting the data (which can be lengthy and difficult if not done correctly), interpreting the results (which again can be lengthy and difficult if not done correctly), and applying those results to the dissertation or thesis. Once all of this is complete, dissertation statistics help will also edit and proofread the entire dissertation to make sure that the student succeeds when submitting it for approval.

Without question, dissertation statistics help is the best way to expedite the process of writing a dissertation. With dissertation statistics help assisting him or her every single step of the way, the student is guaranteed to succeed.