Showing posts with label logistic regression. Show all posts

Monday, September 10, 2012

Binary Logistic Regression


  • Logistic regression is an extension of simple linear regression.
  • When the dependent variable is dichotomous (binary), simple linear regression is not appropriate. Logistic regression is the statistical technique used to model the relationship between the predictors (our independent variables) and a binary dependent variable (e.g., sex [male vs. female], response [yes vs. no], score [high vs. low]).
  • A logistic regression needs at least one independent variable (IV), or predictor, and usually includes two or more. The IVs can be continuous (interval/ratio) or categorical (ordinal/nominal).
  • All predictor variables are tested in one block to assess their predictive ability while controlling for the effects of other predictors in the model.
  • Assumptions for a logistic regression:
      1. adequate sample size (too few participants for too many predictors is bad!);
      2. absence of multicollinearity (multicollinearity = high intercorrelations among the predictors);
      3. no outliers.
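
The multicollinearity assumption above can be screened with pairwise correlations among the predictors. The following is only a minimal sketch: the function names, the sample data, and the 0.8 cutoff are illustrative assumptions, not part of the original post or a fixed rule.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def flag_collinear_pairs(predictors, threshold=0.8):
    """Return (name_a, name_b, r) for predictor pairs with |r| >= threshold.

    `predictors` maps a predictor name to its list of values.
    The 0.8 default is an illustrative assumption, not a fixed rule.
    """
    names = list(predictors)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(predictors[a], predictors[b])
            if abs(r) >= threshold:
                flagged.append((a, b, r))
    return flagged

# Hypothetical data: height_cm and height_in carry the same information.
data = {
    "age": [23, 35, 41, 29, 52, 47],
    "height_cm": [172, 160, 181, 190, 168, 175],
    "height_in": [67.7, 63.0, 71.3, 74.8, 66.1, 68.9],
}

for a, b, r in flag_collinear_pairs(data):
    print(f"{a} and {b}: r = {r:.3f}")
```

A flagged pair suggests dropping or combining one of the two predictors before fitting the model.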

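  • The statistic -2LogL (minus 2 times the log of the likelihood) is a badness-of-fit indicator; that is, large values mean poor fit of the model to the data.
  • In large samples, the difference between two values of -2LogL is distributed as chi-square:

$$\chi^2 = -2\ln(\text{likelihood}_R) - \left[-2\ln(\text{likelihood}_F)\right] = -2\ln\left(\frac{\text{likelihood}_R}{\text{likelihood}_F}\right)$$

where likelihoodR is for a restricted, or smaller, model and likelihoodF is for a full, or larger, model. The degrees of freedom equal the number of additional parameters in the full model.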
  • LikelihoodF has all the parameters of interest.
  • LikelihoodR is nested in the larger model (nested = all of its terms also occur in the larger model; this is a necessary condition for model comparison tests).
  • A nested model cannot contain any IV, categorical or continuous, that is not also contained in the full model. If it does, it is no longer nested, and we cannot compare the two values of -2LogL to get a chi-square value.
  • The chi-square is used to test statistically whether including a variable reduces the badness-of-fit measure.
  • If chi-square is significant, the variable is considered to be a significant predictor in the equation.
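
The chi-square difference test described in the bullets above can be sketched in a few lines. This is a hedged illustration only: the function names and the two -2LogL values are hypothetical, and the p-value comes from the closed-form finite series for a chi-square variable with integer degrees of freedom rather than from any particular statistics package.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable
    with integer df >= 1, using the closed-form finite series."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction series.
    tail = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)  # r = 1 term
    for r in range(1, (df - 1) // 2 + 1):
        tail += term
        term *= x / (2 * r + 1)  # advance to the next term of the series
    return tail

def likelihood_ratio_test(neg2ll_restricted, neg2ll_full, extra_params):
    """Chi-square difference test for two nested models.

    extra_params is the number of parameters the full model adds
    to the restricted model (the degrees of freedom of the test).
    """
    chi_square = neg2ll_restricted - neg2ll_full
    return chi_square, chi2_sf(chi_square, extra_params)

# Hypothetical -2LogL values: the full model adds one predictor.
chi_square, p_value = likelihood_ratio_test(110.5, 104.2, extra_params=1)
print(f"chi-square = {chi_square:.2f}, p = {p_value:.4f}")
```

The comparison is only meaningful when the restricted model is nested in the full model, as the bullets above require.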

Friday, June 12, 2009

Logistic Regression

Logistic regression is an extension of multiple linear regression to the case where the dependent variable is binary in nature. Logistic regression predicts a discrete outcome, such as group membership, from a set of variables that may be continuous, discrete, dichotomous, or a mix of these. Logistic regression is also an alternative to discriminant analysis, which likewise predicts the group membership of the dependent variable. However, discriminant analysis assumes that the predictors are normally distributed, linearly related, and of equal variance within each group, and these assumptions are often not met. Logistic regression makes no assumption of a normal distribution, of equal variance, or of a linear relationship between the predictors and the dependent variable itself (the relationship is linear only on the logit scale). Like multiple linear regression, logistic regression can include many independent variables.


The model:

In logistic regression, the dependent variable is dichotomous: it takes the value 1 with the probability of success q, or the value 0 with the probability of failure 1 - q. When the dependent variable has two categories, the model is a binary logistic regression. When it has more than two categories, the model is a form of multinomial logistic regression. Symbolically, the probability of success can be written using the following formula:

$$q = \frac{e^{\alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k}}{1 + e^{\alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k}}$$

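Where α = the constant of the equation, β = the coefficient of a predictor variable (one β for each predictor), and x = the value of that predictor. An alternative form of the logistic regression model can be represented as the following:

$$\operatorname{logit}(q) = \ln\left(\frac{q}{1 - q}\right) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$$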

Logistic regression has two main uses. First, it predicts group membership. Second, it tells us about the relationships among the variables and their strength.
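
The two forms of the model and both uses can be illustrated with a short sketch. The coefficient values, the function names, and the 0.5 cutoff for assigning group membership are assumptions made up for this example; they do not come from a fitted model.

```python
import math

def linear_predictor(alpha, betas, xs):
    """alpha + beta_1*x_1 + ... + beta_k*x_k."""
    return alpha + sum(b * x for b, x in zip(betas, xs))

def probability(alpha, betas, xs):
    """Probability of success q under the logistic model."""
    z = linear_predictor(alpha, betas, xs)
    # 1 / (1 + e^-z) is algebraically the same as e^z / (1 + e^z).
    return 1.0 / (1.0 + math.exp(-z))

def logit(q):
    """Log odds of q; the alternative form of the model."""
    return math.log(q / (1 - q))

def predict_group(alpha, betas, xs, cutoff=0.5):
    """Assign group 1 when q reaches the cutoff (0.5 is an assumption)."""
    return 1 if probability(alpha, betas, xs) >= cutoff else 0

# Hypothetical coefficients for two predictors.
alpha, betas = -1.5, [0.8, 0.05]
case = [1, 20]

q = probability(alpha, betas, case)
print(f"q = {q:.3f}, logit(q) = {logit(q):.3f}")
print("predicted group:", predict_group(alpha, betas, case))

# Strength of a relationship: exp(beta) is the multiplicative change
# in the odds of success for a one-unit increase in that predictor.
print(f"odds ratio for the first predictor = {math.exp(betas[0]):.3f}")
```

Taking the logit of the predicted probability recovers the linear predictor, which is why the two forms of the model are equivalent.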

Test statistics in logistic regression:

1. Wald statistic: In logistic regression, the Wald statistic is used to test the significance of each variable. It is simply the Z statistic, the estimated coefficient divided by its standard error:

$$Z = \frac{\hat{\beta}}{SE(\hat{\beta})}$$

After squaring, the Z value follows the chi-square distribution with one degree of freedom. With a small sample size, the likelihood ratio test is more suitable than the Wald statistic in logistic regression.
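
As a rough sketch of the Wald test, the snippet below computes Z, its square, and a two-sided p-value from the standard normal distribution. The coefficient and standard error are hypothetical numbers, and `wald_test` is a name chosen for this example.

```python
import math

def wald_test(coefficient, standard_error):
    """Return (z, chi_square, p_value) for a single coefficient."""
    z = coefficient / standard_error
    chi_square = z ** 2
    # Two-sided normal p-value; identical to the upper tail of a
    # chi-square distribution with 1 degree of freedom at z squared.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, chi_square, p_value

# Hypothetical estimate and standard error.
z, chi_square, p_value = wald_test(0.84, 0.35)
print(f"Z = {z:.2f}, chi-square = {chi_square:.2f}, p = {p_value:.4f}")
```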

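2. Likelihood ratio: The likelihood ratio test compares the maximized value of the likelihood function for a reduced model with that for the full model. Symbolically it is as follows:

$$D = -2\ln\left(\frac{\text{likelihood}_R}{\text{likelihood}_F}\right)$$

where likelihoodR is the maximized likelihood of the reduced model and likelihoodF is that of the full model. After this log transformation, the likelihood ratio statistic follows the chi-square distribution. In logistic regression, the likelihood ratio test is the suggested test of significance when we are using backward stepwise elimination.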
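
A small worked example may make the likelihood ratio statistic concrete. It uses hypothetical counts for a single binary predictor, a case chosen because both maximized likelihoods have closed forms: the intercept-only model fits one overall proportion, and the model with the binary predictor fits each group's own proportion. The counts and names are assumptions for illustration.

```python
import math

def bernoulli_log_likelihood(successes, failures):
    """Maximized log-likelihood when one probability is fitted to a group."""
    n = successes + failures
    ll = 0.0
    if successes:
        ll += successes * math.log(successes / n)
    if failures:
        ll += failures * math.log(failures / n)
    return ll

# Hypothetical (successes, failures) counts for each value of a binary predictor x.
counts = {0: (10, 30), 1: (25, 15)}

# Reduced model (intercept only): one shared probability for everyone.
total_successes = sum(s for s, f in counts.values())
total_failures = sum(f for s, f in counts.values())
ll_reduced = bernoulli_log_likelihood(total_successes, total_failures)

# Full model (intercept + x): fitted probabilities equal the group proportions.
ll_full = sum(bernoulli_log_likelihood(s, f) for s, f in counts.values())

# -2 ln(likelihood_R / likelihood_F) = -2 * (ll_reduced - ll_full)
lr_statistic = -2 * (ll_reduced - ll_full)

# The full model adds one parameter, so compare with chi-square, 1 df.
p_value = math.erfc(math.sqrt(lr_statistic / 2))
print(f"likelihood ratio statistic = {lr_statistic:.2f}, p = {p_value:.4f}")
```

With continuous predictors the likelihoods have no closed form and must be maximized numerically, which is what statistical software does.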

3. Goodness of fit: In logistic regression, goodness of fit is measured by the Hosmer-Lemeshow test statistic. This statistic compares the observed and predicted frequencies of the outcome across groups of observations to assess how well the model fits the data.
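
The Hosmer-Lemeshow comparison of observed and predicted frequencies can be sketched as follows. This is a simplified illustration, assuming equal-sized groups formed by sorting on the predicted probability; the data and the function name are hypothetical, and statistical packages may form the groups slightly differently.

```python
def hosmer_lemeshow(probabilities, outcomes, groups=10):
    """Return (statistic, df) for a Hosmer-Lemeshow style test.

    Observations are sorted by predicted probability and split into
    `groups` chunks of nearly equal size. The statistic is compared
    with a chi-square distribution on groups - 2 degrees of freedom.
    """
    pairs = sorted(zip(probabilities, outcomes), key=lambda pair: pair[0])
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        observed = sum(y for _, y in chunk)   # observed successes
        expected = sum(p for p, _ in chunk)   # predicted successes
        mean_p = expected / size
        variance = size * mean_p * (1 - mean_p)
        if variance > 0:                      # skip degenerate groups
            statistic += (observed - expected) ** 2 / variance
    return statistic, groups - 2

# Hypothetical predicted probabilities and observed 0/1 outcomes.
probs = [0.1] * 5 + [0.3] * 5 + [0.6] * 5 + [0.9] * 5
observed = [0, 0, 0, 1, 0,  0, 1, 0, 0, 1,  1, 1, 0, 1, 0,  1, 1, 1, 1, 1]

statistic, df = hosmer_lemeshow(probs, observed, groups=4)
print(f"Hosmer-Lemeshow statistic = {statistic:.3f} on {df} df")
```

A small statistic relative to the chi-square distribution indicates that predicted and observed frequencies agree, that is, no evidence of poor fit.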

Logistic regression and statistical software: Most statistical software, such as SPSS, Stata, SAS, and MATLAB, can perform logistic regression. SAS has a dedicated procedure for it. SPSS is GUI software and offers logistic regression through its menus. To perform logistic regression in SPSS, open the Analyze menu and select the binary logistic option under Regression. If the dependent variable has more than two categories, select the multinomial logistic option instead. If the categories of the dependent variable are ordered, select the ordinal option. In the binary logistic dialog, enter the binary variable as the dependent variable and the remaining variables, whether continuous or dichotomous, as the independent variables. After selecting the dependent and independent variables, select the method used to enter them into the model. Both backward and forward stepwise methods are available.