
Friday, June 26, 2009

Multicollinearity

The term multicollinearity was first used by Ragnar Frisch. Multicollinearity means that there is a perfect or near-perfect linear relationship among the explanatory variables in a regression. Linear regression analysis assumes that no exact linear relationship exists among the explanatory variables; when this assumption is violated, the problem of multicollinearity occurs.

Statistics Solutions is the country's leader in dissertation statistical consulting and can assist with your regression analysis. Contact Statistics Solutions today for a free 30-minute consultation.

In regression analysis, multicollinearity is usually described in terms of its degree:

1. No multicollinearity: The explanatory variables have no relationship with each other.
2. Low multicollinearity: There is a relationship among the explanatory variables, but it is very weak.
3. Moderate multicollinearity: The relationship among the explanatory variables is moderate.
4. High multicollinearity: The relationship among the explanatory variables is strong, though not exact.
5. Very high (perfect) multicollinearity: The relationship among the explanatory variables is exact. This is the most serious case and should be dealt with before the regression analysis is conducted.

Many factors can give rise to multicollinearity. It may be introduced during the data collection process, or it may result from the wrong specification of the model. For example, if we take income and house size as explanatory variables in our model, the model will suffer from multicollinearity because income and house size are highly correlated. Multicollinearity may also occur when too many explanatory variables are included in the regression.

Consequences of multicollinearity: If the data suffer from perfect or near-perfect multicollinearity, the following problems arise (a brief simulated illustration follows this list):

1. The variances and covariances of the coefficient estimates become inflated, which makes it difficult to reach a statistical decision about the null and alternative hypotheses.
2. Because the standard errors are larger, the confidence intervals become wider. As a result, we may fail to reject a null hypothesis that should be rejected.
3. The inflated standard errors make the t-test values smaller, again leading us to fail to reject null hypotheses that should be rejected.
4. The R-square of the model can remain high even though few coefficients are individually significant, which gives a misleading impression of the goodness of fit of the model.
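
As a rough illustration of these consequences, here is a minimal simulation sketch (the data and variable names are made up, not from the original post): two nearly identical predictors inflate the standard errors of the coefficient estimates while the R-square stays high.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200

x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=n)   # x2 is nearly a copy of x1
y = 1.0 + 2.0 * x1 + 2.0 * x2 + rng.normal(size=n)

# Model with both collinear predictors versus a model with only one of them.
both = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
single = sm.OLS(y, sm.add_constant(x1)).fit()

print(both.bse)        # standard errors are much larger in the collinear model
print(single.bse)
print(both.rsquared)   # R-square remains high even though the t ratios shrink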

Detection of multicollinearity: The following methods indicate the presence of multicollinearity:

1. In regression analysis, when the R-square of the model is very high but there are very few significant t ratios, this indicates multicollinearity in the data.
2. High correlation between explanatory variables also indicates the problem of multicollinearity.
3. Tolerance and variance inflation factor (VIF): In regression analysis, the VIF of an explanatory variable is one divided by one minus the R-square obtained from regressing that variable on the other explanatory variables, i.e. VIF = 1 / (1 − R²). As the correlation among the regressor variables increases, the VIF also increases, and a larger VIF signals multicollinearity. The reciprocal of the VIF is called the tolerance, so the VIF and the tolerance carry the same information in inverse form. A sketch of how to compute these quantities follows this list.
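
As a sketch of how the tolerance and VIF might be checked in practice, the snippet below uses simulated income and house-size data (the column names are illustrative, not from the original post) together with statsmodels' variance_inflation_factor; it is one possible workflow, not the only one.

import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

rng = np.random.default_rng(1)
df = pd.DataFrame({"income": rng.normal(50, 10, 100),
                   "age": rng.normal(40, 5, 100)})
df["house_size"] = 0.9 * df["income"] + rng.normal(0, 2, 100)  # strongly tied to income

X = add_constant(df)  # include an intercept so the VIFs are computed correctly
for i, col in enumerate(X.columns):
    if col == "const":
        continue
    vif = variance_inflation_factor(X.values, i)
    print(f"{col}: VIF = {vif:.2f}, tolerance = {1.0 / vif:.3f}")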

Remedial measures for multicollinearity: In regression analysis, the first step is to detect multicollinearity. If it is present in the data, the problem can be addressed in several ways. One option is to drop the variable responsible for the multicollinearity, bearing in mind the risk of specification bias. Combining cross-sectional data with time-series data can also reduce multicollinearity. High multicollinearity can sometimes be removed by transforming the variables, for example by taking first or second differences. Adding new data (increasing the sample size) can help as well. In multivariate analysis, multicollinearity can be removed by replacing the collinear variables with a common score; in factor analysis, principal component analysis is used to derive such a common score. A rule of thumb for detecting multicollinearity is that when the VIF is greater than 10, there is a multicollinearity problem. A sketch of the principal-component approach appears below.
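
The following is a minimal sketch of that principal-component remedy, using simulated income and house-size data (the variables and coefficients are hypothetical): the two collinear predictors are replaced by their first principal component, which then enters the regression as a single score.

import numpy as np
import statsmodels.api as sm
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
n = 150
income = rng.normal(size=n)
house_size = 0.9 * income + 0.1 * rng.normal(size=n)   # nearly collinear with income
y = 3.0 * income + rng.normal(size=n)

# Standardize the collinear block and keep its first principal component as a score.
block = StandardScaler().fit_transform(np.column_stack([income, house_size]))
score = PCA(n_components=1).fit_transform(block)

model = sm.OLS(y, sm.add_constant(score)).fit()
print(model.params, model.bse)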

Contact Statistics Solutions today for more information on multicollinearity.

Monday, May 25, 2009

Multicollinearity

Multicollinearity describes the situation in which the independent variables in the data are highly correlated with one another. It is therefore considered a disturbance, because it makes the regression estimates unstable.

Contact Statistics Solutions today for assistance with identifying multicollinearity in data.

There are several reasons why multicollinearity arises in data.

Multicollinearity can occur through the improper use of dummy variables; an inexperienced researcher can easily introduce it into the data this way.

If the researcher includes a variable that is computed from other variables in the equation, this causes multicollinearity. For example, if the family's health is equal to the husband's health + wife's health + child's health, and the regression includes all four health variables, the result is perfect multicollinearity (see the illustration below).
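
A small illustration of this situation, with made-up health variables, shows that including the computed total makes the design matrix rank deficient, i.e. perfectly collinear:

import numpy as np

rng = np.random.default_rng(3)
n = 50
husband = rng.normal(size=n)
wife = rng.normal(size=n)
child = rng.normal(size=n)
family = husband + wife + child          # exact linear combination of the other three

X = np.column_stack([np.ones(n), husband, wife, child, family])
print(np.linalg.matrix_rank(X))          # prints 4, not 5: one column is redundant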

If a researcher includes essentially the same variable twice in an experiment, this also causes multicollinearity. For example, if two models of Nokia phones are included as different variables in the study, multicollinearity is introduced into the data.

There are certain outcomes of multicollinearity.

The researcher should note that as the level of multicollinearity increases, the standard errors get larger and larger. When there is high multicollinearity in the data, the confidence intervals for the coefficients tend to be extremely wide and the t-statistics tend to be very small. The coefficients must therefore be larger in order to reach statistical significance; in other words, in the presence of multicollinearity the null hypothesis is harder to reject.

If the value of the tolerance is close to one, there is very little multicollinearity. On the other hand, if the value of the tolerance is close to zero, multicollinearity is very high and, in that case, is considered a threat.

The reciprocal of the tolerance is known as the variance inflation factor (VIF). The VIF shows how much the variance of a coefficient estimate is inflated by multicollinearity.
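
To make the tolerance-VIF relationship concrete, here is a small sketch (with simulated predictors, not data from the post) that computes both quantities by hand: the R-square from regressing one predictor on the others gives the tolerance as 1 − R², and the VIF as its reciprocal.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 100
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)  # correlated with x1
x3 = rng.normal(size=n)

# Auxiliary regression of x1 on the remaining predictors.
aux = sm.OLS(x1, sm.add_constant(np.column_stack([x2, x3]))).fit()
tolerance = 1.0 - aux.rsquared
vif = 1.0 / tolerance
print(f"tolerance = {tolerance:.3f}, VIF = {vif:.2f}")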

Multicollinearity is not an all-or-nothing condition but a matter of degree. Its presence can be detected with the help of certain warning signals.

If the overall F statistic is statistically significant but few or none of the individual t ratios are, this indicates that there is multicollinearity in the data.

It is important to check the stability of the coefficients when two different samples are used. If the coefficients differ quite significantly, this indicates that there is multicollinearity in the data.

If the coefficients change sign, or change substantially when variables are added or dropped, this also signals the presence of multicollinearity.

In order to address the problem of multicollinearity, one should first make sure that dummy variables are used properly.

Increasing the sample size beyond the originally planned size decreases the standard errors and thereby lessens the severity of the multicollinearity problem.

It is sometimes suggested that the researcher drop the variable that is causing multicollinearity. The researcher should keep in mind, however, that dropping the most important variable would cause a specification error, which is even worse than multicollinearity.

The most important thing for obtaining valid inferences about the data is to recognize the presence of multicollinearity. Additionally, a researcher should be aware of its consequences.