The term "multiple regression" was first used by Pearson in 1908. Multiple regression is a statistical technique used to evaluate and establish a quantitative relationship between a dependent variable and multiple independent variables. In simple regression, only a single dependent variable can be regressed on a single independent variable. In multiple regression, however, a number of variables, both metric and non-metric, can be involved and regressed on one another. Multiple regression, like other statistical techniques, requires that certain assumptions hold in order for the analysis to be valid. These assumptions are:
1. The independent variable(s) should be fixed where repeated sampling is involved. This implies that while the dependent variable can change as a treatment is applied, the values of the independent variable(s) are held constant across repeated samples.
2. The variance of the error terms, or residuals, should be constant across all observations (homoscedasticity).
3. There should be no autocorrelation between the error terms across observations. The existence of autocorrelation can be tested with the Runs test or the Durbin-Watson test. The two tests detect autocorrelation differently but are generally considered equally acceptable, with some scholars preferring the latter (a worked check appears in the sketch after this list).
4. The number of observations must be greater than the number of parameters to be estimated.
5. There should not be a perfectly linear relationship among the explanatory or independent variables. Where there is, the confidence intervals become wider, increasing the possibility that a hypothesis which should be rejected is instead accepted. This issue is called multicollinearity and concerns the lack of independence among the independent variables. Its presence can be tested with the VIF (Variance Inflation Factor), computed for each variable as 1 divided by (1 minus the R-squared obtained from regressing that variable on the remaining independent variables); this check is also demonstrated in the sketch after this list. Where multicollinearity exists, the problem has to be addressed in the dataset in question. There are a number of ways this can be accomplished. A common method is either to drop the offending variable altogether or to add more cases for the problematic variable(s). Another effective method is to use factor scores based on factor analysis, which combines the correlated variables and produces a more valid result.
6. When performing multiple regression, the resulting error term should have a mean value of zero. A residual mean that departs from zero indicates that the regression output has not fully captured the relationship between the variables in question and that the model can be substantially improved.
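To show how some of these assumptions can be checked in practice, the following is a minimal sketch in Python using statsmodels; the simulated DataFrame `df` and its column names (`x1`, `x2`, `y`) are hypothetical placeholders rather than part of any particular study. The Durbin-Watson statistic addresses assumption 3, the VIF values address assumption 5, and the residual mean addresses assumption 6.

```python
# Minimal sketch: checking assumptions 3, 5 and 6 with statsmodels.
# The simulated DataFrame `df` and its column names are hypothetical placeholders.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(size=100), "x2": rng.normal(size=100)})
df["y"] = 2 + 1.5 * df["x1"] - 0.8 * df["x2"] + rng.normal(size=100)

X = sm.add_constant(df[["x1", "x2"]])      # design matrix with an intercept
fit = sm.OLS(df["y"], X).fit()

print("residual mean:", fit.resid.mean())           # assumption 6: should be close to zero
print("Durbin-Watson:", durbin_watson(fit.resid))   # assumption 3: values near 2 suggest no autocorrelation
for i, name in enumerate(X.columns[1:], start=1):   # assumption 5: VIF per predictor
    print(name, "VIF:", variance_inflation_factor(X.values, i))
```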
The use of multiple regression as a statistical technique involves estimating coefficients for the various variables in question. There are two key methods of estimation, distinguished by whether the multiple regression is linear or non-linear, although the latter method listed below can be used in either case:
- Ordinary least squares (OLS): This method was propounded by the German mathematician Carl Friedrich Gauss. It is a point estimation technique, meaning that the coefficients are estimated as single values rather than as intervals. The method cannot be used for non-linear multiple regression unless the data are transformed to become linear. OLS as a technique is based on the principle of minimizing the sum of squared error terms, as opposed to the maximum likelihood method, which is based on probability analysis.
- Maximum likelihood method: This too is a point estimation method, but it does not require that the data have a linear relationship. In this method, the error term need not be normally distributed, provided its distribution is specified. The technique relies on probability (the likelihood) as the measure of how well the model fits the data. It is more mathematically intensive, so before the advent of computers most researchers preferred OLS; nowadays, with computing power readily available, the method is easy to apply. Both estimators are illustrated in the sketch after this list.
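To make the contrast concrete, here is a minimal sketch, assuming simulated data, that computes the OLS coefficients from the normal equations and the maximum likelihood estimates by numerically maximizing a Gaussian log-likelihood with scipy; all variable names are hypothetical. Under normally distributed errors the two sets of estimates coincide, which is one reason OLS is the default choice in the linear case.

```python
# Minimal sketch: OLS via the normal equations versus Gaussian maximum likelihood.
# The simulated data and variable names are hypothetical placeholders.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + one predictor
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=n)

# OLS: minimise the sum of squared errors; closed form (X'X)^-1 X'y
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# ML: maximise the normal log-likelihood over (beta, log sigma)
def neg_log_lik(params):
    beta, sigma = params[:-1], np.exp(params[-1])
    resid = y - X @ beta
    return 0.5 * n * np.log(2 * np.pi * sigma**2) + 0.5 * np.sum(resid**2) / sigma**2

beta_ml = minimize(neg_log_lik, np.zeros(X.shape[1] + 1)).x[:-1]

print(beta_ols)  # OLS estimates
print(beta_ml)   # ML estimates; with normal errors the two coincide
```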
A key advantage of multiple regression, besides being able to use multiple variables, is the ability to use multiple types of variables. For instance, a metric or numerical variable can be regressed on a non-metric or string variable, and vice versa. In addition, combinations of metric and non-metric variables can be regressed on metric and non-metric variables. Depending on the specific kinds of variables in question, different techniques such as discriminant analysis, logistic regression or SEM (Structural Equation Modeling) can be applied.
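As a brief illustration of mixing variable types in an ordinary regression, the sketch below dummy-codes a string variable alongside a numeric one using the statsmodels formula interface; the DataFrame and its column names (`income`, `region`, `spend`) are hypothetical.

```python
# Minimal sketch: regressing a metric outcome on one metric and one non-metric predictor.
# The DataFrame and column names (income, region, spend) are hypothetical placeholders.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
df = pd.DataFrame({
    "income": rng.normal(50, 10, size=120),                      # metric predictor
    "region": rng.choice(["north", "south", "west"], size=120),  # non-metric predictor
})
df["spend"] = 5 + 0.4 * df["income"] + rng.normal(size=120)

# C() asks the formula interface to dummy-code the string variable automatically
fit = smf.ols("spend ~ income + C(region)", data=df).fit()
print(fit.params)
```

If the dependent variable itself were non-metric, a technique such as logistic regression or discriminant analysis would replace ordinary least squares, as noted above.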