Correlation vs. Regression vs. Mean Differences
- Inferential (parametric and non-parametric) statistics are conducted when the goal of the research is to draw conclusions about the statistical significance of the relationships and/or differences among variables of interest.
- The “relationships” can be tested in different statistical ways, depending on the goal of the research. The three most common meanings of “relationship” between/among variables are:
1. Strength, or association, between variables = e.g., Pearson & Spearman rho correlations
2. Statistical differences on a variable by group(s) = e.g., t tests and ANOVA
3. Statistical contribution/prediction on a variable from another(s) = regression
- Correlations are the appropriate analyses when the goal of the research is to test the strength, or association, between two variables. There are two main types of correlations: the Pearson product-moment correlation, a.k.a. Pearson (r), and the Spearman rho (rs) correlation. A Pearson correlation is a parametric test that is appropriate when the two variables are continuous. As with all parametric tests, there are assumptions that must be met; for a Pearson correlation, these are linearity and homoscedasticity. A Spearman correlation is a non-parametric test that is appropriate when at least one of the variables is ordinal.
o E.g., a Pearson correlation is appropriate for two continuous variables: age and height.
o E.g., a Spearman correlation is appropriate for the variables age (continuous) and income level (under 25,000; 25,000 – 50,000; 50,001 – 100,000; above 100,000).
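As an illustrative sketch, both correlations can be computed in Python with SciPy. The data below are simulated for the example; the variable names and coding of income brackets (1–4) are our assumptions, not from the text:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(42)

# Two continuous variables (age and height) -> Pearson is appropriate.
age = rng.uniform(18, 65, size=100)
height = 150 + 0.3 * age + rng.normal(0, 5, size=100)  # simulated relationship
r, p_pearson = pearsonr(age, height)

# Continuous age vs. ordinal income bracket -> Spearman is appropriate.
# Brackets coded 1 = under 25,000 ... 4 = above 100,000 (assumed coding).
income_level = rng.integers(1, 5, size=100)
rho, p_spearman = spearmanr(age, income_level)

print(f"Pearson r = {r:.3f} (p = {p_pearson:.3f})")
print(f"Spearman rho = {rho:.3f} (p = {p_spearman:.3f})")
```

Spearman ranks the data before correlating, which is why it tolerates ordinal variables that Pearson does not.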
- To test for mean differences by group, there are a variety of analyses that can be appropriate. Three parametric examples will be given: the dependent sample t test, the independent sample t test, and the analysis of variance (ANOVA). The assumption of the dependent sample t test is normality; the assumptions of both the independent sample t test and the ANOVA are normality and equality of variance (a.k.a. homogeneity of variance).
o E.g., a dependent t test is appropriate for testing mean differences on a continuous variable by time on the same group of people: testing weight differences by time (year 1, before diet vs. year 2, after diet) for the same participants.
o E.g., an independent t test is appropriate for testing mean differences on a continuous variable between two independent groups: testing GPA scores by gender (males vs. females).
o E.g., an ANOVA is appropriate for testing mean differences on a continuous variable across more than two independent groups: testing IQ scores by college major (Business vs. Engineering vs. Nursing vs. Communications).
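The three examples above can be sketched in Python with SciPy; the data are simulated and the effect sizes are our assumptions for illustration only:

```python
import numpy as np
from scipy.stats import ttest_rel, ttest_ind, f_oneway

rng = np.random.default_rng(0)

# Dependent (paired) t test: same participants' weight before and after a diet.
weight_year1 = rng.normal(85, 10, size=30)
weight_year2 = weight_year1 - rng.normal(4, 2, size=30)  # assumed average loss
t_dep, p_dep = ttest_rel(weight_year1, weight_year2)

# Independent t test: GPA in two independent groups.
gpa_group1 = rng.normal(3.1, 0.4, size=40)
gpa_group2 = rng.normal(3.3, 0.4, size=40)
t_ind, p_ind = ttest_ind(gpa_group1, gpa_group2)

# One-way ANOVA: IQ across more than two independent groups (college majors).
iq_business = rng.normal(105, 12, size=25)
iq_engineering = rng.normal(110, 12, size=25)
iq_nursing = rng.normal(107, 12, size=25)
f_stat, p_anova = f_oneway(iq_business, iq_engineering, iq_nursing)

print(f"paired t = {t_dep:.2f}, independent t = {t_ind:.2f}, F = {f_stat:.2f}")
```

Note that the paired test operates on within-person differences, which is what distinguishes it from the independent test.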
- To test whether one or more variables offer a significant contribution to, or predict, another variable, a regression is appropriate. Three parametric examples will be given: simple linear regression, multiple linear regression, and binary logistic regression. The assumptions of a simple linear regression are linearity and homoscedasticity. The assumptions of a multiple linear regression are linearity, homoscedasticity, and the absence of multicollinearity. The assumption of binary logistic regression is the absence of multicollinearity.
o E.g., a simple linear regression is appropriate for testing if a continuous variable predicts another continuous variable: testing if IQ scores predict SAT scores.
o E.g., a multiple linear regression is appropriate for testing if more than one continuous variable predicts another continuous variable: testing if IQ scores and GPA scores predict SAT scores.
o E.g., a binary logistic regression is appropriate for testing if more than one variable (continuous or dichotomous) predicts a dichotomous variable: testing if IQ scores, gender, and GPA scores predict entrance to college (yes = 1 vs. no = 0).
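A minimal sketch of the first two regressions in Python, using simulated data with assumed coefficients; binary logistic regression is omitted here because it requires an iterative fitting routine (e.g., statsmodels' Logit or scikit-learn's LogisticRegression):

```python
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(1)

# Simple linear regression: do IQ scores predict SAT scores?
iq = rng.normal(100, 15, size=200)
sat = 400 + 8 * iq + rng.normal(0, 60, size=200)  # assumed true slope of 8
result = linregress(iq, sat)  # returns slope, intercept, rvalue, pvalue, stderr

# Multiple linear regression via ordinary least squares:
# do IQ and GPA together predict SAT?
gpa = rng.normal(3.0, 0.5, size=200)
X = np.column_stack([np.ones_like(iq), iq, gpa])  # intercept column first
coefs, *_ = np.linalg.lstsq(X, sat, rcond=None)   # [intercept, b_iq, b_gpa]

print(f"simple slope = {result.slope:.2f} (p = {result.pvalue:.4f})")
print(f"multiple coefficients = {np.round(coefs, 2)}")
```

Because GPA was simulated independently of SAT here, its estimated coefficient should hover near zero, while the IQ coefficient recovers the assumed slope.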
- With regard to the assumptions mentioned above:
o Linearity assumes a straight-line relationship between the variables.
o Homoscedasticity assumes that the spread of scores about the regression line is constant across all values of the predictor(s).
o Absence of multicollinearity assumes that the predictor variables are not too highly related to one another.
o Normality assumes that the dependent variable is normally distributed (a symmetrical, bell-shaped curve) within each group.
o Homogeneity of variance assumes that the groups have equal error variances.
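These assumptions are commonly screened before running the analyses. A sketch of three such checks in Python with SciPy and NumPy, on simulated data (the .80 correlation cutoff is one common rule of thumb, not a fixed standard):

```python
import numpy as np
from scipy.stats import shapiro, levene

rng = np.random.default_rng(7)
group_a = rng.normal(100, 15, size=50)
group_b = rng.normal(105, 15, size=50)

# Normality: Shapiro-Wilk test per group (p > .05 -> no evidence against normality).
w_a, p_norm_a = shapiro(group_a)

# Homogeneity of variance: Levene's test across groups.
stat, p_var = levene(group_a, group_b)

# Absence of multicollinearity: inspect pairwise correlations among predictors;
# a common rule of thumb flags |r| above about .80 as problematic.
x1 = rng.normal(0, 1, size=100)
x2 = 0.3 * x1 + rng.normal(0, 1, size=100)  # simulated mildly related predictors
r = np.corrcoef(x1, x2)[0, 1]

print(f"Shapiro-Wilk p = {p_norm_a:.3f}, Levene p = {p_var:.3f}, predictor r = {r:.3f}")
```

Linearity and homoscedasticity are usually judged from a scatterplot of the residuals rather than a single test statistic.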