Correlation vs. Regression vs. Mean Differences
- Inferential (parametric and non-parametric) statistics are conducted when the goal of the research is to draw conclusions about the statistical significance of the relationships and/or differences among variables of interest.
- The “relationships” can be tested statistically in different ways, depending on the goal of the research. The three most common meanings of “relationship” between/among variables are:
1. Statistical strength of association between two variables = correlation
2. Statistical differences on a continuous variable by group(s) = e.g., t-test and ANOVA
3. Statistical contribution/prediction on a variable from another(s) = regression
- Correlations are the appropriate analyses when the goal of the research is to test the strength of the association between two variables. There are two main types of correlations: Pearson product-moment correlations, a.k.a. Pearson (r), and Spearman rho (rs) correlations. A Pearson correlation is a parametric test that is appropriate when the two variables are continuous. As with all parametric tests, there are assumptions that need to be met; for a Pearson correlation these are linearity and homoscedasticity. A Spearman correlation is a non-parametric test that is appropriate when at least one of the variables is ordinal.
o E.g., a Pearson correlation is appropriate for the two continuous variables: age and height.
o E.g., a Spearman correlation is appropriate for the variables: age (continuous) and income level (under 25,000, 25,000 – 50,000, 50,001 – 100,000, above 100,000).
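The two coefficients described above can be sketched from first principles. This is a minimal illustration using only the Python standard library, not a replacement for a statistics package; the function names (pearson_r, average_ranks, spearman_rho) and all of the data values are hypothetical and chosen only for this example. Spearman's rho is computed here as a Pearson correlation on the ranks, with tied values given their average rank.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares about the means
    sp_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp_xy / sqrt(ss_x * ss_y)


def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rho: the Pearson correlation of the rank-transformed data."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: two continuous variables (Pearson)
child_age = [6, 8, 10, 12, 14, 16]
height_cm = [115, 128, 138, 149, 163, 170]
print("Pearson r:", pearson_r(child_age, height_cm))

# Hypothetical data: continuous age and ordinal income level coded 1-4 (Spearman)
adult_age = [23, 35, 41, 52, 60, 29]
income_level = [1, 2, 2, 3, 4, 1]
print("Spearman rho:", spearman_rho(adult_age, income_level))
```

The sketch returns only the coefficient. Testing its statistical significance requires a p-value, which a statistics package would normally supply.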
- To test for mean differences by group, there are a variety of analyses that can be appropriate. Three parametric examples will be given: the dependent sample t test, the independent sample t test, and the analysis of variance (ANOVA). The assumption of the dependent sample t test is normality. The assumptions of the independent sample t test are normality and equality of variance (a.k.a. homogeneity of variance). The assumptions of an ANOVA are likewise normality and equality of variance.
o E.g., a dependent t-test is appropriate for testing mean differences on a continuous variable by time on the same group of people: testing weight differences by time (year 1, before diet vs. year 2, after diet) for the same participants.
o E.g., an independent t-test is appropriate for testing mean differences on a continuous variable by two independent groups: testing GPA scores by gender (males vs. females).
o E.g., an ANOVA is appropriate for testing mean differences on a continuous variable by a grouping variable with more than two independent groups: testing IQ scores by college major (Business vs. Engineering vs. Nursing vs. Communications).
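To make the three mean-difference tests concrete, the sketch below computes each test statistic with the Python standard library. It is an illustration under stated assumptions: the function names and all data values are hypothetical, the independent t statistic uses the pooled-variance (equal variance) form that matches the homogeneity assumption named above, and no p-values are computed, since those need the t and F distributions from a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def dependent_t(before, after):
    """Paired t statistic: mean difference divided by its standard error (df = n - 1)."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    return mean(diffs) / sqrt(variance(diffs) / n)


def independent_t(group_a, group_b):
    """Pooled-variance t statistic for two independent groups (df = n_a + n_b - 2)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))


def one_way_anova_f(*groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((len(g) - 1) * variance(g) for g in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical data: weight (kg) for the same participants before and after a diet
weight_year1 = [82.0, 91.5, 77.3, 88.1, 95.0]
weight_year2 = [79.5, 89.0, 77.0, 84.2, 91.8]
print("dependent t:", dependent_t(weight_year1, weight_year2))

# Hypothetical data: GPA for two independent groups
gpa_group_1 = [3.1, 2.8, 3.5, 3.0, 3.3]
gpa_group_2 = [3.4, 3.0, 3.6, 3.2, 3.7]
print("independent t:", independent_t(gpa_group_1, gpa_group_2))

# Hypothetical data: IQ scores for four majors
business = [101, 108, 97, 112]
engineering = [110, 115, 104, 119]
nursing = [103, 99, 107, 111]
communications = [98, 105, 102, 109]
print("ANOVA F:", one_way_anova_f(business, engineering, nursing, communications))
```

With exactly two groups, the F statistic from this sketch equals the square of the pooled-variance t statistic, which is one way to see that the independent t-test is the two-group special case of the ANOVA.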
- To test whether one or more variables offer a significant contribution to, or predict, another variable, a regression is appropriate. Three parametric examples will be given: simple linear regression, multiple linear regression, and binary logistic regression. The assumptions of a simple linear regression are linearity and homoscedasticity. The assumptions of a multiple linear regression are linearity, homoscedasticity, and the absence of multicollinearity. The assumption of binary logistic regression is the absence of multicollinearity.
o E.g., a simple linear regression is appropriate for testing if a continuous variable predicts another continuous variable: testing if IQ scores predict SAT scores.
o E.g., a multiple linear regression is appropriate for testing if more than one continuous variable predicts another continuous variable: testing if IQ scores and GPA scores predict SAT scores.
o E.g., a binary logistic regression is appropriate for testing if one or more variables (continuous or dichotomous) predict a dichotomous variable: testing if IQ scores, gender, and GPA scores predict entrance to college (yes = 1 vs. no = 0).
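As an illustration of the simplest case above, the following sketch fits a simple linear regression by ordinary least squares using only the Python standard library. The function names and the IQ and SAT values are hypothetical. The sketch covers the one-predictor case only; multiple linear and binary logistic regression are normally fitted with a statistics package.

```python
def simple_linear_regression(x, y):
    """Least-squares slope and intercept for predicting y from a single x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sp_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    slope = sp_xy / ss_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Proportion of the variance in y accounted for by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_residual = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_total = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_residual / ss_total


# Hypothetical data: IQ scores as the predictor, SAT scores as the outcome
iq = [95, 100, 105, 110, 120, 130]
sat = [980, 1050, 1100, 1190, 1280, 1400]

slope, intercept = simple_linear_regression(iq, sat)
print("slope:", slope)
print("intercept:", intercept)
print("R squared:", r_squared(iq, sat, slope, intercept))
```

The slope estimates how many SAT points the predicted score changes for each additional IQ point. Whether that contribution is statistically significant depends on a test of the slope, which this sketch does not perform.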
- In regard to the assumptions mentioned above:
o Linearity assumes a straight-line relationship between the variables.
o Homoscedasticity assumes that scores are spread evenly about the regression line, i.e., the variance of the residuals is constant across the range of the predictor(s).
o Absence of multicollinearity assumes that predictor variables are not too highly correlated with one another.
o Normality assumes that the dependent variable is normally distributed (symmetrical, bell-shaped) for each group.
o Homogeneity of variance assumes that groups have equal error variances.
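Two of the assumptions listed above lend themselves to a quick descriptive check. The sketch below is a rough screening aid rather than a formal test: the function names and data are hypothetical, no cutoff values are implied, and formal procedures (for example, Levene's test for homogeneity of variance) come from a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def variance_ratio(*groups):
    """Largest group variance divided by the smallest; 1.0 means equal variances."""
    variances = [variance(g) for g in groups]
    return max(variances) / min(variances)


def predictor_correlation(x1, x2):
    """Pearson correlation between two predictors; values near +1 or -1 suggest multicollinearity."""
    mean_1, mean_2 = mean(x1), mean(x2)
    sp = sum((a - mean_1) * (b - mean_2) for a, b in zip(x1, x2))
    ss_1 = sum((a - mean_1) ** 2 for a in x1)
    ss_2 = sum((b - mean_2) ** 2 for b in x2)
    return sp / sqrt(ss_1 * ss_2)


# Hypothetical data: GPA scores for two independent groups
gpa_group_1 = [3.1, 2.8, 3.5, 3.0, 3.3]
gpa_group_2 = [3.4, 3.0, 3.6, 3.2, 3.7]
print("variance ratio:", variance_ratio(gpa_group_1, gpa_group_2))

# Hypothetical data: two predictors intended for the same regression
iq = [95, 100, 105, 110, 120, 130]
gpa = [2.9, 3.0, 3.4, 3.1, 3.6, 3.8]
print("predictor correlation:", predictor_correlation(iq, gpa))
```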