# Dissertation Statistics Help

## Request

To request a blog written on a specific topic, please email James@StatisticsSolutions.com with your suggestion. Thank you!

## Monday, February 25, 2013

### Bonferroni Correction

- Also known as Bonferroni type adjustment

- Made to correct for an inflated Type I error rate (the chance of a false positive; rejecting the null hypothesis when you should not)

- When conducting multiple analyses on the same dependent variable, the chance of committing a Type I error increases, thus increasing the likelihood of obtaining a significant result by pure chance. To correct for this, or protect against Type I error, a Bonferroni correction is conducted.

- The Bonferroni correction is a conservative test that, although it protects against Type I error, is vulnerable to Type II error (failing to reject the null hypothesis when you should in fact reject it)

- Alters the α-value to a more stringent value, thus making it less likely to commit a Type I error

- To get the Bonferroni corrected/adjusted α-value, divide the original α-value by the number of analyses on the dependent variable. The researcher assigns a new alpha for the set of dependent variables (or analyses) such that the familywise error rate does not exceed some critical value: α_{critical} = 1 - (1 - α_{altered})^{k}, where k = the number of comparisons on the same dependent variable.

- However, when reporting the new α-level, the rounded version (to 3 decimal places) is typically reported. This rounded version is not technically correct; it introduces a rounding error. Example: 13 correlation analyses on the same dependent variable would indicate the need for a Bonferroni correction of α_{altered} = .05/13 = .004 (rounded), but α_{critical} = 1 - (1 - .004)^{13} = 0.051, which is not less than 0.05. With the non-rounded version, α_{altered} = .05/13 = .003846154, and α_{critical} = 1 - (1 - .003846154)^{13} = 0.048862271, which is in fact less than 0.05! SPSS does not currently have the capability to set alpha levels beyond 3 decimal places, so the rounded version is presented and used.

- Another example: 9 correlations are to be conducted between SAT scores and 9 demographic variables. To protect against Type I error, a Bonferroni correction should be conducted. The new α-level will be the original alpha (α_{original} = .05) divided by the number of comparisons (9): α_{altered} = .05/9 = .006. To determine whether any of the 9 correlations is statistically significant, its *p*-value must be < .006.
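
The arithmetic in the two examples above can be verified with a short script (a sketch in Python; SPSS is not needed for this check):

```python
def bonferroni_alpha(alpha, k):
    """Per-comparison alpha after a Bonferroni correction."""
    return alpha / k

def familywise_alpha(alpha_altered, k):
    """Chance of at least one Type I error across k comparisons."""
    return 1 - (1 - alpha_altered) ** k

# 13 correlations: the rounded alpha (.004) fails the familywise check,
# while the unrounded alpha (.05 / 13) passes it.
rounded = round(bonferroni_alpha(0.05, 13), 3)   # .004
unrounded = bonferroni_alpha(0.05, 13)           # .003846...
print(familywise_alpha(rounded, 13))             # about .051, not below .05
print(familywise_alpha(unrounded, 13))           # about .0489, below .05

# 9 correlations: per-comparison alpha is .05 / 9, which rounds to .006
print(round(bonferroni_alpha(0.05, 9), 3))
```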

## Thursday, January 24, 2013

### Checking the Additional Assumptions of a MANOVA

A MANOVA is typically seen as an extension of an ANOVA to more than one continuous dependent variable. The typical assumptions of an ANOVA should be checked, such as normality, equality of variance, and absence of univariate outliers. However, there are additional assumptions that should be checked when conducting a MANOVA.

The additional assumptions of the MANOVA include:

- Absence of multivariate outliers
- Linearity
- Absence of multicollinearity
- Equality of covariance matrices

Absence of multivariate outliers is checked by assessing Mahalanobis distances among the participants. To do this in SPSS, run a multiple linear regression with all of the dependent variables of the MANOVA as the independent variables of the multiple linear regression; the dependent variable can simply be an ID variable. There is an option in SPSS to save the Mahalanobis distances when running the regression. Once this is done, sort the Mahalanobis distances from greatest to least. To identify an outlier, the critical chi-square value must be known: it is the critical chi-square value at *p* = .001, with the degrees of freedom being the number of dependent variables. With 3 dependent variables, the critical value is 16.27, so any participant with a Mahalanobis distance greater than 16.27 should be removed.
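
Outside SPSS, the same screen can be sketched with NumPy and SciPy (the data here are illustrative; in practice the array would hold your actual DV scores):

```python
import numpy as np
from scipy.stats import chi2
from scipy.spatial.distance import mahalanobis

# Illustrative data: 100 participants measured on 3 dependent variables.
rng = np.random.default_rng(0)
dvs = rng.normal(size=(100, 3))

# Squared Mahalanobis distance of each participant from the centroid.
center = dvs.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(dvs, rowvar=False))
d2 = np.array([mahalanobis(row, center, cov_inv) ** 2 for row in dvs])

# Critical chi-square value at p = .001 with df = number of DVs (3);
# this reproduces the 16.27 cutoff mentioned above.
critical = chi2.ppf(1 - 0.001, df=3)
print(round(critical, 2))

outliers = np.where(d2 > critical)[0]  # participants to screen out
```
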

Linearity assumes that all of the dependent variables are
linearly related to each other. This can be checked by conducting a scatterplot
matrix between the dependent variables. Linearity should be met for each group
of the MANOVA separately.

Absence of multicollinearity is checked by conducting
correlations among the dependent variables. The dependent variables should all
be moderately related, but any correlation over .80 presents a concern for
multicollinearity.

Equality of covariance matrices is an assumption checked by running a Box's M test. Unlike most tests, Box's M tends to be very strict, and thus the level of significance used for it is typically .001. So as long as the *p*-value for the test is above .001, the assumption is met.

## Friday, December 7, 2012

### The differences in most common statistical analyses

__Correlation vs. Regression vs. Mean Differences__

- Inferential (parametric and non-parametric) statistics are conducted when the goal of the research is to draw conclusions about the statistical significance of the relationships and/or differences among variables of interest.

- The “relationships” can be tested in different statistical ways, depending on the goal of the research. The three most common meanings of “relationship” between/among variables are:

1. Strength, or association, between variables = e.g., Pearson and Spearman rho correlations

2. Statistical differences on a variable between/among groups = e.g., *t* tests and ANOVAs

3. Statistical contribution/prediction on a variable from another(s) = regression

- Correlations are the appropriate analyses when the goal of the research is to test the strength, or association, between two variables. There are two main types of correlations: Pearson product-moment correlations, a.k.a. Pearson (*r*), and Spearman rho (*r*_{s}) correlations. A Pearson correlation is a parametric test that is appropriate when the two variables are continuous. As with all parametric tests, there are assumptions that need to be met; for a Pearson correlation these are linearity and homoscedasticity. A Spearman correlation is a non-parametric test that is appropriate when at least one of the variables is ordinal.

  - E.g., a Pearson correlation is appropriate for two continuous variables: age and height.

  - E.g., a Spearman correlation is appropriate for the variables age (continuous) and income level (under 25,000; 25,000 – 50,000; 50,001 – 100,000; above 100,000).
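
As a sketch, both correlations can be run with SciPy (the data below are invented solely for illustration):

```python
from scipy import stats

# Illustrative data: age in years, height in cm, and an ordinal
# income-level code (1-4 for the four brackets in the example).
age = [23, 31, 45, 52, 60, 28, 39, 47]
height = [170, 175, 168, 172, 165, 180, 174, 169]
income_level = [1, 2, 3, 4, 4, 1, 2, 3]

# Pearson r: both variables continuous (age and height).
r, p = stats.pearsonr(age, height)

# Spearman rho: at least one variable ordinal (age and income level).
rho, p_s = stats.spearmanr(age, income_level)
```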

- To test for mean differences by group, there are a variety of analyses that can be appropriate. Three parametric examples will be given: the dependent sample *t* test, the independent sample *t* test, and the analysis of variance (ANOVA). The assumption of the dependent sample *t* test is normality. The assumptions of the independent sample *t* test and of an ANOVA are normality and equality of variance (a.k.a. homogeneity of variance).

  - E.g., a dependent *t* test is appropriate for testing mean differences on a continuous variable by time on the same group of people: testing weight differences by time (year 1 – before diet vs. year 2 – after diet) for the same participants.

  - E.g., an independent *t* test is appropriate for testing mean differences on a continuous variable by two independent groups: testing GPA scores by gender (males vs. females).

  - E.g., an ANOVA is appropriate for testing mean differences on a continuous variable by a group with more than two independent groups: testing IQ scores by college major (Business vs. Engineering vs. Nursing vs. Communications).
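
A minimal SciPy sketch of the three tests, using invented data that mirrors the examples above:

```python
from scipy import stats

# Dependent (paired) t test: the same participants' weight before and after.
weight_year1 = [82, 90, 77, 95, 88, 73]
weight_year2 = [79, 86, 75, 90, 85, 72]
t_dep, p_dep = stats.ttest_rel(weight_year1, weight_year2)

# Independent t test: GPA by two independent groups.
gpa_males = [3.1, 2.8, 3.4, 3.0, 2.9]
gpa_females = [3.3, 3.0, 3.6, 3.2, 3.1]
t_ind, p_ind = stats.ttest_ind(gpa_males, gpa_females)

# One-way ANOVA: IQ by college major (more than two groups).
iq_business = [100, 105, 98, 110]
iq_engineering = [108, 112, 104, 115]
iq_nursing = [102, 99, 107, 103]
iq_comm = [101, 97, 106, 100]
f_stat, p_anova = stats.f_oneway(iq_business, iq_engineering,
                                 iq_nursing, iq_comm)
```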

- To test whether a variable (or variables) offers a significant contribution to, or predicts, another variable, a regression is appropriate. Three parametric examples will be given: simple linear regression, multiple linear regression, and binary logistic regression. The assumptions of a simple linear regression are linearity and homoscedasticity. The assumptions of a multiple linear regression are linearity, homoscedasticity, and the absence of multicollinearity. The assumption of a binary logistic regression is the absence of multicollinearity.

  - E.g., a simple linear regression is appropriate for testing if a continuous variable predicts another continuous variable: testing if IQ scores predict SAT scores.

  - E.g., a multiple linear regression is appropriate for testing if more than one continuous variable predicts another continuous variable: testing if IQ scores and GPA scores predict SAT scores.

  - E.g., a binary logistic regression is appropriate for testing if more than one variable (continuous or dichotomous) predicts a dichotomous variable: testing if IQ scores, gender, and GPA scores predict entrance to college (yes = 1 vs. no = 0).
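
The first two regressions can be sketched with SciPy and NumPy (invented data; a binary logistic regression would typically be fit with a dedicated library such as statsmodels, so it is omitted here):

```python
import numpy as np
from scipy import stats

# Simple linear regression: does IQ predict SAT?
iq = np.array([95, 100, 105, 110, 115, 120, 125, 130])
sat = np.array([900, 980, 1010, 1100, 1150, 1230, 1260, 1340])
result = stats.linregress(iq, sat)  # slope, intercept, r, p, stderr

# Multiple linear regression via least squares: IQ and GPA predicting SAT.
gpa = np.array([2.8, 3.0, 3.1, 3.3, 3.4, 3.6, 3.7, 3.9])
X = np.column_stack([np.ones_like(iq, dtype=float), iq, gpa])
coefs, *_ = np.linalg.lstsq(X, sat, rcond=None)  # intercept, b_iq, b_gpa
```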

- With regard to the assumptions mentioned above:

  - Linearity assumes a straight-line relationship between the variables.

  - Homoscedasticity assumes that the variance of scores about the regression line is constant across values of the predictor(s).

  - Absence of multicollinearity assumes that the predictor variables are not too highly related.

  - Normality assumes that the dependent variables are normally distributed (symmetrical bell shape) for each group.

  - Homogeneity of variance assumes that the groups have equal error variances.

## Monday, November 19, 2012

### Manipulation Checks (between two groups)

- A procedure that can be used to test whether the levels (or groups) of the IV differ on the DVs. E.g., a study consists of two different types of primates, where one primate is “more intelligent” and the other is “less intelligent.” The IV is primate intelligence (high intelligence vs. low intelligence) and the DVs are five different questionnaires that each measure, or rate, the participants’ attitudes toward the primates. Each questionnaire can measure different attributes that deal with primate intelligence (e.g., problem solving, memorization, etc.). A manipulation check would assess whether the researcher has effectively “manipulated” primate intelligence. In this example, an independent sample *t* test would be the appropriate statistical analysis for the manipulation check: five *t* tests on the five composite scores (from the five different questionnaires) by primate intelligence (high intelligence vs. low intelligence). If the results (per composite score) are statistically significant, then primate intelligence can be said to be effectively manipulated, and the IV can be used for further analyses.

- Manipulation-check items can be, but do not have to be, included at the end of each questionnaire.

- Checks (each possible comparison) for consistency on the different dependent variables (the questionnaires) by the independent variable (the two groups).

- Included in a study to test whether the IV affects the different surveys (DVs) in the manner in which the study was designed.
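
As a sketch of the procedure described above, the five *t* tests can be run in a loop with SciPy (the composite scores below are simulated purely for illustration, with the “high intelligence” condition shifted upward on each measure):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical composite scores: 30 raters per group on 5 questionnaires.
high_group = rng.normal(loc=4.0, scale=0.5, size=(30, 5))
low_group = rng.normal(loc=3.0, scale=0.5, size=(30, 5))

# One independent-samples t test per questionnaire composite score.
pvals = []
for i in range(5):
    t, p = stats.ttest_ind(high_group[:, i], low_group[:, i])
    pvals.append(p)
    print(f"Questionnaire {i + 1}: t = {t:.2f}, p = {p:.4f}")
```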
