
Monday, February 25, 2013

Bonferroni Correction

  • Also known as Bonferroni type adjustment
  • Made to correct for inflated Type I error (a false positive: rejecting the null hypothesis when you should not)
  • When conducting multiple analyses on the same dependent variable, the chance of committing a Type I error increases, thus increasing the likelihood of obtaining a significant result by pure chance.  To correct for, or protect against, Type I error, a Bonferroni correction is conducted.
  • The Bonferroni correction is a conservative test that, although it protects against Type I error, is vulnerable to Type II error (failing to reject the null hypothesis when you should in fact reject it)
  • It alters the alpha value to a more stringent threshold, thus making it less likely to commit a Type I error
  • To get the Bonferroni corrected/adjusted alpha, divide the original α-value by the number of analyses on the dependent variable.  The researcher assigns this new alpha to the set of analyses so that the familywise error rate, αcritical = 1 - (1 – αaltered)^k, does not exceed the original α, where k = the number of comparisons on the same dependent variable.
  • However, when reporting the new alpha, the rounded version (to 3 decimal places) is typically reported.  This rounded version is not technically correct, because it introduces a rounding error.  Example: 13 correlation analyses on the same dependent variable would indicate the need for a Bonferroni correction of (αaltered = .05/13) = .004 (rounded), but αcritical = 1 - (1 - .004)^13 = 0.051, which is not less than 0.05.  With the non-rounded version, however: (αaltered = .05/13) = .003846154, and αcritical = 1 - (1 - .003846154)^13 = 0.048862271, which is in fact less than 0.05!  SPSS does not currently have the capability to set alpha levels beyond 3 decimal places, so the rounded version is presented and used.
  • Another example: 9 correlations are to be conducted between SAT scores and 9 demographic variables.  To protect against Type I error, a Bonferroni correction should be conducted.  The new alpha will be the original alpha-value (αoriginal = .05) divided by the number of comparisons (9): (αaltered = .05/9) = .006.  To determine whether any of the 9 correlations is statistically significant, the p-value must be p < .006.
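The arithmetic above can be reproduced with a short Python sketch (Python is not part of the original post, which works in SPSS; the snippet simply checks the rounding example for 13 comparisons):

```python
# Bonferroni-adjusted alpha and the rounding pitfall described above:
# 13 correlation analyses on the same dependent variable.
alpha = 0.05
k = 13

alpha_altered = alpha / k                       # per-test alpha, ~0.003846154
alpha_critical = 1 - (1 - alpha_altered) ** k   # familywise bound, ~0.04886 (below .05)

# Rounded to 3 decimal places (as SPSS forces), the bound slips above .05:
alpha_rounded = round(alpha_altered, 3)             # 0.004
critical_rounded = 1 - (1 - alpha_rounded) ** k     # ~0.05077 (above .05)
```

This is exactly the rounding error the post describes: the 3-decimal per-test alpha no longer keeps the familywise error rate under the original .05.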

Friday, December 7, 2012

The differences among the most common statistical analyses




Correlation vs. Regression vs. Mean Differences
  •  Inferential (parametric and non-parametric) statistics are conducted when the goal of the research is to draw conclusions about the statistical significance of the relationships and/or differences among variables of interest.

  •   The “relationships” can be tested statistically in different ways, depending on the goal of the research.  The three most common meanings of “relationship” between/among variables are:

1.      Strength, or association, between variables = e.g., Pearson & Spearman rho correlations
2.      Statistical differences on a continuous variable by group(s) = e.g., t-test and ANOVA
3.      Statistical contribution/prediction on a variable from another variable (or variables) = regression.

  •  Correlations are the appropriate analyses when the goal of the research is to test the strength, or association, between two variables.  There are two main types of correlations: Pearson product-moment correlations, a.k.a. Pearson (r), and Spearman rho (rs) correlations.  A Pearson correlation is a parametric test that is appropriate when the two variables are continuous.  As with all parametric tests, there are assumptions that need to be met; for a Pearson correlation, these are linearity and homoscedasticity.  A Spearman correlation is a non-parametric test that is appropriate when at least one of the variables is ordinal.

o   E.g., a Pearson correlation is appropriate for the two continuous variables: age and height.
o   E.g., a Spearman correlation is appropriate for the variables: age (continuous) and income level (under 25,000, 25,000 – 50,000, 50,001 – 100,000, above 100,000).
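The two examples above can be sketched in Python with SciPy (the data below are made up for illustration; income bands are coded 1-4 as an ordinal variable):

```python
from scipy import stats

# Hypothetical data, for illustration only:
age    = [23, 30, 34, 41, 52, 60]   # continuous
height = [165, 170, 172, 168, 171, 169]  # continuous (cm)
income = [1, 1, 2, 2, 3, 4]         # ordinal bands: under 25k ... above 100k

# Pearson: two continuous variables
r, p_r = stats.pearsonr(age, height)

# Spearman: at least one variable is ordinal
rho, p_rho = stats.spearmanr(age, income)
```

Each call returns the correlation coefficient together with its p-value, which is what gets compared against the (possibly Bonferroni-corrected) alpha.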

  • To test for mean differences by group, there are a variety of analyses that can be appropriate.  Three parametric examples will be given: dependent sample t test, independent sample t test, and an analysis of variance (ANOVA).  The assumption of the dependent sample t test is normality.  The assumptions of the independent sample t test are normality and equality of variance (a.k.a. homogeneity of variance).  The assumptions of an ANOVA are normality and equality of variance (a.k.a. homogeneity of variance).

o   E.g., a dependent t-test is appropriate for testing mean differences on a continuous variable by time on the same group of people: testing weight differences by time (year 1 - before diet vs. year 2 - after diet) for the same participants. 
o   E.g., an independent t-test is appropriate for testing mean differences on a continuous variable by two independent groups: testing GPA scores by gender (males vs. females).
o   E.g., an ANOVA is appropriate for testing mean differences on a continuous variable by more than two independent groups: testing IQ scores by college major (Business vs. Engineering vs. Nursing vs. Communications).
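All three group-difference tests above have direct SciPy equivalents; a minimal sketch with made-up data (the variable names are illustrative, not from the post):

```python
from scipy import stats

# Hypothetical data, for illustration only:
weight_year1 = [82, 90, 77, 95, 88]   # same participants, before diet
weight_year2 = [78, 85, 75, 90, 84]   # same participants, after diet

gpa_male   = [3.1, 2.8, 3.4, 3.0]
gpa_female = [3.3, 3.5, 2.9, 3.6]

iq_business, iq_engineering, iq_nursing = [105, 110, 98], [112, 118, 121], [101, 107, 99]

# Dependent (paired) samples t test: same group measured twice
t_dep, p_dep = stats.ttest_rel(weight_year1, weight_year2)

# Independent samples t test: two separate groups
t_ind, p_ind = stats.ttest_ind(gpa_male, gpa_female)

# One-way ANOVA: more than two independent groups
f_stat, p_anova = stats.f_oneway(iq_business, iq_engineering, iq_nursing)
```

In each case the returned p-value is compared against alpha to decide whether the group means differ significantly.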

  •  To test if a variable (or variables) offers a significant contribution to, or predicts, another variable, a regression is appropriate.  Three parametric examples will be given: simple linear regression, multiple linear regression, and binary logistic regression.  The assumptions of a simple linear regression are linearity and homoscedasticity.  The assumptions of a multiple linear regression are linearity, homoscedasticity, and the absence of multicollinearity.  The assumption of binary logistic regression is absence of multicollinearity.

o   E.g., a simple linear regression is appropriate for testing if a continuous variable predicts another continuous variable: testing if IQ scores predict SAT scores
o   E.g., a multiple linear regression is appropriate for testing if more than one continuous variable predicts another continuous variable: testing if IQ scores and GPA scores predict SAT scores
o   E.g., a binary logistic regression is appropriate for testing if more than one variable (continuous or dichotomous) predicts a dichotomous variable: testing if IQ scores, gender, and GPA scores predict entrance to college (yes = 1 vs. no = 0). 
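The two linear cases can be sketched with SciPy and NumPy (made-up scores; binary logistic regression needs a dedicated routine such as statsmodels' Logit or sklearn's LogisticRegression, so only the linear regressions are shown here):

```python
import numpy as np
from scipy import stats

# Hypothetical scores, for illustration only:
iq  = np.array([95.0, 100.0, 105.0, 110.0, 115.0, 120.0])
gpa = np.array([2.9, 3.0, 3.2, 3.4, 3.5, 3.8])
sat = np.array([1010.0, 1080.0, 1150.0, 1210.0, 1280.0, 1400.0])

# Simple linear regression: does IQ predict SAT?
res = stats.linregress(iq, sat)   # res.slope, res.intercept, res.rvalue, res.pvalue

# Multiple linear regression: do IQ and GPA jointly predict SAT?
# (ordinary least squares via np.linalg.lstsq; a column of ones adds the intercept)
X = np.column_stack([np.ones_like(iq), iq, gpa])
coef, *_ = np.linalg.lstsq(X, sat, rcond=None)   # [intercept, IQ weight, GPA weight]
```

`res.pvalue` tests whether the slope differs from zero, i.e., whether IQ significantly predicts SAT.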


  •  Regarding the assumptions mentioned above:


o   Linearity assumes a straight line relationship between the variables
o   Homoscedasticity assumes that the variability of scores about the regression line is constant
o   Absence of multicollinearity assumes that predictor variables are not too related
o   Normality assumes that the dependent variables are normally distributed (symmetrical bell shaped) for each group
o   Homogeneity of variance assumes that groups have equal error variances
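Two of the assumptions listed above are commonly checked with formal tests; a sketch using Shapiro-Wilk for normality and Levene's test for homogeneity of variance (these particular tests are a common choice, not something the post itself prescribes, and the data are made up):

```python
from scipy import stats

# Hypothetical group scores, for illustration only:
group_a = [3.1, 2.8, 3.4, 3.0, 3.2, 2.9]
group_b = [3.3, 3.5, 2.9, 3.6, 3.1, 3.4]

# Normality (per group): Shapiro-Wilk
w, p_norm = stats.shapiro(group_a)

# Homogeneity of variance across groups: Levene's test
lev, p_var = stats.levene(group_a, group_b)

# For both checks, p > .05 means the assumption is NOT violated at the .05 level.
```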

Tuesday, May 26, 2009

Correlation

Correlation, as the name suggests, depicts a relationship between two or more variables under study. Correlation is generally categorized into two types, namely Bivariate Correlation and Partial Correlation.


Bivariate Correlation is the one that shows an association between two variables. Partial Correlation is the one that shows the association between two variables while controlling or adjusting for the effect of one or more additional variables.

A Correlation is a degree of measure, which means that a Correlation can be negative, positive, or perfect. A positive Correlation is a type of Correlation in which an increase in one variable accompanies an increase in the other variable. In other words, if there is an increase (or decrease) in one variable, then there is a simultaneous increase (or decrease) in the other variable. A negative Correlation is a type of Correlation in which, if there is a decrease (or increase) in one variable, then there is a simultaneous increase (or decrease) in the other variable.

A perfect Correlation is that type of Correlation where a change in one variable produces an equivalent change in the other variable.

A British biometrician named Karl Pearson developed a formula to measure the degree of Correlation, called the Correlation Coefficient. This Correlation Coefficient is generally depicted as ‘r.’ In mathematical language, the Correlation Coefficient is defined as the ratio of the covariance of the two variables to the product of the square roots of their individual variances. The range of the Correlation Coefficient lies between -1 and +1. If the value of the Correlation Coefficient is ‘+1,’ then the variables are said to be perfectly positively correlated. If, on the other hand, the value of the Correlation Coefficient is ‘-1,’ then the variables are said to be perfectly negatively correlated.

The value of the Correlation Coefficient does not depend upon a change of origin or a change of scale.
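Both the definition of r and its invariance under changes of origin and scale can be checked with a small NumPy sketch (the data are made up for illustration; note the invariance holds for positive scale factors, since a negative factor flips the sign of r):

```python
import numpy as np

# Illustrative (made-up) data:
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([1.0, 3.0, 2.0, 5.0, 4.0])

# r = covariance(x, y) / sqrt(var(x) * var(y))
r = np.cov(x, y)[0, 1] / np.sqrt(np.var(x, ddof=1) * np.var(y, ddof=1))

# Shifting the origin (adding a constant) and rescaling (multiplying by a
# positive constant) leaves r unchanged:
r_shifted = np.corrcoef(3 * x + 7, 0.5 * y - 2)[0, 1]
```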

If the value of the Correlation Coefficient is zero, then the variables are said to be uncorrelated. Thus, the variables would be regarded as independent. If there is no Correlation in the variables, then the change in one variable will not affect the change in the other variable at all, and therefore the variables will be independent.

However, the researcher should note that if two variables are independent, then their covariance is zero. The converse, however, is not true: if the covariance of two variables is zero, it does not necessarily mean that the two variables are independent.
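A classic numerical counterexample makes this concrete (made-up data; y is fully determined by x, so the two are clearly dependent, yet the covariance is exactly zero because x is symmetric about 0 and y = x² is an even function):

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = x ** 2            # perfectly dependent on x, but not linearly

cov_xy = np.cov(x, y)[0, 1]   # comes out exactly 0
```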

There are certain assumptions that come along with the Correlation Coefficient. The following are the assumptions for the Correlation Coefficient:

The Correlation Coefficient assumes that the variables under study are linearly related.
The Correlation Coefficient assumes that a cause-and-effect relationship exists between the different forces operating on the items of the two variable series. Such forces, as assumed by the Correlation Coefficient, must be common to both series.

In cases where the operating forces are entirely independent, the value of the correlation coefficient should be zero. If the value of the correlation coefficient is not zero in such cases, the correlation is often termed chance correlation or spurious correlation. For example, the correlation between the income of a person and the height of a person is a case of spurious correlation. Another example of spurious correlation is the correlation between shoe size and the intelligence of a certain group of people.

A Pearsonian coefficient of correlation computed between the ranks of two variables, say, x and y, is called the rank correlation coefficient between those variables.
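This identity — Spearman's rho is simply Pearson's r computed on ranks — can be verified directly with SciPy (illustrative data without ties, where the two values coincide exactly):

```python
from scipy import stats

# Illustrative data with no tied values:
x = [10, 20, 30, 40, 50]
y = [1, 3, 2, 5, 4]

rho, _ = stats.spearmanr(x, y)   # rank correlation computed directly

# Pearson's r applied to the ranks of x and y gives the same value:
r_ranks, _ = stats.pearsonr(stats.rankdata(x), stats.rankdata(y))
```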