Thursday, September 4, 2008

Chi Square

One of the most common statistical tests we are asked to run at Statistics Solutions is the chi-square, also known as the Pearson chi-square, cross-tabulation, or cross-tab. There seems to be a lot of confusion about when to use this test and how to use it. Let’s start with the “when”.


Chi-square statistical analysis is used when we want to know whether there is a relationship between two categorical (nominal) variables. For example, say I want to know whether there is a relationship between gender and level of education. Here we are looking at a relationship between two variables. The first is gender, which is dichotomous (it has two levels or groups), with each respondent or participant being either male or female. The second is education, which we’ll say is also dichotomous (High School or Below, and Above High School).


What is the relationship here? We might have hypothesized that there would be a significant relationship between gender and education, with men tending to be less educated than women. If our chi-square test is significant (we’ll talk about what makes it significant later), we’ll see some pattern of relationship between these two variables.
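A cross-tab simply counts how many participants fall into each combination of the two variables. The Python sketch below shows that counting step on a handful of hypothetical responses. The five records and the `crosstab` helper are invented for illustration and are not the study data.

```python
from collections import Counter

# Hypothetical raw responses: one (gender, education) pair per participant.
# These five records are invented for illustration only.
responses = [
    ("Male", "High School or Below"),
    ("Female", "Above High School"),
    ("Male", "Above High School"),
    ("Male", "High School or Below"),
    ("Female", "Above High School"),
]

def crosstab(pairs):
    """Count how many participants fall into each (row, column) combination."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Counter returns 0 for combinations that never occur, so empty cells show 0.
    table = {r: {c: counts[(r, c)] for c in cols} for r in rows}
    return rows, cols, table

rows, cols, table = crosstab(responses)
for r in rows:
    cells = [table[r][c] for c in cols]
    print(r, dict(zip(cols, cells)), "total:", sum(cells))
```

With real data, each of the 100 participants would contribute one pair, and the cell counts in a crosstab are exactly these tallies.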


Gender * Education Crosstabulation
Count

                      High School or Below   Above High School   Total
Male                                    31                  25      56
Female                                  14                  30      44
Total                                   45                  55     100


This is the actual output table we would get if we ran this test. There is no real wrong way to look at the numbers, since the chi-square is really telling us whether the rows are significantly related to the columns.
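To make “rows related to columns” concrete: the Pearson chi-square compares each observed count with the count we would expect if gender and education were unrelated. The expected count for a cell is (row total × column total) / grand total. The Python sketch below computes this by hand for the counts above. The function name is our own, and the p-value shortcut through `math.erfc` is valid only for 1 degree of freedom, which is what a 2 × 2 table has.

```python
import math

# Observed counts from the Gender * Education crosstab.
# Rows: Male, Female; columns: High School or Below, Above High School.
observed = [
    [31, 25],
    [14, 30],
]

def pearson_chi_square(table):
    """Return (chi-square statistic, degrees of freedom, expected counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count for each cell if rows and columns were independent.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # Sum of (observed - expected)^2 / expected over every cell.
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, expected

chi2, df, expected = pearson_chi_square(observed)
# Upper-tail p-value for a chi-square with df = 1 only: erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))
print(f"chi-square = {chi2:.3f}, df = {df}, p = {p_value:.3f}")
```

For this table the sketch gives a chi-square of about 5.516 with 1 degree of freedom (p ≈ 0.019). This is the uncorrected Pearson statistic. Statistical packages often also report a continuity-corrected value for 2 × 2 tables, and that value will be somewhat smaller. If SciPy is installed, `scipy.stats.chi2_contingency(observed, correction=False)` should return the same statistic.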


You can see from the table that 31 participants were male and had an education level of High School or Below. Looking at just that column, we can see that far more males (31) than females (14) had an education level of High School or Below. Another number that jumps out at me is in the Female row: the 30. Within the Female group, 30 had an education level Above High School, compared to only 14 with an education level of High School or Below. This is fairly clear, but it is even easier to see if we look at the percentages. Let’s look first at the percentages within each of the education groups.

Here is the same table with percentages added:



Gender * Education Crosstabulation

                              High School or Below   Above High School   Total
Male    Count                                   31                  25      56
        % within Education                   68.9%               45.5%   56.0%
        % of Total                           31.0%               25.0%   56.0%
Female  Count                                   14                  30      44
        % within Education                   31.1%               54.5%   44.0%
        % of Total                           14.0%               30.0%   44.0%
Total   Count                                   45                  55     100
        % within Education                  100.0%              100.0%  100.0%
        % of Total                           45.0%               55.0%  100.0%


This table looks a little confusing, but if we take a closer look at the row labels we can decipher what it means. The numbers of interest are the “% within Education” rows. Of the participants with an education level of High School or Below, 68.9% are male (31 of 45). The percentage of females in this education level is much lower, at 31.1% (14 of 45). In fact, there are more than twice as many males as females in the High School or Below education level (31 versus 14).
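Both kinds of percentages in the table can be reproduced from the counts. In this Python sketch (the helper names are our own), “% within Education” divides each count by its column total, and “% of Total” divides each count by the grand total of 100.

```python
# Observed counts: rows are Male, Female; columns are
# High School or Below, Above High School (same layout as the crosstab).
observed = [
    [31, 25],
    [14, 30],
]

def percent_within_column(table):
    """'% within Education': each count divided by its column total."""
    col_totals = [sum(col) for col in zip(*table)]
    return [[100 * v / t for v, t in zip(row, col_totals)] for row in table]

def percent_of_total(table):
    """'% of Total': each count divided by the grand total."""
    n = sum(sum(row) for row in table)
    return [[100 * v / n for v in row] for row in table]

within = percent_within_column(observed)
of_total = percent_of_total(observed)
print([[round(x, 1) for x in row] for row in within])    # [[68.9, 45.5], [31.1, 54.5]]
print([[round(x, 1) for x in row] for row in of_total])  # [[31.0, 25.0], [14.0, 30.0]]
# Males per female within High School or Below.
print(round(observed[0][0] / observed[1][0], 2))         # 2.21
```

The ratio of 31 to 14 (about 2.21) supports the “more than twice as many” reading.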