Monday, May 11, 2009

Kruskal-Wallis Test

The Kruskal-Wallis test is a nonparametric test that generalizes the Mann-Whitney U test to more than two samples. It tests the null hypothesis that ‘k’ samples have been drawn from the same population, or from identical populations with the same median. If Sj is the population median for the jth group or sample, then the null hypothesis can be written as S1 = S2 = … = Sk. The alternative hypothesis is that Si is not equal to Sj for at least one pair of groups i and j. In other words, at least one pair of groups or samples has different medians.

To apply the Kruskal-Wallis test, the data are arranged in a two-way table in which each column represents one sample. In the computation, each of the ‘N’ observations is replaced by its rank: all the values from the ‘k’ samples are combined and ranked in a single series.

The smallest value is replaced by rank 1, the next smallest by rank 2, and so on, until the largest is replaced by rank ‘N.’ Here, ‘N’ denotes the total number of observations across the ‘k’ samples. The ranks in each sample or column are then summed.
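The ranking step can be sketched in a few lines of Python. This is only an illustration: the three samples are made-up numbers, the function name pooled_ranks is hypothetical, and giving tied values the average of their ranks is an assumption, since ties are not discussed above.

```python
def pooled_ranks(samples):
    """Rank every observation in one combined series (1 = smallest).

    Returns one list of ranks per sample. Tied values share the average
    of the ranks they occupy (an assumption; the post does not cover ties).
    """
    # Pool all values, remembering which sample each one came from.
    pooled = sorted(
        (value, group)
        for group, sample in enumerate(samples)
        for value in sample
    )
    ranks = [[] for _ in samples]
    i = 0
    while i < len(pooled):
        # Find the run of equal values starting at position i.
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for _, group in pooled[i:j]:
            ranks[group].append(average_rank)
        i = j
    return ranks


# Hypothetical data: three samples (columns) of six observations each.
samples = [
    [27, 2, 4, 18, 7, 9],
    [20, 8, 14, 36, 21, 22],
    [34, 31, 3, 23, 30, 6],
]

ranks = pooled_ranks(samples)
rank_sums = [sum(r) for r in ranks]
print(rank_sums)  # [39.0, 65.0, 67.0]
```

As a quick check, the rank sums always add up to N(N + 1)/2, which is 171 for these N = 18 observations.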

From the sum of the ranks, the researcher computes the average rank for each sample or group. If the samples come from identical populations, then the average ranks should be about the same. On the other hand, if the samples come from populations with different medians, then the average ranks will differ.

The Kruskal-Wallis test assesses the differences among the average ranks in order to determine whether they are likely to have arisen from samples drawn from the same population.
If the ‘k’ samples are actually drawn from the same population or from identical populations, then the sampling distribution of the Kruskal-Wallis test statistic is known, and the probability of observing each of its values can be tabulated.

While conducting the Kruskal-Wallis test, the researcher should keep in mind that if there are more than three groups and more than five observations in each group, then the sampling distribution of the Kruskal-Wallis test statistic is well approximated by the chi-square distribution with k − 1 degrees of freedom. This approximation improves as both the number of groups and the number of observations in each group increase.
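As a rough sketch of how the chi-square approximation is used, the code below turns a test statistic into an approximate p-value with k − 1 degrees of freedom. The helper chi2_sf is a hypothetical name; it uses the closed-form recurrence for the upper tail of the chi-square distribution at integer degrees of freedom so that only the standard library is needed. The values of h_stat and k are invented for illustration.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        q, nu = math.exp(-half), 2              # exact tail for df = 2
    else:
        q, nu = math.erfc(math.sqrt(half)), 1   # exact tail for df = 1
    # Recurrence: Q(x; nu + 2) = Q(x; nu) + (x/2)^(nu/2) e^(-x/2) / Gamma(nu/2 + 1)
    while nu < df:
        q += half ** (nu / 2.0) * math.exp(-half) / math.gamma(nu / 2.0 + 1)
        nu += 2
    return q


# Hypothetical result: H = 8.5 from k = 4 groups.
h_stat = 8.5
k = 4
p_value = chi2_sf(h_stat, k - 1)
print(p_value)

# The null hypothesis is rejected at the 5% level when p_value < 0.05.
print(p_value < 0.05)
```

For small samples, exact tables of the statistic are the safer choice, since the approximation is only asymptotic.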

The Kruskal-Wallis test rests on certain assumptions.
  • The observations in the data set are independent of each other.
  • The population distributions need not be normal, and the variances need not be equal.
  • The observations are drawn from the population by random sampling.

The sample sizes in the Kruskal-Wallis test should be as equal as possible, but some differences are allowed. The Kruskal-Wallis test also has one limitation. If the researcher does not find a significant difference in the data, then he or she cannot conclude that the samples come from the same population; the test has only failed to detect a difference.