The Kruskal–Wallis test is a rank-based test that is similar to the Mann–Whitney U test, but it can be applied to one-way data with more than two groups.

Without further assumptions about the distribution of the data, the Kruskal–Wallis test does not address hypotheses about the medians of the groups. Instead, the test addresses whether it is likely that an observation in one group is greater than an observation in another. This is sometimes stated as testing whether one sample has stochastic dominance compared with the other.

The test assumes that the observations are independent. That is, it is not appropriate for paired observations or repeated measures data.

It is performed with the kruskal.test function.

Appropriate effect size statistics include maximum Vargha and Delaney’s A, maximum Cliff’s delta, Freeman’s theta, and epsilon-squared.

Post-hoc tests

The outcome of the Kruskal–Wallis test tells you if there are differences among the groups, but doesn’t tell you which groups are different from other groups. In order to determine which groups are different from others, post-hoc testing can be conducted. Probably the most common post-hoc test for the Kruskal–Wallis test is the Dunn test, here performed with the dunnTest function in the FSA package.

Appropriate data

• One-way data

• Dependent variable is ordinal, interval, or ratio

• Independent variable is a factor with two or more levels. That is, two or more groups

• Observations between groups are independent. That is, not paired or repeated measures data

• In order to be a test of medians, the distributions of values for each group need to be of similar shape and spread. Otherwise the test is typically a test of stochastic equality.

Hypotheses

• Null hypothesis: The groups are sampled from populations with identical distributions. Typically, that the sampled populations exhibit stochastic equality.

• Alternative hypothesis (two-sided): The groups are sampled from populations with different distributions. Typically, that one sampled population exhibits stochastic dominance.

Interpretation

Significant results can be reported as “There was a significant difference in values among groups.”

Post-hoc analysis allows you to say “There was a significant difference in values between groups A and B”, and so on.

Other notes and alternative tests

Mood’s median test compares the medians of groups.

Packages used in this chapter

The packages used in this chapter include:

• psych

• FSA

• lattice

• multcompView

• rcompanion
The following commands will install these packages if they are not already installed:
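A minimal sketch of such commands, assuming the packages needed are psych, FSA, lattice, multcompView, and rcompanion (FSA and rcompanion are inferred from the functions used later in the chapter). The helper name install_if_missing is hypothetical and not part of any package.

```r
# Hypothetical helper: installs only the packages that are not yet available.
install_if_missing <- function(pkgs) {
  # requireNamespace() returns FALSE for packages that cannot be loaded
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing_pkgs <- pkgs[!is_installed]
  if (length(missing_pkgs) > 0) {
    install.packages(missing_pkgs)
  }
  # Return the names that had to be installed, invisibly
  invisible(missing_pkgs)
}

# Usage (commented out so the sketch has no side effects when sourced):
# install_if_missing(c("psych", "FSA", "lattice", "multcompView", "rcompanion"))
```

Calling the helper with the commented-out vector would install whatever is missing and do nothing otherwise.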


Kruskal–Wallis test example

This example re-visits the Pooh, Piglet, and Tigger data from the Descriptive Statistics with the likert Package chapter.

It answers the question, “Are the scores significantly different among the three speakers?”

The Kruskal–Wallis test is conducted with the kruskal.test function, which produces a p-value for the hypothesis. First the data are summarized and examined using bar plots for each group.

Input =("
 Speaker  Likert
 Pooh      3
 Pooh      5
 Pooh      4
 Pooh      4
 Pooh      4
 Pooh      4
 Pooh      4
 Pooh      4
 Pooh      5
 Pooh      5
 Piglet    2
 Piglet    4
 Piglet    2
 Piglet    2
 Piglet    1
 Piglet    2
 Piglet    3
 Piglet    2
 Piglet    2
 Piglet    3
 Tigger    4
 Tigger    4
 Tigger    4
 Tigger    4
 Tigger    5
 Tigger    3
 Tigger    5
 Tigger    4
 Tigger    4
 Tigger    3
")

Data = read.table(textConnection(Input),header=TRUE)

### Order levels of the factor; otherwise R will alphabetize them

Data$Speaker = factor(Data$Speaker,
                      levels=unique(Data$Speaker))

### Create a new variable which is the Likert scores as an ordered factor

Data$Likert.f = factor(Data$Likert,
                       ordered = TRUE)

### Check the data frame

library(psych)

headTail(Data)

str(Data)

summary(Data)

### Remove unnecessary objects

rm(Input)

Summarize data treating Likert scores as factors

xtabs( ~ Speaker + Likert.f,
      data = Data)

        Likert.f
Speaker  1 2 3 4 5
  Pooh   0 0 1 6 3
  Piglet 1 6 2 1 0
  Tigger 0 0 2 6 2

XT = xtabs( ~ Speaker + Likert.f,
           data = Data)

prop.table(XT,
           margin = 1)

        Likert.f
Speaker    1   2   3   4   5
  Pooh   0.0 0.0 0.1 0.6 0.3
  Piglet 0.1 0.6 0.2 0.1 0.0
  Tigger 0.0 0.0 0.2 0.6 0.2

Bar plots of data by group

library(lattice)

histogram(~ Likert.f | Speaker,
          data=Data,
          layout=c(1,3)      # columns and rows of individual plots
          )

Summarize data treating Likert scores as numeric

library(FSA)

Summarize(Likert ~ Speaker,
          data=Data,
          digits=3)

  Speaker  n mean    sd min Q1 median   Q3 max percZero
1    Pooh 10  4.2 0.632   3  4      4 4.75   5        0
2  Piglet 10  2.3 0.823   1  2      2 2.75   4        0
3  Tigger 10  4.0 0.667   3  4      4 4.00   5        0

Kruskal–Wallis test example

This example uses the formula notation indicating that Likert is the dependent variable and Speaker is the independent variable. The data= option indicates the data frame that contains the variables. For the meaning of other options, see ?kruskal.test.

kruskal.test(Likert ~ Speaker,
             data = Data)

Kruskal-Wallis rank sum test

Kruskal-Wallis chi-squared = 16.842, df = 2, p-value = 0.0002202
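As a rough illustration of what kruskal.test computes, the sketch below rebuilds the H statistic from the ranks in base R, assuming the standard tie-corrected formula. The vectors likert and speaker are re-entered so the block runs on its own; all variable names are illustrative.

```r
# Scores re-entered from the example data (Pooh, then Piglet, then Tigger)
likert  <- c(3, 5, 4, 4, 4, 4, 4, 4, 5, 5,
             2, 4, 2, 2, 1, 2, 3, 2, 2, 3,
             4, 4, 4, 4, 5, 3, 5, 4, 4, 3)
speaker <- factor(rep(c("Pooh", "Piglet", "Tigger"), each = 10),
                  levels = c("Pooh", "Piglet", "Tigger"))

N         <- length(likert)
r         <- rank(likert)                      # mid-ranks for tied values
rank_sums <- sapply(split(r, speaker), sum)    # sum of ranks per group
n_i       <- lengths(split(r, speaker))        # group sizes

# Uncorrected H statistic
H_raw <- 12 / (N * (N + 1)) * sum(rank_sums^2 / n_i) - 3 * (N + 1)

# Correction for ties: t is the number of observations sharing each value
t_counts <- as.vector(table(likert))
H <- H_raw / (1 - sum(t_counts^3 - t_counts) / (N^3 - N))

# p-value from the chi-squared approximation with k - 1 degrees of freedom
p_value <- pchisq(H, df = nlevels(speaker) - 1, lower.tail = FALSE)

print(c(H = H, p.value = p_value))
```

The result should agree with the chi-squared statistic and p-value that kruskal.test reports for the same data.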

Effect size

Statistics of effect size for the Kruskal–Wallis test provide the degree to which one group has data with higher ranks than another group. They are related to the probability that a value from one group will be greater than a value from another group. Unlike p-values, they are not affected by sample size.

Appropriate effect size statistics for the Kruskal–Wallis test include Freeman’s theta and epsilon-squared. Epsilon-squared is probably the most common. For Freeman’s theta, an effect size of 1 indicates that the measurements for each group are entirely greater or entirely less than some other group, and an effect size of 0 indicates that there is no effect; that is, that the groups are absolutely stochastically equal.

Another option is to use the maximum Cliff’s delta or Vargha and Delaney’s A (VDA) from pairwise comparisons of all groups. VDA is the probability that an observation from one group is greater than an observation from the other group. Because of this interpretation, VDA is an effect size statistic that is relatively easy to understand.
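The probability interpretation can be checked directly. The following is a minimal base-R sketch, assuming the usual definitions in which ties count half toward VDA and Cliff’s delta is the difference between the proportions of “greater” and “less” pairs. The function names vda and cliff_delta are hypothetical, and the two score vectors are re-entered from the example data.

```r
# Hypothetical helpers computed over all pairs (x_i, y_j)
vda <- function(x, y) {
  # P(X > Y) + 0.5 * P(X = Y)
  mean(outer(x, y, ">")) + 0.5 * mean(outer(x, y, "=="))
}

cliff_delta <- function(x, y) {
  # P(X > Y) - P(X < Y)
  mean(outer(x, y, ">")) - mean(outer(x, y, "<"))
}

pooh   <- c(3, 5, 4, 4, 4, 4, 4, 4, 5, 5)
piglet <- c(2, 4, 2, 2, 1, 2, 3, 2, 2, 3)

A     <- vda(pooh, piglet)
delta <- cliff_delta(pooh, piglet)

print(c(VDA = A, CD = delta))
```

Under these definitions the two statistics are linked by delta = 2A − 1, so either one can be recovered from the other.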

Interpretation of effect sizes necessarily varies by discipline and the expectations of the experiment. The following guidelines are based on my personal intuition or published values. They should not be considered universal.

Technical note: The values for the interpretations for Freeman’s theta to epsilon-squared below were derived by keeping the interpretation for epsilon-squared constant and equal to that for the Mann–Whitney test. Interpretation values for Freeman’s theta were determined through comparing Freeman’s theta to epsilon-squared for simulated data (5-point Likert items, n per group between 4 and 25).

Interpretations for Vargha and Delaney’s A and Cliff’s delta come from Vargha and Delaney (2000).





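                                 small              medium             large

epsilon-squared                  0.01 – …
Freeman’s theta, k = 2           0.11 – …
Freeman’s theta, k = 3           0.05 – …
Freeman’s theta, k = 5           0.05 – …
Freeman’s theta, k = 7           0.05 – …
Maximum Cliff’s delta            0.11 – …
Maximum Vargha and Delaney’s A   0.56 – < 0.64      0.64 – < 0.71      ≥ 0.71
                                 > 0.34 – 0.44      > 0.29 – 0.34      ≤ 0.29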


epsilon-squared

library(rcompanion)

epsilonSquared(x = Data$Likert,
               g = Data$Speaker)

epsilon.squared
          0.581
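For readers who want to see where this number comes from, here is a small base-R sketch. It assumes the commonly cited definition epsilon-squared = H / ((n² − 1) / (n + 1)), which simplifies to H / (n − 1), where H is the Kruskal–Wallis statistic and n the total number of observations. It does not call the rcompanion function, and the variable names are illustrative.

```r
# Scores re-entered from the example data (Pooh, then Piglet, then Tigger)
likert  <- c(3, 5, 4, 4, 4, 4, 4, 4, 5, 5,
             2, 4, 2, 2, 1, 2, 3, 2, 2, 3,
             4, 4, 4, 4, 5, 3, 5, 4, 4, 3)
speaker <- factor(rep(c("Pooh", "Piglet", "Tigger"), each = 10))

# Kruskal-Wallis H statistic from base R
H <- unname(kruskal.test(likert ~ speaker)$statistic)
n <- length(likert)

# Assumed definition: H / ((n^2 - 1) / (n + 1)), i.e. H / (n - 1)
eps_sq <- H / ((n^2 - 1) / (n + 1))

print(round(eps_sq, 3))
```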

Freeman’s theta

library(rcompanion)

freemanTheta(x = Data$Likert,
             g = Data$Speaker)

Freeman.theta
         0.64

Maximum Vargha and Delaney’s A or Cliff’s delta

Here, the multiVDA function is used to calculate Vargha and Delaney’s A (VDA), Cliff’s delta (CD), and r between all pairs of groups. The function identifies the comparison with the most extreme VDA statistic (0.95 for Pooh – Piglet). That is, it identifies the most disparate groups.

library(rcompanion)

multiVDA(x = Data$Likert,
         g = Data$Speaker)

$pairs
           Comparison  VDA    CD      r VDA.m CD.m   r.m
1   Pooh - Piglet = 0 0.95  0.90  0.791  0.95 0.90 0.791
2   Pooh - Tigger = 0 0.58  0.16  0.154  0.58 0.16 0.154
3 Piglet - Tigger = 0 0.07 -0.86 -0.756  0.93 0.86 0.756

$comparison
         Comparison
"Pooh - Piglet = 0"

$statistic
 VDA
0.95

$statistic.m
VDA.m
 0.95

Post-hoc test: Dunn test for multiple comparisons of groups

If the Kruskal–Wallis test is significant, a post-hoc analysis can be performed to determine which groups differ from each other group.

Probably the most popular post-hoc test for the Kruskal–Wallis test is the Dunn test. The Dunn test can be conducted with the dunnTest function in the FSA package.
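To make the mechanics of the Dunn test concrete, the sketch below computes the pairwise z statistics in base R. It assumes the tie-corrected form of Dunn’s statistic, in which the difference in mean ranks is divided by sqrt((N(N + 1)/12 − Σ(t³ − t)/(12(N − 1))) (1/nA + 1/nB)). It is an illustration rather than the FSA implementation, and the variable names are illustrative.

```r
# Scores re-entered from the example data (Pooh, then Piglet, then Tigger)
likert  <- c(3, 5, 4, 4, 4, 4, 4, 4, 5, 5,
             2, 4, 2, 2, 1, 2, 3, 2, 2, 3,
             4, 4, 4, 4, 5, 3, 5, 4, 4, 3)
speaker <- factor(rep(c("Pooh", "Piglet", "Tigger"), each = 10),
                  levels = c("Pooh", "Tigger", "Piglet"))

N         <- length(likert)
r         <- rank(likert)                       # ranks over all observations
mean_rank <- sapply(split(r, speaker), mean)    # mean rank per group
n_i       <- lengths(split(r, speaker))         # group sizes

# Variance term shared by all comparisons, with the correction for ties
t_counts <- as.vector(table(likert))
s2 <- N * (N + 1) / 12 - sum(t_counts^3 - t_counts) / (12 * (N - 1))

# z statistic for every pair of groups
pairs <- combn(levels(speaker), 2)
z <- apply(pairs, 2, function(g) {
  (mean_rank[[g[1]]] - mean_rank[[g[2]]]) /
    sqrt(s2 * (1 / n_i[[g[1]]] + 1 / n_i[[g[2]]]))
})
names(z) <- apply(pairs, 2, paste, collapse = " - ")

# Two-sided p-values, then Benjamini-Hochberg adjustment
p_unadj <- 2 * pnorm(abs(z), lower.tail = FALSE)
p_adj   <- p.adjust(p_unadj, method = "BH")

print(data.frame(Comparison = names(z), Z = z,
                 P.unadj = p_unadj, P.adj = p_adj, row.names = NULL))
```

In practice the packaged function is the safer choice; the sketch only shows which quantities enter the test.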

Because the post-hoc test will produce multiple p-values, adjustments to the p-values can be made to avoid inflating the possibility of making a type-I error. There are a variety of methods for controlling the familywise error rate or for controlling the false discovery rate. See ?p.adjust for details on these methods.
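A short sketch of how the adjustment methods compare, using the base-R p.adjust function. The three raw p-values are hypothetical and chosen only for illustration.

```r
# Hypothetical unadjusted p-values from three pairwise comparisons
p_raw <- c(0.0002, 0.0010, 0.6303)

adjusted <- data.frame(
  raw        = p_raw,
  bonferroni = p.adjust(p_raw, method = "bonferroni"),  # familywise error rate
  holm       = p.adjust(p_raw, method = "holm"),        # familywise, less conservative
  BH         = p.adjust(p_raw, method = "BH")           # false discovery rate
)

print(adjusted)
```

Bonferroni multiplies each p-value by the number of comparisons (capped at 1). The Holm and Benjamini-Hochberg adjustments are never larger than the Bonferroni ones.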

When there are many p-values to evaluate, it is useful to condense a table of p-values to a compact letter display format. In the output, groups are separated by letters. Groups sharing the same letter are not significantly different. Compact letter displays are a clear and succinct way to present results of multiple comparisons.

### Order groups by median

Data$Speaker = factor(Data$Speaker,
                      levels=c("Pooh", "Tigger", "Piglet"))

levels(Data$Speaker)

### Dunn test

library(FSA)

DT = dunnTest(Likert ~ Speaker,
              data=Data,
              method="bh")      # Adjusts p-values for multiple comparisons;
                                # see ?dunnTest for options

DT

### Compact letter display

PT = DT$res

PT

library(rcompanion)

cldList(P.adj ~ Comparison,
        data      = PT,
        threshold = 0.05)

   Group Letter MonoLetter
1   Pooh      a         a 
2 Tigger      a         a 
3 Piglet      b          b

Groups sharing a letter not signficantly different (alpha = 0.05).

Post-hoc test: pairwise Mann–Whitney U-tests for multiple comparisons

I don’t recommend using pairwise Mann–Whitney U-tests for post-hoc testing for the Kruskal–Wallis test, but the following example shows how this can be done.

The pairwise.wilcox.test function produces a table of p-values comparing each pair of groups.

To prevent the inflation of type I error rates, adjustments to the p-values can be made using the p.adjust.method option.

When there are many p-values to evaluate, it is useful to condense a table of p-values to a compact letter display format. This can be accomplished with a combination of the fullPTable function in the rcompanion package and the multcompLetters function in the multcompView package. In a compact letter display, groups sharing the same letter are not significantly different.

Here the fdr p-value adjustment method is used. See ?p.adjust for details on available methods.

The code creates a matrix of p-values called PT, then converts this to a fuller matrix called PT1. PT1 is then passed to the multcompLetters function to be converted to a compact letter display.

Note that the p-value results of the pairwise Mann–Whitney U-tests differ somewhat from those of the Dunn test.

### Order groups by median

Data$Speaker = factor(Data$Speaker,
                      levels=c("Pooh", "Tigger", "Piglet"))

Data

### Pairwise Mann–Whitney

PT = pairwise.wilcox.test(Data$Likert,
                          Data$Speaker,
                          p.adjust.method="fdr")
                 # Adjusts p-values for multiple comparisons;
                 # see ?p.adjust for options

PT

Pairwise comparisons using Wilcoxon rank sum test

       Pooh   Tigger
Tigger 0.5174 -
Piglet 0.0012 0.0012

P value adjustment method: fdr

### Note that the values in the table are p-values comparing each
###   pair of groups.

### Convert PT to a full table and call it PT1

PT = PT$p.value    ### Extract p-value table

library(rcompanion)

PT1 = fullPTable(PT)

PT1

### Produce compact letter display

library(multcompView)

multcompLetters(PT1,
                compare="<",
                threshold=0.05,  # p-value to use as significance threshold
                Letters=letters,
                reversed = FALSE)

  Pooh Tigger Piglet
   "a"    "a"    "b"

### Values sharing a letter are not significantly different

Plot of medians and confidence intervals

The following code uses the groupwiseMedian function to produce a data frame of medians for each speaker along with the 95% confidence intervals for each median with the percentile method. These medians are then plotted, with their confidence intervals shown as error bars. The grouping letters from the multiple comparisons (Dunn test or pairwise Mann–Whitney U-tests) are added.

Note that bootstrapped confidence intervals may not be reliable for discrete data, such as the ordinal Likert data used in these examples, especially for small samples.
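The percentile method itself is simple enough to sketch in base R. The following illustration resamples one group’s scores with replacement and takes the 2.5th and 97.5th percentiles of the resampled medians. It is not the groupwiseMedian implementation, the seed is arbitrary, and the variable names are illustrative.

```r
set.seed(1)   # arbitrary seed so the resampling is repeatable

# Pooh's scores, re-entered from the example data
pooh <- c(3, 5, 4, 4, 4, 4, 4, 4, 5, 5)

# Median of each of 5000 bootstrap resamples
boot_medians <- replicate(5000, median(sample(pooh, replace = TRUE)))

# Percentile confidence interval: middle 95% of the bootstrap medians
ci <- quantile(boot_medians, probs = c(0.025, 0.975))

print(c(median = median(pooh), lower = ci[[1]], upper = ci[[2]]))

# With discrete data, the bootstrap medians take only a few distinct values
print(table(boot_medians))
```

The table of bootstrap medians shows why caution is needed: with 5-point Likert data the interval endpoints can only fall on a handful of values.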

library(rcompanion)

Sum = groupwiseMedian(Likert ~ Speaker,
                      data       = Data,
                      conf       = 0.95,
                      R          = 5000,
                      percentile = TRUE,
                      bca        = FALSE,
                      digits     = 3)

Sum

  Speaker  n Median Conf.level Percentile.lower Percentile.upper
1    Pooh 10      4       0.95              4.0              5.0
2  Piglet 10      2       0.95              2.0              3.0
3  Tigger 10      4       0.95              3.5              4.5

X     = 1:3
Y     = Sum$Percentile.upper + 0.2
Label = c("a", "b", "a")

library(ggplot2)

ggplot(Sum,                ### The data frame to use.
       aes(x = Speaker,
           y = Median)) +
   geom_errorbar(aes(ymin = Percentile.lower,
                     ymax = Percentile.upper),
                 width = 0.05,
                 size  = 0.5) +
   geom_point(shape = 15,
              size  = 4) +
   theme_bw() +
   theme(axis.title = element_text(face = "bold")) +
   ylab("Median Likert score") +
   annotate("text",
            x = X,
            y = Y,
            label = Label)

Plot of median Likert score versus Speaker. Error bars indicate the 95% confidence intervals for the median with the percentile method.


References

Cohen, J. 1988. Statistical Power Analysis for the Behavioral Sciences, 2nd Edition. Routledge.

Vargha, A. and H.D. Delaney. 2000. A Critique and Improvement of the CL Common Language Effect Size Statistics of McGraw and Wong. Journal of Educational and Behavioral Statistics 25(2):101–132.

Exercises L

1. Considering Pooh, Piglet, and Tigger’s data,

a. What was the median score for each instructor?

b. According to the Kruskal–Wallis test, is there a statistical difference in scores among the instructors?

c. What is the value of maximum Vargha and Delaney’s A for this data?

d. How do you interpret this value? (What does it mean? And is the standard interpretation in terms of “small”, “medium”, or “large”?)

e. Looking at the post-hoc analysis, which speakers’ scores are statistically different from which others? Who had the statistically highest scores?

f. How would you summarize the results of the descriptive statistics and tests? Include practical considerations of any differences.

2. Brian, Stewie, and Meg want to assess the education level of students in their courses on creative writing for adults. They want to know the median education level for each class, and whether the education levels of the classes differ among instructors.

They used the following table to code their data.

Code   Abbreviation   Level

1
2      HS             High school
3      BA             Bachelor’s
4      MA             Master’s
5      PhD            Doctorate

The following are the course data.

Instructor        Student  Education
"Brian Griffin"   a        3
"Brian Griffin"   b        2
"Brian Griffin"   c        3
"Brian Griffin"   d        3
"Brian Griffin"   e        3
"Brian Griffin"   f        3
"Brian Griffin"   g        4
"Brian Griffin"   h        5
"Brian Griffin"   i        3
"Brian Griffin"   j        4
"Brian Griffin"   k        3
"Brian Griffin"   l        2
"Stewie Griffin"  m        4
"Stewie Griffin"  n        5
"Stewie Griffin"  o        4
"Stewie Griffin"  p        4
"Stewie Griffin"  q        4
"Stewie Griffin"  r        4
"Stewie Griffin"  s        3
"Stewie Griffin"  t        5
"Stewie Griffin"  u        4
"Stewie Griffin"  v        4
"Stewie Griffin"  w        3
"Stewie Griffin"  x        2
"Meg Griffin"     y        3
"Meg Griffin"     z        4
"Meg Griffin"     aa       3
"Meg Griffin"     ab       3
"Meg Griffin"     ac       3
"Meg Griffin"     ad       2
"Meg Griffin"     ae       3
"Meg Griffin"     af       4
"Meg Griffin"     ag       2
"Meg Griffin"     ah       3
"Meg Griffin"     ai       2
"Meg Griffin"     aj       1

For each of the following, answer the question, and show the output from the analyses you used to answer the question.

a. What was the median education level for each instructor’s class? (Be sure to report the education level, not just the numeric code!)

b. According to the Kruskal–Wallis test, is there a difference in the education level of students among the instructors?

c. What is the value of maximum Vargha and Delaney’s A for this data?

d. How do you interpret this value? (What does it mean? And is the standard interpretation in terms of “small”, “medium”, or “large”?)

e. Looking at the post-hoc analysis, which classes’ education levels are statistically different from which others? Who had the statistically highest education level?

f. Plot Brian, Stewie, and Meg’s data in a way that helps you visualize the data. Do the results reflect what you would expect from looking at the plot?

g. How would you summarize the results of the descriptive statistics and tests? What do you conclude practically?