Differences in Multiple Groups

Aly Lamuri
Indonesia Medical Education and Research Institute

Overview

  • Unpaired test
  • Paired test
  • Final thoughts
  • Conceptual remarks
  • Case examples

Unpaired test

  • Kruskal-Wallis H Test
  • Differences in multiple groups
  • Analogous to one-way ANOVA (not its alternative!)
  • ANOVA is fairly robust to non-normality, unless the data are highly skewed
  • For that reason, Kruskal-Wallis is often overused as a fallback for non-normal data
  • Kruskal-Wallis is also limited: it cannot take multiple independent variables or adjust for covariates

Assumptions and limitations

  • Independent and identically distributed (i.i.d.) observations
  • Does not assume normality
  • Requires homogeneous intergroup variances
  • Just like ANOVA, Kruskal-Wallis requires homogeneous intergroup variances
  • When intergroup variances are heterogeneous, Kruskal-Wallis performs worse than ANOVA
  • In such a case, consider Welch's ANOVA (oneway.test in R)
  • Welch's method is not available for factorial ANOVA
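
A minimal sketch of the Welch alternative mentioned in the notes. The built-in PlantGrowth data (weight by group) is an assumption used purely as a stand-in; it is not part of the case data in these slides.

```r
# Assumption: PlantGrowth ships with R and is used here only for illustration

# Welch's one-way ANOVA: does not assume equal variances across groups
welch <- oneway.test(weight ~ group, data = PlantGrowth, var.equal = FALSE)

# Classical one-way ANOVA for comparison: assumes equal variances
classic <- oneway.test(weight ~ group, data = PlantGrowth, var.equal = TRUE)

print(welch)
print(classic)
```

The two calls differ only in var.equal. Welch's version adjusts the denominator degrees of freedom, so they are usually fractional.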

Procedure

  • Pool and sort all data elements
  • Assign ranks to the sorted data
  • Adjust the ranks of tied data
  • Calculate the H statistic

$$H = \left[ \frac{12}{n(n+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} \right] - 3(n+1)$$

$$H \sim \chi^2(k-1)$$

  • Only three quantities need to be understood
  • n: total number of observations
  • k: total number of groups
  • R_i: sum of the pooled ranks in group i, which holds n_i observations
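
The procedure can be traced by hand in a few lines of R. This is a sketch on made-up values (hypothetical, chosen so that no ties occur), compared against kruskal.test. With ties, kruskal.test applies an additional correction, so the two numbers would differ slightly.

```r
# Hypothetical measurements in three groups, chosen to contain no ties
values <- c(2.1, 3.4, 1.9, 4.0,
            5.2, 4.8, 6.1, 5.5,
            3.0, 2.7, 3.9, 4.4)
group <- factor(rep(c("a", "b", "c"), each = 4))

n   <- length(values)              # total number of observations
k   <- nlevels(group)              # number of groups
r   <- rank(values)                # pooled ranks (mid-ranks if ties existed)
R   <- tapply(r, group, sum)       # rank sum per group
n_i <- tapply(r, group, length)    # observations per group

# H statistic and its chi-squared approximation with k - 1 degrees of freedom
H <- 12 / (n * (n + 1)) * sum(R^2 / n_i) - 3 * (n + 1)
p <- pchisq(H, df = k - 1, lower.tail = FALSE)

# Reference result from the built-in test
ref <- kruskal.test(values ~ group)
print(c(by_hand = H, built_in = unname(ref$statistic)))
```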

Example, please?

# DNase dataset in R
str(DNase)
## Classes 'nfnGroupedData', 'nfGroupedData', 'groupedData' and 'data.frame': 176 obs. of 3 variables:
## $ Run : Ord.factor w/ 11 levels "10"<"11"<"9"<..: 4 4 4 4 4 4 4 4 4 4 ...
## $ conc : num 0.0488 0.0488 0.1953 0.1953 0.3906 ...
## $ density: num 0.017 0.018 0.121 0.124 0.206 0.215 0.377 0.374 0.614 0.609 ...
## - attr(*, "formula")=Class 'formula' language density ~ conc | Run
## .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv>
## - attr(*, "labels")=List of 2
## ..$ x: chr "DNase concentration"
## ..$ y: chr "Optical density"
## - attr(*, "units")=List of 1
## ..$ x: chr "(ng/ml)"
  • We will use the DNase dataset
  • It contains results obtained from an ELISA
  • Run: The assay run
  • conc: Protein concentration
  • density: Optical density in the assay

Example, please?

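library(magrittr)   # provides %>%
library(kableExtra) # provides kable_minimal()
# Shapiro-Wilk normality test of density within each run
with(DNase, tapply(density, Run, shapiro.test)) %>%
  lapply(broom::tidy) %>% lapply(data.frame) %>% {do.call(rbind, .)} %>%
  knitr::kable() %>% kable_minimal()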
Run  statistic  p.value  method
10   0.891      0.059    Shapiro-Wilk normality test
11   0.888      0.051    Shapiro-Wilk normality test
9    0.889      0.053    Shapiro-Wilk normality test
1    0.883      0.044    Shapiro-Wilk normality test
4    0.877      0.035    Shapiro-Wilk normality test
8    0.876      0.033    Shapiro-Wilk normality test
5    0.879      0.037    Shapiro-Wilk normality test
7    0.883      0.043    Shapiro-Wilk normality test
6    0.880      0.039    Shapiro-Wilk normality test
2    0.869      0.027    Shapiro-Wilk normality test
3    0.880      0.039    Shapiro-Wilk normality test
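
The same per-run test can be collected without broom or kableExtra. A sketch in base R only; the column names run, statistic and p.value are choices made here for illustration.

```r
# Shapiro-Wilk test of density within each assay run
sw <- with(DNase, tapply(density, Run, shapiro.test))

# Collect the W statistic and p-value of every run into one data frame
res <- data.frame(
  run       = names(sw),
  statistic = unname(vapply(sw, function(x) unname(x$statistic), numeric(1))),
  p.value   = unname(vapply(sw, function(x) x$p.value, numeric(1))),
  row.names = NULL
)
print(res, digits = 3)
```

The rows follow the level order of the ordered factor Run, which is why run 10 comes first.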

Example, please?

with(DNase, car::leveneTest(conc, Run))
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value Pr(>F)
## group  10       0      1
##       165
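
Levene's test with center = median amounts to a one-way ANOVA on the absolute deviations from each group's median. A base-R sketch under that definition; levene_median is a hypothetical helper name, not a function from car. Applying it to density as well is an assumption about which variable is of interest.

```r
# Hypothetical helper: Levene's test centred on the group medians
levene_median <- function(y, g) {
  g <- factor(g, ordered = FALSE)            # plain factor for the linear model
  z <- abs(y - ave(y, g, FUN = median))      # absolute deviation from group median
  anova(lm(z ~ g))                           # one-way ANOVA on the deviations
}

lev_conc    <- levene_median(DNase$conc, DNase$Run)     # variable used on the slide
lev_density <- levene_median(DNase$density, DNase$Run)  # measured response

print(lev_conc)
print(lev_density)
```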

Example, please?

kruskal.test(conc ~ Run, data=DNase)
##
## Kruskal-Wallis rank sum test
##
## data: conc by Run
## Kruskal-Wallis chi-squared = 0, df = 10, p-value = 1
rstatix::kruskal_effsize(conc ~ Run, data=DNase)
## # A tibble: 1 x 5
##   .y.       n effsize method  magnitude
## * <chr> <int>   <dbl> <chr>   <ord>
## 1 conc    176 -0.0606 eta2[H] moderate
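
A statistic of exactly 0 deserves a second look. In DNase, conc is set by the assay design and every run uses the same concentrations, so the pooled ranks are distributed identically across runs. The sketch below checks this and recomputes the effect size from the definition documented for rstatix, eta2[H] = (H - k + 1) / (n - k). With H = 0 the value is negative, so the "moderate" label should not be read as a real effect.

```r
# Sorted concentrations of each run; identical vectors mean one shared design
conc_by_run <- lapply(split(DNase$conc, DNase$Run), sort)
same_design <- all(vapply(
  conc_by_run,
  function(x) isTRUE(all.equal(x, conc_by_run[[1]])),
  logical(1)
))

# Effect size from its definition: eta2[H] = (H - k + 1) / (n - k)
H <- unname(kruskal.test(conc ~ Run, data = DNase)$statistic)
k <- nlevels(DNase$Run)   # 11 runs
n <- nrow(DNase)          # 176 observations
eta2_H <- (H - k + 1) / (n - k)

print(same_design)
print(round(eta2_H, 4))
```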

Example, please?

dunn.test::dunn.test(DNase$conc, DNase$Run)
## Kruskal-Wallis rank sum test
##
## data: x and group
## Kruskal-Wallis chi-squared = 0, df = 10, p-value = 1
##
##
## Comparison of x by group
## (No adjustment)
## Col Mean-|
## Row Mean |          1         10         11          2          3          4
## ---------+------------------------------------------------------------------
##       10 |   0.000000
##          |     0.5000
##          |
##       11 |   0.000000   0.000000
##          |     0.5000     0.5000
##          |
##        2 |   0.000000   0.000000   0.000000
##          |     0.5000     0.5000     0.5000
##          |
##        3 |   0.000000   0.000000   0.000000   0.000000
##          |     0.5000     0.5000     0.5000     0.5000
##          |
##        4 |   0.000000   0.000000   0.000000   0.000000   0.000000
##          |     0.5000     0.5000     0.5000     0.5000     0.5000
##          |
##        5 |   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000
##          |     0.5000     0.5000     0.5000     0.5000     0.5000     0.5000
##          |
##        6 |   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000
##          |     0.5000     0.5000     0.5000     0.5000     0.5000     0.5000
##          |
##        7 |   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000
##          |     0.5000     0.5000     0.5000     0.5000     0.5000     0.5000
##          |
##        8 |   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000
##          |     0.5000     0.5000     0.5000     0.5000     0.5000     0.5000
##          |
##        9 |   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000
##          |     0.5000     0.5000     0.5000     0.5000     0.5000     0.5000
## Col Mean-|
## Row Mean |          5          6          7          8
## ---------+--------------------------------------------
##        6 |   0.000000
##          |     0.5000
##          |
##        7 |   0.000000   0.000000
##          |     0.5000     0.5000
##          |
##        8 |   0.000000   0.000000   0.000000
##          |     0.5000     0.5000     0.5000
##          |
##        9 |   0.000000   0.000000   0.000000   0.000000
##          |     0.5000     0.5000     0.5000     0.5000
##
## alpha = 0.05
## Reject Ho if p <= alpha/2
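
The Dunn output reports unadjusted p-values. With 11 runs there are 55 pairwise comparisons, so a correction for multiplicity is usually worth considering. A small sketch with base R's p.adjust; the raw p-values are made up for illustration and are not taken from the output.

```r
k <- 11                 # number of runs
m <- choose(k, 2)       # number of pairwise comparisons

# Hypothetical raw p-values from four of the m comparisons
p_raw <- c(0.001, 0.004, 0.020, 0.300)

# Adjust as if all m comparisons had been made
p_bonf <- p.adjust(p_raw, method = "bonferroni", n = m)
p_holm <- p.adjust(p_raw, method = "holm", n = m)

print(data.frame(p_raw, p_bonf, p_holm))
```

dunn.test also takes a method argument (for example method = "bonferroni") that applies such an adjustment directly.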

Paired test

  • Friedman test
  • Analogous to repeated-measures ANOVA
  • Calculate within-subject differences
  • Compare between-group differences

Procedure

  • Rank the observations within each subject
  • Sum the ranks within each group
  • Calculate the Q statistic

$$Q = \left[ \frac{12}{N k (k+1)} \sum_{i=1}^{k} R_i^2 \right] - 3N(k+1)$$

$$Q \sim \chi^2(k-1)$$

  • N: number of rows (blocks)
  • k: number of columns (treatments / repetitions)
  • R_i: sum of the ranks in column i

Example, please?

str(warpbreaks)
## 'data.frame': 54 obs. of 3 variables:
## $ breaks : num 26 30 54 25 70 52 51 26 67 18 ...
## $ wool : Factor w/ 2 levels "A","B": 1 1 1 1 1 1 1 1 1 1 ...
## $ tension: Factor w/ 3 levels "L","M","H": 1 1 1 1 1 1 1 1 1 2 ...

Example, please?

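# Mean number of breaks for each wool type (w) within each tension level (t)
wp <- aggregate(warpbreaks$breaks,
  by = list(
    w = warpbreaks$wool,
    t = warpbreaks$tension
  ), FUN = mean
)
# Friedman test: wool types are the groups, tension levels are the blocks
friedman.test(x ~ w | t, data=wp)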
##
## Friedman rank sum test
##
## data: x and w and t
## Friedman chi-squared = 0.3, df = 1, p-value = 0.6
rstatix::friedman_effsize(x ~ w | t, data=wp)
## # A tibble: 1 x 5
##   .y.       n effsize method    magnitude
## * <chr> <int>   <dbl> <chr>     <ord>
## 1 x         3   0.111 Kendall W small
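
Both numbers in the output can be reproduced by hand. The sketch below ranks the mean breaks within each tension level (the blocks), sums the ranks per wool type, and applies Q = 12 / (N k (k + 1)) * sum(R_i^2) - 3 N (k + 1). Kendall's W is then Q / (N (k - 1)). The formula interface to aggregate is a choice made here, so the columns keep their original names instead of x, w and t.

```r
# Mean breaks per wool type and tension level (same aggregation as on the slide)
wp <- aggregate(breaks ~ wool + tension, data = warpbreaks, FUN = mean)

# Rank the wool types within each tension level (block)
wp$rank <- ave(wp$breaks, wp$tension, FUN = rank)

N <- length(unique(wp$tension))    # number of blocks
k <- length(unique(wp$wool))       # number of treatments
R <- tapply(wp$rank, wp$wool, sum) # rank sum per treatment

Q <- 12 / (N * k * (k + 1)) * sum(R^2) - 3 * N * (k + 1)
W <- Q / (N * (k - 1))             # Kendall's W effect size
p <- pchisq(Q, df = k - 1, lower.tail = FALSE)

ref <- friedman.test(breaks ~ wool | tension, data = wp)
print(round(c(Q = Q, W = W, p = p), 3))
```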

Final thoughts

  • We have learnt most of the basics of statistical testing, both parametric and non-parametric
  • It is worth reflecting on what we have learnt so far

Notes on non-parametric tests

  • More limited than parametric tests
  • Whenever possible, use parametric tests
  • However, non-parametric tests are better suited to ordinal data
  • When the dependent variable is ordinal, parametric tests are practically unusable
  • We use parametric tests to handle numeric data

Parametric test with non-normal data?

  • You may consider this approach
  • It needs a substantially large sample size
  • Please be careful with skewed data
  • Homogeneity of intergroup variances is an important assumption!

Further analysis after ANOVA

  • Post-hoc analysis is a must
  • ANOVA is an explanatory statistical model
  • Residual analysis tests the model's goodness of fit
  • Assumptions:
    • Residual normality
    • Homogeneity of residual variances
  • In residual analyses, we need to satisfy both assumptions
  • You can check normality using a QQ-plot or a statistical test
  • Homogeneity of residual variances is called homoscedasticity
  • Test for homoscedasticity: Breusch-Pagan or Harrison-McCabe test
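
A sketch of the follow-up steps after fitting an ANOVA. PlantGrowth is again an assumed stand-in dataset, Tukey's HSD is one common post-hoc choice among several, and the Breusch-Pagan call runs only if the lmtest package happens to be installed.

```r
# Assumption: PlantGrowth (built into R) stands in for real study data
fit <- aov(weight ~ group, data = PlantGrowth)

# Post-hoc analysis: pairwise group differences with Tukey's HSD
post_hoc <- TukeyHSD(fit)
print(post_hoc)

# Residual normality: Shapiro-Wilk test on the model residuals
# (qqnorm(res); qqline(res) would draw the corresponding QQ-plot)
res <- residuals(fit)
normality <- shapiro.test(res)
print(normality)

# Homoscedasticity: Breusch-Pagan test, skipped when lmtest is unavailable
if (requireNamespace("lmtest", quietly = TRUE)) {
  print(lmtest::bptest(weight ~ group, data = PlantGrowth))
}
```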

Conceptual remarks

  • Distribution: discrete and continuous
  • Hypotheses: H0 and H1
  • Statistical tests
  • Independent and dependent variables
  • Mention the binomial, Poisson, normal, and χ2 distributions
  • Explain more about the differences between independent and dependent variables

What test should we use?

  • Categorical DV and categorical IV
  • Numeric DV and categorical IV (2 groups)
  • Numeric DV and categorical IV (>2 groups)
  • What about a numeric DV and a numeric IV?
  • And what if we have a categorical DV and a numeric IV?
