count: false
class: bg-main1 hide-slide-number split-70

.column[.right.vmiddle.content[
.font3[.amber[Differences] in Multiple Groups]
]]

.bg-main4.column[.vmiddle.content[
.amber[Aly Lamuri]
Indonesia Medical Education and Research Institute
]]

---
name: overview
layout: true
class: bg-main4 middle split-30 hide-slide-number

.column[.vmiddle.right.content[
.amber.font3[Overview]
]]

---
template: overview
count: false

.bg-main1.column[.vmiddle.content[
- .amber[Unpaired test]
- Paired test
- Final thoughts
- Conceptual remarks
- Case examples
]]

---
layout: false
class: bg-main3

# Unpaired test

.font2[
- Kruskal-Wallis H Test
- Differences in multiple groups
- Analogous to one-way ANOVA (.amber[not] its alternative!)
]

???

- ANOVA is fairly robust to non-normality unless the data are highly skewed
- Kruskal-Wallis is often overused for that reason
- Kruskal-Wallis is somewhat limited: you can neither include multiple independent variables nor adjust for covariates

---
count: false
class: bg-main3

# Assumptions and limitations

.font2[
- I.I.D.
- Does not assume normality
- Requires homogeneous intergroup variances
]

???

- Just like ANOVA, Kruskal-Wallis requires homogeneous intergroup variances
- When intergroup variances are heterogeneous, Kruskal-Wallis performs worse than ANOVA
- In such a case, please consider using Welch's ANOVA (`oneway.test` in `R`)
- Welch's method is not available for factorial ANOVA

---
class: bg-main3

# Procedure

.font2[
- Pool and sort all data elements
- Assign ranks to the sorted data
- Adjust ranks for tied data
- Calculate the H statistic
]

--

.font2[
$$
H = \left[ \frac{12}{\color{orange}{n}(\color{orange}{n}+1)} \displaystyle \sum_{i=1}^{\color{red}{k}} \frac{\color{yellow}{R}_i^2}{\color{orange}{n}_i} \right] - 3 (\color{orange}{n}+1)
$$
]

???

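The procedure above can be checked by hand. The following is a minimal sketch, assuming a small made-up dataset without ties (so no tie correction is needed); the vectors `x` and `g` are hypothetical and not part of the slides. It computes H from the pooled ranks and compares it with base R's `kruskal.test`.

```r
# Hypothetical toy data: 9 observations in 3 groups, no tied values
x <- c(1.2, 3.4, 2.2, 5.1, 4.8, 6.0, 7.3, 0.9, 2.8)
g <- factor(rep(c("A", "B", "C"), each = 3))

r   <- rank(x)               # ranks assigned on the pooled data
n   <- length(x)             # total number of observations
k   <- nlevels(g)            # number of groups
R   <- tapply(r, g, sum)     # rank sum per group
n_i <- tapply(r, g, length)  # number of observations per group

# H statistic as written on the slide (valid when there are no ties)
H <- 12 / (n * (n + 1)) * sum(R^2 / n_i) - 3 * (n + 1)

# H is referred to a chi-squared distribution with k - 1 degrees of freedom
p <- pchisq(H, df = k - 1, lower.tail = FALSE)

# Cross-check against the built-in implementation
ref <- kruskal.test(x, g)
all.equal(H, unname(ref$statistic))
```

With tied values, `kruskal.test` additionally divides by a tie-correction factor, so the hand-computed value would differ slightly.
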
- Only three quantities need to be understood
  - `\(n\)`: Total number of observations (`\(n_i\)` is the number of observations in group `\(i\)`)
  - `\(k\)`: Total number of groups
  - `\(R_i\)`: Sum of the pooled-data ranks in group `\(i\)`

--

.font2[
`$$H \sim \chi^2(k-1)$$`
]

---
layout: true
class: bg-main3

# Example, please?

---

```r
# DNase dataset in R
str(DNase)
```

```
## Classes 'nfnGroupedData', 'nfGroupedData', 'groupedData' and 'data.frame':	176 obs. of  3 variables:
##  $ Run    : Ord.factor w/ 11 levels "10"<"11"<"9"<..: 4 4 4 4 4 4 4 4 4 4 ...
##  $ conc   : num  0.0488 0.0488 0.1953 0.1953 0.3906 ...
##  $ density: num  0.017 0.018 0.121 0.124 0.206 0.215 0.377 0.374 0.614 0.609 ...
##  - attr(*, "formula")=Class 'formula'  language density ~ conc | Run
##   .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv>
##  - attr(*, "labels")=List of 2
##   ..$ x: chr "DNase concentration"
##   ..$ y: chr "Optical density"
##  - attr(*, "units")=List of 1
##   ..$ x: chr "(ng/ml)"
```

???

- We will use the `DNase` dataset
- It contains results obtained from an ELISA assay
  - `Run`: The assay run
  - `conc`: Protein concentration
  - `density`: Optical density in the assay

---
count: false

```r
with(DNase, tapply(density, Run, shapiro.test)) %>%
  lapply(broom::tidy) %>%
  lapply(data.frame) %>%
  {do.call(rbind, .)} %>%
  knitr::kable() %>%
  kableExtra::kable_minimal()
```

<table class=" lightable-minimal" style='font-family: "Trebuchet MS", verdana, sans-serif; margin-left: auto; margin-right: auto;'>
 <thead>
  <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> statistic </th> <th style="text-align:right;"> p.value </th> <th style="text-align:left;"> method </th> </tr>
 </thead>
<tbody>
  <tr> <td style="text-align:left;"> 10 </td> <td style="text-align:right;"> 0.891 </td> <td style="text-align:right;"> 0.059 </td> <td style="text-align:left;"> Shapiro-Wilk normality test </td> </tr>
  <tr> <td style="text-align:left;"> 11 </td> <td style="text-align:right;"> 0.888 </td> <td style="text-align:right;"> 0.051 </td> <td style="text-align:left;"> Shapiro-Wilk normality test </td> </tr>
  <tr> <td style="text-align:left;"> 9 </td> <td style="text-align:right;"> 0.889 </td> <td style="text-align:right;"> 0.053 </td> <td style="text-align:left;"> Shapiro-Wilk normality test </td> </tr>
  <tr> <td style="text-align:left;"> 1 </td> <td style="text-align:right;"> 0.883 </td> <td style="text-align:right;"> 0.044 </td> <td style="text-align:left;"> Shapiro-Wilk normality test </td> </tr>
  <tr> <td style="text-align:left;"> 4 </td> <td style="text-align:right;"> 0.877 </td> <td style="text-align:right;"> 0.035 </td> <td style="text-align:left;"> Shapiro-Wilk normality test </td> </tr>
  <tr> <td style="text-align:left;"> 8 </td> <td style="text-align:right;"> 0.876 </td> <td style="text-align:right;"> 0.033 </td> <td style="text-align:left;"> Shapiro-Wilk normality test </td> </tr>
  <tr> <td style="text-align:left;"> 5 </td> <td style="text-align:right;"> 0.879 </td> <td style="text-align:right;"> 0.037 </td> <td style="text-align:left;"> Shapiro-Wilk normality test </td> </tr>
  <tr> <td style="text-align:left;"> 7 </td> <td style="text-align:right;"> 0.883 </td> <td style="text-align:right;"> 0.043 </td> <td style="text-align:left;"> Shapiro-Wilk normality test </td> </tr>
  <tr> <td style="text-align:left;"> 6 </td> <td style="text-align:right;"> 0.880 </td> <td style="text-align:right;"> 0.039 </td> <td style="text-align:left;"> Shapiro-Wilk normality test </td> </tr>
  <tr> <td style="text-align:left;"> 2 </td> <td style="text-align:right;"> 0.869 </td> <td style="text-align:right;"> 0.027 </td> <td style="text-align:left;"> Shapiro-Wilk normality test </td> </tr>
  <tr> <td style="text-align:left;"> 3 </td> <td style="text-align:right;"> 0.880 </td> <td style="text-align:right;"> 0.039 </td> <td style="text-align:left;"> Shapiro-Wilk normality test </td> </tr>
</tbody>
</table>

---
count: false

```r
with(DNase, car::leveneTest(conc, Run))
```

```
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value Pr(>F)
## group  10       0      1
##       165
```

---
count: false

```r
kruskal.test(conc ~ Run, data=DNase)
```

```
## 
## 	Kruskal-Wallis rank sum test
## 
## data:  conc by Run
## Kruskal-Wallis chi-squared = 0, df = 10, p-value = 1
```

```r
rstatix::kruskal_effsize(conc ~ Run, data=DNase)
```

```
## # A tibble: 1 x 5
##   .y.       n effsize method  magnitude
## * <chr> <int>   <dbl> <chr>   <ord>
## 1 conc    176 -0.0606 eta2[H] moderate
```

---
count: false

```r
dunn.test::dunn.test(DNase$conc, DNase$Run)
```

```
##   Kruskal-Wallis rank sum test
## 
## data: x and group
## Kruskal-Wallis chi-squared = 0, df = 10, p-value = 1
## 
## 
##                            Comparison of x by group
##                                 (No adjustment)
## Col Mean-|
## Row Mean |          1         10         11          2          3          4
## ---------+------------------------------------------------------------------
##       10 |   0.000000
##          |     0.5000
##          |
##       11 |   0.000000   0.000000
##          |     0.5000     0.5000
##          |
##        2 |   0.000000   0.000000   0.000000
##          |     0.5000     0.5000     0.5000
##          |
##        3 |   0.000000   0.000000   0.000000   0.000000
##          |     0.5000     0.5000     0.5000     0.5000
##          |
##        4 |   0.000000   0.000000   0.000000   0.000000   0.000000
##          |     0.5000     0.5000     0.5000     0.5000     0.5000
##          |
##        5 |   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000
##          |     0.5000     0.5000     0.5000     0.5000     0.5000     0.5000
##          |
##        6 |   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000
##          |     0.5000     0.5000     0.5000     0.5000     0.5000     0.5000
##          |
##        7 |   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000
##          |     0.5000     0.5000     0.5000     0.5000     0.5000     0.5000
##          |
##        8 |   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000
##          |     0.5000     0.5000     0.5000     0.5000     0.5000     0.5000
##          |
##        9 |   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000
##          |     0.5000     0.5000     0.5000     0.5000     0.5000     0.5000
## Col Mean-|
## Row Mean |          5          6          7          8
## ---------+--------------------------------------------
##        6 |   0.000000
##          |     0.5000
##          |
##        7 |   0.000000   0.000000
##          |     0.5000     0.5000
##          |
##        8 |   0.000000   0.000000   0.000000
##          |     0.5000     0.5000     0.5000
##          |
##        9 |   0.000000   0.000000   0.000000   0.000000
##          |     0.5000     0.5000     0.5000     0.5000
## 
## alpha = 0.05
## Reject Ho if p <= alpha/2
```

---
template: overview
count: false

.bg-main1.column[.vmiddle.content[
- Unpaired test
- .amber[Paired test]
- Final thoughts
- Conceptual remarks
- Case examples
]]

---
layout: false
class: bg-main3

# Paired test

.font2[
- Friedman test
- Analogous to repeated-measures ANOVA
- Calculate .amber[*within*] subject differences
- Compare .amber[*between*] group differences
]

---
class: bg-main3

# Procedure

.font2[
- Rank observations within each subject
- Sum the ranks within each group
- Calculate the Q statistic
]

--

.font2[
$$
Q = \left[ \frac{12}{\color{red}{N} \color{yellow}{k} ( \color{yellow}{k}+1)} \displaystyle \sum_{i=1}^{\color{yellow}{k}} \color{orange}{R}_i^2 \right] - 3 \color{red}{N}( \color{yellow}{k}+1)
$$
]

???

- `\(N\)`: Number of rows (blocks / subjects)
- `\(k\)`: Number of columns (treatments / repetitions)
- `\(R_i\)`: Sum of ranks in column `\(i\)`

--

.font2[
`$$Q \sim \chi^2(k-1)$$`
]

---
layout: true
class: bg-main3

# Example, please?

---

```r
str(warpbreaks)
```

```
## 'data.frame':	54 obs. of  3 variables:
##  $ breaks : num  26 30 54 25 70 52 51 26 67 18 ...
##  $ wool   : Factor w/ 2 levels "A","B": 1 1 1 1 1 1 1 1 1 1 ...
##  $ tension: Factor w/ 3 levels "L","M","H": 1 1 1 1 1 1 1 1 1 2 ...
```

---
count: false

```r
# Average the breaks for each wool (w) and tension (t) combination
wp <- aggregate(warpbreaks$breaks,
  by = list(
    w = warpbreaks$wool,
    t = warpbreaks$tension
  ),
  FUN = mean
)

# Wool types are the groups, tension levels are the blocks
friedman.test(x ~ w | t, data=wp)
```

```
## 
## 	Friedman rank sum test
## 
## data:  x and w and t
## Friedman chi-squared = 0.3, df = 1, p-value = 0.6
```

```r
rstatix::friedman_effsize(x ~ w | t, data=wp)
```

```
## # A tibble: 1 x 5
##   .y.       n effsize method    magnitude
## * <chr> <int>   <dbl> <chr>     <ord>
## 1 x         3   0.111 Kendall W small
```

---
template: overview
count: false

.bg-main1.column[.vmiddle.content[
- Unpaired test
- Paired test
- .amber[Final thoughts]
- Conceptual remarks
- Case examples
]]

???

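Before moving on, the paired-test procedure can also be verified by hand. This is a minimal sketch, assuming a made-up matrix `y` (hypothetical, not from the slides) with subjects in rows, treatments in columns, and no ties within a row. It uses the rank-sum form of the statistic, `Q = 12 / (N k (k + 1)) * sum(R^2) - 3 N (k + 1)`, and compares it with base R's `friedman.test`.

```r
# Hypothetical toy data: 4 subjects (rows) by 3 treatments (columns)
y <- matrix(c(1.0, 2.5, 3.1,
              2.0, 1.4, 3.9,
              0.7, 2.2, 2.9,
              1.8, 2.6, 2.1),
            nrow = 4, byrow = TRUE)

N <- nrow(y)  # number of blocks (subjects)
k <- ncol(y)  # number of treatments

r <- t(apply(y, 1, rank))  # rank the observations within each subject
R <- colSums(r)            # sum the ranks within each treatment: 5, 8, 11

# Q statistic in its rank-sum form (valid when there are no ties)
Q <- 12 / (N * k * (k + 1)) * sum(R^2) - 3 * N * (k + 1)  # 4.5

# Q is referred to a chi-squared distribution with k - 1 degrees of freedom
p <- pchisq(Q, df = k - 1, lower.tail = FALSE)

# Cross-check against the built-in implementation
ref <- friedman.test(y)
all.equal(Q, unname(ref$statistic))
```
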
- We have covered most of the basics of statistical testing, both parametric and non-parametric
- It is worth reflecting on what we have learnt so far

---
layout: false
class: bg-main3

# Excerpts on non-parametric tests

.font2[
- More limited than parametric tests
- Whenever possible, use parametric tests
- However, non-parametric tests are better suited to ordinal data
]

???

- When the dependent variable is ordinal, parametric tests are practically unusable
- We use parametric tests to handle numeric data

---
class: bg-main3

# Parametric test with non-normal data?

.font2[
- You may consider this approach
- Needs a substantially large sample size
- Please be careful with skewed data
- .amber[Homogeneity] of intergroup variances is an important assumption!
]

---
class: bg-main3

# Further analysis after ANOVA

.font2[
- Post-hoc analysis is a must
- ANOVA is an explanatory statistical model
- Residual analysis tests the model's goodness of fit
- Assumptions:
  - .amber[Residual] normality
  - Homogeneity of .amber[residual] variances
]

???

- In residual analyses, we need to satisfy both assumptions
- You can check normality using a QQ-plot or a statistical test
- Homogeneous residual variance is called homoscedasticity
- Tests for homoscedasticity: Breusch-Pagan or Harrison-McCabe

---
template: overview
count: false

.bg-main1.column[.vmiddle.content[
- Unpaired test
- Paired test
- Final thoughts
- .amber[Conceptual remarks]
- Case examples
]]

---
class: bg-main3

# Conceptual remarks

.font2[
- Distributions: discrete and continuous
- Hypotheses: `\(H_0\)` and `\(H_1\)`
- Statistical tests
- Independent and dependent variables
]

???

- Mention the binomial, Poisson, normal, and `\(\chi^2\)` distributions
- Explain more about the differences between independent and dependent variables

---
class: bg-main3

# What test should we use?

.font2[
- Categoric DV and categoric IV
- Numeric DV and categoric IV (2 groups)
- Numeric DV and categoric IV (>2 groups)
]

???

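One common mapping for the three combinations on the slide can be sketched in base R. The choice of tests and of the built-in datasets (`mtcars`, `ToothGrowth`, `PlantGrowth`) is our assumption for illustration; the slide itself does not name them.

```r
# Categoric DV and categoric IV: chi-squared test of independence
cat_cat <- chisq.test(table(mtcars$am, mtcars$vs))

# Numeric DV and categoric IV (2 groups)
two_param    <- t.test(len ~ supp, data = ToothGrowth)       # parametric (Welch by default)
two_nonparam <- wilcox.test(len ~ supp, data = ToothGrowth,
                            exact = FALSE)                   # non-parametric, tied values present

# Numeric DV and categoric IV (>2 groups)
multi_param    <- aov(weight ~ group, data = PlantGrowth)           # parametric
multi_nonparam <- kruskal.test(weight ~ group, data = PlantGrowth)  # non-parametric

cat_cat
two_param
two_nonparam
summary(multi_param)
multi_nonparam
```
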
- What about a numeric DV and a numeric IV?
- And what if we have a categoric DV and a numeric IV?

---
template: overview
count: false

.bg-main1.column[.vmiddle.content[
- Unpaired test
- Paired test
- Final thoughts
- Conceptual remarks
- .amber[Case examples]
]]
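
???

As one possible case example, the checks listed under "Further analysis after ANOVA" can be sketched as follows. This assumes the built-in `PlantGrowth` dataset and Tukey's HSD as the post-hoc method, neither of which is prescribed by the slides. The Breusch-Pagan test relies on the `lmtest` package and only runs if that package is installed.

```r
# Built-in data: plant weight (numeric DV) by treatment group (3-level categoric IV)
fit <- aov(weight ~ group, data = PlantGrowth)
summary(fit)

# Post-hoc pairwise comparisons (Tukey's HSD is one common choice)
posthoc <- TukeyHSD(fit)
posthoc

# Residual normality: QQ-plot and Shapiro-Wilk test
res <- residuals(fit)
qqnorm(res)
qqline(res)
shapiro.test(res)

# Homoscedasticity: Breusch-Pagan test, skipped when lmtest is unavailable
if (requireNamespace("lmtest", quietly = TRUE)) {
  lmtest::bptest(weight ~ group, data = PlantGrowth)
}
```
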