
A parametric test requires us to assume or hypothesize parameters of a population. Often, a small sample size or a highly skewed distribution does not resemble a normal distribution. In such a case, it becomes untenable to assume normality in our data. Even though parametric tests are fairly robust against non-normal data to a certain degree, they still require a large sample size. With a larger \(n\) and homogeneous intergroup variance, a parametric test may have sufficient power to correctly reject the \(H_0\). However, if we cannot satisfy the required assumptions, we need to drop our hypothesized claim about population parameters. In other words, we employ a non-parametric test to measure observed differences.

Non-Parametric Test

Given a small sample size, it is difficult for us to ascertain normality. As we have previously discussed, a large sample size confers high statistical power. Conversely, when we have a small sample size, it is natural to expect lower statistical power. Aside from the normality assumption, parametric tests are also sensitive to severe skewness, because the average value will no longer represent the central tendency. Non-parametric tests assume neither normality nor symmetry, thus providing a viable approach for data unsuitable for parametric tests. However, a more lenient assumption means neglecting some information in the data, resulting in lower statistical power compared to a parametric test whose assumptions are fulfilled. The following figure illustrates how \(\mu\) fails to describe the central tendency in skewed data (bottom panel).
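
A quick numerical sketch conveys the same point: for right-skewed data, the mean is pulled toward the tail, away from the median.

# For right-skewed data, the mean exceeds the median
set.seed(1)
y <- rchisq(1000, 4)
c(mean=mean(y), median=median(y))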

In summary, we should consider using a non-parametric test when we have a small sample size. If our data are not asymptotically normal, employing a non-parametric test might be a more appropriate step. The presence of extreme outliers or severe skewness may impair a parametric test, so using a non-parametric test is desirable. In conducting a non-parametric test, we hypothesize about the difference between our observations and a reference. To describe the difference, we shall use the median \(M\), therefore:

  • \(H_0:\ M_1 = M_2\)
  • \(H_1:\ M_1 \neq M_2\)

One-Sample Test

Similar to the parametric case, in a one-sample test we only have one group of observations. We would like to know whether our group deviates from the hypothesized median. We may employ two types of test, i.e. the one-sample sign test and the one-sample Wilcoxon signed rank test. Only the one-sample Wilcoxon test is analogous to the one-sample T-Test.

One-Sample Sign Test

A one-sample sign test assumes neither normality nor a symmetric distribution. It is useful for skewed data, where the statistic follows a binomial distribution. In fact, it is an extension of the binomial test, which we discussed in the previous lecture. In this test, we know that our statistic \(B_s \sim B(n, p)\) with \(p=0.5\). We set the probability \(p=0.5\) because, under \(H_0\), each observation has an equal chance of falling above or below the hypothesized median \(M_0\), as the median is the midpoint.

To calculate a one-sample sign test, we need to:

  • Find the residual between each observation and the hypothesized median
  • Omit all zero residuals
  • Disregard the magnitude, take only the sign
  • Calculate the frequency of positive and negative signs
  • Let \(B_s\) be the resulting count \(\to B_s \sim B(n, 0.5)\)

Essentially, we only have two outcomes of interest, i.e. a positive or a negative sign. Assuming each observation is I.I.D., we have a series of Bernoulli trials with \(p=0.5\), such that we can model our probability using a binomial distribution.
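
Numerically, the exact two-tailed p-value is simply twice the smaller binomial tail. As a minimal sketch, using 4 positive signs out of 10 non-zero residuals (the counts we will meet in the example below):

# Exact two-tailed sign-test p-value: double the smaller binomial tail
# (exact here because p = 0.5 makes the distribution symmetric)
k <- 4   # positive signs
n <- 10  # non-zero residuals
2 * pbinom(min(k, n - k), n, 0.5)
| [1] 0.754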

Example, please?

# Generate skewed data using a Chi-squared distribution
library(magrittr)  # provides the %>%, %T>%, and %<>% pipes used throughout
set.seed(1)
x <- rchisq(10, 4) %T>% print()
|  [1] 1.66 7.14 6.93 4.10 7.77 5.08 4.58 2.30 1.36 1.67

In this example, we have \(X \sim \chi^2(4)\), which presents as skewed data. If we let \(H_0\) be \(M=5\) and are interested in conducting a two-tailed test, we can proceed using a one-sample sign test.

# Set M and find the residuals (differences)
M <- 5
diff <- x - M

# Make a data frame
tbl <- data.frame(x=x, abs.diff=abs(diff), sign=sign(diff))
    x   abs.diff   sign
-----  ---------  -----
 1.66      3.338     -1
 7.14      2.142      1
 6.93      1.926      1
 4.10      0.898     -1
 7.77      2.771      1
 5.08      0.081      1
 4.58      0.424     -1
 2.30      2.701     -1
 1.36      3.638     -1
 1.67      3.329     -1
# Perform an exact binomial test for each sign
res <- lapply(c(-1, 1), function(s) {
    binom.test(sum(tbl$sign == s), nrow(tbl), 0.5) %>%
        broom::tidy()
})

# Two-tailed test on sign=-1
knitr::kable(res[[1]], format="simple")
 estimate   statistic   p.value   parameter   conf.low   conf.high  method                alternative
---------  ----------  --------  ----------  ---------  ----------  --------------------  ------------
      0.6           6     0.754          10      0.262       0.878  Exact binomial test   two.sided
# Two-tailed test on sign=1
knitr::kable(res[[2]], format="simple")
 estimate   statistic   p.value   parameter   conf.low   conf.high  method                alternative
---------  ----------  --------  ----------  ---------  ----------  --------------------  ------------
      0.4           4     0.754          10      0.122       0.738  Exact binomial test   two.sided

As we have demonstrated, a non-parametric test is straightforward and simple. The statistic obtained from a one-sample sign test only takes into account the sign of the difference and neglects its magnitude. This test helps avoid incorrectly rejecting the \(H_0\) due to a biased estimate of central tendency in the presence of severe skewness.

One-Sample Wilcoxon Signed Rank Test

Similar to the one-sample sign test, the one-sample Wilcoxon test does not assume normality. However, it does assume a symmetric distribution, so the one-sample Wilcoxon test is not suitable for severely skewed data. We may start to wonder: which distributions have a symmetric shape but are not normal? That is a fair question, because so far we have rarely discussed non-normally distributed symmetric data. Though, we briefly mentioned one of them in the second lecture: the uniform distribution. Of course, there are more examples, such as the Cauchy distribution, the generalized normal distribution (which is not normal!), and so on. We will not dig too deeply into this topic, but please be aware that we may have symmetric but non-normally distributed data.
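
As a quick illustration, a minimal sketch showing that uniformly distributed data are symmetric yet non-normal:

# Uniform draws: symmetric about 0.5, but not normal
set.seed(1)
u <- runif(100)
shapiro.test(u)  # at n = 100, the test typically rejects normality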

The one-sample Wilcoxon test has a similar procedure to the one-sample sign test. However, we assign ranks based on the computed differences. The statistic is the resultant of the signed ranks, where we take the smaller of the sums of positive and negative ranks. Thus, we sometimes refer to the one-sample Wilcoxon test as a signed rank sum test.

Example, please?

For consistency, we shall re-use the previous data. Please be advised, we need to ascertain symmetry before proceeding with a one-sample Wilcoxon test. To do so, we may refer to a skewness measure of our data. A negative value means our data are left-skewed, and vice versa. Having acquired the skewness, we may use the range of \([-1, 1]\) to decide that our data do not present with severe skewness impairing their symmetry.
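
As a minimal sketch, the moment-based skewness can be computed by hand; packages such as moments provide an equivalent skewness() function.

# Moment-based sample skewness, computed manually
skewness <- function(x) {
    m <- mean(x)
    s <- sqrt(mean((x - m)^2))
    mean((x - m)^3) / s^3
}
skewness(x)  # a value within [-1, 1] suggests no severe skewness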

# Re-generate the same skewed data using the same seed
set.seed(1)
x <- rchisq(10, 4) %T>% print()
|  [1] 1.66 7.14 6.93 4.10 7.77 5.08 4.58 2.30 1.36 1.67
# Add columns to data frame
tbl$ranked <- rank(tbl$abs.diff)
    x   abs.diff   sign   ranked
-----  ---------  -----  -------
 1.66      3.338     -1        9
 7.14      2.142      1        5
 6.93      1.926      1        4
 4.10      0.898     -1        3
 7.77      2.771      1        7
 5.08      0.081      1        1
 4.58      0.424     -1        2
 2.30      2.701     -1        6
 1.36      3.638     -1       10
 1.67      3.329     -1        8
# Calculate the statistics
W <- tapply(tbl$ranked, tbl$sign, sum) %>% min() %T>% print()
| [1] 17
# Two-tailed p-value: P(V <= W), doubled since W is the smaller rank sum
psignrank(W, nrow(tbl)) * 2
| [1] 0.322
# Built-in test
wilcox.test(x, mu=5)
| 
|   Wilcoxon signed rank exact test
| 
| data:  x
| V = 17, p-value = 0.3
| alternative hypothesis: true location is not equal to 5

Two-Sample Test

The Mann-Whitney U test is an unpaired two-sample Wilcoxon test. As with other non-parametric tests, it does not assume normality, although it requires the data to be I.I.D. The Mann-Whitney U test can also handle skewed data with a small sample size. The core concept in the Mann-Whitney U test is the sum of ranks: we pool all the data elements from both groups, sort the pooled values from smallest to largest, and, similar to the one-sample Wilcoxon test, assign a rank to each value in order to compare both groups.

Example, please?

# We will use x as the first group
x
|  [1] 1.66 7.14 6.93 4.10 7.77 5.08 4.58 2.30 1.36 1.67
# Assign x+4 as the second group, make a data frame
tbl <- data.frame(
    obs=c(x, x+4), 
    group=rep(c("1", "2"), each=length(x)) %>% factor()
) %T>% str()
| 'data.frame': 20 obs. of  2 variables:
|  $ obs  : num  1.66 7.14 6.93 4.1 7.77 ...
|  $ group: Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
# Goodness-of-fit test against a standard normal distribution
tapply(tbl$obs, tbl$group, ks.test, pnorm) %>% lapply(broom::tidy) %>%
    lapply(data.frame) %>% {do.call(rbind, .)} %>% knitr::kable(format="simple")
 statistic   p.value  method                                     alternative
----------  --------  -----------------------------------------  ------------
     0.913         0  Exact one-sample Kolmogorov-Smirnov test   two-sided
     1.000         0  Exact one-sample Kolmogorov-Smirnov test   two-sided

As we can see, both groups present a non-normal distribution. Thus, comparing both observations requires a non-parametric test. Here we employ the Mann-Whitney U test and compute the effect size.

wilcox.test(obs ~ group, data=tbl, conf.int=TRUE)
| 
|   Wilcoxon rank sum exact test
| 
| data:  obs by group
| W = 12, p-value = 0.003
| alternative hypothesis: true location shift is not equal to 0
| 95 percent confidence interval:
|  -6.74 -1.26
| sample estimates:
| difference in location 
|                     -4
rstatix::wilcox_effsize(obs ~ group, data=tbl)
| # A tibble: 1 × 7
|   .y.   group1 group2 effsize    n1    n2 magnitude
| * <chr> <chr>  <chr>    <dbl> <int> <int> <ord>    
| 1 obs   1      2        0.642    10    10 large
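
To connect this output with the rank-sum concept described earlier, here is a minimal sketch of the manual computation, re-using tbl from above; the resulting U statistic for group 1 coincides with the W reported by wilcox.test().

# Manual rank-sum computation for group 1
r <- rank(tbl$obs)                # ranks over the pooled sample
R1 <- sum(r[tbl$group == "1"])    # rank sum of group 1
n1 <- sum(tbl$group == "1")
R1 - n1 * (n1 + 1) / 2            # U statistic, here equal to W = 12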

Paired Test

In the previous tests, we always assumed I.I.D. In the case of paired data, i.e. when one observation influences another, we need to take their relationship into consideration. Unfortunately, the Mann-Whitney U test cannot discern such a relation between data points, as it only looks at the difference in ranks. In the paired Wilcoxon test, we do not assume I.I.D., and procedure-wise it is akin to the one-sample Wilcoxon test. To understand this concept, please kindly recall how the paired T-Test relates to the one-sample T-Test. Although the paired Wilcoxon test does not assume the shape of our distribution, symmetric data are still a plus.

To conduct a paired Wilcoxon test, we first measure the difference between paired data points by taking their residuals. Identical observations will result in a residual of 0, which we have to omit from our observations. We then assign ranks to the absolute values of the computed residuals and calculate the statistic as previously demonstrated in the one-sample Wilcoxon test, as sketched below.
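
A minimal sketch of this procedure on hypothetical before/after measurements (the vectors below are made up for illustration):

# Paired Wilcoxon procedure on hypothetical before/after data
before <- c(12.1, 14.3, 11.8, 13.5, 12.9)
after  <- c(13.0, 14.3, 12.9, 15.1, 12.4)
d <- after - before
d <- d[d != 0]                   # omit the zero residual
rk <- rank(abs(d))               # rank the absolute residuals
min(tapply(rk, sign(d), sum))    # the smaller signed-rank sum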

Example, please?

In the previous lecture about ANOVA, we used the ChickWeight dataset. We happened to observe non-normally distributed data for two time points. Here, we will further examine both groups to understand the difference.

# We will use the ChickWeight dataset
str(ChickWeight)
| Classes 'nfnGroupedData', 'nfGroupedData', 'groupedData' and 'data.frame':    578 obs. of  4 variables:
|  $ weight: num  42 51 59 64 76 93 106 125 149 171 ...
|  $ Time  : num  0 2 4 6 8 10 12 14 16 18 ...
|  $ Chick : Ord.factor w/ 50 levels "18"<"16"<"15"<..: 15 15 15 15 15 15 15 15 15 15 ...
|  $ Diet  : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...
|  - attr(*, "formula")=Class 'formula'  language weight ~ Time | Chick
|   .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv> 
|  - attr(*, "outer")=Class 'formula'  language ~Diet
|   .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv> 
|  - attr(*, "labels")=List of 2
|   ..$ x: chr "Time"
|   ..$ y: chr "Body weight"
|  - attr(*, "units")=List of 2
|   ..$ x: chr "(days)"
|   ..$ y: chr "(gm)"
# Assess normality
tapply(ChickWeight$weight, ChickWeight$Time, shapiro.test) %>% lapply(broom::tidy) %>%
    lapply(data.frame) %>% {do.call(rbind, .)} %>% knitr::kable(format="simple")
Time    statistic   p.value  method
-----  ----------  --------  ----------------------------
0           0.890     0.000  Shapiro-Wilk normality test
2           0.873     0.000  Shapiro-Wilk normality test
4           0.973     0.315  Shapiro-Wilk normality test
6           0.982     0.648  Shapiro-Wilk normality test
8           0.980     0.577  Shapiro-Wilk normality test
10          0.981     0.616  Shapiro-Wilk normality test
12          0.983     0.686  Shapiro-Wilk normality test
14          0.973     0.325  Shapiro-Wilk normality test
16          0.986     0.830  Shapiro-Wilk normality test
18          0.991     0.975  Shapiro-Wilk normality test
20          0.991     0.968  Shapiro-Wilk normality test
21          0.986     0.869  Shapiro-Wilk normality test

After assessing normality, we see that both \(T=0\) and \(T=2\) do not follow a normal distribution.

# Subset the dataset to exclude normally distributed data
tbl <- subset(ChickWeight, Time %in% c(0, 2))

# Make Time as a factor
tbl$Time %<>% factor(levels=c(0, 2))

Since we are interested in observing differences in a non-normal distribution, here we subset the data to only include observations at \(T_0\) and \(T_2\). Then we turn Time into a factor, where we set \(T_0\) as our group of reference.

# Perform a paired Wilcoxon test
wilcox.test(weight ~ Time, data=tbl, paired=TRUE, conf.int=TRUE)
| 
|   Wilcoxon signed rank test with continuity correction
| 
| data:  weight by Time
| V = 8, p-value = 1e-09
| alternative hypothesis: true location shift is not equal to 0
| 95 percent confidence interval:
|  -9.0 -7.5
| sample estimates:
| (pseudo)median 
|           -8.5
rstatix::wilcox_effsize(weight ~ Time, data=tbl, paired=TRUE)
| # A tibble: 1 × 7
|   .y.    group1 group2 effsize    n1    n2 magnitude
| * <chr>  <chr>  <chr>    <dbl> <int> <int> <ord>    
| 1 weight 0      2        0.862    50    50 large

Lastly, as shown above, we perform the paired Wilcoxon test and compute its associated effect size.