class: bg-main1 hide-slide-number split-70
count: false

.column[.right.vmiddle.content[
.font3[Parametric: .amber[Mean] in Two Groups]
]]

.bg-main4.column[.vmiddle.content[
.amber[Aly Lamuri]

Indonesia Medical Education and Research Institute
]]

---

name: overview
layout: true
class: bg-main4 middle split-30 hide-slide-number

.column[.vmiddle.right.content[
.amber.font3[Overview]
]]

---

template: overview
count: false

.bg-main1.column[.vmiddle.content[
- .amber[Mean difference]
- One sample T-Test
- Unpaired T-Test
- Paired T-Test
- Effect size
]]

---

layout: false
class: bg-main3 center middle

# Concept recall

.font2[
`$$\frac{x - \bar{x}}{s} \sim N(0, 1) \tag{1}$$`

`$$\bar{X} \xrightarrow{d} N\Big(\mu, \frac{\sigma}{\sqrt{n}}\Big) \tag{2}$$`
]

???
- Scaling and centering data following a normal distribution results in a standardized normal distribution (Z-distribution)
- Central limit theorem: the sample mean from any distribution will approximately follow a normal distribution

--

.font2[
With known `\(\mu\)` and `\(\sigma\)`, we can make a direct comparison
]

???
Ideal: by knowing the parameters `\(\mu\)` and `\(\sigma\)`, we can directly compare our sample mean to its corresponding population

--

.font2[
But...
]

--

.font2[
What if we don't know `\(\mu\)`?
]

???
Solution: use the sample statistic `\(\bar{x}\)` as an estimate

---

layout: true
class: bg-main3

# Mean difference

---

.font2[
- Simply `\(\bar{x} - \mu_0\)`
- But errors are bound to occur in our samples
- What do we do?
- An adjustment using the .amber[standard error]
]

???
- One-sample mean difference test: compare the mean from our data to a previously reported mean in the population

---

count: false

`\begin{align}
SE &= \frac{\sigma}{\sqrt{n}} \tag{Standard Error} \\
\\
z &= \frac{\bar{x} - \mu_0}{SE} \\
&= \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \tag{One-sample Test}
\end{align}`

???
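As a presenter note, the one-sample Z test above can be sketched directly in R; the values below (`xbar`, `mu0`, `sigma`, `n`) are illustrative numbers, not real data:

```r
# Illustrative one-sample Z test; all values are made up for demonstration
xbar  <- 75    # sample mean
mu0   <- 70    # hypothesized population mean
sigma <- 10    # known population standard deviation
n     <- 18    # sample size

se <- sigma / sqrt(n)     # standard error
z  <- (xbar - mu0) / se   # one-sample Z statistic
p  <- 2 * {1 - pnorm(z)}  # two-tailed p-value
c("z" = z, "p" = p)       # z ~ 2.12, p ~ 0.034
```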
- `\(\mu_0\)` is the expected (hypothesized) mean, often acquired from previous publications
- `\(\sigma\)` is the **known** standard deviation in the **population**

--

<br>
<br>

.font2[How do we get the p-value?]

--

(.amber[Hint:] the Z statistic follows the Z-distribution)

???
To get the p-value, we evaluate the acquired `\(z\)` statistic as a quantile of the Z-distribution

---

layout: true
class: bg-main3

# Example, please?

.font2[
In a population of third-year electrical engineering students, we know the .amber[average final score] of a particular course is .pink[70]. To measure students' comprehension, UKRIDA has established a standardized examination with a .amber[standard deviation] of .pink[10]. We are interested in whether students registered in this year's course have a different average: 18 students scored an average of 75 on the final exam.
]

---

.font2[
`\begin{align}
H_0 &: \bar{x} = \mu_0 \\
H_a &: \bar{x} \neq \mu_0
\end{align}`
]

---

count: false

.font2[
`\begin{align}
SE &= \frac{10}{\sqrt{18}} = 2.36 \\
z &= \frac{75 - 70}{2.36} = 2.12
\end{align}`
]

---

layout: false
class: bg-main3
count: false

# Where is it located in the Z-distribution?

<img src="index_files/figure-html/z.dist1-1.png" width="100%" />

---

count: false
class: bg-main3

# Where is it relative to the significance value?

<img src="index_files/figure-html/z.dist2-1.png" width="100%" />

---

count: false
class: bg-main3

# What is the p-value?

.font2[
- Recall the hypothesis `\(H_a: \bar{x} \neq \mu_0\)`
- We need to conduct a two-tailed test to determine the p-value
- First we find the cumulative probability of `\(z\)`, which satisfies:
`$$P(Z \leqslant 2.12\ |\ \mu,\sigma): Z \sim N(0, 1)$$`
]

--

.font2[
- Subtract `\(P(Z \leqslant z)\)` from 1
- Multiply by 2
]

--

```r
2 * {1 - pnorm(2.12, 0, 1)}
```

```
## [1] 0.034
```

---

class: bg-main3

# What do we learn?
.font2[
- Z-test requires the data to follow a .amber[normal distribution]
- We do not need to know the parameter `\(\mu \to\)` we can hypothesize its value
- But we need the parameter `\(\sigma\)` to correctly compute `\(z\)`
]

---

count: false
class: bg-main3

# Any hidden message?

.font2[
- Yes!
- Z-test assumes normality
- So we need to perform a goodness-of-fit or normality test
]

--

.font2[
What if we do not know `\(\sigma\)`?
]

--

.font2[We are unable to use the Z-distribution]

--

.font2[Solution: use Student's T-distribution]

---

layout: true
class: bg-main3

# Student's T-distribution

---

`\begin{align}
Let\ & X \sim t_\nu \tag{Notation} \\
P(X=x) &= \frac{\Gamma \big( \frac{\nu+1}{2} \big)}{\sqrt{\nu \pi}\ \Gamma \big( \frac{\nu}{2} \big)} \bigg( 1 + \frac{x^2}{\nu} \bigg)^{-\frac{\nu+1}{2}} \tag{PDF} \\
\nu &= n - 1 \\
\\
Let\ & T \sim t_\nu \\
T &= Z \sqrt{\frac{\nu}{V}} \tag{Relationship}
\end{align}`

--

.font2[
- `\(T:\)` T-distribution with `\(\nu\)` degrees of freedom
- `\(Z:\)` A standardized normal distribution
- `\(V:\)` A `\(\chi^2\)` distribution with `\(\nu\)` degrees of freedom
]

???
- Importance: used in parametric mean difference tests between two groups
- Student's T-distribution depends on only one parameter: the `\(\nu\)` degrees of freedom
- The degrees of freedom equal the number of subjects minus 1

---

count: false

<img src="index_files/figure-html/t.dist-1.png" width="100%" />

---

template: overview

.bg-main1.column[.vmiddle.content[
- Mean difference
- .amber[One sample T-Test]
- Unpaired T-Test
- Paired T-Test
- Effect size
]]

---

layout: false
class: bg-main3

# One sample T-Test

.font2[
- Similar to the one sample Z-test
- Applied to data with unknown `\(\sigma\)`
- Uses `\(s\)` as an estimate of the population parameter
`$$t = \frac{\bar{x}-\mu}{s / \sqrt{n}}$$`
]

---

layout: true
class: bg-main3

# Example, please?

```r
set.seed(1)
x <- rnorm(20, mean=120, sd=20)
```

---

```r
summary(x)
```

```
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 75.7 112.3 127.2 123.8 135.2 151.9
```

```r
sd(x)
```

```
## [1] 18.3
```

---

count: false

.font2[
- We let `\(x \sim N(120, 20)\)`
- But our `\(\bar{x}\)` is 123.81
- With an `\(s\)` of 18.265
- And a `\(\nu\)` of 19
- Does our sample differ from the population?
]

---

count: false

## Formulate the hypothesis

.font2[
`\begin{align}
H_0 &: \bar{x} = 120 \\
H_a &: \bar{x} \neq 120
\end{align}`
]

---

count: false

## Determine the t statistic

.font2[
`$$t=\frac{\bar{x}-\mu}{s / \sqrt{n}}$$`
]

```r
t <- {{mean(x) - 120} / {sd(x) / sqrt(20)}} %T>% print()
```

```
## [1] 0.933
```

---

layout: false
count: false
class: bg-main3

## Locate the `\(t\)` statistic in the T-distribution

<img src="index_files/figure-html/plt.one.sample1-1.png" width="100%" />

---

count: false
class: bg-main3

## P-value in a one-tailed test

<img src="index_files/figure-html/plt.one.sample2-1.png" width="100%" />

```r
1 - pt(t, df=19)
```

```
## [1] 0.181
```

---

count: false
class: bg-main3

## P-value in a two-tailed test

<img src="index_files/figure-html/plt.one.sample3-1.png" width="100%" />

```r
2 * {1 - pt(t, df=19)}
```

```
## [1] 0.363
```

---

count: false
class: bg-main3

# How is the result in `R`?
```r
t.test(x, mu=120)
```

```
## 
##  One Sample t-test
## 
## data:  x
## t = 0.9, df = 19, p-value = 0.4
## alternative hypothesis: true mean is not equal to 120
## 95 percent confidence interval:
##  115 132
## sample estimates:
## mean of x 
##       124
```

--

## Conclusion

- Fail to reject `\(H_0\)`
- Insufficient evidence that the mean differs from `\(\mu_0 = 120\)`

---

template: overview

.bg-main1.column[.vmiddle.content[
- Mean difference
- One sample T-Test
- .amber[Unpaired T-Test]
- Paired T-Test
- Effect size
]]

---

class: bg-main3

# Unpaired T-Test

.font2[
- A T-Test conducted on two sample groups
- Aims to determine the significance of the mean difference
- Types: .amber[Student's] and .amber[Welch's] T-Test
- Assumes .amber[normality] *or* a large sample size
- Also robust to non-normal data, .pink[*unless*] the data are highly skewed or contain outliers
]

???
- Problem with robustness: we often use simulation with specified parameters
- But we don't know such parameters in real-world data
- It's more acceptable when we don't violate the normality assumption

--

.font2[
`\begin{align}
H_0 &: \bar{x}_1 - \bar{x}_2 = d \\
H_a &: \bar{x}_1 - \bar{x}_2 \neq d \\
d &= \mu_1 - \mu_2 = 0
\end{align}`
]

???
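As a presenter note, the pooled-variance computation can be sketched by hand on simulated data (`x1` and `x2` below are hypothetical samples); it should reproduce `t.test(..., var.equal=TRUE)` exactly:

```r
# Student's T-Test by hand on simulated (hypothetical) samples
set.seed(1)
x1 <- rnorm(15, mean=10, sd=2)
x2 <- rnorm(15, mean=12, sd=2)
n1 <- length(x1); n2 <- length(x2)

nu <- n1 + n2 - 2                                        # degrees of freedom
sp <- sqrt({(n1-1)*var(x1) + (n2-1)*var(x2)} / nu)       # pooled SD
t  <- {mean(x1) - mean(x2)} / {sp * sqrt(1/n1 + 1/n2)}   # t statistic

# Agrees with the built-in pooled-variance test
all.equal(t, unname(t.test(x1, x2, var.equal=TRUE)$statistic))  # TRUE
```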
In a unique case, we may find `\(d \neq 0\)`

---

class: bg-main3

# Student's T-Test

.font2[
- A T-Test with a pooled variance
- .amber[Requires:] equal variance in the two samples (tested with Levene's test)
- Uses the pooled variance `\(\to\)` computes both the statistic and the degrees of freedom
]

--

`\begin{align}
t &= \frac{\bar{x}_1 - \bar{x}_2 - d}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \tag{Statistics} \\
s_p &= \sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{\nu}} \tag{Pooled variance} \\
\nu &= n_1 + n_2 - 2 \tag{Degrees of freedom}
\end{align}`

--

.font2[Often, our data violate the equal variance assumption]

.font2[.amber[Solution:] Welch's T-Test]

---

class: bg-main3

# Welch's T-Test

`\begin{align}
t &= \frac{\bar{x}_1 - \bar{x}_2 - d}
{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \tag{Statistics} \\
\nu &= \frac{(n_1-1)(n_2-1)}{(n_2-1)C^2 + (1-C)^2(n_1-1)} \tag{Degrees of freedom} \\
\\
C &= \frac{\frac{s_1^2}{n_1}}{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}
\end{align}`

--

.font2[
For the record:

- Still assumes .amber[normality]!
- Non-normally distributed data are still acceptable .pink[if] the sample size is .amber[large enough]
- How large is large enough? Re-check previous discussions on the .amber[CLT]
]

---

layout: true
class: bg-main3

# Example, please?

---

.font2[
Suppose we are collecting data on body height. Our population of interest is students registered in UKRIDA, where we categorize sex as female and male. We acquire normally distributed data from both sexes, where:

- `\(female \sim N(155, 16)\)`
- `\(male \sim N(170, 8)\)`

We have a sample of 25 females and 30 males, and would like to conduct a hypothesis test on the mean difference.
]

---

count: false

```r
set.seed(5)
tbl <- data.frame(
  "height" = c(rnorm(30, 170, 8), rnorm(25, 155, 16)),
  "sex"    = c(rep("male", 30), rep("female", 25))
)

tapply(tbl$height, tbl$sex, summary)
```

```
## $female
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 123 146 154 158 173 190
## 
## $male
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 152 165 168 170 177 184
```

```r
tapply(tbl$height, tbl$sex, sd)
```

```
## female   male 
##  17.98   7.93
```

---

count: false

<img src="index_files/figure-html/two.sample.t2-1.png" width="100%" />

???
Does it follow the normal distribution?

---

count: false

```r
tapply(tbl$height, tbl$sex, shapiro.test)
```

```
## $female
## 
##  Shapiro-Wilk normality test
## 
## data:  X[[i]]
## W = 1, p-value = 0.7
## 
## 
## $male
## 
##  Shapiro-Wilk normality test
## 
## data:  X[[i]]
## W = 1, p-value = 0.2
```

???
Yes, each group follows a normal distribution

---

```r
car::leveneTest(tbl$height ~ tbl$sex)
```

```
## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value  Pr(>F)    
## group  1    14.3 0.00039 ***
##       53                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

--

.font2[Levene's test suggests heterogeneous variance (hint: the significant p-value)]

---

```r
t.test(height ~ sex, data=tbl, var.equal=FALSE)
```

```
## 
##  Welch Two Sample t-test
## 
## data:  height by sex
## t = -3, df = 32, p-value = 0.004
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -19.87  -4.07
## sample estimates:
## mean in group female   mean in group male 
##                  158                  170
```

.font2[Perform Welch's T-Test since the sampled variances are not equal]

---

```r
t.test(height ~ sex, data=tbl, var.equal=TRUE)
```

```
## 
##  Two Sample t-test
## 
## data:  height by sex
## t = -3, df = 53, p-value = 0.002
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -19.27  -4.67
## sample estimates:
## mean in group female   mean in group male 
##                  158                  170
```

.font2[Student's T-Test, shown to demonstrate type-I error inflation (hint: look at the p-value)]

???
- The p-value is *much lower* compared to Welch's T-Test
- A low p-value does not mean a bad outcome per se, but we need to be cautious
- Especially when required assumptions are violated
- Always be wary of statistical error!
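As a further presenter note, the Welch statistic and its degrees of freedom can be recomputed by hand using the Welch–Satterthwaite form, then checked against R's default `t.test()`; the samples below are simulated for illustration:

```r
# Welch's T-Test by hand on simulated (hypothetical) samples
set.seed(2)
x1 <- rnorm(25, mean=155, sd=16)   # higher-variance group
x2 <- rnorm(30, mean=170, sd=8)    # lower-variance group
v1 <- var(x1) / length(x1)         # s1^2 / n1
v2 <- var(x2) / length(x2)         # s2^2 / n2

t  <- {mean(x1) - mean(x2)} / sqrt(v1 + v2)                      # Welch statistic
nu <- {v1 + v2}^2 /                                              # Welch-Satterthwaite
  {v1^2 / (length(x1)-1) + v2^2 / (length(x2)-1)}                # degrees of freedom

wt <- t.test(x1, x2)               # Welch is t.test()'s default (var.equal=FALSE)
all.equal(c(t, nu), unname(c(wt$statistic, wt$parameter)))       # TRUE
```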
---

<img src="index_files/figure-html/two.sample.t7-1.png" width="100%" />

???
- Visualizing your findings is important
- Both figures give similar information, but conveyed in different fashions

---

template: overview

.bg-main1.column[.vmiddle.content[
- Mean difference
- One sample T-Test
- Unpaired T-Test
- .amber[Paired T-Test]
- Effect size
]]

---

layout: false
class: bg-main3

# Paired T-Test

.font2[
- The previous analyses implicitly assume .pink[independence]
- Some observations may violate such an assumption
- Cases:
  - Differences between multiple measurements
  - Probability events where each instance influences another
]

---

layout: true
class: bg-main3

# Measuring mean differences

---

.font2[
- The two-sample T-Test does not account for differences within subjects
- In fact, applying it to paired data violates its independence assumption
- To measure differences within the same subjects, we need to reduce the complexity
]

---

count: false

.font2[
- Suppose `\(\mu_1\)` and `\(\mu_2\)` represent measurements at `\(t_1\)` and `\(t_2\)`
- Both measures represent the same subject (a within comparison)
- We can calculate the difference between both samples:
`$$\mu_d = \mu_1 - \mu_2$$`
]

---

count: false

.font2[
- Now we only need to take into account one sample of differences
- We hypothesize:
`\begin{align}
H_0 &: \mu_d = 0 \\
H_a &: \mu_d \neq 0
\end{align}`
]

--

.font2[
- Sounds familiar?
- Because it is!
- We can apply a .amber[one-sample T-Test] to `\(\mu_d\)`
]

---

layout: true
class: bg-main3

# Example, please?

---

.font2[
In the current investigation, we are looking for the effect of a certain antihypertensive drug. First, we measure the baseline blood pressure, then prescribe the drug to all subjects. Then, we re-measure the blood pressure after one month. Each subject has a unique identifier, so we can specify mean differences within paired samples.
Suppose we have the following scenario in 30 sampled subjects:

- `\(X_1 \sim N(140, 12)\)`
- `\(X_2 \sim N(133, 17)\)`
]

---

count: false

.font2.center[
Set our hypotheses:

`\begin{align}
H_0 &: \bar{x}_d = 0 \\
H_a &: \bar{x}_d \neq 0
\end{align}`
]

---

count: false

```r
set.seed(1)
tbl <- data.frame(
  "bp"   = c(rnorm(30, 140, 12), rnorm(30, 133, 17)),
  "time" = c(rep("Before", 30), rep("After", 30)) %>%
    factor(levels=c("Before", "After"))
)

# Compute the within-subject differences
md <- with(tbl, bp[time=="Before"] - bp[time=="After"])

# Calculate the t statistic (note the braces: `%T>%` binds more tightly than `/`)
t <- {mean(md) / {sd(md) / sqrt(30)}} %T>% print()
```

```
## [1] 1.84
```

```r
# Obtain the p-value for a two-sided test
2 * {1 - pt(t, df=29)}
```

```
## [1] 0.076
```

---

count: false

```r
# Comparison to the built-in one-sample T-Test
t.test(md, mu=0)
```

```
## 
##  One Sample t-test
## 
## data:  md
## t = 2, df = 29, p-value = 0.08
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -0.64 12.10
## sample estimates:
## mean of x 
##      5.73
```

---

count: false

```r
# Comparison to the built-in paired T-Test
t.test(bp ~ time, data=tbl, paired=TRUE)
```

```
## 
##  Paired t-test
## 
## data:  bp by time
## t = 2, df = 29, p-value = 0.08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.64 12.10
## sample estimates:
## mean of the differences 
##                    5.73
```

---

layout: false
class: bg-main3

# Choosing an .amber[appropriate] statistical test

.font2[
- Both the T-Test and Z-Test assume normality
]

--

.font2[
- One-sample test:
  - Known `\(\sigma \to\)` use the Z-Test
  - Unknown `\(\sigma \to\)` use the one-sample T-Test
]

--

.font2[
- Two-sample test `\(\to\)` do Levene's test first
  - Equal variance: Student's method (pooled variance)
  - Unequal variance: Welch's method
]

--

.font2[
- Paired T-Test: basically a one-sample T-Test on sampled differences
]

---

template: overview

.bg-main1.column[.vmiddle.content[
- Mean difference
- One sample T-Test
- Unpaired
T-Test
- Paired T-Test
- .amber[Effect size]
]]

---

layout: false
class: bg-main3

# Effect size

.font2[
- Concept recall: statistical power
- Related topics: sample size and significance level
]

`\begin{align}
d &= \frac{\bar{x}_1 - \bar{x}_2}{s_p} \tag{Cohen's D} \\
s_p &= \sqrt{\frac{(s_1^2 + s_2^2)}{2}} \tag{Pooled SD}
\end{align}`

---

layout: true
class: bg-main3

# Example, please?

---

```r
set.seed(1)
tbl <- data.frame(
  "bp"   = c(rnorm(30, 140, 12), rnorm(30, 133, 17)),
  "time" = c(rep("Before", 30), rep("After", 30)) %>%
    factor(levels=c("Before", "After"))
)

# Compute the within-subject differences
md <- with(tbl, bp[time=="Before"] - bp[time=="After"])

# Calculate the t statistic (note the braces: `%T>%` binds more tightly than `/`)
t <- {mean(md) / {sd(md) / sqrt(30)}} %T>% print()
```

```
## [1] 1.84
```

```r
# Obtain the p-value for a two-sided test
2 * {1 - pt(t, df=29)}
```

```
## [1] 0.076
```

---

count: false

```r
# Calculate the pooled standard deviation (`add` is a magrittr alias for `+`)
sp <- sqrt({with(tbl, tapply(bp, time, var, simplify=FALSE)) %>%
  {do.call(add, .)}
} / 2) %T>% print()
```

```
## [1] 12.4
```

```r
# Measure Cohen's d (`subtract` is a magrittr alias for `-`)
{with(tbl, tapply(bp, time, mean, simplify=FALSE)) %>%
  {do.call(subtract, .)}
} / sp
```

```
## [1] 0.464
```

---

count: false

```r
# Calculate the effect size using the `psych` package
d <- psych::cohen.d(tbl ~ time) %T>% print()
```

```
## Call: psych::cohen.d(x = tbl ~ time)
## Cohen d statistic of difference between two means
##    lower effect upper
## bp -0.99  -0.47  0.05
## 
## Multivariate (Mahalanobis) distance between groups
## [1] 0.47
## r equivalent of difference between two means
##    bp 
## -0.23
```

---

```r
# Power analysis using the previous information
pwr::pwr.t.test(n=30, d=d$cohen.d[[2]], sig.level=0.05, type="paired")
```

```
## 
##      Paired t test power calculation 
## 
##               n = 30
##               d = 0.472
##       sig.level = 0.05
##           power = 0.704
##     alternative = two.sided
## 
## NOTE: n is number of *pairs*
```

---

layout: false
count: false
class: bg-main1 center middle font5 hide-slide-number

.amber[Query?]