Sample Size and Statistical Power

Aly Lamuri
Indonesia Medical Education and Research Institute

1 / 20

Recap: Hypothesis and significance

  • Null and alternative hypothesis
  • Rejecting the null
  • Should we accept the alternative?
1 / 20

Overview

  • More on p-value
  • Type of statistical error
  • Power analysis as a measure of α and β
  • Equation in calculating sample size
  • Random sampling
1 / 20

P-value: core concepts

  • We can reject the null when we get a p-value < 0.05. But why?
  • 0.05 simply reflects a 5% chance
  • ...of observing data at least this extreme if the null hypothesis were true
  • Or the way I like to say it: probability value
  • When the probability is small enough, we reject the null
  • Well, that's not too hard! :)
2 / 20

P-value: a visual example

Let's revisit our last example of a coin toss

library(magrittr) # provides the %>% and %T>% pipes used throughout
set.seed(1)
coin <- sample(c("H", "T"), 10, replace=TRUE, prob=rep(1/2, 2)) %T>% print()
## [1] "T" "T" "H" "H" "T" "H" "H" "H" "H" "T"
  • We can formulate our hypothesis as:
    • H0: P(X=x) = 0.5
    • Ha: P(X=x) ≠ 0.5
  • As always, we set H as our outcome of interest
  • Since it is a Bernoulli trial, we assume it conforms to the binomial distribution

...but, does it?

3 / 20

P-value: a visual example

binom.test(x=sum(coin == "H"), n=length(coin), p=0.5)
##
## Exact binomial test
##
## data: sum(coin == "H") and length(coin)
## number of successes = 6, number of trials = 10, p-value = 0.8
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
## 0.2624 0.8784
## sample estimates:
## probability of success
## 0.6
  • We have seen this numerous times now
  • But we have yet to unravel the secret behind this magic!
  • Why did we fail to reject the null hypothesis?
  • Or rather, why is the p-value > 0.05?

Question: What's the probability of having 6 H out of 10 Bernoulli trials? Is it < 5%?

3 / 20

P-value: a visual example

P(X=6): X ~ B(10, 0.5)

dbinom(6, 10, 0.5)
## [1] 0.2051

We can manually calculate the two-sided p-value as twice the sum of P(X ≥ 6)

2 * (dbinom(6:10, 10, 0.5) %>% sum())
## [1] 0.7539

Question: What if we preserve the ratio of events (3:5) using more trials?

3 / 20
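The slides use R; as a cross-check, here is a minimal stdlib Python sketch that reproduces the same two-sided p-value for 6 heads out of 10 fair tosses:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Two-sided p-value: twice the upper-tail probability P(X >= 6),
# mirroring 2 * sum(dbinom(6:10, 10, 0.5)) in R
p_value = 2 * sum(binom_pmf(k, 10, 0.5) for k in range(6, 11))
print(round(p_value, 4))  # 0.7539
```

This matches the output of `dbinom` above and the (rounded) p-value reported by `binom.test`.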

P-value: a visual example

P(X=60): X ~ B(100, 0.5)

dbinom(60, 100, 0.5)
## [1] 0.01084

And the p-value would be:

2 * (dbinom(60:100, 100, 0.5) %>% sum())
## [1] 0.05689

Question: We preserved the ratio, why has the probability changed?

3 / 20
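The same stdlib Python sketch, scaled up to 60 heads out of 100 tosses, shows the tail probability shrinking even though the 6:10 ratio is preserved (mirroring the R code above):

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p_small = 2 * sum(binom_pmf(k, 10, 0.5) for k in range(6, 11))     # ~0.754
p_large = 2 * sum(binom_pmf(k, 100, 0.5) for k in range(60, 101))  # ~0.057

# More trials concentrate the distribution around 0.5, so the same
# observed ratio of heads sits further out in the tail.
print(round(p_small, 4), round(p_large, 4))
```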

P-value: take home notes

  • Theoretically, the p-value is difficult to understand
  • But in practice, it tells you the probability of observing data at least this extreme when H0 is true
  • A low p-value leads us to reject H0
4 / 20

Overview

  • More on p-value
  • Type of statistical error
  • Power analysis as a measure of α and β
  • Equation in calculating sample size
  • Random sampling
4 / 20

Significance level

  • 0.05 is our significance level α
  • Higher α means more chance to reject the H0. Incorrect rejection?
  • More samples mean more chance to reject the H0. Incorrect rejection?
5 / 20
  • When p-value < 0.05, we reject the H0
  • And 0.05 is a number we agreed upon
  • We know why we chose the number, but what exactly is 0.05?

Example, please?

Suppose we are conducting a study on a potential cancer therapy. We know that giving patients a placebo yields a recovery rate of 50%. We are confident the new treatment will increase that probability. Tested on 50 patients, 35 showed signs of better quality of life.

5 / 20

Concept check

  • Assuming the observations are i.i.d., does this follow a Bernoulli trial?
  • If so, how do we model its distribution?
  • What are the parameters for our distribution?
  • What formal test can we use to determine significance?
5 / 20

Modelling the distribution

Cured ~ B(50, 0.5)

5 / 20

Stating the hypothesis

H0: p = 0.5
Ha: p > 0.5

5 / 20


Statistical test

binom.test(35, 50, 0.5, alternative="greater")
##
## Exact binomial test
##
## data: 35 and 50
## number of successes = 35, number of trials = 50, p-value = 0.003
## alternative hypothesis: true probability of success is greater than 0.5
## 95 percent confidence interval:
## 0.5763 1.0000
## sample estimates:
## probability of success
## 0.7
5 / 20

  • We are assuming Ha > H0, i.e. the Ha distribution lies to the right of H0
  • How do we picture α in our figure?
5 / 20

  • The shaded α region determines the probability of getting a type I error
  • On the other hand, β reflects the type II error
  • However, the value of β depends on the Ha distribution
5 / 20
  • Assuming Ha comes from the same family of distribution as H0, we just need to determine its parameter
  • The parameter P could be anything, as long as P > 0.5
  • For our convenience, we shall set P = 0.7 to construct the second distribution

Statistical error

Type I

  • Incorrectly rejecting the H0
  • Reflected as α shaded area to the right of H0 distribution
  • A false positive

Type II

  • Incorrectly accepting the H0
  • Reflected as β shaded area to the left of Ha distribution
  • A false negative
6 / 20
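What α means operationally can be sketched by simulating many experiments under a true H0 and counting how often a fixed rejection rule fires. A hypothetical Python illustration (the cutoff of ≥ 60 or ≤ 40 heads out of 100 is an arbitrary rule chosen near the 5% level, not a prescribed procedure):

```python
import random

random.seed(1)

n_sims, n_flips = 10_000, 100
rejections = 0
for _ in range(n_sims):
    heads = sum(random.random() < 0.5 for _ in range(n_flips))
    # Reject H0: p = 0.5 whenever the count strays far from 50
    if heads >= 60 or heads <= 40:
        rejections += 1

# Since H0 is actually true here, every rejection is a type I error,
# so this rate estimates alpha for the chosen cutoff
type1_rate = rejections / n_sims
print(type1_rate)
```

With this cutoff the rate lands near the exact two-tailed probability of roughly 5.7%, illustrating that α is simply the long-run false-positive rate of the decision rule.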

Overview

  • More on p-value
  • Type of statistical error
  • Power analysis as a measure of α and β
  • Equation in calculating sample size
  • Random sampling
6 / 20

Power = 1 − β

  • Correctly rejecting the H0 when it is actually false
  • Prospective vs. retrospective?
  • Help you determine the minimum required sample

Caveats

  • Depends on the formal method to use
  • Does not generalize well
  • Gives a best-case scenario estimate
7 / 20
  • Retrospective: to see whether or not we have conducted a correct procedure to reject the H0
  • Prospective: to calculate a sufficient minimal sample size needed
  • There are other methods to calculate the sample size
  • We don't have to solely rely on power analysis

Things to consider...

  • Power
  • Sample size
  • Effect size
  • Alpha
7 / 20
  • These four are inter-related
  • Adjusting one affects the others
  • Each one is a function of the others

Effect size

  • Disclaimer: this is just an overview, not an in-depth explanation
  • Effect size measures the true difference between two hypotheses
  • Numerous conventions exist
  • Higher effect size means higher power
  • One of the most difficult to obtain!
8 / 20

Obtaining an effect size

  • Literature review
  • Pilot study
  • Cohen's recommendation
9 / 20

Literature review:

  • Published articles may have done a similar investigation on a different population
  • Use their data to estimate the desired effect size
  • Meta-analysis techniques are sometimes applicable to make a better estimate

Pilot study:

  • By conducting a pilot study, we can get data reflecting our future study
  • Time-consuming, but gives a closer estimate
  • A good chance to resolve any unanticipated issues

Cohen's recommendation:

  • Depends on which formal test is used
  • Separated into small, medium, and large effect sizes
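For proportions, one of Cohen's conventions is the effect size h, the difference of arcsine-transformed proportions. A stdlib Python sketch using the hypothetical cure rates from our cancer example:

```python
from math import asin, sqrt

def cohens_h(p1, p2):
    """Cohen's effect size h for two proportions (arcsine transform)."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))

# Hypothetical cure rate of 0.7 under treatment vs 0.5 under placebo
h = cohens_h(0.7, 0.5)
print(round(h, 3))  # ~0.412
```

By Cohen's benchmarks for h (0.2 small, 0.5 medium, 0.8 large), this sits between a small and a medium effect.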

We will re-examine our last example on a novel cancer drug

Example, please?

Let X ~ B(n, p)

x_sig = x : P(X ≤ x | n, H0) = 1 − α
β = P(X ≤ x_sig | n, H1)
Power = 1 − β

10 / 20

We can calculate power when we know the probability function and its parameters

# Set H0, sample size, significance level (alpha)
h0 <- 0.5; size <- 50; alpha.rate <- 0.05
# Find significance value
alpha.value <- qbinom(1 - alpha.rate, size, prob=h0) %T>% print()
## [1] 31
# Determine H1
h1 <- 0.7
# Calculate beta
beta.value <- dbinom(0:alpha.value, size, prob=h1) %>% sum() %T>% print()
## [1] 0.1406
# Calculate power
1 - beta.value
## [1] 0.8594
10 / 20
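The same power calculation can be sketched in stdlib Python, mirroring the `qbinom` and `dbinom` calls in the R code above:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_quantile(q, n, p):
    """Smallest x with P(X <= x) >= q, like R's qbinom(q, n, p)."""
    cum = 0.0
    for x in range(n + 1):
        cum += binom_pmf(x, n, p)
        if cum >= q:
            return x
    return n

h0, h1, size, alpha = 0.5, 0.7, 50, 0.05
cutoff = binom_quantile(1 - alpha, size, h0)  # significance value, 31
beta = sum(binom_pmf(k, size, h1) for k in range(cutoff + 1))
power = 1 - beta
print(cutoff, round(power, 4))  # 31 and ~0.8594, as in the R output
```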

  • Of course, we can "reverse engineer" the calculation to obtain the required sample size from a known power
  • But the math can be quite... challenging :)
  • Thankfully, we have some ready-to-use packages to do the computation for us (yay to them!)

Overview

  • More on p-value
  • Type of statistical error
  • Power analysis as a measure of α and β
  • Equation in calculating sample size
  • Random sampling
10 / 20

Equation in calculating sample size

  • As in calculating effect sizes, we have numerous equations to apply
  • No one size fits all; it depends on our research context
  • We will see popular ones used in general and biomedical science
11 / 20

General equation

n = ( (Z(1−α/2) + Z(1−β)) / ES )²

n: Minimal sample size required
Z(1−α/2): Significance value in a standardized normal distribution
Z(1−β): Power value in a standardized normal distribution
ES: Effect size

12 / 20
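Plugging numbers into this equation, a stdlib Python sketch assuming a two-sided α = 0.05, 80% power, and a hypothetical effect size of 0.5:

```python
from math import ceil
from statistics import NormalDist

def sample_size(effect_size, alpha=0.05, power=0.80):
    """n = ((Z(1-a/2) + Z(1-b)) / ES)^2, rounded up to a whole subject."""
    z = NormalDist()                     # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)   # ~1.96
    z_beta = z.inv_cdf(power)            # ~0.84
    return ceil(((z_alpha + z_beta) / effect_size) ** 2)

print(sample_size(0.5))  # 32
print(sample_size(0.2))  # 197: a smaller effect needs far more subjects
```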

For different purposes, we need different effect size estimation

Dichotomous outcome, one sample

H0: p = p0
ES = (p1 − p0) / √(p0 (1 − p0))

12 / 20

Dichotomous outcome, two independent samples

H0: p1 = p2
ES = |p1 − p2| / √(p (1 − p)), where p is the pooled proportion

12 / 20

Continuous outcome, one sample

H0: μ = μ0
ES = |μ1 − μ0| / σ

12 / 20

Continuous outcome, two independent samples

H0: μ1 = μ2
ES = |μ1 − μ2| / σ

12 / 20

Continuous outcome, two matched samples

H0: μd = 0
ES = μd / σd

12 / 20

Problems

  • Different study designs may require different solutions
  • Different fields of knowledge have their own preferences
  • What do we do as biomedical scientists?

J. Charan and T. Biswas. “How to calculate sample size for different study designs in medical research?” In: Indian Journal of Psychological Medicine 35.2 (2013), p. 121. DOI: 10.4103/0253-7176.116232.

13 / 20

Cross-sectional

Qualitative variable

n = Z(1−α/2)² p (1 − p) / d²

Quantitative variable

n = Z(1−α/2)² σ² / d²

Z(1−α/2): Significance value in a standardized normal distribution
d: Absolute error as determined by the researcher
p: Estimated proportion
σ: Standard deviation

14 / 20
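A quick sketch of the qualitative-variable formula in stdlib Python, with the commonly used worst-case p = 0.5 (maximum variance) and a hypothetical 5% absolute error:

```python
from math import ceil
from statistics import NormalDist

def cross_sectional_n(p, d, alpha=0.05):
    """n = Z(1-a/2)^2 * p(1-p) / d^2, rounded up."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96
    return ceil(z**2 * p * (1 - p) / d**2)

print(cross_sectional_n(0.5, 0.05))  # 385, the familiar worst-case figure
print(cross_sectional_n(0.3, 0.05))  # 323: a firmer prior proportion needs fewer
```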

Statistics obtained from literature review or a pilot study

Case-control

Qualitative variable

n = ((r + 1) / r) × p̄ (1 − p̄) (Z(β) + Z(α/2))² / (p1 − p2)²

Quantitative variable

n = ((r + 1) / r) × σ² (Z(β) + Z(α/2))² / d²

r: Ratio of controls to cases
p̄: Average proportion of the exposed samples
σ: Standard deviation from previous publication
p1 − p2: Difference in proportions as previously reported
d: Expected difference in means, as previously reported
Z(β): β value in a standardized normal distribution

15 / 20

The Z(β) value depends on the desired power, e.g. 0.84 for 80% power and 1.28 for 90%

Clinical trial / experimental

Qualitative variable

n = 2 P̄ (1 − P̄) (Z(α/2) + Z(β))² / (p1 − p2)²

Quantitative variable

n = 2 σ² (Z(α/2) + Z(β))² / d²

σ: Standard deviation from previous publication
P̄: Pooled prevalence from both groups
p1 − p2: Difference in proportions as previously reported
d: Expected difference in means

16 / 20
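Sketching the qualitative-variable formula in stdlib Python, with the hypothetical recovery rates from our earlier example (50% placebo vs 70% treatment):

```python
from math import ceil
from statistics import NormalDist

def trial_n_per_group(p1, p2, alpha=0.05, power=0.80):
    """n = 2 * P(1-P) * (Z(a/2) + Z(b))^2 / (p1 - p2)^2, per group."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # ~1.96
    z_beta = z.inv_cdf(power)            # ~0.84
    pooled = (p1 + p2) / 2               # pooled prevalence from both groups
    return ceil(2 * pooled * (1 - pooled) * (z_alpha + z_beta)**2
                / (p1 - p2)**2)

print(trial_n_per_group(0.5, 0.7))  # 95 per group
```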

Overview

  • More on p-value
  • Type of statistical error
  • Power analysis as a measure of α and β
  • Equation in calculating sample size
  • Random sampling
16 / 20

Random sampling

Non-Probability

  • Convenience
  • Quota

Probability

  • Simple
  • Systematic
  • Stratified
17 / 20

Non-Probability random sampling

Convenience

  • Based on availability
  • Representativeness is unknown
  • Useful in a preliminary study

Quota

  • As in convenience sampling
  • We set the desired proportion of our sample
  • Proportions based on specific criteria, e.g. age, sex, etc.
18 / 20

Probability random sampling

Simple

  • Random sample from a list of all subjects in a population
  • Each subject has an equal chance to participate
  • Useful in a small population

Systematic

  • Subject selection is not entirely random
  • As in simple random sampling, it requires an enumeration of all subjects
  • Systematically select subjects based on a certain criterion, e.g. every nth subject

Stratified / cluster

  • Split subjects into stratified / clustered groups
  • Do random sampling within each group
  • Stratified sampling preserves ordinality, i.e. the order is important
19 / 20
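The difference between simple and systematic selection can be sketched in a few lines of Python, using a hypothetical sampling frame of 100 subject IDs:

```python
import random

population = list(range(100))  # hypothetical sampling frame of subject IDs

# Simple random sampling: every subject has an equal chance of selection
random.seed(1)
simple = random.sample(population, 10)

# Systematic sampling: every k-th subject after a random start
k = len(population) // 10
start = random.randrange(k)
systematic = population[start::k]

print(sorted(simple))
print(systematic)  # evenly spaced IDs, e.g. start, start+10, ...
```

Both require the full enumeration of subjects; only the selection rule differs.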

Query?

20 / 20
