Sample Size and Statistical Power
Aly Lamuri
Indonesia Medical Education and Research Institute
Overview
Let's revisit our last example of a coin toss
library(magrittr)  # provides the %T>% and %>% pipes used throughout
set.seed(1)
coin <- sample(c("H", "T"), 10, replace=TRUE, prob=rep(1/2, 2)) %T>% print()
##  [1] "T" "T" "H" "H" "T" "H" "H" "H" "H" "T"
H as our outcome of interest: 6 of the 10 tosses are H
...but does that differ from a fair coin?
binom.test(x=sum(coin == "H"), n=length(coin), p=0.5)
## 
##  Exact binomial test
## 
## data:  sum(coin == "H") and length(coin)
## number of successes = 6, number of trials = 10, p-value = 0.8
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.2624 0.8784
## sample estimates:
## probability of success 
##                    0.6
Question: What's the probability of having 6 H out of 10 Bernoulli trials? Is it < 5%?

$$P(X = 6): X \sim B(10, 0.5)$$
dbinom(6, 10, 0.5)
## [1] 0.2051
We can manually calculate the two-sided p-value as twice $P(X \geqslant 6)$:
2 * (dbinom(6:10, 10, 0.5) %>% sum())
## [1] 0.7539
Question: What if we preserve the ratio of events (3:5) using more trials?

$$P(X = 60): X \sim B(100, 0.5)$$
dbinom(60, 100, 0.5)
## [1] 0.01084
And the p-value would be:
2 * (dbinom(60:100, 100, 0.5) %>% sum())
## [1] 0.05689
Question: We preserved the ratio, so why has the probability changed?
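One way to explore the question above is to hold the proportion of heads at 3:5 and let the number of trials grow. The sketch below is only illustrative: the trial counts in `n` are arbitrary choices, not values taken from the slides.

```r
# Hold the proportion of heads at 3:5 while increasing the number of trials
n <- c(10, 50, 100, 500)   # arbitrary illustrative sample sizes
heads <- 3 * n / 5         # 6, 30, 60, 300 heads

# Two-sided exact binomial p-value against a fair coin for each sample size
p.values <- mapply(function(x, size) binom.test(x, size, p = 0.5)$p.value,
                   heads, n)

data.frame(n = n, heads = heads, p.value = signif(p.values, 4))
```

The observed proportion stays the same, but the sampling distribution under H0 narrows as n grows, so the same proportion sits further out in the tail and the p-value shrinks.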
Suppose we are conducting a study on a potential cancer therapy. We know that patients given a placebo recover with a probability of 50%. We expect the new treatment to increase this probability. Of 50 patients tested, 35 showed signs of better quality of life.
$$\text{Cured} \sim B(50, 0.5)$$
$$H_0: p = 0.5 \qquad H_a: p > 0.5$$
Here $p$ is the probability of recovery under the new treatment.
binom.test(35, 50, 0.5, alternative="greater")
## 
##  Exact binomial test
## 
## data:  35 and 50
## number of successes = 35, number of trials = 50, p-value = 0.003
## alternative hypothesis: true probability of success is greater than 0.5
## 95 percent confidence interval:
##  0.5763 1.0000
## sample estimates:
## probability of success 
##                    0.7
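As with the coin toss, the one-sided p-value can be reproduced by hand. The following is a minimal sketch; the variable names `p.manual` and `p.test` are introduced here for illustration only.

```r
# One-sided p-value: P(X >= 35) when X ~ B(50, 0.5)
p.manual <- pbinom(34, size = 50, prob = 0.5, lower.tail = FALSE)

# Same quantity as reported by the exact binomial test
p.test <- binom.test(35, 50, 0.5, alternative = "greater")$p.value

# Check that both approaches agree
all.equal(p.manual, p.test)
```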



Power
Sample size
Effect size
Alpha
Literature review:
Pilot study:
Cohen's recommendation:

We will re-examine our last example on a novel cancer drug
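Let $X \sim B(n, p)$
$$\text{sig} = \min \{ x : P(X \leqslant x \mid n, H_0) \geqslant 1 - \alpha \}$$
$$\beta = P(X \leqslant \text{sig} \mid n, H_1)$$
$$\text{Power} = 1 - \beta$$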
We can calculate power when we know the probability function and its parameters
# Set H0, sample size, significance level (alpha)
h0 <- 0.5; size <- 50; alpha.rate <- 0.05
# Find significance value
alpha.value <- qbinom(1 - alpha.rate, size, prob=h0) %T>% print()
## [1] 31
# Determine H1
h1 <- 0.7
# Calculate beta
beta.value <- dbinom(0:alpha.value, size, prob=h1) %>% sum() %T>% print()
## [1] 0.1406
# Calculate power
1 - beta.value
## [1] 0.8594
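The same steps can be wrapped in a function to see how power responds to sample size. This is a sketch under the assumptions of the example above (H0: p = 0.5, H1: p = 0.7, alpha = 0.05); the function name `binom.power` and the sample sizes tried are hypothetical, and `pbinom()` replaces the explicit sum over `dbinom()`.

```r
# Power of a one-sided exact binomial test for a given sample size
binom.power <- function(size, h0 = 0.5, h1 = 0.7, alpha.rate = 0.05) {
  # Significance value under H0
  sig <- qbinom(1 - alpha.rate, size, prob = h0)
  # beta = P(X <= sig | H1); power = 1 - beta
  1 - pbinom(sig, size, prob = h1)
}

# Power at a few illustrative sample sizes
sapply(c(20, 50, 100), binom.power)
```

Because the binomial distribution is discrete, power does not have to rise smoothly with every added observation, but the overall trend is upward as n grows.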
$$n = \left( \frac{Z_{1-\alpha/2} + Z_{1-\beta}}{ES} \right)^2$$
$n$: Minimum sample size
$Z_{1-\alpha/2}$: Significance value in a standardized normal distribution
$Z_{1-\beta}$: Power value in a standardized normal distribution
$ES$: Effect size
Different purposes call for different effect size estimates:
$$H_0: p = p_0 \qquad ES = \frac{p_1 - p_0}{\sqrt{p(1-p)}}$$
$$H_0: p_1 = p_2 \qquad ES = \frac{|p_1 - p_2|}{\sqrt{p(1-p)}}$$
$$H_0: \mu = \mu_0 \qquad ES = \frac{|\mu_1 - \mu_0|}{\sigma}$$
$$H_0: \mu_1 = \mu_2 \qquad ES = \frac{|\mu_1 - \mu_2|}{\sigma}$$
$$H_0: \mu_d = 0 \qquad ES = \frac{\mu_d}{\sigma_d}$$
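To make the general formula concrete, the sketch below computes the minimum sample size from an effect size using `qnorm()`. The function name `sample.size` is hypothetical, and the example assumes a one-sample proportion test with p0 = 0.5 and p1 = 0.7, taking p in the denominator to be the proportion under H0 (an assumption; the slides do not say which p is meant).

```r
# Minimum sample size from an effect size, two-sided alpha and desired power
sample.size <- function(es, alpha = 0.05, power = 0.8) {
  z.alpha <- qnorm(1 - alpha / 2)  # Z_{1 - alpha/2}
  z.beta <- qnorm(power)           # Z_{1 - beta}
  ceiling(((z.alpha + z.beta) / es)^2)
}

# Effect size for H0: p = 0.5 against p1 = 0.7 (assumes p = p0 in the denominator)
p0 <- 0.5; p1 <- 0.7
es <- (p1 - p0) / sqrt(p0 * (1 - p0))

sample.size(es)
```

Rounding up with `ceiling()` keeps the achieved power at or above the target under the normal approximation.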
J. Charan and T. Biswas. “How to calculate sample size for different study designs in medical research?” In: Indian Journal of Psychological Medicine 35.2 (2013), p. 121. DOI: 10.4103/0253-7176.116232.
$$n = \frac{Z_{1-\alpha/2}^2 \cdot p(1-p)}{d^2}$$
$$n = \frac{Z_{1-\alpha/2}^2 \cdot \sigma^2}{d^2}$$
$Z_{1-\alpha/2}$: Significance value in a standardized normal distribution
$d$: Absolute error as determined by the researcher
$p$: Estimated proportion
$\sigma$: Standard deviation
Statistics obtained from literature review or a pilot study
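A minimal sketch of the two formulas above follows. The function names `n.proportion` and `n.mean` and the input values are hypothetical, chosen only to illustrate the arithmetic; in practice p and σ would come from the literature or a pilot study.

```r
# Sample size for estimating a proportion with absolute error d
n.proportion <- function(p, d, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)
  ceiling(z^2 * p * (1 - p) / d^2)
}

# Sample size for estimating a mean with absolute error d
n.mean <- function(sigma, d, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)
  ceiling(z^2 * sigma^2 / d^2)
}

n.proportion(p = 0.5, d = 0.05)  # illustrative inputs
n.mean(sigma = 10, d = 2)        # illustrative inputs
```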
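$$n = \frac{r+1}{r} \cdot \frac{p^*(1-p^*)(Z_\beta + Z_{\alpha/2})^2}{(p_1 - p_2)^2}$$
$$n = \frac{r+1}{r} \cdot \frac{\sigma^2 (Z_\beta + Z_{\alpha/2})^2}{d^2}$$
$r$: Ratio of control to case
$p^*$: Average proportion exposed, $(p_1 + p_2)/2$
$\sigma$: Standard deviation from previous publication
$p_1 - p_2$: Difference in proportion as previously reported
$d$: Expected difference in means (for a quantitative outcome)
$Z_\beta$: Standard normal deviate for the desired power
$Z_\beta$ depends on power, e.g. 0.84 for 80% power and 1.28 for 90%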
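The proportion version of this formula can be sketched as below. The function name `n.case.control` and the exposure proportions are hypothetical; `qnorm(power)` gives the Z value for power (about 0.84 at 80%).

```r
# Sample size per case group for a case-control study comparing proportions
n.case.control <- function(p1, p2, r = 1, alpha = 0.05, power = 0.8) {
  p.star <- (p1 + p2) / 2          # average proportion exposed
  z.beta <- qnorm(power)           # about 0.84 for 80% power
  z.alpha <- qnorm(1 - alpha / 2)  # about 1.96 for alpha = 0.05
  ceiling((r + 1) / r * p.star * (1 - p.star) *
            (z.beta + z.alpha)^2 / (p1 - p2)^2)
}

# Illustrative inputs: 40% exposed among cases, 20% among controls
n.case.control(p1 = 0.4, p2 = 0.2)
n.case.control(p1 = 0.4, p2 = 0.2, r = 2)  # two controls per case
```

Recruiting more controls per case (larger r) lowers the number of cases required, with diminishing returns as r grows.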
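$$n = \frac{2P(1-P) \cdot (Z_{\alpha/2} + Z_\beta)^2}{(p_1 - p_2)^2}$$
$$n = \frac{2\sigma^2 \cdot (Z_{\alpha/2} + Z_\beta)^2}{d^2}$$
$\sigma$: Standard deviation from previous publication
$P$: Pooled prevalence from both groups
$p_1 - p_2$: Difference in proportion as previously reported
$d$: Difference in means between the two groups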
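For the quantitative-outcome version, a short sketch is given below. The function name `n.trial.mean` and the values of σ and d are hypothetical, used only to show the calculation.

```r
# Sample size per group when comparing two means
n.trial.mean <- function(sigma, d, alpha = 0.05, power = 0.8) {
  z.alpha <- qnorm(1 - alpha / 2)
  z.beta <- qnorm(power)
  ceiling(2 * sigma^2 * (z.alpha + z.beta)^2 / d^2)
}

# Illustrative inputs: SD of 10 and a difference in means of 5
n.trial.mean(sigma = 10, d = 5)
```

Halving the detectable difference d roughly quadruples the required sample size, since d enters the formula squared.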
Query?