
Hypothesis Test: Proportional Difference

Aly Lamuri
Indonesia Medical Education and Research Institute


Overview

  • Proportional difference
  • Exact test
  • Approximation
  • Paired sample
  • Applying Yates' correction

Proportional difference

  • Concept recall: proportion in population and sample?
  • So far, we relied on the binomial test
  • It is an exact measure of one proportion
  • Another test to consider: proportion test
  • An exact test: we compute the p-value exactly
  • It is computationally demanding
  • It is hard to conduct with a large sample size
  • In such cases, we may want to choose an approximation

Example?

Let X ~ B(n, p). Test the probability of observing X = 6, i.e. P(X = 6 | n = 10, p = 0.5).

H0: p = 0.5
Ha: p ≠ 0.5

Binomial test:
  estimate = 0.6, statistic = 6, p.value = 0.754, parameter = 10,
  conf.low = 0.262, conf.high = 0.878,
  method = "Exact binomial test", alternative = "two.sided"

Proportion test:
  estimate = 0.6, statistic = 0.1, p.value = 0.752, parameter = 1,
  conf.low = 0.274, conf.high = 0.863,
  method = "1-sample proportions test with continuity correction", alternative = "two.sided"
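The two result rows above can be reproduced with R's built-in tests; a minimal sketch (the tidy column names suggest the slides used broom::tidy(), but the raw test objects carry the same numbers):

```r
# Exact binomial test: 6 successes in 10 trials against p = 0.5
bt <- binom.test(6, 10, p = 0.5)
bt$p.value   # ~ 0.754

# One-sample proportion test (normal approximation with continuity correction)
pt <- prop.test(6, 10, p = 0.5)
pt$p.value   # ~ 0.752
```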

Another example?

Let X ~ B(n, p). Test the probability of observing X = 60, i.e. P(X = 60 | n = 100, p = 0.5).

H0: p = 0.5
Ha: p ≠ 0.5

Binomial test:
  estimate = 0.6, statistic = 60, p.value = 0.057, parameter = 100,
  conf.low = 0.497, conf.high = 0.697,
  method = "Exact binomial test", alternative = "two.sided"

Proportion test:
  estimate = 0.6, statistic = 3.61, p.value = 0.057, parameter = 1,
  conf.low = 0.497, conf.high = 0.695,
  method = "1-sample proportions test with continuity correction", alternative = "two.sided"

What do we learn?

  • With a small sample size, an exact test is more appropriate
  • As the sample size n grows large, the approximation gives estimates close to the exact test
  • An approximation requires less computational power

But...

  • Often we are more interested in multiple variables
  • We may want to see proportional differences in multiple groups
  • In such cases, neither binomial test nor proportion test can help us!

What can we do?

  • Visualize our problem as a contingency table
  • Use a more appropriate statistical test:
    • Fisher's exact test
    • Pearson's Chi-square
  • Remember the last time we talked about the Chi-square distribution?
  • We'll use it a lot in later sections :)

Contingency table

  • A table outlining our problem :)
  • Each element represents a count of the variables of interest

What does it look like?

Fun fact: The contingency table is also called a cross tabulation

              Outcome 1   Outcome 2
  Exposure 1      a           b
  Exposure 2      c           d

Example?

We are conducting market research in Jakarta, aiming to see how people express their preferences when choosing chain-store outlets. We categorized participants by place of residency, i.e. suburban vs. urban areas. The mini-market chains of interest are Indomaret and Alfamart. We observed that 30 out of 50 respondents in the suburban area chose Indomaret, compared to 20 out of 50 respondents in the urban area.

              Indomaret   Alfamart
  Suburban        30          20
  Urban           20          30

How do we test for differences?
  • We can use either an exact test or an approximation
  • Fisher's exact test (exact)
  • Pearson's Chi-square (approximation)
  • We will see the limitations and use cases of each approach
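In R, this table can be stored as a matrix; the name `survey` matches the object used by the fisher.test() and chisq.test() calls later in the deck, while the dimnames layout is an illustrative assumption:

```r
survey <- matrix(c(30, 20,
                   20, 30),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Residence = c("Suburban", "Urban"),
                                 Outlet    = c("Indomaret", "Alfamart")))
addmargins(survey)   # every row and column sums to 50, grand total 100
```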

Overview

  • Proportional difference
  • Exact test
  • Approximation
  • Paired sample
  • Applying Yates' correction

Fisher's exact test

  • Follows a hypergeometric distribution
  • Concept recall: what is a geometric distribution?
  • Extending previous concepts: what is a hypergeometric distribution?
  • Geometric distribution: the number of trials with replacement until the first success
  • Hypergeometric distribution: k successes in n draws without replacement
  • We can see the hypergeometric distribution as an extension of the binomial distribution
  • The binomial distribution assumes an identical success probability on every trial (i.e. sampling with replacement)
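The with/without-replacement contrast can be sketched with base R's density functions (the pool sizes 50/50 below are arbitrary illustrative numbers):

```r
# Binomial: sampling with replacement, success probability fixed at 0.5
dbinom(6, size = 10, prob = 0.5)     # ~ 0.205

# Hypergeometric: 10 draws without replacement from a pool of
# 50 successes and 50 failures; the odds shift after every draw
dhyper(6, m = 50, n = 50, k = 10)
```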

How do we formulate the hypothesis?

H0: p̂1 = p̂2
Ha: p̂1 ≠ p̂2

  • p̂i: the proportion in group i

How do we calculate the probability?

P = [C(a+b, a) · C(c+d, c)] / C(n, a+c)
  = [C(a+c, a) · C(b+d, b)] / C(n, a+b)
  = [(a+b)! (c+d)! (a+c)! (b+d)!] / [a! b! c! d! n!]

where n = a + b + c + d and C(n, k) denotes the binomial coefficient.

You may choose any of those equations

In code, please?

fisher.eq <- function(abcd) { # abcd holds the four cell counts a, b, c, d
  a <- abcd[1]; b <- abcd[2]; c <- abcd[3]; d <- abcd[4]
  n <- a + b + c + d
  # P = C(a+b, a) * C(c+d, c) / C(n, a+c)
  choose(a + b, a) * choose(c + d, c) / choose(n, a + c)
}

Let's solve our case!

              Indomaret   Alfamart
  Suburban        30          20
  Urban           20          30
  • Fisher's is an exact test
  • This means we need to take ALL possible outcomes into account

Fisher's equation solution

(table of hypergeometric probabilities for every table sharing the observed margins; content lost in this export)
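The lost table can be regenerated: with all margins fixed at 50, the cell a alone determines the table, so each candidate table's probability is dhyper(a, 50, 50, 50). Summing the tail at least as extreme as the observed a = 30 gives the one-sided p-value (a sketch, assuming the slide's tbl held exactly this tail):

```r
a <- 0:50
prob <- dhyper(a, m = 50, n = 50, k = 50)   # probability of each possible table

p_one <- sum(prob[a >= 30])                 # tables at least as extreme as a = 30
round(p_one, 4)                             # ~ 0.0357
```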

Calculate the p-value

One-tailed test

sum(tbl$probability)
## [1] 0.0357

Two-tailed test

sum(tbl$probability) * 2
## [1] 0.0713

Let R do the hard stuff for us

One-tailed test

fisher.test(survey, alternative="greater")
##
## Fisher's Exact Test for Count Data
##
## data: survey
## p-value = 0.04
## alternative hypothesis: true odds ratio is greater than 1
## 95 percent confidence interval:
## 1.06 Inf
## sample estimates:
## odds ratio
## 2.23

Two-tailed test

fisher.test(survey, alternative="two.sided")
##
## Fisher's Exact Test for Count Data
##
## data: survey
## p-value = 0.07
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 0.94 5.41
## sample estimates:
## odds ratio
## 2.23

A wild homework has appeared!

Perform Fisher's exact test on the following scenario:

  • a: 40
  • b: 15
  • c: 15
  • d: 20

Task:

  • Find the p-value for one-tailed test
  • Find the p-value for two-tailed test

Rules:

  • Apply Fisher's equation to solve the problem
  • You may use a calculator or write your own code
  • Present the table of your calculations
  • Do not use a pre-existing statistical package! (numpy is allowed, though)

Can you get a similar solution when computing by hand?

library(magrittr)  # provides the %>% pipe
c(40, 15, 15, 20) %>% matrix(nrow = 2) %>% fisher.test()
##
## Fisher's Exact Test for Count Data
##
## data: .
## p-value = 0.007
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 1.33 9.58
## sample estimates:
## odds ratio
## 3.5

Overview

  • Proportional difference
  • Exact test
  • Approximation
  • Paired sample
  • Applying Yates' correction

Approximating Fisher's solution

  • There are several approaches we may follow
  • Pearson's Chi-square and G-test are popular ones
  • We will only look into Chi-square
  • Different methods of Chi-square computation exist
  • We have the goodness-of-fit test and the test of independence
  • Choose your method wisely

Why an approximation?

  • As we have seen, an exact calculation is arduous
  • A larger sample size requires more computational power
  • And the exact test often applies only to a 2×2 contingency table
  • An approximation is more flexible
  • It can handle even an m×n contingency table

Chi-square test of independence

stat ~ χ²(k)

  • The test statistic follows a Chi-square distribution
  • The degrees of freedom k depend on the number of classes in X and Y


Example

  • Outcome 1: 2 classes, outcome 2: 2 classes → k = (2−1)(2−1) = 1
  • Outcome 1: 2 classes, outcome 2: 3 classes → k = (2−1)(3−1) = 2
  • Outcome 1: 3 classes, outcome 2: 3 classes → k = (3−1)(3−1) = 4
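The pattern is k = (rows − 1) × (columns − 1); as a one-line helper (hypothetical, not from the slides):

```r
# Degrees of freedom for an r x c contingency table
df_chisq <- function(r, c) (r - 1) * (c - 1)

df_chisq(2, 2)   # 1
df_chisq(2, 3)   # 2
df_chisq(3, 3)   # 4
```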

Calculating Chi-square

χ² = Σij (Oij − Eij)² / Eij,   Eij = (row i total × column j total) / n

O: observed outcome
E: expected outcome
i, j: cell indices in the contingency table
              Outcome 1   Outcome 2
  Exposure 1      a           b
  Exposure 2      c           d

Calculating the Expected outcome

Eij = (row i total × column j total) / n

E11 = (a+b)(a+c) / (a+b+c+d)
E12 = (a+b)(b+d) / (a+b+c+d)
E21 = (c+d)(a+c) / (a+b+c+d)
E22 = (c+d)(b+d) / (a+b+c+d)
              Indomaret   Alfamart
  Suburban        30          20
  Urban           20          30

Calculating the Expected outcome

Eij = (row i total × column j total) / n
E11 = E12 = E21 = E22 = (50 × 50) / 100 = 25
Calculating the χ² statistic

χ² = Σij (Oij − Eij)² / Eij
   = (30−25)²/25 + (20−25)²/25 + (20−25)²/25 + (30−25)²/25
   = 4
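The whole derivation fits in a few lines of base R (a sketch mirroring the table above):

```r
O <- matrix(c(30, 20,
              20, 30), nrow = 2, byrow = TRUE)   # observed counts

# Expected counts: row total x column total / grand total
E <- outer(rowSums(O), colSums(O)) / sum(O)      # every cell is 25

chisq_stat <- sum((O - E)^2 / E)                 # 4
1 - pchisq(chisq_stat, df = 1)                   # ~ 0.0455
```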

Determining p-value

1 - pchisq(4, df=1)
## [1] 0.0455

Built-in function in R

chisq.test(survey, correct=FALSE)$p.value
## [1] 0.0455

Overview

  • Proportional difference
  • Exact test
  • Approximation
  • Paired sample
  • Applying Yates' correction

Paired samples in the contingency table

  • What is a paired sample?
  • Why use different measure?
  • How do we solve it?
  • When we conduct a longitudinal study
  • We measure the same sample at different points in time
  • We need to account for differences occurring over time
  • Pearson's Chi-square cannot address this issue
  • Solution: McNemar's Chi-square

Example?

Suppose we continue our market research by asking exactly the same subjects three months later. We expected no changes in their preferences for chain-store outlets. It turned out that, regardless of their area of residence, 25 people who previously preferred Indomaret now shop at Alfamart, while 20 people who used to visit Alfamart now prefer Indomaret.


Contingency table

              Indomaret   Alfamart
  Indomaret       25          25
  Alfamart        20          30

(rows: preference at the first survey; columns: preference three months later)

Hypothesis

H0: p̂_t0 = p̂_t1
H1: p̂_t0 ≠ p̂_t1

McNemar's Chi-square

χ² = (b − c)² / (b + c)

mcnemar.test(survey2)
##
## McNemar's Chi-squared test with continuity correction
##
## data: survey2
## McNemar's chi-squared = 0.4, df = 1, p-value = 0.6
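Applying the formula by hand (survey2 is assumed to hold the paired table above; mcnemar.test applies a continuity correction by default, which is why its statistic is the corrected 0.356 rather than the plain 0.556):

```r
survey2 <- matrix(c(25, 25,
                    20, 30), nrow = 2, byrow = TRUE)
b  <- survey2[1, 2]   # Indomaret -> Alfamart switchers
cc <- survey2[2, 1]   # Alfamart -> Indomaret switchers

(b - cc)^2 / (b + cc)                      # plain McNemar statistic ~ 0.556
stat_cc <- (abs(b - cc) - 1)^2 / (b + cc)  # continuity-corrected ~ 0.356
1 - pchisq(stat_cc, df = 1)                # ~ 0.55; the slide's 0.6 after rounding
```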

Overview

  • Proportional difference
  • Exact test
  • Approximation
  • Paired sample
  • Applying Yates' correction

Yates' correction

  • Only applied to an approximation test
  • Alleviates bias in a 2×2 contingency table
  • Especially useful with low cell counts (< 10)
  • With extremely low counts (< 5), use an exact test instead


χ² = Σij (|Oij − Eij| − 0.5)² / Eij
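On the Indomaret/Alfamart table every expected count is 25 and every |O − E| is 5, so the correction turns 4 × 5²/25 = 4 into 4 × 4.5²/25 = 3.24. chisq.test applies it by default (correct = TRUE):

```r
survey <- matrix(c(30, 20,
                   20, 30), nrow = 2, byrow = TRUE)

chisq.test(survey, correct = FALSE)$statistic  # 4
chisq.test(survey, correct = TRUE)$statistic   # 3.24 after Yates' correction
```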

Lesson learnt

  • Large sample (> 10 in each cell): use an approximation
  • Small sample: use an exact test
  • 2×2 contingency table with an approximation: apply Yates' correction
  • Small sample with an m×n contingency table: split the table or simulate the p-value
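For the last bullet, chisq.test can swap the asymptotic p-value for a Monte Carlo one via simulate.p.value, which sidesteps the low-count assumption for m × n tables (the 2 × 3 table below is a hypothetical sparse example):

```r
tbl <- matrix(c(3, 1, 4,
                2, 5, 1), nrow = 2, byrow = TRUE)

set.seed(1)  # the simulated p-value is random, so fix the seed
chisq.test(tbl, simulate.p.value = TRUE, B = 10000)$p.value
```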

Query?

