
A Gentle Introduction to Biostatistics

Aly Lamuri
Indonesia Medical Education and Research Institute

1 / 27

About me

Aly Lamuri

FKUI 2011

Newcastle 2018

Academic writer

Research assistant

1 / 27


Outline

  • Lecture overview
  • Parameters in population
  • Statistics in acquired samples
  • Descriptive statistics
1 / 27

Overview

  • Three hours of lecture? Please, no.
  • Session: lecture, Q&A, journal discussion
  • Fourteen sessions in total

Aim

Understand the basics of statistics

Objectives

  • Types of data and distribution
  • Test of differences
  • Correlation
  • Simple linear model
  • Encourage active participation (+ icebreaking)
  • Use Zoom features: raise hand, voice chat, etc.
  • Kahoot!
2 / 27


Agreement

  • Attend all lectures (at minimum: 80%)
  • Be on time; if you are late, keep it to no more than 10 minutes
  • Turn on your front camera at the beginning of each lecture. And smile, because I'm taking a screenshot for attendance :)
  • Pay attention during class. I may randomly ask you a question
  • Actively participate in class
  • You may ask for permission to leave the class temporarily. But please return, because the class is gonna miss you :(
  • Turn on your front camera during presentations
  • Do the assignments
3 / 27

HBU?

  • What are your expectations for this biostatistics course?
4 / 27

Outline

  • Lecture overview
  • Parameters in population
  • Statistics in acquired samples
  • Descriptive statistics
4 / 27


Population

All observable subjects inhabiting a certain location


Parameters

Quantitative summary of a population


Be more specific, please?

Notation and meanings

X: Data element
N: Number of elements
P: Proportion
M: Median
μ: Average
σ: Standard deviation
σ²: Variance
ρ: Correlation coefficient

5 / 27


Sample

A subset of an observable population


Statistics

Quantitative summary of a sample


Be more specific, please?

Notation and meanings

x: Data element
n: Number of elements
p: Proportion
m: Median
x̄: Average
s: Standard deviation
s²: Variance
r: Correlation coefficient

6 / 27

Spot the differences!


Statistics Meanings Parameters
`x` Data element `X`
`n` Number of elements `N`
`p` Proportion `P`
`m` Median `M`
`x̄` Average `μ`
`s` Standard deviation `σ`
`s²` Variance `σ²`
`r` Correlation coefficient `ρ`
7 / 27

Outline

  • Lecture overview
  • Parameters in population
  • Statistics in acquired samples
  • Descriptive statistics
7 / 27


Data Element

  • Suppose we have height data from a particular student group, denoted by X
  • Let X = {x₁, x₂, ..., xₙ}
  • Then each of x₁, x₂, ..., xₙ is a corresponding element of the data
  • Easy, right? Let's see some numbers.
set.seed(1)
X <- rnorm(10, mean=160, sd=10)
print(X)
## [1] 153.7 161.8 151.6 176.0 163.3 151.8 164.9 167.4 165.8 156.9
print(X[7])
## [1] 164.9
8 / 27


Number of Elements

  • From the previous example, we understood x₁...ₙ ∈ X
  • ...and each x as a data element
  • But what about n?
  • It turns out n is the total number of elements!
  • A quick demo:
length(X)
## [1] 10
9 / 27


Proportion

  • From our synthetic data, we observed some people with a taller stature
  • Let 165 cm be a threshold, where people above 165 cm are considered tall
  • Then we can compute the proportion of tall people relative to the (ahem) not-so-tall people
  • So we can simply conclude: p = g/n, where:
    p: Proportion
    g: Number in the group of interest
    n: Number of elements
  • So, how many tall people do we have in our data?
set.seed(1)
X <- rnorm(10, mean=160, sd=10)
print(X)
## [1] 153.7 161.8 151.6 176.0 163.3 151.8 164.9 167.4 165.8 156.9
sum(X > 165) / length(X)
## [1] 0.3
10 / 27

Mean

  • Assume the data presents an even (symmetric) distribution
  • Meaning the data has a roughly equal number of observations on both ends
  • We can calculate the average value as a general description, with:

\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i

  • In action:
sum(X) / length(X)
## [1] 161.3
mean(X)
## [1] 161.3

Problem: Not all data is distributed evenly

Solution: Use another measure, the median

11 / 27

Median

  • Sort the data in ascending order
  • Find the midpoint of the data
  • The median is the value at the middle index
  • Or, mathematically:

m = \begin{cases} x_{(n+1)/2} & \text{if } n \text{ is odd} \\ \frac{1}{2}\left(x_{n/2} + x_{n/2+1}\right) & \text{if } n \text{ is even} \end{cases}

A quick demo:

sort(X)
## [1] 151.6 151.8 153.7 156.9 161.8 163.3 164.9 165.8 167.4 176.0
median(X)
## [1] 162.6
mean(X)
## [1] 161.3

12 / 27
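The piecewise definition above can be implemented directly. A minimal sketch (the helper name `median_manual` is mine, not part of base R):

```r
# Manual median following the piecewise definition
median_manual <- function(x) {
  x <- sort(x)            # sort ascendingly
  n <- length(x)
  if (n %% 2 == 1) {
    x[(n + 1) / 2]        # odd n: take the middle element
  } else {
    (x[n / 2] + x[n / 2 + 1]) / 2  # even n: average the two middle elements
  }
}

set.seed(1)
X <- rnorm(10, mean=160, sd=10)
median_manual(X)  # agrees with the built-in median(X)
```

For our ten observations (even n), this averages the 5th and 6th sorted values, which is exactly what `median()` returns.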

Standard Deviation

  • In simple terms, the deviation dᵢ tells you how far a data element i is from the mean
  • So we can calculate the deviation as d_i = X_i - \mu
  • However, the deviation can be either negative or positive

Take our data as an example:

set.seed(1)
X <- rnorm(10, mean=160, sd=10)
print(X)
## [1] 153.7 161.8 151.6 176.0 163.3 151.8 164.9 167.4 165.8 156.9
d <- X - mean(X)
print(d, digits=2)
## [1] -7.59 0.51 -9.68 14.63 1.97 -9.53 3.55 6.06 4.44 -4.38

Hard to find its general property! Potential solution?

We can take the absolute value and compute the mean:

\bar{d} = \frac{1}{N} \sum_{i=1}^{N} |X_i - \mu|

A quick demo:

d <- abs(X - mean(X))
print(d, digits=2)
## [1] 7.59 0.51 9.68 14.63 1.97 9.53 3.55 6.06 4.44 4.38
d.bar <- mean(d)
print(d.bar, digits=2)
## [1] 6.2

Now it's easier to report your findings as \bar{x} \pm \bar{d}, or numerically as 161.32 ± 6.23. Yet such a practice is uncommon to see.

Another alternative is to find the root-mean-square deviation, which defines the standard deviation:

\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (X_i - \mu)^2}

A quick demo:

std.dev <- sqrt(sum({X - mean(X)}^2) / length(X))
print(std.dev)
## [1] 7.4

In statistics, we need to adjust the estimate by applying Bessel's correction. Simply put, we find the mean by dividing by n − 1 instead of N:

s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}

A quick demo:

std.dev <- sqrt(sum({X - mean(X)}^2) / {length(X) - 1})
print(std.dev)
## [1] 7.8
sd(X) # Built-in function to calculate standard deviation
## [1] 7.8

Bessel's correction removes the bias in estimating the population variance from a sample.

13 / 27

Variance

  • Another measure to estimate deviation
  • Akin to standard deviation, but without the square root
  • Computed as follows:

s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2

Importance:

  • In making further inferences
  • More advanced statistical models (outside the scope of this lecture, sorry!)

14 / 27
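As a quick check, a sketch using the same simulated heights as in the earlier demos; it verifies that the manual formula, `var()`, and `sd()^2` all agree:

```r
set.seed(1)
X <- rnorm(10, mean=160, sd=10)

# Sample variance by hand, with Bessel's correction (divide by n - 1)
s2 <- sum((X - mean(X))^2) / (length(X) - 1)
print(s2)

# var() applies the same correction, and sd() is its square root
print(var(X))
print(sd(X)^2)
```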

Quantile

  • A cut-off point in a given probability distribution
  • Divides the data into continuous ranges

Our data:

sort(X)
## [1] 152 152 154 157 162 163 165 166 167 176

Quintiles:

quantile(X, probs=seq(0, 1, 1/5))
## 0% 20% 40% 60% 80% 100%
## 152 153 160 164 166 176

Quartiles:

quantile(X, probs=seq(0, 1, 1/4))
## 0% 25% 50% 75% 100%
## 152 155 163 166 176

15 / 27

Conclusion

  • Central tendency:
    • Mean
    • Median
  • Spread:
    • Standard deviation
    • Variance
    • Quantile
18 / 27
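All of the measures above can be reproduced in a few lines of R; a one-screen recap, assuming the same simulated heights used throughout the demos:

```r
set.seed(1)
X <- rnorm(10, mean=160, sd=10)

# Central tendency
mean(X)
median(X)

# Spread
sd(X)
var(X)
quantile(X, probs=seq(0, 1, 1/4))  # quartiles
```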

Query?

19 / 27

Sally Clark

  • December 1996: home alone with her newborn baby
  • The baby suddenly stopped responding; Sally called an ambulance
  • Resuscitation failed; the baby was pronounced dead, diagnosed as SIDS
  • January 1998: the same thing happened to her second child
  • Post-mortem: retro-orbital and spinal bleeding
20 / 27

Sally Clark

  • UK criminology: the chance of two infant murders in the same household is about 1 in 2 million
  • An extremely rare event, indeed
20 / 27

Sally Clark

  • UK epidemiology: SIDS occurs in about 1 in 8,500 healthy newborns
  • What is the probability of two consecutive cases?
  • Assuming independence, two consecutive cases may happen in about 1 in 72 million
  • An even rarer event!
20 / 27
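The 1-in-72-million figure is simply the independence assumption at work: multiply 1/8,500 by itself. A quick sketch (variable names are mine):

```r
# SIDS rate quoted in the case
p_sids <- 1 / 8500

# Naive product under the (flawed) independence assumption
p_two <- p_sids^2

1 / p_two  # about 1 in 72 million (8500^2 = 72,250,000)
```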

Malcolm Collins

  • In 1964, a young woman snatched Mrs. Juanita Brooks' purse
  • According to eyewitnesses:
    • The culprit was a woman in her mid-20s
    • She had light blond hair in a ponytail
    • She fled in a yellow motor car driven by a black man
    • The driver had a beard and mustache

21 / 27

Malcolm Collins

The chance of:

  • A black man with a beard: 1 in 10
  • A man with a mustache: 1 in 4
  • A white woman with a ponytail: 1 in 10
  • A white woman with blond hair: 1 in 3
  • A yellow motor car: 1 in 10
  • An interracial couple in a car: 1 in 1,000

Overall probability, treating these as independent occurrences: 1 in 12 million. Rare!

21 / 27
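The prosecution's 1-in-12-million figure is again a straight product of the individual chances, which presumes independence. A sketch (vector names are mine):

```r
# "1 in k" denominators quoted by the prosecution
odds <- c(beard = 10, mustache = 4, ponytail = 10,
          blond_hair = 3, yellow_car = 10, interracial_couple = 1000)

# Naive product under the independence assumption
prod(odds)  # 1 in 12 million
```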


Similarities of both cases?

  • Both relied on probability theory.
  • Basically: statistics... more or less.
  • Used in the courtroom to decide guilt or innocence.
  • ...And both were grave mistakes.
  • Let's quickly reinvestigate both cases :)
22 / 27

Sally Clark

  • The occurrence of SIDS is not independent
  • Meaning there is a higher likelihood of a second child being affected by SIDS...
  • ...if your first child was
  • Since the events are not independent, multiplying the proportions does not justly estimate the chance
23 / 27

Malcolm Collins

  • Just as in Sally Clark's case
  • The chance of finding a black man with a beard and mustache... (and so on, as previously described)... is not an independent event
24 / 27


Then, is statistics wrong?

  • Not always
  • Sometimes we unconsciously obscure the facts with numbers
  • Now we will see some examples where statistics helps tell the truth
25 / 27

Example: 8 out of 10 dentists recommend Colg*te

How statistics describe mental health

  • 28% of HIV-positive participants had depressive symptoms 1
  • 49% of the French neurosurgical community reported burnout 2
  • 9.4% of medical students in one study had suicidal ideation within the past 12 months 3







  1. S. K. Y. Choi, E. Boyle, J. Cairney, et al. “Prevalence, Recurrence, and Incidence of Current Depressive Symptoms among People Living with HIV in Ontario, Canada: Results from the Ontario HIV Treatment Network Cohort Study”. In: PLOS ONE 11.11 (Nov. 2016). Ed. by V. D. Lima, p. e0165816. DOI: 10.1371/journal.pone.0165816.
  2. C. Baumgarten, E. Michinov, G. Rouxel, et al. “Personal and psychosocial factors of burnout: A survey within the French neurosurgical community”. In: PLOS ONE 15.5 (May. 2020). Ed. by S. A. Useche, p. e0233137. DOI: 10.1371/journal.pone.0233137.
  3. L. N. Dyrbye, C. P. West, D. Satele, et al. “Burnout Among U.S. Medical Students, Residents, and Early Career Physicians Relative to the General U.S. Population”. In: Academic Medicine 89.3 (Mar. 2014), pp. 443-451. DOI: 10.1097/acm.0000000000000134.
26 / 27

Slide and short note: http://bit.ly/biostatistik-ukrida

27 / 27
