Correlation of Numeric Variables
Aly Lamuri
Indonesia Medical Education and Research Institute
Overview
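Population covariance:
$$\sigma_{x,y} = \frac{\sum_{i=1}^{n} (x_i - \mu_x)(y_i - \mu_y)}{n}$$
Sample covariance:
$$s_{x,y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n-1}$$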
library(magrittr) # Provides %>%, %T>% and divide_by()
# %T>% prints the structure and still returns the data frame;
# a plain %>% would assign the NULL returned by str() to tbl
tbl <- subset(iris, select=c(Sepal.Width, Sepal.Length)) %T>% str()
## 'data.frame': 150 obs. of 2 variables:
##  $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
Does Sepal.Width covary with Sepal.Length?
Let x represent the length
Let y represent the width

| | x | y | x.resid | y.resid |
|---|---|---|---|---|
| 1 | 5.1 | 3.5 | -0.743333333333334 | 0.442666666666667 |
| 2 | 4.9 | 3 | -0.943333333333333 | -0.0573333333333332 |
| 3 | 4.7 | 3.2 | -1.14333333333333 | 0.142666666666667 |
| 4 | 4.6 | 3.1 | -1.24333333333333 | 0.0426666666666669 |
| 5 | 5 | 3.6 | -0.843333333333334 | 0.542666666666667 |
| 6 | 5.4 | 3.9 | -0.443333333333333 | 0.842666666666667 |
| 7 | 4.6 | 3.4 | -1.24333333333333 | 0.342666666666667 |
| 8 | 5 | 3.4 | -0.843333333333334 | 0.342666666666667 |
| 9 | 4.4 | 2.9 | -1.44333333333333 | -0.157333333333333 |
| 10 | 4.9 | 3.1 | -0.943333333333333 | 0.0426666666666669 |
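
The code that produced this table is not shown on the slide. A minimal base R sketch, assuming `x` holds `Sepal.Length` and `y` holds `Sepal.Width` (as the tabulated values suggest) and that the residual columns are deviations from each variable's mean; the name `manual_cov` is introduced here for illustration only:

```r
# Assumed reconstruction of the table: x = Sepal.Length, y = Sepal.Width
tbl <- data.frame(x = iris$Sepal.Length, y = iris$Sepal.Width)

# Residuals are deviations from each variable's mean
tbl$x.resid <- tbl$x - mean(tbl$x)
tbl$y.resid <- tbl$y - mean(tbl$y)

head(tbl, 10)

# The sample covariance is the sum of residual products divided by n - 1
manual_cov <- sum(tbl$x.resid * tbl$y.resid) / (nrow(tbl) - 1)
manual_cov
```

Rows where both residuals share a sign push the covariance up, and rows with opposite signs pull it down.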
covariance <- function(x, y) {
  n <- length(x) # Length of x must be = length of y
  {(x - mean(x)) * (y - mean(y))} %>% sum() %>% divide_by(n-1)
}
covariance(tbl$x, tbl$y)
## [1] -0.042
cov(tbl$x, tbl$y) # Built-in function
## [1] -0.042
What if we calculate the covariance of a variable with itself?
covariance(tbl$x, tbl$x)
## [1] 0.69
var(tbl$x) # Variance of x
## [1] 0.69
$$s_{x,x} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})}{n-1} = s_x^2$$
tbl <- subset(iris, select=-Species) %T>% str()
## 'data.frame': 150 obs. of 4 variables:
##  $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
cov(tbl)
##              Sepal.Length Sepal.Width Petal.Length Petal.Width
## Sepal.Length        0.686      -0.042         1.27        0.52
## Sepal.Width        -0.042       0.190        -0.33       -0.12
## Petal.Length        1.274      -0.330         3.12        1.30
## Petal.Width         0.516      -0.122         1.30        0.58
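
Two properties of a covariance matrix are worth checking: the diagonal holds each variable's variance, and the matrix is symmetric. The sketch below uses only base R; the name `S` is introduced here for illustration, and `cov2cor()` is shown as one convenient way to rescale covariances into correlations, not necessarily the route the slides intend.

```r
tbl <- subset(iris, select = -Species)
S <- cov(tbl)

# The matrix is symmetric: cov(x, y) equals cov(y, x)
isSymmetric(S)

# The diagonal holds the variance of each variable
all.equal(diag(S), sapply(tbl, var))

# Dividing each entry by the two standard deviations gives the correlation matrix
all.equal(cov2cor(S), cor(tbl))
```

Because covariance depends on the measurement units, this rescaling is what makes values comparable across pairs of variables.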
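$$r = \frac{s_{x,y}}{s_x \cdot s_y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n-1) \cdot s_x \cdot s_y} = \frac{\sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \cdot \left(\frac{y_i - \bar{y}}{s_y}\right)}{n-1}$$
$$r = \frac{\sum_{i=1}^{n} Z_{x_i} \cdot Z_{y_i}}{n-1}, \qquad \nu = n - 2 \ \text{(DoF)}$$
$$t = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}}$$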
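
The formulas above can be traced step by step in base R. This is a sketch rather than the slides' own code: the names `zx`, `zy`, `t_stat`, `p_val` and `res` are introduced for illustration, and the two-sided p-value is assumed to come from a t distribution with n - 2 degrees of freedom.

```r
x <- iris$Sepal.Length
y <- iris$Sepal.Width
n <- length(x)

# Standardise each variable using the sample standard deviation
zx <- (x - mean(x)) / sd(x)
zy <- (y - mean(y)) / sd(y)

# Pearson's r is the sum of z-score products divided by n - 1
r <- sum(zx * zy) / (n - 1)

# Test statistic and two-sided p-value with n - 2 degrees of freedom
t_stat <- r / sqrt((1 - r^2) / (n - 2))
p_val <- 2 * pt(-abs(t_stat), df = n - 2)

# Compare against the built-in test
res <- cor.test(x, y)
c(manual = r, builtin = unname(res$estimate))
c(manual = t_stat, builtin = unname(res$statistic))
c(manual = p_val, builtin = res$p.value)
```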
lapply(tbl, shapiro.test) %>% lapply(broom::tidy) %>% lapply(data.frame) %>% {do.call(rbind, .)} %>% kable() %>% kable_minimal()
| | statistic | p.value | method |
|---|---|---|---|
| Sepal.Length | 0.98 | 0.01 | Shapiro-Wilk normality test |
| Sepal.Width | 0.98 | 0.10 | Shapiro-Wilk normality test |
| Petal.Length | 0.88 | 0.00 | Shapiro-Wilk normality test |
| Petal.Width | 0.90 | 0.00 | Shapiro-Wilk normality test |
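
The pipeline above relies on several add-on packages (`broom`, `knitr`, `kableExtra`). For readers without them, a dependency-free sketch that gathers the same two numbers per variable might look like this; the name `normality` is hypothetical and the layout is only an approximation of the table shown.

```r
tbl <- subset(iris, select = -Species)

# Run the Shapiro-Wilk test on every column and keep W and the p-value
normality <- do.call(rbind, lapply(tbl, function(v) {
  res <- shapiro.test(v)
  data.frame(statistic = unname(res$statistic), p.value = res$p.value)
}))

round(normality, 2)
```

A small p-value suggests a departure from normality for that variable, which is one reason to consider a rank-based correlation instead of Pearson's.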

subset(tbl, select=c(Sepal.Length, Sepal.Width)) %>% MVN::mvn() # Multivariate normality
## $multivariateNormality
##              Test          Statistic            p value Result
## 1 Mardia Skewness   9.46144098216623 0.0505456076692465    YES
## 2 Mardia Kurtosis -0.853178029438543  0.393560585232763    YES
## 3             MVN               <NA>               <NA>    YES
## 
## $univariateNormality
##           Test     Variable Statistic p value Normality
## 1 Shapiro-Wilk Sepal.Length      0.98    0.01        NO
## 2 Shapiro-Wilk  Sepal.Width      0.98    0.10       YES
## 
## $Descriptives
##                n Mean Std.Dev Median Min Max 25th 75th Skew Kurtosis
## Sepal.Length 150  5.8    0.83    5.8 4.3 7.9  5.1  6.4 0.31    -0.61
## Sepal.Width  150  3.1    0.44    3.0 2.0 4.4  2.8  3.3 0.31     0.14
cor.test(tbl$Sepal.Length, tbl$Sepal.Width)
## 
##  Pearson's product-moment correlation
## 
## data:  tbl$Sepal.Length and tbl$Sepal.Width
## t = -1, df = 148, p-value = 0.2
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.273  0.044
## sample estimates:
##   cor 
## -0.12
$$\rho = 1 - \frac{6 \sum (R_x - R_y)^2}{n (n^2 - 1)}, \qquad \nu = n - 2 \ \text{(DoF)}$$
$$t = \frac{\rho}{\sqrt{\frac{1 - \rho^2}{n - 2}}}$$
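
Spearman's ρ can also be read as Pearson's r computed on ranks. The sketch below compares that view with the rank-difference formula; the variable names are introduced here for illustration. One caveat: the rank-difference shortcut is exact only when there are no ties, so on tied data such as `iris` it is expected to differ slightly from the built-in value.

```r
x <- iris$Sepal.Length
y <- iris$Sepal.Width
n <- length(x)

# rank() assigns average ranks to tied observations by default
rx <- rank(x)
ry <- rank(y)

# View 1: Pearson's correlation applied to the ranks
rho_ranks <- cor(rx, ry)

# View 2: rank-difference shortcut (exact only without ties)
rho_shortcut <- 1 - 6 * sum((rx - ry)^2) / (n * (n^2 - 1))

# Built-in estimate for comparison
rho_builtin <- cor(x, y, method = "spearman")

c(ranks = rho_ranks, shortcut = rho_shortcut, builtin = rho_builtin)
```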
Using the iris dataset (tbl as defined earlier):
cor.test(tbl$Sepal.Length, tbl$Sepal.Width, method="spearman")
## 
##  Spearman's rank correlation rho
## 
## data:  tbl$Sepal.Length and tbl$Sepal.Width
## S = 7e+05, p-value = 0.04
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##   rho 
## -0.17
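$$\tau_a = \frac{n_c - n_d}{n_0}, \qquad \tau_b = \frac{n_c - n_d}{\sqrt{(n_c + n_d + X_0)(n_c + n_d + Y_0)}}, \qquad \tau_c = \frac{2 (n_c - n_d)}{n^2 \, \frac{m - 1}{m}}, \qquad n_0 = \binom{n}{2}$$
Here $n_c$ and $n_d$ count the concordant and discordant pairs, $X_0$ and $Y_0$ count the pairs tied only on x and only on y, $n_0$ is the total number of pairs, and $m$ is the smaller of the number of rows and columns in the contingency table.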
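
Counting the pairs directly shows how the variants differ. This is a brute-force sketch for illustration, not an efficient implementation: it assumes X0 and Y0 are the pairs tied only on x and only on y, all variable names are introduced here, and the last line compares the hand-computed values with base R's estimate.

```r
x <- iris$Sepal.Length
y <- iris$Sepal.Width
n <- length(x)

# Enumerate every unordered pair of observations
idx <- combn(n, 2)
sx <- sign(x[idx[1, ]] - x[idx[2, ]])
sy <- sign(y[idx[1, ]] - y[idx[2, ]])

n_c <- as.numeric(sum(sx * sy > 0))      # concordant pairs
n_d <- as.numeric(sum(sx * sy < 0))      # discordant pairs
x0 <- as.numeric(sum(sx == 0 & sy != 0)) # tied only on x
y0 <- as.numeric(sum(sy == 0 & sx != 0)) # tied only on y
n0 <- choose(n, 2)                       # total number of pairs

tau_a <- (n_c - n_d) / n0
tau_b <- (n_c - n_d) / sqrt((n_c + n_d + x0) * (n_c + n_d + y0))

c(tau_a = tau_a, tau_b = tau_b, builtin = cor(x, y, method = "kendall"))
```

With many tied measurements, as in `iris`, the denominator of the second variant shrinks, so its magnitude is at least as large as that of the first.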
cor.test(tbl$Sepal.Length, tbl$Sepal.Width, method="kendall")
## 
##  Kendall's rank correlation tau
## 
## data:  tbl$Sepal.Length and tbl$Sepal.Width
## z = -1, p-value = 0.2
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
##    tau 
## -0.077
Base R implements a single variant ($\tau_b$, which equals $\tau_a$ when there are no ties); the other variants are available in dedicated packages.
Short answer: no.
Query?