Parametric: Mean in Two Groups
When we obtain normally-distributed data, the parameters $\mu$ and $\sigma$ from our PDF can completely explain the behaviour seen in our sample. With $\mu$ representing the central tendency and $\sigma$ the spread, we can directly compare similarly distributed samples. Often, we need to determine how much our average value differs from other observations. In doing so, we are facing a mean difference problem. This lecture will help us test mean differences in one-sample and two-sample problems.
Mean Difference
We may have a vivid recollection from previous lectures of data distributions and the Central Limit Theorem (CLT). For any data following a normal distribution, centering and scaling according to its $\mu$ and $\sigma$ results in a standard normal distribution, i.e. a Z-distribution.
For any given distribution, the mean of such a sample will converge, as a random variable, to a normal distribution with parameters $\mu$ and $\frac{\sigma}{\sqrt{n}}$.
With known $\mu$ and $\sigma$, we can make a direct comparison between our data and the population. However, what if we do not know the parameter $\mu$? We can make a close estimate using its statistic, $\bar{x}$. A mean difference is simply the result of subtracting the hypothesized parameter from the sample mean, which corresponds to $\bar{x} - \mu_0$. However, our sample is bound to have error, either systematic or unsystematic (random). Adjusting for this requires us to divide our measure by the standard error. The obtained quotient is our statistic of interest, which follows a Z-distribution:

$$Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$
Having our statistic as an element of the Z-distribution, we can compute the p-value by looking at the probability of our statistical value. As always, we first need to set the significance level $\alpha$ so we can see where our statistic is located in relation to the critical value. We regard this approach as a Z-test, where the p-value represents the probability of observing a statistic at least as extreme as ours under $H_0$.
We shall consider the following scenario as an example:
In a population of third-year electrical engineering students, we know the average final score of a particular course is 70. In measuring students’ comprehension, UKRIDA has established a standardized examination with a standard deviation of 10. We are interested to see whether students registered in this year’s course have a different average, where 18 students scored 75 on average on the final exam.
Then, we need to formulate our hypotheses:

$$H_0: \mu = 70 \quad \text{vs.} \quad H_1: \mu \neq 70$$
Followed by computing the statistic:

$$Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{75 - 70}{10 / \sqrt{18}} \approx 2.12$$
Where is $Z \approx 2.12$ located in the Z-distribution?
Assigning our significance level $\alpha$ to both tails results in a rejection region in each tail.
Since our $H_1$ assumes non-equality, we can compute the p-value according to two-tailed test procedures. First we need to find the cumulative probability $P(Z \leq z)$.
Then we subtract it from 1 and multiply the difference by 2 to obtain the two-tailed p-value.
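A minimal sketch of this computation in R, using the values from the scenario above:

```r
# Z statistic: (x_bar - mu_0) / (sigma / sqrt(n))
z <- (75 - 70) / (10 / sqrt(18))

# two-tailed p-value: subtract the cumulative probability from 1 and double it
2 * (1 - pnorm(z))
```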
| [1] 0.034
So far, we understand that the Z-test requires the sample to follow a normal distribution. Before conducting any formal test, it is imperative to ascertain the sampled distribution, e.g. using a goodness-of-fit or normality test. We do not need to know the parameter $\mu$, because we can hypothesize its value. However, we do need the parameter $\sigma$ to correctly compute the standard error. This becomes quite problematic when we do not know the value of $\sigma$, which we often don’t! In such a case, we need to consider using a T-distribution instead.
Student’s T-Distribution
Student’s T-distribution only depends on one parameter, the degrees of freedom $\nu$. Mathematically, the T-distribution has the following notation: $T \sim t(\nu)$. The T-distribution is pivotal for computing the statistic in a mean difference of normally-distributed data. The degrees of freedom in the T-distribution is simply $\nu = n - 1$, where $n$ represents the total number of samples.
Aside from its relationship with the Z-distribution, the T-distribution also relates to the $\chi^2$ distribution, where they share the same degrees of freedom.
One Sample T-Test
The one-sample T-test is analogous to the Z-test; we use it when we cannot ascertain $\sigma$. In place of $\sigma$, the T-test uses $s$ as an estimate of the population parameter. By adapting the Z-statistic equation, we can compute the T statistic as follows:

$$T = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$
As an example, we shall generate an array of random numbers following a normal distribution with $\mu = 120$.
First, we do some basic exploratory analysis by finding the central tendency and spread.
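A sketch of this step in R; the random seed and the value of $\sigma$ used for generation are assumptions, so the exact numbers will differ slightly from the output below:

```r
set.seed(1)                          # the original seed is unknown
x <- rnorm(20, mean = 120, sd = 20)  # n = 20; sd = 20 is an assumption

summary(x)  # central tendency and quartiles
sd(x)       # sample standard deviation
```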
| Min. 1st Qu. Median Mean 3rd Qu. Max.
| 75.7 112.3 127.2 123.8 135.2 151.9
| [1] 18.3
We let $\mu_0 = 120$, yet our $\bar{x}$ is 123.81 with an $s$ of 18.3 and a $\nu$ of 19. Does our statistic differ from the parameter $\mu_0$? We can further formulate our question into hypotheses:

$$H_0: \mu = 120 \quad \text{vs.} \quad H_1: \mu \neq 120$$
Then we can determine the statistic:
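A sketch of the manual computation, assuming the vector `x` from above:

```r
# T statistic: mean difference divided by the estimated standard error
mu0 <- 120
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(length(x)))
t_stat
```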
| [1] 0.933
Then we shall locate the statistic within its $t(19)$ distribution.
Then we can compute the p-value for a one-tailed test:
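Using the upper tail of the $t(19)$ distribution (a sketch with the `t_stat` object from above):

```r
# one-tailed p-value: upper-tail probability at t_stat with n - 1 degrees of freedom
pt(t_stat, df = length(x) - 1, lower.tail = FALSE)
```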
| [1] 0.181
Also the p-value for a two-tailed test:
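Doubling that upper-tail probability (same assumptions as above):

```r
# two-tailed p-value: double the one-tailed probability
2 * pt(abs(t_stat), df = length(x) - 1, lower.tail = FALSE)
```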
| [1] 0.363
How does our calculation compare to the built-in function in R?
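A sketch of the equivalent call, assuming the same vector `x`:

```r
# built-in one-sample T-test against the hypothesized mean of 120
t.test(x, mu = 120)
```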
|
| One Sample t-test
|
| data: x
| t = 0.9, df = 19, p-value = 0.4
| alternative hypothesis: true mean is not equal to 120
| 95 percent confidence interval:
| 115 132
| sample estimates:
| mean of x
| 124
At this point, we may have realised that R only prints a rounded value for the
acquired computation. If we are interested in the actual value, we may save our
test result as an object, then directly call the specific element. In the
following demonstration, we shall save the T-test result and obtain its p-value.
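A sketch of that demonstration (the object name `res` is arbitrary):

```r
# save the test result and extract the unrounded p-value from the htest object
res <- t.test(x, mu = 120)
res$p.value
```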
| [1] 0.363
Comparing our computation with the result acquired from R's built-in function,
we fail to reject our $H_0$, so we conclude that our sample mean does not differ significantly from $\mu_0 = 120$.
Unpaired T-Test
When we have two samples, we can compare the central tendency of both samples by computing the mean difference. If both datasets follow a normal distribution, we conduct a two-sample T-test. As in previous examples, the unpaired T-test (or rather, the T-test in general) assumes normality. Moreover, there are further assumptions when conducting a T-test, resulting in two distinct types of the test, i.e. Student's and Welch's approaches. The T-test is arguably robust to non-normally distributed data to a certain degree, although skewness and outliers strongly influence its robustness. The problem with robustness is how to properly evaluate whether the T-test still provides a correct inference when the data are non-normal. A few simulations have demonstrated that the T-test is robust against distributions within a certain parameter range. However, when we have real-world data, we often do not know the underlying parameters. In such cases, it is safer to follow the normality assumption to avoid type-I error inflation. To test the mean difference between two samples, we formulate the following hypotheses:

$$H_0: \mu_1 - \mu_2 = \mu_0 \quad \text{vs.} \quad H_1: \mu_1 - \mu_2 \neq \mu_0$$
As outlined above, our hypotheses depend on the value of $\mu_0$, which defaults to $\mu_0 = 0$. On some rare occasions, we may apply a different value of $\mu_0$, but for now we will stick with $\mu_0 = 0$.
Student’s T-Test
Student’s approach to the T-test is a test with a pooled variance. We may conduct this test when we know that the variances in both samples are comparably similar. We denote this similarity as homogeneity of variance, which we can assess using Levene’s test.
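For reference, the pooled-variance statistic takes the usual form:

$$s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2},
\qquad
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\tfrac{1}{n_1} + \tfrac{1}{n_2}}}$$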
However, if our data fail to fulfil the homogeneity of variance assumption, we should use Welch’s T-test instead.
Welch’s T-Test
Welch’s approach still assumes normality, but gives more leniency to the equality of variance. As a result, Welch modified the equation for computing the statistic, where we may find:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
Compared to Student’s T-test, Welch’s provides a different measure of $\nu$ (the adjusted degrees of freedom) to account for differences in sample variances. As an example, we may consider the following situation:
Suppose we are collecting data on body height. Our population of interest is students registered at UKRIDA, where we categorize sex as female and male. We acquire normally distributed data from both sexes, with a different mean and spread in each group.
We have a sample of 25 females and 30 males, and would like to conduct a hypothesis test on the mean difference.
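A sketch of the exploratory step; the data frame `dat` and its columns `height` and `sex` are assumed names based on the test output further below:

```r
# per-group summary statistics and standard deviations
lapply(split(dat$height, dat$sex), summary)
sapply(split(dat$height, dat$sex), sd)
```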
| $female
| Min. 1st Qu. Median Mean 3rd Qu. Max.
| 123 146 154 158 173 190
|
| $male
| Min. 1st Qu. Median Mean 3rd Qu. Max.
| 152 165 168 170 177 184
| female male
| 17.98 7.93
Do both groups in our sample follow a normal distribution?
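The `data: X[[i]]` label in the output below suggests the test was applied over a list; a sketch under the same assumed names:

```r
# Shapiro-Wilk normality test for each group
lapply(split(dat$height, dat$sex), shapiro.test)
```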
| $female
|
| Shapiro-Wilk normality test
|
| data: X[[i]]
| W = 1, p-value = 0.7
|
|
| $male
|
| Shapiro-Wilk normality test
|
| data: X[[i]]
| W = 1, p-value = 0.2
Considering both p-values in the Shapiro-Wilk test being above the significance level, we can assume normality in both groups. We then test for homogeneity of variance using Levene’s test:
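Levene's test is available in the `car` package; a sketch with the assumed data frame `dat`:

```r
# Levene's test for homogeneity of variance (centred on the median by default)
car::leveneTest(height ~ sex, data = dat)
```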
| Levene's Test for Homogeneity of Variance (center = median)
| Df F value Pr(>F)
| group 1 14.3 0.00039 ***
| 53
| ---
| Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Interpreting the low p-value, we can conclude that the variances in both groups are not equal to one another. In this case, we will follow Welch’s method.
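`t.test()` applies Welch's method by default (`var.equal = FALSE`); a sketch with the assumed data frame `dat`:

```r
# Welch two-sample T-test on height by sex
t.test(height ~ sex, data = dat)
```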
|
| Welch Two Sample t-test
|
| data: height by sex
| t = -3, df = 32, p-value = 0.004
| alternative hypothesis: true difference in means between group female and group male is not equal to 0
| 95 percent confidence interval:
| -19.87 -4.07
| sample estimates:
| mean in group female mean in group male
| 158 170
Just for curiosity’s sake, we may want to try Student’s method as well and see how it differs from Welch’s:
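Setting `var.equal = TRUE` requests the pooled-variance (Student's) version:

```r
# Student's two-sample T-test with pooled variance
t.test(height ~ sex, data = dat, var.equal = TRUE)
```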
|
| Two Sample t-test
|
| data: height by sex
| t = -3, df = 53, p-value = 0.002
| alternative hypothesis: true difference in means between group female and group male is not equal to 0
| 95 percent confidence interval:
| -19.27 -4.67
| sample estimates:
| mean in group female mean in group male
| 158 170
The Student’s T-test reported a lower p-value than Welch’s T-test when we have unequal variances. A low p-value is not a bad sign per se, but we need to be wary when we have violated required assumptions; a low p-value may then indicate an inflation of statistical error. After conducting our test, we can summarize our findings by visualizing them.
Visualizing our results is important when conducting statistical inference, since it gives the reader a clearer representation of what we observed in our data. Both figures give similar information, yet convey it in different fashions.
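As a sketch of one such figure, assuming the data frame `dat` from above:

```r
# a simple boxplot of height by sex as one way to visualize the group comparison
boxplot(height ~ sex, data = dat)
```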
Paired T-Test
The equation in the unpaired T-test implicitly implies independence of each data point, so we cannot correctly make inferences on paired data. We may consider using a paired T-test in the following situations:
- Difference between multiple measurements
- Probability events where each instance influences another
In measuring mean differences in paired data, we first need to reduce the complexity. Suppose $x_{i,1}$ and $x_{i,2}$ represent measurements at $t_1$ and $t_2$, where both measures come from the same subject (a within-subject comparison). We can calculate the difference between both measurements:

$$d_i = x_{i,2} - x_{i,1}$$
Then we only need to take into account the one-sample difference, where we hypothesize:

$$H_0: \mu_d = 0 \quad \text{vs.} \quad H_1: \mu_d \neq 0$$
Does it seem familiar? Because it is! By reducing the complexity, we can infer differences in paired data by applying a one-sample T-test to $d$. The following example will help us visualize the concept:
In the current investigation, we are looking at the effect of a certain antihypertensive drug. First we measure the baseline blood pressure, then prescribe the drug to all subjects. Then, we re-measure the blood pressure after one month. Each subject has a unique identifier, so we can specify mean differences within paired samples. Suppose we have the following scenario in 30 sampled subjects:
We then set our hypotheses, $H_0: \mu_d = 0$ against $H_1: \mu_d \neq 0$, and compute the standard error of the differences along with the corresponding p-value:
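A sketch of that computation, assuming a vector `md` of within-pair differences (as named in the `t.test` output further down):

```r
# standard error of the mean paired difference
se_md <- sd(md) / sqrt(length(md))
se_md

# two-tailed p-value from the T statistic with n - 1 degrees of freedom
2 * pt(abs(mean(md) / se_md), df = length(md) - 1, lower.tail = FALSE)
```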
| [1] 3.12
| [1] 0.076
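Both outputs below could come from calls such as the following sketch; the vector names for the two measurement occasions are hypothetical:

```r
# one-sample T-test on the paired differences
t.test(md, mu = 0)

# equivalent paired T-test on the two measurement vectors
# (`bp_before` and `bp_after` are hypothetical names; the printed output used
#  a formula interface, shown as `bp by time`)
t.test(bp_before, bp_after, paired = TRUE)
```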
|
| One Sample t-test
|
| data: md
| t = 2, df = 29, p-value = 0.08
| alternative hypothesis: true mean is not equal to 0
| 95 percent confidence interval:
| -0.64 12.10
| sample estimates:
| mean of x
| 5.73
|
| Paired t-test
|
| data: bp by time
| t = 2, df = 29, p-value = 0.08
| alternative hypothesis: true mean difference is not equal to 0
| 95 percent confidence interval:
| -0.64 12.10
| sample estimates:
| mean difference
| 5.73
Choosing an Appropriate Test
All tests explained in this post assume data normality, both for the T-test and the Z-test. As a general rule of thumb, we may use the following conventions:
- One-sample test:
  - Known $\sigma$: use a Z-Test
  - Unknown $\sigma$: use a one-sample T-Test
- Two-sample test: do Levene’s test first
  - Equal variance: Student’s method (pooled variance)
  - Unequal variance: Welch’s method
- Paired T-Test: basically a one-sample T-Test on the sampled differences
Effect Size
The previous lecture on the sample size equation served as a brief introduction to statistical power, which presented us with the new concept of effect size. There are two further concepts related to effect size and statistical power, viz. the sample size $n$ and the significance level $\alpha$. The effect size calculation varies with the type of data we consider and how we conduct our formal test. In this section, we will focus on measuring the effect size of a mean difference between two groups using Cohen’s distance $d$.
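As a reminder, Cohen's $d$ for a mean difference between two groups is commonly defined as the mean difference divided by the pooled standard deviation:

$$d = \frac{\bar{x}_1 - \bar{x}_2}{s_{pooled}}$$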
As an example, we will re-use the previous scenario:
| [1] 3.12
| [1] 0.076
We will calculate $d$ by issuing the following command in R. Beware though,
different methods of computing $d$ also exist, as we can be more flexible in
expressing our code. I prefer this method because it clearly explains what we
do, at the expense of needing to understand the apply family of functions in R,
which is a good thing to know if you are going to learn R (and I hope you will!)
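The original command is not reproduced here; the following sketch, assuming the data frame `tbl` with columns `bp` and `time`, illustrates one way to obtain a pooled standard deviation and Cohen's $d$ with the apply family:

```r
# group means and variances of blood pressure by time point
m <- tapply(tbl$bp, tbl$time, mean)
v <- tapply(tbl$bp, tbl$time, var)

# pooled standard deviation (equal group sizes), then Cohen's d
s_pool <- sqrt(mean(v))
s_pool
unname(abs(m[1] - m[2])) / s_pool
```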
| [1] 12.4
| [1] 0.464
As a comparison, we can also compute $d$ using the psych package.
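The `Call:` line in the output indicates the formula interface of `psych::cohen.d()` was used with the data frame `tbl`:

```r
# Cohen's d with confidence limits via the psych package
psych::cohen.d(tbl ~ time)
```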
| Call: psych::cohen.d(x = tbl ~ time)
| Cohen d statistic of difference between two means
| lower effect upper
| bp -0.98 -0.47 0.04
|
| Multivariate (Mahalanobis) distance between groups
| [1] 0.47
| r equivalent of difference between two means
| bp
| -0.23
Finally, we can apply $d$ to compute the statistical power:
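The output below matches the `pwr` package's paired calculation; a sketch using the effect size obtained above:

```r
# statistical power for a paired T-test with 30 pairs at alpha = 0.05
pwr::pwr.t.test(n = 30, d = 0.472, sig.level = 0.05, type = "paired")
```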
|
| Paired t test power calculation
|
| n = 30
| d = 0.472
| sig.level = 0.05
| power = 0.704
| alternative = two.sided
|
| NOTE: n is number of *pairs*