Two Sample t-test: Definition, Formula, and Example

A two sample t-test is used to determine whether or not two population means are equal.

This tutorial explains the following:

  • The motivation for performing a two sample t-test.
  • The formula to perform a two sample t-test.
  • The assumptions that should be met to perform a two sample t-test.
  • An example of how to perform a two sample t-test.

Two Sample t-test: Motivation

Suppose we want to know whether the mean weights of two different species of turtles are equal. Since there are thousands of turtles in each population, it would be too time-consuming and costly to weigh every individual turtle.

Instead, we might take a simple random sample of 15 turtles from each population and use the mean weight in each sample to determine whether the mean weights of the two populations are equal.


However, it’s virtually guaranteed that the mean weights of the two samples will be at least a little different. The question is whether this difference is statistically significant. Fortunately, a two sample t-test allows us to answer this question.

Two Sample t-test: Formula

A two-sample t-test always uses the following null hypothesis:

  • $H_0: \mu_1 = \mu_2$ (the two population means are equal)

The alternative hypothesis can be either two-tailed, left-tailed, or right-tailed:

  • $H_1$ (two-tailed): $\mu_1 \neq \mu_2$ (the two population means are not equal)
  • $H_1$ (left-tailed): $\mu_1 < \mu_2$ (population 1 mean is less than population 2 mean)
  • $H_1$ (right-tailed): $\mu_1 > \mu_2$ (population 1 mean is greater than population 2 mean)

We use the following formula to calculate the test statistic t:

Test statistic: $ t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{1/n_1 + 1/n_2}} $

where $\bar{x}_1$ and $\bar{x}_2$ are the sample means, $n_1$ and $n_2$ are the sample sizes, and $s_p$ is the pooled standard deviation, calculated as:

$ s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} $

where $s_1^2$ and $s_2^2$ are the sample variances.

If the p-value that corresponds to the test statistic t with $n_1 + n_2 - 2$ degrees of freedom is less than your chosen significance level (common choices are 0.10, 0.05, and 0.01), then you can reject the null hypothesis.
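The formula can be sketched in Python that works directly from summary statistics (means, standard deviations, and sample sizes). This is a minimal illustration, not a library routine, and the function name `pooled_t_test` is hypothetical. Like the pooled formula itself, it assumes equal population variances.

```python
import math


def pooled_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled (equal-variance) two-sample t statistic from summary statistics.

    Returns (t, df, s_p), where s_p is the pooled standard deviation.
    """
    df = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    s_p = math.sqrt(pooled_var)
    # Standard error of the difference between the two sample means
    se = s_p * math.sqrt(1 / n1 + 1 / n2)
    t = (mean1 - mean2) / se
    return t, df, s_p


# Summary statistics from the turtle example in this tutorial
t, df, s_p = pooled_t_test(300, 18.5, 40, 305, 16.7, 38)
print(f"s_p = {s_p:.3f}, t = {t:.4f}, df = {df}")  # s_p = 17.647, t = -1.2508, df = 76
```

The p-value still has to be looked up from the t-distribution with `df` degrees of freedom, using a calculator, a table, or statistical software.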

Two Sample t-test: Assumptions

For the results of a two sample t-test to be valid, the following assumptions should be met:

  • The observations in one sample should be independent of the observations in the other sample.
  • The data in each sample should be approximately normally distributed.
  • The two samples should have approximately the same variance. If this assumption is not met, you should instead perform Welch’s t-test.
  • The data in both samples were obtained using a random sampling method.

Two Sample t-test: Example

Suppose we want to know whether the mean weights of two different species of turtles are equal. To test this, we will perform a two sample t-test at significance level α = 0.05 using the following steps:

Step 1: Gather the sample data.

Suppose we collect a random sample of turtles from each population with the following information:

  • Sample size $n_1 = 40$
  • Sample mean weight $\bar{x}_1 = 300$
  • Sample standard deviation $s_1 = 18.5$
  • Sample size $n_2 = 38$
  • Sample mean weight $\bar{x}_2 = 305$
  • Sample standard deviation $s_2 = 16.7$

Step 2: Define the hypotheses.

We will perform the two sample t-test with the following hypotheses:

  • $H_0: \mu_1 = \mu_2$ (the two population means are equal)
  • $H_1: \mu_1 \neq \mu_2$ (the two population means are not equal)

Step 3: Calculate the test statistic t.

First, we will calculate the pooled standard deviation $s_p$:

$ s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(40 - 1)18.5^2 + (38 - 1)16.7^2}{40 + 38 - 2}} = 17.647 $

Next, we will calculate the test statistic t:

$ t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{1/n_1 + 1/n_2}} = \frac{300 - 305}{17.647\sqrt{1/40 + 1/38}} = -1.2508 $

Step 4: Calculate the p-value of the test statistic t.

According to the T Score to P Value Calculator, the two-tailed p-value associated with t = -1.2508 and degrees of freedom = $n_1 + n_2 - 2$ = 40 + 38 - 2 = 76 is 0.21484.
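If no calculator is at hand, the two-tailed p-value can be approximated numerically. The sketch below uses only the Python standard library. Instead of calling a statistics package, it integrates the Student's t density with Simpson's rule, so treat it as an illustration rather than production code. The helper names `t_pdf` and `two_sided_p_value` are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(t, df, steps=2000):
    """P(|T| >= |t|), integrating the density from 0 to |t| with Simpson's rule."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_b = total * h / 3
    # The density is symmetric, so P(|T| >= b) = 1 - 2 * P(0 <= T <= b)
    return 1 - 2 * area_0_to_b


p = two_sided_p_value(-1.2508, 76)
print(f"p = {p:.4f}")
```

For the turtle data this should give a value close to the 0.21484 reported by the calculator.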

Step 5: Draw a conclusion.

Since this p-value is not less than our significance level α = 0.05, we fail to reject the null hypothesis. We do not have sufficient evidence to say that the mean weights of turtles in these two populations are different.

Note: You can also perform this entire two sample t-test by simply using the Two Sample t-test Calculator.



JMP | Statistical Discovery.™ From SAS.

Statistics Knowledge Portal

A free online introduction to statistics

The Two-Sample t-Test

What is the two-sample t-test?

The two-sample t-test (also known as the independent samples t-test) is a method used to test whether the unknown population means of two groups are equal or not.

Is this the same as an A/B test?

Yes, a two-sample t-test can be used to analyze the results from A/B tests.

When can I use the test?

You can use the test when your data values are independent, are randomly sampled from two normal populations and the two independent groups have equal variances.

What if I have more than two groups?

Use a multiple comparison method. Analysis of variance (ANOVA) is one such method. Other multiple comparison methods include the Tukey-Kramer test of all pairwise differences, analysis of means (ANOM) to compare group means to the overall mean or Dunnett’s test to compare each group mean to a control mean.

What if the variances for my two groups are not equal?

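You can still use a two-sample t-test, but with a different estimate of the standard error: the variance of each group is estimated separately instead of being pooled, and the degrees of freedom are adjusted.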

What if my data isn’t nearly normally distributed?

If your sample sizes are very small, you might not be able to test for normality. You might need to rely on your understanding of the data. When you cannot safely assume normality, you can perform a nonparametric test that doesn’t assume normality.


Using the two-sample t-test

The sections below discuss what is needed to perform the test, how to check our data, how to perform the test, and the statistical details.

What do we need?

For the two-sample t-test, we need two variables. One variable defines the two groups. The second variable is the measurement of interest.

We also have an idea, or hypothesis, that the means of the underlying populations for the two groups are different. Here are a couple of examples:

  • We have students who speak English as their first language and students who do not. All students take a reading test. Our two groups are the native English speakers and the non-native speakers. Our measurements are the test scores. Our idea is that the mean test scores for the underlying populations of native and non-native English speakers are not the same. We want to know if the mean score for the population of native English speakers is different from the mean score for the population of people who learned English as a second language.
  • We measure the grams of protein in two different brands of energy bars. Our two groups are the two brands. Our measurement is the grams of protein for each energy bar. Our idea is that the mean grams of protein for the underlying populations for the two brands may be different. We want to know if we have evidence that the mean grams of protein for the two brands of energy bars is different or not.

Two-sample t-test assumptions

To conduct a valid test:

  • Data values must be independent. Measurements for one observation do not affect measurements for any other observation.
  • Data in each group must be obtained via a random sample from the population.
  • Data in each group are normally distributed.
  • Data values are continuous.
  • The variances for the two independent groups are equal.

For very small groups of data, it can be hard to test these requirements. Below, we'll discuss how to check the requirements using software and what to do when a requirement isn’t met.

Two-sample t-test example

One way to measure a person’s fitness is to measure their body fat percentage. Average body fat percentages vary by age, but according to some guidelines, the normal range for men is 15-20% body fat, and the normal range for women is 20-25% body fat.

Our sample data is from a group of men and women who did workouts at a gym three times a week for a year. Then, their trainer measured the body fat. The table below shows the data.

Table 1: Body fat percentage data grouped by gender

You can clearly see some overlap in the body fat measurements for the men and women in our sample, but also some differences. Just by looking at the data, it's hard to draw any solid conclusions about whether the underlying populations of men and women at the gym have the same mean body fat. That is the value of statistical tests – they provide a common, statistically valid way to make decisions, so that everyone makes the same decision on the same set of data values.

Checking the data

Let’s start by answering: Is the two-sample t-test an appropriate method to evaluate the difference in body fat between men and women?

  • The data values are independent. The body fat for any one person does not depend on the body fat for another person.
  • We assume the people measured represent a simple random sample from the population of members of the gym.
  • We assume the data are normally distributed, and we can check this assumption.
  • The data values are body fat measurements. The measurements are continuous.
  • We assume the variances for men and women are equal, and we can check this assumption.

Before jumping into analysis, we should always take a quick look at the data. The figure below shows histograms and summary statistics for the men and women.

Histogram and summary statistics for the body fat data

The two histograms are on the same scale. From a quick look, we can see that there are no very unusual points, or outliers. The data look roughly bell-shaped, so our initial idea of a normal distribution seems reasonable.

Examining the summary statistics, we see that the standard deviations are similar. This supports the idea of equal variances. We can also check this using a test for variances.

Based on these observations, the two-sample t-test appears to be an appropriate method to test for a difference in means.

How to perform the two-sample t-test

For each group, we need the average, standard deviation and sample size. These are shown in the table below.

Table 2: Average, standard deviation and sample size statistics grouped by gender

Without doing any testing, we can see that the averages for men and women in our samples are not the same. But how different are they? Are the averages “close enough” for us to conclude that mean body fat is the same for the larger population of men and women at the gym? Or are the averages too different for us to make this conclusion?

We'll further explain the principles underlying the two-sample t-test in the statistical details section below, but let's first proceed through the steps from beginning to end. We start by calculating our test statistic. This calculation begins with finding the difference between the two averages:

$ 22.29 - 14.95 = 7.34 $

This difference in our samples estimates the difference between the population means for the two groups.

Next, we calculate the pooled standard deviation. This builds a combined estimate of the overall standard deviation. The estimate adjusts for different group sizes. First, we calculate the pooled variance:

$ s_p^2 = \frac{((n_1 - 1)s_1^2) + ((n_2 - 1)s_2^2)} {n_1 + n_2 - 2} $

$ s_p^2 = \frac{((10 - 1)5.32^2) + ((13 - 1)6.84^2)}{(10 + 13 - 2)} $

$ = \frac{(9\times28.30) + (12\times46.82)}{21} $

$ = \frac{(254.7 + 561.85)}{21} $

$ =\frac{816.55}{21} = 38.88 $

Next, we take the square root of the pooled variance to get the pooled standard deviation. This is:

$ \sqrt{38.88} = 6.24 $

We now have all the pieces for our test statistic. We have the difference of the averages, the pooled standard deviation and the sample sizes.  We calculate our test statistic as follows:

$ t = \frac{\text{difference of group averages}}{\text{standard error of difference}} = \frac{7.34}{(6.24\times \sqrt{(1/10 + 1/13)})} = \frac{7.34}{2.62} = 2.80 $
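The same arithmetic can be reproduced in a few lines of Python. This sketch plugs in the rounded summary statistics used in the hand calculation above (group 1: n = 10, average 22.29, standard deviation 5.32; group 2: n = 13, average 14.95, standard deviation 6.84). Because the standard deviations are rounded to two decimals, the pooled variance comes out near 38.86 rather than 38.88 and the pooled standard deviation near 6.23. The test statistic still rounds to 2.80.

```python
import math

# Rounded summary statistics from the hand calculation above
n1, mean1, sd1 = 10, 22.29, 5.32
n2, mean2, sd2 = 13, 14.95, 6.84

diff = mean1 - mean2  # difference of the two group averages
pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
s_p = math.sqrt(pooled_var)  # pooled standard deviation
se = s_p * math.sqrt(1 / n1 + 1 / n2)  # standard error of the difference
t = diff / se

print(f"difference = {diff:.2f}")
print(f"pooled variance = {pooled_var:.2f}, pooled SD = {s_p:.2f}")
print(f"standard error = {se:.2f}, t = {t:.2f}")
```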

To evaluate the difference between the means in order to make a decision about our gym programs, we compare the test statistic to a theoretical value from the t-distribution. This activity involves four steps:

  • We decide on the risk we are willing to take for declaring a significant difference. For the body fat data, we decide that we are willing to take a 5% risk of saying that the unknown population means for men and women are not equal when they really are. In statistics-speak, the significance level, denoted by α, is set to 0.05. It is a good practice to make this decision before collecting the data and before calculating test statistics.
  • We calculate a test statistic. Our test statistic is 2.80.
  • We find the theoretical value from the t-distribution based on our null hypothesis, which states that the means for men and women are equal. Most statistics books have look-up tables for the t-distribution. You can also find tables online. The most likely situation is that you will use software and will not use printed tables. To find this value, we need the significance level (α = 0.05) and the degrees of freedom. The degrees of freedom (df) are based on the sample sizes of the two groups. For the body fat data, this is: $ df = n_1 + n_2 - 2 = 10 + 13 - 2 = 21 $. For a two-sided test with α = 0.05 and 21 degrees of freedom, the t value is 2.080.
  • We compare the value of our statistic (2.80) to the t value. Since 2.80 > 2.080, we reject the null hypothesis that the mean body fat for men and women is equal, and conclude that we have evidence that body fat in the population differs between men and women.
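The critical value of 2.080 can also be found without a printed table. The sketch below uses only the standard library. It computes tail areas of the t-distribution with Simpson's rule, then uses bisection to find the value that leaves α/2 in each tail. The function names are hypothetical. Statistical software uses more refined algorithms, so this is an illustration of the idea only.

```python
import math


def t_area_0_to_b(b, df, steps=2000):
    """P(0 <= T <= b) for Student's t with df degrees of freedom, via Simpson's rule."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

    h = b / steps  # steps must be even for Simpson's rule
    total = pdf(0) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return total * h / 3


def two_sided_critical_value(alpha, df):
    """Find c such that P(|T| >= c) = alpha, by bisection on [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        tail = 1 - 2 * t_area_0_to_b(mid, df)  # P(|T| >= mid)
        if tail > alpha:
            lo = mid  # too much area beyond mid: the cut-off lies further out
        else:
            hi = mid
    return (lo + hi) / 2


crit = two_sided_critical_value(0.05, 21)
t_stat = 2.80  # test statistic from the body fat calculation
print(f"critical value = {crit:.3f}")
print("reject H0" if abs(t_stat) > crit else "fail to reject H0")
```

Comparing the absolute value of the test statistic keeps the same decision rule valid when t is negative.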

Statistical details

Let’s look at the body fat data and the two-sample t-test using statistical terms.

Our null hypothesis is that the underlying population means are the same. The null hypothesis is written as:

$ H_0:  \mathrm{\mu_1} =\mathrm{\mu_2} $

The alternative hypothesis is that the means are not equal. This is written as:

$ H_a:  \mathrm{\mu_1} \neq \mathrm{\mu_2} $

We calculate the average for each group, and then calculate the difference between the two averages. This is written as:

$\overline{x_1} -  \overline{x_2} $

We calculate the pooled standard deviation. This assumes that the underlying population variances are equal. The pooled variance formula is written as:

$ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} $

The formula shows the sample size for the first group as $n_1$ and the second group as $n_2$. The standard deviations for the two groups are $s_1$ and $s_2$. This estimate allows the two groups to have different numbers of observations. The pooled standard deviation is the square root of the pooled variance and is written as $s_p$.

What if your sample sizes for the two groups are the same? In this situation, the pooled estimate of variance is simply the average of the variances for the two groups:

$ s_p^2 = \frac{(s_1^2 + s_2^2)}{2} $
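Here is a quick numerical check of this special case. With equal group sizes, the general pooled variance formula and the simple average of the two variances agree. With unequal sizes, the larger group's variance gets more weight. The standard deviations below are made-up values for illustration only.

```python
def pooled_variance(s1, n1, s2, n2):
    """Pooled variance from two sample standard deviations and sample sizes."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)


# Hypothetical standard deviations, for illustration only
s1, s2 = 4.0, 6.0

# Equal sample sizes: the pooled variance equals the plain average of the variances
pooled = pooled_variance(s1, 12, s2, 12)
simple_average = (s1**2 + s2**2) / 2
print(pooled, simple_average)  # 26.0 26.0

# Unequal sample sizes: the larger group's variance gets more weight
print(pooled_variance(s1, 5, s2, 20))  # 748 / 23, about 32.52
```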

The test statistic is calculated as:

$ t = \frac{(\overline{x_1} -\overline{x_2})}{s_p\sqrt{1/n_1 + 1/n_2}} $

The numerator of the test statistic is the difference between the two group averages. It estimates the difference between the two unknown population means. The denominator is an estimate of the standard error of the difference between the two group averages.

Technical Detail: For a single mean, the standard error is $ s/\sqrt{n} $. The formula above extends this idea to two groups that use a pooled estimate for s (standard deviation), and that can have different group sizes.

We then compare the test statistic to a t value with our chosen alpha value and the degrees of freedom for our data. Using the body fat data as an example, we set α = 0.05. The degrees of freedom ( df ) are based on the group sizes and are calculated as:

$ df = n_1 + n_2 - 2 = 10 + 13 - 2 = 21 $

Statisticians write the critical t value for a two-sided test with α = 0.05 and 21 degrees of freedom as:

$ t_{0.05,21} $

The t value with α = 0.05 and 21 degrees of freedom is 2.080. There are two possible results from our comparison:

  • The absolute value of the test statistic is lower than the t value. You fail to reject the hypothesis of equal means. You do not have enough evidence to conclude that men and women differ in average body fat.
  • The absolute value of the test statistic is higher than the t value. You reject the hypothesis of equal means. You conclude that there is evidence that men and women differ in average body fat.

t-Test with unequal variances

When the variances for the two groups are not equal, we cannot use the pooled estimate of standard deviation. Instead, we take the standard error for each group separately. The test statistic is:

$ t = \frac{ (\overline{x_1} -  \overline{x_2})}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} $

The numerator of the test statistic is the same. It is the difference between the averages of the two groups. The denominator is an estimate of the overall standard error of the difference between means. It is based on the separate standard error for each group.

The degrees of freedom calculation for the t value is more complex with unequal variances than equal variances and is usually left up to statistical software packages. The key point to remember is that if you cannot use the pooled estimate of standard deviation, then you cannot use the simple formula for the degrees of freedom.
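For reference, the degrees-of-freedom formula most commonly used for this unpooled test is the Welch-Satterthwaite approximation. We assume here that it is the formula behind the JMP output discussed later. The sketch below applies it to the rounded summary statistics used earlier (n = 10, average 22.29, SD 5.32 versus n = 13, average 14.95, SD 6.84). It gives roughly 20.99 degrees of freedom, close to the 20.9888 that JMP reports. The function name is hypothetical.

```python
import math


def welch_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Unpooled t statistic with Welch-Satterthwaite degrees of freedom."""
    v1 = sd1**2 / n1  # squared standard error of the group 1 mean
    v2 = sd2**2 / n2  # squared standard error of the group 2 mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation; usually not a whole number
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


t, df = welch_t_test(22.29, 5.32, 10, 14.95, 6.84, 13)
print(f"t = {t:.3f}, df = {df:.2f}")
```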

Testing for normality

The normality assumption is more important when the two groups have small sample sizes than for larger sample sizes.

Normal distributions are symmetric, which means they are “even” on both sides of the center. Normal distributions do not have extreme values, or outliers. You can check these two features of a normal distribution with graphs. Earlier, we decided that the body fat data was “close enough” to normal to go ahead with the assumption of normality. The figure below shows a normal quantile plot for men and women, and supports our decision.

 Normal quantile plot of the body fat measurements for men and women

You can also perform a formal test for normality using software. The figure above shows results of testing for normality with JMP software. We test each group separately. Both the test for men and the test for women show that we cannot reject the hypothesis of a normal distribution. We can go ahead with the assumption that the body fat data for men and for women are normally distributed.

Testing for unequal variances

Testing for unequal variances is complex. We won’t show the calculations in detail, but will show the results from JMP software. The figure below shows results of a test for unequal variances for the body fat data.

Test for unequal variances for the body fat data

Without diving into the details of the different types of tests for unequal variances, we will use the F test. Before testing, we decide to accept a 10% risk of concluding the variances are unequal when they are actually equal. This means we have set α = 0.10.

Like most statistical software, JMP shows the p-value for a test. Assuming the null hypothesis is true, this is the probability of finding a value of the test statistic at least as extreme as the one observed. It’s difficult to calculate by hand. For the figure above, with an F test statistic of 1.654, the p-value is 0.4561. This is larger than our α value: 0.4561 > 0.10. We fail to reject the hypothesis of equal variances. In practical terms, we can go ahead with the two-sample t-test with the assumption of equal variances for the two groups.
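The F statistic itself is simple to compute from the two sample standard deviations. This sketch assumes the common convention of dividing the larger sample variance by the smaller. With the rounded standard deviations 6.84 and 5.32, it gives about 1.653, in line with the 1.654 shown by JMP, which works from unrounded data. Computing the p-value requires the F-distribution, so that part is left to software here. The function name is hypothetical.

```python
def variance_ratio(sd_a, n_a, sd_b, n_b):
    """F statistic (larger sample variance over smaller) and its (numerator, denominator) df."""
    var_a, var_b = sd_a**2, sd_b**2
    if var_a >= var_b:
        return var_a / var_b, (n_a - 1, n_b - 1)
    return var_b / var_a, (n_b - 1, n_a - 1)


f_stat, dfs = variance_ratio(5.32, 10, 6.84, 13)
print(f"F = {f_stat:.3f} with df = {dfs}")
```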

Understanding p-values

A plot of the t-distribution lets you check visually whether your test statistic falls in the extreme tails of the distribution. The figure below shows a t-distribution with 21 degrees of freedom.

t-distribution with 21 degrees of freedom and α = .05

Since our test is two-sided and we have set α = .05, the figure shows that the value of 2.080 “cuts off” 2.5% of the area under the curve in each of the two tails. Only 5% of the total area lies further out in the tails than ±2.080. Because our test statistic of 2.80 is beyond the cut-off point, we reject the null hypothesis of equal means.
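The tail areas described above can be checked numerically. This standard-library sketch uses a hypothetical helper that applies Simpson's rule to the t density; it is not a library function. It estimates the two-sided area beyond ±2.080 and beyond ±2.80 for 21 degrees of freedom. The first should come out close to 0.05, and the second close to the p-value of 0.0107 that JMP reports for this test.

```python
import math


def two_sided_tail(b, df, steps=2000):
    """P(|T| >= b) for Student's t with df degrees of freedom (Simpson's rule on [0, b])."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

    h = b / steps  # steps must be even for Simpson's rule
    total = pdf(0) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    # Symmetry: area in both tails = 1 - 2 * area between 0 and b
    return 1 - 2 * total * h / 3


for cutoff in (2.080, 2.80):
    print(f"P(|T| >= {cutoff}) = {two_sided_tail(cutoff, 21):.4f}")
```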

Putting it all together with software

The figure below shows results for the two-sample t-test for the body fat data from JMP software.

Results for the two-sample t-test from JMP software

The results for the two-sample t-test that assumes equal variances are the same as our calculations earlier. The test statistic is 2.79996. The software shows results for a two-sided test and for one-sided tests. The two-sided test is what we want (Prob > |t|). Our null hypothesis is that the mean body fat for men and women is equal. Our alternative hypothesis is that the mean body fat is not equal. The one-sided tests are for one-sided alternative hypotheses, for example, an alternative hypothesis that mean body fat for men is less than that for women.

We can reject the hypothesis of equal mean body fat for the two groups and conclude that we have evidence body fat differs in the population between men and women. The software shows a p-value of 0.0107, which is less than our α of 0.05. We decided on a 5% risk of concluding the mean body fat for men and women are different when they are not. It is important to make this decision before doing the statistical test.

The figure also shows the results for the t-test that does not assume equal variances. This test does not use the pooled estimate of the standard deviation. As was mentioned above, this test also has a complex formula for degrees of freedom. You can see that the degrees of freedom are 20.9888. The software shows a p-value of 0.0086. Again, with our decision of a 5% risk, we can reject the null hypothesis of equal mean body fat for men and women.

Other topics

If you have more than two independent groups, you cannot use the two-sample t-test. You should use a multiple comparison method. ANOVA, or analysis of variance, is one such method. Other multiple comparison methods include the Tukey-Kramer test of all pairwise differences, analysis of means (ANOM) to compare group means to the overall mean or Dunnett’s test to compare each group mean to a control mean.

What if my data are not from normal distributions?

If your sample size is very small, it might be hard to test for normality. In this situation, you might need to use your understanding of the measurements. For example, for the body fat data, the trainer knows that body fat in the underlying population is normally distributed. Even for a very small sample, the trainer would likely go ahead with the t-test and assume normality.

What if you know the underlying measurements are not normally distributed? Or what if your sample size is large and the test for normality is rejected? In this situation, you can use nonparametric analyses. These types of analyses do not depend on an assumption that the data values are from a specific distribution. As an alternative to the two-sample t-test, the Wilcoxon rank sum test is a nonparametric test that could be used.
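As a rough illustration of how the Wilcoxon rank sum statistic works, the sketch below ranks the pooled observations, with tied values sharing their average rank. It then sums the ranks of the first group and converts that sum to a z-score using the large-sample normal approximation. The data are hypothetical, and no tie or continuity correction is applied. For samples this small, an exact p-value from statistical software would be preferable. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Map each distinct value to its average 1-based rank (ties share a rank)."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over any run of tied values starting at position i
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[i]:
            j += 1
        ranks[ordered[i]] = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Rank sum W of sample x, its z-score, and a two-sided normal-approximation p-value."""
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[v] for v in x)
    n1, n2 = len(x), len(y)
    mean_w = n1 * (n1 + n2 + 1) / 2  # expected W if the null hypothesis is true
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (w - mean_w) / sd_w
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w, z, p


# Hypothetical measurements for two independent groups (illustration only)
group_a = [1.2, 3.4, 2.2, 5.0]
group_b = [6.1, 4.8, 7.3, 5.9]
w, z, p = rank_sum_test(group_a, group_b)
print(f"W = {w}, z = {z:.2f}, p = {p:.3f}")
```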


What is a Hypothesis Test for 2 Samples?

Searching the internet for a definition of hypothesis testing for 2 samples brings back a lot of different results. Most of them are a little different. The definitions you will find online usually are disjointed, covering hypothesis testing for independent means, paired means, and proportions. Instead of giving one uniform definition, we’ll take a look at key components that are common to all of the tests, and then some of the specific components and notation.

The Basic Idea

The appearance of these hypothesis tests (in the real world) will be very similar to the tests that we see with one sample. In fact, the examples of hypothesis tests that were in the previous introduction include tests for one sample as well as two samples. The basic structure of these hypothesis tests is very similar to the ones we saw before: you have a problem, a hypothesis, data collection, some computations, and results or conclusions. Some of the notation will be slightly different. These examples below are the same ones we presented in the previous introduction, but here we are highlighting the two-sample variations. The examples with bolded terms are the ones that use 2 samples.

Some Examples of Hypothesis Tests

Example 1: agility testing in youth football (soccer) players; evaluating reliability, validity, and correlates of newly developed testing protocols.

Reactive agility (RAG) and change of direction speed (CODS) were analyzed in U13 and U15 youth soccer players. “Independent samples t-test indicated significant differences between U13 and U15 in S10 (t-test: 3.57, p < 0.001), S20M (t-test: 3.13, p < 0.001), 20Y (t-test: 4.89, p < 0.001), FS_RAG (t-test: 3.96, p < 0.001), and FS_CODS (t-test: 6.42, p < 0.001), with better performance in U15. Starters outperformed non-starters in most capacities among U13, but only in FS_RAG among U15 (t-test: 1.56, p < 0.05).”

Most of this might seem like gibberish for now, but essentially the two groups were analyzed and compared, with significant differences observed between the groups. This is a hypothesis test for 2 means, independent samples.

Source: https://pubmed.ncbi.nlm.nih.gov/31906269/

Example 2: Manual therapy in the treatment of carpal tunnel syndrome in diabetic patients: A randomized clinical trial

Thirty diabetic patients with carpal tunnel syndrome were split up into two groups. One received physiotherapy modality and the other received manual therapy. “Paired t-test revealed that all of the outcome measures had a significant change in the manual therapy group, whereas only the VAS and SSS changed significantly in the modality group at the end of 4 weeks. Independent t-test showed that the variables of SSS, FSS and MNT in the manual therapy group improved significantly greater than the modality group.”

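The within-group comparisons are hypothesis tests for matched pairs, sometimes known as 2 means, dependent samples. The comparison between the manual therapy group and the modality group is a test for 2 means, independent samples.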

Source: https://pubmed.ncbi.nlm.nih.gov/30197774/

Example 3: Omega-3 fatty acids decreased irritability of patients with bipolar disorder in an add-on, open label study

“The initial mean was 63.51 (SD 34.17), indicating that on average, subjects were irritable for about six of the previous ten days. The mean for the last recorded percentage was less than half of the initial score: 30.27 (SD 34.03). The decrease was found to be statistically significant using a paired sample t-test (t = 4.36, 36 df, p < .001).”

Source: https://nutritionj.biomedcentral.com/articles/10.1186/1475-2891-4-6

Example 4: Evaluating the Efficacy of COVID-19 Vaccines

“We reduced all values of vaccine efficacy by 30% to reflect the waning of vaccine efficacy against each endpoint over time. We tested the null hypothesis that the vaccine efficacy is 0% versus the alternative hypothesis that the vaccine efficacy is greater than 0% at the nominal significance level of 2.5%.”

Source: https://www.medrxiv.org/content/10.1101/2020.10.02.20205906v2.full

Example 5: Social Isolation During COVID-19 Pandemic. Perceived Stress and Containment Measures Compliance Among Polish and Italian Residents

“The Polish group had a higher stress level than the Italian group (mean PSS-10 total score 22,14 vs 17,01, respectively; p < 0.01). There was a greater prevalence of chronic diseases among Polish respondents. Italian subjects expressed more concern about their health, as well as about their future employment. Italian subjects did not comply with suggested restrictions as much as Polish subjects and were less eager to restrain from their usual activities (social, physical, and religious), which were more often perceived as “most needed matters” in Italian than in Polish residents.”

Even though the test wording itself does not explicitly state the tests we will study, this is a comparison of means from two different groups, so this is a test for two means, independent samples.

Source: https://www.frontiersin.org/articles/10.3389/fpsyg.2021.673514/full

Example 6: A Comparative Analysis of Student Performance in an Online vs. Face-to-Face Environmental Science Course From 2009 to 2016

“The independent sample t-test showed no significant difference in student performance between online and F2F learners with respect to gender [t(145) = 1.42, p = 0.122].”

Once again, a test of 2 means, independent samples.

Source: https://www.frontiersin.org/articles/10.3389/fcomp.2019.00007/full

But what does it all mean?

That’s what comes next. The examples above span a variety of different types of hypothesis tests. Within this chapter we will take a look at some of the terminology, formulas, and concepts related to Hypothesis Testing for 2 Samples.

Key Terminology and Formulas

Hypothesis: This is a claim or statement about a population, usually focusing on a parameter such as a proportion (%), mean, standard deviation, or variance. We will be focusing primarily on the proportion and the mean.

Hypothesis Test: Also known as a Significance Test or Test of Significance, the hypothesis test is the collection of procedures we use to test a claim about a population.

Null Hypothesis: This is a statement that the population parameter (such as the proportion, mean, standard deviation, or variance) is equal to some value. In simpler terms, the Null Hypothesis is a statement that “nothing is different from what usually happens.” The Null Hypothesis is usually denoted by [latex]H_{0}[/latex], followed by other symbols and notation that describe how the parameter from one population or group is the same as the parameter from another population or group.

Alternative Hypothesis: This is a statement that the population parameter (such as the proportion, mean, standard deviation, or variance) is somehow different from the value involved in the Null Hypothesis. For our examples, “somehow different” will involve the use of [latex]<[/latex], [latex]>[/latex], or [latex]\neq[/latex]. In simpler terms, the Alternative Hypothesis is a statement that “something is different from what usually happens.” The Alternative Hypothesis is usually denoted by [latex]H_{1}[/latex], [latex]H_{A}[/latex], or [latex]H_{a}[/latex], followed by other symbols and notation that describe how the parameter from one population or group is different from the parameter from another population or group.

Significance Level: We previously learned about the significance level as the “left over” stuff from the confidence level. This is still true, but we will now focus more on the significance level as its own value, and we will use the symbol alpha, [latex]\alpha[/latex]. This looks like a lowercase “a,” or a drawing of a little fish. The significance level [latex]\alpha[/latex] is the probability of rejecting the null hypothesis when it is actually true (more on what this means in the next section). The common values are still similar to what we had previously: 1%, 5%, and 10%. We commonly write these as decimals instead: 0.01, 0.05, and 0.10.

Test Statistic: One of the key components of a hypothesis test is what we call a test statistic. This is a calculation, sort of like a z-score, that is specific to the type of test being conducted. The idea behind a test statistic, relating it back to science projects, would be like calculations from measurements that were taken. In this chapter we will address the test statistic for 2 proportions, 2 means (independent samples), and matched pairs (2 means from dependent samples). The formulas are listed in the table below:

What the different symbols mean:

Critical Region: The critical region, also known as the rejection region, is the area in the normal (or other) distribution in which we reject the null hypothesis. Think of the critical region like a target area that you are aiming for. If our test statistic falls in this region, we have evidence for the claim.

Critical Value: These are like special z-scores for us; the critical value (or values; sometimes there are two) separates the critical region from the rest of the distribution, which is the non-target part, or what we are not aiming for. If our test statistic falls in this non-target part, we do not have evidence for the claim.

P-Value: This is a special value that we compute. If we assume the null hypothesis is true, the p-value represents the probability that a test statistic is at least as extreme as the one we computed from our sample data; for us the test statistics would be either [latex]z[/latex] or [latex]t[/latex].

Decision Rule for Hypothesis Testing: There are a few ways we can arrive at our decision with a hypothesis test. We can arrive at our conclusion by using confidence intervals, critical values (also known as the traditional method), or p-values. Relating this to a science project, the decision rule would be what we take into consideration to arrive at our conclusion. When we make our decision, the wording will sound a little strange. We'll say things like "we have enough evidence to reject the null hypothesis" or "there is insufficient evidence to reject the null hypothesis."

Decision Rule with Critical Values:  If the test statistic is in the critical region, we have enough evidence to reject the null hypothesis. We can also say we have sufficient evidence to support the claim. If the test statistic is not in the critical region, we fail to reject the null hypothesis. We can also say we do not have sufficient evidence to support the claim.

Decision Rule with P-Values: If the p-value is less than or equal to the significance level, we have enough evidence to reject the null hypothesis. We can also say we have sufficient evidence to support the claim. If the p-value is greater than the significance level, we fail to reject the null hypothesis. We can also say we do not have sufficient evidence to support the claim.
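The two decision rules above can be sketched in Python for a two-tailed z test. This is a minimal sketch assuming the test statistic follows a standard normal distribution; the function name `decide` and the z values used are illustrative only, and the code relies solely on the standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist

def decide(z, alpha=0.05):
    """Apply both decision rules to a two-tailed z test statistic."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # P-value rule: probability of a statistic at least as extreme as |z| in either tail
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    reject_by_p = p_value <= alpha
    # Critical-value rule: the critical region is |z| >= z_(alpha/2)
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    reject_by_crit = abs(z) >= z_crit
    return p_value, z_crit, reject_by_p, reject_by_crit

for z in (1.5, -2.3):
    p, zc, by_p, by_c = decide(z)
    print(f"z = {z:+.2f}: p-value = {p:.4f}, critical values = ±{zc:.3f}, "
          f"reject by p-value: {by_p}, reject by critical value: {by_c}")
```

The two rules reach the same decision: |z| lands in the critical region exactly when the two-tailed p-value is at most α.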

More About Hypotheses

Writing the Null and Alternative Hypothesis can be tricky. Here are a few examples of claims followed by the respective hypotheses:

Basic Statistics Copyright © by Allyn Leon is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.


Calcworkshop

Two Sample T Test Defined w/ 7 Step-by-Step Examples!

Last Updated: October 9, 2020

Did you know that the two sample t test is used to make inferences about the difference between population means?


Jenn, Founder of Calcworkshop®, 15+ Years Experience (Licensed & Certified Teacher)

It’s true!

Now, there are 3 ways to test the difference between means, as listed below:

  • If the population standard deviation is known (z-test)
  • Independent samples with an unknown standard deviation (two-sample t-test)
      • pooled variances
      • unpooled variances
  • Matched pairs

Let’s find out more!

So how do we compare the means of a quantitative variable for two different populations?

If our parameters of interest are the population means, then the best approach is to take random samples from both populations and compare their sample means, as noted in the Engineering Statistics Handbook.

In other words, we analyze the difference between two sample means to understand the average difference between the two populations. And as always, the larger the sample size, the more precise our inferences will be.

Just like we saw with one-sample means, we will employ either a z-test or a t-test depending on whether the population standard deviation is known or unknown.

However, when we have independent random samples and the population standard deviations are unknown, there is one more question we must consider: do we pool our variances?

When we found the difference of population proportions, we automatically pooled our variances. However, with the difference of population means, we will have to check. We do this by finding an F-statistic.

If this F-statistic is less than or equal to the critical value, then we will pool our variances. Otherwise, we will not pool.

Please note that it is infrequent to have two independent samples with equal, or almost equal, variances; therefore, the formula for unpooled variances is more readily accepted in most high school statistics courses.

But it is an important skill to learn and understand, so we will be working through several examples of when we need to pool variances and when we do not.
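A minimal sketch of the pooling decision described above. It assumes the F-statistic is formed as the larger sample variance divided by the smaller one, and that the caller looks up the F critical value from a table for the chosen significance level; the value 3.0 in the usage line is a placeholder, not a real table value. The function name and summary statistics are hypothetical.

```python
import math

def two_sample_t(x1, s1, n1, x2, s2, n2, f_crit):
    """Pool or not based on an F check, then return (F, pooled, SE, t) for H0: mu1 = mu2.

    f_crit is assumed to be an F critical value the caller has looked up
    for the chosen significance level and degrees of freedom.
    """
    v1, v2 = s1 ** 2, s2 ** 2
    # F-statistic: larger sample variance divided by the smaller one
    f_stat = max(v1, v2) / min(v1, v2)
    pooled = f_stat <= f_crit
    if pooled:
        # Pooled variance: each sample variance weighted by its degrees of freedom
        sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    else:
        # Unpooled standard error: the two variances are kept separate
        se = math.sqrt(v1 / n1 + v2 / n2)
    t = (x1 - x2) / se
    return f_stat, pooled, se, t

# Hypothetical summary statistics; f_crit=3.0 is a placeholder, not a table value
print(two_sample_t(20, 4, 10, 18, 5, 12, f_crit=3.0))
```

With a pooled variance the t statistic has n1 + n2 - 2 degrees of freedom; the unpooled case uses the separate-variance formula instead.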

Worked Example

For example, imagine the college provost at one school said their students study more, on average, than those at the neighboring school.

However, the provost at the nearby school believes the study time is the same and wants to clear up the controversy.

So, independent random samples were taken from both schools, with the results stated below. And at a 5% significance level, the following significance test is conducted.


Two Sample T Test Pooled Example

Notice that we pooled our variances because our F-statistic yielded a value less than our critical value. The interpretation of our results is as follows:

  • Since the p-value is greater than our significance level, we fail to reject the null hypothesis.
  • We conclude that there is not sufficient evidence of a difference in the average study time of students at the two schools.

Matched Pairs Test

But what do we do if the two samples we wish to compare are not independent, for example because the same subjects are measured twice?

That is, each observation in one sample is paired with an observation in the other, so the difference between means reflects the varying conditions rather than differences among the experimental units in the study.

When this happens, we have what is called a Matched Pairs T Test.

The great thing about a paired t test is that it becomes a one-sample t-test on the differences.

We then calculate the sample mean and sample standard deviation of these differences; dividing the sample standard deviation by the square root of the number of pairs gives the standard error.


Matched Pairs T Test Formula
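The matched pairs calculation reduces to a one-sample t statistic on the differences, \(t=\bar{d}/(s_d/\sqrt{n})\) with \(n-1\) degrees of freedom. This sketch assumes a hypothesized mean difference of 0; the function name and the before/after values are made up for illustration.

```python
import math
from statistics import mean, stdev

def matched_pairs_t(before, after):
    """Matched pairs t test as a one-sample t test on the differences (after - before).

    Assumes a hypothesized mean difference of 0.
    Returns (mean difference, sd of differences, standard error, t, degrees of freedom).
    """
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    d_bar = mean(diffs)           # sample mean of the differences
    s_d = stdev(diffs)            # sample standard deviation (n - 1 in the denominator)
    se = s_d / math.sqrt(n)       # standard error of the mean difference
    return d_bar, s_d, se, d_bar / se, n - 1

# Made-up measurements on the same five subjects under two conditions
before = [10, 12, 9, 11, 8]
after = [12, 16, 12, 16, 9]
print(matched_pairs_t(before, after))
```

Here the differences are 2, 4, 3, 5, 1, so the mean difference is 3 and t = 3 / (√2.5 / √5) ≈ 4.24 with 4 degrees of freedom.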

What is important to remember with any of these tests, whether it be a z-test, a two-sample t-test, or a matched pairs test, is that our decision procedure is the same as for a one-sample test.

For example, once we find out the test statistic, we then determine our p-value, and if our p-value is less than or equal to our significance level, we will reject our null hypothesis.


One Sample Flow Chart


Two Sample Flow Chart

As the flow chart above demonstrates, our first step is to decide what type of test we are conducting. Is the population standard deviation known? Do we have a one-sample test, a two-sample test, or a matched pairs test?

Then, once we have identified the test we are using, our procedure is as follows:

  • Calculate the test statistic
  • Determine our p-value
  • If our p-value is less than or equal to our significance level, we will reject our null hypothesis.
  • Otherwise we fail to reject the null hypothesis

Together, we will work through various examples of all different hypothesis tests for the difference in population means, so we become comfortable with each formula and know why and how to use them effectively.

Two Sample T Test – Lesson & Examples (Video)

1 hr 22 min

  • Introduction to Video: Two Sample Hypothesis Test for Population Means
  • 00:00:37 – How to write a two sample hypothesis test when population standard deviation is known? (Example#1)
  • 00:16:35 – Construct a two sample hypothesis test when population standard deviation is known (Example #2)
  • 00:26:01 – What is a Two-Sample t-test? Pooled variances or non-pooled variances?
  • 00:28:31 – Use a two sample t-test with un-pooled variances (Example #3)
  • 00:37:48 – Create a two sample t-test and confidence interval with pooled variances (Example #4)
  • 00:51:23 – Construct a two-sample t-test (Example #5)
  • 00:59:47 – Matched Pair one sample t-test (Example #6)
  • 01:09:38 – Use a matched pairs hypothesis test and provide a confidence interval for the difference of means (Example #7)

7.2.2 - Hypothesis Testing, Derivation of the Test

We are now going to develop the hypothesis test for the difference of two proportions for independent samples. The hypothesis test will follow the same six steps we learned in the previous Lesson although they are not explicitly stated.

We will use the sampling distribution of \(\hat{p}_1-\hat{p}_2\) as we did for the confidence interval. One major difference in the hypothesis test is the null hypothesis and assuming the null hypothesis is true.

For a test for two proportions, we are interested in the difference. If the difference is zero, then they are not different (i.e., they are equal). Therefore, the null hypothesis will always be:

\(H_0\colon p_1-p_2=0\)

Another way to look at it is \(H_0\colon p_1=p_2\). This is worth stopping to think about. Remember, in hypothesis testing, we assume the null hypothesis is true. In this case, it means that \(p_1\) and \(p_2\) are equal. Under this assumption, then \(\hat{p}_1\) and \(\hat{p}_2\) are both estimating the same proportion. Think of this proportion as \(p^*\). Therefore, the sampling distribution of both proportions, \(\hat{p}_1\) and \(\hat{p}_2\), will, under certain conditions, be approximately normal centered around \(p^*\), with standard error \(\sqrt{\dfrac{p^*(1-p^*)}{n_i}}\), for \(i=1, 2\).

We take this into account by finding an estimate for this \(p^*\) using the two sample proportions. We can calculate an estimate of \(p^*\) using the following formula:

\(\hat{p}^*=\dfrac{x_1+x_2}{n_1+n_2}\)

This value is the total number in the desired categories \((x_1+x_2)\) from both samples over the total number of sampling units in the combined sample \((n_1+n_2)\).

Putting everything together, if we assume \(p_1=p_2\), then under certain conditions the sampling distribution of \(\hat{p}_1-\hat{p}_2\) will be approximately normal with mean 0 and standard error \(\sqrt{p^*(1-p^*)\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}\). Replacing \(p^*\) with its estimate \(\hat{p}^*\), the test statistic

\(z^*=\dfrac{(\hat{p}_1-\hat{p}_2)-0}{\sqrt{\hat{p}^*(1-\hat{p}^*)\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}}\)

...will approximately follow a standard normal distribution.

Finally, we can develop our hypothesis test for \(p_1-p_2\).

Null: \(H_0\colon p_1-p_2=0\)

Possible Alternatives:

\(H_a\colon p_1-p_2\ne0\)

\(H_a\colon p_1-p_2>0\)

\(H_a\colon p_1-p_2<0\)

Conditions:

\(n_1\hat{p}_1\), \(n_1(1-\hat{p}_1)\), \(n_2\hat{p}_2\), and \(n_2(1-\hat{p}_2)\) are all greater than five

The test statistic is:

\(z^*=\dfrac{\hat{p}_1-\hat{p}_2-0}{\sqrt{\hat{p}^*(1-\hat{p}^*)\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}}\)

...where \(\hat{p}^*=\dfrac{x_1+x_2}{n_1+n_2}\).

The critical values, rejection regions, p-values, and decisions will all follow the same steps as those from a hypothesis test for a one sample proportion.
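A minimal Python sketch of this test, using only the standard library. The function names `conditions_met` and `two_proportion_z` are hypothetical, and the `alternative` labels are illustrative names for the three possible alternatives listed above. Applied to the data from Example 7-2 below (52 of 69 males and 120 of 131 females said "yes"), it should give z ≈ -3.15.

```python
import math
from statistics import NormalDist

def conditions_met(x1, n1, x2, n2):
    """Check that n1*p1, n1*(1-p1), n2*p2, and n2*(1-p2) are all greater than five."""
    p1, p2 = x1 / n1, x2 / n2
    return min(n1 * p1, n1 * (1 - p1), n2 * p2, n2 * (1 - p2)) > 5

def two_proportion_z(x1, n1, x2, n2, alternative="two-sided"):
    """Pooled two-proportion z test of H0: p1 - p2 = 0. Returns (z, p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    p_star = (x1 + x2) / (n1 + n2)   # pooled estimate of the common proportion under H0
    se = math.sqrt(p_star * (1 - p_star) * (1 / n1 + 1 / n2))
    z = (p1 - p2 - 0) / se
    cdf = NormalDist().cdf
    if alternative == "two-sided":    # Ha: p1 - p2 != 0
        p_value = 2 * (1 - cdf(abs(z)))
    elif alternative == "greater":    # Ha: p1 - p2 > 0
        p_value = 1 - cdf(z)
    elif alternative == "less":       # Ha: p1 - p2 < 0
        p_value = cdf(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
    return z, p_value

# Example 7-2: males are sample 1
print(conditions_met(52, 69, 120, 131))
print(two_proportion_z(52, 69, 120, 131))
```

The unrounded p-value comes out near 0.0017; the hand calculation in the example rounds the tail area to 0.0008 before doubling, giving 0.0016.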

Example 7-2: Received $100 by Mistake

Let's continue with the question that was asked previously.

Males and females were asked about what they would do if they received a $100 bill by mail, addressed to their neighbor, but wrongly delivered to them. Would they return it to their neighbor? Of the 69 males sampled, 52 said “yes” and of the 131 females sampled, 120 said “yes.”

Does the data indicate that the proportions that said “yes” are different for males and females at a 5% level of significance? Conduct the test using the p-value approach.


Again, let’s define males as sample 1.

The conditions are all satisfied as we have shown previously.

The null and alternative hypotheses are:

\(H_0\colon p_1-p_2=0\) vs \(H_a\colon p_1-p_2\ne 0\)

The test statistic:

\(n_1=69\), \(\hat{p}_1=\frac{52}{69}\)

\(n_2=131\), \(\hat{p}_2=\frac{120}{131}\)

\(\hat{p}^*=\dfrac{x_1+x_2}{n_1+n_2}=\dfrac{52+120}{69+131}=\dfrac{172}{200}=0.86\)

\(z^*=\dfrac{\hat{p}_1-\hat{p}_2-0}{\sqrt{\hat{p}^*(1-\hat{p}^*)\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}=\dfrac{\dfrac{52}{69}-\dfrac{120}{131}}{\sqrt{0.86(1-0.86)\left(\frac{1}{69}+\frac{1}{131}\right)}}=-3.1466\)

The p-value of the test based on the two-sided alternative is:

\(\text{p-value}=2P(Z>|-3.1466|)=2P(Z>3.1466)=2(0.0008)=0.0016\)

Since our p-value of 0.0016 is less than our significance level of 5%, we reject the null hypothesis. There is enough evidence to suggest that proportions of males and females who would return the money are different.

  Minitab: Inference for Two Proportions with Independent Samples

To conduct inference for two proportions with independent samples in Minitab...

The following window will appear. In the drop-down, choose ‘Summarized data’ and enter the number of events and trials for both samples.

Minitab window for two-sample proportion test.

You should get the following output for this example:

Test and CI for Two Proportions

Difference = p (1) - p (2)

Estimate for difference: -0.162407

95% CI for difference: (-0.274625, -0.0501900)

Test for difference = 0 (vs  ≠ 0): Z = -3.15 P-Value = 0.002 (Use this!)

Fisher's exact test: P-Value = 0.003 (Ignore the Fisher's exact test.)

Ignore the Fisher's p-value! The p-value marked "Use this!" above is calculated using the methods we learned in this lesson. Fisher's exact test does not rely on the normal approximation; it computes its p-value directly from the hypergeometric distribution of the table counts. It's just a different technique that is more complicated to do by hand. Minitab automatically includes both results in its output.

Try it!

In 1980, of 750 men 20-34 years old, 130 were found to be overweight. In 1990, of 700 men 20-34 years old, 160 were found to be overweight.

At the 5% significance level, do the data provide sufficient evidence to conclude that, for men 20-34 years old, a higher percentage were overweight in 1990 than 10 years earlier? Conduct the test using the p-value approach.

Let’s define 1990 as sample 1.

\(H_0\colon p_1-p_2=0\) vs \(H_a\colon p_1-p_2>0\)

\(n_1=700\), \(\hat{p}_1=\frac{160}{700}\)

\(n_2=750\), \(\hat{p}_2=\frac{130}{750}\)

\(\hat{p}^*=\dfrac{x_1+x_2}{n_1+n_2}=\dfrac{160+130}{700+750}=\dfrac{290}{1450}=0.2\)

The conditions are all satisfied: \(n_1\hat{p}_1\), \(n_1(1-\hat{p}_1)\), \(n_2\hat{p}_2\), and \(n_2(1-\hat{p}_2)\) are all greater than 5.

\(z^*=\dfrac{\hat{p}_1-\hat{p}_2-0}{\sqrt{\hat{p}^*(1-\hat{p}^*)\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}=\dfrac{\dfrac{160}{700}-\dfrac{130}{750}}{\sqrt{0.2(1-0.2)\left(\frac{1}{700}+\frac{1}{750}\right)}}=2.6277\)

The p-value of the test based on the right-tailed alternative is:

\(\text{p-value}=P(Z>2.6277)=0.0043\)

Since our p-value of 0.0043 is less than our significance level of 5%, we reject the null hypothesis. There is enough evidence to suggest that the proportion of males overweight in 1990 is greater than the proportion in 1980.
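A quick numerical check of the right-tailed calculation above, with 1990 as sample 1 as in the solution. This is a sketch using only Python's standard library (`statistics.NormalDist`).

```python
import math
from statistics import NormalDist

# Sample 1 = 1990, sample 2 = 1980, as defined in the solution
x1, n1 = 160, 700
x2, n2 = 130, 750
p1, p2 = x1 / n1, x2 / n2
p_star = (x1 + x2) / (n1 + n2)   # 290 / 1450 = 0.2
se = math.sqrt(p_star * (1 - p_star) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
p_value = 1 - NormalDist().cdf(z)   # right-tailed: Ha: p1 - p2 > 0
print(f"z = {z:.4f}, p-value = {p_value:.4f}")
```

Swapping the samples flips the sign of z and turns this into a left-tailed test with the same p-value, which is how the Minitab output below presents it.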

  Using Minitab

To conduct inference for two proportions with independent samples in Minitab...

  • Choose Stat > Basic Statistics > 2 proportions
  • Choose Options

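Select "Difference < hypothesized difference" for 'Alternative hypothesis'. (In the output below, the 1980 data were entered as the first sample, so the estimated difference is 1980 minus 1990 and the alternative is "less than"; this is equivalent to the right-tailed test above.)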

You should get the following output.

Estimate for difference: -0.0552381

95% upper bound for difference: -0.0206200

Test for difference = 0 (vs < 0): Z = -2.63 P-Value = 0.004

Fisher's exact test: P-Value = 0.005 (Ignore the Fisher's exact test)

Teach yourself statistics

Hypothesis Test: Difference Between Means

This lesson explains how to conduct a hypothesis test for the difference between two means. The test procedure, called the two-sample t-test, is appropriate when the following conditions are met:

  • The sampling method for each sample is simple random sampling .
  • The samples are independent .
  • Each population is at least 20 times larger than its respective sample.
  • The sampling distribution is approximately normal, which is generally the case if any of the following conditions apply:
      • The population distribution is normal.
      • The population data are symmetric, unimodal, without outliers, and the sample size is 15 or less.
      • The population data are slightly skewed, unimodal, without outliers, and the sample size is 16 to 40.
      • The sample size is greater than 40, without outliers.

This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results.

State the Hypotheses

Every hypothesis test requires the analyst to state a null hypothesis and an alternative hypothesis. The hypotheses are stated in such a way that they are mutually exclusive. That is, if one is true, the other must be false, and vice versa.

The table below shows three sets of null and alternative hypotheses. Each makes a statement about the difference d between the mean of one population μ1 and the mean of another population μ2. (In the table, the symbol ≠ means "not equal to".)

The first set of hypotheses (Set 1) is an example of a two-tailed test , since an extreme value on either side of the sampling distribution would cause a researcher to reject the null hypothesis. The other two sets of hypotheses (Sets 2 and 3) are one-tailed tests , since an extreme value on only one side of the sampling distribution would cause a researcher to reject the null hypothesis.

When the null hypothesis states that there is no difference between the two population means (i.e., d = 0), the null and alternative hypotheses are often stated in the following form.

H0: μ1 = μ2

Ha: μ1 ≠ μ2

Formulate an Analysis Plan

The analysis plan describes how to use sample data to decide whether to reject the null hypothesis. It should specify the following elements.

  • Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or 0.10; but any value between 0 and 1 can be used.
  • Test method. Use the two-sample t-test to determine whether the difference between means found in the sample is significantly different from the hypothesized difference between means.

Analyze Sample Data

Using sample data, find the standard error, degrees of freedom, test statistic, and the P-value associated with the test statistic.

\(SE=\sqrt{\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}}\)

\(DF=\dfrac{\left(\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1-1}+\dfrac{(s_2^2/n_2)^2}{n_2-1}}\)

\(t=\dfrac{(\bar{x}_1-\bar{x}_2)-d}{SE}\)

  • P-value. The P-value is the probability of observing a sample statistic at least as extreme as the one observed, assuming the null hypothesis is true. Since the test statistic is a t statistic, use the t Distribution Calculator to assess the probability associated with the t statistic, having the degrees of freedom computed above. (See the sample problems at the end of this lesson for examples of how this is done.)
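The three formulas and the P-value step can be sketched in Python without third-party packages. In place of a t Distribution Calculator, this sketch assumes a simple numerical integration (Simpson's rule) of the Student t density, which also handles the non-integer degrees of freedom the DF formula produces; the helper names `t_cdf` and `welch_t_test` are hypothetical. With the numbers from Problem 1 below it gives SE ≈ 3.51, DF ≈ 40.47, t ≈ -1.99, and a two-tailed P-value of about 0.053 (the hand solution gets 0.054 after rounding t and DF).

```python
import math

def t_cdf(t, df, steps=2000):
    """CDF of Student's t (df may be non-integer), via Simpson's rule on [0, |t|].

    steps must be even for Simpson's rule.
    """
    # log of the density's normalising constant Gamma((df+1)/2) / (sqrt(df*pi) * Gamma(df/2))
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    a = abs(t)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3                          # area under the density between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area   # density is symmetric about 0

def welch_t_test(x1, s1, n1, x2, s2, n2, d=0.0, alternative="two-sided"):
    """Unpooled two-sample t test of H0: mu1 - mu2 = d. Returns (SE, DF, t, P-value)."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    t = ((x1 - x2) - d) / se
    if alternative == "two-sided":       # Ha: mu1 - mu2 != d
        p = 2 * (1 - t_cdf(abs(t), df))
    elif alternative == "greater":       # Ha: mu1 - mu2 > d
        p = 1 - t_cdf(t, df)
    elif alternative == "less":          # Ha: mu1 - mu2 < d
        p = t_cdf(t, df)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
    return se, df, t, p

# Problem 1 (two-tailed): Mrs. Smith's class vs. Mrs. Jones' class
print(welch_t_test(78, 10, 30, 85, 15, 25))
# Problem 2 (one-tailed): new vs. old batteries, hypothesized difference d = 7
print(welch_t_test(200, 40, 100, 190, 20, 100, d=7, alternative="greater"))
```

Numerical integration keeps the sketch dependency-free; a statistics library's t distribution would be the more usual choice in practice.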

Interpret Results

If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null hypothesis. Typically, this involves comparing the P-value to the significance level , and rejecting the null hypothesis when the P-value is less than the significance level.

Test Your Understanding

In this section, two sample problems illustrate how to conduct a hypothesis test of a difference between mean scores. The first problem involves a two-tailed test; the second problem, a one-tailed test.

Problem 1: Two-Tailed Test

Within a school district, students were randomly assigned to one of two Math teachers - Mrs. Smith and Mrs. Jones. After the assignment, Mrs. Smith had 30 students, and Mrs. Jones had 25 students.

At the end of the year, each class took the same standardized test. Mrs. Smith's students had an average test score of 78, with a standard deviation of 10; and Mrs. Jones' students had an average test score of 85, with a standard deviation of 15.

Test the hypothesis that Mrs. Smith and Mrs. Jones are equally effective teachers. Use a 0.10 level of significance. (Assume that student performance is approximately normal.)

Solution: The solution to this problem takes four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results. We work through those steps below:

State the hypotheses. The first step is to state the null hypothesis and an alternative hypothesis.

Null hypothesis: μ 1 - μ 2 = 0

Alternative hypothesis: μ 1 - μ 2 ≠ 0

  • Formulate an analysis plan . For this analysis, the significance level is 0.10. Using sample data, we will conduct a two-sample t-test of the null hypothesis.

\(SE=\sqrt{\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}}=\sqrt{\dfrac{10^2}{30}+\dfrac{15^2}{25}}=\sqrt{3.33+9}=\sqrt{12.33}=3.51\)

\(DF=\dfrac{\left(\dfrac{10^2}{30}+\dfrac{15^2}{25}\right)^2}{\dfrac{(10^2/30)^2}{29}+\dfrac{(15^2/25)^2}{24}}=\dfrac{(3.33+9)^2}{\dfrac{(3.33)^2}{29}+\dfrac{9^2}{24}}=\dfrac{152.03}{0.382+3.375}=\dfrac{152.03}{3.757}=40.47\)

\(t=\dfrac{(\bar{x}_1-\bar{x}_2)-d}{SE}=\dfrac{(78-85)-0}{3.51}=\dfrac{-7}{3.51}=-1.99\)

where s1 is the standard deviation of sample 1, s2 is the standard deviation of sample 2, n1 is the size of sample 1, n2 is the size of sample 2, x̄1 is the mean of sample 1, x̄2 is the mean of sample 2, d is the hypothesized difference between the population means, and SE is the standard error.

Since we have a two-tailed test, the P-value is the probability that a t statistic having 40 degrees of freedom is more extreme than -1.99; that is, less than -1.99 or greater than 1.99.

We use the t Distribution Calculator to find that P(t < -1.99) is about 0.027.

  • Similarly, the t Distribution Calculator shows that P(t ≤ 1.99) is about 0.973. Therefore, P(t > 1.99) is 1 minus 0.973, or 0.027. Thus, the P-value = 0.027 + 0.027 = 0.054.
  • Interpret results. Since the P-value (0.054) is less than the significance level (0.10), we reject the null hypothesis.

Note: If you use this approach on an exam, you may also want to mention why this approach is appropriate. Specifically, the approach is appropriate because the sampling method was simple random sampling, the samples were independent, the sample size was much smaller than the population size, and the samples were drawn from a normal population.

Problem 2: One-Tailed Test

The Acme Company has developed a new battery. The engineer in charge claims that the new battery will operate continuously for at least 7 minutes longer than the old battery.

To test the claim, the company selects a simple random sample of 100 new batteries and 100 old batteries. The old batteries run continuously for a mean of 190 minutes with a standard deviation of 20 minutes; the new batteries, for a mean of 200 minutes with a standard deviation of 40 minutes.

Test the engineer's claim that the new batteries run at least 7 minutes longer than the old. Use a 0.05 level of significance. (Assume that there are no outliers in either sample.)

Null hypothesis: μ1 - μ2 ≤ 7

Alternative hypothesis: μ1 - μ2 > 7

where μ1 is the mean battery life for the new battery, and μ2 is the mean battery life for the old battery.

  • Formulate an analysis plan . For this analysis, the significance level is 0.05. Using sample data, we will conduct a two-sample t-test of the null hypothesis.

\(SE=\sqrt{\dfrac{40^2}{100}+\dfrac{20^2}{100}}=\sqrt{16+4}=4.472\)

\(DF=\dfrac{\left(\dfrac{40^2}{100}+\dfrac{20^2}{100}\right)^2}{\dfrac{(40^2/100)^2}{99}+\dfrac{(20^2/100)^2}{99}}=\dfrac{(20)^2}{\dfrac{16^2}{99}+\dfrac{4^2}{99}}=\dfrac{400}{2.586+0.162}=145.56\)

\(t=\dfrac{(\bar{x}_1-\bar{x}_2)-d}{SE}=\dfrac{(200-190)-7}{4.472}=\dfrac{3}{4.472}=0.67\)

where s1 is the standard deviation of sample 1, s2 is the standard deviation of sample 2, n1 is the size of sample 1, n2 is the size of sample 2, x̄1 is the mean of sample 1, x̄2 is the mean of sample 2, d is the hypothesized difference between population means, and SE is the standard error.

Here is the logic of the analysis: Given the alternative hypothesis (μ 1 - μ 2 > 7), we want to know whether the observed difference in sample means is big enough (i.e., sufficiently greater than 7) to cause us to reject the null hypothesis.

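Interpret results. Using the t Distribution Calculator with 145.56 degrees of freedom, P(t > 0.67) is about 0.25. In other words, suppose we replicated this study many times with different samples. If the true difference in population means were actually 7, we would expect the observed difference in sample means to be 10 or less in 75% of our samples, and more than 10 in 25% of our samples. Therefore, the P-value in this analysis is 0.25. Since the P-value (0.25) is greater than the significance level (0.05), we cannot reject the null hypothesis; the data do not support the engineer's claim that the new battery runs at least 7 minutes longer than the old one.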
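The "replicate this study many times" reasoning can be imitated with a small simulation. This sketch assumes both battery populations are normal, with standard deviations equal to the sample values (40 and 20 minutes) and true means exactly 7 minutes apart (the boundary of the null hypothesis); these are illustrative assumptions, not part of the problem. The share of replications whose observed difference is at least 10 minutes should land near the P-value of 0.25.

```python
import random

random.seed(1)  # fixed seed so the simulation is reproducible

# Assumptions for illustration: normal populations, SDs equal to the sample SDs,
# and true means exactly 7 minutes apart (the boundary of the null hypothesis).
n_new, sd_new = 100, 40
n_old, sd_old = 100, 20
mu_old = 190.0                 # any baseline works; only the difference matters
mu_new = mu_old + 7
observed_diff = 200 - 190      # difference in sample means from the study

reps = 5000
at_least_as_big = 0
for _ in range(reps):
    new = [random.gauss(mu_new, sd_new) for _ in range(n_new)]
    old = [random.gauss(mu_old, sd_old) for _ in range(n_old)]
    diff = sum(new) / n_new - sum(old) / n_old
    if diff >= observed_diff:
        at_least_as_big += 1

p_value_sim = at_least_as_big / reps
print(f"Simulated P-value: {p_value_sim:.3f}")
```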

Statology

Statistics Made Easy

Two Sample Z-Test: Definition, Formula, and Example

A two sample z-test is used to test whether two population means are equal.

This test assumes that the standard deviation of each population is known.

This tutorial explains the following:

  • The formula to perform a two sample z-test.
  • The assumptions of a two sample z-test.
  • An example of how to perform a two sample z-test.

Let’s jump in!

Two Sample Z-Test: Formula

A two sample z-test uses the following null and alternative hypotheses:

  • H0: μ1 = μ2 (the two population means are equal)
  • HA: μ1 ≠ μ2 (the two population means are not equal)

We use the following formula to calculate the z test statistic:

\(z=\dfrac{\bar{x}_1-\bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1}+\dfrac{\sigma_2^2}{n_2}}}\)

where:

  • x̄1, x̄2: sample means
  • σ1, σ2: population standard deviations
  • n1, n2: sample sizes

If the p-value that corresponds to the z test statistic is less than your chosen significance level (common choices are 0.10, 0.05, and 0.01), then you can reject the null hypothesis.

Two Sample Z-Test: Assumptions

For the results of a two sample z-test to be valid, the following assumptions should be met:

  • The data from each population are continuous (not discrete).
  • Each sample is a simple random sample from the population of interest.
  • The data in each population are approximately normally distributed.
  • The population standard deviations are known.

Two Sample Z-Test: Example

Suppose the IQ levels among individuals in two different cities are known to be normally distributed, each with a population standard deviation of 15.

A scientist wants to know if the mean IQ level differs between individuals in city A and city B, so she selects a simple random sample of 20 individuals from each city and records their IQ levels.

To test this, she will perform a two sample z-test at significance level α = 0.05 using the following steps:

Step 1: Gather the sample data.

Suppose she collects two simple random samples with the following information:

  • x̄1 (sample 1 mean IQ) = 100.65
  • n1 (sample 1 size) = 20
  • x̄2 (sample 2 mean IQ) = 108.8
  • n2 (sample 2 size) = 20

Step 2: Define the hypotheses.

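She will perform the two sample z-test with the following hypotheses:

  • H0: μ1 = μ2 (the two population mean IQ levels are equal)
  • HA: μ1 ≠ μ2 (the two population mean IQ levels are not equal)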

Step 3: Calculate the z test statistic.

The z test statistic is calculated as:

  • \(z=\dfrac{100.65-108.8}{\sqrt{\dfrac{15^2}{20}+\dfrac{15^2}{20}}}=-1.718\)

Step 4: Calculate the p-value of the z test statistic.

According to the Z Score to P Value Calculator , the two-tailed p-value associated with z = -1.718 is 0.0858 .

Step 5: Draw a conclusion.

Since the p-value (0.0858) is not less than the significance level (.05), the scientist will fail to reject the null hypothesis.

There is not sufficient evidence to say that the mean IQ level is different between the two populations.
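The same calculation can be sketched in Python with the standard library's `statistics.NormalDist`; the function name `two_sample_z` is hypothetical. It should reproduce z ≈ -1.718 and the two-tailed p-value of about 0.0858.

```python
import math
from statistics import NormalDist

def two_sample_z(x1, x2, sigma1, sigma2, n1, n2):
    """Two-tailed two sample z-test of H0: mu1 = mu2 with known population SDs."""
    se = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    z = (x1 - x2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p = two_sample_z(100.65, 108.8, 15, 15, 20, 20)
print(f"z = {z:.3f}, two-tailed p-value = {p:.4f}")
print("reject H0" if p < 0.05 else "fail to reject H0")
```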

Note:  You can also perform this entire two sample z-test by using the Two Sample Z-Test Calculator .

Additional Resources

The following tutorials explain how to perform a two sample z-test using different statistical software:

  • How to Perform Z-Tests in Excel
  • How to Perform Z-Tests in R
  • How to Perform Z-Tests in Python


Hey there. My name is Zach Bobbitt. I have a Master of Science degree in Applied Statistics and I’ve worked on machine learning algorithms for professional businesses in both healthcare and retail. I’m passionate about statistics, machine learning, and data visualization and I created Statology to be a resource for both students and teachers alike.  My goal with this site is to help you learn statistics through using simple terms, plenty of real-world examples, and helpful illustrations.



Statistics LibreTexts

10.E: Hypothesis Testing with Two Samples (Exercises)


These are homework exercises to accompany the Textmap created for "Introductory Statistics" by OpenStax.

10.1: Introduction

10.2: Two Population Means with Unknown Standard Deviations

Use the following information to answer the next 15 exercises. Indicate whether the hypothesis test is for:

  • independent group means, population standard deviations, and/or variances known
  • independent group means, population standard deviations, and/or variances unknown
  • matched or paired samples
  • single mean
  • two proportions
  • single proportion

Exercise 10.2.3

It is believed that 70% of males pass their driver's test on the first attempt, while 65% of females pass the test on the first attempt. Of interest is whether the proportions are in fact equal.

Exercise 10.2.4

A new laundry detergent is tested on consumers. Of interest is the proportion of consumers who prefer the new brand over the leading competitor. A study is done to test this.

Exercise 10.2.5

A new windshield treatment claims to repel water more effectively. Ten windshields are tested by simulating rain without the new treatment. The same windshields are then treated, and the experiment is run again. A hypothesis test is conducted.

Exercise 10.2.6

The known standard deviation in salary for all mid-level professionals in the financial industry is $11,000. Company A and Company B are in the financial industry. Suppose samples are taken of mid-level professionals from Company A and from Company B. The sample mean salary for mid-level professionals in Company A is $80,000. The sample mean salary for mid-level professionals in Company B is $96,000. Company A and Company B management want to know if their mid-level professionals are paid differently, on average.

Exercise 10.2.7

The average worker in Germany gets eight weeks of paid vacation.

Exercise 10.2.8

According to a television commercial, 80% of dentists agree that Ultrafresh toothpaste is the best on the market.

Exercise 10.2.9

It is believed that the average grade on an English essay in a particular school system for females is higher than for males. A random sample of 31 females had a mean score of 82 with a standard deviation of three, and a random sample of 25 males had a mean score of 76 with a standard deviation of four.

  • independent group means, population standard deviations and/or variances unknown

Exercise 10.2.10

The league mean batting average is 0.280 with a known standard deviation of 0.06. The Rattlers and the Vikings belong to the league. The mean batting average for a sample of eight Rattlers is 0.210, and the mean batting average for a sample of eight Vikings is 0.260. There are 24 players on the Rattlers and 19 players on the Vikings. Are the batting averages of the Rattlers and Vikings statistically different?

Exercise 10.2.11

In a random sample of 100 forests in the United States, 56 were coniferous or contained conifers. In a random sample of 80 forests in Mexico, 40 were coniferous or contained conifers. Is the proportion of conifers in the United States statistically more than the proportion of conifers in Mexico?

Exercise 10.2.12

A new medicine is said to help improve sleep. Eight subjects are picked at random and given the medicine. The mean number of hours slept by each person was recorded before and after starting the medication.

Exercise 10.2.13

It is thought that teenagers sleep more than adults on average. A study is done to verify this. A sample of 16 teenagers has a mean of 8.9 hours slept and a standard deviation of 1.2. A sample of 12 adults has a mean of 6.9 hours slept and a standard deviation of 0.6.

Exercise 10.2.14

Varsity athletes practice five times a week, on average.

Exercise 10.2.15

A sample of 12 in-state graduate school programs at school A has a mean tuition of $64,000 with a standard deviation of $8,000. At school B, a sample of 16 in-state graduate programs has a mean of $80,000 with a standard deviation of $6,000. On average, are the mean tuitions different?

Exercise 10.2.16

A new WiFi range booster is being offered to consumers. A researcher tests the native range of 12 different routers under the same conditions. The ranges are recorded. Then the researcher uses the new WiFi range booster and records the new ranges. Does the new WiFi range booster do a better job?

Exercise 10.2.17

A high school principal claims that 30% of student athletes drive themselves to school, while 4% of non-athletes drive themselves to school. In a sample of 20 student athletes, 45% drive themselves to school. In a sample of 35 non-athlete students, 6% drive themselves to school. Is the percent of student athletes who drive themselves to school more than the percent of nonathletes?

Use the following information to answer the next three exercises: A study is done to determine which of two soft drinks has more sugar. There are 13 cans of Beverage A in a sample and six cans of Beverage B. The mean amount of sugar in Beverage A is 36 grams with a standard deviation of 0.6 grams. The mean amount of sugar in Beverage B is 38 grams with a standard deviation of 0.8 grams. The researchers believe that Beverage B has more sugar than Beverage A, on average. Both populations have normal distributions.

Exercise 10.2.18

Are standard deviations known or unknown?

Exercise 10.2.19

What is the random variable?

The random variable is the difference between the mean amounts of sugar in the two soft drinks.

Exercise 10.2.20

Is this a one-tailed or two-tailed test?

Use the following information to answer the next 12 exercises: The U.S. Center for Disease Control reports that the mean life expectancy was 47.6 years for whites born in 1900 and 33.0 years for nonwhites. Suppose that you randomly survey death records for people born in 1900 in a certain county. Of the 124 whites, the mean life span was 45.3 years with a standard deviation of 12.7 years. Of the 82 nonwhites, the mean life span was 34.1 years with a standard deviation of 15.6 years. Conduct a hypothesis test to see if the mean life spans in the county were the same for whites and nonwhites.

Exercise 10.2.21

Is this a test of means or proportions?

Exercise 10.2.22

State the null and alternative hypotheses.

  • \(H_{0}\): __________
  • \(H_{a}\): __________

Exercise 10.2.23

Is this a right-tailed, left-tailed, or two-tailed test?

Exercise 10.2.24

In symbols, what is the random variable of interest for this test?

Exercise 10.2.25

In words, define the random variable of interest for this test.

the difference between the mean life spans of whites and nonwhites

Exercise 10.2.26

Which distribution (normal or Student's t) would you use for this hypothesis test?

Exercise 10.2.27

Explain why you chose the distribution you did for Exercise 10.2.26.

This is a comparison of two population means with unknown population standard deviations.

Exercise 10.2.28

Calculate the test statistic and \(p\text{-value}\).
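The test statistic can be computed directly from the summary statistics. Below is a minimal sketch that assumes Welch's unequal-variance approach, the usual choice in this chapter when the population standard deviations are unknown. It computes the statistic and the approximate degrees of freedom for the life-span data. The function name `welch_t` is hypothetical. The p-value itself would then come from a Student's t table or statistical software.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) for H0: mu1 = mu2 using Welch's unequal-variance formulas."""
    v1 = sd1 ** 2 / n1          # estimated variance of the first sample mean
    v2 = sd2 ** 2 / n2          # estimated variance of the second sample mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Subscripts: 1 = whites, 2 = nonwhites (county death records, born in 1900)
t, df = welch_t(45.3, 12.7, 124, 34.1, 15.6, 82)
print(f"t = {t:.2f}, df = {df:.1f}")
```

With these inputs the statistic is roughly 5.42 on about 149 degrees of freedom. That is far out in the tail, so the two-tailed p-value is well below 0.05.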

Exercise 10.2.29

Sketch a graph of the situation. Label the horizontal axis. Mark the hypothesized difference and the sample difference. Shade the area corresponding to the \(p\text{-value}\).


  • Check student’s solution.

Exercise 10.2.30

Find the \(p\text{-value}\).

Exercise 10.2.31

At a pre-conceived \(\alpha = 0.05\), what is your:

  • Decision:
  • Reason for the decision:
  • Conclusion (write out in a complete sentence):

  • Decision: Reject the null hypothesis.
  • Reason for the decision: \(p\text{-value} < 0.05\)
  • Conclusion: There is enough evidence at the 5% level of significance to support the claim that life expectancy in the 1900s is different between whites and nonwhites.

Exercise 10.2.32

Does it appear that the means are the same? Why or why not?

DIRECTIONS: For each of the word problems, use a solution sheet to do the hypothesis test. The solution sheet is found in Appendix E. Please feel free to make copies of the solution sheets. For the online version of the book, it is suggested that you copy the .doc or the .pdf files.

If you are using a Student's t-distribution for a homework problem in what follows, including for paired data, you may assume that the underlying population is normally distributed. (When using these tests in a real situation, you must first check that assumption, however.)

The mean number of English courses taken in a two-year time period by male and female college students is believed to be about the same. An experiment is conducted and data are collected from 29 males and 16 females. The males took an average of three English courses with a standard deviation of 0.8. The females took an average of four English courses with a standard deviation of 1.0. Are the means statistically the same?

A student at a four-year college claims that mean enrollment at four-year colleges is higher than at two-year colleges in the United States. Two surveys are conducted. Of the 35 two-year colleges surveyed, the mean enrollment was 5,068 with a standard deviation of 4,777. Of the 35 four-year colleges surveyed, the mean enrollment was 5,466 with a standard deviation of 8,191.

Subscripts: 1: two-year colleges; 2: four-year colleges

  • \(H_{0}: \mu_{1} \geq \mu_{2}\)
  • \(H_{a}: \mu_{1} < \mu_{2}\)
  • \(\bar{X}_{1} - \bar{X}_{2}\) is the difference between the mean enrollments of the two-year colleges and the four-year colleges.
  • Student's t
  • test statistic: -0.2480
  • \(p\text{-value}: 0.4019\)
  • Alpha: 0.05
  • Decision: Do not reject the null hypothesis.
  • Reason for Decision: \(p\text{-value} > \alpha\)
  • Conclusion: At the 5% significance level, there is insufficient evidence to conclude that the mean enrollment at four-year colleges is higher than at two-year colleges.

At Rachel’s 11th birthday party, eight girls were timed to see how long (in seconds) they could hold their breath in a relaxed position. After a two-minute rest, they timed themselves while jumping. The girls thought that the mean difference between their jumping and relaxed times would be zero. Test their hypothesis.

Mean entry-level salaries for college graduates with mechanical engineering degrees and electrical engineering degrees are believed to be approximately the same. A recruiting office thinks that the mean mechanical engineering salary is actually lower than the mean electrical engineering salary. The recruiting office randomly surveys 50 entry level mechanical engineers and 60 entry level electrical engineers. Their mean salaries were $46,100 and $46,700, respectively. Their standard deviations were $3,450 and $4,210, respectively. Conduct a hypothesis test to determine if you agree that the mean entry-level mechanical engineering salary is lower than the mean entry-level electrical engineering salary.

Subscripts: 1: mechanical engineering; 2: electrical engineering

  • \(\bar{X}_{1} - \bar{X}_{2}\) is the difference between the mean entry level salaries of mechanical engineers and electrical engineers.
  • \(t_{108}\)
  • test statistic: \(t = -0.82\)
  • \(p\text{-value}: 0.2061\)
  • \(\alpha: 0.05\)
  • Decision: Do not reject the null hypothesis.
  • Conclusion: At the 5% significance level, there is insufficient evidence to conclude that the mean entry-level salary of mechanical engineers is lower than that of electrical engineers.
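The \(t_{108}\) label and the statistic \(t = -0.82\) can be checked against two standard formulas. One is the pooled-variance t statistic, whose degrees of freedom are \(n_1 + n_2 - 2\). The other is Welch's unequal-variance statistic. In the minimal sketch below, the function names are hypothetical. The sketch suggests that the reported statistic matches Welch's standard error. For these data, the Welch degrees of freedom also happen to land almost exactly on 108.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Equal-variance t statistic using the pooled standard deviation s_p."""
    df = n1 + n2 - 2
    sp = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    t = (mean1 - mean2) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t, df

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unequal-variance (Welch) t statistic and approximate degrees of freedom."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Subscripts: 1 = mechanical engineering, 2 = electrical engineering
args = (46100, 3450, 50, 46700, 4210, 60)
t_pooled, df_pooled = pooled_t(*args)
t_welch, df_welch = welch_t(*args)
print(f"pooled: t = {t_pooled:.3f}, df = {df_pooled}")
print(f"Welch:  t = {t_welch:.3f}, df = {df_welch:.2f}")
```

The two statistics differ only slightly here (about -0.81 versus -0.82). Either way, the p-value is far above 0.05.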

Marketing companies have collected data implying that teenage girls use more ring tones on their cellular phones than teenage boys do. In one particular study of 40 randomly chosen teenage girls and boys (20 of each) with cellular phones, the mean number of ring tones for the girls was 3.2 with a standard deviation of 1.5. The mean for the boys was 1.7 with a standard deviation of 0.8. Conduct a hypothesis test to determine if the means are approximately the same or if the girls’ mean is higher than the boys’ mean.

Use the information from [link] to answer the next four exercises.

Using the data from Lap 1 only, conduct a hypothesis test to determine if the mean time for completing a lap in races is the same as it is in practices.

  • \(H_{0}: \mu_{1} = \mu_{2}\)
  • \(H_{a}: \mu_{1} \neq \mu_{2}\)

  • \(\bar{X}_{1} - \bar{X}_{2}\) is the difference between the mean times for completing a lap in races and in practices.
  • \(t_{20.32}\)
  • test statistic: -4.70
  • \(p\text{-value}: 0.0001\)
  • Decision: Reject the null hypothesis.
  • Conclusion: At the 5% significance level, there is sufficient evidence to conclude that the mean time for completing a lap in races is different from that in practices.

Repeat the test in Exercise 10.83, but use Lap 5 data this time.

Repeat the test in Exercise 10.83, but this time combine the data from Laps 1 and 5.

  • \(\bar{X}_{1} - \bar{X}_{2}\) is the difference between the mean times for completing a lap in races and in practices.
  • \(t_{40.94}\)
  • test statistic: -5.08
  • \(p\text{-value} \approx 0\)
  • Reason for Decision: \(p\text{-value} < \alpha\)

In two to three complete sentences, explain in detail how you might use Terri Vogel’s data to answer the following question. “Does Terri Vogel drive faster in races than she does in practices?”

Use the following information to answer the next two exercises. The Eastern and Western Major League Soccer conferences have a new Reserve Division that allows new players to develop their skills. Data for a randomly picked date showed the following annual goals.

Conduct a hypothesis test to answer the next two exercises.

The exact distribution for the hypothesis test is:

  • the normal distribution
  • the Student's t-distribution
  • the uniform distribution
  • the exponential distribution

If the level of significance is 0.05, the conclusion is:

  • There is sufficient evidence to conclude that the W Division teams score fewer goals, on average, than the E teams.
  • There is insufficient evidence to conclude that the W Division teams score more goals, on average, than the E teams.
  • There is insufficient evidence to conclude that the W teams score fewer goals, on average, than the E teams score.
  • Unable to determine

Suppose a statistics instructor believes that there is no significant difference between the mean class scores of statistics day students on Exam 2 and statistics night students on Exam 2. She takes random samples from each of the populations. The mean and standard deviation for 35 statistics day students were 75.86 and 16.91. The mean and standard deviation for 37 statistics night students were 75.41 and 19.73. The “day” subscript refers to the statistics day students. The “night” subscript refers to the statistics night students. A concluding statement is:

  • There is sufficient evidence to conclude that statistics night students' mean on Exam 2 is better than the statistics day students' mean on Exam 2.
  • There is insufficient evidence to conclude that the statistics day students' mean on Exam 2 is better than the statistics night students' mean on Exam 2.
  • There is insufficient evidence to conclude that there is a significant difference between the means of the statistics day students and night students on Exam 2.
  • There is sufficient evidence to conclude that there is a significant difference between the means of the statistics day students and night students on Exam 2.

Researchers interviewed street prostitutes in Canada and the United States. The mean age of the 100 Canadian prostitutes upon entering prostitution was 18 with a standard deviation of six. The mean age of the 130 United States prostitutes upon entering prostitution was 20 with a standard deviation of eight. Is the mean age of entering prostitution in Canada lower than the mean age in the United States? Test at a 1% significance level.

Test: two independent sample means, population standard deviations unknown.

Random variable:

\[\bar{X}_{1} - \bar{X}_{2}\]

Distribution: \(t_{227.87}\)

Hypotheses: \(H_{0}: \mu_{1} = \mu_{2}\), \(H_{a}: \mu_{1} < \mu_{2}\). The alternative hypothesis states that the mean age of entering prostitution in Canada is lower than the mean age in the United States.

This is a normal distribution curve with mean equal to zero. A vertical line near the tail of the curve to the left of zero extends from the axis to the curve. The region under the curve to the left of the line is shaded representing p-value = 0.0157.

Graph: left-tailed

\(p\text{-value}: 0.0157\)

Decision: Do not reject \(H_{0}\).

Conclusion: At the 1% level of significance, from the sample data, there is not sufficient evidence to conclude that the mean age of entering prostitution in Canada is lower than the mean age in the United States.
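Python's standard library has no Student's t distribution. This sketch therefore approximates the left-tail area by integrating the t density numerically with Simpson's rule. It assumes Welch's degrees of freedom for the unequal sample standard deviations. `t_cdf` is a hypothetical helper, not a library routine. In practice, a statistics package or a t table would be used instead.

```python
import math

def t_cdf(t, df, steps=2000):
    """P(T <= t) for Student's t, by Simpson's rule on the density over [0, |t|]."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))
    a = abs(t)
    h = a / steps                       # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3                # P(0 < T < |t|), using symmetry about 0
    return 0.5 + area if t >= 0 else 0.5 - area

# Subscripts: 1 = Canada, 2 = United States
m1, s1, n1 = 18, 6, 100
m2, s2, n2 = 20, 8, 130
v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
t = (m1 - m2) / math.sqrt(v1 + v2)
df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
p_value = t_cdf(t, df)                  # left-tailed: Ha is mu1 < mu2
print(f"t = {t:.4f}, df = {df:.2f}, p-value = {p_value:.4f}")
```

Under these assumptions, the left-tail area comes out near 0.0157, in line with the shaded-region value in the graph description. That value exceeds \(\alpha = 0.01\), so the decision not to reject \(H_{0}\) is consistent.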

A powder diet is tested on 49 people, and a liquid diet is tested on 36 different people. Of interest is whether the liquid diet yields a higher mean weight loss than the powder diet. The powder diet group had a mean weight loss of 42 pounds with a standard deviation of 12 pounds. The liquid diet group had a mean weight loss of 45 pounds with a standard deviation of 14 pounds.

Suppose a statistics instructor believes that there is no significant difference between the mean class scores of statistics day students on Exam 2 and statistics night students on Exam 2. She takes random samples from each of the populations. The mean and standard deviation for 35 statistics day students were 75.86 and 16.91, respectively. The mean and standard deviation for 37 statistics night students were 75.41 and 19.73. The “day” subscript refers to the statistics day students. The “night” subscript refers to the statistics night students. An appropriate alternative hypothesis for the hypothesis test is:

  • \(\mu_{day} > \mu_{night}\)
  • \(\mu_{day} < \mu_{night}\)
  • \(\mu_{day} = \mu_{night}\)
  • \(\mu_{day} \neq \mu_{night}\)

10.3: Two Population Means with Known Standard Deviations

Use the following information to answer the next five exercises. The mean speeds of fastball pitches from two different baseball pitchers are to be compared. A sample of 14 fastball pitches is measured from each pitcher. The populations have normal distributions. Table shows the result. Scouters believe that Rodriguez pitches a speedier fastball.

Exercise 10.3.2

The difference in mean speeds of the fastball pitches of the two pitchers

Exercise 10.3.3

Exercise 10.3.4

What is the test statistic?

Exercise 10.3.5

What is the \(p\text{-value}\)?

Exercise 10.3.6

At the 1% significance level, we can reject the null hypothesis. There is sufficient evidence to conclude that the mean speed of Rodriguez’s fastball is higher than that of Wesley’s.

Use the following information to answer the next five exercises. A researcher is testing the effects of plant food on plant growth. Nine plants have been given the plant food. Another nine plants have not been given the plant food. The heights of the plants are recorded after eight weeks. The populations have normal distributions. The following table is the result. The researcher thinks the food makes the plants grow taller.

Exercise 10.3.7

Is the population standard deviation known or unknown?

Exercise 10.3.8

Subscripts: 1 = Food, 2 = No Food

  • \(H_{a}: \mu_{1} > \mu_{2}\)

Exercise 10.3.9

Exercise 10.3.10

Draw the graph of the \(p\text{-value}\).

This is a normal distribution curve with mean equal to zero. The values 0 and 0.1 are labeled on the horizontal axis. A vertical line extends from 0.1 to the curve. The region under the curve to the right of the line is shaded to represent p-value = 0.0198.

Exercise 10.3.11

At the 1% significance level, what is your conclusion?

Use the following information to answer the next five exercises. Two metal alloys are being considered as material for ball bearings. The mean melting point of the two alloys is to be compared. 15 pieces of each metal are being tested. Both populations have normal distributions. The following table is the result. It is believed that Alloy Zeta has a different melting point.

Exercise 10.3.12

Subscripts: 1 = Gamma, 2 = Zeta

Exercise 10.3.13

Is this a right-, left-, or two-tailed test?

Exercise 10.3.14

Exercise 10.3.15

Exercise 10.3.16

There is sufficient evidence to reject the null hypothesis. The data support that the melting point for Alloy Zeta is different from the melting point of Alloy Gamma.

DIRECTIONS: For each of the word problems, use a solution sheet to do the hypothesis test. The solution sheet is found in [link]. Please feel free to make copies of the solution sheets. For the online version of the book, it is suggested that you copy the .doc or the .pdf files.

If you are using a Student's t-distribution for one of the following homework problems, including for paired data, you may assume that the underlying population is normally distributed. (When using these tests in a real situation, you must first check that assumption, however.)

A study is done to determine if students in the California state university system take longer to graduate, on average, than students enrolled in private universities. One hundred students from both the California state university system and private universities are surveyed. Suppose that from years of research, it is known that the population standard deviations are 1.5811 years and 1 year, respectively. The following data are collected. The California state university system students took on average 4.5 years with a standard deviation of 0.8. The private university students took on average 4.1 years with a standard deviation of 0.3.

Parents of teenage boys often complain that auto insurance costs more, on average, for teenage boys than for teenage girls. A group of concerned parents examines a random sample of insurance bills. The mean annual cost for 36 teenage boys was $679. For 23 teenage girls, it was $559. From past years, it is known that the population standard deviation for each group is $180. Determine whether or not you believe that the mean cost for auto insurance for teenage boys is greater than that for teenage girls.

Subscripts: 1 = boys, 2 = girls

  • \(H_{0}: \mu_{1} \leq \mu_{2}\)
  • The random variable is the difference in the mean auto insurance costs for boys and girls.
  • test statistic: \(z = 2.50\)
  • \(p\text{-value}: 0.0062\)
  • Conclusion: At the 5% significance level, there is sufficient evidence to conclude that the mean cost of auto insurance for teenage boys is greater than that for girls.
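When both population standard deviations are treated as known, the standardized difference follows the normal distribution, so a z statistic replaces t. Below is a minimal sketch for the insurance problem. The helper name `two_sample_z` is hypothetical, and `statistics.NormalDist` from Python's standard library supplies the normal tail area.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(mean1, sigma1, n1, mean2, sigma2, n2):
    """z statistic for H0: mu1 = mu2 when both population SDs are known."""
    se = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    return (mean1 - mean2) / se

# Subscripts: 1 = boys, 2 = girls; known population SD of $180 for each group
z = two_sample_z(679, 180, 36, 559, 180, 23)
p_value = 1 - NormalDist().cdf(z)   # right-tailed: Ha is mu1 > mu2
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

Carrying full precision gives z ≈ 2.497 and a right-tail area of about 0.0063. The listed 0.0062 corresponds to rounding z to 2.50 before looking up the tail area.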

A group of transfer-bound students wondered whether they would spend the same mean amount on texts and supplies each year at their four-year university as they have at their community college. They conducted a random survey of 54 students at their community college and 66 students at their local four-year university. The sample means were $947 and $1,011, respectively. The population standard deviations are known to be $254 and $87, respectively. Conduct a hypothesis test to determine if the means are statistically the same.

Some manufacturers claim that non-hybrid sedan cars have a lower mean miles-per-gallon (mpg) than hybrid ones. Suppose that consumers test 21 hybrid sedans and get a mean of 31 mpg with a standard deviation of seven mpg. Thirty-one non-hybrid sedans get a mean of 22 mpg with a standard deviation of four mpg. Suppose that the population standard deviations are known to be six and three, respectively. Conduct a hypothesis test to evaluate the manufacturers' claim.

Subscripts: 1 = non-hybrid sedans, 2 = hybrid sedans

  • The random variable is the difference in the mean miles per gallon of non-hybrid sedans and hybrid sedans.
  • test statistic: -6.36
  • Reason for decision: \(p\text{-value} < \alpha\)
  • Conclusion: At the 5% significance level, there is sufficient evidence to conclude that the mean miles per gallon of non-hybrid sedans is less than that of hybrid sedans.

A baseball fan wanted to know if there is a difference between the number of games played in a World Series when the American League won the series versus when the National League won the series. From 1922 to 2012, the population standard deviation of games won by the American League was 1.14, and the population standard deviation of games won by the National League was 1.11. Of 19 randomly selected World Series games won by the American League, the mean number of games won was 5.76. The mean number of 17 randomly selected games won by the National League was 5.42. Conduct a hypothesis test.

One of the questions in a study of marital satisfaction of dual-career couples was to rate the statement “I’m pleased with the way we divide the responsibilities for childcare.” The ratings went from one (strongly agree) to five (strongly disagree). Table contains ten of the paired responses for husbands and wives. Conduct a hypothesis test to see if the mean difference in the husband’s versus the wife’s satisfaction level is negative (meaning that, within the partnership, the husband is happier than the wife).

  • \(H_{0}: \mu_{d} = 0\)
  • \(H_{a}: \mu_{d} < 0\)

  • The random variable \(X_{d}\) is the average difference between husband’s and wife’s satisfaction level.
  • test statistic: \(t = -1.86\)
  • \(p\text{-value}: 0.0479\)
  • Check student’s solution
  • Decision: Reject the null hypothesis, but run another test.
  • Conclusion: This is a weak test because alpha and the p-value are close. However, at the 5% significance level, there is sufficient evidence to conclude that the mean difference is negative.

10.4: Comparing Two Independent Population Proportions

Use the following information for the next five exercises. Two types of phone operating system are being tested to determine if there is a difference in the proportions of system failures (crashes). Fifteen out of a random sample of 150 phones with OS 1 had system failures within the first eight hours of operation. Nine out of another random sample of 150 phones with OS 2 had system failures within the first eight hours of operation. OS 2 is believed to be more stable (have fewer crashes) than OS 1.

Exercise 10.4.2

Exercise 10.4.3

\(P'_{OS_{1}} - P'_{OS_{2}} =\) difference in the proportions of phones that had system failures within the first eight hours of operation with OS 1 and OS 2.

Exercise 10.4.4

Exercise 10.4.5

Exercise 10.4.6

What can you conclude about the two operating systems?

Use the following information to answer the next twelve exercises. In the recent Census, three percent of the U.S. population reported being of two or more races. However, the percent varies tremendously from state to state. Suppose that two random surveys are conducted. In the first random survey, out of 1,000 North Dakotans, only nine people reported being of two or more races. In the second random survey, out of 500 Nevadans, 17 people reported being of two or more races. Conduct a hypothesis test to determine if the population percents are the same for the two states or if the percent for Nevada is statistically higher than for North Dakota.

Exercise 10.4.7

proportions

Exercise 10.4.8

  • \(H_{0}\): _________
  • \(H_{a}\): _________

Exercise 10.4.9

Is this a right-tailed, left-tailed, or two-tailed test? How do you know?

right-tailed

Exercise 10.4.10

What is the random variable of interest for this test?

Exercise 10.4.11

In words, define the random variable for this test.

The random variable is the difference in proportions (percents) of the populations that are of two or more races in Nevada and North Dakota.

Exercise 10.4.12

Exercise 10.4.13

Explain why you chose the distribution you did for Exercise 10.56.

Each sample has at least five successes and at least five failures (Nevada: 17 and 483; North Dakota: 9 and 991), so we use the normal distribution for two proportions for this hypothesis test.

Exercise 10.4.14

Calculate the test statistic.
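Below is a minimal sketch of the pooled two-proportion z statistic. It assumes the usual pooled estimate of the common proportion under \(H_{0}\). The helper `two_prop_z` is a hypothetical name, and `statistics.NormalDist` from Python's standard library supplies the normal tail area.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z(x1, n1, x2, n2):
    """Pooled z statistic for H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)       # pooled proportion under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Subscripts: 1 = Nevada (17 of 500), 2 = North Dakota (9 of 1,000)
z = two_prop_z(17, 500, 9, 1000)
p_value = 1 - NormalDist().cdf(z)        # right-tailed: Ha is pN > pND
print(f"z = {z:.2f}, p-value = {p_value:.5f}")
```

For these counts, the statistic comes out near 3.50, and the right-tail p-value is far below 0.05.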

Exercise 10.4.15

Sketch a graph of the situation. Mark the hypothesized difference and the sample difference. Shade the area corresponding to the \(p\text{-value}\).

This is a horizontal axis with arrows at each end. The axis is labeled p'N - p'ND

Exercise 10.4.16

Exercise 10.4.17

  • Reject the null hypothesis.
  • \(p\text{-value} < \alpha\)
  • At the 5% significance level, there is sufficient evidence to conclude that the proportion (percent) of the population that is of two or more races in Nevada is statistically higher than that in North Dakota.

Exercise 10.4.18

Does it appear that the proportion of Nevadans who are two or more races is higher than the proportion of North Dakotans? Why or why not?

If you are using a Student's t-distribution for one of the following homework problems, including for paired data, you may assume that the underlying population is normally distributed. (In general, you must first check that assumption, however.)

A recent drug survey showed an increase in the use of drugs and alcohol among local high school seniors as compared to the national percent. Suppose that a survey of 100 local seniors and 100 national seniors is conducted to see if the proportion of drug and alcohol use is higher locally than nationally. Locally, 65 seniors reported using drugs or alcohol within the past month, while 60 national seniors reported using them.

We are interested in whether the proportions of female suicide victims for ages 15 to 24 are the same for the white and black races in the United States. We randomly pick one year, 1992, to compare the races. The number of suicides estimated in the United States in 1992 for white females is 4,930. Five hundred eighty were aged 15 to 24. The estimate for black females is 330. Forty were aged 15 to 24. We will let female suicide victims be our population.

  • \(H_{0}: P_{W} = P_{B}\)
  • \(H_{a}: P_{W} \neq P_{B}\)
  • The random variable is the difference in the proportions of white and black suicide victims, aged 15 to 24.
  • normal for two proportions
  • test statistic: -0.1944
  • \(p\text{-value}: 0.8458\)
  • Reason for decision: \(p\text{-value} > \alpha\)
  • Conclusion: At the 5% significance level, there is insufficient evidence to conclude that the proportions of white and black female suicide victims, aged 15 to 24, are different.

Elizabeth Mjelde, an art history professor, was interested in whether the value from the Golden Ratio formula, \(\frac{\text{larger dimension} + \text{smaller dimension}}{\text{larger dimension}}\), was the same in the Whitney Exhibit for works from 1900 to 1919 as for works from 1920 to 1942. Thirty-seven early works were sampled, averaging 1.74 with a standard deviation of 0.11. Sixty-five of the later works were sampled, averaging 1.746 with a standard deviation of 0.1064. Do you think that there is a significant difference in the Golden Ratio calculation?

A recent year was randomly picked from 1985 to the present. In that year, there were 2,051 Hispanic students at Cabrillo College out of a total of 12,328 students. At Lake Tahoe College, there were 321 Hispanic students out of a total of 2,441 students. In general, do you think that the percent of Hispanic students at the two colleges is basically the same or different?

Subscripts: 1 = Cabrillo College, 2 = Lake Tahoe College

  • \(H_{0}: p_{1} = p_{2}\)
  • \(H_{a}: p_{1} \neq p_{2}\)
  • The random variable is the difference between the proportions of Hispanic students at Cabrillo College and Lake Tahoe College.
  • test statistic: 4.29
  • \(p\text{-value}: 0.00002\)
  • Reason for decision: \(p\text{-value} < \alpha\)
  • Conclusion: There is sufficient evidence to conclude that the proportions of Hispanic students at Cabrillo College and Lake Tahoe College are different.
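For a two-tailed alternative, the only change from a one-tailed test is that the p-value doubles the tail area beyond \(|z|\). The minimal sketch below assumes the pooled-proportion standard error; its variable names are illustrative. It reproduces the listed statistic of about 4.29 and a p-value on the order of 0.00002.

```python
from math import sqrt
from statistics import NormalDist

# Subscripts: 1 = Cabrillo College, 2 = Lake Tahoe College
x1, n1 = 2051, 12328      # Hispanic students, total students at Cabrillo
x2, n2 = 321, 2441        # Hispanic students, total students at Lake Tahoe

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)                  # pooled proportion under H0
se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
# Two-tailed test: double the area beyond |z| in one tail
p_value = 2 * NormalDist().cdf(-abs(z))
print(f"z = {z:.2f}, p-value = {p_value:.6f}")
```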

Use the following information to answer the next three exercises. Neuroinvasive West Nile virus is a severe disease that affects a person’s nervous system. It is spread by the Culex species of mosquito. In the United States in 2010, there were 629 reported cases of neuroinvasive West Nile virus out of a total of 1,021 reported cases, and there were 486 neuroinvasive reported cases out of a total of 712 cases reported in 2011. Is the 2011 proportion of neuroinvasive West Nile virus cases more than the 2010 proportion of neuroinvasive West Nile virus cases? Using a 1% level of significance, conduct an appropriate hypothesis test.

  • “2011” subscript: 2011 group.
  • “2010” subscript: 2010 group
  • a test of two proportions
  • a test of two independent means
  • a test of a single mean
  • a test of matched pairs.

An appropriate null hypothesis is:

  • \(p_{2011} \leq p_{2010}\)
  • \(p_{2011} \geq p_{2010}\)
  • \(\mu_{2011} \leq \mu_{2010}\)
  • \(p_{2011} > p_{2010}\)

The \(p\text{-value}\) is 0.0022. At a 1% level of significance, the appropriate conclusion is

  • There is sufficient evidence to conclude that the proportion of people in the United States in 2011 who contracted neuroinvasive West Nile disease is less than the proportion of people in the United States in 2010 who contracted neuroinvasive West Nile disease.
  • There is insufficient evidence to conclude that the proportion of people in the United States in 2011 who contracted neuroinvasive West Nile disease is more than the proportion of people in the United States in 2010 who contracted neuroinvasive West Nile disease.
  • There is insufficient evidence to conclude that the proportion of people in the United States in 2011 who contracted neuroinvasive West Nile disease is less than the proportion of people in the United States in 2010 who contracted neuroinvasive West Nile disease.
  • There is sufficient evidence to conclude that the proportion of people in the United States in 2011 who contracted neuroinvasive West Nile disease is more than the proportion of people in the United States in 2010 who contracted neuroinvasive West Nile disease.

Researchers conducted a study to find out if there is a difference in the use of eReaders by different age groups. Randomly selected participants were divided into two age groups. In the 16- to 29-year-old group, 7% of the 628 surveyed use eReaders, while 11% of the 2,309 participants 30 years old and older use eReaders.

Test: two independent sample proportions.

Random variable: \(p′_{1} - p′_{2}\)

Distribution:

The proportion of eReader users is different for the 16- to 29-year-old users from that of the 30 and older users.

Graph: two-tailed

This is a normal distribution curve with mean equal to zero. Both the right and left tails of the curve are shaded. Each tail represents 1/2(p-value) = 0.0017.

\(p\text{-value}: 0.0033\)

Conclusion: At the 5% level of significance, from the sample data, there is sufficient evidence to conclude that the proportion of eReader users 16 to 29 years old is different from the proportion of eReader users 30 and older.
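Because the eReader exercise reports percentages rather than counts, a numerical check first has to convert them. This minimal sketch assumes the counts are the percentages times the sample sizes, rounded to whole people (44 of 628 and 254 of 2,309), and uses the pooled two-proportion z-test with a two-tailed p-value; `two_prop_ztest_two_sided` is a hypothetical name.

```python
import math

def two_prop_ztest_two_sided(x1, n1, x2, n2):
    """Pooled z-test for H0: p1 = p2 versus Ha: p1 != p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # both tails: 2 * P(Z >= |z|)
    return z, se, p_value

n1, n2 = 628, 2309
x1, x2 = round(0.07 * n1), round(0.11 * n2)  # assumed counts: 44 and 254
z, se, p = two_prop_ztest_two_sided(x1, n1, x2, n2)
print(f"x1 = {x1}, x2 = {x2}, z = {z:.4f}, SE = {se:.4f}, p-value = {p:.4f}")
```

With those assumed counts the sketch gives \(z \approx -2.94\), a standard error of about 0.0136, and a two-tailed p-value of about 0.0033, matching the p-value above.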

are considered obese if their body mass index (BMI) is at least 30. The researchers wanted to determine if the proportion of women who are obese in the south is less than the proportion of southern men who are obese. The results are shown in Table . Test at the 1% level of significance.

Two computer users were discussing tablet computers. A higher proportion of people ages 16 to 29 use tablets than the proportion of people age 30 and older. Table details the number of tablet owners for each age group. Test at the 1% level of significance.

Test: two independent sample proportions

  • \(H_{a}: p_{1} > p_{2}\)

A higher proportion of tablet owners are aged 16 to 29 years old than are 30 years old and older.

Graph: right-tailed

This is a normal distribution curve with mean equal to zero. A vertical line near the tail of the curve to the right of zero extends from the axis to the curve. The region under the curve to the right of the line is shaded representing p-value = 0.2354.

\(p\text{-value}: 0.2354\)

Decision: Do not reject the \(H_{0}\).

Conclusion: At the 1% level of significance, from the sample data, there is not sufficient evidence to conclude that a higher proportion of tablet owners are aged 16 to 29 years old than are 30 years old and older.

A group of friends debated whether more men use smartphones than women. They consulted a research study of smartphone use among adults. The results of the survey indicate that of the 973 men randomly sampled, 379 use smartphones. For women, 404 of the 1,304 who were randomly sampled use smartphones. Test at the 5% level of significance.
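One way to work the smartphone exercise is the pooled two-proportion z-test with a right-tailed alternative \(H_{a}: p_{men} > p_{women}\). The sketch below is an illustration under that assumption, not an official answer: it uses `statistics.NormalDist` from Python's standard library for the normal CDF, and `right_tailed_two_prop` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist

def right_tailed_two_prop(x1, n1, x2, n2):
    """Pooled z-test for H0: p1 <= p2 versus Ha: p1 > p2."""
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return z, 1 - NormalDist().cdf(z)  # area to the right of z

# Group 1 = men (379 of 973), group 2 = women (404 of 1,304).
z, p = right_tailed_two_prop(379, 973, 404, 1304)
alpha = 0.05
print(f"z = {z:.3f}, p-value = {p:.6f}, reject H0: {p < alpha}")
```

Under these assumptions the sample proportions are about 0.39 for men and 0.31 for women, z is close to 4, and the p-value is far below 0.05.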

While her husband spent 2½ hours picking out new speakers, a statistician decided to determine whether the percent of men who enjoy shopping for electronic equipment is higher than the percent of women who enjoy shopping for electronic equipment. The population was Saturday afternoon shoppers. Out of 67 men, 24 said they enjoyed the activity. Eight of the 24 women surveyed claimed to enjoy the activity. Interpret the results of the survey.

Subscripts: 1: men; 2: women

  • \(H_{0}: p_{1} \leq p_{2}\)
  • \(P'_{1} - P'_{2}\) is the difference between the proportions of men and women who enjoy shopping for electronic equipment.
  • test statistic: 0.22
  • \(p\text{-value}: 0.4133\)
  • Conclusion: At the 5% significance level, there is insufficient evidence to conclude that the proportion of men who enjoy shopping for electronic equipment is more than the proportion of women.
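The stated test statistic (0.22) and p-value (0.4133) can be reproduced with a short sketch. It assumes the pooled two-proportion z-test and adds a rule-of-thumb check, an assumption here, that each group has at least five successes and five failures, which matters because only 24 women were surveyed. The helper names `pooled_z` and `normal_upper_tail` are hypothetical.

```python
import math

def pooled_z(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2 using the pooled sample proportion."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se

def normal_upper_tail(z):
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

men = (24, 67)    # (enjoy the activity, surveyed)
women = (8, 24)
# Assumed rule of thumb: at least 5 successes and 5 failures per group.
counts_ok = all(x >= 5 and n - x >= 5 for x, n in (men, women))
z = pooled_z(*men, *women)
p = normal_upper_tail(z)  # Ha: p1 > p2 (right-tailed)
print(counts_ok, round(z, 2), round(p, 4))
```

With 24 of 67 men and 8 of 24 women, every success and failure count is at least 5, and the sketch returns \(z \approx 0.22\) and a p-value of about 0.4133.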

We are interested in whether children’s educational computer software costs less, on average, than children’s entertainment software. Thirty-six educational software titles were randomly picked from a catalog. The mean cost was $31.14 with a standard deviation of $4.69. Thirty-five entertainment software titles were randomly picked from the same catalog. The mean cost was $33.86 with a standard deviation of $10.87. Decide whether children’s educational software costs less, on average, than children’s entertainment software.
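The software-cost exercise compares two independent means with unknown population standard deviations. A minimal sketch follows, assuming Welch's unequal-variance t-test (the sample standard deviations, $4.69 and $10.87, differ a lot) and a left-tailed alternative \(H_{a}: \mu_{educational} < \mu_{entertainment}\). Python's standard library has no Student's t CDF, so the sketch integrates the t density numerically with Simpson's rule; `t_pdf`, `t_cdf`, and `welch_t` are hypothetical helper names.

```python
import math

def t_pdf(x, df):
    """Student's t density; df may be non-integer (as with Welch's df)."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t) via Simpson's rule on [0, |t|] plus symmetry about 0."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|)
    return 0.5 + area if t >= 0 else 0.5 - area

def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Group 1 = educational, group 2 = entertainment; left-tailed test.
t, df = welch_t(31.14, 4.69, 36, 33.86, 10.87, 35)
p = t_cdf(t, df)
print(f"t = {t:.4f}, df = {df:.2f}, p-value = {p:.4f}")
```

Under these assumptions the sketch gives \(t \approx -1.36\) with about 45.96 degrees of freedom and a left-tailed p-value of roughly 0.09, which would not lead to rejecting \(H_{0}\) at the usual 5% level.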

Joan Nguyen recently claimed that the proportion of college-age males with at least one pierced ear is as high as the proportion of college-age females. She conducted a survey in her classes. Out of 107 males, 20 had at least one pierced ear. Out of 92 females, 47 had at least one pierced ear. Do you believe that the proportion of males has reached the proportion of females?

  • \(P'_{1} - P'_{2}\) is the difference between the proportions of males and females who have at least one pierced ear.
  • test statistic: –4.82
  • Conclusion: At the 5% significance level, there is sufficient evidence to conclude that the proportions of males and females with at least one pierced ear are different.
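The pierced-ear test statistic of –4.82 can be checked with a brief sketch. It assumes the pooled two-proportion z-test and, following the conclusion above, a two-tailed alternative (the proportions differ); `NormalDist` comes from Python's `statistics` module and `pooled_two_prop_z` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist

def pooled_two_prop_z(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2 using the pooled sample proportion."""
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se

# Group 1 = males (20 of 107), group 2 = females (47 of 92).
z = pooled_two_prop_z(20, 107, 47, 92)
p_two_tailed = 2 * NormalDist().cdf(-abs(z))  # area in both tails
print(f"z = {z:.2f}, two-tailed p-value = {p_two_tailed:.2e}")
```

The sketch reproduces \(z \approx -4.82\); the matching two-tailed p-value is far below 0.05.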

Use the data sets found in [link] to answer this exercise. Is the proportion of race laps Terri completes slower than 130 seconds less than the proportion of practice laps she completes slower than 135 seconds?

"To Breakfast or Not to Breakfast?" by Richard Ayore

In American society, birthdays are one of those days that everyone looks forward to. People of different ages and peer groups gather to mark the 18th, 20th, …, birthdays. During this time, one looks back to see what he or she has achieved for the past year and also focuses ahead for more to come.

If, by any chance, I am invited to one of these parties, my experience is always different. Instead of dancing around with my friends while the music is booming, I get carried away by memories of my family back home in Kenya. I remember the good times I had with my brothers and sister while we did our daily routine.

Every morning, I remember we went to the shamba (garden) to weed our crops. I remember one day arguing with my brother as to why he always remained behind just to join us an hour later. In his defense, he said that he preferred waiting for breakfast before he came to weed. He said, “This is why I always work more hours than you guys!”

And so, to prove him wrong or right, we decided to give it a try. One day we went to work as usual without breakfast, and recorded the time we could work before getting tired and stopping. On the next day, we all ate breakfast before going to work. We recorded how long we worked again before getting tired and stopping. Of interest was our mean increase in work time. Though not sure, my brother insisted that it was more than two hours. Using the data in Table , solve our problem.

  • \(H_{a}: \mu_{d} > 0\)
  • The random variable \(\bar{X}_{d}\) is the mean difference in work times between days when eating breakfast and days when not eating breakfast.
  • test statistic: 4.8963

\(p\text{-value}: 0.0004\)

  • Reason for Decision: \(p\text{-value} < \alpha\)
  • Conclusion: At the 5% level of significance, there is sufficient evidence to conclude that the mean work time is greater on days when eating breakfast than on days when not eating breakfast.

10.5: Matched or Paired Samples

Use the following information to answer the next five exercises. A study was conducted to test the effectiveness of a software patch in reducing system failures over a six-month period. Results for randomly selected installations are shown in Table . The “before” value is matched to an “after” value, and the differences are calculated. The differences have a normal distribution. Test at the 1% significance level.

Exercise 10.5.4

the mean difference of the system failures

Exercise 10.5.5

Exercise 10.5.6

Exercise 10.5.7

Exercise 10.5.8

What conclusion can you draw about the software patch?

With a \(p\text{-value} = 0.0067\), we can reject the null hypothesis at the 1% level. There is enough evidence to conclude that the software patch is effective in reducing the number of system failures.

Use the following information to answer the next five exercises. A study was conducted to test the effectiveness of a juggling class. Before the class started, six subjects juggled as many balls as they could at once. After the class, the same six subjects juggled as many balls as they could. The differences in the number of balls are calculated. The differences have a normal distribution. Test at the 1% significance level.

Exercise 10.5.9

Exercise 10.5.10

Exercise 10.5.11

What is the sample mean difference?

Exercise 10.5.12

This is a normal distribution curve with mean equal to zero. The values 0 and 1.67 are labeled on the horizontal axis. A vertical line extends from 1.67 to the curve. The region under the curve to the right of the line is shaded to represent p-value = 0.0021.

Exercise 10.5.13

What conclusion can you draw about the juggling class?

Use the following information to answer the next five exercises. A doctor wants to know if a blood pressure medication is effective. Six subjects have their blood pressures recorded. After twelve weeks on the medication, the same six subjects have their blood pressure recorded again. For this test, only systolic pressure is of concern. Test at the 1% significance level.

Exercise 10.5.14

\(H_{0}: \mu_{d} \geq 0\)

Exercise 10.5.15

Exercise 10.5.16

Exercise 10.5.17

Exercise 10.5.18

What is the conclusion?

We decline to reject the null hypothesis. There is not sufficient evidence to support that the medication is effective.

Bringing It Together

Use the following information to answer the next ten exercises. Indicate which of the following choices best identifies the hypothesis test.

  • independent group means, population standard deviations and/or variances known

Exercise 10.5.19

A powder diet is tested on 49 people, and a liquid diet is tested on 36 different people. The population standard deviations are two pounds and three pounds, respectively. Of interest is whether the liquid diet yields a higher mean weight loss than the powder diet.

Exercise 10.5.20

A new chocolate bar is taste-tested on consumers. Of interest is whether the proportion of children who like the new chocolate bar is greater than the proportion of adults who like it.

Exercise 10.5.21

The mean number of English courses taken in a two-year time period by male and female college students is believed to be about the same. An experiment is conducted and data are collected from nine males and 16 females.

Exercise 10.5.22

A football league reported that the mean number of touchdowns per game was five. A study is done to determine if the mean number of touchdowns has decreased.

Exercise 10.5.23

A study is done to determine if students in the California state university system take longer to graduate than students enrolled in private universities. One hundred students from both the California state university system and private universities are surveyed. From years of research, it is known that the population standard deviations are 1.5811 years and one year, respectively.

Exercise 10.5.24

According to a YWCA Rape Crisis Center newsletter, 75% of rape victims know their attackers. A study is done to verify this.

Exercise 10.5.25

According to a recent study, U.S. companies have a mean maternity leave of six weeks.

Exercise 10.5.26

A recent drug survey showed an increase in use of drugs and alcohol among local high school students as compared to the national percent. Suppose that a survey of 100 local youths and 100 national youths is conducted to see if the proportion of drug and alcohol use is higher locally than nationally.

Exercise 10.5.27

A new SAT study course is tested on 12 individuals. Pre-course and post-course scores are recorded. Of interest is the mean increase in SAT scores. The following data are collected:

Exercise 10.5.28

University of Michigan researchers reported in the Journal of the National Cancer Institute that quitting smoking is especially beneficial for those under age 49. In this American Cancer Society study, the risk (probability) of dying of lung cancer was about the same as for those who had never smoked.

Exercise 10.5.29

Lesley E. Tan investigated the relationship between left-handedness vs. right-handedness and motor competence in preschool children. Random samples of 41 left-handed preschool children and 41 right-handed preschool children were given several tests of motor skills to determine if there is evidence of a difference between the children based on this experiment. The experiment produced the means and standard deviations shown in Table. Determine the appropriate test and best distribution to use for that test.

  • Two independent means, normal distribution
  • Two independent means, Student’s-t distribution
  • Matched or paired samples, Student’s-t distribution
  • Two population proportions, normal distribution

Exercise 10.5.30

A golf instructor is interested in determining if her new technique for improving players’ golf scores is effective. She takes four (4) new students. She records their 18-hole scores before learning the technique and then after having taken her class. She conducts a hypothesis test. The data are shown in Table.

  • a test of two independent means.
  • a test of two proportions.
  • a test of a single mean.
  • a test of a single proportion.

If you are using a Student's t-distribution for the homework problems, including for paired data, you may assume that the underlying population is normally distributed. (When using these tests in a real situation, however, you must first check that assumption.)

Ten individuals went on a low-fat diet for 12 weeks to lower their cholesterol. The data are recorded in Table. Do you think that their cholesterol levels were significantly lowered?

\(p\text{-value} = 0.1494\)

At the 5% significance level, there is insufficient evidence to conclude that the low-fat diet lowered cholesterol levels after 12 weeks.

Use the following information to answer the next two exercises. A new AIDS prevention drug was tried on a group of 224 HIV positive patients. Forty-five patients developed AIDS after four years. In a control group of 224 HIV positive patients, 68 developed AIDS after four years. We want to test whether the method of treatment reduces the proportion of patients that develop AIDS after four years or if the proportions of the treated group and the untreated group stay the same.

Let the subscript \(t =\) treated patient and \(ut =\) untreated patient.

The appropriate hypotheses are:

  • \(H_{0}: p_{t} < p_{ut}\) and \(H_{a}: p_{t} \geq p_{ut}\)
  • \(H_{0}: p_{t} \leq p_{ut}\) and \(H_{a}: p_{t} > p_{ut}\)
  • \(H_{0}: p_{t} = p_{ut}\) and \(H_{a}: p_{t} \neq p_{ut}\)
  • \(H_{0}: p_{t} = p_{ut}\) and \(H_{a}: p_{t} < p_{ut}\)

If the \(p\text{-value}\) is 0.0062, what is the conclusion (use \(\alpha = 0.05\))?

  • The method has no effect.
  • There is sufficient evidence to conclude that the method reduces the proportion of HIV positive patients who develop AIDS after four years.
  • There is sufficient evidence to conclude that the method increases the proportion of HIV positive patients who develop AIDS after four years.
  • There is insufficient evidence to conclude that the method reduces the proportion of HIV positive patients who develop AIDS after four years.
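The p-value of 0.0062 is consistent with the pooled two-proportion z-test and the left-tailed alternative \(H_{a}: p_{t} < p_{ut}\). This minimal sketch assumes that test and the normal approximation; `left_tailed_two_prop` is a hypothetical name.

```python
import math

def left_tailed_two_prop(x1, n1, x2, n2):
    """Pooled z-test for H0: p1 = p2 versus Ha: p1 < p2."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    p_value = 0.5 * math.erfc(-z / math.sqrt(2))  # P(Z <= z)
    return z, p_value

# Treated: 45 of 224 developed AIDS; untreated (control): 68 of 224.
z, p = left_tailed_two_prop(45, 224, 68, 224)
print(f"z = {z:.3f}, p-value = {p:.4f}")
```

Under these assumptions the sketch gives \(z \approx -2.50\) and a p-value of about 0.0062, which is below \(\alpha = 0.05\).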

Use the following information to answer the next two exercises. An experiment is conducted to show that blood pressure can be consciously reduced in people trained in a “biofeedback exercise program.” Six subjects were randomly selected and blood pressure measurements were recorded before and after the training. The difference between blood pressures was calculated (after - before), producing the following results: \(\bar{x}_{d} = -10.2\), \(s_{d} = 8.4\). Using the data, test the hypothesis that the blood pressure has decreased after the training.

The distribution for the test is:

  • \(N(-10.2, 8.4)\)
  • \(N\left(-10.2, \frac{8.4}{\sqrt{6}}\right)\)

If \(\alpha = 0.05\), the \(p\text{-value}\) and the conclusion are:

  • 0.0014; There is sufficient evidence to conclude that the blood pressure decreased after the training.
  • 0.0014; There is sufficient evidence to conclude that the blood pressure increased after the training.
  • 0.0155; There is sufficient evidence to conclude that the blood pressure decreased after the training.
  • 0.0155; There is sufficient evidence to conclude that the blood pressure increased after the training.
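The biofeedback exercise supplies only summary statistics for the differences, which is enough for a paired (matched-pairs) t-test: \(t = \bar{x}_{d} / (s_{d}/\sqrt{n})\) with \(n - 1 = 5\) degrees of freedom. The sketch assumes that test and a left-tailed alternative (\(H_{a}: \mu_{d} < 0\), since differences are after minus before). Because Python's standard library has no t CDF, it integrates the Student's t density with Simpson's rule; `t_pdf` and `t_cdf` are hypothetical helper names.

```python
import math

def t_pdf(x, df):
    """Student's t density with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t): Simpson's rule on [0, |t|], then symmetry about 0."""
    a = abs(t)
    h = a / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

n, mean_d, sd_d = 6, -10.2, 8.4      # differences are after - before
t = mean_d / (sd_d / math.sqrt(n))   # paired t statistic
p = t_cdf(t, n - 1)                  # left-tailed p-value
print(f"t = {t:.3f}, df = {n - 1}, p-value = {p:.4f}")
```

Under these assumptions the sketch gives \(t \approx -2.97\) with 5 degrees of freedom and a p-value of about 0.0155.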

A golf instructor is interested in determining if her new technique for improving players’ golf scores is effective. She takes four new students. She records their 18-hole scores before learning the technique and then after having taken her class. She conducts a hypothesis test. The data are as follows.

The correct decision is:

  • Reject \(H_{0}\).
  • Do not reject the \(H_{0}\).

A local cancer support group believes that the estimate for new female breast cancer cases in the south is higher in 2013 than in 2012. The group compared the estimates of new female breast cancer cases by southern state in 2012 and in 2013. The results are in Table .

Test: two matched pairs or paired samples (t-test)

Random variable: \(\bar{X}_{d}\)

Distribution: \(t_{12}\)

\(H_{0}: \mu_{d} = 0\), \(H_{a}: \mu_{d} > 0\)

The mean of the differences of new female breast cancer cases in the south between 2013 and 2012 is greater than zero. The estimate for new female breast cancer cases in the south is higher in 2013 than in 2012.

This is a normal distribution curve with mean equal to zero. A vertical line near the tail of the curve to the right of zero extends from the axis to the curve. The region under the curve to the right of the line is shaded representing p-value = 0.0004.

Decision: Reject \(H_{0}\)

Conclusion: At the 5% level of significance, from the sample data, there is sufficient evidence to conclude that there was a higher estimate of new female breast cancer cases in 2013 than in 2012.

A traveler wanted to know if the prices of hotels are different in the ten cities that he visits the most often. The list of the cities with the corresponding hotel prices for his two favorite hotel chains is in Table. Test at the 1% level of significance.

A politician asked his staff to determine whether the underemployment rate in the northeast decreased from 2011 to 2012. The results are in Table.

Test: matched or paired samples (t-test)

Difference data: \(\{–0.9, –3.7, –3.2, –0.5, 0.6, –1.9, –0.5, 0.2, 0.6, 0.4, 1.7, –2.4, 1.8\}\)

Random Variable: \(\bar{X}_{d}\)

Distribution: \(t_{12}\)

\(H_{0}: \mu_{d} = 0\), \(H_{a}: \mu_{d} < 0\)

The mean of the differences of the rate of underemployment in the northeastern states between 2012 and 2011 is less than zero. The underemployment rate went down from 2011 to 2012.

Graph: left-tailed.

This is a normal distribution curve with mean equal to zero. A vertical line near the tail of the curve to the left of zero extends from the axis to the curve. The region under the curve to the left of the line is shaded representing p-value = 0.1207.

\(p\text{-value}: 0.1207\)

Conclusion: At the 5% level of significance, from the sample data, there is not sufficient evidence to conclude that there was a decrease in the underemployment rates of the northeastern states from 2011 to 2012.
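Since the difference data are listed above, the underemployment result can be recomputed directly. The sketch assumes a paired t-test on these 13 differences (2012 minus 2011, so \(H_{a}: \mu_{d} < 0\)) with 12 degrees of freedom. It uses `statistics.mean` and `statistics.stdev` for the sample statistics and integrates the Student's t density with Simpson's rule in place of a library CDF; `t_pdf` and `t_cdf` are hypothetical helper names.

```python
import math
import statistics

def t_pdf(x, df):
    """Student's t density with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t): Simpson's rule on [0, |t|], then symmetry about 0."""
    a = abs(t)
    h = a / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

diffs = [-0.9, -3.7, -3.2, -0.5, 0.6, -1.9, -0.5, 0.2, 0.6, 0.4, 1.7, -2.4, 1.8]
n = len(diffs)
mean_d = statistics.mean(diffs)
sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1 divisor)
t = mean_d / (sd_d / math.sqrt(n))
p = t_cdf(t, n - 1)  # left-tailed: Ha is mu_d < 0
print(f"mean = {mean_d:.2f}, sd = {sd_d:.4f}, t = {t:.4f}, p-value = {p:.4f}")
```

The sketch gives a mean difference of -0.6, \(t \approx -1.23\), and a p-value of about 0.12, consistent with the 0.1207 reported above.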

10.6: Hypothesis Testing for Two Means and Two Proportions
