Lesson 25: Power of a Statistical Test

Whenever we conduct a hypothesis test, we'd like to make sure that it is a test of high quality. One way of quantifying the quality of a hypothesis test is to ensure that it is a "powerful" test. In this lesson, we'll learn what it means to have a powerful hypothesis test, as well as how we can determine the sample size \(n\) necessary to ensure that the hypothesis test we are conducting has high power.

25.1 - Definition of Power

Let's start our discussion of statistical power by recalling two definitions we learned when we were first introduced to hypothesis testing:

  • A Type I error occurs if we reject the null hypothesis \(H_0\) (in favor of the alternative hypothesis \(H_A\)) when the null hypothesis \(H_0\) is true. We denote \(\alpha=P(\text{Type I error})\).
  • A Type II error occurs if we fail to reject the null hypothesis \(H_0\) when the alternative hypothesis \(H_A\) is true. We denote \(\beta=P(\text{Type II error})\).

You'll certainly need to know these two definitions inside and out, as you'll be thinking about them a lot in this lesson, and at any time in the future when you need to calculate a sample size either for yourself or for someone else.

Example 25-1


The Brinell hardness scale is one of several definitions used in the field of materials science to quantify the hardness of a piece of metal. The Brinell hardness measurement of a certain type of rebar used for reinforcing concrete and masonry structures was assumed to be normally distributed with a standard deviation of 10 kilograms of force per square millimeter. Using a random sample of \(n=25\) bars, an engineer is interested in performing the following hypothesis test:

  • the null hypothesis \(H_0:\mu=170\)
  • against the alternative hypothesis \(H_A:\mu>170\)

If the engineer decides to reject the null hypothesis if the sample mean is 172 or greater, that is, if \(\bar{X} \ge 172 \), what is the probability that the engineer commits a Type I error?

In this case, the engineer commits a Type I error if his observed sample mean falls in the rejection region, that is, if it is 172 or greater, when the true (unknown) population mean is indeed 170. Graphically, \(\alpha\), the engineer's probability of committing a Type I error looks like this:

Now, we can calculate the engineer's value of \(\alpha\) by making the transformation from a normal distribution with a mean of 170 and a standard deviation of 10 to that of \(Z\), the standard normal distribution using:

\(Z= \frac{\bar{X}-\mu}{\sigma / \sqrt{n}} \)

Doing so, we get:

\(Z=\dfrac{172-170}{10/\sqrt{25}}=\dfrac{2}{2}=1.00 \)

So, calculating the engineer's probability of committing a Type I error reduces to making a normal probability calculation. The probability is 0.1587 as illustrated here:

\(\alpha = P(\bar{X} \ge 172 \text { if } \mu = 170) = P(Z \ge 1.00) = 0.1587 \)

A probability of 0.1587 is a bit high. We'll learn in this lesson how the engineer could reduce his probability of committing a Type I error.
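The Type I error calculation above can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library; the variable names are ours, not part of the lesson:

```python
from statistics import NormalDist

# Values from Example 25-1
mu0 = 170        # mean under the null hypothesis
sigma = 10       # known standard deviation
n = 25           # sample size
cutoff = 172     # reject H0 if the sample mean is 172 or greater

se = sigma / n ** 0.5             # standard error of the mean: 10/5 = 2
z = (cutoff - mu0) / se           # 1.00
alpha = 1 - NormalDist().cdf(z)   # P(Z >= 1.00)
print(round(alpha, 4))            # 0.1587
```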

If, unknown to the engineer, the true population mean were \(\mu=173\), what is the probability that the engineer commits a Type II error?

In this case, the engineer commits a Type II error if his observed sample mean does not fall in the rejection region, that is, if it is less than 172, when the true (unknown) population mean is 173. Graphically, \(\beta\), the engineer's probability of committing a Type II error looks like this:

Again, we can calculate the engineer's value of \(\beta\) by making the transformation from a normal distribution with a mean of 173 and a standard deviation of 10 to that of \(Z\), the standard normal distribution. Doing so, we get:

\(Z=\dfrac{172-173}{10/\sqrt{25}}=\dfrac{-1}{2}=-0.50 \)

So, calculating the engineer's probability of committing a Type II error again reduces to making a normal probability calculation. The probability is 0.3085 as illustrated here:

\(\beta= P(\bar{X} < 172 \text { if } \mu = 173) = P(Z < -0.50) = 0.3085 \)

A probability of 0.3085 is a bit high. We'll learn in this lesson how the engineer could reduce his probability of committing a Type II error.


The power of a hypothesis test is the probability of making the correct decision if the alternative hypothesis is true. That is, the power of a hypothesis test is the probability of rejecting the null hypothesis \(H_0\) when the alternative hypothesis \(H_A\) is the hypothesis that is true.

Let's return to our engineer's problem to see if we can instead look at the glass as being half full!

Example 25-1 (continued)

If, unknown to the engineer, the true population mean were \(\mu=173\), what is the probability that the engineer makes the correct decision by rejecting the null hypothesis in favor of the alternative hypothesis?

In this case, the engineer makes the correct decision if his observed sample mean falls in the rejection region, that is, if it is 172 or greater, when the true (unknown) population mean is 173. Graphically, the power of the engineer's hypothesis test looks like this:

That makes the power of the engineer's hypothesis test 0.6915 as illustrated here:

\(\text{Power } = P(\bar{X} \ge 172 \text { if } \mu = 173) = P(Z \ge -0.50) = 0.6915 \)

which of course could have alternatively been calculated by simply subtracting the probability of committing a Type II error from 1, as shown here:

\(\text{Power } = 1 - \beta = 1 - 0.3085 = 0.6915 \)

At any rate, if the unknown population mean were 173, the engineer's hypothesis test would be at least a bit better than flipping a fair coin, in which he'd have but a 50% chance of choosing the correct hypothesis. In this case, he has a 69.15% chance. He could still do a bit better.
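Continuing the sketch from above, the Type II error probability and the power follow from one cumulative-probability call (standard library only; names are ours):

```python
from statistics import NormalDist

mu0, mu_true, sigma, n, cutoff = 170, 173, 10, 25, 172
se = sigma / n ** 0.5                              # 2

# Type II error: fail to reject (sample mean below 172) when mu is really 173
beta = NormalDist().cdf((cutoff - mu_true) / se)   # P(Z < -0.50)
power = 1 - beta

print(round(beta, 4), round(power, 4))             # 0.3085 0.6915
```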

In general, for every hypothesis test that we conduct, we'll want to do the following:

Minimize the probability of committing a Type I error. That is, minimize \(\alpha=P(\text{Type I Error})\). Typically, a significance level of \(\alpha\le 0.10\) is desired.

Maximize the power (at a value of the parameter under the alternative hypothesis that is scientifically meaningful). Typically, we desire power to be 0.80 or greater. Alternatively, we could minimize \(\beta=P(\text{Type II Error})\), aiming for a type II error rate of 0.20 or less.

By the way, in the second point, what exactly does "at a value of the parameter under the alternative hypothesis that is scientifically meaningful" mean? Well, let's suppose that a medical researcher is interested in testing the null hypothesis that the mean total blood cholesterol in a population of patients is 200 mg/dl against the alternative hypothesis that the mean total blood cholesterol is greater than 200 mg/dl. Well, the alternative hypothesis contains an infinite number of possible values of the mean. Under the alternative hypothesis, the mean of the population could be, among other values, 201, 202, or 210. Suppose the medical researcher rejected the null hypothesis, because the mean was 201. Whoopdy-do...would that be a rocking conclusion? No, probably not. On the other hand, suppose the medical researcher rejected the null hypothesis, because the mean was 215. In that case, the mean is substantially different enough from the assumed mean under the null hypothesis, that we'd probably get excited about the result. In summary, in this example, we could probably all agree to consider a mean of 215 to be "scientifically meaningful," whereas we could not do the same for a mean of 201.

Now, of course, all of this talk is a bit of gibberish, because we'd never really know whether the true unknown population mean were 201 or 215; otherwise, we wouldn't have to be going through the process of conducting a hypothesis test about the mean. We can do something, though. We can plan our scientific studies so that our hypothesis tests have enough power to reject the null hypothesis in favor of values of the parameter under the alternative hypothesis that are scientifically meaningful.

25.2 - Power Functions

Example 25-2


Let's take a look at another example that involves calculating the power of a hypothesis test.

Let \(X\) denote the IQ of a randomly selected adult American. Assume, a bit unrealistically, that \(X\) is normally distributed with unknown mean \(\mu\) and standard deviation 16. Take a random sample of \(n=16\) students, so that, after setting the probability of committing a Type I error at \(\alpha=0.05\), we can test the null hypothesis \(H_0:\mu=100\) against the alternative hypothesis that \(H_A:\mu>100\).

What is the power of the hypothesis test if the true population mean were \(\mu=108\)?

Setting \(\alpha\), the probability of committing a Type I error, to 0.05, implies that we should reject the null hypothesis when the test statistic \(Z\ge 1.645\), or equivalently, when the observed sample mean is 106.58 or greater:

because we transform the test statistic \(Z\) to the sample mean by way of:

\(Z=\dfrac{\bar{X}-\mu}{\frac{\sigma}{\sqrt{n}}}\qquad \Rightarrow \bar{X}=\mu+Z\dfrac{\sigma}{\sqrt{n}} \qquad \bar{X}=100+1.645\left(\dfrac{16}{\sqrt{16}}\right)=106.58\)

Now, that implies that the power, that is, the probability of rejecting the null hypothesis, when \(\mu=108\) is 0.6406 as calculated here (recalling that \(\Phi(z)\) is standard notation for the cumulative distribution function of the standard normal random variable):

\( \text{Power}=P(\bar{X}\ge 106.58\text{ when } \mu=108) = P\left(Z\ge \dfrac{106.58-108}{\frac{16}{\sqrt{16}}}\right) \\ = P(Z\ge -0.36)=1-P(Z<-0.36)=1-\Phi(-0.36)=1-0.3594=0.6406 \)

and illustrated here:

In summary, we have determined that we have (only) a 64.06% chance of rejecting the null hypothesis \(H_0:\mu=100\) in favor of the alternative hypothesis \(H_A:\mu>100\) if the true unknown population mean is in reality \(\mu=108\).

What is the power of the hypothesis test if the true population mean were \(\mu=112\)?

Because we are setting \(\alpha\), the probability of committing a Type I error, to 0.05, we again reject the null hypothesis when the test statistic \(Z\ge 1.645\), or equivalently, when the observed sample mean is 106.58 or greater. That means that the probability of rejecting the null hypothesis, when \(\mu=112\) is 0.9131 as calculated here:

\( \text{Power}=P(\bar{X}\ge 106.58\text{ when }\mu=112)=P\left(Z\ge \frac{106.58-112}{\frac{16}{\sqrt{16}}}\right) \\ = P(Z\ge -1.36)=1-P(Z<-1.36)=1-\Phi(-1.36)=1-0.0869=0.9131 \)

In summary, we have determined that we now have a 91.31% chance of rejecting the null hypothesis \(H_0:\mu=100\) in favor of the alternative hypothesis \(H_A:\mu>100\) if the true unknown population mean is in reality \(\mu=112\). Hmm.... it should make sense that the probability of rejecting the null hypothesis is larger for values of the mean, such as 112, that are far away from the assumed mean under the null hypothesis.

What is the power of the hypothesis test if the true population mean were \(\mu=116\)?

Again, because we are setting \(\alpha\), the probability of committing a Type I error, to 0.05, we reject the null hypothesis when the test statistic \(Z\ge 1.645\), or equivalently, when the observed sample mean is 106.58 or greater. That means that the probability of rejecting the null hypothesis, when \(\mu=116\) is 0.9909 as calculated here:

\(\text{Power}=P(\bar{X}\ge 106.58\text{ when }\mu=116) =P\left(Z\ge \dfrac{106.58-116}{\frac{16}{\sqrt{16}}}\right) = P(Z\ge -2.36)=1-P(Z<-2.36)= 1-\Phi(-2.36)=1-0.0091=0.9909 \)

In summary, we have determined that, in this case, we have a 99.09% chance of rejecting the null hypothesis \(H_0:\mu=100\) in favor of the alternative hypothesis \(H_A:\mu>100\) if the true unknown population mean is in reality \(\mu=116\). The probability of rejecting the null hypothesis is the largest yet of those we calculated, because the mean, 116, is the farthest away from the assumed mean under the null hypothesis.

Are you growing weary of this? Let's summarize a few things we've learned from engaging in this exercise:

  • First and foremost, my instructor can be tedious at times..... errrr, I mean, first and foremost, the power of a hypothesis test depends on the value of the parameter being investigated. In the above example, the power of the hypothesis test depends on the value of the mean \(\mu\).
  • As the actual mean \(\mu\) moves further away from the value of the mean \(\mu=100\) under the null hypothesis, the power of the hypothesis test increases.

It's that first point that leads us to what is called the power function of the hypothesis test. If you go back and take a look, you'll see that in each case our calculation of the power involved a step that looks like this:

\(\text{Power } =1 - \Phi (z) \) where \(z = \frac{106.58 - \mu}{16 / \sqrt{16}} \)

That is, if we use the standard notation \(K(\mu)\) to denote the power function, as it depends on \(\mu\), we have:

\(K(\mu) = 1- \Phi \left( \frac{106.58 - \mu}{16 / \sqrt{16}} \right) \)
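The power function can be evaluated for any \(\mu\) in a couple of lines. Here is a hedged sketch using only the Python standard library (the function name `K` mirrors the notation above; small differences from the text's values come from the text rounding \(z\) to two decimals):

```python
from statistics import NormalDist

def K(mu, cutoff=106.58, sigma=16, n=16):
    """Power function: P(sample mean >= cutoff) when the true mean is mu."""
    return 1 - NormalDist().cdf((cutoff - mu) / (sigma / n ** 0.5))

for mu in (100, 108, 112, 116):
    print(mu, round(K(mu), 4))
```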

So, the reality is your instructor could have been a whole lot more tedious by calculating the power for every possible value of \(\mu\) under the alternative hypothesis! What we can do instead is create a plot of the power function, with the mean \(\mu\) on the horizontal axis and the power \(K(\mu)\) on the vertical axis. Doing so, we get a plot in this case that looks like this:

Now, what can we learn from this plot? Well:

We can see that \(\alpha\) (the probability of a Type I error), \(\beta\) (the probability of a Type II error), and \(K(\mu)\) are all represented on a power function plot, as illustrated here:

We can see that the probability of a Type I error is \(\alpha=K(100)=0.05\), that is, the probability of rejecting the null hypothesis when the null hypothesis is true is 0.05.

We can see the power of a test \(K(\mu)\), as well as the probability of a Type II error \(\beta(\mu)\), for each possible value of \(\mu\).

We can see that \(\beta(\mu)=1-K(\mu)\) and vice versa, that is, \(K(\mu)=1-\beta(\mu)\).

And we can see graphically that, indeed, as the actual mean \(\mu\) moves further away from the null mean \(\mu=100\), the power of the hypothesis test increases.

Now, what do you suppose would happen to the power of our hypothesis test if we were to change our willingness to commit a Type I error? Would the power for a given value of \(\mu\) increase, decrease, or remain unchanged? Suppose, for example, that we wanted to set \(\alpha=0.01\) instead of \(\alpha=0.05\)? Let's return to our example to explore this question.

Example 25-2 (continued)


Let \(X\) denote the IQ of a randomly selected adult American. Assume, a bit unrealistically, that \(X\) is normally distributed with unknown mean \(\mu\) and standard deviation 16. Take a random sample of \(n=16\) students, so that, after setting the probability of committing a Type I error at \(\alpha=0.01\), we can test the null hypothesis \(H_0:\mu=100\) against the alternative hypothesis that \(H_A:\mu>100\).

Setting \(\alpha\), the probability of committing a Type I error, to 0.01, implies that we should reject the null hypothesis when the test statistic \(Z\ge 2.326\), or equivalently, when the observed sample mean is 109.304 or greater:

\(\bar{x} = \mu + z \left( \frac{\sigma}{\sqrt{n}} \right) =100 + 2.326\left( \frac{16}{\sqrt{16}} \right)=109.304 \)

That means that the probability of rejecting the null hypothesis, when \(\mu=108\), is 0.3722 as calculated here:

\(\text{Power}=P(\bar{X}\ge 109.304\text{ when }\mu=108)=P\left(Z\ge \dfrac{109.304-108}{\frac{16}{\sqrt{16}}}\right) = P(Z\ge 0.326)=1-\Phi(0.326)=1-0.6278=0.3722 \)

So, the power when \(\mu=108\) and \(\alpha=0.01\) is smaller (0.3722) than the power when \(\mu=108\) and \(\alpha=0.05\) (0.6406)! Perhaps we can see this graphically:

By the way, we could again alternatively look at the glass as being half-empty. In that case, the probability of a Type II error when \(\mu=108\) and \(\alpha=0.01\) is \(1-0.3722=0.6278\). In this case, the probability of a Type II error is greater than the probability of a Type II error when \(\mu=108\) and \(\alpha=0.05\), which is \(1-0.6406=0.3594\).

All of this can be seen graphically by plotting the two power functions, one where \(\alpha=0.01\) and the other where \(\alpha=0.05\), simultaneously. Doing so, we get a plot that looks like this:

This last example illustrates that, provided the sample size \(n\) remains unchanged, a decrease in \(\alpha\) causes an increase in \(\beta\), and at least theoretically, if not practically, a decrease in \(\beta\) causes an increase in \(\alpha\). It turns out that the only way that \(\alpha\) and \(\beta\) can be decreased simultaneously is by increasing the sample size \(n\).
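The effect of tightening \(\alpha\) can be checked numerically. A sketch using only the standard library (the helper function is ours, not from the lesson):

```python
from statistics import NormalDist

def power(mu, alpha, mu0=100, sigma=16, n=16):
    """Power of the one-sided z-test of H0: mu = mu0 vs HA: mu > mu0."""
    se = sigma / n ** 0.5
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    return 1 - NormalDist().cdf((cutoff - mu) / se)

# Decreasing alpha from 0.05 to 0.01 lowers the power at mu = 108
print(round(power(108, 0.05), 4))   # about 0.64
print(round(power(108, 0.01), 4))   # about 0.37
```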

25.3 - Calculating Sample Size

Before we learn how to calculate the sample size that is necessary to achieve a hypothesis test with a certain power, it might behoove us to understand the effect that sample size has on power. Let's investigate by returning to our IQ example.

Example 25-3

Let \(X\) denote the IQ of a randomly selected adult American. Assume, a bit unrealistically again, that \(X\) is normally distributed with unknown mean \(\mu\) and (a strangely known) standard deviation of 16. This time, instead of taking a random sample of \(n=16\) students, let's increase the sample size to \(n=64\). And, while setting the probability of committing a Type I error to \(\alpha=0.05\), test the null hypothesis \(H_0:\mu=100\) against the alternative hypothesis that \(H_A:\mu>100\).

What is the power of the hypothesis test when \(\mu=108\), \(\mu=112\), and \(\mu=116\)?

Setting \(\alpha\), the probability of committing a Type I error, to 0.05, implies that we should reject the null hypothesis when the test statistic \(Z\ge 1.645\), or equivalently, when the observed sample mean is 103.29 or greater:

\( \bar{x} = \mu + z \left(\dfrac{\sigma}{\sqrt{n}} \right) = 100 +1.645\left(\dfrac{16}{\sqrt{64}} \right) = 103.29\)

Therefore, the power function \(K(\mu)\), when \(\mu>100\) is the true value, is:

\( K(\mu) = P(\bar{X} \ge 103.29 | \mu) = P \left(Z \ge \dfrac{103.29 - \mu}{16 / \sqrt{64}} \right) = 1 - \Phi \left(\dfrac{103.29 - \mu}{2} \right)\)

Therefore, the probability of rejecting the null hypothesis at the \(\alpha=0.05\) level when \(\mu=108\) is 0.9907, as calculated here:

\(K(108) = 1 - \Phi \left( \dfrac{103.29-108}{2} \right) = 1- \Phi(-2.355) = 0.9907 \)

And, the probability of rejecting the null hypothesis at the \(\alpha=0.05\) level when \(\mu=112\) is greater than 0.9999, as calculated here:

\( K(112) = 1 - \Phi \left( \dfrac{103.29-112}{2} \right) = 1- \Phi(-4.355) = 0.9999\ldots \)

And, the probability of rejecting the null hypothesis at the \(\alpha=0.05\) level when \(\mu=116\) is greater than 0.999999, as calculated here:

\( K(116) = 1 - \Phi \left( \dfrac{103.29-116}{2} \right) = 1- \Phi(-6.355) = 0.999999... \)

In summary, in the various examples throughout this lesson, we have calculated the power of testing \(H_0:\mu=100\) against \(H_A:\mu>100\) for two sample sizes (\(n=16\) and \(n=64\)) and for three possible values of the mean (\(\mu=108\), \(\mu=112\), and \(\mu=116\)). Here's a summary of our power calculations:

  • \(\mu=108\): \(K(\mu)=0.6406\) when \(n=16\); \(K(\mu)=0.9907\) when \(n=64\)
  • \(\mu=112\): \(K(\mu)=0.9131\) when \(n=16\); \(K(\mu)>0.9999\) when \(n=64\)
  • \(\mu=116\): \(K(\mu)=0.9909\) when \(n=16\); \(K(\mu)>0.999999\) when \(n=64\)

As you can see, our work suggests that for a given value of the mean \(\mu\) under the alternative hypothesis, the larger the sample size \(n\), the greater the power \(K(\mu)\). Perhaps there is no better way to see this than graphically by plotting the two power functions simultaneously, one when \(n=16\) and the other when \(n=64\):
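Alongside the plot, the two power functions can be compared numerically. A sketch under the same assumptions as the examples (standard library only; the helper is ours):

```python
from statistics import NormalDist

def power(mu, n, alpha=0.05, mu0=100, sigma=16):
    """Power at true mean mu for a one-sided z-test with sample size n."""
    se = sigma / n ** 0.5
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    return 1 - NormalDist().cdf((cutoff - mu) / se)

for n in (16, 64):
    print(n, [round(power(mu, n), 4) for mu in (108, 112, 116)])
```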

As this plot suggests, if we are interested in increasing our chance of rejecting the null hypothesis when the alternative hypothesis is true, we can do so by increasing our sample size \(n\). This benefit is perhaps even greatest for values of the mean that are close to the value of the mean assumed under the null hypothesis. Let's take a look at two examples that illustrate the kind of sample size calculation we can make to ensure our hypothesis test has sufficient power.

Example 25-4


Let \(X\) denote the crop yield of corn measured in the number of bushels per acre. Assume (unrealistically) that \(X\) is normally distributed with unknown mean \(\mu\) and standard deviation \(\sigma=6\). An agricultural researcher is working to increase the current average yield from 40 bushels per acre. Therefore, he is interested in testing, at the \(\alpha=0.05\) level, the null hypothesis \(H_0:\mu=40\) against the alternative hypothesis that \(H_A:\mu>40\). Find the sample size \(n\) that is necessary to achieve 0.90 power at the alternative \(\mu=45\).

As is always the case, we need to start by finding a threshold value \(c\), such that if the sample mean is larger than \(c\), we'll reject the null hypothesis:

That is, in order for our hypothesis test to be conducted at the \(\alpha=0.05\) level, the following statement must hold (using our typical \(Z\) transformation):

\(c = 40 + 1.645 \left( \dfrac{6}{\sqrt{n}} \right) \) (**)

But, that's not the only condition that \(c\) must meet, because \(c\) also needs to be defined to ensure that our power is 0.90 or, alternatively, that the probability of a Type II error is 0.10. That would happen if there was a 10% chance that our sample mean fell short of \(c\) when \(\mu=45\), as the following drawing illustrates in blue:

This illustration suggests that in order for our hypothesis test to have 0.90 power, the following statement must hold (using our usual \(Z\) transformation):

\(c = 45 - 1.28 \left( \dfrac{6}{\sqrt{n}} \right) \) (**)

Aha! We have two (asterisked (**)) equations and two unknowns! All we need to do is equate the equations, and solve for \(n\). Doing so, we get:

\(40+1.645\left(\frac{6}{\sqrt{n}}\right)=45-1.28\left(\frac{6}{\sqrt{n}}\right)\) \(\Rightarrow 5=(1.645+1.28)\left(\frac{6}{\sqrt{n}}\right), \qquad \Rightarrow 5=\frac{17.55}{\sqrt{n}}, \qquad n=(3.51)^2=12.3201\approx 13\)

Now that we know we will set \(n=13\), we can solve for our threshold value \(c\):

\( c = 40 + 1.645 \left( \dfrac{6}{\sqrt{13}} \right)=42.737 \)

So, in summary, if the agricultural researcher collects data on \(n=13\) corn plots, and rejects his null hypothesis \(H_0:\mu=40\) if the average crop yield of the 13 plots is greater than 42.737 bushels per acre, he will have a 5% chance of committing a Type I error and a 10% chance of committing a Type II error if the population mean \(\mu\) were actually 45 bushels per acre.
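The algebra above amounts to solving \(n=\left((z_\alpha+z_\beta)\,\sigma/(\mu_1-\mu_0)\right)^2\) and rounding up. A sketch of that calculation with the standard library (variable names are ours):

```python
import math
from statistics import NormalDist

mu0, mu1 = 40, 45                 # null mean and scientifically meaningful alternative
sigma, alpha, beta = 6, 0.05, 0.10

z_alpha = NormalDist().inv_cdf(1 - alpha)   # about 1.645
z_beta = NormalDist().inv_cdf(1 - beta)     # about 1.282

n = math.ceil(((z_alpha + z_beta) * sigma / (mu1 - mu0)) ** 2)
c = mu0 + z_alpha * sigma / math.sqrt(n)    # threshold for rejecting H0
print(n, round(c, 3))                       # 13 and about 42.737
```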

Example 25-5


Consider \(p\), the true proportion of voters who favor a particular political candidate. A pollster is interested in testing, at the \(\alpha=0.01\) level, the null hypothesis \(H_0:p=0.5\) against the alternative hypothesis that \(H_A:p>0.5\). Find the sample size \(n\) that is necessary to achieve 0.80 power at the alternative \(p=0.55\).

In this case, because we are interested in performing a hypothesis test about a population proportion \(p\), we use the \(Z\)-statistic:

\(Z = \dfrac{\hat{p}-p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} \)

Again, we start by finding a threshold value \(c\), such that if the observed sample proportion is larger than \(c\), we'll reject the null hypothesis:

That is, in order for our hypothesis test to be conducted at the \(\alpha=0.01\) level, the following statement must hold:

\(c = 0.5 + 2.326 \sqrt{ \dfrac{(0.5)(0.5)}{n}} \) (**)

But, again, that's not the only condition that \(c\) must meet, because \(c\) also needs to be defined to ensure that our power is 0.80 or, alternatively, that the probability of a Type II error is 0.20. That would happen if there was a 20% chance that our sample proportion fell short of \(c\) when \(p=0.55\), as the following drawing illustrates in blue:

This illustration suggests that in order for our hypothesis test to have 0.80 power, the following statement must hold:

\(c = 0.55 - 0.842 \sqrt{ \dfrac{(0.55)(0.45)}{n}} \) (**)

Again, we have two (asterisked (**)) equations and two unknowns! All we need to do is equate the equations, and solve for \(n\). Doing so, we get:

\(0.5+2.326\sqrt{\dfrac{0.5(0.5)}{n}}=0.55-0.842\sqrt{\dfrac{0.55(0.45)}{n}} \\ 2.326\dfrac{\sqrt{0.25}}{\sqrt{n}}+0.842\dfrac{\sqrt{0.2475}}{\sqrt{n}}=0.55-0.5 \\ \dfrac{1}{\sqrt{n}}(1.5818897)=0.05 \qquad \Rightarrow n\approx \left(\dfrac{1.5818897}{0.05}\right)^2 = 1000.95 \approx 1001 \)

Now that we know we will set \(n=1001\), we can solve for our threshold value \(c\):

\(c = 0.5 + 2.326 \sqrt{\dfrac{(0.5)(0.5)}{1001}}= 0.5367 \)

So, in summary, if the pollster collects data on \(n=1001\) voters, and rejects his null hypothesis \(H_0:p=0.5\) if the proportion of sampled voters who favor the political candidate is greater than 0.5367, he will have a 1% chance of committing a Type I error and a 20% chance of committing a Type II error if the population proportion \(p\) were actually 0.55.
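The same two-equation approach can be sketched for the proportion case (standard library only; variable names are ours):

```python
import math
from statistics import NormalDist

p0, p1 = 0.5, 0.55                # null proportion and meaningful alternative
alpha, beta = 0.01, 0.20

z_alpha = NormalDist().inv_cdf(1 - alpha)   # about 2.326
z_beta = NormalDist().inv_cdf(1 - beta)     # about 0.842

numerator = z_alpha * math.sqrt(p0 * (1 - p0)) + z_beta * math.sqrt(p1 * (1 - p1))
n = math.ceil((numerator / (p1 - p0)) ** 2)
c = p0 + z_alpha * math.sqrt(p0 * (1 - p0) / n)
print(n, round(c, 4))             # 1001 and roughly 0.537 (the text shows 0.5367)
```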

Incidentally, we can always check our work! Conducting the survey and subsequent hypothesis test as described above, the probability of committing a Type I error is:

\(\alpha= P(\hat{p} >0.5367 \text { if } p = 0.50) = P(Z > 2.3257) = 0.01 \)

and the probability of committing a Type II error is:

\(\beta = P(\hat{p} <0.5367 \text { if } p = 0.55) = P(Z < -0.846) = 0.199 \)

just as the pollster had desired.

We've illustrated several sample size calculations. Now, let's summarize the information that goes into a sample size calculation. In order to determine a sample size for a given hypothesis test, you need to specify:

The desired \(\alpha\) level, that is, your willingness to commit a Type I error.

The desired power or, equivalently, the desired \(\beta\) level, that is, your willingness to commit a Type II error.

A meaningful difference from the value of the parameter that is specified in the null hypothesis.

The standard deviation of the sample statistic or, at least, an estimate of the standard deviation (the "standard error") of the sample statistic.
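These four ingredients can be packaged into a single helper. A hedged sketch for the one-sided \(z\)-test case (the function name and signature are ours, not a standard API):

```python
import math
from statistics import NormalDist

def sample_size(alpha, power, delta, sigma):
    """Sample size for a one-sided z-test: significance level alpha,
    desired power, meaningful difference delta, standard deviation sigma."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

# Example 25-4: alpha = 0.05, power = 0.90, delta = 45 - 40, sigma = 6
print(sample_size(0.05, 0.90, 5, 6))   # 13
```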


How to Find the Power of a Statistical Test

When a researcher designs a study to test a hypothesis, he/she should compute the power of the test (i.e., the likelihood of avoiding a Type II error).

How to Compute the Power of a Hypothesis Test

To compute the power of a hypothesis test, use the following three-step procedure.

  • Define the region of acceptance. Previously, we showed how to compute the region of acceptance for a hypothesis test.
  • Specify the critical parameter value. The critical parameter value is an alternative to the value specified in the null hypothesis. The difference between the critical parameter value and the value from the null hypothesis is called the effect size. That is, the effect size is equal to the critical parameter value minus the value from the null hypothesis.
  • Compute power. Assume that the true population parameter is equal to the critical parameter value, rather than the value specified in the null hypothesis. Based on that assumption, compute the probability that the sample estimate of the population parameter will fall outside the region of acceptance. That probability is the power of the test.

The following examples illustrate how this works. The first example involves a mean score; and the second example, a proportion.


Example 1: Power of the Hypothesis Test of a Mean Score

Two inventors have developed a new, energy-efficient lawn mower engine. One inventor says that the engine will run continuously for 5 hours (300 minutes) on a single ounce of regular gasoline. Suppose a random sample of 50 engines is tested. The engines run for an average of 295 minutes, with a standard deviation of 20 minutes. The inventor tests the null hypothesis that the mean run time is 300 minutes against the alternative hypothesis that the mean run time is not 300 minutes, using a 0.05 level of significance.

The other inventor says that the new engine will run continuously for only 290 minutes on an ounce of gasoline. Find the power of the test to reject the null hypothesis, if the second inventor is correct.

Solution: The steps required to compute power are presented below.

  • Define the region of acceptance. In a previous lesson, we showed that the region of acceptance for this problem consists of the values between 294.46 and 305.54.
  • Specify the critical parameter value. The null hypothesis tests the hypothesis that the run time of the engine is 300 minutes. We are interested in determining the probability that the hypothesis test will reject the null hypothesis, if the true run time is actually 290 minutes. Therefore, the critical parameter value is 290. (Another way to express the critical parameter value is through effect size. The effect size is equal to the critical parameter value minus the hypothesized value. Thus, the effect size is equal to 290 - 300, or -10.)

Therefore, we need to compute the probability that the sampled run time will be less than 294.46 or greater than 305.54. To do this, we make the following assumptions:

  • The sampling distribution of the mean is normally distributed. (Because the sample size is relatively large, this assumption can be justified by the central limit theorem .)
  • The mean of the sampling distribution is the critical parameter value, 290.
  • The standard error of the sampling distribution is 2.83, as computed in a previous lesson.

Given these assumptions, we first assess the probability that the sample run time will be less than 294.46. This is easy to do, using the Normal Calculator. We enter the following values into the calculator: normal random variable = 294.46; mean = 290; and standard deviation = 2.83. Given these inputs, we find that the cumulative probability is 0.942. This means the probability that the sample mean will be less than 294.46 is 0.942.

Next, we assess the probability that the sample mean is greater than 305.54. Again, we use the Normal Calculator. We enter the following values into the calculator: normal random variable = 305.54; mean = 290; and standard deviation = 2.83. Given these inputs, we find that the probability that the sample mean is less than 305.54 (i.e., the cumulative probability) is 1.0. Thus, the probability that the sample mean is greater than 305.54 is 1 - 1.0, or 0.0. Therefore, the power of the test is 0.942 + 0.0, or 0.942.
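The two tail probabilities can be computed and summed with the standard library in place of the Normal Calculator; a quick sketch using the values from this example:

```python
from statistics import NormalDist

# Sampling distribution of the mean if the second inventor is correct
dist = NormalDist(mu=290, sigma=2.83)
lower, upper = 294.46, 305.54           # region of acceptance

# Power = P(sample mean falls outside the region of acceptance)
power = dist.cdf(lower) + (1 - dist.cdf(upper))
print(round(power, 3))                  # about 0.942
```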

Example 2: Power of the Hypothesis Test of a Proportion

A major corporation offers a large bonus to all of its employees if at least 80 percent of the corporation's 1,000,000 customers are very satisfied. The company conducts a survey of 100 randomly sampled customers to determine whether or not to pay the bonus. The null hypothesis states that the proportion of very satisfied customers is at least 0.80. If the null hypothesis cannot be rejected, given a significance level of 0.05, the company pays the bonus.

Suppose the true proportion of satisfied customers is 0.75. Find the power of the test to reject the null hypothesis.

  • Define the region of acceptance. In a previous lesson, we showed that the region of acceptance for this problem consists of the values between 0.734 and 1.00.
  • Specify the critical parameter value. The null hypothesis tests the hypothesis that the proportion of very satisfied customers is 0.80. We are interested in determining the probability that the hypothesis test will reject the null hypothesis, if the true satisfaction level is 0.75. Therefore, the critical parameter value is 0.75. (Another way to express the critical parameter value is through effect size. The effect size is equal to the critical parameter value minus the hypothesized value. Thus, the effect size is equal to 0.75 - 0.80, or -0.05.)

Therefore, we need to compute the probability that the sample proportion will be less than 0.734. To do this, we take the following steps:

  • Assume that the sampling distribution of the sample proportion is normally distributed. (Because the sample size is relatively large, this assumption is justified by the central limit theorem .)
  • Assume that the mean of the sampling distribution is the critical parameter value, 0.75. (This assumption is justified because, for the purpose of calculating power, we assume that the true population proportion is equal to the critical parameter value. And the mean of all possible sample proportions is equal to the population proportion. Hence, the mean of the sampling distribution is equal to the critical parameter value.)

The standard deviation of the sampling distribution is

σ P = sqrt[ P * ( 1 - P ) / n ] = sqrt[ ( 0.75 * 0.25 ) / 100 ] ≈ 0.0433

The power of the test is therefore the probability that a normally distributed random variable with mean 0.75 and standard deviation 0.0433 falls below 0.734. That cumulative probability is about 0.36, so the power of the test is approximately 0.36.
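Carrying the computation through in code (a sketch using Python's standard library; the cutoff 0.734 and the critical parameter value 0.75 come from the example above):

```python
from math import sqrt
from statistics import NormalDist

p_true, n = 0.75, 100
sigma_p = sqrt(p_true * (1 - p_true) / n)  # standard error ≈ 0.0433

# Power = P(sample proportion falls below the region of acceptance).
power = NormalDist(mu=p_true, sigma=sigma_p).cdf(0.734)
print(round(power, 2))  # ≈ 0.36
```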



Statistics LibreTexts (Rice University)

13.5: Factors Affecting Power


Learning Objectives

  • State what the effect of each of the factors is

Several factors affect the power of a statistical test. Some of the factors are under the control of the experimenter, whereas others are not. The following example will be used to illustrate the various factors.

Suppose a math achievement test were known to be normally distributed with a mean of \(75\) and a standard deviation of \(\sigma\). A researcher is interested in whether a new method of teaching results in a higher mean. Assume that, although the experimenter does not know it, the population mean \(\mu\) for the new method is larger than \(75\). The researcher plans to sample \(N\) subjects and do a one-tailed test of whether the sample mean is significantly higher than \(75\). In this section, we consider the factors that affect the probability that the researcher will correctly reject the false null hypothesis that the population mean is \(75\) or lower; in other words, the factors that affect power.

Sample Size

Figure \(\PageIndex{1}\) shows that the larger the sample size, the higher the power. Since sample size is typically under an experimenter's control, increasing sample size is one way to increase power. However, it is sometimes difficult and/or expensive to use a large sample size.

Standard Deviation

Figure \(\PageIndex{1}\) also shows that power is higher when the standard deviation is small than when it is large. For all values of \(N\), power is higher for the standard deviation of \(10\) than for the standard deviation of \(15\) (except, of course, when \(N = 0\)). Experimenters can sometimes control the standard deviation by sampling from a homogeneous population of subjects, by reducing random measurement error, and/or by making sure the experimental procedures are applied very consistently.

Difference between Hypothesized and True Mean

Naturally, the larger the effect size, the more likely it is that an experiment would find a significant effect. Figure \(\PageIndex{2}\) shows the effect of increasing the difference between the mean specified by the null hypothesis (\(75\)) and the population mean \(\mu\) for standard deviations of \(10\) and \(15\).

Significance Level

There is a trade-off between the significance level and power: the more stringent (lower) the significance level, the lower the power. Figure \(\PageIndex{3}\) shows that power is lower for the \(0.01\) level than it is for the \(0.05\) level. Naturally, the stronger the evidence needed to reject the null hypothesis, the lower the chance that the null hypothesis will be rejected.

One- versus Two-Tailed Tests

Power is higher with a one-tailed test than with a two-tailed test as long as the hypothesized direction is correct. A one-tailed test at the \(0.05\) level has the same power as a two-tailed test at the \(0.10\) level. A one-tailed test, in effect, raises the significance level.
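All four factors can be seen numerically with a short sketch. The function below computes the approximate power of a one-sample z test for the achievement-test example; the true mean of 80 is an assumed value chosen for illustration, and the two-tailed case ignores the (negligible) lower rejection region.

```python
from statistics import NormalDist

def z_test_power(mu0, mu_true, sigma, n, alpha=0.05, tails=1):
    """Approximate power of a one-sample z test of H0: mu = mu0
    against a larger true mean (upper rejection region only)."""
    se = sigma / n ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / tails)
    cutoff = mu0 + z_crit * se  # reject H0 when the sample mean exceeds this
    return 1 - NormalDist(mu_true, se).cdf(cutoff)

base = z_test_power(75, 80, 10, 25)
print(round(base, 3))
print(base < z_test_power(75, 80, 10, 50))              # larger n: more power
print(base > z_test_power(75, 80, 15, 25))              # larger sd: less power
print(base > z_test_power(75, 80, 10, 25, alpha=0.01))  # stricter level: less power
print(base > z_test_power(75, 80, 10, 25, tails=2))     # two-tailed: less power
```

Each comparison prints True, matching the qualitative claims in the section above.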


Power of a Hypothesis Test

The power of a hypothesis test is a measure of how effective the test is at identifying (say) a difference in populations if such a difference exists. It is the probability of rejecting the null hypothesis when it is false.


Statology (Statistics Made Easy)

5 Tips for Interpreting P-Values Correctly in Hypothesis Testing

Hypothesis testing is a critical part of statistical analysis and is often the endpoint where conclusions are drawn about larger populations based on a sample or experimental dataset. Central to this process is the p-value. Broadly, the p-value quantifies the strength of evidence against the null hypothesis. Given the importance of the p-value, it is essential to ensure its interpretation is correct. Here are five essential tips for ensuring the p-value from a hypothesis test is understood correctly. 

1. Know What the P-value Represents

First, it is essential to understand what a p-value is. In hypothesis testing, the p-value is defined as the probability of observing your data, or data more extreme, if the null hypothesis is true. As a reminder, the null hypothesis typically states that there is no difference between your data and the expected population.

For example, in a hypothesis test to see if changing a company’s logo drives more traffic to the website, a null hypothesis would state that the new traffic numbers are equal to the old traffic numbers. In this context, the p-value would be the probability that the data you observed, or data more extreme, would occur if this null hypothesis were true. 

Therefore, a smaller p-value indicates that what you observed is unlikely to have occurred if the null were true, offering evidence to reject the null hypothesis. Typically, a cut-off value of 0.05 is used where any p-value below this is considered significant evidence against the null. 
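As a minimal sketch of that definition, suppose (hypothetically) that daily visits under the null hypothesis follow a normal distribution with mean 1,000 and standard deviation 50, and we observe a sample mean of 1,090 over n = 4 days after the logo change. All numbers here are assumptions for illustration, not figures from the article.

```python
from statistics import NormalDist

mu0, sigma, n = 1000, 50, 4   # hypothetical null-hypothesis traffic numbers
observed_mean = 1090          # hypothetical post-change sample mean

se = sigma / n ** 0.5
z = (observed_mean - mu0) / se      # standardized test statistic: 3.6
p_value = 1 - NormalDist().cdf(z)   # one-sided: P(this extreme or more | H0)
print(round(p_value, 4))  # ≈ 0.0002, well below the usual 0.05 cutoff
```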

2. Understand the Directionality of Your Hypothesis

Based on the research question under exploration, there are two types of hypotheses: one-sided and two-sided. A one-sided test specifies a particular direction of effect, such as traffic to a website increasing after a design change. On the other hand, a two-sided test allows the change to be in either direction and is effective when the researcher wants to see any effect of the change. 

Either way, determining the statistical significance of a p-value is the same: if the p-value is below the chosen threshold, it is statistically significant. However, when calculating the p-value, it is important to ensure that the correct one- or two-sided calculation has been used. 

Additionally, the interpretation of the meaning of a p-value will differ based on the directionality of the hypothesis. If a one-sided test is significant, the researchers can use the p-value to support a statistically significant increase or decrease based on the direction of the test. If a two-sided test is significant, the p-value can only be used to say that the two groups are different, but not that one is necessarily greater. 

3. Avoid Threshold Thinking

A common pitfall in interpreting p-values is falling into the threshold thinking trap. The most commonly used cut-off value for whether a calculated p-value is statistically significant is 0.05. Typically, a p-value of less than 0.05 is considered statistically significant evidence against the null hypothesis. 

However, this is just a conventional, somewhat arbitrary value. Rigid adherence to this or any other predefined cut-off can obscure business-relevant effect sizes. For example, a hypothesis test looking at changes in traffic after a website redesign may find that an increase of 10,000 views is not statistically significant, with a p-value of 0.055, since that value is above 0.05. However, an actual increase of 10,000 views may still be important to the growth of the business. 

Therefore, a result can be practically significant while not being statistically significant. Both types of significance, and the broader context of the hypothesis test, should be considered when making a final interpretation. 

4. Consider the Power of Your Study

Similarly, some study conditions can result in a non-significant p-value even if practical significance exists. Statistical power is the ability of a study to detect an effect when it truly exists. In other words, it is the probability that the null hypothesis will be rejected when it is false. 

Power is affected by several factors, including sample size, the effect size you are looking for, and variability within the data. In the website traffic example, if the overall number of visits is too small, there may not be enough data to detect a difference. 

Simple ways to increase the power of a hypothesis test and increase the chances of detecting an effect are increasing the sample size, looking for a smaller effect size, changing the experiment design to control for variables that can increase variability, or adjusting the type of statistical test being run.
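One common planning step is solving for the smallest sample size that reaches a target power. Below is a minimal sketch for a one-sided z test; the formula and the specific numbers (null mean 75, true mean 80, sd 10) are assumptions chosen for illustration, not figures from the article.

```python
from math import ceil
from statistics import NormalDist

def min_n_for_power(mu0, mu_true, sigma, alpha=0.05, target=0.80):
    """Smallest n giving at least `target` power for a one-sided z test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # critical value for the test
    z_beta = NormalDist().inv_cdf(target)       # quantile for the target power
    n = ((z_alpha + z_beta) * sigma / (mu_true - mu0)) ** 2
    return ceil(n)  # round up: n must be a whole number of subjects

# Detect a 5-unit increase over a null mean of 75, sd 10, at 80% power:
print(min_n_for_power(75, 80, 10))  # 25
```

Raising the target power or shrinking the effect size drives the required n up, which mirrors the trade-offs described above.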

5. Be Aware of Multiple Comparisons

Whenever multiple p-values are calculated in a single study, there is an increased risk of false positives. This is because each individual comparison carries its own chance of a false positive, and these chances accumulate across comparisons. 

For example, in a hypothesis test looking at traffic before and after a website redesign, the team may be interested in making more than one comparison. This can include total visits, page views, and average time spent on the website. Since multiple comparisons are being made, there must be a correction made when interpreting the p-value. 

The Bonferroni correction is one of the most commonly used methods to account for this increased probability of false positives. In this method, the significance cut-off value, typically 0.05, is divided by the number of comparisons made. The result is used as the new significance cut-off value.  Applying this correction mitigates the risk of false positives and improves the reliability of findings from a hypothesis test. 
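The Bonferroni adjustment itself is a one-line computation. In the sketch below, the three p-values are hypothetical placeholders for the website metrics mentioned above.

```python
alpha = 0.05
# Hypothetical p-values for the three comparisons in the example:
p_values = {"total visits": 0.004, "page views": 0.020, "time on site": 0.030}

# Bonferroni: divide the cutoff by the number of comparisons (0.05 / 3 ≈ 0.0167).
bonferroni_cutoff = alpha / len(p_values)
for metric, p in p_values.items():
    significant = p < bonferroni_cutoff
    print(f"{metric}: p = {p:.3f} -> {'significant' if significant else 'not significant'}")
```

With these placeholder values, only the first comparison survives the correction, even though all three fall below the uncorrected 0.05 cutoff.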

In conclusion, interpreting p-values requires a nuanced understanding of many statistical concepts and careful consideration of the hypothesis test’s context. By following these five tips, the interpretation of the p-value from a hypothesis test can be more accurate and reliable, leading to better data-driven decision-making.


