9.3 - The P-Value Approach

Example 9-4 Section

Up until now, we have used the critical region approach in conducting our hypothesis tests. Now, let's take a look at an example in which we use what is called the P-value approach.

Among patients with lung cancer, usually, 90% or more die within three years. As a result of new forms of treatment, it is felt that this rate has been reduced. In a recent study of n = 150 lung cancer patients, y = 128 died within three years. Is there sufficient evidence at the \(\alpha = 0.05\) level, say, to conclude that the death rate due to lung cancer has been reduced?

The sample proportion is:

\(\hat{p}=\dfrac{128}{150}=0.853\)

The null and alternative hypotheses are:

\(H_0 \colon p = 0.90\) and \(H_A \colon p < 0.90\)

The test statistic is, therefore:

\(Z=\dfrac{\hat{p}-p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}=\dfrac{0.853-0.90}{\sqrt{\dfrac{0.90(0.10)}{150}}}=-1.92\)

And, the rejection region at the \(\alpha = 0.05\) level is \(Z \leq -z_{0.05} = -1.645\).

Since the test statistic Z = −1.92 < −1.645, we reject the null hypothesis. There is sufficient evidence at the \(\alpha = 0.05\) level to conclude that the rate has been reduced.
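For readers who want to check these numbers in software, here is a minimal R sketch of the critical value calculation (the variable names are ours, not part of the original example):

p0   = 0.90                       # hypothesized proportion
phat = 0.853                      # sample proportion, rounded as in the text
z    = (phat - p0) / sqrt(p0 * (1 - p0) / 150)
z                                 # -1.92, the test statistic
qnorm(0.05)                       # -1.645, the critical value for alpha = 0.05
z <= qnorm(0.05)                  # TRUE, so we reject H0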

Example 9-4 (continued) Section  

What if we set the significance level \(\alpha\) = P (Type I Error) to 0.01? Is there still sufficient evidence to conclude that the death rate due to lung cancer has been reduced?

In this case, with \(\alpha = 0.01\), the rejection region is Z ≤ −2.33; that is, we reject the null hypothesis if the test statistic falls at or below the critical value −2.33.

Because the test statistic Z = −1.92 > −2.33, we do not reject the null hypothesis. There is insufficient evidence at the \(\alpha = 0.01\) level to conclude that the rate has been reduced.


In the first part of this example, we rejected the null hypothesis when \(\alpha = 0.05\). And, in the second part of this example, we failed to reject the null hypothesis when \(\alpha = 0.01\). There must be some level of \(\alpha\), then, at which we cross the threshold from rejecting to not rejecting the null hypothesis. What is the smallest \(\alpha\)-level that would still cause us to reject the null hypothesis?

We would, of course, reject any time our test statistic −1.92 fell at or below the critical value:

That is, we would reject if the critical value were −1.645, −1.83, or −1.92. But we wouldn't reject if the critical value were −1.93. The \(\alpha\)-level associated with the test statistic −1.92 is called the P-value. It is the smallest \(\alpha\)-level that would lead to rejection. In this case, the P-value is:

P ( Z < −1.92) = 0.0274
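In R, for instance, this lower-tail probability comes directly from the standard normal CDF:

pnorm(-1.92)    # 0.0274, the P-value for this one-tailed test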

So far, all of the examples we've considered have involved a one-tailed hypothesis test in which the alternative hypothesis involved either a less-than (<) or a greater-than (>) sign. What would happen if we weren't sure of the direction in which the proportion could deviate from the hypothesized null value? That is, what if the alternative hypothesis involved a not-equal sign (≠)? Let's take a look at an example.


What if we wanted to perform a "two-tailed" test? That is, what if we wanted to test:

\(H_0 \colon p = 0.90\) versus \(H_A \colon p \ne 0.90\)

at the \(\alpha = 0.05\) level?

Let's first consider the critical value approach. If we allow for the possibility that the sample proportion could either prove to be too large or too small, then we need to specify a threshold value, that is, a critical value, in each tail of the distribution. In this case, we divide the significance level \(\alpha\) by 2, putting \(\alpha/2 = 0.025\) in each tail, which gives critical values of \(\pm z_{0.025} = \pm 1.96\).

That is, our rejection rule is that we should reject the null hypothesis \(H_0\) if \(Z \geq 1.96\) or if \(Z \leq -1.96\), or more compactly, if \(|Z| \geq 1.96\). Because our test statistic is −1.92, we just barely fail to reject the null hypothesis, because |−1.92| = 1.92 < 1.96. In this case, we would say that there is insufficient evidence at the \(\alpha = 0.05\) level to conclude that the sample proportion differs significantly from 0.90.

Now for the P-value approach. Again, needing to allow for the possibility that the sample proportion is either too large or too small, we multiply the P-value we obtain for the one-tailed test by 2:

That is, the P -value is:

\(P=P(|Z|\geq 1.92)=P(Z>1.92 \text{ or } Z<-1.92)=2 \times 0.0274=0.0548 \approx 0.055\)

Because the P -value 0.055 is (just barely) greater than the significance level \(\alpha = 0.05\), we barely fail to reject the null hypothesis. Again, we would say that there is insufficient evidence at the \(\alpha = 0.05\) level to conclude that the sample proportion differs significantly from 0.90.
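As a quick software check, doubling the one-tailed probability in R gives the same conclusion:

2 * pnorm(-1.92)    # 0.0549, just barely greater than alpha = 0.05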

Let's close this example by formalizing the definition of a P-value, as well as summarizing the P-value approach to conducting a hypothesis test.

The P-value is the smallest significance level \(\alpha\) that leads us to reject the null hypothesis.

Alternatively (and the way I prefer to think of P-values), the P-value is the probability that we'd observe a statistic at least as extreme as the one we did, if the null hypothesis were true.

If the P-value is small, that is, if \(P \leq \alpha\), then we reject the null hypothesis \(H_0\).

Note! Section  


By the way, to test \(H_0 \colon p = p_0\), some statisticians will use the test statistic:

\(Z=\dfrac{\hat{p}-p_0}{\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}}\)

rather than the one we've been using:

\(Z=\dfrac{\hat{p}-p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}\)

One advantage of doing so is that the interpretation of the confidence interval — does it contain \(p_0\)? — is always consistent with the hypothesis test decision, as illustrated here:

For the sake of ease, let:

\(se(\hat{p})=\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}\)

Two-tailed test. In this case, the critical region approach tells us to reject the null hypothesis \(H_0 \colon p = p_0\) against the alternative hypothesis \(H_A \colon p \ne p_0\):

if \(Z=\dfrac{\hat{p}-p_0}{se(\hat{p})} \geq z_{\alpha/2}\) or if \(Z=\dfrac{\hat{p}-p_0}{se(\hat{p})} \leq -z_{\alpha/2}\)

which is equivalent to rejecting the null hypothesis:

if \(\hat{p}-p_0 \geq z_{\alpha/2}se(\hat{p})\) or if \(\hat{p}-p_0 \leq -z_{\alpha/2}se(\hat{p})\)

or equivalently, solving each inequality for \(p_0\), if \(p_0 \leq \hat{p}-z_{\alpha/2}se(\hat{p})\) or if \(p_0 \geq \hat{p}+z_{\alpha/2}se(\hat{p})\)

That's the same as saying that we should reject the null hypothesis \(H_0 \text{ if } p_0\) is not in the \(\left(1-\alpha\right)100\%\) confidence interval!

Left-tailed test. In this case, the critical region approach tells us to reject the null hypothesis \(H_0 \colon p = p_0\) against the alternative hypothesis \(H_A \colon p < p_0\):

if \(Z=\dfrac{\hat{p}-p_0}{se(\hat{p})} \leq -z_{\alpha}\)

if \(\hat{p}-p_0 \leq -z_{\alpha}se(\hat{p})\)

if \(p_0 \geq \hat{p}+z_{\alpha}se(\hat{p})\)

That's the same as saying that we should reject the null hypothesis \(H_0 \text{ if } p_0\) is not in the upper \(\left(1-\alpha\right)100\%\) confidence interval:

\((0,\hat{p}+z_{\alpha}se(\hat{p}))\)
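To see this equivalence concretely, here is a small R sketch for the two-tailed case of Example 9-4, using the \(\hat{p}\)-based standard error described in this note (the variable names are ours):

phat = 128/150
se   = sqrt(phat * (1 - phat) / 150)
ci   = phat + c(-1, 1) * qnorm(0.975) * se
ci                              # the two-sided 95% confidence interval for p
ci[1] <= 0.90 & 0.90 <= ci[2]   # TRUE: p0 = 0.90 is inside, so we fail to reject H0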


Calculating p-Value in Hypothesis Testing

Sarah Thomas, Subject Matter Expert • 10.15.2021

In this article, we'll take a deep dive on p-values, beginning with a description and definition of this key component of statistical hypothesis testing, before moving on to look at how to calculate it for different types of variables.

What is a p-value?

A p-value (short for probability value) is a probability used in hypothesis testing. It represents the probability of observing sample data that is at least as extreme as the observed sample data, assuming that the null hypothesis is true .  

In a hypothesis test, you have two competing hypotheses: a null (or starting) hypothesis, \(H_0\), and an alternative hypothesis, \(H_a\). The goal of a hypothesis test is to use statistical evidence from a sample or multiple samples to determine which of the hypotheses is better supported by the data. The p-value can be used in the final stage of the test to make this determination.

Interpreting a p-value

Because it is a probability, the p-value can be expressed as a decimal or a percentage ranging from 0 to 1 or 0% to 100%. The closer the p-value is to zero, the stronger the evidence is in support of the alternative hypothesis, \(H_a\).

Reject or Fail to Reject the Null Hypothesis?

When the p-value is below a certain threshold, the null hypothesis is rejected in favor of the alternative hypothesis. This threshold is known as the significance level (or alpha level) of the test. 

The most commonly used significance level is 0.05 or 5%, but the choice of the significance level is up to the researcher. You could just as easily use a significance level of 0.1 or 0.01, for example. Remember, however, that the lower the p-value, the stronger the evidence is in support of the alternative hypothesis. For this reason, choosing a lower significance level means that you can have more confidence in your decision to reject a null hypothesis.

When the p-value is greater than the significance level, the evidence favors the null hypothesis, and the researcher or statistician must fail to reject the null hypothesis.

Calculating p-values for discrete random variables

As mentioned earlier, the p-value is the probability of observing sample data that’s at least as extreme as the observed sample data, assuming that the null hypothesis is true.

If your data consists of a discrete random variable, you can map out the entire set of possible outcomes and their respective probabilities in order to calculate the p-value. 

The p-value will then be the sum of three things:

  • the probability of the observed outcome
  • the probability of all outcomes that are just as likely as the observed outcome
  • the probability of any outcome that is less likely than the observed outcome

Here is an example. 

A stranger invites you to play a game of dice, and claims her dice are fair. The rules of the game are as follows: You roll a single die. If you roll an even number, you count that as a win (or success) and earn $1. If you roll an odd number, you count that as a loss (or failure) and lose $0.80. You can play the game for as many rounds as you like. 

Let’s say you play four rounds of the game, and you lose all four rounds. This leaves you $3.20 poorer than before you started playing.

Given your losses, you may be interested in conducting a hypothesis test. The null hypothesis will be that the dice used in the game are indeed fair and that there is an equal chance of rolling an even or odd number with each roll. Your alternative hypothesis is that the dice are weighted towards landing on odd numbers.

To calculate the p-value, we map all of the possible outcomes of playing four rounds of the game. In each round, there are only two possible outcomes (odd or even), and after four rounds, there are a total of \(2^4\), or 16, outcomes. If we assume the null hypothesis is true (that the dice are fair), each of these outcomes is equally likely, with a probability of 1/16.


Since we are only concerned about the total number of wins and losses, and not concerned at all with their order, the outcomes and probabilities we care about are the following:

  • the probability of getting 4 wins and 0 losses = 1/16
  • the probability of getting 3 wins and 1 loss = 4/16
  • the probability of getting 2 wins and 2 losses = 6/16
  • the probability of getting 1 win and 3 losses = 4/16
  • the probability of getting 0 wins and 4 losses = 1/16

To calculate the p-value, we sum up the following:

  • the probability of the observed outcome (0 wins and 4 losses)
  • the probability of any outcome that is just as likely as the observed outcome (4 wins and 0 losses)
  • the probability of any outcome that is less likely than the observed outcome (in this example, there are no outcomes less likely than the observed outcome, so this value is zero)

p-Value =  1/16 + 1/16 = 1/8 or 0.125

The p-value we found is 0.125. Surprisingly, this is still well above a 0.05 significance level. It is even above a 0.10 (or 10%) significance level. Regardless of which of these thresholds you choose, you must fail to reject the null hypothesis. In other words, despite four losses in a row, the evidence still favors the hypothesis that the dice are fair! It may be a different story if you experience 10 or even 5 losses in a row. Calculate the p-value to find out!
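Because four rounds of the game form a binomial experiment, this p-value can also be reproduced in R with an exact binomial test (a sketch, not part of the original example):

# two-sided exact test: 0 wins in 4 rounds against a fair win probability of 0.5
binom.test(0, 4, p = 0.5)                   # p-value = 0.125

# or, summing the relevant binomial probabilities directly
sum(dbinom(c(0, 4), size = 4, prob = 0.5))  # 1/16 + 1/16 = 0.125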

Calculating p-values for continuous random variables

When the hypothesis test involves a continuous random variable, we use a test statistic and the area under the probability density function to determine the p-value. The intuition behind the p-value is the same as in the discrete case. Assuming that the null hypothesis is true, we are calculating the probability of observing sample data that is at least as extreme as the sample data we have observed.

Let’s take a look at another example.

Say you have an orange grove, and you’re convinced that your oranges now grow larger than when you first started growing citrus. You happen to know that the standard deviation of the weights of your oranges, \(\sigma\), is equal to 0.8 oz. This is the perfect opportunity to conduct a hypothesis test.

Your null hypothesis, in this case, is that the mean weight of your oranges has remained unchanged over the years and is equal to 5 oz (the null hypothesis typically represents the hypothesis that you are trying to move away from). Your alternative hypothesis is that the average weight of your oranges is now greater than 5 oz.

Because you can’t weigh every orange in your grove, you pick a large random sample of oranges (with a sample size of 100), weigh those, and observe that the average weight in your sample, \(\overline{x}\), is equal to 5.2 oz.

Does this result support the null hypothesis or the alternative hypothesis? It’s not immediately clear. By pure chance, you could have had a handful of extra-large oranges in your sample, and this could have pushed your sample mean above a population mean of 5 oz. Alternatively, the sample mean could indicate that the population mean is, in fact, greater than 5 oz. 

Here is where we begin the hypothesis test. We’ll conduct the test at a 0.05 significance level.

We start by asking the following question: Assuming that the null hypothesis is true, how likely or unlikely is it to observe a sample mean \(\overline{x} = 5.2\) oz?

From the central limit theorem, we know that if our sample is randomly drawn and large enough, we can assume that the sampling distribution of the sample mean is normally distributed with a mean equal to the true population mean, \(\mu\), and a standard error equal to \(\sigma/\sqrt{n}\). This means that if the null hypothesis is true, the sampling distribution for the sample mean of our orange weights will be normally distributed, with a mean equal to 5 and a standard error equal to \(0.8/\sqrt{100} = 0.08\).


From here, we can convert our sample mean of 5.2 into what is known as a test statistic. To do this, we use the exact same process we use when calculating standardized units such as z-scores or t-scores. Since we know the sampling distribution is approximately normal, and since we know the population standard deviation \(\sigma\) and the standard error \(\sigma/\sqrt{n}\) of the sampling distribution, we can calculate a Z-test statistic in the same way that we would calculate a z-score. (If we did not know \(\sigma\), we would use the sample standard deviation, s, to calculate a t-test statistic in the same way that we calculate t-scores.)

\(Z=\dfrac{\overline{x}-\mu_0}{\sigma/\sqrt{n}}=\dfrac{5.2-5}{0.08}=2.5\)

The test statistic tells us that if our null hypothesis is true, then our observed sample mean, \(\overline{x}\), is 2.5 standard errors above the mean of the sampling distribution. To put the p-value to work, we can do one of two things.

1. We can calculate the p-value associated with the test statistic. This can be done by finding the area under the standard normal distribution that lies to the right of 2.5. This gives us a p-value of 0.0062. The p-value is telling us that if the null hypothesis is true, we would only observe a sample mean of 5.2 or greater 0.0062 (or 0.62%) of the time. Because this probability is so low, it’s likely that the null hypothesis is false.

Since the p-value of 0.0062 is less than the significance level of 0.05, we can reject the null hypothesis at the 0.05 significance level. We can even reject it at the 0.01 significance level! You’re likely to be right about your oranges: the average weights have likely increased over time.
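If you want to verify these numbers yourself, a minimal R sketch of this z-test (with our own variable names) is:

z = (5.2 - 5) / (0.8 / sqrt(100))   # test statistic = 2.5
pnorm(z, lower.tail = FALSE)        # upper-tail p-value = 0.0062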

2. If you are familiar with standard normal distributions, you may have realized that the significance level of our test (alpha = 0.05) corresponds to the 95th percentile of the standard normal distribution, which is associated with a Z-score of 1.64. Since the test statistic 2.5 lies to the right of this critical value, we can conclude that the p-value will be less than 0.05. This is another way to complete the hypothesis test without having to do additional calculations.

Two-sided, upper-tailed, and lower-tailed hypothesis tests

In the orange grove example above, we conducted an upper-tailed hypothesis test, because the alternative hypothesis \(H_a\) was of the form \(\mu > \mu_0\). It’s important to know, however, how the calculation of p-values differs when you have a two-tailed or a lower-tailed hypothesis test.

For a two-tailed test (when the alternative hypothesis, \(H_a\), stipulates that a population parameter is not equal to some number), the p-value is equal to twice the tail probability associated with the test statistic. If we had conducted a two-tailed test in the orange grove example (\(H_a\): \(\mu \neq 5\)), the p-value would be equal to the probability that the test statistic Z is greater than 2.5 plus the probability that Z is less than −2.5. Because the standard normal is symmetric about the mean, this is equal to 0.0062 × 2 = 0.0124.

For a lower-tailed test (when the alternative hypothesis, \(H_a\), stipulates that a population parameter is less than some number), the process is similar to the upper-tailed test, but the p-value will be the probability of getting a sample statistic that lies to the left of the test statistic, rather than to the right of it.
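Continuing the same sketch, the two-tailed and lower-tailed p-values for a test statistic z would be computed as:

2 * pnorm(abs(z), lower.tail = FALSE)   # two-tailed: 0.0124 for z = 2.5
pnorm(z)                                # lower-tailed: area to the left of z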


P-Value And Statistical Significance: What It Is & Why It Matters

Saul Mcleod, PhD, and Olivia Guy-Evans, MSc

The p-value in statistics quantifies the evidence against a null hypothesis. A low p-value suggests data is inconsistent with the null, potentially favoring an alternative hypothesis. Common significance thresholds are 0.05 or 0.01.


Hypothesis testing

When you perform a statistical test, a p-value helps you determine the significance of your results in relation to the null hypothesis.

The null hypothesis (H0) states no relationship exists between the two variables being studied (one variable does not affect the other). It states the results are due to chance and are not significant in supporting the idea being investigated. Thus, the null hypothesis assumes that whatever you try to prove did not happen.

The alternative hypothesis (Ha or H1) is the one you would believe if the null hypothesis is concluded to be untrue.

The alternative hypothesis states that the independent variable affected the dependent variable, and the results are significant in supporting the theory being investigated (i.e., the results are not due to random chance).

What a p-value tells you

A p-value, or probability value, is a number describing how likely it is that you would have observed your data (or data more extreme) by random chance alone, that is, if the null hypothesis were true.

The level of statistical significance is often expressed as a p-value between 0 and 1.

The smaller the p -value, the less likely the results occurred by random chance, and the stronger the evidence that you should reject the null hypothesis.

Remember, a p-value doesn’t tell you if the null hypothesis is true or false. It just tells you how likely you’d see the data you observed (or more extreme data) if the null hypothesis was true. It’s a piece of evidence, not a definitive proof.

Example: Test Statistic and p-Value

Suppose you’re conducting a study to determine whether a new drug has an effect on pain relief compared to a placebo. If the new drug has no impact, your test statistic will be close to the one predicted by the null hypothesis (no difference between the drug and placebo groups), and the resulting p-value will be close to 1. It may not be precisely 1 because real-world variations may exist. Conversely, if the new drug indeed reduces pain significantly, your test statistic will diverge further from what’s expected under the null hypothesis, and the p-value will decrease. The p-value will never reach zero because there’s always a slim possibility, though highly improbable, that the observed results occurred by random chance.

P-value interpretation

The significance level (alpha) is a set probability threshold (often 0.05), while the p-value is the probability you calculate based on your study or analysis.

A p-value less than or equal to your significance level (typically ≤ 0.05) is statistically significant.

A p-value less than or equal to a predetermined significance level (often 0.05 or 0.01) indicates a statistically significant result, meaning the observed data provide strong evidence against the null hypothesis.

This suggests the effect under study likely represents a real relationship rather than just random chance.

For instance, if you set α = 0.05, you would reject the null hypothesis if your p -value ≤ 0.05. 

It indicates strong evidence against the null hypothesis, as there would be less than a 5% probability of seeing results this extreme if the null hypothesis were correct (i.e., if the results were due to random chance alone).

Therefore, we reject the null hypothesis in favor of the alternative hypothesis.

Example: Statistical Significance

Upon analyzing the pain relief effects of the new drug compared to the placebo, the computed p-value is less than 0.01, which falls well below the predetermined alpha value of 0.05. Consequently, you conclude that there is a statistically significant difference in pain relief between the new drug and the placebo.

What does a p-value of 0.001 mean?

A p-value of 0.001 is highly statistically significant beyond the commonly used 0.05 threshold. It indicates strong evidence of a real effect or difference, rather than just random variation.

Specifically, a p-value of 0.001 means there is only a 0.1% chance of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is correct.

Such a small p-value provides strong evidence against the null hypothesis, leading to rejecting the null in favor of the alternative hypothesis.

A p-value greater than the significance level (typically p > 0.05) is not statistically significant and indicates that the data do not provide strong evidence against the null hypothesis.

This means we retain the null hypothesis and do not accept the alternative hypothesis. Note that you cannot accept the null hypothesis; you can only reject it or fail to reject it.

Note : when the p-value is above your threshold of significance,  it does not mean that there is a 95% probability that the alternative hypothesis is true.

One-Tailed Test

[Figure: rejection region in a single tail of the distribution]

Two-Tailed Test

[Figure: rejection region split between the two tails of the distribution]

How do you calculate the p-value?

Most statistical software packages like R, SPSS, and others automatically calculate your p-value. This is the easiest and most common way.

Online resources and tables are available to estimate the p-value based on your test statistic and degrees of freedom.

These tables help you understand how often you would expect to see your test statistic under the null hypothesis.
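As a sketch of what those lookups amount to in code, R's distribution functions return the same tail areas; the statistics and degrees of freedom below are made-up values for illustration only:

2 * pnorm(abs(1.96), lower.tail = FALSE)        # two-sided p from a z statistic: ~0.05
2 * pt(abs(2.06), df = 24, lower.tail = FALSE)  # two-sided p from a t statistic: ~0.05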

Understanding the Statistical Test

Different statistical tests are designed to answer specific research questions or hypotheses. Each test has its own underlying assumptions and characteristics.

For example, you might use a t-test to compare means, a chi-squared test for categorical data, or a correlation test to measure the strength of a relationship between variables.

Be aware that the number of independent variables you include in your analysis can influence the magnitude of the test statistic needed to produce the same p-value.

This factor is particularly important to consider when comparing results across different analyses.

Example: Choosing a Statistical Test

If you’re comparing the effectiveness of just two different drugs in pain relief, a two-sample t-test is a suitable choice for comparing these two groups. However, when you’re examining the impact of three or more drugs, it’s more appropriate to employ an Analysis of Variance ( ANOVA) . Utilizing multiple pairwise comparisons in such cases can lead to artificially low p-values and an overestimation of the significance of differences between the drug groups.
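To illustrate that advice, here is a hedged R sketch with invented pain ratings for three drug groups, where a single ANOVA replaces the three pairwise t-tests:

pain = c(4, 5, 6, 7, 3, 4, 5, 6, 2, 3, 4, 5)    # hypothetical pain ratings
drug = factor(rep(c("A", "B", "C"), each = 4))  # three drug groups
summary(aov(pain ~ drug))                       # one overall test instead of multiple pairwise tests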

How to report

A statistically significant result cannot prove that a research hypothesis is correct (which implies 100% certainty).

Instead, we may state our results “provide support for” or “give evidence for” our research hypothesis (as there is still a slight probability that the results occurred by chance and the null hypothesis was correct – e.g., less than 5%).

Example: Reporting the results

In our comparison of the pain relief effects of the new drug and the placebo, we observed that participants in the drug group experienced a significant reduction in pain ( M = 3.5; SD = 0.8) compared to those in the placebo group ( M = 5.2; SD  = 0.7), resulting in an average difference of 1.7 points on the pain scale (t(98) = -9.36; p < 0.001).

The 6th edition of the APA style manual (American Psychological Association, 2010) states the following on the topic of reporting p-values:

“When reporting p values, report exact p values (e.g., p = .031) to two or three decimal places. However, report p values less than .001 as p < .001.

The tradition of reporting p values in the form p < .10, p < .05, p < .01, and so forth, was appropriate in a time when only limited tables of critical values were available.” (p. 114)

  • Do not use 0 before the decimal point for the statistical value p, as it cannot be greater than 1. In other words, write p = .001 instead of p = 0.001.
  • Please pay attention to issues of italics ( p is always italicized) and spacing (either side of the = sign).
  • p = .000 (as outputted by some statistical packages such as SPSS) is impossible and should be written as p < .001.
  • The opposite of significant is “nonsignificant,” not “insignificant.”

Why is the p-value not enough?

A lower p-value  is sometimes interpreted as meaning there is a stronger relationship between two variables.

However, statistical significance means only that the observed results would be unlikely (e.g., a less than 5% probability) if the null hypothesis were true.

To understand the strength of the difference between the two groups (control vs. experimental) a researcher needs to calculate the effect size .

When do you reject the null hypothesis?

In statistical hypothesis testing, you reject the null hypothesis when the p-value is less than or equal to the significance level (α) you set before conducting your test. The significance level is the probability of rejecting the null hypothesis when it is true. Commonly used significance levels are 0.01, 0.05, and 0.10.

Remember, rejecting the null hypothesis doesn’t prove the alternative hypothesis; it just suggests that the alternative hypothesis may be plausible given the observed data.

The p -value is conditional upon the null hypothesis being true but is unrelated to the truth or falsity of the alternative hypothesis.

What does p-value of 0.05 mean?

If your p-value is less than or equal to 0.05 (the significance level), you would conclude that your result is statistically significant. This means the evidence is strong enough to reject the null hypothesis in favor of the alternative hypothesis.

Are all p-values below 0.05 considered statistically significant?

No, not all p-values below 0.05 are considered statistically significant. The threshold of 0.05 is commonly used, but it’s just a convention. Statistical significance depends on factors like the study design, sample size, and the magnitude of the observed effect.

A p-value below 0.05 means there is evidence against the null hypothesis, suggesting a real effect. However, it’s essential to consider the context and other factors when interpreting results.

Researchers also look at effect size and confidence intervals to determine the practical significance and reliability of findings.

How does sample size affect the interpretation of p-values?

Sample size can impact the interpretation of p-values. A larger sample size provides more reliable and precise estimates of the population, leading to narrower confidence intervals.

With a larger sample, even small differences between groups or effects can become statistically significant, yielding lower p-values. In contrast, smaller sample sizes may not have enough statistical power to detect smaller effects, resulting in higher p-values.

Therefore, a larger sample size increases the chances of finding statistically significant results when there is a genuine effect, making the findings more trustworthy and robust.

Can a non-significant p-value indicate that there is no effect or difference in the data?

No, a non-significant p-value does not necessarily indicate that there is no effect or difference in the data. It means that the observed data do not provide strong enough evidence to reject the null hypothesis.

There could still be a real effect or difference, but it might be smaller or more variable than the study was able to detect.

Other factors like sample size, study design, and measurement precision can influence the p-value. It’s important to consider the entire body of evidence and not rely solely on p-values when interpreting research findings.

Can p-values be exactly zero?

While a p-value can be extremely small, it cannot technically be absolute zero. When a p-value is reported as p = 0.000, the actual p-value is too small for the software to display. This is often interpreted as strong evidence against the null hypothesis. For p-values less than 0.001, report them as p < .001.
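For what it's worth, base R's format.pval function applies exactly this convention (a small illustration, not from the source):

format.pval(2.2e-16, eps = 0.001)   # "<0.001" -- never printed as 0.000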

Further Information

  • P-values and significance tests (Khan Academy)
  • Hypothesis testing and p-values (Khan Academy)
  • Wasserstein, R. L., Schirm, A. L., & Lazar, N. A. (2019). Moving to a world beyond "p < 0.05".
  • Criticism of using the "p < 0.05" threshold.
  • Publication manual of the American Psychological Association
  • Statistics for Psychology Book Download



Summary and Analysis of Extension Program Evaluation in R

Salvatore S. Mangiafico

Hypothesis Testing and p-values


Initial comments

Traditionally when students first learn about the analysis of experiments, there is a strong focus on hypothesis testing and making decisions based on p-values. Hypothesis testing is important for determining if there are statistically significant effects. However, readers of this book should not place undue emphasis on p-values. Instead, they should realize that p-values are affected by sample size, and that a low p-value does not necessarily suggest a large effect or a practically meaningful effect. Summary statistics, plots, effect size statistics, and practical considerations should be used. The goal is to determine: a) statistical significance, b) effect size, c) practical importance. These are all different concepts, and they will be explored below.

Statistical inference

Most of what we’ve covered in this book so far is about producing descriptive statistics: calculating means and medians, plotting data in various ways, and producing confidence intervals.  The bulk of the rest of this book will cover statistical inference:  using statistical tests to draw some conclusion about the data.  We’ve already done this a little bit in earlier chapters by using confidence intervals to conclude if means are different or not among groups.

As Dr. Nic mentions in her article in the “References and further reading” section, this is the part where people sometimes get stumped.  It is natural for most of us to use summary statistics or plots, but jumping to statistical inference needs a little change in perspective.  The idea of using some statistical test to answer a question isn’t a difficult concept, but some of the following discussion gets a little theoretical.  The video from the Statistics Learning Center in the “References and further reading” section does a good job of explaining the basis of statistical inference.

One important thing to gain from this chapter is an understanding of how to use the p -value, alpha , and decision rule to test the null hypothesis.  But once you are comfortable with that, you will want to return to this chapter to have a better understanding of the theory behind this process.

Another important thing is to understand the limitations of relying on p -values, and why it is important to assess the size of effects and weigh practical considerations.

Packages used in this chapter

The packages used in this chapter include: lsr

The following commands will install these packages if they are not already installed:

if(!require(lsr)){install.packages("lsr")}

Hypothesis testing

The null and alternative hypotheses

The statistical tests in this book rely on testing a null hypothesis, which has a specific formulation for each test.  The null hypothesis always describes the case where e.g. two groups are not different or there is no correlation between two variables, etc.

The alternative hypothesis is the contrary of the null hypothesis, and so describes the cases where there is a difference among groups or a correlation between two variables, etc.

Notice that the definitions of null hypothesis and alternative hypothesis have nothing to do with what you want to find or don't want to find, or what is interesting or not interesting, or what you expect to find or what you don’t expect to find.  If you were comparing the height of men and women, the null hypothesis would be that the height of men and the height of women were not different.  Yet, you might find it surprising if you found this hypothesis to be true for some population you were studying.  Likewise, if you were studying the income of men and women, the null hypothesis would be that the income of men and women are not different, in the population you are studying.  In this case you might be hoping the null hypothesis is true, though you might be unsurprised if the alternative hypothesis were true.  In any case, the null hypothesis will take the form that there is no difference between groups, there is no correlation between two variables, or there is no effect of this variable in our model.

p -value definition

Most of the tests in this book rely on using a statistic called the p -value to evaluate if we should reject, or fail to reject, the null hypothesis.

Given the assumption that the null hypothesis is true , the p -value is defined as the probability of obtaining a result equal to or more extreme than what was actually observed in the data.

We’ll unpack this definition in a little bit.

Decision rule

The p -value for the given data will be determined by conducting the statistical test.

This p -value is then compared to a pre-determined value alpha .  Most commonly, an alpha value of 0.05 is used, but there is nothing magic about this value.

If the p -value for the test is less than alpha , we reject the null hypothesis.

If the p -value is greater than or equal to alpha , we fail to reject the null hypothesis.

Coin flipping example

For an example of using the p -value for hypothesis testing, imagine you have a coin you will toss 100 times.  The null hypothesis is that the coin is fair—that is, that it is equally likely that the coin will land on heads as land on tails.  The alternative hypothesis is that the coin is not fair.  Let’s say for this experiment you throw the coin 100 times and it lands on heads 95 times out of those hundred.  The p -value in this case would be the probability of getting 95, 96, 97, 98, 99, or 100 heads, or 0, 1, 2, 3, 4, or 5 heads, assuming that the null hypothesis is true . 

This is what we call a two-sided test, since we are testing both extremes suggested by our data:  getting 95 or greater heads or getting 95 or greater tails.  In most cases we will use two sided tests.

You can imagine that the p -value for this data will be quite small.  If the null hypothesis is true, and the coin is fair, there would be a low probability of getting 95 or more heads or 95 or more tails.

Using a binomial test, the p -value is < 0.0001.

(Actually, R reports it as < 2.2e-16, which is shorthand for the number in scientific notation, 2.2 × 10^-16, which is 0.00000000000000022, with 15 zeros after the decimal point.)

Assuming an alpha of 0.05, since the p -value is less than alpha , we reject the null hypothesis.  That is, we conclude that the coin is not fair.

binom.test(5, 100, 0.5)

Exact binomial test

number of successes = 5, number of trials = 100, p-value < 2.2e-16
alternative hypothesis: true probability of success is not equal to 0.5

Passing and failing example

As another example, imagine we are considering two classrooms, and we have counts of students who passed a certain exam.  We want to know if one classroom had statistically more passes or failures than the other.

In our example each classroom will have 10 students.  The data is arranged into a contingency table.

Classroom   Passed   Failed
A               8        2
B               3        7

We will use Fisher’s exact test to test if there is an association between Classroom and the counts of passed and failed students.  The null hypothesis is that there is no association between Classroom and Passed/Failed , based on the relative counts in each cell of the contingency table.

Input =("
 Classroom  Passed  Failed
 A          8       2
 B          3       7
")

Matrix = as.matrix(read.table(textConnection(Input),
                              header=TRUE,
                              row.names=1))

Matrix

  Passed Failed
A      8      2
B      3      7

fisher.test(Matrix)

Fisher's Exact Test for Count Data

p-value = 0.06978

The reported p -value is 0.070.  If we use an alpha of 0.05, then the p -value is greater than alpha , so we fail to reject the null hypothesis.  That is, we did not have sufficient evidence to say that there is an association between Classroom and Passed/Failed .

More extreme data in this case would be if the counts in the upper left or lower right (or both!) were greater. 

Classroom   Passed   Failed
A               9        1
B               3        7

Classroom   Passed   Failed
A              10        0
B               3        7

and so on, with Classroom B...

In most cases we would want to consider as "extreme" not only the results when Classroom A has a high frequency of passing students, but also results when Classroom B has a high frequency of passing students.  This is called a two-sided or two-tailed test.  If we were only concerned with one classroom having a high frequency of passing students, relatively, we would instead perform a one-sided test.  The default for the fisher.test function is two-sided, and usually you will want to use two-sided tests.

Classroom   Passed   Failed
A               2        8
B               7        3

Classroom   Passed   Failed
A               1        9
B               7        3

Classroom   Passed   Failed
A               0       10
B               7        3

and so on, with Classroom B...

In both cases, "extreme" means there is a stronger association between Classroom and Passed/Failed .

Theory and practice of using p -values

Wait, does this make any sense?

Recall that the definition of the p-value is: given the assumption that the null hypothesis is true, the probability of obtaining a result equal to or more extreme than what was actually observed in the data.

The astute reader might be asking herself, “If I’m trying to determine if the null hypothesis is true or not, why would I start with the assumption that the null hypothesis is true?  And why am I using a probability of getting certain data given that a hypothesis is true?  Don’t I want to instead determine the probability of the hypothesis given my data?”

The answer is yes , we would like a method to determine the likelihood of our hypothesis being true given our data, but we use the Null Hypothesis Significance Test approach since it is relatively straightforward, and has wide acceptance historically and across disciplines.

In practice we do use the results of the statistical tests to reach conclusions about the null hypothesis.

Technically, the p -value says nothing about the alternative hypothesis.  But logically, if the null hypothesis is rejected, then its logical complement, the alternative hypothesis, is supported.  Practically, this is how we handle significant p -values, though this practical approach generates disapproval in some theoretical circles.

Statistics is like a jury?

Note the language used when testing the null hypothesis.  Based on the results of our statistical tests, we either reject the null hypothesis, or fail to reject the null hypothesis.

This is somewhat similar to the approach of a jury in a trial.  The jury either finds sufficient evidence to declare someone guilty, or fails to find sufficient evidence to declare someone guilty. 

Failing to convict someone isn’t necessarily the same as declaring someone innocent.  Likewise, if we fail to reject the null hypothesis, we shouldn’t assume that the null hypothesis is true.  It may be that we didn’t have sufficient samples to get a result that would have allowed us to reject the null hypothesis, or maybe there are some other factors affecting the results that we didn’t account for.  This is similar to an “innocent until proven guilty” stance.

Errors in inference

For the most part, the statistical tests we use are based on probability, and our data could always be the result of chance.  Considering the coin flipping example above, if we did flip a coin 100 times and came up with 95 heads, we would be compelled to conclude that the coin was not fair.  But 95 heads could happen with a fair coin strictly by chance.

We can, therefore, make two kinds of errors in testing the null hypothesis:

•  A Type I error occurs when the null hypothesis really is true, but based on our decision rule we reject the null hypothesis.  In this case, our result is a false positive; we think there is an effect (unfair coin, association between variables, difference among groups) when really there isn't.  The probability of making this kind of error is alpha, the same alpha we used in our decision rule.

•  A Type II error occurs when the null hypothesis is really false, but based on our decision rule we fail to reject the null hypothesis.  In this case, our result is a false negative ; we have failed to find an effect that really does exist.  The probability of making this kind of error is called beta .

The following table summarizes these errors.

                          Reality
                          ___________________________________
Decision of test          Null is true           Null is false

Reject null hypothesis    Type I error           Correctly reject null
                          (prob. = alpha)        (prob. = 1 - beta)

Retain null hypothesis    Correctly retain null  Type II error
                          (prob. = 1 - alpha)    (prob. = beta)

Statistical power

The statistical power of a test is a measure of the ability of the test to detect a real effect.  It is related to the effect size, the sample size, and our chosen alpha level. 

The effect size is a measure of how unfair a coin is, how strong the association is between two variables, or how large the difference is among groups.  As the effect size increases or as the number of observations we collect increases, or as the alpha level increases, the power of the test increases.

Statistical power in the table above is indicated by 1 – beta , and power is the probability of correctly rejecting the null hypothesis.

An example should make these relationships clear.  Imagine we are sampling a large group of 7th-grade students for their height.  That is, the group is the population, and we are sampling a sub-set of these students.  In reality, for students in the population, the girls are taller than the boys, but the difference is small (that is, the effect size is small), and there is a lot of variability in students' heights.  You can imagine that in order to detect the difference between girls and boys we would have to measure many students.  If we fail to sample enough students, we might make a Type II error.  That is, we might fail to detect the actual difference in heights between sexes.

If we had a different experiment with a larger effect size—for example the weight difference between mature hamsters and mature hedgehogs—we might need fewer samples to detect the difference.

Note also that our chosen alpha plays a role in the power of our test, too.  All things being equal, across many tests, if we decrease our alpha, that is, insist on a lower rate of Type I errors, we are more likely to commit a Type II error, and so have a lower power.  This is analogous to a case of a meticulous jury that has a very high standard of proof to convict someone.  In this case, the likelihood of a false conviction is low, but the likelihood of letting a guilty person go free is relatively high.
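As a rough illustration in R (the sample size, effect size, and standard deviation below are invented for demonstration), power.t.test shows power falling as alpha is lowered:

power.t.test(n = 10, delta = 100, sd = 150, sig.level = 0.05)$power   # higher power
power.t.test(n = 10, delta = 100, sd = 150, sig.level = 0.01)$power   # lower power at the stricter alpha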

The 0.05 alpha value is not dogma

The level of alpha is traditionally set at 0.05 in some disciplines, though there is sometimes reason to choose a different value.

One situation in which the alpha level is increased is in preliminary studies in which it is better to include potentially significant effects even if there is not strong evidence for keeping them.  In this case, the researcher is accepting an inflated chance of Type I errors in order to decrease the chance of Type II errors.

Imagine an experiment in which you wanted to see if various environmental treatments would improve student learning.  In a preliminary study, you might have many treatments, with few observations each, and you want to retain any potentially successful treatments for future study.  For example, you might try playing classical music, improved lighting, complimenting students, and so on, and see if there is any effect on student learning.  You might relax your alpha value to 0.10 or 0.15 in the preliminary study to see what treatments to include in future studies.

On the other hand, in situations where a Type I, false positive, error might be costly in terms of money or people’s health, a lower alpha can be used, perhaps, 0.01 or 0.001.  You can imagine a case in which there is an established treatment for cancer, and a new treatment is being tested.  Because the new treatment is likely to be expensive and to hold people’s lives in the balance, a researcher would want to be very sure that the new treatment is more effective than the established treatment.  In reality, the researchers would not just lower the alpha level, but also look at the effect size, submit the research for peer review, replicate the study, be sure there were no problems with the design of the study or the data collection, and weigh the practical implications.

The 0.05 alpha value is almost dogma

In theory, as a researcher, you would determine the alpha level you feel is appropriate.  That is, the probability of making a Type I error when the null hypothesis is in fact true. 

In reality, though, 0.05 is almost always used in most fields for readers of this book.  Choosing a different alpha value will rarely go without question.  It is best to keep with the 0.05 level unless you have good justification for another value, or are in a discipline where other values are routinely used.

Practical advice

One good practice is to report actual p-values from analyses.  It is fine to also simply say, e.g., "The dependent variable was significantly correlated with variable A (p < 0.05)."  But I prefer when possible to say, "The dependent variable was significantly correlated with variable A (p = 0.026)."

It is probably best to avoid using terms like "marginally significant" or "borderline significant" for p-values less than 0.10 but greater than 0.05, though you might encounter similar phrases.  It is better to simply report the p-values of tests or effects in a straightforward manner.  If you had cause to include certain model effects or results from other tests, they can be reported as e.g., "Variables correlated with the dependent variable with p < 0.15 were A, B, and C."

Is the null hypothesis ever really true?

Considering some of the examples presented, it may have occurred to the reader to ask if the null hypothesis is ever really true.  For example, in some population of 7th graders, if we could measure everyone in the population to a high degree of precision, then there must be some difference in height between girls and boys.  This is an important limitation of null hypothesis significance testing.  Often, if we have many observations, even small effects will be reported as significant.  This is one reason why it is important to not rely too heavily on p-values, but to also look at the size of the effect and practical considerations.  In this example, if we sampled many students and the difference in heights was 0.5 cm, even if significant, we might decide that this effect is too small to be of practical importance, especially relative to an average height of 150 cm.  (Here, the difference would be 0.3% of the average height.)

Effect sizes and practical importance

Practical importance and statistical significance

It is important to remember to not let p -values be the only guide for drawing conclusions.  It is equally important to look at the size of the effects you are measuring, as well as take into account other practical considerations like the costs of choosing a certain path of action.

For example, imagine we want to compare the SAT scores of two SAT preparation classes with a t -test.

Class.A = c(1500, 1505, 1505, 1510, 1510, 1510, 1515, 1515, 1520, 1520)
Class.B = c(1510, 1515, 1515, 1520, 1520, 1520, 1525, 1525, 1530, 1530)

t.test(Class.A, Class.B)

Welch Two Sample t-test

t = -3.3968, df = 18, p-value = 0.003214

mean of x mean of y
     1511      1521

The p -value is reported as 0.003, so we would consider there to be a significant difference between the two classes ( p < 0.05).

But we have to ask ourselves the practical question, is a difference of 10 points on the SAT large enough for us to care about?  What if enrolling in one class costs significantly more than the other class?  Is it worth the extra money for a difference of 10 points on average?

Sizes of effects

It should be remembered that p -values do not indicate the size of the effect being studied.  It shouldn’t be assumed that a small p -value indicates a large difference between groups, or vice-versa. 

For example, in the SAT example above, the p -value is fairly small, but the size of the effect (difference between classes) in this case is relatively small (10 points, especially small relative to the range of scores students receive on the SAT).

Conversely, the size of the effect could be relatively large, but if there is a lot of variability in the data or the sample size is not large enough, the p-value could be relatively large.

In this example, the SAT scores differ by 100 points between classes, but because the variability is greater than in the previous example, the p-value does not reach significance.

Class.C = c(1000, 1100, 1200, 1250, 1300, 1300, 1400, 1400, 1450, 1500)
Class.D = c(1100, 1200, 1300, 1350, 1400, 1400, 1500, 1500, 1550, 1600)

t.test(Class.C, Class.D)

Welch Two Sample t-test

t = -1.4174, df = 18, p-value = 0.1735

mean of x mean of y
     1290      1390

boxplot(cbind(Class.C, Class.D))

[Box plots of Class.C and Class.D]

p -values and sample sizes

It should also be remembered that p -values are affected by sample size.   For a given effect size and variability in the data, as the sample size increases, the p -value is likely to decrease.  For large data sets, small effects can result in significant p -values.

As an example, let’s take the data from Class.C and Class.D and double the number of observations for each without changing the distribution of the values in each, and rename them Class.E and Class.F .

Class.E = c(1000, 1100, 1200, 1250, 1300, 1300, 1400, 1400, 1450, 1500,
            1000, 1100, 1200, 1250, 1300, 1300, 1400, 1400, 1450, 1500)
Class.F = c(1100, 1200, 1300, 1350, 1400, 1400, 1500, 1500, 1550, 1600,
            1100, 1200, 1300, 1350, 1400, 1400, 1500, 1500, 1550, 1600)

t.test(Class.E, Class.F)

Welch Two Sample t-test

t = -2.0594, df = 38, p-value = 0.04636

mean of x mean of y
     1290      1390

boxplot(cbind(Class.E, Class.F))

Notice that the p -value is lower for the t -test for Class.E and Class.F than it was for Class.C and Class.D .  Also notice that the means reported in the output are the same, and the box plots would look the same.

Effect size statistics

One way to account for the effect of sample size on our statistical tests is to consider effect size statistics.  These statistics reflect the size of the effect in a standardized way, and are unaffected by sample size.

An appropriate effect size statistic for a t -test is Cohen’s d .  It takes the difference in means between the two groups and divides by the pooled standard deviation of the groups.  Cohen’s d equals zero if the means are the same, and increases to infinity as the difference in means increases relative to the standard deviation.

In the following, note that Cohen’s d is not affected by the difference in sample size between the Class.C / Class.D and the Class.E / Class.F examples.

library(lsr)

cohensD(Class.C, Class.D,
        method = "raw")

cohensD(Class.E, Class.F,
        method = "raw")
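As a check on the arithmetic (a sketch of the computation, not the lsr source code; it assumes, per the lsr documentation, that method = "raw" pools the standard deviation with a divisor of N rather than N - 1): for these data d is the 100-point difference in means divided by that pooled standard deviation, roughly 0.67 for both pairs of classes.

d.raw = function(x, y) {
  # pooled standard deviation with divisor N (the "raw" method)
  sd.raw = sqrt((sum((x - mean(x))^2) + sum((y - mean(y))^2)) /
                (length(x) + length(y)))
  abs(mean(x) - mean(y)) / sd.raw
}

d.raw(Class.C, Class.D)   # should match cohensD(Class.C, Class.D, method = "raw")
d.raw(Class.E, Class.F)   # identical: duplicating every observation changes nothing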

Effect size statistics are standardized so that they are not affected by the units of measurements of the data.  This makes them interpretable across different situations, or if the reader is not familiar with the units of measurement in the original data.  A Cohen’s d of 1 suggests that the two means differ by one pooled standard deviation.  A Cohen’s d of 0.5 suggests that the two means differ by one-half the pooled standard deviation.

For example, if we create new variables— Class.G and Class.H —that are the SAT scores from the previous example expressed as a proportion of a 1600 score, Cohen’s d will be the same as in the previous example.

Class.G = Class.E / 1600
Class.H = Class.F / 1600

Class.G
Class.H

cohensD(Class.G, Class.H,
        method = "raw")

Good practices for statistical analyses

Statistics is not like a trial

When analyzing data, the analyst should not approach the task as would a lawyer for the prosecution.  That is, the analyst should not be searching for significant effects and tests, but should instead be like an independent investigator using lines of evidence to find out what is most likely to be true given the data, graphical analysis, and statistical analysis available.

The problem of multiple p -values

One concept that will be important in the following discussion is that when there are multiple tests producing multiple p-values, there is an inflation of the Type I error rate.  That is, there is a higher chance of making false-positive errors.

This follows directly from the definition of alpha.  If we allow a probability of 0.05, or a 5% chance, of making a Type I error on any one test, then as we do more and more tests, the chance that at least one of them produces a false positive becomes greater and greater.
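The arithmetic is easy to verify.  Assuming independent tests, the chance of at least one false positive among k tests at alpha = 0.05 is 1 - (1 - 0.05)^k:

alpha = 0.05
k = c(1, 5, 10, 20)

round(1 - (1 - alpha)^k, 2)   # 0.05 0.23 0.40 0.64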

p -value adjustment

One way we deal with the problem of multiple p -values in statistical analyses is to adjust p -values when we do a series of tests together (for example, if we are comparing the means of multiple groups).

Don’t use Bonferroni adjustments

There are various p -value adjustments available in R.  In some cases, we will use FDR, which stands for false discovery rate , and in R is an alias for the Benjamini and Hochberg method.  There are also cases in which we’ll use Tukey range adjustment to correct for the family-wise error rate. 

Unfortunately, students in analysis-of-experiments courses often learn to use the Bonferroni adjustment for p-values.  This method is simple to do with hand calculations, but it is excessively conservative in most situations and, in my opinion, antiquated.

There are other p -value adjustment methods, and the choice of which one to use is dictated either by which are common in your field of study, or by doing enough reading to understand which are statistically most appropriate for your application.
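In R, the base function p.adjust() implements several of these corrections, including the Benjamini and Hochberg false discovery rate ("fdr" or "BH") and Bonferroni.  A minimal sketch, with made-up p-values:

p = c(0.005, 0.011, 0.02, 0.04, 0.13)   # hypothetical raw p-values

p.adjust(p, method = "fdr")          # Benjamini-Hochberg false discovery rate
p.adjust(p, method = "bonferroni")   # simple, but excessively conservative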

Preplanned tests

The statistical tests covered in this book assume that tests are preplanned for their p-values to be accurate.  That is, in theory, you set out an experiment, collect the data as planned, analyze it with the kind of model and post-hoc tests you had decided on in advance, report those results, and that’s all you would do.

Some authors emphasize this idea of preplanned tests.  In contrast is an exploratory data analysis approach that relies upon examining the data with plots and using simple tests like correlation tests to suggest what statistical analysis makes sense.

If an experiment is set out in a specific design, then usually it is appropriate to use the analysis suggested by this design.

p -value hacking

It is important, when approaching data from an exploratory angle, to avoid p-value hacking.  Imagine the case in which the researcher collects many different measurements across a range of subjects.  The researcher might be tempted to simply try different tests and models to relate one variable to another, for all the variables, and to continue doing so until a test produced a significant p-value.

But this would be a form of p -value hacking.

Because an alpha value of 0.05 allows us to make a false-positive error five percent of the time, finding one p -value below 0.05 after several successive tests may simply be due to chance.

Some forms of p-value hacking are more egregious: for example, collecting some data, running a test, and then continuing to collect data and run tests iteratively until a significant p-value is found.
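A small simulation (a sketch with invented settings) shows how this plays out.  Here every null hypothesis is true, yet checking ten outcome variables per “study” yields at least one significant result about 40% of the time:

set.seed(2)
one.study = function() {
  p = replicate(10, t.test(rnorm(20), rnorm(20))$p.value)   # ten tests, no real effects
  any(p < 0.05)
}

mean(replicate(1000, one.study()))   # close to 1 - 0.95^10, about 0.40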

Publication bias

A related issue in science is that there is a bias to publish, or to report, only significant results.  This can also lead to an inflation of the false-positive rate.  As a hypothetical example, imagine if there are currently 20 similar studies being conducted testing a similar effect—let’s say the effect of glucosamine supplements on joint pain.  If 19 of those studies found no effect and so were discarded, but one study found an effect using an alpha of 0.05, and was published, is this really any support that glucosamine supplements decrease joint pain?

Clarification of terms and reporting on assignments

"statistically significant".

In the context of this book, the term "significant" means "statistically significant". 

Whenever the decision rule finds that p < alpha , the difference in groups, the association, or the correlation under consideration is then considered "statistically significant" or "significant". 

No effect size or practical considerations enter into determining whether an effect is “significant” or not.  The only caveat is that test assumptions and requirements for appropriate data must also be met in order for the p-value to be valid.

What you need to consider :

 •  The null hypothesis

 •  p, alpha, and the decision rule

 •  Your result.  That is, whether the difference in groups, the association, or the correlation is significant or not.

What you should report on your assignments:

•  The p -value

•  The conclusion, e.g. "There was a significant difference in the mean heights of boys and girls in the class." It is best to preface this with the "reject" or "fail to reject" language concerning your decision about the null hypothesis.

“Size of the effect” / “effect size”

In the context of this book, I use the term "size of the effect" to suggest the use of summary statistics to indicate how large an effect is.  This may be, for example, the difference in two medians.  I try to reserve the term “effect size” to refer to the use of effect size statistics.  This distinction isn’t necessarily common.

Usually you will consider an effect in relation to the magnitude of measurements.  That is, you might look at the difference in medians as a percent of the median of one group or of the global median.  Or, you might look at the difference in medians in relation to the range of answers.  For example, a one-point difference on a 5-point Likert item.  Counts might be expressed as proportions of totals or subsets.

What you should report on assignments :

 •  The size of the effect.  That is, the difference in medians or means, the difference in counts, or the  proportions of counts among groups.

 •  Where appropriate, the size of the effect expressed as a percentage or proportion.

•  If there is an effect size statistic (such as r, epsilon-squared, phi, Cramér's V, or Cohen's d), report it and its interpretation (small, medium, large), and incorporate this into your conclusion.

"Practical" / "Practical importance"

If there is a significant result, the question of practical importance asks if the difference or association is large enough to matter in the real world.

If there is no significant result, the question of practical importance asks if the difference or association is large enough to warrant another look, for example by running another test with a larger sample size or one that better controls the variability in observations.

What you should report on assignments:

•  Your conclusion as to whether this effect is large enough to be important in the real world.

•  The context, explanation, or support to justify your conclusion.

•  In some cases you might include considerations that aren't included in the data presented.  Examples might include the cost of one treatment over another, including time investment, or whether there is a large risk in selecting one treatment over another (e.g., if people's lives are on the line).

A few xkcd comics

Significant: xkcd.com/882/

Null hypothesis: xkcd.com/892/

P-values: xkcd.com/1478/

Experiments, sampling, and causation

Types of experimental designs

Experimental designs

A true experimental design assigns treatments in a systematic manner.  The experimenter must be able to manipulate the experimental treatments and assign them to subjects.  Since treatments are randomly assigned to subjects, a causal inference can be made for significant results.  That is, we can say that the variation in the dependent variable is caused by the variation in the independent variable.

For interval/ratio data, traditional experimental designs can be analyzed with specific parametric models, assuming other model assumptions are met.  These traditional experimental designs include:

•  Completely randomized design

•  Randomized complete block design

•  Factorial

•  Split-plot

•  Latin square

Quasi-experiment designs

Often a researcher cannot assign treatments to individual experimental units, but can assign treatments to groups.  For example, if students are in a specific grade or class, it would not be practical to randomly assign students to grades or classes.  But different classes could receive different treatments (such as different curricula).  Causality can be inferred cautiously if treatments are randomly assigned and there is some understanding of the factors that affect the outcome.

Observational studies

In observational studies, the independent variables are not manipulated, and no treatments are assigned.  Surveys are often like this, as are studies of natural systems without experimental manipulation.  Statistical analysis can reveal the relationships among variables, but causality cannot be inferred.  This is because there may be other unstudied variables that affect the measured variables in the study.

Sampling

Good sampling practices are critical for producing good data.  In general, samples need to be collected in a random fashion so that bias is avoided.

In survey data, bias is often introduced by self-selection.  For example, internet or telephone surveys include only those who respond to these requests.  Might there be some relevant difference in the variables of interest between those who respond to such requests and the general population being surveyed?  Or bias could be introduced by the researcher selecting some subset of potential subjects, for example surveying only a 4-H program with particularly cooperative students and ignoring other clubs.  This is sometimes called “convenience sampling”.

In election forecasting, good pollsters need to account for selection bias and other biases in the survey process.  For example, if a survey is done by landline telephone, those being surveyed are more likely to be older than the general population of voters, and so likely to have a bias in their voting patterns.

Plan ahead and be consistent

It is sometimes necessary to change experimental conditions during the course of an experiment.  Equipment might fail, or unusual weather may prevent making meaningful measurements.

But in general, it is much better to plan ahead and be consistent with measurements. 

Consistency

People sometimes have the tendency to change measurement frequency or experimental treatments during the course of a study.  This inevitably causes headaches in trying to analyze data, and makes writing up the results messy.  Try to avoid this.

Controls and checks

If you are testing an experimental treatment, include a check treatment that almost certainly will have an effect and a control treatment that almost certainly won’t.  A control treatment receives no treatment, and a check treatment receives a treatment known to be successful.  In an educational setting, perhaps the control group receives no instruction on the topic (but instruction on another topic), and the check group receives standard instruction.

Including checks and controls helps with the analysis in a practical sense, since they serve as standard treatments against which to compare the experimental treatments.  In the case where the experimental treatments have similar effects, controls and checks allow you to say, for example, “Means for all the experimental treatments were similar, but were higher than the mean for the control, and lower than the mean for the check treatment.”

Include alternate measurements

It often happens that measuring equipment fails or that a certain measurement doesn’t produce the expected results.  It is therefore helpful to include measurements of several variables that can capture the potential effects.  Perhaps test scores of students won’t show an effect, but a self-assessment question on how much students learned will.

Include covariates

Including additional independent variables that might affect the dependent variable is often helpful in an analysis.  In an educational setting, you might assess student age, grade, school, town, background level in the subject, or how well they are feeling that day.

The effects of covariates on the dependent variable may be of interest in themselves.  But also, including covariates in an analysis can better model the data, sometimes making treatment effects clearer or making a model better meet its assumptions.
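As a sketch of what this looks like in practice (variable names and data invented for illustration), a covariate enters a model as just another term on the right-hand side:

set.seed(3)
Data = data.frame(Treatment = rep(c("Control", "New"), each = 20),
                  Age       = rep(11:14, 10))
Data$Score = 60 + 5 * (Data$Treatment == "New") + 2 * Data$Age + rnorm(40, 0, 5)

summary(lm(Score ~ Treatment, data = Data))        # treatment alone
summary(lm(Score ~ Treatment + Age, data = Data))  # the covariate absorbs variability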

Optional discussion: Alternative methods to the Null Hypothesis Significance Test

The NHST controversy

Particularly in the fields of psychology and education, there has been much criticism of the null hypothesis significance test approach.  From my reading, the main complaints against NHST tend to be:

•  Students and researchers don’t really understand the meaning of p -values.

•  p -values don’t include important information like confidence intervals or parameter estimates.

•  p -values have properties that may be misleading, for example that they do not represent effect size, and that they change with sample size.

•  We often treat an alpha of 0.05 as a magical cutoff value.

Personally, I don’t find these to be very convincing arguments against the NHST approach. 

The first complaint is in some sense pedantic:  like so many things, students and researchers learn the definition of p-values at some point and then eventually forget it.  This doesn’t seem to affect the usefulness of the approach.

The second point has weight only if researchers use only p-values to draw conclusions from statistical tests.  As this book points out, one should always consider the size of the effects and their practical implications, as well as present findings in tabular or graphical form, including confidence intervals or measures of dispersion.  There is no reason why parameter estimates, goodness-of-fit statistics, and confidence intervals can’t be included when a NHST approach is followed.

The properties in the third point also don’t count for much as criticism if one is using p-values correctly.  One should understand that it is possible to have a small effect size with a small p-value, and vice versa.  This is not a problem, because p-values and effect sizes are two different concepts; we shouldn’t expect them to be the same.  The fact that p-values change with sample size is also in no way problematic to me.  It makes sense that when there is a small effect size or a lot of variability in the data, we need many samples to conclude that the effect is likely to be real.

(One case where I think the considerations in the preceding point are commonly problematic is when people use statistical tests to check for the normality or homogeneity of data or model residuals.  As sample size increases, these tests are better able to detect small deviations from normality or homoscedasticity.  Too many people use them and conclude their model is inappropriate simply because the test can detect a small effect size, that is, a small deviation from normality or homoscedasticity.)

The fourth point is a good one.  It doesn’t make much sense to come to one conclusion if our p -value is 0.049 and the opposite conclusion if our p -value is 0.051.  But I think this can be ameliorated by reporting the actual p -values from analyses, and relying less on p -values to evaluate results.

Overall it seems to me that these complaints condemn poor practices that the authors observe: not reporting the size of effects in some manner; not including confidence intervals or measures of dispersion; basing conclusions solely on p -values; and not including important results like parameter estimates and goodness-of-fit statistics.

Alternatives to the NHST approach

Estimates and confidence intervals

One approach to determining statistical significance is to use estimates and confidence intervals.  Estimates could be statistics like means, medians, proportions, or other calculated statistics.  This approach can be very straightforward, easy for readers to understand, and easy to present clearly.

Bayesian approach

The most popular competitor to the NHST approach is Bayesian inference.  Bayesian inference has the advantage of calculating the probability of the hypothesis given the data , which is what we thought we should be doing in the “Wait, does this make any sense?” section above.  Essentially it takes prior knowledge about the distribution of the parameters of interest for a population and adds the information from the measured data to reassess some hypothesis related to the parameters of interest.  If the reader will excuse the vagueness of this description, it makes intuitive sense.  We start with what we suspect to be the case, and then use new data to assess our hypothesis.

One disadvantage of the Bayesian approach is that it is not obvious in most cases what could be used for legitimate prior information.  A second disadvantage is that conducting Bayesian analysis is not as straightforward as the tests presented in this book.

References and further reading

[Video]  “Understanding statistical inference” from Statistics Learning Center (Dr. Nic). 2015. www.youtube.com/watch?v=tFRXsngz4UQ .

[Video]  “Hypothesis tests, p-value” from Statistics Learning Center (Dr. Nic). 2011. www.youtube.com/watch?v=0zZYBALbZgg .

[Video]  “Understanding the p-value” from Statistics Learning Center (Dr. Nic). 2011. www.youtube.com/watch?v=eyknGvncKLw .

[Video]  “Important statistical concepts: significance, strength, association, causation” from Statistics Learning Center (Dr. Nic). 2012. www.youtube.com/watch?v=FG7xnWmZlPE .

“Understanding statistical inference” from Dr. Nic. 2015. Learn and Teach Statistics & Operations Research. creativemaths.net/blog/understanding-statistical-inference/ .

“Basic concepts of hypothesis testing” in McDonald, J.H. 2014. Handbook of Biological Statistics . www.biostathandbook.com/hypothesistesting.html .

“Hypothesis testing” , section 4.3, in Diez, D.M., C.D. Barr , and M. Çetinkaya-Rundel. 2012. OpenIntro Statistics , 2nd ed. www.openintro.org/ .

“Hypothesis Testing with One Sample”, sections 9.1–9.2 in Openstax. 2013. Introductory Statistics . openstax.org/textbooks/introductory-statistics .

"Proving causation" from Dr. Nic. 2013. Learn and Teach Statistics & Operations Research. creativemaths.net/blog/proving-causation/ .

[Video]   “Variation and Sampling Error” from Statistics Learning Center (Dr. Nic). 2014. www.youtube.com/watch?v=y3A0lUkpAko .

[Video]   “Sampling: Simple Random, Convenience, systematic, cluster, stratified” from Statistics Learning Center (Dr. Nic). 2012. www.youtube.com/watch?v=be9e-Q-jC-0 .

“Confounding variables” in McDonald, J.H. 2014. Handbook of Biological Statistics . www.biostathandbook.com/confounding.html .

“Overview of data collection principles” , section 1.3, in Diez, D.M., C.D. Barr , and M. Çetinkaya-Rundel. 2012. OpenIntro Statistics , 2nd ed. www.openintro.org/ .

“Observational studies and sampling strategies” , section 1.4, in Diez, D.M., C.D. Barr , and M. Çetinkaya-Rundel. 2012. OpenIntro Statistics , 2nd ed. www.openintro.org/ .

“Experiments” , section 1.5, in Diez, D.M., C.D. Barr , and M. Çetinkaya-Rundel. 2012. OpenIntro Statistics , 2nd ed. www.openintro.org/ .

P-Value: What It Is, How to Calculate It, and Why It Matters


In statistics, a p-value is defined as a number that indicates how likely you are to obtain a result at least as extreme as the actual observation if the null hypothesis is correct.

The p-value serves as an alternative to rejection points to provide the smallest level of significance at which the null hypothesis would be rejected. A smaller p-value means stronger evidence in favor of the alternative hypothesis.

P-value is often used to promote credibility for studies or reports by government agencies. For example, the U.S. Census Bureau stipulates that any analysis with a p-value greater than 0.10 must be accompanied by a statement that the difference is not statistically different from zero. The Census Bureau also has standards in place stipulating which p-values are acceptable for various publications.

Key Takeaways

  • A p-value is a statistical measurement used to validate a hypothesis against observed data.
  • A p-value measures the probability of obtaining the observed results, assuming that the null hypothesis is true.
  • The lower the p-value, the greater the statistical significance of the observed difference.
  • A p-value of 0.05 or lower is generally considered statistically significant.
  • P-value can serve as an alternative to—or in addition to—preselected confidence levels for hypothesis testing.


P-values are usually found using p-value tables or spreadsheets/statistical software. These calculations are based on the assumed or known probability distribution of the specific statistic tested. The sample size, which determines the reliability of the observed data, directly influences the accuracy of the p-value calculation. P-values are calculated from the deviation between the observed value and a chosen reference value, given the probability distribution of the statistic, with a greater difference between the two values corresponding to a lower p-value.

Mathematically, the p-value is calculated using integral calculus as the area under the probability distribution curve for all values of the statistic that are at least as far from the reference value as the observed value is, relative to the total area under the probability distribution curve. Standard deviations, which quantify the dispersion of data points from the mean, are instrumental in this calculation.

The calculation for a p-value varies based on the type of test performed. The three test types describe the location on the probability distribution curve: lower-tailed test, upper-tailed test, or two-tailed test . In each case, the degrees of freedom play a crucial role in determining the shape of the distribution and thus, the calculation of the p-value.
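As a sketch in R (with a made-up standard normal test statistic z), the three cases are computed from the same cumulative distribution function:

z = -1.7                # hypothetical Z-score
pnorm(z)                # lower-tailed test
1 - pnorm(z)            # upper-tailed test
2 * pnorm(-abs(z))      # two-tailed test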

In a nutshell, the greater the difference between two observed values, the less likely it is that the difference is due to simple random chance, and this is reflected by a lower p-value.

The P-Value Approach to Hypothesis Testing

The p-value approach to hypothesis testing uses the calculated probability to determine whether there is evidence to reject the null hypothesis. This determination relies heavily on the test statistic, which summarizes the information from the sample relevant to the hypothesis being tested. The null hypothesis, also known as the conjecture, is the initial claim about a population (or data-generating process). The alternative hypothesis states whether the population parameter differs from the value of the population parameter stated in the conjecture.

In practice, the significance level is stated in advance to determine how small the p-value must be to reject the null hypothesis. Because different researchers use different levels of significance when examining a question, a reader may sometimes have difficulty comparing results from two different tests. P-values provide a solution to this problem.

Even a low p-value is not necessarily proof of a real effect, since there is still a possibility that the observed data are the result of chance. Only repeated experiments or studies can confirm if a relationship is statistically significant.

For example, suppose a study comparing returns from two particular assets was undertaken by different researchers who used the same data but different significance levels. The researchers might come to opposite conclusions regarding whether the assets differ.

If one researcher used a confidence level of 90% and the other required a confidence level of 95% to reject the null hypothesis, and if the p-value of the observed difference between the two returns was 0.08 (corresponding to a confidence level of 92%), then the first researcher would find that the two assets have a difference that is statistically significant , while the second would find no statistically significant difference between the returns.

To avoid this problem, the researchers could report the p-value of the hypothesis test and allow readers to interpret the statistical significance themselves. This is called a p-value approach to hypothesis testing. Independent observers could note the p-value and decide for themselves whether that represents a statistically significant difference or not.

Example of P-Value

An investor claims that their investment portfolio’s performance is equivalent to that of the Standard & Poor’s (S&P) 500 Index . To determine this, the investor conducts a two-tailed test.

The null hypothesis states that the portfolio’s returns are equivalent to the S&P 500’s returns over a specified period, while the alternative hypothesis states that the portfolio’s returns and the S&P 500’s returns are not equivalent—if the investor conducted a one-tailed test , the alternative hypothesis would state that the portfolio’s returns are either less than or greater than the S&P 500’s returns.

The p-value hypothesis test does not necessarily make use of a preselected confidence level at which the investor should reject the null hypothesis that the returns are equivalent. Instead, it provides a measure of how much evidence there is to reject the null hypothesis. The smaller the p-value, the greater the evidence against the null hypothesis.

Thus, if the investor finds that the p-value is 0.001, there is strong evidence against the null hypothesis, and the investor can confidently conclude that the portfolio’s returns and the S&P 500’s returns are not equivalent.

Although this does not provide an exact threshold as to when the investor should accept or reject the null hypothesis, it does have another very practical advantage. P-value hypothesis testing offers a direct way to compare the relative confidence that the investor can have when choosing among multiple different types of investments or portfolios relative to a benchmark such as the S&P 500.

For example, for two portfolios, A and B, whose performance differs from the S&P 500 with p-values of 0.10 and 0.01, respectively, the investor can be much more confident that portfolio B, with a lower p-value, will actually show consistently different results.

Is a 0.05 P-Value Significant?

A p-value less than 0.05 is typically considered to be statistically significant, in which case the null hypothesis should be rejected. A p-value greater than 0.05 means that deviation from the null hypothesis is not statistically significant, and the null hypothesis is not rejected.

What Does a P-Value of 0.001 Mean?

A p-value of 0.001 indicates that if the null hypothesis tested were indeed true, then there would be a one-in-1,000 chance of observing results at least as extreme. This leads the observer to reject the null hypothesis because either a highly rare data result has been observed or the null hypothesis is incorrect.

How Can You Use P-Value to Compare 2 Different Results of a Hypothesis Test?

If you have two different results, one with a p-value of 0.04 and one with a p-value of 0.06, the result with a p-value of 0.04 will be considered more statistically significant than the p-value of 0.06. Beyond this simplified example, you could compare a 0.04 p-value to a 0.001 p-value. Both are statistically significant, but the 0.001 example provides an even stronger case against the null hypothesis than the 0.04.

The p-value is used to measure the significance of observational data. When researchers identify an apparent relationship between two variables, there is always a possibility that this correlation might be a coincidence. A p-value calculation helps determine if the observed relationship could arise as a result of chance.

U.S. Census Bureau. “ Statistical Quality Standard E1: Analyzing Data .”


P-value Calculator

Statistical significance calculator to easily calculate the p-value and determine whether the difference between two proportions or means (independent groups) is statistically significant. T-test calculator & z-test calculator to compute the Z-score or T-score for inference about absolute or relative difference (percentage change, percent effect). Suitable for analysis of simple A/B tests.


    Using the p-value calculator

This statistical significance calculator allows you to perform a post-hoc statistical evaluation of a set of data when the outcome of interest is the difference of two proportions (binomial data, e.g. conversion rate or event rate) or the difference of two means (continuous data, e.g. height, weight, speed, time, revenue, etc.). You can use a Z-test (recommended) or a T-test to find the observed significance level (p-value statistic). The Student's T-test is recommended mostly for very small sample sizes, e.g. n < 30. In order to avoid the type I error inflation that might occur with unequal variances, the calculator automatically applies Welch's T-test instead of Student's T-test if the sample sizes differ significantly, or if one of them is less than 30 and the sampling ratio is different from one.

If entering proportions data, you need to know the sample sizes of the two groups as well as the number or rate of events. These can be entered as proportions (e.g. 0.10), percentages (e.g. 10%) or just raw numbers of events (e.g. 50).

If entering means data, simply copy/paste or type in the raw data, each observation separated by comma, space, new line or tab. Copy-pasting from a Google or Excel spreadsheet works fine.

The p-value calculator will output: p-value, significance level, T-score or Z-score (depending on the choice of statistical hypothesis test), degrees of freedom, and the observed difference. For means data it will also output the sample sizes, means, and pooled standard error of the mean. The p-value is for a one-sided hypothesis (one-tailed test), allowing you to infer the direction of the effect (more on one vs. two-tailed tests ). The probability value for the two-sided hypothesis (two-tailed p-value) is also calculated and displayed, although it should see little practical use.

Warning: You must have fixed the sample size / stopping time of your experiment in advance, otherwise you will be guilty of optional stopping (fishing for significance) which will inflate the type I error of the test rendering the statistical significance level unusable. Also, you should not use this significance calculator for comparisons of more than two means or proportions, or for comparisons of two groups based on more than one metric. If a test involves more than one treatment group or more than one outcome variable you need a more advanced tool which corrects for multiple comparisons and multiple testing. This statistical calculator might help.

    What is "p-value" and "significance level"

The p-value is a heavily used statistic that quantifies the uncertainty of a given measurement, usually as part of an experiment, medical trial, or observational study. By definition, it is inseparable from inference through a Null-Hypothesis Statistical Test (NHST) . In it we pose a null hypothesis reflecting the currently established theory, or a model of the world we don't want to dismiss without solid evidence (the tested hypothesis), and an alternative hypothesis: an alternative model of the world. For example, the statistical null hypothesis could be that exposure to ultraviolet light for prolonged periods of time has positive or neutral effects regarding developing skin cancer, while the alternative hypothesis could be that it has a negative effect on the development of skin cancer.

In this framework a p-value is defined as the probability of observing the result which was observed, or a more extreme one, assuming the null hypothesis is true . In notation this is expressed as:

\(p(x_0) = \Pr(d(X) > d(x_0);\ H_0)\)

where \(x_0\) is the observed data \((x_1, x_2, \ldots, x_n)\), \(d\) is a special function (a statistic, e.g. one calculating a Z-score), and \(X\) is a random sample \((X_1, X_2, \ldots, X_n)\) from the sampling distribution under the null hypothesis. This equation is used in this p-value calculator and can be visualized as such:

[Figure: the p-value visualized as the tail area of the statistic's distribution under the null hypothesis]

Therefore the p-value expresses the smallest significance level at which the observed result would lead to rejection, that is, the type I error rate (rejecting the null hypothesis when it is in fact true) one would have to accept in order to call this result significant. See below for a full proper interpretation of the p-value statistic .

Another way to think of the p-value is as a more user-friendly expression of how many standard deviations away from the null expectation a given observation is. For example, in a one-tailed test of significance for a normally-distributed variable like the difference of two means, a result which is 1.6448 standard deviations away (1.6448σ) results in a p-value of 0.05.

The term "statistical significance" or "significance level" is often used in conjunction to the p-value, either to say that a result is "statistically significant", which has a specific meaning in statistical inference ( see interpretation below ), or to refer to the percentage representation the level of significance: (1 - p value), e.g. a p-value of 0.05 is equivalent to significance level of 95% (1 - 0.05 * 100). A significance level can also be expressed as a T-score or Z-score, e.g. a result would be considered significant only if the Z-score is in the critical region above 1.96 (equivalent to a p-value of 0.025).

    P-value formula

There are different ways to arrive at a p-value depending on the assumption about the underlying distribution. This tool supports two such distributions: the Student's T-distribution and the normal Z-distribution (Gaussian) resulting in a T test and a Z test, respectively.

In both cases, to find the p-value start by estimating the variance and standard deviation, then derive the standard error of the mean, after which a standard score is found using the formula [2] :

\(Z = \dfrac{\bar{X} - \mu_0}{\sigma_{\bar{x}}}\)

where \(\bar{X}\) (read "X bar") is the arithmetic mean of the population baseline or the control, \(\mu_0\) is the observed mean of the treatment group, and \(\sigma_{\bar{x}}\) is the standard error of the mean (SEM, or standard deviation of the error of the mean).

When calculating a p-value using the Z-distribution, the formula is Φ(Z) or Φ(−Z) for lower- and upper-tailed tests, respectively. Φ is the standard normal cumulative distribution function, and a Z-score is computed. In this mode the tool functions as a Z-score calculator.

When using the T-distribution, the formula is \(T_n(Z)\) or \(T_n(-Z)\) for lower- and upper-tailed tests, respectively. \(T_n\) is the cumulative distribution function of a T-distribution with n degrees of freedom, and so a T-score is computed. Selecting this mode makes the tool behave as a T-test calculator.

The population standard deviation is often unknown and is thus estimated from the samples, usually from the pooled samples variance. Knowing or estimating the standard deviation is a prerequisite for using a significance calculator. Note that differences in means or proportions are normally distributed according to the Central Limit Theorem (CLT) hence a Z-score is the relevant statistic for such a test.
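As a sketch of the proportions case (made-up counts; this assumes the common pooled-variance z-test for two proportions, which may differ in details from this calculator's internals):

x1 = 120; n1 = 1000     # events and sample size in the control group (invented)
x2 = 150; n2 = 1000     # events and sample size in the treatment group (invented)

p.pool = (x1 + x2) / (n1 + n2)
se = sqrt(p.pool * (1 - p.pool) * (1/n1 + 1/n2))
z = (x2/n2 - x1/n1) / se

z                       # Z-score for the difference in proportions
1 - pnorm(z)            # one-tailed p-value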

    Why do we need a p-value?

If you are in the sciences, reporting p-values is often a requirement of scientific journals. If you work on business experiments (e.g. A/B testing), the p-value is reported alongside confidence intervals and other estimates. However, what is the utility of p-values, and by extension that of significance levels?

First, let us define the problem the p-value is intended to solve. People need to share information about the evidential strength of data in a way that can be easily understood and easily compared between experiments. The picture below represents, albeit imperfectly, the results of two simple experiments, each ending up with the control at a 10% event rate and the treatment group at a 12% event rate.

[Figure: two experiments with the same observed difference but different evidential strength]

However, it is obvious that the evidential input of the data is not the same, demonstrating that communicating just the observed proportions or their difference (effect size) is not enough to estimate and communicate the evidential strength of the experiment. In order to fully describe the evidence and associated uncertainty , several statistics need to be communicated, for example, the sample size, sample proportions and the shape of the error distribution. Their interaction is not trivial to understand, so communicating them separately makes it very difficult for one to grasp what information is present in the data. What would you infer if told that the observed proportions are 0.1 and 0.12 (e.g. conversion rate of 10% and 12%), the sample sizes are 10,000 users each, and the error distribution is binomial?

Instead of communicating several statistics, a single statistic was developed that communicates all the necessary information in one piece: the p-value . A p-value was first derived in the late 18th century by Pierre-Simon Laplace, when he observed data about a million births that showed an excess of boys compared to girls. Using the calculation of significance, he argued that the effect was real but unexplained at the time. We now know this to be true, and there are several explanations for the phenomenon coming from evolutionary biology. Statistical significance calculations were formally introduced in the early 20th century by Karl Pearson and popularized by Sir Ronald Fisher in his work, most notably "The Design of Experiments" (1935) [1] , in which p-values were featured extensively. In business settings, significance levels and p-values see widespread use in process control and various business experiments (such as online A/B tests, i.e. as part of conversion rate optimization, marketing optimization, etc.).

    How to interpret a statistically significant result / low p-value

Saying that a result is statistically significant means that the p-value is below the evidential threshold (significance level) decided for the statistical test before it was conducted. For example, if observing something which would only happen 1 out of 20 times if the null hypothesis is true is considered sufficient evidence to reject the null hypothesis, the threshold will be 0.05. In such case, observing a p-value of 0.025 would mean that the result is interpreted as statistically significant.

But what does that really mean? What inference can we make from seeing a result which was quite improbable if the null was true?

Observing any given low p-value can mean one of three things [3] :

  • There is a true effect from the tested treatment or intervention.
  • There is no true effect, but we happened to observe a rare outcome. The lower the p-value, the rarer (less likely, less probable) the outcome.
  • The statistical model is invalid (does not reflect reality).

Obviously, one can't simply jump to conclusion 1.) and claim it with one hundred percent certainty, as this would go against the whole idea of the p-value and statistical significance. In order to use p-values as part of a decision process, external factors that are part of the experimental design process need to be considered, which includes deciding on the significance level (threshold), sample size and power (power analysis), and the expected effect size, among other things. If you are happy going forward with as much (or as little) uncertainty as the p-value calculation indicates, then you have some quantifiable guarantees related to the effect and future performance of whatever you are testing, e.g. the efficacy of a vaccine or the conversion rate of an online shopping cart.

Note that it is incorrect to state that a Z-score or a p-value obtained from any statistical significance calculator tells how likely it is that the observation is "due to chance" or conversely - how unlikely it is to observe such an outcome due to "chance alone". P-values are calculated under specified statistical models hence 'chance' can be used only in reference to that specific data generating mechanism and has a technical meaning quite different from the colloquial one. For a deeper take on the p-value meaning and interpretation, including common misinterpretations, see: definition and interpretation of the p-value in statistics .

    P-value and significance for relative difference in means or proportions

When comparing two independent groups and the variable of interest is the relative difference (a.k.a. relative change, percent change, percentage difference), as opposed to the absolute difference between the two means or proportions, the standard deviation of the variable is different, which compels a different way of calculating p-values [5] . The need for a different statistical test is due to the fact that calculating the relative difference involves performing an additional division by a random variable: the event rate of the control during the experiment. This adds more variance to the estimation, and the resulting p-value is usually higher (the result will be less statistically significant). What this means is that p-values from a statistical hypothesis test for the absolute difference in means would nominally meet the significance level, but they would be inadequate for the statistical inference of the hypothesis at hand.

In simulations I performed, the discrepancy was about 50% of nominal: a 0.05 p-value for the absolute difference corresponded to a probability of about 0.075 of observing the corresponding relative difference. Therefore, if you are using p-values calculated for the absolute difference when making an inference about the percentage difference, you are likely reporting error rates meaningfully below the actual ones, thus overstating the statistical significance of your results and underestimating the uncertainty attached to them.

In short - switching from absolute to relative difference requires a different statistical hypothesis test. With this calculator you can avoid the mistake of using the wrong test simply by indicating the inference you want to make.

    References

1 Fisher R.A. (1935) – "The Design of Experiments", Edinburgh: Oliver & Boyd

2 Mayo D.G., Spanos A. (2010) – "Error Statistics", in P. S. Bandyopadhyay & M. R. Forster (Eds.), Philosophy of Statistics, (7, 152–198). Handbook of the Philosophy of Science . The Netherlands: Elsevier.

3 Georgiev G.Z. (2017) "Statistical Significance in A/B Testing – a Complete Guide", [online] https://blog.analytics-toolkit.com/2017/statistical-significance-ab-testing-complete-guide/ (accessed Apr 27, 2018)

4 Mayo D.G., Spanos A. (2006) – "Severe Testing as a Basic Concept in a Neyman–Pearson Philosophy of Induction", British Society for the Philosophy of Science , 57:323-357

5 Georgiev G.Z. (2018) "Confidence Intervals & P-values for Percent Change / Relative Difference", [online] https://blog.analytics-toolkit.com/2018/confidence-intervals-p-values-percent-change-relative-difference/ (accessed May 20, 2018)



t-test Calculator


Welcome to our t-test calculator! Here you can not only easily perform one-sample t-tests , but also two-sample t-tests , as well as paired t-tests .

Do you prefer to find the p-value from a t-test, or would you rather find the t-test critical values? Well, this t-test calculator can do both! 😊

What does a t-test tell you? Take a look at the text below, where we explain what actually gets tested when various types of t-tests are performed. Also, we explain when to use t-tests (in particular, whether to use the z-test vs. t-test) and what assumptions your data should satisfy for the results of a t-test to be valid. If you've ever wanted to know how to do a t-test by hand, we provide the necessary t-test formula, as well as tell you how to determine the number of degrees of freedom in a t-test.

When to use a t-test?

A t-test is one of the most popular statistical tests for location , i.e., it deals with the population(s) mean value(s).

There are different types of t-tests that you can perform:

  • A one-sample t-test;
  • A two-sample t-test; and
  • A paired t-test.

In the next section , we explain when to use which. Remember that a t-test can only be used for one or two groups . If you need to compare three (or more) means, use the analysis of variance ( ANOVA ) method.

The t-test is a parametric test, meaning that your data has to fulfill some assumptions :

  • The data points are independent; AND
  • The data, at least approximately, follow a normal distribution .

If your sample doesn't fit these assumptions, you can resort to nonparametric alternatives. Visit our Mann–Whitney U test calculator or the Wilcoxon rank-sum test calculator to learn more. Other possibilities include the Wilcoxon signed-rank test or the sign test.

Which t-test?

Your choice of t-test depends on whether you are studying one group or two groups:

One sample t-test

Choose the one-sample t-test to check if the mean of a population is equal to some pre-set hypothesized value .

The average volume of a drink sold in 0.33 l cans — is it really equal to 330 ml?

The average weight of people from a specific city — is it different from the national average?

Two-sample t-test

Choose the two-sample t-test to check if the difference between the means of two populations is equal to some pre-determined value when the two samples have been chosen independently of each other.

In particular, you can use this test to check whether the two groups are different from one another .

The average difference in weight gain in two groups of people: one group was on a high-carb diet and the other on a high-fat diet.

The average difference in the results of a math test from students at two different universities.

This test is sometimes referred to as an independent samples t-test , or an unpaired samples t-test .

Paired t-test

A paired t-test is used to investigate the change in the mean of a population before and after some experimental intervention , based on a paired sample, i.e., when each subject has been measured twice: before and after treatment.

In particular, you can use this test to check whether, on average, the treatment has had any effect on the population .

The change in student test performance before and after taking a course.

The change in blood pressure in patients before and after administering some drug.

How to do a t-test?

So, you've decided which t-test to perform. These next steps will tell you how to calculate the p-value from t-test or its critical values, and then which decision to make about the null hypothesis.

Decide on the alternative hypothesis :

Use a two-tailed t-test if you only care whether the population's mean (or, in the case of two populations, the difference between the populations' means) agrees or disagrees with the pre-set value.

Use a one-tailed t-test if you want to test whether this mean (or difference in means) is greater/less than the pre-set value.

Compute your T-score value :

Formulas for the test statistic in t-tests include the sample size , as well as its mean and standard deviation . The exact formula depends on the t-test type — check the sections dedicated to each particular test for more details.

Determine the degrees of freedom for the t-test:

The degrees of freedom are the number of observations in a sample that are free to vary as we estimate statistical parameters. In the simplest case, the number of degrees of freedom equals your sample size minus the number of parameters you need to estimate . Again, the exact formula depends on the t-test you want to perform — check the sections below for details.

The degrees of freedom are essential, as they determine the distribution followed by your T-score (under the null hypothesis). If there are d degrees of freedom, then the distribution of the test statistics is the t-Student distribution with d degrees of freedom . This distribution has a shape similar to N(0,1) (bell-shaped and symmetric) but has heavier tails . If the number of degrees of freedom is large (>30), which generically happens for large samples, the t-Student distribution is practically indistinguishable from N(0,1).

💡 The t-Student distribution owes its name to William Sealy Gosset, who, in 1908, published his paper on the t-test under the pseudonym "Student". Gosset worked at the famous Guinness Brewery in Dublin, Ireland, and devised the t-test as an economical way to monitor the quality of beer. Cheers! 🍺🍺🍺

p-value from t-test

Recall that the p-value is the probability (calculated under the assumption that the null hypothesis is true) that the test statistic will produce values at least as extreme as the T-score produced for your sample . As probabilities correspond to areas under the density function, p-value from t-test can be nicely illustrated with the help of the following pictures:

p-value from t-test

The following formulas say how to calculate the p-value from a t-test. By \(\mathrm{cdf}_{t,d}\) we denote the cumulative distribution function of the t-Student distribution with d degrees of freedom:

p-value from a left-tailed t-test:

\(\text{p-value} = \mathrm{cdf}_{t,d}(t_{\text{score}})\)

p-value from a right-tailed t-test:

\(\text{p-value} = 1 - \mathrm{cdf}_{t,d}(t_{\text{score}})\)

p-value from a two-tailed t-test:

\(\text{p-value} = 2 \cdot \mathrm{cdf}_{t,d}(-|t_{\text{score}}|)\)

or, equivalently: \(\text{p-value} = 2 - 2 \cdot \mathrm{cdf}_{t,d}(|t_{\text{score}}|)\)

However, the cdf of the t-distribution is given by a somewhat complicated formula. To find the p-value by hand, you would need to resort to statistical tables, where approximate cdf values are collected, or to specialized statistical software. Fortunately, our t-test calculator determines the p-value from t-test for you in the blink of an eye!
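In R these formulas map directly onto pt(), the t-distribution cdf; a sketch with a made-up T-score and degrees of freedom:

t.score = -2.1; d = 18          # hypothetical values

pt(t.score, df = d)             # left-tailed p-value
1 - pt(t.score, df = d)         # right-tailed p-value
2 * pt(-abs(t.score), df = d)   # two-tailed p-value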

t-test critical values

Recall that in the critical values approach to hypothesis testing, you need to set a significance level, α, before computing the critical values , which in turn give rise to critical regions (a.k.a. rejection regions).

Formulas for the critical values employ the quantile function of the t-distribution, i.e., the inverse of the cdf:

Critical value for a left-tailed t-test: \(\mathrm{cdf}_{t,d}^{-1}(\alpha)\)

critical region:

\((-\infty, \mathrm{cdf}_{t,d}^{-1}(\alpha)]\)

Critical value for a right-tailed t-test: \(\mathrm{cdf}_{t,d}^{-1}(1-\alpha)\)

critical region:

\([\mathrm{cdf}_{t,d}^{-1}(1-\alpha), \infty)\)

Critical values for a two-tailed t-test: \(\pm\,\mathrm{cdf}_{t,d}^{-1}(1-\alpha/2)\)

critical region:

\((-\infty, -\mathrm{cdf}_{t,d}^{-1}(1-\alpha/2)] \cup [\mathrm{cdf}_{t,d}^{-1}(1-\alpha/2), \infty)\)

To decide the fate of the null hypothesis, just check if your T-score lies within the critical region:

If your T-score belongs to the critical region, reject the null hypothesis and accept the alternative hypothesis.

If your T-score is outside the critical region, then you don't have enough evidence to reject the null hypothesis.
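As a sketch of how this looks in code (Python with SciPy assumed, as before; t.ppf is the quantile function, i.e., the inverse cdf):

```python
# Critical values and a left-tailed decision for significance level
# alpha and d degrees of freedom (all numbers here are illustrative).
from scipy.stats import t

alpha, d = 0.05, 14
left_cv = t.ppf(alpha, df=d)           # left-tailed critical value
right_cv = t.ppf(1 - alpha, df=d)      # right-tailed critical value
two_cv = t.ppf(1 - alpha / 2, df=d)    # two-tailed: reject if |T| >= two_cv

t_score = -1.92
reject = t_score <= left_cv            # left-tailed decision rule
print(left_cv, right_cv, two_cv, reject)
```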

How to use our t-test calculator

Choose the type of t-test you wish to perform:

A one-sample t-test (to test the mean of a single group against a hypothesized mean);

A two-sample t-test (to compare the means for two groups); or

A paired t-test (to check how the mean from the same group changes after some intervention).

Then decide on the alternative hypothesis, i.e., the tails of your test:

  • Two-tailed;
  • Left-tailed; or
  • Right-tailed.

This t-test calculator allows you to use either the p-value approach or the critical regions approach to hypothesis testing!

Enter your T-score and the number of degrees of freedom. If you don't know them, provide some data about your sample(s): sample size, mean, and standard deviation, and our t-test calculator will compute the T-score and degrees of freedom for you.

Once all the parameters are present, the p-value, or critical region, will immediately appear underneath the t-test calculator, along with an interpretation!

One-sample t-test

The null hypothesis is that the population mean is equal to some value \(\mu_0\).

The alternative hypothesis is that the population mean is:

  • different from \(\mu_0\);
  • smaller than \(\mu_0\); or
  • greater than \(\mu_0\).

One-sample t-test formula:

\[ t = \frac{\bar{x} - \mu_0}{s}\sqrt{n} \]

where:

  • \(\mu_0\) — Mean postulated in the null hypothesis;
  • \(n\) — Sample size;
  • \(\bar{x}\) — Sample mean; and
  • \(s\) — Sample standard deviation.

Number of degrees of freedom in the t-test (one-sample): \(n - 1\).
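A short worked sketch in Python (SciPy assumed; all sample numbers are made up for illustration):

```python
# One-sample t-test: T-score, degrees of freedom, two-tailed p-value.
from math import sqrt
from scipy.stats import t

mu0, n, xbar, s = 50.0, 25, 52.1, 4.0  # illustrative values
t_score = (xbar - mu0) / s * sqrt(n)   # = 2.625
d = n - 1                              # degrees of freedom
p = 2 * t.cdf(-abs(t_score), df=d)     # two-tailed p-value, approx. 0.015
print(t_score, d, p)
```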

Two-sample t-test

The null hypothesis is that the actual difference between these groups' means, \(\mu_1\) and \(\mu_2\), is equal to some pre-set value, \(\Delta\).

The alternative hypothesis is that the difference \(\mu_1 - \mu_2\) is:

  • Different from \(\Delta\);
  • Smaller than \(\Delta\); or
  • Greater than \(\Delta\).

In particular, if this pre-determined difference is zero (\(\Delta = 0\)):

The null hypothesis is that the population means are equal.

The alternative hypothesis is that:

  • \(\mu_1\) and \(\mu_2\) are different from one another;
  • \(\mu_1\) is smaller than \(\mu_2\); or
  • \(\mu_1\) is greater than \(\mu_2\).

Formally, to perform a t-test, we should additionally assume that the variances of the two populations are equal (this assumption is called the homogeneity of variance).

There is a version of the t-test that can be applied without the assumption of homogeneity of variance: it is called Welch's t-test. For your convenience, we describe both versions.

Two-sample t-test if variances are equal

Use this test if you know that the two populations' variances are the same (or very similar).

Two-sample t-test formula (with equal variances):

\[ t = \frac{\bar{x}_1 - \bar{x}_2 - \Delta}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \]

where \(s_p\) is the so-called pooled standard deviation, which we compute as:

\[ s_p = \sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}} \]

and where:

  • \(\Delta\) — Mean difference postulated in the null hypothesis;
  • \(n_1\) — First sample size;
  • \(\bar{x}_1\) — Mean for the first sample;
  • \(s_1\) — Standard deviation in the first sample;
  • \(n_2\) — Second sample size;
  • \(\bar{x}_2\) — Mean for the second sample; and
  • \(s_2\) — Standard deviation in the second sample.

Number of degrees of freedom in the t-test (two samples, equal variances): \(n_1 + n_2 - 2\).
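Again as a hedged sketch (Python with SciPy assumed; sample numbers invented):

```python
# Two-sample t-test with equal variances: pooled sd, T-score, df, p-value.
from math import sqrt
from scipy.stats import t

n1, xbar1, s1 = 20, 5.4, 1.1  # illustrative first sample
n2, xbar2, s2 = 22, 4.8, 1.3  # illustrative second sample
delta = 0.0                   # difference postulated in H0

s_p = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
t_score = (xbar1 - xbar2 - delta) / (s_p * sqrt(1 / n1 + 1 / n2))
d = n1 + n2 - 2
p = 2 * t.cdf(-abs(t_score), df=d)  # two-tailed p-value
print(s_p, t_score, d, p)
```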

Two-sample t-test if variances are unequal (Welch's t-test)

Use this test if the variances of your populations are different.

Two-sample Welch's t-test formula (unequal variances):

\[ t = \frac{\bar{x}_1 - \bar{x}_2 - \Delta}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \]

where:

  • \(s_1\) — Standard deviation in the first sample; and
  • \(s_2\) — Standard deviation in the second sample,

and the remaining symbols are as in the equal-variances case.

The number of degrees of freedom in Welch's t-test (two-sample t-test with unequal variances) does not have a simple exact expression. We can approximate it with the help of the following Satterthwaite formula:

\[ d = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}} \]

Alternatively, you can take the smaller of \(n_1 - 1\) and \(n_2 - 1\) as a conservative estimate for the number of degrees of freedom.

🔎 The Satterthwaite formula for the degrees of freedom can be read as a scaled weighted harmonic mean of the degrees of freedom of the respective samples, \(n_1 - 1\) and \(n_2 - 1\), with weights determined by the samples' squared standard errors, \(s_1^2/n_1\) and \(s_2^2/n_2\).
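A sketch of Welch's T-score and the Satterthwaite approximation in Python (SciPy assumed; sample numbers invented; note that the resulting degrees of freedom need not be an integer, which t.cdf accepts):

```python
# Welch's t-test: T-score, Satterthwaite degrees of freedom, p-value.
from math import sqrt
from scipy.stats import t

n1, xbar1, s1 = 20, 5.4, 1.1  # illustrative first sample
n2, xbar2, s2 = 22, 4.8, 2.1  # illustrative second sample
delta = 0.0

v1, v2 = s1**2 / n1, s2**2 / n2  # squared standard errors
t_score = (xbar1 - xbar2 - delta) / sqrt(v1 + v2)
d = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))  # Satterthwaite
p = 2 * t.cdf(-abs(t_score), df=d)  # two-tailed p-value
print(t_score, d, p)
```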

Paired t-test

As we commonly perform a paired t-test when we have data about the same subjects measured twice (before and after some treatment), let us adopt the convention of referring to the samples as the pre-group and post-group.

The null hypothesis is that the true difference between the means of the pre- and post-populations is equal to some pre-set value, \(\Delta\).

The alternative hypothesis is that the actual difference between these means is:

  • Different from \(\Delta\);
  • Smaller than \(\Delta\); or
  • Greater than \(\Delta\).

Typically, this pre-determined difference is zero. We can then reformulate the hypotheses as follows:

The null hypothesis is that the pre- and post-means are the same, i.e., the treatment has no impact on the population .

The alternative hypothesis is one of the following:

  • The pre- and post-means are different from one another (treatment has some effect);
  • The pre-mean is smaller than the post-mean (treatment increases the result); or
  • The pre-mean is greater than the post-mean (treatment decreases the result).

Paired t-test formula

In fact, a paired t-test is technically the same as a one-sample t-test! Let us see why. Let \(x_1, \ldots, x_n\) be the pre observations and \(y_1, \ldots, y_n\) the respective post observations; that is, \(x_i\) and \(y_i\) are the before and after measurements of the i-th subject.

For each subject, compute the difference \(d_i := x_i - y_i\). All that happens next is just a one-sample t-test performed on the sample of differences \(d_1, \ldots, d_n\). Take a look at the formula for the T-score:

\[ t = \frac{\bar{x} - \Delta}{s}\sqrt{n} \]

where:

  • \(\Delta\) — Mean difference postulated in the null hypothesis;
  • \(n\) — Size of the sample of differences, i.e., the number of pairs;
  • \(\bar{x}\) — Mean of the sample of differences; and
  • \(s\) — Standard deviation of the sample of differences.

Number of degrees of freedom in the t-test (paired): \(n - 1\).
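The reduction to a one-sample test is easy to see in code. A minimal Python sketch (SciPy assumed; the measurements are made up):

```python
# Paired t-test as a one-sample t-test on per-subject differences.
from math import sqrt
from statistics import mean, stdev
from scipy.stats import t

pre = [12.1, 11.4, 13.0, 12.7, 11.9, 12.4]   # illustrative "before" data
post = [11.2, 11.0, 12.1, 12.5, 11.1, 11.8]  # illustrative "after" data
diffs = [x - y for x, y in zip(pre, post)]   # d_i = x_i - y_i

n, delta = len(diffs), 0.0
t_score = (mean(diffs) - delta) / stdev(diffs) * sqrt(n)
p = 2 * t.cdf(-abs(t_score), df=n - 1)       # two-tailed p-value
print(t_score, n - 1, p)
```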

t-test vs Z-test

We use a Z-test when we want to test the population mean of a normally distributed dataset with a known population variance. If the number of degrees of freedom is large, then Student's t-distribution is very close to N(0,1).

Hence, if there are many data points (at least 30), you may swap a t-test for a Z-test, and the results will be almost identical. However, for small samples with unknown variance, remember to use the t-test, because in such cases Student's t-distribution differs significantly from N(0,1)!
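To illustrate how close the two tests get, here is a quick comparison (Python with SciPy assumed; the T-score is arbitrary):

```python
# For a largish sample, the t-based p-value is close to the Z-based one.
from scipy.stats import norm, t

t_score, d = 2.0, 49                   # e.g. n = 50 in a one-sample test
print(2 * t.cdf(-abs(t_score), df=d))  # t-test p-value, approx. 0.051
print(2 * norm.cdf(-abs(t_score)))     # Z-test p-value, approx. 0.046
```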

🙋 Have you concluded you need to perform the z-test? Head straight to our z-test calculator !

What is a t-test?

A t-test is a widely used statistical test that analyzes the means of one or two groups of data. For instance, a t-test is performed on medical data to determine whether a new drug really helps.

What are different types of t-tests?

Different types of t-tests are:

  • One-sample t-test;
  • Two-sample t-test; and
  • Paired t-test.

How do I find the t-value in a one-sample t-test?

To find the t-value:

  • Subtract the null hypothesis mean from the sample mean value.
  • Divide the difference by the standard deviation of the sample.
  • Multiply the result by the square root of the sample size.
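For example, with illustrative values \(\bar{x} = 52\), \(\mu_0 = 50\), \(s = 4\), and \(n = 25\) (all made up):

\[ t = \frac{\bar{x} - \mu_0}{s}\sqrt{n} = \frac{52 - 50}{4}\sqrt{25} = 0.5 \times 5 = 2.5 \]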




Open access | Published: 23 May 2024

Performance in myoelectric pattern recognition improves with transcranial direct current stimulation

Shahrzad Damercheli, Kelly Morrenhof, Kirstin Ahmed & Max Ortiz-Catalan (ORCID: orcid.org/0000-0002-6084-3865)

Scientific Reports, volume 14, Article number: 11744 (2024)


Subjects: Biomedical engineering; Electrical and electronic engineering; Translational research

Sensorimotor impairments, resulting from conditions like stroke and amputations, can profoundly impact an individual's functional abilities and overall quality of life. Assistive and rehabilitation devices such as prostheses, exoskeletons, and serious gaming in virtual environments can help to restore some degree of function and alleviate pain after sensorimotor impairments. Myoelectric pattern recognition (MPR) has gained popularity in the past decades as it provides superior control over said devices, and therefore efforts to facilitate and improve performance in MPR can result in better rehabilitation outcomes. One possibility to enhance MPR is to employ transcranial direct current stimulation (tDCS) to facilitate motor learning. Twelve healthy able-bodied individuals participated in this crossover study to determine the effect of tDCS on MPR performance. Baseline training was followed by two sessions of either sham or anodal tDCS using the dominant and non-dominant arms. Assignments were randomized, and the MPR task consisted of 11 different hand/wrist movements, including rest or no movement. Surface electrodes were used to record EMG, and the MPR open-source platform BioPatRec was used for decoding motor volition in real time. The motion test was used to evaluate performance. We hypothesized that using anodal tDCS to increase the excitability of the primary motor cortex associated with the non-dominant side in able-bodied individuals would improve motor learning and thus MPR performance. Overall, we found that tDCS enhanced MPR performance, particularly on the non-dominant side. We were able to reject the null hypothesis, and improvements in the motion test's completion rate during tDCS (28% change, p-value: 0.023) indicate its potential as an adjunctive tool to enhance MPR and motor learning. tDCS appears promising as a tool to enhance the learning phase of using assistive devices based on MPR, such as myoelectric prostheses.


Introduction

Sensorimotor impairments, such as those arising from stroke and amputations, have significant effects on functional capabilities and quality of life. In one study (n = 94,905), 21% of stroke patients experienced motor impairment alone and in total, 82% experienced motor impairment in addition to sensory, cognitive, or sensory and cognitive impairment 1 . The most prevalent impairment following stroke is a contralateral upper limb hemiparesis affecting > 80% of patients during the acute phase and > 40% in the chronic phase 2 .

Amputations are one of the most extreme cases of sensorimotor impairments, resulting in reduced quality of life, and often unpleasant sensations and pain in the phantom limb 3, 4. Although there is no perfect solution for these impairments, advanced prosthetic systems can restore a certain degree of function after an amputation 5, 6, 7, and phantom limb pain (PLP) can be alleviated by guided plasticity approaches such as mirror therapy 8, 9, 10, 11, graded motor imagery 12, 13, and phantom motor execution 14, 15. Similar approaches have also been used for the rehabilitation of stroke patients 2.

One method to increase prosthetic limb control is myoelectric pattern recognition (MPR), which decodes patterns of muscle activity in the residual muscles of amputated limbs 16, 17. Embedded systems running MPR are able to infer movement intention using machine learning algorithms to control prosthetic devices 18. This provides more intuitive control of the prosthetic limb compared to simpler one-to-one electromyography (EMG) strategies (one EMG signal to drive one prosthetic function). In addition, MPR has been employed for the treatment of PLP in individuals with amputation 14, 19, and for functional restoration in patients after stroke. One way to increase prosthetic control performance using MPR is targeted rehabilitation that includes repetitive exercise and prolonged training. Research indicates that this also increases prosthetic use 20, 21; however, these methods are often time-consuming and can lead to user frustration and possibly abandonment of the prosthetic device.

Treatments that are used in stroke recovery include non-invasive brain stimulation techniques, which modulate brain activity by introducing external stimuli 22. Examples include transcranial electrical stimulation (tES), such as transcranial direct current stimulation (tDCS), and repetitive transcranial magnetic stimulation (rTMS). These techniques enhance or inhibit neuronal activity 23 and, consequently, alter sensorimotor and cognitive functions 24, 25. tES changes the cell membrane potential and thus modulates the spontaneous firing rate 22, which has been shown to increase motor learning 26 and the recovery of motor dysfunction 27. A meta-analysis by Bai et al. concluded that tDCS is effective for stroke recovery in patients with motor dysfunction 27. Furthermore, tDCS has been used in the rehabilitation of several additional neurological disorders such as depression, anxiety, and schizophrenia 24, 25, 28, as well as for the relief of PLP 29.

In a small number of individuals with unilateral upper limb amputation, tDCS has been shown to improve the subjects' ability to produce distinct EMG signals, which is useful for MPR applied to the control of prosthetic limbs 30. Similarly, tDCS has shown promising results on hand performance in able-bodied individuals using the Jebsen Taylor hand function test (JTHFT). In one study, both the dominant and non-dominant primary motor cortex were targeted, resulting in improved motor function in the non-dominant hand following modulation of the corresponding primary motor cortex 31. The effect of tDCS seems to be observable on the affected or non-dominant side; however, the evidence is limited, and replication by independent groups has not been conducted.

We undertake research in the fields of prosthetic limbs 5, 6, rehabilitation of PLP 14, 29, and stroke 32, all of which would benefit from validating the efficacy of tDCS to improve MPR. Here, we evaluated the hypothesis that tDCS improves MPR in the non-dominant hand of able-bodied individuals in a crossover, sham-controlled study. In addition, we also assessed the effect of learning and the application of tDCS to the dominant arm. We used the completion rate of the motion test 33 as a measure of the participants' ability to accomplish a motor task using MPR. In addition, we also measured the time in which each task was completed (completion time) and the reliability of decoding (accuracy).

Twelve healthy able-bodied individuals conducted baseline training followed by two sessions of either sham or anodal tDCS on the primary motor cortex of the dominant and non-dominant arms, separately. Assignments were randomized, and the MPR task involved 11 different hand/wrist movements, including rest or no movement. Surface electrodes recorded EMG signals, and real-time motor volition decoding was performed using the open-source platform BioPatRec 17. Performance evaluation was conducted using the motion test as implemented in BioPatRec.

Participants

In total, 12 able-bodied individuals participated in this study, of which six were female and six were male, ranging from 23 to 33 (27 ± 3.6) years of age. Eleven participants were right-handed; one participant was left-handed. All study participants successfully completed all sessions, demonstrating good tolerance to the intervention. Upon initiation of anodal tDCS, all participants experienced a tingling sensation underneath the anode electrode, attributed to the electrical current passing through the skin and underlying tissues 34. One participant reported a side effect characterized by redness underneath the cathode electrode. This was a singular occurrence and did not necessitate termination of their participation in the study.

Statistical analysis

Due to the non-normal distribution of the data (tested by the Shapiro–Wilk and Kolmogorov–Smirnov tests) and the limited sample size, the non-parametric Wilcoxon signed-rank test (p-value < 0.05, two-tailed) was selected to investigate the significance of tDCS on completion rate, completion time, and accuracy (Table 1). The null hypothesis for this study was that there would be no significant changes in the completion rate before and after tDCS application. The analysis was conducted on data collected within each session, including baseline, sham stimulation, and active stimulation. Additionally, similar analyses were carried out for sub-groups based on dominant and non-dominant sides. Furthermore, we found no statistically significant differences between the participants' performance after baseline and before the active and sham sessions (Table 2). All participants completed three trials of the motion test, except for one participant who performed two trials during all visits and another participant who did only two trials during the baseline visit. Statistical results are presented in Tables 1 and 2, and the distributions of the completion rate, completion time, and accuracy in Figs. 1, 2, and 3, respectively.

Figure 1

Displays the distribution of completion rate. Top: distribution of completion rate for combined movements of the non-dominant side. Middle and bottom plots: distributions of completion rate for the dominant side and the combined dominant and non-dominant sides, respectively. Columns 1 and 2 correspond to the baseline session without stimulation, before and after the break, respectively. Columns 3 and 4 represent the active tDCS session, column 3 depicting the completion rate before active stimulation and column 4 after. Columns 5 and 6 represent the sham tDCS session. An asterisk indicates a statistically significant difference between the two neighboring columns.

Figure 2

Displays the distribution of completion time. Top: distribution of completion time for combined movements of the non-dominant side. Middle and bottom plots: distributions of completion time for the dominant side and the combined dominant and non-dominant sides, respectively. Columns 1 and 2 correspond to the baseline session without stimulation, before and after the break, respectively. Columns 3 and 4 represent the active tDCS session, column 3 depicting the completion time before active stimulation and column 4 after. Columns 5 and 6 represent the sham tDCS session. An asterisk indicates a statistically significant difference between the two neighboring columns.

Figure 3

Displays the distribution of accuracy. Top: distribution of accuracy for combined movements of the non-dominant side. Middle and bottom plots: distributions of accuracy for the dominant side and the combined dominant and non-dominant sides, respectively. Columns 1 and 2 correspond to the baseline session without stimulation, before and after the break, respectively. Columns 3 and 4 represent the active tDCS session, column 3 depicting the accuracy before active stimulation and column 4 after. Columns 5 and 6 represent the sham tDCS session. An asterisk indicates a statistically significant difference between the two neighboring columns.

In the baseline session, statistically significant improvements were observed in the completion rate (28% change, p-value: 0.0029), completion time (14% change, p-value: 0.0125), and accuracy (66% change, p-value: 0.0036). However, during the sham stimulation session, no significant change was detected in any of the MPR parameters. In contrast, the active stimulation session showed a significant improvement of 33% in the completion rate (p-value: 0.0126), but no significant change was detected in either completion time or accuracy.

Subgroup analysis

Non-dominant side

In the baseline session, a statistically significant improvement of 37% was observed in the completion rate (p-value: 0.0160), while during the sham stimulation session no statistically significant improvement was detected in any of the MPR parameters. In contrast, the active stimulation session showed a statistically significant improvement of 27% in the completion rate (p-value: 0.0236).

Dominant side

In the baseline session, statistically significant improvements were observed in completion time (16% change, p-value: 0.0205) and accuracy (35% change, p-value: 0.0171), and a substantial but not statistically significant improvement in completion rate (20% change, p-value: 0.0708). However, during the sham and active stimulation sessions, no statistically significant improvement was detected in any of the MPR parameters.

This study investigated the effect of a single session of anodal tDCS over the primary motor cortex on MPR performance. Testing the non-dominant primary motor cortex assumes that disparities in the use of the dominant and non-dominant hands can approximate the differences observed between the affected and intact hands in individuals with amputations, and between the paretic and non-paretic hands in stroke patients. The relatively reduced dexterity of the non-dominant hand, as extensively demonstrated in the literature, stems from its asymmetric usage compared to the dominant hand 35. Our findings showed a statistically significant improvement in the completion rate of the non-dominant side during the active tDCS day, but not during the sham tDCS day (27%, p = 0.02).

Our results demonstrated a statistically significant improvement in completion rate during the baseline training, but no further improvement during the sham session, which indicates that most learning took place during the baseline training. Since there was no significant improvement in completion rate in the sham tDCS session, we can assume that the improvements during the active tDCS session result from increased excitability 26,36 of the non-dominant motor cortex induced by anodal tDCS. Our findings are in line with previous work on MPR for prosthetics 30 and on motor performance in non-disabled subjects 31 and stroke patients 37.

It is worth noting that the largest improvements were observed in the baseline session, in which the participants were exposed to the MPR task for the first time (better performance in all metrics). This is not surprising and illustrates that the learning effect of practicing a task for the first time is larger than what a later single session of training with neuromodulation can achieve. Chronic neuromodulation will likely improve performance on the MPR task further, and potentially show a change in all metrics rather than in completion rate only. Completion rate relates to the ability of the participant to accomplish a task, whereas completion time and accuracy relate to how effectively it is achieved. A larger improvement in motor control would impact all three metrics, and whether this can be achieved faster with chronic neuromodulation has yet to be investigated.

On the dominant hand, all three MPR metrics showed a non-negligible improvement during the baseline training, suggesting that a skill-learning ceiling may have been reached 26. No further improvement was observed in either the sham or active tDCS sessions. In a study by Pan et al., decreased performance was observed in the unaffected arm of participants with unilateral amputation 30.

Our findings support the previous literature and suggest that tDCS can enhance MPR performance, particularly on the non-dominant side. The improvements in MPR we observed indicate the potential of tDCS as an adjunctive tool to enhance motor learning and performance in MPR. Our findings in able-bodied participants are supported by the literature in other populations. For example, Cho et al. compared the effect of anodal tDCS over the primary motor cortex combined with active or sham mirror therapy, and concluded that tDCS plus mirror therapy has a positive effect on the functional recovery of the upper extremity in stroke patients 38. Other work has compared the effects of active and sham tDCS on motor and somatosensory function in acute stroke patients and demonstrated significant improvements using active tDCS (measured by the Wolf motor function test and the Semmes–Weinstein monofilament test) 39.

The clinical significance of improving the learning phase for individuals using MPR-controlled prosthetic limbs is faster rehabilitation and potentially better functional outcomes with a reduced rate of prosthetic abandonment. For clinicians, tDCS could also reduce consultation time. Moreover, the effectiveness and efficiency of MPR-based PLP treatments, such as phantom motor execution, can be enhanced 29,40,41.

We conducted our study on able-bodied participants; the next step is to translate this method to relevant populations, such as stroke patients and individuals with limb loss, in prospective investigations with larger sample sizes and matched patient populations. This was an acute application of tDCS, and the chronic effects should be further investigated.

Restoration of function and relief of pain are of considerable importance after traumatic injuries and stroke. Decoding of motor volition via MPR is a promising tool that is now being used in different assistive and rehabilitation devices, and here we have provided further evidence that tDCS can facilitate MPR and thus potentially improve the clinical outcomes of patients using MPR. Further prospective investigations with larger sample sizes and matched patient populations (e.g., limb loss or stroke) are necessary to produce higher quality evidence supporting this approach.

The study was approved by the governing ethical committee in Sweden (approval number 2022-00883-02) and was performed in accordance with the Declaration of Helsinki and the relevant guidelines and regulations. The study used a double-blinded, randomized, sham-controlled design, wherein participants were unaware of the placebo control. Twelve healthy, able-bodied participants without prior tDCS experience were recruited. When laterality was unclear, the Waterloo handedness questionnaire 42 was used to assess the dominant side of the participant. Those with a history of neurological disorders or contraindications to tDCS were excluded from the study 34. All participants received detailed information and provided written informed consent, including consent to publish Fig. 6.

Study design

The study consisted of three sessions, each conducted on a separate day, with 48 h between sessions to wash out potential carryover effects of tDCS. Participants were randomized with respect to starting hand (dominant or non-dominant) and to the order of sham and active tDCS, yielding four different groups. The study design is illustrated in Fig. 4.
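As an aside, the counterbalancing implied by this design can be sketched as follows; the exact allocation procedure was not specified beyond randomization, so this is only an illustration.

```python
# Illustrative counterbalancing into the four groups described above:
# each participant gets a starting hand and a stimulation order.
# Purely a sketch; the study's actual allocation procedure is unknown.
import random

random.seed(42)  # for a reproducible example
groups = [(hand, order)
          for hand in ("dominant-first", "non-dominant-first")
          for order in ("sham-first", "active-first")]

participants = [f"P{i:02d}" for i in range(1, 13)]
random.shuffle(participants)

# Deal participants into the four groups evenly (three per group).
assignment = {p: groups[i % 4] for i, p in enumerate(participants)}
for p in sorted(assignment):
    print(p, *assignment[p])
```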

Figure 4

Presents an overview of the study design. Each session comprises two rounds of training involving both hands, with a break in between for session one. In sessions two and three, either sham or active anodal tDCS is applied between the two rounds of training.

On the first day (baseline session), the participant was familiarized with the experimental setup and procedures. The first session consisted of two rounds of motor training with MPR on both sides, with a break of 20 min in between. On days two and three, participants were exposed to either sham or active tDCS between the two rounds of motor training. All training sessions for a given participant began with the same hand.

Experimental setup

A total of eight bipolar self-adhesive electrodes, along with two reference Ag/AgCl electrodes, were positioned on both arms to record the EMG signals. The electrode diameter was 1 cm, with an inter-electrode distance of approximately 2 cm. Four bipolar electrodes per arm were placed on the extensor carpi ulnaris, flexor carpi ulnaris, extensor carpi radialis, and flexor carpi radialis muscles, with one reference electrode on the elbow. The positioning of the bipolar electrodes can be seen in Fig. 5 and was based on the orientation of the aforementioned muscles in wrist flexion/extension, elbow flexion/extension, wrist pro/supination, and hand open/close movements, as described in previous work using BioPatRec, an open-source platform for MPR 15.

Figure 5

Surface electrode placement on one arm.

Working with one arm at a time, the electrodes were connected to an amplifier (ADS_BP4 43) with embedded active filtering (high-pass at 20 Hz and low-pass at 500 Hz) across four channels. The signals were amplified with a gain of 12 and sampled at 1000 Hz.
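For readers reproducing the pass band offline, a digital equivalent of this acquisition chain might look like the sketch below; the actual filtering was performed by the amplifier's analog front end, so this is only an approximation.

```python
# Digital stand-in for the 20-500 Hz analog band-pass described above,
# for EMG sampled at 1 kHz. The upper edge is set just below the
# Nyquist frequency (500 Hz) so the filter design remains valid.
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 1000  # sampling rate in Hz

def bandpass_emg(emg: np.ndarray, low: float = 20.0, high: float = 499.0,
                 order: int = 4) -> np.ndarray:
    """Zero-phase Butterworth band-pass applied over the last axis."""
    sos = butter(order, [low, high], btype="bandpass", fs=FS, output="sos")
    return sosfiltfilt(sos, emg, axis=-1)

# Example: one second of synthetic 4-channel EMG.
raw = np.random.default_rng(1).standard_normal((4, FS))
filtered = bandpass_emg(raw)
```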

The setup of the motor training and the motion test can be seen in Fig. 6A. The lower arm was stabilized in an arm rest that supported the elbow and wrist while allowing enough range of motion to complete the movements. The quality of the EMG signals was assessed by a short myoelectric recording of flexion, extension, open/close movements, and rest. All training was conducted within the BioPatRec environment 17.

Figure 6

Setup of motor training (A), tDCS (B), and consumables and electrodes (C).

The motor training protocol involved movement recording and testing of 10 movements plus rest, in the following order: hand open, hand close, hand flex, hand extend, supination, pronation, fine grip, side grip, thumb up, pointing with the index finger, and the state of no movement/relaxation. Each movement was recorded with one dummy repetition and three repetitions lasting 3 s each, with a 3 s rest interval in between. After the movements were recorded, four key signal features were extracted (mean absolute value, waveform length, zero crossings, and slope changes) to create the feature vectors used to train the classifier/decoder (linear discriminant analysis, LDA). LDA has been shown to be a successful decoder for this particular task, as outlined in the BioPatRec article 17. The movement recording was immediately followed by a motion test, where participants were requested to perform all the trained movements in a randomized order: three trials of three repetitions per movement with a 5 s timeout. The motion test captured the executed movements to examine the performance of MPR in terms of completion rate, completion time, and real-time accuracy, as described in the BioPatRec article 17. After completing the movement recording and motion test on one hand, the process was replicated on the other hand, constituting a single training round.
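A minimal sketch of this feature-extraction and decoding pipeline is given below, using scikit-learn's LDA in place of BioPatRec's MATLAB implementation; the window length and noise threshold are illustrative choices, not the study's exact parameters.

```python
# Sketch of the four time-domain features named above, computed per
# window and fed to an LDA classifier. The 200-sample (200 ms at 1 kHz)
# window and the small amplitude threshold are illustrative choices.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def features(window: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Mean absolute value, waveform length, zero crossings, and slope
    changes for one (n_channels, n_samples) window, concatenated
    channel-wise into a single feature vector."""
    d = np.diff(window, axis=1)
    mav = np.mean(np.abs(window), axis=1)
    wl = np.sum(np.abs(d), axis=1)
    # Sign changes of the signal (zero crossings), gated by a small
    # threshold to suppress noise, and of its slope (slope changes).
    zc = np.sum((window[:, :-1] * window[:, 1:] < 0) & (np.abs(d) > eps), axis=1)
    sc = np.sum(d[:, :-1] * d[:, 1:] < 0, axis=1)
    return np.concatenate([mav, wl, zc, sc])

# Synthetic example: 11 classes (10 movements plus rest), 10 windows
# each, 4 channels, 200 samples per window.
rng = np.random.default_rng(2)
windows = rng.standard_normal((110, 4, 200))
labels = np.repeat(np.arange(11), 10)
X = np.vstack([features(w) for w in windows])
clf = LinearDiscriminantAnalysis().fit(X, labels)
print("training accuracy:", clf.score(X, labels))
```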

Following the completion of the first training round, a 20 min rest was allocated on day one before the second round of the day (starting with the same hand). On days two and three, the break consisted of 20 min of sham or active tDCS. The anodal tDCS (Fig. 6B) was applied on the non-dominant side, with the anode placed on the contralateral primary motor cortex (C4/C3) and the cathode over the ipsilateral prefrontal cortex (FP1/FP2), using a commercially available system (Starstim® tES-EEG system). Through saline-soaked 25 cm² round sponge electrodes, a current of 1 mA was applied for 20 min, framed by a 30 s ramp up and ramp down for the active tDCS. For the sham tDCS, the ramp up and down occurred at both the beginning and the end of the period, blinding the participant to the actual stimulation status. The baseline training session ended with marking the placement of the electrodes with a skin-friendly marker, to ensure that the electrodes were placed in exactly the same position on days two and three.
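The timing of the two stimulation conditions can be summarized in a short sketch; the Starstim device controls the actual current delivery, so the profiles below only illustrate the blinding scheme described above.

```python
# Timing of the two conditions: active tDCS ramps up over 30 s, holds
# 1 mA for 20 min, then ramps down; sham delivers only the two ramps,
# at the start and end of the same window, so both conditions feel
# identical at onset and offset.
import numpy as np

FS = 10  # profile resolution, samples per second

def active_profile(i_ma: float = 1.0, ramp_s: int = 30, hold_s: int = 1200):
    up = np.linspace(0.0, i_ma, ramp_s * FS)
    hold = np.full(hold_s * FS, i_ma)
    return np.concatenate([up, hold, up[::-1]])

def sham_profile(i_ma: float = 1.0, ramp_s: int = 30, hold_s: int = 1200):
    bump = np.concatenate([np.linspace(0.0, i_ma, ramp_s * FS),
                           np.linspace(i_ma, 0.0, ramp_s * FS)])
    off = np.zeros((hold_s - 2 * ramp_s) * FS)  # no current in between
    return np.concatenate([bump, off, bump])

assert active_profile().size == sham_profile().size  # same total duration
```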

Data availability

The data that support the findings of this study are available upon reasonable request.

Gittins, M. et al. Stroke impairment categories: A new way to classify the effects of stroke based on stroke-related impairments. Clin. Rehabil. 35 , 446–458 (2021).

Hatem, S. M. et al. Rehabilitation of motor function after stroke: A multiple systematic review focused on techniques to stimulate upper extremity recovery. Front. Hum. Neurosci. 10 , 1–22 (2016).

Morgan, S. J., Friedly, J. L., Amtmann, D., Salem, R. & Hefner, B. J. A cross-sectional assessment of factors related to pain intensity and pain interference in lower limb prosthesis users. Physiol. Behav. 98 , 105–113 (2018).

Ehde, D. M. et al. Chronic phantom sensations, phantom pain, residual limb pain, and other regional pain after lower limb amputation. Arch. Phys. Med. Rehabil. 81 , 1039–1044 (2000).

Zbinden, J. et al. Improved control of a prosthetic limb by surgically creating electro-neuromuscular constructs with implanted electrodes. Sci. Transl. Med. 15 , eabq3665 (2023).

Ortiz Catalan, M., Hakansson, B. & Branemark, R. An osseointegrated human-machine gateway for long-term sensory feedback and motor control of artificial limbs. Sci. Transl. Med. 6 , 257re6 (2014).

Ortiz-Catalan, M., Mastinu, E., Sassu, P., Aszmann, O. & Brånemark, R. Self-contained neuromusculoskeletal arm prostheses. N. Engl. J. Med. 382 , 1732–1738 (2020).

Chan, B. L. et al. Mirror therapy for phantom limb pain. N. Engl. J. Med. https://doi.org/10.3344/kjp.2012.25.4.272 (2007).

Ol, H. S., Van Heng, Y., Danielsson, L. & Husum, H. Mirror therapy for phantom limb and stump pain: A randomized controlled clinical trial in landmine amputees in Cambodia. Scand. J. Pain 18 , 603–610 (2018).

Finn, S. B. et al. A randomized, controlled trial of mirror therapy for upper extremity phantom limb pain in male amputees. Front. Neurol. 8 , 1–7 (2017).

Rothgangel, A., Braun, S., Winkens, B., Beurskens, A. & Smeets, R. Traditional and augmented reality mirror therapy for patients with chronic phantom limb pain (PACT study): Results of a three-group, multicentre single-blind randomized controlled trial. Clin. Rehabil. 32 , 1591–1608 (2018).

Moseley, G. L. Graded motor imagery for pathologic pain: A randomized controlled trial. Neurology 67 , 2129–2134 (2006).

Limakatso, K., Madden, V. J., Manie, S. & Parker, R. The effectiveness of graded motor imagery for reducing phantom limb pain in amputees: A randomised controlled trial. Physiotherapy 109 , 65–74 (2020).

Ortiz-Catalan, M. et al. Phantom motor execution facilitated by machine learning and augmented reality as treatment for phantom limb pain: A single group, clinical trial in patients with chronic intractable phantom limb pain. Lancet 388 , 2885–2894 (2016).

Ortiz-Catalan, M., Sander, N., Kristoffersen, M. B., Håkansson, B. & Brånemark, R. Treatment of phantom limb pain (PLP) based on augmented reality and gaming controlled by myoelectric pattern recognition: A case study of a chronic PLP patient. Front. Neurosci. 8 , 1–7 (2014).

Scheme, E. & Englehart, K. Electromyogram pattern recognition for control of powered upper-limb prostheses: State of the art and challenges for clinical use. J. Rehabil. Res. Dev. 48 , 643–660 (2011).

Ortiz-Catalan, M., Brånemark, R. & Håkansson, B. BioPatRec: A modular research platform for the control of artificial limbs based on pattern recognition algorithms. Source Code Biol. Med. 8 , 1–18 (2013).

Mastinu, E., Doguet, P., Botquin, Y., Hakansson, B. & Ortiz-Catalan, M. Embedded system for prosthetic control using implanted neuromuscular interfaces accessed via an osseointegrated implant. IEEE Trans. Biomed. Circuits Syst. 11 , 867–877 (2017).

Ortiz-Catalan, M. The stochastic entanglement and phantom motor execution hypotheses: A theoretical framework for the origin and treatment of Phantom limb pain. Front. Neurol. 9 , 1–16 (2018).

Kato, R., Fujita, T., Yokoi, H. & Arai, T. Adaptable EMG prosthetic hand using on-line learning method. In The 15th IEEE International Symposium Robot Human Interactive Communivation 599–604 (IEEE, 2006).

Powell, M. A., Kaliki, R. R. & Thakor, N. V. User training for pattern recognition-based myoelectric prostheses: Improving phantom limb movement consistency and distinguishability. IEEE Trans. Neural Syst. Rehabil. Eng. 22 , 522–532 (2014).

Stagg, C. J. & Nitsche, M. A. Physiological basis of transcranial direct current stimulation. Neuroscientist 17 , 37–53 (2011).

Thair, H., Holloway, A. L., Newport, R. & Smith, A. D. Transcranial direct current stimulation (tDCS): A Beginner’s guide for design and implementation. Front. Neurosci. 11 , 1–13 (2017).

Nitsche, M. A. et al. Transcranial direct current stimulation: State of the art 2008. Brain Stimul. https://doi.org/10.1016/j.brs.2008.06.004 (2008).

Lefaucheur, J. P. A comprehensive database of published tDCS clinical trials (2005–2016). Neurophysiol. Clin. 46 , 319–398 (2016).

Orban de Xivry, J. J. & Shadmehr, R. Electrifying the motor engram: Effects of tDCS on motor learning and control. Exp. Brain Res. 232 , 3379–3395 (2014).

Bai, X. et al . Different therapeutic effects of transcranial direct current stimulation on upper and lower limb recovery of stroke patients with motor dysfunction: A meta-analysis. Neural Plast. 2019 , 1372138. https://doi.org/10.1155/2019/1372138 (2019).

Bolognini, N. et al. Immediate and sustained effects of 5-day transcranial direct current stimulation of the motor cortex in phantom limb pain. J. Pain 16 , 657–665 (2015).

Damercheli, S., Ramne, M. & Ortiz-Catalan, M. Transcranial direct current stimulation (tDCS) for the treatment and investigation of phantom limb pain (PLP). Psychoradiology 2 , 23–31 (2022).

Pan, L., Zhang, D., Sheng, X. & Zhu, X. Improving myoelectric control for amputees through transcranial direct current stimulation. IEEE Trans. Biomed. Eng. 62 , 1927–1936 (2015).

Boggio, P. S. et al. Enhancement of non-dominant hand motor function by anodal transcranial direct current stimulation. Neurosci. Lett. 404 , 232–236 (2006).

Munoz-Novoa, M. et al. Upper limb stroke rehabilitation using surface electromyography: A systematic review and meta-analysis. Front. Hum. Neurosci. 16 , 897870 (2022).

Kuiken, T. A. et al. Targeted muscle reinnervation for real-time myoelectric control of multifunction artificial arms. JAMA 301, 619–628 (2009).

Antal, A. et al. Low intensity transcranial electric stimulation: Safety, ethical, legal regulatory and application guidelines. Clin. Neurophysiol. 128 , 1774–1809 (2017).

De Gennaro, L. et al. Handedness is mainly associated with an asymmetry of corticospinal excitability and not of transcallosal inhibition. Clin. Neurophysiol. 115 , 1305–1312 (2004).

Nitsche, M. A. & Paulus, W. Excitability changes induced in the human motor cortex by weak transcranial direct current stimulation. J. Physiol. 527 , 633–639 (2000).

Tedla, J. S. et al. Transcranial direct current stimulation (tDCS) effects on upper limb motor function in stroke: An overview review of the systematic reviews. Brain Inj. 37 , 122–133 (2023).

Cho, H. S. & Cha, H. G. Effect of mirror therapy with tDCS on functional recovery of the upper extremity of stroke patients. J. Phys. Ther. Sci. 27 , 1045–1047 (2015).

Bornheim, S., Croisier, J. L., Maquet, P. & Kaux, J. F. Transcranial direct current stimulation associated with physical-therapy in acute stroke patients—A randomized, triple blind, sham-controlled study. Brain Stimul. 13 , 329–336 (2020).

Ortiz-Catalan, M. The stochastic entanglement and phantom motor execution hypotheses: A theoretical framework for the origin and treatment of Phantom limb pain. Front. Neurol. 9 , 369181 (2018).

Damercheli, S., Buist, M. & Ortiz-Catalan, M. Mindful sensorimotor therapy combined with brain modulation for the treatment of pain in individuals with disarticulation or nerve injuries: A single-arm clinical trial. BMJ Open 13 , e059348 (2023).

Steenhuis, R. E., Bryden, M. P., Schwartz, M. & Lawson, S. Reliability of hand preference items and factors. J. Clin. Exp. Neuropsychol. 12 , 921–930 (1990).

Mastinu, E., Hakansson, B. & Ortiz-Catalan, M. Low-cost, open source bioelectric signal acquisition system. In 2017 IEEE 14th International Conference on Wearable and Implantable Body Sensor Networks 19–22 (IEEE, 2017).

Acknowledgements

This study was supported by the Promobilia Foundation, IngaBritt och Arne Lundbergs Forskningsstiftelse, and Vetenskapsrådet.

Open access funding provided by Chalmers University of Technology.

Author information

Authors and affiliations

Center for Bionics and Pain Research, Mölndal, Sweden

Shahrzad Damercheli, Kelly Morrenhof, Kirstin Ahmed & Max Ortiz-Catalan

Department of Electrical Engineering, Chalmers University of Technology, Gothenburg, Sweden

Bionics Institute, Melbourne, Australia

Max Ortiz-Catalan

Medical Bionics Department, University of Melbourne, Melbourne, Australia

NeuroBioniX, Melbourne, Australia

Prometei Pain Rehabilitation Center, Vinnytsia, Ukraine

Contributions

M.O.C. conceived the study. S.D. and M.O.C. designed the study. S.D. and K.M. conducted the experiments and analyzed the data. S.D. conducted the literature review and drafted the manuscript. K.A. edited the manuscript and provided constructive feedback. M.O.C. edited the manuscript, supervised the research, and obtained the funding. All authors reviewed and approved the final version of the manuscript.

Corresponding author

Correspondence to Max Ortiz-Catalan.

Ethics declarations

Competing interests

S.D., K.A., and K.M. declare no competing interests. M.O.C. has consulted for Integrum AB and is the inventor of patents pertaining to MPR.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article

Damercheli, S., Morrenhof, K., Ahmed, K. & Ortiz-Catalan, M. Performance in myoelectric pattern recognition improves with transcranial direct current stimulation. Sci. Rep. 14, 11744 (2024). https://doi.org/10.1038/s41598-024-62185-x

Download citation

Received: 22 August 2023

Accepted: 14 May 2024

Published: 23 May 2024

DOI: https://doi.org/10.1038/s41598-024-62185-x
