## Research Hypothesis in Psychology: Types & Examples

Saul Mcleod, PhD

Educator, Researcher

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul Mcleod, Ph.D., is a qualified psychology teacher with over 18 years' experience working in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.


Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.


A research hypothesis (plural: hypotheses) is a specific, testable prediction about the anticipated results of a study, established at its outset. It is a key component of the scientific method.

Hypotheses connect theory to data and guide the research process towards expanding scientific understanding.

## Some key points about hypotheses:

- A hypothesis expresses an expected pattern or relationship. It connects the variables under investigation.
- It is stated in clear, precise terms before any data collection or analysis occurs. This makes the hypothesis testable.
- A hypothesis must be falsifiable. It should be possible, even if unlikely in practice, to collect data that disconfirms rather than supports the hypothesis.
- Hypotheses guide research. Scientists design studies to explicitly evaluate hypotheses about how nature works.
- For a hypothesis to be valid, it must be testable against empirical evidence. The evidence can then support or contradict the testable predictions.
- Hypotheses are informed by background knowledge and observation, but go beyond what is already known to propose an explanation of how or why something occurs.

Predictions typically arise from a thorough knowledge of the research literature, curiosity about real-world problems or implications, and integrating this to advance theory. They build on existing literature while providing new insight.

## Types of Research Hypotheses

## Alternative Hypothesis

The research hypothesis is often called the alternative or experimental hypothesis in experimental research.

It typically suggests a potential relationship between two key variables: the independent variable, which the researcher manipulates, and the dependent variable, which is measured based on those changes.

The alternative hypothesis states a relationship exists between the two variables being studied (one variable affects the other).


An experimental hypothesis predicts what change(s) will occur in the dependent variable when the independent variable is manipulated.

It states that the results are not due to chance and are significant in supporting the theory being investigated.

The alternative hypothesis can be directional, indicating a specific direction of the effect, or non-directional, suggesting a difference without specifying its nature. It’s what researchers aim to support or demonstrate through their study.

## Null Hypothesis

The null hypothesis states no relationship exists between the two variables being studied (one variable does not affect the other). There will be no changes in the dependent variable due to manipulating the independent variable.

It states results are due to chance and are not significant in supporting the idea being investigated.

The null hypothesis, positing no effect or relationship, is a foundational contrast to the research hypothesis in scientific inquiry. It establishes a baseline for statistical testing, promoting objectivity by initiating research from a neutral stance.

Many statistical methods are tailored to test the null hypothesis, determining the likelihood of observed results if no true effect exists.

This dual-hypothesis approach provides clarity, ensuring that research intentions are explicit, and fosters consistency across scientific studies, enhancing the standardization and interpretability of research outcomes.
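The idea of asking how likely the observed results would be if no true effect existed can be sketched with a permutation test. This is a minimal illustration rather than a method prescribed here: the recall scores, the function name `permutation_p_value`, and the choice of 10,000 shuffles are all assumptions made for demonstration.

```python
import random

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for a difference in group means.

    Under the null hypothesis the group labels are interchangeable, so we
    shuffle the pooled scores and count how often a shuffled difference is
    at least as large as the one actually observed.
    """
    rng = random.Random(seed)
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    pooled = list(group_a) + list(group_b)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)

# Hypothetical recall scores (items remembered) for two groups.
music = [14, 16, 15, 17, 18, 16, 15, 17]
silence = [12, 13, 11, 14, 12, 13, 12, 11]

p = permutation_p_value(music, silence)
print(f"p = {p:.4f}")
```

A small p-value suggests the observed difference would be unusual if the null hypothesis were true. It does not prove the alternative hypothesis.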

## Non-directional Hypothesis

A non-directional hypothesis, also known as a two-tailed hypothesis, predicts that there is a difference or relationship between two variables but does not specify the direction of this relationship.

It merely indicates that a change or effect will occur without predicting which group will have higher or lower values.

For example, “There is a difference in performance between Group A and Group B” is a non-directional hypothesis.

## Directional Hypothesis

A directional (one-tailed) hypothesis predicts the nature of the effect of the independent variable on the dependent variable. It predicts in which direction the change will take place (i.e., greater, smaller, less, more).

It specifies whether one variable is greater or lesser than another, rather than just indicating that there’s a difference without specifying its nature.

For example, “Exercise increases weight loss” is a directional hypothesis.
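The terms "one-tailed" and "two-tailed" refer to how much of the sampling distribution counts as evidence against the null hypothesis. The sketch below assumes a test statistic that follows a standard normal distribution (a z statistic); the value z = 1.8 and the function name `p_values_from_z` are made up for illustration.

```python
from statistics import NormalDist

def p_values_from_z(z):
    """Return (one_tailed, two_tailed) p-values for a z statistic.

    The one-tailed value assumes the directional hypothesis predicted a
    positive effect, so only the upper tail counts as evidence.
    """
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    one_tailed = 1 - standard_normal.cdf(z)
    two_tailed = 2 * (1 - standard_normal.cdf(abs(z)))
    return one_tailed, two_tailed

# z = 1.8 is an invented value, not a result from any study.
one_tailed, two_tailed = p_values_from_z(1.8)
print(f"one-tailed p = {one_tailed:.3f}")
print(f"two-tailed p = {two_tailed:.3f}")
```

With this invented statistic, the one-tailed p-value falls below 0.05 while the two-tailed one does not, which is why the direction of the prediction should be decided before the data are collected.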

## Falsifiability

The Falsification Principle, proposed by Karl Popper, is a way of demarcating science from non-science. It suggests that for a theory or hypothesis to be considered scientific, it must be testable and refutable.

Falsifiability emphasizes that scientific claims shouldn’t just be confirmable but should also have the potential to be proven wrong.

It means that there should exist some potential evidence or experiment that could prove the proposition false.

However many confirming instances exist for a theory, it only takes one counter observation to falsify it. For example, the hypothesis that “all swans are white” can be falsified by observing a single black swan.

For Popper, science should attempt to disprove a theory rather than attempt to continually provide evidence to support a research hypothesis.

## Can a Hypothesis be Proven?

Hypotheses make probabilistic predictions. They state the expected outcome if a particular relationship exists. However, a study result supporting a hypothesis does not definitively prove it is true.

All studies have limitations. There may be unknown confounding factors or issues that limit the certainty of conclusions. Additional studies may yield different results.

In science, hypotheses can realistically only be supported with some degree of confidence, not proven. The process of science is to incrementally accumulate evidence for and against hypothesized relationships in an ongoing pursuit of better models and explanations that best fit the empirical data. But hypotheses remain open to revision and rejection if that is where the evidence leads.

- Disproving a hypothesis is definitive. Solid disconfirmatory evidence will falsify a hypothesis and require altering or discarding it based on the evidence.
- However, confirming evidence is always open to revision. Other explanations may account for the same results, and additional or contradictory evidence may emerge over time.

We can never 100% prove the alternative hypothesis. Instead, we see if we can disprove, or reject the null hypothesis.

Rejecting the null hypothesis does not mean that our alternative hypothesis is correct, but it does support the alternative/experimental hypothesis.

Upon analysis of the results, an alternative hypothesis can be rejected or supported, but it can never be proven to be correct. We must avoid any reference to results proving a theory as this implies 100% certainty, and there is always a chance that evidence may exist which could refute a theory.

## How to Write a Hypothesis

- Identify the variables. The researcher manipulates the independent variable, and the dependent variable is the measured outcome.
- Operationalize the variables being investigated. Operationalization refers to the process of making the variables physically measurable or testable, e.g. if you are about to study aggression, you might count the number of punches given by participants.
- Decide on a direction for your prediction. If there is evidence in the literature to support a specific effect of the independent variable on the dependent variable, write a directional (one-tailed) hypothesis. If there are limited or ambiguous findings in the literature regarding the effect of the independent variable on the dependent variable, write a non-directional (two-tailed) hypothesis.
- Make it testable. Ensure your hypothesis can be tested through experimentation or observation. It should be possible to prove it false (principle of falsifiability).
- Use clear and concise language. A strong hypothesis is concise (typically one to two sentences long) and formulated using clear and straightforward language, ensuring it’s easily understood and testable.

Consider a hypothesis many teachers might subscribe to: students work better on Monday morning than on Friday afternoon (IV = day, DV = standard of work).

Now, if we decide to study this by giving the same group of students a lesson on a Monday morning and a Friday afternoon and then measuring their immediate recall of the material covered in each session, we would end up with the following:

- The alternative hypothesis states that students will recall significantly more information on a Monday morning than on a Friday afternoon.
- The null hypothesis states that there will be no significant difference in the amount recalled on a Monday morning compared to a Friday afternoon. Any difference will be due to chance or confounding factors.
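Because the same students are tested in both sessions, the two sets of recall scores are paired. One way to examine the null hypothesis for such a design is a sign-flip randomisation test, sketched below. The scores and the function name `paired_sign_flip_p` are hypothetical, and this is only one of several tests a researcher might choose.

```python
from itertools import product

def paired_sign_flip_p(first, second):
    """One-tailed exact randomisation test for paired scores.

    If the day makes no difference, each student's Monday-minus-Friday
    difference is equally likely to be positive or negative, so we try
    every combination of signs and count how often the total is at least
    as large as the observed total.
    """
    diffs = [a - b for a, b in zip(first, second)]
    observed = sum(diffs)
    as_extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        if sum(s * d for s, d in zip(signs, diffs)) >= observed:
            as_extreme += 1
    return as_extreme / total

# Hypothetical recall scores for the same eight students.
monday = [12, 15, 11, 14, 13, 16, 12, 15]
friday = [10, 14, 11, 12, 12, 13, 11, 13]

p_value = paired_sign_flip_p(monday, friday)
print(p_value)  # 0.0078125
```

With these invented scores, only 2 of the 256 possible sign patterns give a total as large as the observed one, so the null hypothesis would be rejected at the conventional 0.05 level. Confounding factors, such as the lesson content differing between the two sessions, would still need to be ruled out by the design.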

## More Examples

- Memory: Participants exposed to classical music during study sessions will recall more items from a list than those who studied in silence.
- Social Psychology: Individuals who frequently engage in social media use will report higher levels of perceived social isolation compared to those who use it infrequently.
- Developmental Psychology: Children who engage in regular imaginative play have better problem-solving skills than those who don’t.
- Clinical Psychology: Cognitive-behavioral therapy will be more effective in reducing symptoms of anxiety over a 6-month period compared to traditional talk therapy.
- Cognitive Psychology: Individuals who multitask between various electronic devices will have shorter attention spans on focused tasks than those who single-task.
- Health Psychology: Patients who practice mindfulness meditation will experience lower levels of chronic pain compared to those who don’t meditate.
- Organizational Psychology: Employees in open-plan offices will report higher levels of stress than those in private offices.
- Behavioral Psychology: Rats rewarded with food after pressing a lever will press it more frequently than rats who receive no reward.


## Aims and Hypotheses

Last updated 22 Mar 2021


Observations of events or behaviour in our surroundings provoke questions as to why they occur. In turn, one or multiple theories might attempt to explain a phenomenon, and investigations are consequently conducted to test them. One observation could be that athletes tend to perform better when they have a training partner, and a theory might propose that this is because athletes are more motivated with peers around them.

The aim of an investigation, driven by a theory to explain a given observation, states the intent of the study in general terms. Continuing the above example, the consequent aim might be “to investigate the effect of having a training partner on athletes’ motivation levels”.

The theory attempting to explain an observation will help to inform hypotheses - predictions of an investigation’s outcome that make specific reference to the independent variables (IVs) manipulated and dependent variables (DVs) measured by the researchers.

There are two types of hypothesis:

- H1 – Research hypothesis
- H0 – Null hypothesis

H1 – The Research Hypothesis

This predicts a statistically significant effect of an IV on a DV (in an experiment), or a significant relationship between variables (in a correlation study), e.g.

- In an experiment: “Athletes who have a training partner are likely to score higher on a questionnaire measuring motivation levels than athletes who train alone.”
- In a correlation study: “There will be a significant positive correlation between athletes’ motivation questionnaire scores and the number of partners athletes train with.”

The research hypothesis will be directional (one-tailed) if theory or existing evidence argues a particular ‘direction’ of the predicted results, as demonstrated in the two hypothesis examples above.

Non-directional (two-tailed) research hypotheses do not predict a direction, so here would simply predict “a significant difference” between questionnaire scores in athletes who train alone and with a training partner (in an experiment), or “a significant relationship” between questionnaire scores and number of training partners (in a correlation study).

H0 – The Null Hypothesis

This predicts that a statistically significant effect or relationship will not be found, e.g.

- In an experiment: “There will be no significant difference in motivation questionnaire scores between athletes who train with and without a training partner.”
- In a correlation study: “There will be no significant relationship between motivation questionnaire scores and the number of partners athletes train with.”

When the investigation concludes, analysis of results will suggest that either the research hypothesis or null hypothesis can be retained, with the other rejected. Ultimately this will provide evidence to either support or refute the theory driving a hypothesis, and may lead to further research in the field.
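For the correlation study, the strength and direction of the relationship is usually summarised with a correlation coefficient. The sketch below computes Pearson's r by hand for a small set of invented data; the numbers and the function name `pearson_r` are assumptions, and a real study would also need a significance test before retaining or rejecting H0.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from each mean.
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / sqrt(spread_x * spread_y)

# Hypothetical data: number of training partners and motivation score.
training_partners = [0, 1, 2, 3, 4]
motivation_scores = [10, 12, 15, 15, 18]

r = pearson_r(training_partners, motivation_scores)
print(f"r = {r:.3f}")
```

A value of r close to +1 is in the direction predicted by the directional research hypothesis; a value close to 0 would be more consistent with the null hypothesis.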


Before carrying out their research, most psychologists will make a prediction about what will happen. This is known as a hypothesis, which is a statement regarding what the psychologist believes will or should happen at the end of the study. For example, a psychologist may predict that children who listen to music whilst revising will do better in their exams than those children who do not.

There are two types of hypotheses, which are null hypotheses and alternative hypotheses, both of which we will look at now in more detail.

## What is a null hypothesis?

A null hypothesis predicts that there will be no pattern or trend in results. In other words, it predicts no difference and no correlation. (A correlation is a relationship between two or more things.)

Before starting their research, psychologists usually have both a null and an alternative hypothesis, and their aim is to find out which one the findings support. Once they have identified this, they reject the other, as it is not supported by their research findings.

## What is an alternative hypothesis?

Unlike a null hypothesis, an alternative hypothesis predicts that there will be a difference or a correlation between two or more things. In other words, an alternative hypothesis predicts some kind of pattern or trend in results. Have a look at the following alternative hypotheses, which are based around the core studies within this course:

- Participants will be able to accurately recall more information at the start and end of a list than in the middle.
- Children whose efforts are praised are more likely to grow up with a growth mindset than those who are praised personally.
- Children are more likely to behave aggressively when they have witnessed an aggressive adult role model.
- Children under the age of eight are more likely to be egocentric than those who are over the age of eight.

If you are asked to write a hypothesis in the exam, it will help to begin a null hypothesis with “there will be no…” and an alternative hypothesis with “there will be a…”. There are usually two marks available for writing a hypothesis correctly. One mark is for knowing whether or not it predicts a difference or a correlation, and the other is for stating the rest of the hypothesis, i.e. the variables, which must be done clearly and accurately.


## AQA GCSE Psychology Research Methods

This section provides revision resources for the Research Methods chapter of AQA GCSE psychology. The revision notes cover the AQA exam board and the new specification. As part of your GCSE psychology course, you need to know the topics below within this chapter:

- AQA Psychology
- Research Methods


## Formulation of Testable Hypotheses

For the formulation of testable hypotheses, the psychology specification states you need to know the following:

- Null hypothesis and alternative hypothesis.

A hypothesis is simply a formal and testable statement of the relationship between two variables that is to be tested through experimentation. In psychology, as well as other sciences, we use them as part of the scientific method.

The hypothesis is not, strictly speaking, a prediction and should not be written in the future tense, i.e. “this will happen”. It is only at the end of the study that the researcher decides whether the research evidence supports the hypothesis or not.

There are different types of hypotheses used in psychology; the main ones that crop up frequently are:

- Directional hypotheses
- Non-directional hypotheses
- Null hypotheses
- Alternative hypotheses

For GCSE Psychology and the AQA specification, we need to know about null hypotheses and alternative hypotheses.

## What is a Null Hypothesis?

A null hypothesis is a general statement that the observed variables will have no impact as there is no relationship between them. This hypothesis assumes that any difference observed is due to sampling or experimentation errors.

An example of a null hypothesis for a hypothetical scenario is “watching television before bed has no impact on how well you sleep”.

## What is an Alternative Hypothesis?

The alternative hypothesis would be a prediction that one variable will affect the other.

An example would be “watching scary movies before bed affects how fast you fall asleep”. This alternative hypothesis does not specify the direction of the outcome, merely that there will be an effect.

## Formulating Hypotheses

Once you know enough about hypotheses, you need to consider how to apply them. When conducting research, most of the time the experiment comes from a simple or vague idea we wish to test.

Here’s an example: does music affect people’s ability to learn?

This is rather a vague question, and to turn it into a testable experiment we need to be able to operationalise the two key variables: music and learning.

These two variables are then known as the independent variable and dependent variable, often referred to as the IV and DV for short. More information is given on them below.

Hypotheses are then easier to form. A suitable alternative hypothesis for this experiment would be:

- “The presence or absence of music has an effect on the score in a learning test”

A null hypothesis for this example would simply be:

- “The presence or absence of music has no effect on the score in a learning test”

## Types of Variables

For the different types of variables, the GCSE psychology specification states you need to know the following:

- Independent variable, dependent variable, extraneous variables.

There are three different types of variables we need to know about:

- The independent variable (IV)
- The dependent variable (DV)
- Extraneous variables.

## Independent Variable

An experiment will look to measure the effect of one variable on another. These two variables have special names, which are the independent variable and dependent variable.

The independent variable is what researchers manipulate in order to test its effect on the dependent variable (the outcome). Let’s use the example mentioned earlier about music and learning to illustrate this: We are conducting an experiment to see if music affects the ability of students to learn. In this case, the independent variable (IV) we will be manipulating is music.

Within the context of an experiment, we may simply have two conditions where one group is exposed to music while another group is not while engaging in some learning activity. We would then compare the findings to assess the results.

## Dependent Variable

The dependent variable (DV) is the outcome or effect we are measuring within the study. So using the example above, the dependent variable would be how well the students are able to learn with or without music. This may be measured in a number of ways (for example, with a memory test or a quiz).

So to clarify: the independent variable is what we change and the dependent variable is the outcome we then measure.

A good way to remember the difference is to think of it like this:

- The dependent variable “depends” on what's being changed (the independent variable).
- Another way would be to remember that “we measure the effect of the IV on the DV”.

If you remember that the independent variable (IV) always comes first, you should be able to recall that the dependent variable (DV) is then the outcome. These are just two simple ways of remembering the difference between the IV and DV but feel free to use what works for you.

## Extraneous Variable

The extraneous variable is a third variable that may unknowingly be affecting the outcome of the study (the DV).

We conduct experiments to measure the effect of the IV on the DV but sometimes extraneous variables are actually the cause of the changes. They can be seen as “nuisance variables” that affect the study and make it difficult to know whether it is the IV that affects the DV.

Let’s use the example mentioned earlier about how music may affect a student’s ability to learn. We may conduct this experiment and find that music improves learning, as the students who listened to the music performed better.

We may, therefore, conclude that music improves students’ ability to learn. However, what if the results were actually caused by a third variable that is unaccounted for (an extraneous variable)?

Perhaps we find that the students who performed the best were those with prior knowledge of the questions in the test. The extraneous variable could then be argued to be participants’ prior knowledge, which we have not accounted for or controlled.

Looking into the study, we could perhaps argue that the extraneous variable is a difference in intelligence between the groups that is affecting the outcome. It may be that some participants in one group were more educated and therefore better problem solvers, and this is an extraneous variable that is affecting the dependent variable (outcome).

With the research studies you are presented with, you can almost always find arguments to highlight extraneous variables in some form. It is handy to get into the habit of recognising these different forms, as they prove useful in critically analysing studies and topping up your points with further evaluation marks, especially if you go on to study A-level psychology.

## Sampling Methods

- Random sampling
- Opportunity sampling
- Systematic sampling
- Stratified sampling
- Strengths and weaknesses of each sampling method
- Understand principles of sampling as applied to scientific data.

This section of AQA GCSE psychology requires you to know about 4 different sampling methods and their strengths and weaknesses.

Sampling methods are merely the different strategies researchers use to get participants for their studies. In any psychological research study, there is usually a target population, which is the group of individuals the researcher is interested in. The aim of the researcher is to try and take a representative sample from this target population using a sampling method. The goal is to gain a representative sample that then allows the researcher to make generalisations across the whole population, based on the findings of this sample.

The four sampling methods you are required to know about are random, opportunity, systematic and stratified sampling.

## Random Sampling

Random sampling involves the researcher identifying members of the target population, numbering them and then attempting to draw out the required number of people for their study.

The selection of participants can be done in a randomised way such as drawing out numbers from a hat if the sample size is small or having a computer randomly select the participants if the sample size is large.
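Having a computer select the participants can be as simple as the sketch below, which uses Python's standard `random` module. The population of 200 numbered students, the sample size of 20 and the function name `random_sample` are assumptions for illustration.

```python
import random

def random_sample(target_population, sample_size, seed=None):
    """Draw participants so that every member has an equal chance.

    rng.sample selects without replacement, so nobody is picked twice.
    A fixed seed is only used here to make the example repeatable.
    """
    rng = random.Random(seed)
    return rng.sample(target_population, sample_size)

# Hypothetical target population: 200 numbered students.
target_population = [f"student_{i}" for i in range(1, 201)]

sample = random_sample(target_population, 20, seed=42)
print(len(sample))  # 20
```

This is the computerised equivalent of numbering everyone and drawing the numbers out of a hat.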

## Strengths and Weaknesses of Random Sampling

- Random sampling has the benefit of being less biased, as all members of the target population have an equal chance of being selected for the study. This means the sample is likely to be more representative of the target population, making more valid generalisations possible from the research findings.
- Random sampling also means there is less chance that researchers can influence the results as they have no say as to who is picked. This reduces the impact of investigator effects which means the findings may have more validity.
- However, it is still possible for the researcher to end up with an unbalanced and biased sample by chance, particularly if the sample size is too small.
- Gathering randomised samples can also be time-consuming, as attempting to gather enough willing participants from the target population takes a considerable amount of time and effort.

## Opportunity Sampling

Opportunity sampling means asking the people who are around you and most easily available, and who belong to the target population, to participate in the study. This may involve asking those around you in your class or school, or people walking in the street, for their involvement.

## Strengths and Weaknesses of Opportunity Sampling

- The main benefit of opportunity sampling is it is one of the fastest and easiest ways to gather participants for a study when compared to other sampling methods.
- Opportunity samples have a greater chance of being biased because the sample is drawn from a very narrow part of the target population. For example, if you selected participants at school, your sample is likely to consist of mostly students and the behaviours they display in the study may not generalise to adults. Participants may also try to “help” the researcher in a way that would support the hypothesis so the results may be unreliable and invalid.
- With opportunity sampling methods, it is possible the researcher can influence those selected as the process is not randomised. The researcher may select the people they think will support their hypothesis, so investigator effects are a potential hindrance.

## Systematic Sampling

Systematic sampling involves selecting every “nth” member of the target population. For example, if the researcher decided that “n” will be 5, every 5th person in the target population is selected as a participant.

This is still unbiased, as the researcher has no influence over who is picked, but it is technically not a “random sample” because not everyone has an equal chance of being selected (only people in every 5th position can be chosen). Be sure not to confuse this with random sampling: there is a fixed, systematic rule for selection, and that is what makes this a systematic sample.
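The "every nth member" rule can be sketched in a couple of lines. The list of 20 people and the function name `systematic_sample` are hypothetical; the sketch also assumes selection starts with the nth person on the list.

```python
def systematic_sample(target_population, n):
    """Select every nth member of an ordered list, starting with the nth."""
    # Lists are indexed from 0, so the nth person sits at index n - 1.
    return target_population[n - 1::n]

# Hypothetical ordered list of 20 people in the target population.
people = [f"person_{i}" for i in range(1, 21)]

chosen = systematic_sample(people, 5)
print(chosen)  # ['person_5', 'person_10', 'person_15', 'person_20']
```

Unlike random sampling, the outcome is fully determined once the list order and the value of n are fixed.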

## Strengths and weaknesses of systematic sampling

- A strength of the systematic sampling method is that it is a simple way for researchers to gather participants and there is little risk of researcher bias influencing this. Therefore the participants gathered should, in theory, be representative and unbiased, which should lead to more reliable results.
- A weakness, however, is that the participants gathered could still be unrepresentative and biased due to chance selection. This would make the results unreliable when re-tested.
- Another weakness of systematic sampling is that you need a larger pool of people to select from using the “nth” rule. If you require 100 participants for a study and pick every 10th person, you would need a pool of 1,000 people to filter through. Therefore gathering participants for a study using systematic sampling can be very time-consuming.

## Stratified Sampling

Stratified sampling is the most complex of the sampling methods and it is most often used in questionnaires. Sub-groups (or strata) within the population are identified (e.g. boys and girls, or age groups: 10-12 years, 13-15 years etc.) and then participants are gathered from each stratum in proportion to their occurrence in the population. The selection of participants is generally done using a random technique.

For example, in a school, there are several subgroups such as teachers, support staff, students and other staff. If the teachers made up 10% of the whole school’s population, then 10% of the sample must be teachers. This is then repeated for each sub-group.
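The proportional selection described above can be sketched as follows. Only the 10% share of teachers comes from the example; the other sub-group sizes, the sample size of 20 and the function name `stratified_sample` are assumptions for illustration.

```python
import random

def stratified_sample(strata, sample_size, seed=None):
    """Randomly sample from each stratum in proportion to its size."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Proportional allocation. With awkward proportions, rounding can
        # make the overall total differ slightly from sample_size.
        k = round(len(members) * sample_size / total)
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical school of 100 people; teachers make up 10%.
school = {
    "teachers": [f"teacher_{i}" for i in range(10)],
    "support_staff": [f"support_{i}" for i in range(5)],
    "students": [f"student_{i}" for i in range(80)],
    "other_staff": [f"other_{i}" for i in range(5)],
}

sample = stratified_sample(school, 20, seed=1)
sizes = {name: len(chosen) for name, chosen in sample.items()}
print(sizes)  # {'teachers': 2, 'support_staff': 1, 'students': 16, 'other_staff': 1}
```

Teachers are 10% of this school, so they make up 2 of the 20 sampled participants, and the same rule is applied to every other sub-group.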

## Strengths and Weaknesses of Stratified Sampling

- A major strength of using stratified sampling techniques is that they are very representative of the target population. This means the findings should have high reliability and validity to make generalisations to the target population.
- A major weakness of using stratified sampling is that it is very time-consuming to identify the subgroups, select necessary participants and attempt to get a proportionate sample involved in the study. Therefore this form of sampling method is extremely difficult to execute and can be impractical.

## Volunteer Sampling

A volunteer sample consists of people who have volunteered to take part in the study. Volunteers can be gathered in a number of ways, such as placing an advert in a newspaper, on the internet or in some other media outlet to invite people to take part.

Volunteers may put themselves forward to be part of the study but they may not necessarily be told the aim of the study or what they are really being tested on. For example, Milgram’s shock study gathered volunteers who agreed to take part but did not necessarily know what they were being tested on (obedience).

## Strengths and Weaknesses of Volunteer Sampling

- A strength of using volunteer sampling is that participants should be willing to give their informed consent to be a part of the study. People who volunteer tend to be motivated to take part in the study.
- Volunteer sampling can also be a fast and efficient way of gathering research participants. Instead of having to search for participants, an advert could be placed to gather them based on the traits/characteristics the researcher requires.
- A weakness of using volunteer sampling is that the people who tend to volunteer may form a biased sample that is not representative of the target population. For example, volunteers are already more motivated to engage in the research than those who do not volunteer (volunteer bias) and this can influence the outcome of the study in some way.

## Designing Research

This section on designing research for GCSE psychology and research methods is quite extensive and requires you to know about quite a few different aspects of designing psychological research studies.

The topics you need to know for research methods include:

- Independent group design
- Repeated measures design
- Matched pairs design
- Strengths and weaknesses of each design
- Laboratory experiments
- Field and natural experiments
- Questionnaires
- Case studies
- Observation studies
- Strengths and weaknesses of each research method and types of behaviour for which they are suitable.

An independent group design is the simplest design to understand. The participants involved in the study are usually divided into two subgroups.

One group will take part in the experimental condition (with the independent variable introduced), while the other group would not be exposed to this and form the control group for comparison.

Let’s use the example we mentioned earlier with a study that measures the effects of music on learning.

In an independent group design, one group of participants would be measured on their ability to learn with music being played while the other group would be tested on their learning ability without music.

The results (dependent variable) are then compared between the two groups to measure the effects.

If the results are significantly different then researchers may conclude that this is because of the independent variable, which in our case would be music affecting learning ability.

## Strengths and Weaknesses of Independent Group Design

- A strength of using independent group designs is there are no order effects that can invalidate the results, as participants only take part in one of the conditions. Order effects are apparent in experiments where repeated measure designs are used and this involves participants learning or improving from their experience of having to do the experiment more than once. This does not happen in independent group designs which can give more valid results.
- Independent group designs are beneficial as the materials or apparatus can usually be used across both the experimental condition and the control group (minus the independent variable being manipulated or introduced as required). This makes setting up independent group designs easier and less time-consuming than other experimental designs.
- Another strength of independent group designs is that participants are less likely to display demand characteristics. Demand characteristics are when participants change their own behaviour as they figure out (or think they do) the purpose of the study. The participants may then display behaviour that is different in response which can invalidate findings. Demand characteristics are less likely in independent group designs as participants are only exposed to one condition and they don’t have the opportunity to learn or adjust their behaviour in another condition (as they cannot compare).
- A weakness of independent group designs is that differences between the experimental condition and control group may be due to participant variables, such as individual differences between the two groups, rather than the independent variable. Just by chance, one group may be smarter than another or have individual characteristics that make them more able (or less able) in the condition they are exposed to. This would then be a confounding variable that affects the results. Using the music example mentioned previously, the group that performs best (whether it’s the group exposed to music or not) may do so simply because it contains more educated or intelligent people than the other group.
- Another criticism of using independent group designs in experiments is that you need to gather more participants. For example, you need a large enough sample to be exposed to the experimental condition to make generalisations but you then need to gather this number again for the control group condition. Using our example earlier, if we wanted to test how music affects people’s ability to learn and we gather 50 people, we need another 50 people for the control condition that is exposed to no music. Gathering too few participants increases the risks of individual differences being the difference in results while gathering a large number requires more time, effort and resources.

## Repeated Measures Design

A repeated measures design sees all the gathered participants of the study being exposed to both conditions of the experiment.

Referring to our music and learning scenario (once again!), we would have a group of 50 participants that would first be exposed to the experimental condition whereby they attempt to learn with music present and then they would attempt to learn without music.

The results would then be compared between the conditions to assess what impact the IV had on the DV. In experiments where there were numerous different conditions, the same participants would be used across them while exposed to different independent variables.

## Strengths and Weaknesses of Repeated Measures Design

- A major strength of repeated measures designs is that they require less effort to gather participants as they use the same people across the different experimental conditions. Therefore setting up the experiment tends to be faster compared to designs such as independent groups, where you would require double the number of participants to compare against.
- Another strength of using repeated measure designs is participant variables are eliminated. This is because the same people are used across the different conditions and they are comparing against themselves directly. This means there is less chance of individual differences influencing the results.
- A weakness of using repeated measure designs is that there is a high risk of order effects affecting the validity of findings. As participants are required to do multiple tasks across different conditions, there is the risk that participants may improve as they repeat the experiments. For example, if they were tested on their learning ability while music was played in one condition, when they are tested without music, the experience and practice gained from the first condition may see them improve. Researchers may then incorrectly view this improvement as due to the independent variable (IV) rather than order effects.
- Another criticism of using repeated measures is you need to create multiple different tasks or materials between the conditions. For example, you could not use the same content for participants to memorise from one condition to another in a memory test experiment. You would need to create content that was judged to be similar in difficulty which in itself would be a subjective measure. For example, having participants memorise 20 “easy” words with similar syllables in one condition, would require a researcher to spend significant time and effort in creating another set of similar words for another condition.
- There is a higher risk of demand characteristics when using repeated measure designs. This is because participants may be able to guess the purpose of the study (if it is intentionally obscured to improve the validity of findings) and then adjust their behaviour accordingly. This is more likely to happen as the same participants are used across the different conditions and they may notice the different setups and the purpose of the study. This may lead to invalid findings from the behaviour that is observed.

## Matched Pairs Design

A matched pairs design involves gathering participants and testing them on certain characteristics before they take part in the study. The tests allow each participant to be matched in a pair with someone who is deemed to have similar qualities that may be relevant to the study.

The pairs may be identified as Pair Aa or Pair Bb etc.

In a matched pairs design study, one member of each pair takes part in one experimental condition while their matched partner is exposed to the other experimental condition.

The results are then compared by the researcher between the conditions and treated as if they were gathered from one individual despite coming from two individuals.

Within psychological research, the ideal matched pairs participants tend to be identical twins as they share the same genetic make-up and potentially have very similar personality factors too.
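The matching step can be sketched as follows. This is an illustration under stated assumptions: participants are matched on a single pre-test score, neighbours in the ranking are paired, and chance decides which member of each pair gets which condition. The names, scores and the function `matched_pairs` are all hypothetical.

```python
import random


def matched_pairs(pretest_scores, seed=None):
    """Pair participants with adjacent pre-test scores, then split each pair.

    `pretest_scores` maps a participant's name to their score on the
    matching variable. Returns a list of (experimental, control) tuples.
    """
    rng = random.Random(seed)
    ranked = sorted(pretest_scores, key=pretest_scores.get)
    if len(ranked) % 2:
        raise ValueError("need an even number of participants")
    pairs = []
    for i in range(0, len(ranked), 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)  # chance decides who gets which condition
        pairs.append(tuple(pair))
    return pairs


# Hypothetical pre-test scores for six participants.
scores = {"Ann": 12, "Ben": 19, "Cat": 13, "Dev": 20, "Eve": 15, "Fay": 16}

for experimental, control in matched_pairs(scores, seed=0):
    print(f"music: {experimental}, no music: {control}")
```

Here Ann and Cat, Eve and Fay, and Ben and Dev end up paired because their scores are closest. Matching on several characteristics at once is much harder, which is one reason this design is time-consuming.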

## Strengths and Weaknesses of Matched Pairs Design

- One strength of using matched pairs designs in research is they reduce participant variables which can affect the results. This is because the people are paired up together based on similar traits that are relevant to the study.
- Another strength of using matched pairs is that there are no order effects, unlike repeated measures design studies. This is because everyone does the experiment once and has no opportunity to learn from previous attempts.
- Matched pairs designs can re-use the same materials/apparatus across the pairs as everyone will only be exposed to them once. This makes the setup of the experiment easier as researchers do not have to create unique set-ups across the two groups which can be time-consuming.
- A weakness of using matched pairs design is that matching people on key variables is time-consuming and not always successful. Attempting to find people who can be matched requires a large initial sample to filter through and this can take a very long time to do.
- It is difficult to match people based on personality variables or to filter out individual differences for certain. You can generally only match people based on fixed traits such as gender (sex), age, height etc. However, personality factors may be what determine differences in the experiments. Therefore matched pairs designs can produce invalid results that are not the result of the independent variable.

## Laboratory Experiments

Laboratory experiments are experiments that are conducted in a controlled setting, usually a research laboratory where participants are aware of being observed and part of a study.

Laboratory experiments tend to have high internal validity because researchers can control all the variables so the main differences between the experimental condition and control group are only the independent variable whose effect is being monitored. This allows researchers to more confidently assume that any differences between the conditions are due to the independent variable.

## Strengths and Weaknesses of Laboratory Experiments

- A major strength of laboratory experiments is they have high internal validity. This means that researchers can be confident to a higher degree that what they are measuring is in fact due to the effect of the independent variable because this is the only difference between the experimental condition and control group.
- Another strength of using a laboratory setup is that it limits the influence of extraneous variables on the results as researchers have a high degree of control over the environment. This means unaccounted-for outside influences are limited, which makes drawing cause and effect between the IV and DV more reliable.
- Laboratory experiments can be checked for reliability as they are easier to replicate. Due to the artificial setup of the experiments (being in a laboratory setting), other researchers can recreate the experiment exactly to check the results for reliability. This can be harder to do with other setups.
- A weakness of using laboratory experiments is they lack ecological validity. This is because the setup of the experiment is artificial and completely controlled, so the results gathered in the lab may not generalise to real-world situations. For example, testing memory and learning in a lab setup is unlikely to reflect how people learn with or without music in everyday life, and using a film clip to test eyewitness testimony is not realistic.
- Participants in laboratory setups may display demand characteristics and adjust their behaviour due to the contrived setup and being aware that they are being observed. Therefore the behaviour observed may lack validity as it may not be indicative of how people are likely to behave in the real world if they think they are not being observed or under supervision. Participants may, therefore, behave how they think researchers want them to or in a way that would be deemed normal with others watching, not necessarily how they would actually behave.

## Field Experiments

A field experiment is conducted in a more natural or everyday environment where the behaviour being measured is more likely to occur, unlike a laboratory experiment.

The field experiment can be conducted anywhere in real-world settings with researchers manipulating an independent variable to measure its impact on the dependent variable. A field experiment can include confederates that participants are unaware of also being involved to test their response in the field setting.

One key difference between a field experiment and a laboratory experiment is that participants may not be aware of being observed or studied. This is an attempt to generate more realistic behaviour or responses from them that can generalise to real-world settings.

## Strengths and Weaknesses of Field Experiments

- A strength of using field experiments is they are high in ecological validity as the setup and environments are more realistic. This is thought to produce more realistic responses from participants as they are not always aware of being observed (unlike lab settings). The argument here is that field experiments have higher external validity and the behaviours from participants can then be generalised to the wider population.
- A weakness of using field experiments is they are at higher risk of extraneous variables influencing the behaviour of participants. Researchers, therefore, have less control and cannot say with as much certainty that the behaviour they observed was in fact due to the independent variable or not.
- Another criticism of field experiments is they are difficult to replicate. Participants may be members of the public with personality factors that influence the results which are unaccounted for and the environment itself may be difficult to recreate in order to test the study for reliability in its findings. Therefore replication and reliability become an issue for field experiments.
- Another weakness of using field experiments is they raise ethical issues in regards to informed consent. This is because participants may be unaware of being observed or part of a study and this raises ethical concerns. On the other hand, this may also provide us with more realistic and valid results without demand characteristics being a potential confounding variable.

## Natural Experiments

A natural experiment is conducted when it is not possible, for ethical or practical reasons, to manipulate an independent variable (IV). It is therefore said that the IV occurs 'naturally'.

The dependent variable (DV) may, however, be tested in a laboratory. An example is the effect of institutionalisation in some form, which may occur naturally due to imprisonment or disruption of attachment through the care system, on psychological development such as intellectual or emotional development.

Another good example of a natural experiment is the study by Charlton et al. (2000) which measured the effects of television. Prior to 1995, the people of St. Helena, a small island in the Atlantic, had no access to TV. Its arrival gave the researchers the opportunity to examine how exposure to western programmes may influence their behaviour. The IV in this case was the introduction of TV, which was not controlled by the researchers: they took advantage of an event that would have been practically difficult to bring about themselves. The DV was measures of pro- or anti-social behaviour that were assessed through the use of questionnaires, observations and psychological tests.

These types of manipulation would be either impractical or unethical to implement, and therefore cases where they occur naturally due to normal circumstances may be examined through natural experiments.

## Strengths and Weaknesses of Natural Experiments

- One major weakness of natural experiments is the lack of control. It is more difficult to control extraneous variables which makes it difficult to establish causality.
- A strength of natural experiments is they are high in ecological validity. Due to the 'real world' environment, the results relate to everyday behaviour and can be generalised to other settings.
- Another strength of natural experiments is they often produce no demand characteristics as the participants are unaware of the experiment. Therefore the behaviour observed is more likely to be realistic and indicative of behaviour that can be generalised across wider populations.
- A weakness of natural experiments is they are difficult to replicate to double check the findings. As the conditions are never exactly the same, it becomes difficult to establish reliability in such experiments which then affects validity as causality cannot be determined.
- Participants are often not aware of being observed or taking part in natural experiments and this raises ethical issues, in particular, informed consent. They may not wish to take part or be monitored and this is another weakness of natural experiments, although they may be debriefed after the experiment and given the option of giving consent to use the data collected from them.

One way psychologists find out about people’s behaviour is quite simply to ask them, in the form of interviews.

Interviews involve a researcher in direct contact with the participant and this could either be face to face or via phone/video call. The vast majority of interviews involve a questionnaire on which the researcher records the responses at the time of the interview. There are different forms of interviews which vary in structure, and we will look specifically at structured and unstructured interviews for GCSE psychology.

## Structured Interviews

Structured interviews involve all participants being asked the same pre-set questions in the same order. The researcher is unable to ask additional questions outside of this.

The questions are often closed questions that require a yes or no response, or they can be open questions that simply require the researcher to record the participant’s response.

Open questions can be questions that begin with who, what, where, when, why and how.

These force a participant to explain their answers beyond simply saying yes or no.

## Strengths and Weaknesses of Structured Interviews

- Structured interviews can be replicated far more easily than unstructured interviews as the questions are all pre-set. This helps in testing the reliability of research findings to check for consistency and validity in the conclusions drawn.
- A criticism/weakness of using structured interviews is they can be incredibly time consuming and require skilled researchers. People’s responses can also be affected by social desirability bias.
- Structured interviews gather quantitative data but lack qualitative data. When participants can only answer yes or no, this does not tell us why they think or respond this way which may be more important to understand behaviour.

## Unstructured Interviews

In unstructured interviews, participants are free to discuss anything. The interviewer may devise new questions as the interview progresses, or based on the previous answers given, to explore further.

With unstructured interviews, each participant is likely to be asked different sets of questions within the interview. The questions asked in unstructured interviews may be a mix of open and closed questions.

## Strengths and Weaknesses of Unstructured Interviews

- Unstructured interviews provide rich and detailed information. However, they cannot be easily replicated and people’s responses cannot be easily compared.
- Unstructured interviews have the benefit of allowing participants to explain their responses which can help us understand why they think or behave in particular ways which may be more valuable than structured interviews telling us merely how they would behave.
- Unstructured interviews can be more time-consuming as there is no structure or guideline to follow in regards to how many questions are being asked. They also require more trained interviewers who are able to articulate themselves and the questions they wish to ask, unlike structured interviews which can merely be read from a list and explained more easily.

Questionnaires are an example of a survey method that is used to collect large amounts of information from a target group that may be spread out across the country.

The researcher must design a set of questions for participants to answer; people taking part in a survey are referred to as “respondents” because their answers or behaviours are in response to the questions presented. Questionnaires can also be conducted face to face or via phone or video call.

Questionnaires are similar to structured interviews as respondents all answer the same questions, in the same order and they often narrow the possible responses to closed questions (yes or no answers).

## Strengths and Weaknesses of Questionnaires

- Questionnaires are practical ways for researchers to gather large amounts of information very quickly on topics where the responses are best suited for yes or no responses.
- Another strength of using questionnaires is that they can be replicated very easily as all the questions are pre-set. Responses can be gathered again to check for reliability and validity this way far more easily.
- Problems arise in the use of questionnaires when the questions are unclear or if they suggest or lead respondents into a desirable response. Responses can be affected by social desirability bias so participants may not necessarily answer truthfully which can invalidate findings.
- Another criticism of using questionnaires in research is that, with closed questions, respondents can only answer yes or no. This limits the amount of information that can be gathered, and participants may not be able to give a definite yes (or no) to every presented scenario. Their responses may only represent given situations and could be different in other situations.
- Respondents may misunderstand the meaning of questions and therefore answer incorrectly. Unlike structured interviews that allow participants to ask questions to clarify their understanding, respondents may misread or misunderstand questions and answer in a way that is not truly representative of their views.
- The researcher needs to make sure that in writing the questions, they are clear and unambiguous. This can be a difficult task to achieve and requires a great deal of time to construct questions that do not bias or lead the respondents into responses.

## Case Studies

A case study is a very detailed study of the life and background of one person, a small group of people, an institution or an event. Case studies use information from a range of sources, such as the person concerned, related family members or even friends.

Various techniques may be used such as interviewing people or observing people as they engage in daily life. Psychologists may also use various tests such as IQ tests, personality tests or questionnaires to produce psychological data about the target in question.

Researchers may also refer to school or work records for an individual or carry out observations of the individual or groups in question. The case study is then written up as a description of the target individual or group, with the information interpreted using psychological theories.

Case studies tend to be longitudinal and follow the target over a long period of time (often many years).

## Strengths and Weaknesses of Case Studies

- A strength of using case studies is they provide detailed information about individuals (or target group/institution) rather than collecting a score on a metric test from a person.
- Another benefit is case studies collect information over a long period of time so changes in behaviour can be observed and comparisons are drawn over this period to understand the changes.
- A weakness of using case studies is they usually target a single individual and this makes it difficult to generalise the findings to others. The situation or factors that influence this individual’s outcomes may not necessarily do the same for others due to individual differences. The data collected is also very subjective as it usually relies on people’s perceptions of things, and their memories may not be reliable over such a long period of time. There is also the risk that the researcher projects their own biases onto the findings and makes their own interpretations of the content, making the case study unreliable.
- There can be ethical concerns with using case studies as the people or group being followed are usually of interest because of some psychological problem. This could make them vulnerable and raise ethical concerns about whether they can give informed consent.

## Observation Studies and the Observational Method

In an observational study, the researcher watches or listens to the participants engaging in whatever behaviour is being studied and records their behaviour. In most natural observations, people are observed in their normal environments without interference from the researcher.

In some studies, a researcher may cause something to happen to gauge the responses of people and record these.

Here’s an example of one such study:

A nurse is called by a “doctor” via telephone and instructed to give medicine to a patient, which is against the rules. The study is conducted in the nurse’s natural setting of the hospital and researchers then observe whether the nurse follows this instruction or not.

In some studies, the data may also be collected in a “laboratory setting” although this may not necessarily be a laboratory. This may be a natural setting that has been organised by the researcher to make it easier to observe the targets.

## Strengths and Weaknesses of Observation Studies and the Observational Method

- What people say is often very different from what they may do in a given situation. The observational method is high in ecological validity and its use is very suitable for social behaviours as it allows researchers to gauge people’s true responses. If participants were asked about their behaviour beforehand, they may give socially desirable responses which may not be what they would really do, and observational studies allow us to see true behaviour without this bias.
- The behaviours observed in observational studies have higher external validity as they can be more easily generalised. Unlike laboratory studies that test participants under contrived circumstances (e.g. memorising lists of words to test memory), observational studies and their setup are more natural providing more ecologically valid results.
- A weakness, however, is although researchers see and record behaviour in an observational study, they do not know why the behaviour happened. This then requires the researcher to make a judgement on its cause which may be riddled with bias or may simply be incorrect.
- Participants or subjects may become aware of being observed and thus change their behaviour leading to researchers recording incorrect responses. Also, the researcher themselves may make a mistake recording the behaviour which can invalidate findings.
- Observational studies also raise ethical issues, particularly around informed consent, as participants are usually not aware of being observed or part of a study. Informing them beforehand may lead them to alter their behaviour because they know they are being observed. However, not informing them raises ethical issues of privacy and lack of consent.

## Categories of behaviour

In order to make sure that accurate records of behaviour are made, researchers use categories of behaviour systems.

If researchers wanted to observe “playground behaviour”, researchers would not necessarily know what they were looking for in this definition or what may be classified as “playground behaviour”. The observers would need to know what they are looking for to make accurate recordings and therefore behavioural categories are created to make it clear what behaviours are to be recorded.

## Inter-observer reliability

When an observation study is conducted, observers record the number of times certain behaviours occur (usually in the form of a tally chart).

This record of the number of incidents for the different behaviours needs to be accurate and ensure that the observer is recording the correct behaviour within the correct categories.

In observation studies, observers may miss the behaviour and so accuracy of recording the behaviour becomes an issue as it cannot be seen again in live environments. A solution to this problem is to design a record sheet with the pre-defined suitable behaviour categories and then have two observers independently observe the targets at the same time and location. Each would then record what they see in their own individual sheets independently from the other.

At the end of the study, the observers may compare their record sheets to check for consistency. If the sheets have been recorded correctly, they should have matching or very similar recordings of their observations. If this occurs, they have established inter-observer reliability. If the record sheets are considered vastly different, this would mean the study lacks inter-observer reliability and the results lack validity as they are not measuring what they are supposed to measure accurately.
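Comparing two record sheets can be sketched as below. The behaviour categories and tallies are hypothetical, and the `agreement` score is just one simple illustrative measure (the share of tallied incidents both sheets have in common), not a standard statistic or a required method.

```python
def tally_differences(sheet_a, sheet_b):
    """Return, per behaviour category, how far apart the two tallies are."""
    categories = sorted(set(sheet_a) | set(sheet_b))
    return {c: abs(sheet_a.get(c, 0) - sheet_b.get(c, 0)) for c in categories}


def agreement(sheet_a, sheet_b):
    """Share of tallied incidents the two sheets have in common (0.0 to 1.0)."""
    categories = set(sheet_a) | set(sheet_b)
    shared = sum(min(sheet_a.get(c, 0), sheet_b.get(c, 0)) for c in categories)
    total = sum(max(sheet_a.get(c, 0), sheet_b.get(c, 0)) for c in categories)
    return shared / total if total else 1.0


# Hypothetical tally charts from two observers watching the same playground.
observer_1 = {"running": 12, "skipping": 5, "fighting": 2}
observer_2 = {"running": 11, "skipping": 5, "fighting": 3}

print(tally_differences(observer_1, observer_2))
# {'fighting': 1, 'running': 1, 'skipping': 0}
print(agreement(observer_1, observer_2))  # 0.9
```

A score close to 1.0 means the sheets are very similar. How close is close enough is a judgement the researchers would need to agree on beforehand.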

## What is a Correlation?

For this section of Research Methods, we need to know about the following in relation to Correlations:

- An understanding of association between two variables and the use of scatter diagrams to show possible correlational relationships.
- The strengths and weaknesses of correlations.

A correlation is quite simply a relationship between two variables. There are 3 types of correlations which are:

- Positive correlation,
- Negative correlation
- Zero correlation

A positive or negative correlation shows that the two variables change together, but it does not by itself show a “cause and effect” relationship whereby one variable has a direct impact on the other. Correlations form part of a statistical technique to analyse and display the possible relationship between the two variables.

Let’s work through a subjective example of each. Let’s assume there is a correlation (relationship) between the two variables age and beauty. As people get older they may be seen to be more beautiful. This would be considered a positive correlation because both the variables increase together.

If however people disagreed and thought that as people age and get older, they are less beautiful, this would be a negative correlation. This is because as one variable increases, the other one decreases which in our case would be age increasing while beauty decreases.

The third way of looking at this is thinking that age has no effect on perceived beauty. As people get older you may think this has no bearing on a person’s beauty so the two variables would be seen as having zero correlation.

Below we have some examples of scattergrams that give you an idea of how each correlation would look if presented to you. You may sometimes be asked to draw a line of best fit on a scattergram too; all this means is that you draw a straight line through the middle of the plotted points, with roughly equal numbers of points on either side of it.
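The three patterns can also be described with a number. The sketch below calculates Pearson’s r, a correlation coefficient that runs from −1 to +1, for three made-up sets of beauty ratings. The data and the 0.1 cut-off used to label a result as "zero" are assumptions chosen purely for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


def describe(r, threshold=0.1):
    """Label a coefficient; the 0.1 cut-off is an arbitrary illustrative choice."""
    if r > threshold:
        return "positive"
    if r < -threshold:
        return "negative"
    return "zero (or close to it)"


# Hypothetical data: age in years and a beauty rating out of 10.
ages = [20, 30, 40, 50, 60]
rising = [4, 5, 6, 8, 9]    # ratings go up with age
falling = [9, 8, 6, 5, 4]   # ratings go down with age
flat = [6, 7, 5, 7, 6]      # ratings show no pattern with age

for label, ratings in [("rising", rising), ("falling", falling), ("flat", flat)]:
    r = pearson_r(ages, ratings)
    print(f"{label}: r = {r:+.2f} -> {describe(r)} correlation")
```

Whatever the value of r, it only describes how the two variables move together; it does not show which variable, if either, is the cause.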

[Figures: scattergrams showing a positive correlation, a negative correlation and zero correlation]

## Strengths and Weaknesses of Correlations

- Correlational research can be very useful as it allows a researcher to see if two variables are connected in some way. Once a relationship has been established between two variables, a researcher can then use an experiment to try and find the true cause of the correlation.
- Correlational research can be used in situations where it may be unethical or impossible to carry out an experiment. For example, if we wanted to check for the relationship between smoking and cancer, this would be unethical to test (asking people to smoke to see if they develop cancer). However, plotting the rates of cancer developing in people who already smoke can help us establish links between these two variables. This knowledge can then be helpful in influencing future research.
- A weakness of using correlations is that although this type of tool can tell us if two variables are related, it does not tell us which of the two variables caused the relationship. It is also possible that a third, unknown variable influences both of the variables we measure and is the actual cause.
- For correlational research to be helpful, we first need to gather large amounts of data to establish the pattern in the scattergram. This means researchers are required to make lots of measurements of both variables so that the patterns in the data can be reliably established. Correlational research based on small samples is not reliable, so it can be very time-consuming to build a large enough data set.

## Research Procedures

What the GCSE Psychology specification says you need to learn for this section on Research Procedures:

- Standardised procedures
- Instructions to participants
- Randomisation
- Allocation to conditions
- Counterbalancing
- Extraneous variables (including explaining the effect of extraneous variables and how to control for them).

## Standardised Procedures

When conducting experiments, researchers need to ensure that standardised procedures are used.

Standardised procedures are a fixed set of steps that are applied in the same way to all the participants to ensure the experiment is unbiased. Standardised procedures allow the researcher to try and control all the variables and events so the results of the experiment can be safely attributed to the independent variable.

## Instructions to Participants

When standardising procedures, another issue researchers need to be mindful of is how instructions to participants are put across to make sure they know what to do but without biasing the study in any way. This can include verbal and written instructions.

Participants can interpret instructions in ways that influence their performance, and the instructions can then become extraneous variables. For example, if instructions were worded with leading questions, this may cause participants to answer in one particular way. If instructions are ambiguous, this can also affect the results of the study.

To address this issue, the usual practice is to write as much information as possible for participants and ensure they all receive this same information. This is usually done in sections as follows:

- Briefings: this is where participants are encouraged to participate with a log of what is discussed to gain their consent. This can include ethical information about consent, anonymity, the right to withdraw etc.
- Standardised instructions are given: these are clear instructions given to each participant explaining their role and what they need to do.
- Debriefing: at the end of the study, participants are given a detailed explanation about the aims of it, what their role was and why they were given their tasks or roles. Ethical issues are also raised again with participants given the opportunity to withdraw their data/contributions if they feel unhappy about their performance or participation.

## Randomisation

Randomisation means leaving decisions about the procedure to chance, rather than to the researcher, to make sure there are no biases in the procedures.

Let’s use our music and learning example again for a moment to highlight how randomisation may be implemented in a psychological study.

Participants are being tested on their ability to learn through the use of 20 random words they are presented with. All the words are considered to be of equal difficulty because they are everyday nouns with only six letters. The researcher has to decide the order in which the words are presented to each of the participants in the study; however, instead of the researcher determining the order, randomisation is used.

All 20 words are written down on pieces of paper and put into a hat. They are then randomly selected one after the other, and the order in which they are drawn is written down. This becomes the order in which all participants will be shown the words within the experiment.

Using randomisation, all the words had an equal chance of selection and now with an order established, all participants will be exposed to them in the same way. Randomisation can be implemented in a number of ways within an experiment to filter out biases and you may be given a question on how to best implement this or its benefits.
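The hat method can be imitated in software. The sketch below shuffles a list of 20 six-letter nouns; the words themselves and the seed value are invented for illustration and are not taken from any real study.

```python
import random

# Hypothetical six-letter everyday nouns standing in for the 20 study words.
words = [
    "garden", "window", "bottle", "carpet", "pencil",
    "button", "candle", "basket", "finger", "hammer",
    "jacket", "kettle", "ladder", "mirror", "napkin",
    "orange", "pillow", "rabbit", "saddle", "tunnel",
]

rng = random.Random(42)          # fixed seed so the same order can be reproduced
presentation_order = words[:]    # copy, leaving the original list untouched
rng.shuffle(presentation_order)  # the software equivalent of drawing from a hat

# Every participant is shown the words in this same randomised order.
for position, word in enumerate(presentation_order, start=1):
    print(position, word)
```

Because the order comes from the shuffle rather than from the researcher, every word has an equal chance of landing in any position.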

Another major issue researchers face is how to allocate the participants to the experimental condition or control group.

To reduce researcher bias, two methods used are random allocation and counterbalancing.

## Random Allocation

When the design of the study uses an independent group design, the researcher can use random allocation to avoid any potential researcher bias. Participants can be randomly selected in turns for either condition A or condition B by pulling their name out of a hat for example.

A similar method can be employed if the design of the experiment is a matched pairs design. Participants can be randomly allocated to their pairs by them pulling out the letters for each pair from a hat e.g. the two people who pull out A+a from a hat form a pair, the same with B+b, C+c etc and so forth.
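Random allocation for an independent groups design can be sketched in a few lines. The participant labels, the group size of 20 and the seed are assumptions made up for this example.

```python
import random

# Hypothetical participant identifiers P01 to P20.
participants = [f"P{n:02d}" for n in range(1, 21)]

rng = random.Random(7)   # fixed seed so the allocation can be reproduced
shuffled = participants[:]
rng.shuffle(shuffled)    # like mixing the names in a hat

# The first half drawn go to condition A, the rest to condition B.
half = len(shuffled) // 2
condition_a = shuffled[:half]
condition_b = shuffled[half:]

print("Condition A:", condition_a)
print("Condition B:", condition_b)
```

The researcher plays no part in deciding who ends up in which condition, which is the point of the method.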

## Counterbalancing

For experimental designs such as the repeated measures design, all the participants are required to take part in both conditions of the experiment. The problem with this is that order effects can occur, whereby participants learn from experience and thus do better in the conditions that follow their first one.

Counterbalancing helps balance out order effects by splitting the group of participants into two groups. One half will then complete condition 1 while the other half complete condition 2.

After completing this, they swap and complete the opposite condition so those who completed condition 1, then move on to complete condition 2, those that completed condition 2, go on to complete condition 1.

Using counterbalancing does not get rid of order effects, but it allows their effects to be balanced out equally between the two conditions, thus providing more valid results.
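Here is a minimal sketch of counterbalancing for a repeated measures design with two conditions. The participant labels are invented, and splitting the group at random (rather than, say, alternating down a list) is an assumption of this example.

```python
import random

# Hypothetical participant identifiers P01 to P12.
participants = [f"P{n:02d}" for n in range(1, 13)]

rng = random.Random(3)   # fixed seed so the split can be reproduced
shuffled = participants[:]
rng.shuffle(shuffled)    # assumption: the two halves are chosen at random

half = len(shuffled) // 2
orders = {}
for person in shuffled[:half]:
    orders[person] = ("condition 1", "condition 2")  # 1 first, then swap to 2
for person in shuffled[half:]:
    orders[person] = ("condition 2", "condition 1")  # 2 first, then swap to 1

for person in sorted(orders):
    first, second = orders[person]
    print(f"{person}: {first} then {second}")
```

Everyone still completes both conditions, but any benefit from going second is shared equally between condition 1 and condition 2.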

## Ethical considerations

For Ethical Considerations, the specification states you need to know the following:

- Ethical issues in psychological research as outlined in the British Psychological Society guidelines
- Ways of dealing with each of these issues.

This next section focuses on all the ethical considerations based on the British Psychological Society guidelines and ways in which each can be dealt with.

Ethical issues arise when there are two conflicting points of view:

- One is what the researcher needs to do in order to conduct a useful and meaningful study.
- The second is the rights of the participants, which need to be considered.

Ethical issues are therefore all the conflicts that arise about what is acceptable to do as part of the research.

As part of your GCSE psychology course, you need to be able to highlight ethical concerns and generate ways in which to deal with them. You may also be given a scenario where you need to highlight the relevant concerns and comment on how to deal with them.

The Code of Ethics and Conduct (2009) and Code of Human Research Ethics (2014) from the British Psychological Society underpin the activities of all practising psychologists.

## What Are The British Psychological Society Guidelines?

When research is conducted by any practising psychologist, The Code of Ethics and Conduct (2009) and Code of Human Research Ethics (2014) will underpin their work.

The British Psychological Society (BPS) guidelines explain what is required:

- Participants should be respected as individuals and unfair or prejudiced practices are to be avoided.
- The data collected should also be confidential and anonymised so participants cannot be identified from the research.
- Participants should have also given informed consent and know fully what they are consenting to. They should also be told at the beginning what the study is about prior to taking part.
- Deception must be avoided although the BPS recognises that some studies are not possible without this to gather meaningful results. Any deceptions that do take place must be explained to participants as soon as possible once the study concludes.
- They should also be aware of their right to withdraw from the study at any time.

Psychologists should maintain high standards in their professional work which includes:

- Being aware of the code of conduct
- Recognising that ethical dilemmas will inevitably arise and seeking to resolve them
- Only giving advice if they are qualified to do so and not trying to do things that are beyond their competence
- Staying within the law if ethical principles conflict with the law, while trying to maintain the ethical principles as far as possible
- Monitoring their own health and lifestyle to recognise times when they may be unable to carry out their work competently

## Responsibility

Responsibility within the British Psychological Society (BPS) is generally about avoiding harm to clients, avoiding misconduct that would bring psychology into disrepute and looking out for other psychologists that may be breaching these guidelines.

The BPS states researchers should:

"Consider all research from the standpoint of research participants, for the purpose of eliminating potential risks to psychological well-being, physical health, personal values, or dignity"

This can be done by:

- Ensuring researchers protect participants from physical and psychological harm.
- Making sure the risk of physical or psychological harm is no greater than what one would expect from everyday life and their wellbeing should not be at risk.
- Debriefing participants at the end of the investigation so they fully understand the true aim of the study. This would then allow them to make an informed decision about whether they wish to withdraw their results.

## Informed Consent

Informed consent means revealing to the participant the real aims of the study or telling them what will happen within the study. This becomes an ethical issue because revealing the true aims or details may lead to the participants adjusting their behaviour which could lead to invalid results.

For example, if we wanted to study whether people are more likely to obey a male or female as part of research into obedience, revealing the aims of this study will almost certainly affect their behaviour and invalidate findings.

Researchers may therefore not always give out the full details of the study; however, this means participants cannot give their full informed consent. From a participant’s point of view, they should be told what they are required to do in the study so they can make an informed decision about whether they wish to take part.

Consent became a basic human right that was established during the Nuremberg war trials after the Second World War. During the war, Nazi doctors conducted various experiments on prisoners without their consent, and the war trials afterwards decided that giving consent to be involved in a study should become a basic human right for participants.

Epstein and Lasagna found that only a third of participants volunteering for experiments really understood what they had agreed to take part in despite giving informed consent. This demonstrates that even if researchers sought to and obtained informed consent, this does not always guarantee that participants understand what they are involved in or doing.

## How to deal with ethical issues of informed consent

- Participants could be asked to formally indicate their agreement to take part based on information concerning the nature and purpose of the study and how their role fits in.
- Presumptive consent may also be gained; this can be done by asking a group of people whether they feel a planned study is acceptable and assume that the participants themselves would have felt the same if given the opportunity to say so.
- Researchers can offer the right to withdraw at any stage of the study to participants so if at any stage they feel uncomfortable or do not wish to continue, they can exit the research.

## Deception

Some experiments require deception about the true aims of the research; otherwise participants might alter their behaviour and the study’s findings would become meaningless. A distinction can be made in some cases between withholding some details about the study (reasonably acceptable) and deliberately providing false information (less acceptable).

From the participant’s point of view, deception would be unethical and thus they should not be misled without good reason.

An issue with deception is that it prevents participants from giving informed consent. Participants may agree to take part without fully knowing what they have agreed to and become quite distressed by the experience. Baumrind (1985) argued that deception was morally wrong based on three generally accepted ethical rules within western society: the right of informed consent, the obligation of researchers to protect the welfare of participants and the responsibility of the researcher to be trustworthy.

Others have argued that deception can be harmless in some studies, e.g. those testing memory, and that deception may be necessary to gain meaningful insights that would not otherwise be possible.

## How to deal with ethical issues of deception

- The need for deception in research could be approved by an ethics committee which weighs up the potential benefits of the research, against the costs to participants.
- Participants should be fully debriefed after the study and given the opportunity to request that their data is withheld.

## The Right to Withdraw

Participants would deem the right to withdraw from an experiment as important. If a participant begins to feel distressed or uncomfortable, they should have the right to withdraw from the study. This becomes more important particularly if they have been deceived about the nature of the study or their role.

From a researcher’s point of view, participants being able to withdraw midway through a study could bias the results in some way, because only the results of those who stayed can be compared.

Within some experiments, participants are offered financial payments for completing the study. This compromises the right to withdraw because they may not get paid and thus feel like they cannot withdraw.

## Confidentiality

A researcher may find that maintaining confidentiality can be difficult as they wish to publish the findings. They may guarantee anonymity and withhold the participants’ names, but even then it may be evident for some who the participants are.

In some locations or communities which are remote or have a low population, naming even the geographical area can identify the individual. The Data Protection Act makes confidentiality a legal right: a person’s data may only be recorded if it is not made available in a form that makes the person identifiable.

To tackle this, researchers should not record any names or personal details about the participants, using numbers or fake names instead.

## Privacy

Privacy may be difficult to accomplish from a researcher’s point of view, particularly when studying participants without their awareness.

Participants may feel that they should not be observed or watched by others in some situations, e.g. within the privacy of their own homes, although they may accept this in public areas such as a park.

To tackle this, researchers should not observe anyone without their informed consent unless it is in a public place where this may be expected to some degree. Participants could also be asked to give their retrospective consent or to have their data withheld entirely.

## Data Handling

What the GCSE Psychology specification says you need to learn for this section:

- Quantitative and qualitative data
- Primary and secondary data
- Computation
- Descriptive statistics
- Interpretation and display of quantitative data
- Normal distributions

Research studies collect two types of data:

- Quantitative data
- Qualitative data

## What is Quantitative Data?

Quantitative data is numerical data: anything that can be counted or measured, such as a score or a rating.

Data can also be classed by where it comes from:

- Primary data is data that has been collected firsthand from the source (participants) directly by researchers. The majority of data collected in psychological research will be primary data.
- Secondary data is data that has already been published and is simply used by researchers in their own work.

## Strengths and Weaknesses of Quantitative Data

- Quantitative data tends to be objective and easy for researchers to measure.
- Precise measures are used.
- The data is high in reliability and can be checked through replication.
- The data can be more easily examined to check for patterns through the use of correlations and presented in the form of scattergrams.
- Weaknesses of quantitative data include the possibility that meaningful details could be lost or lacking as researchers focus on a narrow set of responses or pre-defined questions people answer.

## What is Qualitative Data?

Qualitative data is descriptive data that is non-numerical. This type of data provides detailed information which can provide insights into the thoughts and behaviours of individuals because the answers are not restricted to yes or no responses. For example, in an observational study, researchers may describe what they see and this would be deemed a form of qualitative data.

Qualitative data tends to be collected through the use of open questions (questions that begin with who, what, where, when, why or how) that encourage participants to explain themselves. This is done usually through questionnaires or unstructured interviews.

Qualitative data cannot be counted or quantified as easily although it can be placed into categories to count the frequency in which it is reported to occur. For example, we may be able to count how many times participants in Milgram’s study reported being stressed or worried.

However as the responses from participants can be completely subjective to them, the data can be incredibly varied based on their responses and difficult to quantify or generalise with any meaning.

## Strengths and Weaknesses of Qualitative Data

- A major strength of qualitative data is it tends to be rich in detail.
- Another strength is that qualitative data can help researchers understand people’s attitudes, thoughts and beliefs, which may better explain their behaviour than the researchers having to guess.
- A weakness of using qualitative data is it tends to be completely subjective.
- Qualitative data tends to also be an imprecise measure that is difficult to quantify.
- Another criticism of qualitative data is the difficulty in checking for reliability as participants all give subjective responses. This makes it difficult to generalise to other people.

## What Is the Mean, Median, Mode and Range?

There are three types of averages that can be calculated from the raw data obtained from studies which allow researchers to identify patterns in the behaviour.

These three are:

- The mean average
- The median average
- The mode

You can also work out the range, although this is not an average.

## The Mean Average

The Mean Average is calculated by adding together all the values in a set of scores and then dividing that total by the number of values in the set.

For example, if we wanted to work out the mean average of Brad Pitt’s score on a beauty scale from questioning 12 people, we would take their scores, add them up and then divide the total by the number of people in the study (in our case, this would be 12).

Let’s work through an example assuming that the beauty score is out of 10:

So to work out the mean average we would need to add up all the scores all 12 people have given – this would then be:

- 7 + 8 + 7 + 8 + 9 + 7 + 10 + 8 + 7 + 6 + 9 + 8 = 94
- 94 is the total score of all 12 participants. We then divide this by 12 (the number of participants)
- 94 divided by 12 = 7.83 (to two decimal places)

Brad Pitt’s mean average score would be 7.83 out of 10.

## The Median

The Median is the middle value of a set of scores.

- To calculate the median, you must arrange all the values in order from the lowest to the highest.
- Then you must find the middle value. If there isn’t an obvious middle value because the set has an even number of values, you work out the midpoint of the two middle values.

Let’s work out the median value using the example above; we must first order the numbers from lowest to highest.

6, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 10

Having ordered the 12 numbers, we can see that the two middle values (the 6th and 7th) are both 8. Therefore the median in the example is 8.

## The Mode

The Mode is the most frequently occurring value in a set of scores.

Sometimes there may be no mode (if no number occurs more frequently than another) or there can be more than one mode.

Let’s work out the mode using the example above again by first ordering the scores from lowest to highest.

6, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 10

We can see that there are two modes, 7 and 8, because they both appear the most often, which is 4 times each.

## The Range

The Range is the numerical difference between the highest and lowest scores in a set.

So using our example above we can see that the highest score is 10 and the lowest score is 6.

We therefore subtract 6 from 10 as follows:

- 10 − 6 = 4

The range is therefore 4.
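The four calculations can be checked with Python’s built-in `statistics` module, using the same 12 scores as the worked example. This is only a quick sketch for checking answers and assumes Python 3.8 or later, where `multimode` is available.

```python
import statistics

# The 12 beauty scores from the worked example, each out of 10.
scores = [7, 8, 7, 8, 9, 7, 10, 8, 7, 6, 9, 8]

mean_score = statistics.mean(scores)      # total of 94 divided by 12
median_score = statistics.median(scores)  # midpoint of the two middle values
modes = statistics.multimode(scores)      # every value tied for most frequent
score_range = max(scores) - min(scores)   # highest score minus lowest score

print(f"Mean:   {mean_score:.2f}")  # 7.83
print(f"Median: {median_score}")    # 8.0
print(f"Modes:  {modes}")           # [7, 8]
print(f"Range:  {score_range}")     # 4
```

Note that `multimode` returns a list, which suits this example because there are two modes.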

## Ratios, Fractions, Percentages

This section focuses on recognising and using expressions in decimal and standard form.

These include:

- Ratios
- Fractions/decimals
- Percentages

## Ratios

A ratio is a way of comparing two amounts with each other, and it is usually expressed in its simplest form.

If we had 15 boys and 12 girls in one class and we wanted to compare this as a ratio, this would be 15:12.

When we break this down into its simplest form this would be 5:4 because we can divide both sides by 3.

## Fractions and Decimals

A fraction is a way of expressing a part of a whole number.

For example, if we had a group of 20 boys and 15 of those produced the action of running which we wished to express, the fraction would be 15/20 or 3/4 in its simplest form.

As a decimal, this may be expressed as 0.75, as the total or whole amount is always represented as 1. The proportion of boys that were not running would, therefore, be 0.25.

## Percentages

Percentages are a way of expressing a fraction of a hundred, where 100 is considered the full amount.

So 50/100 would be expressed as 50% (percent). This is sometimes used in psychology to express how often something happens e.g. running occurred 75% of the time.

So using the example before, if we had a group of 20 boys and 15 of them were seen to be running and we wanted to work out the percentage of this, we could calculate it in the following way:

- 15 x 100 divided by 20 (total no. of people) = 75%.

So to rephrase:

- 15 (boys) x 100 (the whole amount) divide by 20 (the total number of boys) = 75%
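The ratio, fraction, decimal and percentage examples above can be reproduced with Python’s standard library. The variable names are invented for this sketch; the numbers are the ones used in the text.

```python
from fractions import Fraction
from math import gcd

# Ratio: 15 boys to 12 girls, simplified by dividing by the largest shared factor.
boys, girls = 15, 12
divisor = gcd(boys, girls)                            # 3
simplest_ratio = (boys // divisor, girls // divisor)  # (5, 4)

# Fraction, decimal and percentage: 15 of the 20 boys were running.
runners, group_size = 15, 20
fraction_running = Fraction(runners, group_size)  # simplified automatically to 3/4
decimal_running = float(fraction_running)         # 0.75
percent_running = runners * 100 / group_size      # 75.0

print(f"Ratio of boys to girls: {simplest_ratio[0]}:{simplest_ratio[1]}")
print(f"Fraction running: {fraction_running}")
print(f"Decimal running: {decimal_running}")
print(f"Percentage running: {percent_running:.0f}%")
```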

## Bar Charts

Bar charts are used to display data that is in categories.

Each bar represents a separate category, and the categories are labelled across the x-axis at the bottom (horizontal). The frequency or amount for each category is shown on the y-axis, which runs up the side (vertical). The bars should be separated from one another and should not touch.

[Figure: bar chart from the hypothetical beauty study used earlier]

## Histograms

Histograms are used to present data that are continuous measurements such as test scores or even height.

The continuous scores are on the x-axis across the bottom and the frequency of these scores is on the y-axis. Histograms have no spaces between the bars (unlike bar charts) as the data is continuous.

[Figure: example histogram]

## Scattergrams

We’ve already looked at scattergrams when discussing correlations earlier.

[Figure: scattergram showing a positive correlation] Notice how the recorded points fall roughly along an invisible line running diagonally across the graph.

## Normal Distributions

The normal distribution is the pattern expected when most scores cluster around the average and scores become less common the further they are from it, equally on both sides.

On a graph, this shows as a bell-shaped curve with the mean, median and mode at its centre.

For example, in an IQ test, most scores for the whole population would be around the mean average, with fewer and fewer people scoring further away from it in either direction, lower or higher.

In a normal distribution the mean, median and mode scores tend to be of very similar value when plotted to produce a distinctive curve. The curve shape is what we call the normal distribution curve.
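A quick simulation can show the three averages sitting close together. In the sketch below, the mean of 100, the standard deviation of 15, the sample size and the seed are all assumed values chosen for illustration, not figures from the text. It assumes Python 3.8 or later.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the same scores are generated each run

# Hypothetical IQ-style scores drawn from a normal distribution and
# rounded to whole numbers (mean 100 and SD 15 are assumed values).
scores = [round(rng.gauss(100, 15)) for _ in range(10_000)]

mean_score = statistics.mean(scores)
median_score = statistics.median(scores)
mode_score = statistics.mode(scores)  # the single most common rounded score

print(f"Mean:   {mean_score:.1f}")
print(f"Median: {median_score}")
print(f"Mode:   {mode_score}")
```

The mean and median should land very close to each other, with the mode nearby; the mode is the most variable of the three because it depends on which single whole-number score happens to come up most often.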

[Figure: example of a normal distribution curve]


Chapter 13: Inferential Statistics

## Understanding Null Hypothesis Testing

Learning Objectives

- Explain the purpose of null hypothesis testing, including the role of sampling error.
- Describe the basic logic of null hypothesis testing.
- Describe the role of relationship strength and sample size in determining statistical significance and make reasonable judgments about statistical significance based on these two factors.

## The Purpose of Null Hypothesis Testing

As we have seen, psychological research typically involves measuring one or more variables for a sample and computing descriptive statistics for that sample. In general, however, the researcher’s goal is not to draw conclusions about that sample but to draw conclusions about the population that the sample was selected from. Thus researchers must use sample statistics to draw conclusions about the corresponding values in the population. These corresponding values in the population are called parameters. Imagine, for example, that a researcher measures the number of depressive symptoms exhibited by each of 50 clinically depressed adults and computes the mean number of symptoms. The researcher probably wants to use this sample statistic (the mean number of symptoms for the sample) to draw conclusions about the corresponding population parameter (the mean number of symptoms for clinically depressed adults).

Unfortunately, sample statistics are not perfect estimates of their corresponding population parameters. This is because there is a certain amount of random variability in any statistic from sample to sample. The mean number of depressive symptoms might be 8.73 in one sample of clinically depressed adults, 6.45 in a second sample, and 9.44 in a third, even though these samples are selected randomly from the same population. Similarly, the correlation (Pearson’s r) between two variables might be +.24 in one sample, −.04 in a second sample, and +.15 in a third, again even though these samples are selected randomly from the same population. This random variability in a statistic from sample to sample is called sampling error. (Note that the term error here refers to random variability and does not imply that anyone has made a mistake. No one “commits a sampling error.”)
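Sampling error is easy to see in a simulation. The sketch below builds an invented population of symptom counts and draws three random samples of 50 from it. The population size, the 0 to 16 range of symptom counts and the seed are assumptions made up for illustration and are not data from any study.

```python
import random
import statistics

rng = random.Random(2024)  # fixed seed so the run can be repeated

# Hypothetical population of symptom counts (the 0-16 range is invented).
population = [rng.randint(0, 16) for _ in range(100_000)]
population_mean = statistics.mean(population)

# Draw three random samples of 50 people from the same population.
sample_means = []
for _ in range(3):
    sample = rng.sample(population, 50)
    sample_means.append(statistics.mean(sample))

print(f"Population mean (the parameter): {population_mean:.2f}")
for number, sample_mean in enumerate(sample_means, start=1):
    print(f"Sample {number} mean (a statistic): {sample_mean:.2f}")
```

Each sample mean is an estimate of the same population mean, yet the estimates typically differ from one another. That difference is sampling error, not a mistake by the researcher.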

One implication of this is that when there is a statistical relationship in a sample, it is not always clear that there is a statistical relationship in the population. A small difference between two group means in a sample might indicate that there is a small difference between the two group means in the population. But it could also be that there is no difference between the means in the population and that the difference in the sample is just a matter of sampling error. Similarly, a Pearson’s r value of −.29 in a sample might mean that there is a negative relationship in the population. But it could also be that there is no relationship in the population and that the relationship in the sample is just a matter of sampling error.

In fact, any statistical relationship in a sample can be interpreted in two ways:

- There is a relationship in the population, and the relationship in the sample reflects this.
- There is no relationship in the population, and the relationship in the sample reflects only sampling error.

The purpose of null hypothesis testing is simply to help researchers decide between these two interpretations.

## The Logic of Null Hypothesis Testing

Null hypothesis testing is a formal approach to deciding between two interpretations of a statistical relationship in a sample. One interpretation is called the null hypothesis (often symbolized H₀ and read as “H-naught”). This is the idea that there is no relationship in the population and that the relationship in the sample reflects only sampling error. Informally, the null hypothesis is that the sample relationship “occurred by chance.” The other interpretation is called the alternative hypothesis (often symbolized as H₁). This is the idea that there is a relationship in the population and that the relationship in the sample reflects this relationship in the population.

Again, every statistical relationship in a sample can be interpreted in either of these two ways: It might have occurred by chance, or it might reflect a relationship in the population. So researchers need a way to decide between them. Although there are many specific null hypothesis testing techniques, they are all based on the same general logic. The steps are as follows:

- Assume for the moment that the null hypothesis is true. There is no relationship between the variables in the population.
- Determine how likely the sample relationship would be if the null hypothesis were true.
- If the sample relationship would be extremely unlikely, then reject the null hypothesis in favour of the alternative hypothesis. If it would not be extremely unlikely, then retain the null hypothesis.
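
The three steps above can be sketched in code. The example below uses a permutation test, which is only one of many null hypothesis testing techniques and is chosen here because it mirrors the logic directly. The function name `permutation_p_value` and the scores are made up for illustration.

```python
import random
import statistics

def permutation_p_value(group_a, group_b, n_shuffles=5_000, seed=0):
    """Estimate how often chance alone gives a mean difference at least
    as large as the observed one (a two-sided permutation test)."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_shuffles):
        # Step 1: act as if the null hypothesis is true, so the group
        # labels are arbitrary and can be reassigned at random.
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        # Step 2: count shuffles at least as extreme as the sample result.
        if diff >= observed:
            extreme += 1
    return extreme / n_shuffles

# Made-up scores for illustration only.
group_a = [12, 15, 14, 16, 13, 17, 15, 14]
group_b = [10, 11, 13, 9, 12, 10, 11, 12]

p = permutation_p_value(group_a, group_b)
alpha = 0.05
# Step 3: reject the null only if the result would be very unlikely under it.
decision = "reject" if p <= alpha else "retain"
print(f"p = {p:.4f}; {decision} the null hypothesis")
```

With these made-up scores the group means differ by 3.5 points, and very few random relabellings produce a difference that large, so the null hypothesis is rejected.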

Following this logic, we can begin to understand why Mehl and his colleagues concluded that there is no difference in talkativeness between women and men in the population. In essence, they asked the following question: “If there were no difference in the population, how likely is it that we would find a small difference of d = 0.06 in our sample?” Their answer to this question was that this sample relationship would be fairly likely if the null hypothesis were true. Therefore, they retained the null hypothesis—concluding that there is no evidence of a sex difference in the population. We can also see why Kanner and his colleagues concluded that there is a correlation between hassles and symptoms in the population. They asked, “If the null hypothesis were true, how likely is it that we would find a strong correlation of +.60 in our sample?” Their answer to this question was that this sample relationship would be fairly unlikely if the null hypothesis were true. Therefore, they rejected the null hypothesis in favour of the alternative hypothesis—concluding that there is a positive correlation between these variables in the population.

A crucial step in null hypothesis testing is finding the probability of a result at least as extreme as the sample result if the null hypothesis were true. This probability is called the p value. A low p value means that such a result would be unlikely if the null hypothesis were true and leads to the rejection of the null hypothesis. A high p value means that such a result would be likely if the null hypothesis were true and leads to the retention of the null hypothesis. But how low must the p value be before the sample result is considered unlikely enough to reject the null hypothesis? In null hypothesis testing, this criterion is called α (alpha) and is almost always set to .05. If there is a 5% chance or less of a result at least as extreme as the sample result if the null hypothesis were true, then the null hypothesis is rejected. When this happens, the result is said to be statistically significant. If there is greater than a 5% chance of a result at least as extreme as the sample result when the null hypothesis is true, then the null hypothesis is retained. This does not necessarily mean that the researcher accepts the null hypothesis as true, only that there is not currently enough evidence to reject it. Researchers often use the expression “fail to reject the null hypothesis” rather than “retain the null hypothesis,” but they never use the expression “accept the null hypothesis.”

## The Misunderstood p Value

The p value is one of the most misunderstood quantities in psychological research (Cohen, 1994) [1]. Even professional researchers misinterpret it, and it is not unusual for such misinterpretations to appear in statistics textbooks!

The most common misinterpretation is that the p value is the probability that the null hypothesis is true, that is, that the sample result occurred by chance. For example, a misguided researcher might say that because the p value is .02, there is only a 2% chance that the result is due to chance and a 98% chance that it reflects a real relationship in the population. But this is incorrect. The p value is really the probability of a result at least as extreme as the sample result if the null hypothesis were true. So a p value of .02 means that if the null hypothesis were true, a sample result this extreme would occur only 2% of the time.

You can avoid this misunderstanding by remembering that the p value is not the probability that any particular hypothesis is true or false. Instead, it is the probability of obtaining a result at least as extreme as the sample result if the null hypothesis were true.
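
A small simulation can make the conditional nature of the p value concrete. In the sketch below, every simulated study draws both groups from the same population, so the null hypothesis is true by construction. About 5% of the studies should still produce p ≤ .05. For simplicity the sketch assumes a known population standard deviation (a z test), and the function name `two_sided_p` and all settings are illustrative assumptions.

```python
import random
from statistics import NormalDist, mean

def two_sided_p(sample_a, sample_b, sd=1.0):
    """Two-sided p value for a difference in means between two
    equal-sized groups, assuming the population SD is known (a z test)."""
    n = len(sample_a)
    z = (mean(sample_a) - mean(sample_b)) / (sd * (2 / n) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

rng = random.Random(1)
n_studies, n_per_group = 4000, 20
significant = 0
for _ in range(n_studies):
    # Both groups come from the SAME population, so the null is true.
    a = [rng.gauss(0, 1) for _ in range(n_per_group)]
    b = [rng.gauss(0, 1) for _ in range(n_per_group)]
    if two_sided_p(a, b) <= 0.05:
        significant += 1

false_alarm_rate = significant / n_studies
print(f"Share of studies with p <= .05: {false_alarm_rate:.3f}")
```

Here the null hypothesis is certainly true in every study, yet small p values still occur at roughly the rate α. A p value of .02 therefore cannot mean that there is a 2% chance that the null hypothesis is true.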

## Role of Sample Size and Relationship Strength

Recall that null hypothesis testing involves answering the question, “If the null hypothesis were true, what is the probability of a sample result as extreme as this one?” In other words, “What is the p value?” It can be helpful to see that the answer to this question depends on just two considerations: the strength of the relationship and the size of the sample. Specifically, the stronger the sample relationship and the larger the sample, the less likely the result would be if the null hypothesis were true. That is, the lower the p value. This should make sense. Imagine a study in which a sample of 500 women is compared with a sample of 500 men in terms of some psychological characteristic, and Cohen’s d is a strong 0.50. If there were really no sex difference in the population, then a result this strong based on such a large sample should seem highly unlikely. Now imagine a similar study in which a sample of three women is compared with a sample of three men, and Cohen’s d is a weak 0.10. If there were no sex difference in the population, then a relationship this weak based on such a small sample should seem likely. And this is precisely why the null hypothesis would be rejected in the first example and retained in the second.
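
The two imagined studies can be checked numerically. The sketch below assumes an independent-samples t test with two equal-sized groups, in which case the t statistic implied by Cohen's d is d × √(n / 2) with 2n − 2 degrees of freedom. The helper names `t_density` and `p_from_d` are made up for this example, and the tail area is found by simple numerical integration rather than a statistics library.

```python
import math

def t_density(t, df):
    """Probability density of Student's t distribution."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def p_from_d(d, n_per_group, steps=10_000):
    """Two-sided p value for Cohen's d with two equal-sized groups."""
    t = abs(d) * math.sqrt(n_per_group / 2)  # t statistic implied by d
    df = 2 * n_per_group - 2
    # Simpson's rule for the area under the density between 0 and t
    # (steps must be even).
    h = t / steps
    area = t_density(0, df) + t_density(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_density(i * h, df)
    area *= h / 3
    return max(0.0, 1 - 2 * area)

print(p_from_d(0.50, 500))  # 500 per group, d = 0.50
print(p_from_d(0.10, 3))    # 3 per group, d = 0.10
```

The first p value comes out far below .05 and the second far above it, matching the decisions to reject and to retain the null hypothesis.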

Of course, sometimes the result can be weak and the sample large, or the result can be strong and the sample small. In these cases, the two considerations trade off against each other so that a weak result can be statistically significant if the sample is large enough and a strong relationship can be statistically significant even if the sample is small. Table 13.1 shows roughly how relationship strength and sample size combine to determine whether a sample result is statistically significant. The columns of the table represent the three levels of relationship strength: weak, medium, and strong. The rows represent four sample sizes that can be considered small, medium, large, and extra large in the context of psychological research. Thus each cell in the table represents a combination of relationship strength and sample size. If a cell contains the word Yes, then this combination would be statistically significant for both Cohen’s d and Pearson’s r. If it contains the word No, then it would not be statistically significant for either. There is one cell where the decision for d and r would be different and another where it might be different depending on some additional considerations, which are discussed in Section 13.2 “Some Basic Null Hypothesis Tests.”

Although Table 13.1 provides only a rough guideline, it shows very clearly that weak relationships based on medium or small samples are never statistically significant and that strong relationships based on medium or larger samples are always statistically significant. If you keep this lesson in mind, you will often know whether a result is statistically significant based on the descriptive statistics alone. It is extremely useful to be able to develop this kind of intuitive judgment. One reason is that it allows you to develop expectations about how your formal null hypothesis tests are going to come out, which in turn allows you to detect problems in your analyses. For example, if your sample relationship is strong and your sample is medium, then you would expect to reject the null hypothesis. If for some reason your formal null hypothesis test indicates otherwise, then you need to double-check your computations and interpretations. A second reason is that the ability to make this kind of intuitive judgment is an indication that you understand the basic logic of this approach in addition to being able to do the computations.

## Statistical Significance Versus Practical Significance

Table 13.1 illustrates another extremely important point. A statistically significant result is not necessarily a strong one. Even a very weak result can be statistically significant if it is based on a large enough sample. This is closely related to Janet Shibley Hyde’s argument about sex differences (Hyde, 2007) [2] . The differences between women and men in mathematical problem solving and leadership ability are statistically significant. But the word significant can cause people to interpret these differences as strong and important—perhaps even important enough to influence the college courses they take or even who they vote for. As we have seen, however, these statistically significant differences are actually quite weak—perhaps even “trivial.”

This is why it is important to distinguish between the statistical significance of a result and the practical significance of that result. Practical significance refers to the importance or usefulness of the result in some real-world context. Many sex differences are statistically significant—and may even be interesting for purely scientific reasons—but they are not practically significant. In clinical practice, this same concept is often referred to as “clinical significance.” For example, a study on a new treatment for social phobia might show that it produces a statistically significant positive effect. Yet this effect still might not be strong enough to justify the time, effort, and other costs of putting it into practice—especially if easier and cheaper treatments that work almost as well already exist. Although statistically significant, this result would be said to lack practical or clinical significance.
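
The gap between statistical and practical significance can be illustrated by holding a very weak effect fixed and letting only the sample size grow. The sketch below uses a large-sample normal approximation for two equal-sized groups. The effect size of 0.03 and the function name `approx_p` are hypothetical choices for illustration.

```python
from statistics import NormalDist

def approx_p(d, n_per_group):
    """Large-sample two-sided p value for Cohen's d (normal approximation)."""
    z = abs(d) * (n_per_group / 2) ** 0.5
    return 2 * (1 - NormalDist().cdf(z))

d = 0.03  # a very weak, hypothetical effect
for n in (100, 1_000, 10_000, 100_000):
    p = approx_p(d, n)
    verdict = "significant" if p <= 0.05 else "not significant"
    print(f"n per group = {n:>7}: p = {p:.4f} ({verdict})")
```

The effect is equally weak in every row. Only the sample size changes, and with it the verdict, which is why a significant p value alone says little about whether a result matters in practice.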

Key Takeaways

- Null hypothesis testing is a formal approach to deciding whether a statistical relationship in a sample reflects a real relationship in the population or is just due to chance.
- The logic of null hypothesis testing involves assuming that the null hypothesis is true, finding how likely the sample result would be if this assumption were correct, and then making a decision. If the sample result would be unlikely if the null hypothesis were true, then it is rejected in favour of the alternative hypothesis. If it would not be unlikely, then the null hypothesis is retained.
- The probability of obtaining the sample result if the null hypothesis were true (the p value) is based on two considerations: relationship strength and sample size. Reasonable judgments about whether a sample relationship is statistically significant can often be made by quickly considering these two factors.
- Statistical significance is not the same as relationship strength or importance. Even weak relationships can be statistically significant if the sample size is large enough. It is important to consider relationship strength and the practical significance of a result in addition to its statistical significance.

Exercises

- Discussion: Imagine a study showing that people who eat more broccoli tend to be happier. Explain for someone who knows nothing about statistics why the researchers would conduct a null hypothesis test.
- Practice: Use Table 13.1 to decide whether each of the following results is statistically significant.
  - The correlation between two variables is r = −.78 based on a sample size of 137.
  - The mean score on a psychological characteristic for women is 25 (SD = 5) and the mean score for men is 24 (SD = 5). There were 12 women and 10 men in this study.
  - In a memory experiment, the mean number of items recalled by the 40 participants in Condition A was 0.50 standard deviations greater than the mean number recalled by the 40 participants in Condition B.
  - In another memory experiment, the mean scores for participants in Condition A and Condition B came out exactly the same!
  - A student finds a correlation of r = .04 between the number of units the students in his research methods class are taking and the students’ level of stress.

## Long Descriptions

“Null Hypothesis” long description: A comic depicting a man and a woman talking in the foreground. In the background is a child working at a desk. The man says to the woman, “I can’t believe schools are still teaching kids about the null hypothesis. I remember reading a big study that conclusively disproved it years ago.”

“Conditional Risk” long description: A comic depicting two hikers beside a tree during a thunderstorm. A bolt of lightning goes “crack” in the dark sky as thunder booms. One of the hikers says, “Whoa! We should get inside!” The other hiker says, “It’s okay! Lightning only kills about 45 Americans a year, so the chances of dying are only one in 7,000,000. Let’s go on!” The comic’s caption says, “The annual death rate among people who know that statistic is one in six.”

## Media Attributions

- Null Hypothesis by XKCD CC BY-NC (Attribution NonCommercial)
- Conditional Risk by XKCD CC BY-NC (Attribution NonCommercial)

1. Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997–1003.
2. Hyde, J. S. (2007). New directions in the study of gender similarities and differences. Current Directions in Psychological Science, 16, 259–263.

- Parameters: Values in a population that correspond to variables measured in a study.
- Sampling error: The random variability in a statistic from sample to sample.
- Null hypothesis testing: A formal approach to deciding between two interpretations of a statistical relationship in a sample.
- Null hypothesis: The idea that there is no relationship in the population and that the relationship in the sample reflects only sampling error.
- Alternative hypothesis: The idea that there is a relationship in the population and that the relationship in the sample reflects this relationship in the population.
- Reject the null hypothesis: When the relationship found in the sample would be extremely unlikely if the null hypothesis were true, the idea that the relationship occurred “by chance” is rejected.
- Retain the null hypothesis: When the relationship found in the sample is likely to have occurred by chance, the null hypothesis is not rejected.
- p value: The probability that, if the null hypothesis were true, a result at least as extreme as the one found in the sample would occur.
- α (alpha): How low the p value must be before the sample result is considered unlikely in null hypothesis testing.
- Statistically significant: When there is less than a 5% chance of a result as extreme as the sample result occurring if the null hypothesis were true, and the null hypothesis is rejected.

Research Methods in Psychology - 2nd Canadian Edition by Paul C. Price, Rajiv Jhangiani, & I-Chant A. Chiang is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.


## AQA GCSE Psychology - Hypotheses and variables (Research Methods Lesson 1)

Subject: Psychology

Age range: 14-16

Resource type: Lesson (complete)

Last updated

6 February 2023


This is the first lesson in the topic of Research Methods in the AQA GCSE Psychology 8182 course.

I don’t deliver this topic all in one go; I interleave the lessons throughout the other topics. For anyone teaching the whole topic in one go, I have tried to base the whole series of lessons on the same imaginary investigation, for consistency and a continued thread of learning.

The lesson covers formulating an aim, null and alternative hypotheses, and independent and dependent variables in psychological studies and investigations.

- LO1: Formulate an aim for an investigation.
- LO2: Identify independent and dependent variables in investigations.
- LO3: Compare alternative hypotheses and null hypotheses.

The activities include taking the students through the beginnings of a plan for an investigation or study. There are lots of worked examples, notes and practice questions.

I designed this as a single lesson (50 mins), but it can be extended or shortened as necessary.

My lessons follow the textbook ‘AQA Psychology for GCSE’ from Illuminate Publishing.

Creative Commons "NoDerivatives"


## AQA GCSE Psychology - Research Methods Lesson 1-5

This bundle contains the first five lessons in the topic of Research Methods from the AQA GCSE in Psychology 8182. Personally, I don't teach this topic all in one go; rather, I interleave the lessons through the other topics as each one becomes relevant to the students' learning. Lessons included:

1. Hypotheses and variables (writing hypotheses, understanding independent and dependent variables, and operationalisation of the variables).
2. Extraneous variables (understanding how to limit extraneous variables, employing a standardised procedure and using randomisation).
3. Types of experiment (natural, laboratory and field experiments).
4. Experimental designs (quantitative and qualitative methods, independent groups, matched pairs and repeated measures).
5. Sampling methods (random, volunteer, systematic, stratified, opportunity).


## Aims And Hypotheses, Directional And Non-Directional

March 7, 2021 - paper 2 psychology in context | research methods.


In psychology, hypotheses are predictions made by the researcher about the outcome of a study. The researcher can choose to make a specific prediction about what they think will happen in their research (a directional hypothesis), or they can make a more general, less specific prediction about the outcome (a non-directional hypothesis). The type of prediction that a researcher makes usually depends on whether previous research has investigated the same research aim.

## Variables Recap:

The independent variable (IV) is the variable that psychologists manipulate or change to see whether changing it has an effect on the dependent variable (DV).

The dependent variable (DV) is the variable that the psychologist measures (to see if the IV has had an effect).

It is important that the only variable that is changed in research is the independent variable (IV); all other variables have to be kept constant across the control condition and the experimental conditions. Only then will researchers be able to observe the true effects of just the independent variable (IV) on the dependent variable (DV).

## Research/Experimental Aim(s):

An aim is a clear and precise statement of the purpose of the study. It is a statement of why a research study is taking place. This should include what is being studied and what the study is trying to achieve (e.g., “This study aims to investigate the effects of alcohol on reaction times”).

It is important that aims created in research are realistic and ethical.

## Hypotheses:

This is a testable statement that predicts what the researcher expects to happen in their research. The research study itself is therefore a means of testing whether or not the hypothesis is supported by the findings. If the findings do support the hypothesis then the hypothesis can be retained (i.e., accepted), but if not, then it must be rejected.

Three Different Hypotheses:


Inferential Statistics

## Learning Objectives

- Explain the purpose of null hypothesis testing, including the role of sampling error.
- Describe the basic logic of null hypothesis testing.
- Describe the role of relationship strength and sample size in determining statistical significance and make reasonable judgments about statistical significance based on these two factors.

## The Purpose of Null Hypothesis Testing

As we have seen, psychological research typically involves measuring one or more variables in a sample and computing descriptive summary data (e.g., means, correlation coefficients) for those variables. These descriptive data for the sample are called statistics . In general, however, the researcher’s goal is not to draw conclusions about that sample but to draw conclusions about the population that the sample was selected from. Thus researchers must use sample statistics to draw conclusions about the corresponding values in the population. These corresponding values in the population are called parameters . Imagine, for example, that a researcher measures the number of depressive symptoms exhibited by each of 50 adults with clinical depression and computes the mean number of symptoms. The researcher probably wants to use this sample statistic (the mean number of symptoms for the sample) to draw conclusions about the corresponding population parameter (the mean number of symptoms for adults with clinical depression).

Unfortunately, sample statistics are not perfect estimates of their corresponding population parameters. This is because there is a certain amount of random variability in any statistic from sample to sample. The mean number of depressive symptoms might be 8.73 in one sample of adults with clinical depression, 6.45 in a second sample, and 9.44 in a third—even though these samples are selected randomly from the same population. Similarly, the correlation (Pearson’s r ) between two variables might be +.24 in one sample, −.04 in a second sample, and +.15 in a third—again, even though these samples are selected randomly from the same population. This random variability in a statistic from sample to sample is called sampling error . (Note that the term error here refers to random variability and does not imply that anyone has made a mistake. No one “commits a sampling error.”)
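
Sampling error is easy to see in a quick simulation. The sketch below builds a made-up population of symptom counts and draws three random samples of 50 from it. The population values, the seeds, and the function name `sample_means` are all assumptions for illustration and do not come from any real study.

```python
import random
import statistics

def sample_means(population, sample_size, n_samples, seed=0):
    """Draw several random samples and return the mean of each one."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]

# Hypothetical population: symptom counts for 10,000 adults (made-up values).
rng = random.Random(42)
population = [rng.randint(0, 16) for _ in range(10_000)]

means = sample_means(population, sample_size=50, n_samples=3)
print("Population mean:", round(statistics.mean(population), 2))
print("Sample means:", [round(m, 2) for m in means])
```

The three sample means will typically differ from one another and from the population mean, even though every sample is drawn at random from the same population.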

One implication of this is that when there is a statistical relationship in a sample, it is not always clear that there is a statistical relationship in the population. A small difference between two group means in a sample might indicate that there is a small difference between the two group means in the population. But it could also be that there is no difference between the means in the population and that the difference in the sample is just a matter of sampling error. Similarly, a Pearson’s r value of −.29 in a sample might mean that there is a negative relationship in the population. But it could also be that there is no relationship in the population and that the relationship in the sample is just a matter of sampling error.

In fact, any statistical relationship in a sample can be interpreted in two ways:

- There is a relationship in the population, and the relationship in the sample reflects this.
- There is no relationship in the population, and the relationship in the sample reflects only sampling error.

The purpose of null hypothesis testing is simply to help researchers decide between these two interpretations.

## The Logic of Null Hypothesis Testing

Null hypothesis testing (often called null hypothesis significance testing or NHST) is a formal approach to deciding between two interpretations of a statistical relationship in a sample. One interpretation is called the null hypothesis (often symbolized H₀ and read as “H-zero”). This is the idea that there is no relationship in the population and that the relationship in the sample reflects only sampling error. Informally, the null hypothesis is that the sample relationship “occurred by chance.” The other interpretation is called the alternative hypothesis (often symbolized as H₁). This is the idea that there is a relationship in the population and that the relationship in the sample reflects this relationship in the population.

Again, every statistical relationship in a sample can be interpreted in either of these two ways: It might have occurred by chance, or it might reflect a relationship in the population. So researchers need a way to decide between them. Although there are many specific null hypothesis testing techniques, they are all based on the same general logic. The steps are as follows:

- Assume for the moment that the null hypothesis is true. There is no relationship between the variables in the population.
- Determine how likely the sample relationship would be if the null hypothesis were true.
- If the sample relationship would be extremely unlikely, then reject the null hypothesis in favor of the alternative hypothesis. If it would not be extremely unlikely, then retain the null hypothesis.

Following this logic, we can begin to understand why Mehl and his colleagues concluded that there is no difference in talkativeness between women and men in the population. In essence, they asked the following question: “If there were no difference in the population, how likely is it that we would find a small difference of d = 0.06 in our sample?” Their answer to this question was that this sample relationship would be fairly likely if the null hypothesis were true. Therefore, they retained the null hypothesis—concluding that there is no evidence of a sex difference in the population. We can also see why Kanner and his colleagues concluded that there is a correlation between hassles and symptoms in the population. They asked, “If the null hypothesis were true, how likely is it that we would find a strong correlation of +.60 in our sample?” Their answer to this question was that this sample relationship would be fairly unlikely if the null hypothesis were true. Therefore, they rejected the null hypothesis in favor of the alternative hypothesis—concluding that there is a positive correlation between these variables in the population.

A crucial step in null hypothesis testing is finding the probability of the sample result or a more extreme result if the null hypothesis were true (Lakens, 2017) [1]. This probability is called the p value. A low p value means that the sample or more extreme result would be unlikely if the null hypothesis were true and leads to the rejection of the null hypothesis. A p value that is not low means that the sample or more extreme result would be likely if the null hypothesis were true and leads to the retention of the null hypothesis. But how low must the p value be before the sample result is considered unlikely enough to reject the null hypothesis? In null hypothesis testing, this criterion is called α (alpha) and is almost always set to .05. If there is a 5% chance or less of a result at least as extreme as the sample result if the null hypothesis were true, then the null hypothesis is rejected. When this happens, the result is said to be statistically significant. If there is greater than a 5% chance of a result at least as extreme as the sample result when the null hypothesis is true, then the null hypothesis is retained. This does not necessarily mean that the researcher accepts the null hypothesis as true, only that there is not currently enough evidence to reject it. Researchers often use the expression “fail to reject the null hypothesis” rather than “retain the null hypothesis,” but they never use the expression “accept the null hypothesis.”

## The Misunderstood p Value

The p value is one of the most misunderstood quantities in psychological research (Cohen, 1994) [2]. Even professional researchers misinterpret it, and it is not unusual for such misinterpretations to appear in statistics textbooks!

The most common misinterpretation is that the p value is the probability that the null hypothesis is true, that is, that the sample result occurred by chance. For example, a misguided researcher might say that because the p value is .02, there is only a 2% chance that the result is due to chance and a 98% chance that it reflects a real relationship in the population. But this is incorrect. The p value is really the probability of a result at least as extreme as the sample result if the null hypothesis were true. So a p value of .02 means that if the null hypothesis were true, a sample result this extreme would occur only 2% of the time.

You can avoid this misunderstanding by remembering that the p value is not the probability that any particular hypothesis is true or false. Instead, it is the probability of obtaining a result at least as extreme as the sample result if the null hypothesis were true.

## Role of Sample Size and Relationship Strength

Recall that null hypothesis testing involves answering the question, “If the null hypothesis were true, what is the probability of a sample result as extreme as this one?” In other words, “What is the p value?” It can be helpful to see that the answer to this question depends on just two considerations: the strength of the relationship and the size of the sample. Specifically, the stronger the sample relationship and the larger the sample, the less likely the result would be if the null hypothesis were true. That is, the lower the p value. This should make sense. Imagine a study in which a sample of 500 women is compared with a sample of 500 men in terms of some psychological characteristic, and Cohen’s d is a strong 0.50. If there were really no sex difference in the population, then a result this strong based on such a large sample should seem highly unlikely. Now imagine a similar study in which a sample of three women is compared with a sample of three men, and Cohen’s d is a weak 0.10. If there were no sex difference in the population, then a relationship this weak based on such a small sample should seem likely. And this is precisely why the null hypothesis would be rejected in the first example and retained in the second.

Of course, sometimes the result can be weak and the sample large, or the result can be strong and the sample small. In these cases, the two considerations trade off against each other so that a weak result can be statistically significant if the sample is large enough and a strong relationship can be statistically significant even if the sample is small. Table 13.1 shows roughly how relationship strength and sample size combine to determine whether a sample result is statistically significant. The columns of the table represent the three levels of relationship strength: weak, medium, and strong. The rows represent four sample sizes that can be considered small, medium, large, and extra large in the context of psychological research. Thus each cell in the table represents a combination of relationship strength and sample size. If a cell contains the word Yes, then this combination would be statistically significant for both Cohen’s d and Pearson’s r. If it contains the word No, then it would not be statistically significant for either. There is one cell where the decision for d and r would be different and another where it might be different depending on some additional considerations, which are discussed in Section 13.2 “Some Basic Null Hypothesis Tests.”

Although Table 13.1 provides only a rough guideline, it shows very clearly that weak relationships based on medium or small samples are never statistically significant and that strong relationships based on medium or larger samples are always statistically significant. If you keep this lesson in mind, you will often know whether a result is statistically significant based on the descriptive statistics alone. It is extremely useful to be able to develop this kind of intuitive judgment. One reason is that it allows you to develop expectations about how your formal null hypothesis tests are going to come out, which in turn allows you to detect problems in your analyses. For example, if your sample relationship is strong and your sample is medium, then you would expect to reject the null hypothesis. If for some reason your formal null hypothesis test indicates otherwise, then you need to double-check your computations and interpretations. A second reason is that the ability to make this kind of intuitive judgment is an indication that you understand the basic logic of this approach in addition to being able to do the computations.

## Statistical Significance Versus Practical Significance

Table 13.1 illustrates another extremely important point. A statistically significant result is not necessarily a strong one. Even a very weak result can be statistically significant if it is based on a large enough sample. This is closely related to Janet Shibley Hyde’s argument about sex differences (Hyde, 2007) [3] . The differences between women and men in mathematical problem solving and leadership ability are statistically significant. But the word significant can cause people to interpret these differences as strong and important—perhaps even important enough to influence the college courses they take or even who they vote for. As we have seen, however, these statistically significant differences are actually quite weak—perhaps even “trivial.”

This is why it is important to distinguish between the statistical significance of a result and the practical significance of that result. Practical significance refers to the importance or usefulness of the result in some real-world context. Many sex differences are statistically significant—and may even be interesting for purely scientific reasons—but they are not practically significant. In clinical practice, this same concept is often referred to as “clinical significance.” For example, a study on a new treatment for social phobia might show that it produces a statistically significant positive effect. Yet this effect still might not be strong enough to justify the time, effort, and other costs of putting it into practice—especially if easier and cheaper treatments that work almost as well already exist. Although statistically significant, this result would be said to lack practical or clinical significance.

## Image Description

“Null Hypothesis” long description: A comic depicting a man and a woman talking in the foreground. In the background is a child working at a desk. The man says to the woman, “I can’t believe schools are still teaching kids about the null hypothesis. I remember reading a big study that conclusively disproved it years ago.”

“Conditional Risk” long description: A comic depicting two hikers beside a tree during a thunderstorm. A bolt of lightning goes “crack” in the dark sky as thunder booms. One of the hikers says, “Whoa! We should get inside!” The other hiker says, “It’s okay! Lightning only kills about 45 Americans a year, so the chances of dying are only one in 7,000,000. Let’s go on!” The comic’s caption says, “The annual death rate among people who know that statistic is one in six.”

## Media Attributions

- Null Hypothesis by XKCD CC BY-NC (Attribution NonCommercial)
- Conditional Risk by XKCD CC BY-NC (Attribution NonCommercial)

## References

1. Lakens, D. (2017, December 25). About p-values: Understanding common misconceptions. [Blog post] Retrieved from https://correlaid.org/en/blog/understand-p-values/
2. Cohen, J. (1994). The world is round: p < .05. American Psychologist, 49, 997–1003.
3. Hyde, J. S. (2007). New directions in the study of gender similarities and differences. Current Directions in Psychological Science, 16, 259–263.

Statistics: Descriptive data that involve measuring one or more variables in a sample and computing descriptive summary data (e.g., means, correlation coefficients) for those variables.

Parameters: Corresponding values in the population.

Sampling error: The random variability in a statistic from sample to sample.

Null hypothesis testing: A formal approach to deciding between two interpretations of a statistical relationship in a sample.

Null hypothesis: The idea that there is no relationship in the population and that the relationship in the sample reflects only sampling error (often symbolized H0 and read as “H-zero”).

Alternative hypothesis: An alternative to the null hypothesis (often symbolized as H1), this hypothesis proposes that there is a relationship in the population and that the relationship in the sample reflects this relationship in the population.

Reject the null hypothesis: A decision made by researchers using null hypothesis testing which occurs when the sample relationship would be extremely unlikely if the null hypothesis were true.

Retain the null hypothesis: A decision made by researchers in null hypothesis testing which occurs when the sample relationship would not be extremely unlikely if the null hypothesis were true.

p value: The probability of obtaining the sample result or a more extreme result if the null hypothesis were true.

α (alpha): The criterion that shows how low a p value should be before the sample result is considered unlikely enough to reject the null hypothesis (usually set to .05).

Statistically significant: An effect that is unlikely to be due to random chance and therefore likely represents a real effect in the population.

Practical significance: Refers to the importance or usefulness of the result in some real-world context.

Research Methods in Psychology Copyright © 2019 by Rajiv S. Jhangiani, I-Chant A. Chiang, Carrie Cuttler, & Dana C. Leighton is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.


## Understanding Psychology at A level and GCSE: 6

Posted by: Tracey Jones at 7:12 am, April 26, 2011 | Tags: A level Psychology , A level psychology revision , GCSE Psychology , gcse psychology revision

Here is the sixth in our new series of Psychology blogs – useful for anyone revising for exams or thinking about taking up Psychology as a new subject at A level or GCSE .

## Psychology – Aims and Hypotheses

We talked about hypotheses in the previous blog. Let’s just summarise:

The aim is the purpose of the study, but isn’t usually in precise enough terms to test. So we need a hypothesis. A hypothesis is a testable statement. A research hypothesis is generated by a theory. It is developed to test a theory.

## What is a null hypothesis?

A null hypothesis is what we assume to be true when carrying out psychological research. It doesn’t mean that this is what a psychologist believes to be true though. It can simply be used as a hypothesis for the purpose of the research. For example, I may believe that women are better drivers than men. A null hypothesis to test this belief would state “There is no difference between male and female driving ability.”

I would then carry out research to test that null hypothesis. At the end of the research, if I found that women were significantly better drivers, the evidence would go against the null hypothesis, so we would reject it. Strictly speaking, this does not prove the null hypothesis wrong; it shows that the results would be very unlikely if it were true.

If I found no significant difference between male and female drivers, then we would retain (fail to reject) the null hypothesis. This does not show that it is true, only that there is not enough evidence to reject it.

A null hypothesis often predicts that there is no relationship between the variables in the research and that any result is due to chance. The hypothesis that “There is no difference between male and female driving ability” is an example of a null hypothesis that suggests no difference.

## What is a directional hypothesis?

A directional hypothesis predicts that two variables are linked and states the direction of that link. The hypothesis that “Women will perform better in a driving test than men” is an example of a directional hypothesis. The hypothesis suggests what is going to happen and the direction in which it will happen, i.e. that women will be better drivers and the direction of the results will show this. So if I found that women were better drivers, it would support my hypothesis. If I found men were better drivers, it would contradict my hypothesis.

Directional hypotheses often use previous research to suggest the way that the results will go.

## What is a non-directional hypothesis?

A non-directional hypothesis will predict that there will be a difference between two groups, but won’t predict in which direction. For example, “There will be a difference in male and female performance in a driving test”: this hypothesis does not say whether men will perform better or women will perform better.

Often non-directional hypotheses are used when there is little previous research on a particular topic or the previous research did not come to any particular conclusion.

Tracey Jones

Psychology Tutor


## Null Hypothesis Examples

ThoughtCo / Hilary Allison

- Ph.D., Biomedical Sciences, University of Tennessee at Knoxville
- B.A., Physics and Mathematics, Hastings College

The null hypothesis, which assumes that there is no meaningful relationship between two variables, may be the most valuable hypothesis for the scientific method because it is the easiest to test using a statistical analysis. This means you can support your hypothesis with a high level of confidence. Testing the null hypothesis can tell you whether your results are due to the effect of manipulating the independent variable or due to chance.

## What Is the Null Hypothesis?

The null hypothesis states there is no relationship between the measured phenomenon (the dependent variable) and the independent variable. You do not need to believe that the null hypothesis is true to test it. On the contrary, you will likely suspect that there is a relationship between a set of variables. One way to support this suspicion is to reject the null hypothesis. Rejecting a hypothesis does not mean an experiment was "bad" or that it didn't produce results. In fact, it is often one of the first steps toward further inquiry.

To distinguish it from other hypotheses, the null hypothesis is written as H0 (which is read as “H-nought,” "H-null," or "H-zero"). A significance test is used to determine how likely the observed results would be if the null hypothesis were true. A confidence level of 95 percent or 99 percent is common. Keep in mind, even if the confidence level is high, there is still a small chance that the conclusion is wrong, perhaps because the experimenter did not account for a critical factor or because of chance. This is one reason why it's important to repeat experiments.

## Examples of the Null Hypothesis

To write a null hypothesis, first start by asking a question. Rephrase that question in a form that assumes no relationship between the variables. In other words, assume a treatment has no effect. Write your hypothesis in a way that reflects this.


## 13.1 Understanding Null Hypothesis Testing

## Learning Objectives

- Explain the purpose of null hypothesis testing, including the role of sampling error.
- Describe the basic logic of null hypothesis testing.
- Describe the role of relationship strength and sample size in determining statistical significance and make reasonable judgments about statistical significance based on these two factors.

## The Purpose of Null Hypothesis Testing

As we have seen, psychological research typically involves measuring one or more variables in a sample and computing descriptive statistics for that sample. In general, however, the researcher’s goal is not to draw conclusions about that sample but to draw conclusions about the population that the sample was selected from. Thus researchers must use sample statistics to draw conclusions about the corresponding values in the population. These corresponding values in the population are called parameters . Imagine, for example, that a researcher measures the number of depressive symptoms exhibited by each of 50 adults with clinical depression and computes the mean number of symptoms. The researcher probably wants to use this sample statistic (the mean number of symptoms for the sample) to draw conclusions about the corresponding population parameter (the mean number of symptoms for adults with clinical depression).

Unfortunately, sample statistics are not perfect estimates of their corresponding population parameters. This is because there is a certain amount of random variability in any statistic from sample to sample. The mean number of depressive symptoms might be 8.73 in one sample of adults with clinical depression, 6.45 in a second sample, and 9.44 in a third—even though these samples are selected randomly from the same population. Similarly, the correlation (Pearson’s r ) between two variables might be +.24 in one sample, −.04 in a second sample, and +.15 in a third—again, even though these samples are selected randomly from the same population. This random variability in a statistic from sample to sample is called sampling error . (Note that the term error here refers to random variability and does not imply that anyone has made a mistake. No one “commits a sampling error.”)
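The variability described above can be made concrete with a small simulation. The following is a minimal sketch, not part of the original text: the population of symptom counts is invented for illustration (a mean near 8 and a spread of about 3 are assumptions), and `sample_means` is a hypothetical helper name.

```python
import random
import statistics

def sample_means(population, sample_size, n_samples, seed=0):
    """Draw repeated random samples and return the mean of each one."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]

# Hypothetical population: symptom counts for 10,000 adults (invented numbers).
pop_rng = random.Random(42)
population = [max(0, round(pop_rng.gauss(8, 3))) for _ in range(10_000)]

# Five samples of 50 people each, all drawn from the same population.
means = sample_means(population, sample_size=50, n_samples=5)
print("population mean:", round(statistics.mean(population), 2))
print("sample means:   ", [round(m, 2) for m in means])
```

Each sample mean differs somewhat from the population mean and from the other sample means, even though nothing about the population has changed. That spread is the sampling error.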

One implication of this is that when there is a statistical relationship in a sample, it is not always clear that there is a statistical relationship in the population. A small difference between two group means in a sample might indicate that there is a small difference between the two group means in the population. But it could also be that there is no difference between the means in the population and that the difference in the sample is just a matter of sampling error. Similarly, a Pearson’s r value of −.29 in a sample might mean that there is a negative relationship in the population. But it could also be that there is no relationship in the population and that the relationship in the sample is just a matter of sampling error.

In fact, any statistical relationship in a sample can be interpreted in two ways:

- There is a relationship in the population, and the relationship in the sample reflects this.
- There is no relationship in the population, and the relationship in the sample reflects only sampling error.

The purpose of null hypothesis testing is simply to help researchers decide between these two interpretations.

## The Logic of Null Hypothesis Testing

Null hypothesis testing is a formal approach to deciding between two interpretations of a statistical relationship in a sample. One interpretation is called the null hypothesis (often symbolized H0 and read as “H-naught”). This is the idea that there is no relationship in the population and that the relationship in the sample reflects only sampling error. Informally, the null hypothesis is that the sample relationship “occurred by chance.” The other interpretation is called the alternative hypothesis (often symbolized as H1). This is the idea that there is a relationship in the population and that the relationship in the sample reflects this relationship in the population.

Again, every statistical relationship in a sample can be interpreted in either of these two ways: It might have occurred by chance, or it might reflect a relationship in the population. So researchers need a way to decide between them. Although there are many specific null hypothesis testing techniques, they are all based on the same general logic. The steps are as follows:

- Assume for the moment that the null hypothesis is true. There is no relationship between the variables in the population.
- Determine how likely the sample relationship would be if the null hypothesis were true.
- If the sample relationship would be extremely unlikely, then reject the null hypothesis in favor of the alternative hypothesis. If it would not be extremely unlikely, then retain the null hypothesis.

Following this logic, we can begin to understand why Mehl and his colleagues concluded that there is no difference in talkativeness between women and men in the population. In essence, they asked the following question: “If there were no difference in the population, how likely is it that we would find a small difference of d = 0.06 in our sample?” Their answer to this question was that this sample relationship would be fairly likely if the null hypothesis were true. Therefore, they retained the null hypothesis—concluding that there is no evidence of a sex difference in the population. We can also see why Kanner and his colleagues concluded that there is a correlation between hassles and symptoms in the population. They asked, “If the null hypothesis were true, how likely is it that we would find a strong correlation of +.60 in our sample?” Their answer to this question was that this sample relationship would be fairly unlikely if the null hypothesis were true. Therefore, they rejected the null hypothesis in favor of the alternative hypothesis—concluding that there is a positive correlation between these variables in the population.

A crucial step in null hypothesis testing is finding the likelihood of the sample result if the null hypothesis were true. This probability is called the p value . A low p value means that the sample result would be unlikely if the null hypothesis were true and leads to the rejection of the null hypothesis. A p value that is not low means that the sample result would be likely if the null hypothesis were true and leads to the retention of the null hypothesis. But how low must the p value be before the sample result is considered unlikely enough to reject the null hypothesis? In null hypothesis testing, this criterion is called α (alpha) and is almost always set to .05. If there is a 5% chance or less of a result as extreme as the sample result if the null hypothesis were true, then the null hypothesis is rejected. When this happens, the result is said to be statistically significant . If there is greater than a 5% chance of a result as extreme as the sample result when the null hypothesis is true, then the null hypothesis is retained. This does not necessarily mean that the researcher accepts the null hypothesis as true—only that there is not currently enough evidence to reject it. Researchers often use the expression “fail to reject the null hypothesis” rather than “retain the null hypothesis,” but they never use the expression “accept the null hypothesis.”
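One way to see this logic in action is a permutation test, which is a swapped-in technique chosen here because it mirrors the three steps directly; it is not a method the text itself describes. The sketch below assumes the null hypothesis by shuffling group labels, counts how often a shuffled difference is at least as extreme as the observed one, and compares that proportion with α. The scores and the function name `permutation_p_value` are invented for illustration.

```python
import random

ALPHA = 0.05  # conventional criterion for rejecting the null hypothesis

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Estimate a two-sided p value for a difference between two group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_shuffles):
        # Under the null hypothesis the group labels are arbitrary,
        # so reshuffling them shows what chance alone would produce.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_shuffles

# Hypothetical scores for two groups of eight participants.
group_a = [12, 15, 14, 16, 13, 17, 15, 14]
group_b = [10, 11, 13, 9, 12, 10, 11, 12]

p = permutation_p_value(group_a, group_b)
decision = "reject" if p <= ALPHA else "retain"
print(f"p = {p:.4f}: {decision} the null hypothesis")
```

The estimated p value is the proportion of shuffles that produced a difference at least as large as the observed one, which is exactly the "how likely would this be if the null hypothesis were true" question.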

## The Misunderstood p Value

The p value is one of the most misunderstood quantities in psychological research (Cohen, 1994) [1]. Even professional researchers misinterpret it, and it is not unusual for such misinterpretations to appear in statistics textbooks!

The most common misinterpretation is that the p value is the probability that the null hypothesis is true, that is, the probability that the sample result occurred by chance. For example, a misguided researcher might say that because the p value is .02, there is only a 2% chance that the result is due to chance and a 98% chance that it reflects a real relationship in the population. But this is incorrect. The p value is really the probability of a result at least as extreme as the sample result if the null hypothesis were true. So a p value of .02 means that if the null hypothesis were true, a sample result this extreme would occur only 2% of the time.

You can avoid this misunderstanding by remembering that the p value is not the probability that any particular hypothesis is true or false. Instead, it is the probability of obtaining a result at least as extreme as the sample result if the null hypothesis were true.

“Null Hypothesis” retrieved from http://imgs.xkcd.com/comics/null_hypothesis.png (CC-BY-NC 2.5)

## Role of Sample Size and Relationship Strength

Recall that null hypothesis testing involves answering the question, “If the null hypothesis were true, what is the probability of a sample result as extreme as this one?” In other words, “What is the p value?” It can be helpful to see that the answer to this question depends on just two considerations: the strength of the relationship and the size of the sample. Specifically, the stronger the sample relationship and the larger the sample, the less likely the result would be if the null hypothesis were true. That is, the lower the p value. This should make sense. Imagine a study in which a sample of 500 women is compared with a sample of 500 men in terms of some psychological characteristic, and Cohen’s d is a strong 0.50. If there were really no sex difference in the population, then a result this strong based on such a large sample should seem highly unlikely. Now imagine a similar study in which a sample of three women is compared with a sample of three men, and Cohen’s d is a weak 0.10. If there were no sex difference in the population, then a relationship this weak based on such a small sample should seem likely. And this is precisely why the null hypothesis would be rejected in the first example and retained in the second.
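The two examples in this paragraph can be checked with a rough calculation. The sketch below assumes two independent groups of equal size, for which the test statistic is approximately d × √(n/2), and it uses a normal approximation in place of the t distribution, so the numbers are only indicative for very small samples. The function name `approx_p_from_d` is hypothetical.

```python
import math

def approx_p_from_d(d, n_per_group):
    """Approximate two-sided p value for Cohen's d with equal group sizes.

    Uses z = d * sqrt(n / 2) and a normal approximation (an assumption;
    a t distribution would be more accurate for small samples).
    """
    z = d * math.sqrt(n_per_group / 2)
    return math.erfc(abs(z) / math.sqrt(2))

p_large = approx_p_from_d(0.50, 500)  # 500 women vs. 500 men, d = 0.50
p_small = approx_p_from_d(0.10, 3)    # 3 women vs. 3 men, d = 0.10

print(f"d = 0.50, n = 500 per group: p is about {p_large:.2g}")
print(f"d = 0.10, n = 3 per group:   p is about {p_small:.2g}")
```

The first p value comes out far below .05 and the second far above it, which matches the decisions to reject the null hypothesis in the first example and retain it in the second.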

Of course, sometimes the result can be weak and the sample large, or the result can be strong and the sample small. In these cases, the two considerations trade off against each other so that a weak result can be statistically significant if the sample is large enough and a strong relationship can be statistically significant even if the sample is small. Table 13.1 shows roughly how relationship strength and sample size combine to determine whether a sample result is statistically significant. The columns of the table represent the three levels of relationship strength: weak, medium, and strong. The rows represent four sample sizes that can be considered small, medium, large, and extra large in the context of psychological research. Thus each cell in the table represents a combination of relationship strength and sample size. If a cell contains the word Yes, then this combination would be statistically significant for both Cohen’s d and Pearson’s r. If it contains the word No, then it would not be statistically significant for either. There is one cell where the decision for d and r would be different and another where it might be different depending on some additional considerations, which are discussed in Section 13.2 “Some Basic Null Hypothesis Tests.”

Although Table 13.1 provides only a rough guideline, it shows very clearly that weak relationships based on medium or small samples are never statistically significant and that strong relationships based on medium or larger samples are always statistically significant. If you keep this lesson in mind, you will often know whether a result is statistically significant based on the descriptive statistics alone. It is extremely useful to be able to develop this kind of intuitive judgment. One reason is that it allows you to develop expectations about how your formal null hypothesis tests are going to come out, which in turn allows you to detect problems in your analyses. For example, if your sample relationship is strong and your sample is medium, then you would expect to reject the null hypothesis. If for some reason your formal null hypothesis test indicates otherwise, then you need to double-check your computations and interpretations. A second reason is that the ability to make this kind of intuitive judgment is an indication that you understand the basic logic of this approach in addition to being able to do the computations.

## Statistical Significance Versus Practical Significance

Table 13.1 illustrates another extremely important point. A statistically significant result is not necessarily a strong one. Even a very weak result can be statistically significant if it is based on a large enough sample. This is closely related to Janet Shibley Hyde’s argument about sex differences (Hyde, 2007) [2] . The differences between women and men in mathematical problem solving and leadership ability are statistically significant. But the word significant can cause people to interpret these differences as strong and important—perhaps even important enough to influence the college courses they take or even who they vote for. As we have seen, however, these statistically significant differences are actually quite weak—perhaps even “trivial.”

This is why it is important to distinguish between the statistical significance of a result and the practical significance of that result. Practical significance refers to the importance or usefulness of the result in some real-world context. Many sex differences are statistically significant—and may even be interesting for purely scientific reasons—but they are not practically significant. In clinical practice, this same concept is often referred to as “clinical significance.” For example, a study on a new treatment for social phobia might show that it produces a statistically significant positive effect. Yet this effect still might not be strong enough to justify the time, effort, and other costs of putting it into practice—especially if easier and cheaper treatments that work almost as well already exist. Although statistically significant, this result would be said to lack practical or clinical significance.
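To illustrate why statistical significance says little about importance, the sketch below estimates how many participants per group are needed before an effect of a given size reaches p ≤ .05. It rests on stated assumptions: two equal-sized independent groups, a two-tailed test, and a normal approximation with a critical value of 1.96. The function name `n_per_group_needed` and the effect sizes are chosen only for illustration.

```python
import math

Z_CRIT = 1.96  # two-tailed critical z value for alpha = .05 (normal approximation)

def n_per_group_needed(d, z_crit=Z_CRIT):
    """Smallest equal group size at which a sample effect of size d gives p <= .05.

    Solves d * sqrt(n / 2) >= z_crit for n.
    """
    return math.ceil(2 * (z_crit / d) ** 2)

for d in (0.50, 0.20, 0.05):
    print(f"d = {d:.2f}: about {n_per_group_needed(d)} participants per group")
```

Under these assumptions, even a very small effect becomes statistically significant once the sample is large enough, so the size of the effect has to be judged separately from the p value.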

“Conditional Risk” retrieved from http://imgs.xkcd.com/comics/conditional_risk.png (CC-BY-NC 2.5)

## Key Takeaways

- Null hypothesis testing is a formal approach to deciding whether a statistical relationship in a sample reflects a real relationship in the population or is just due to chance.
- The logic of null hypothesis testing involves assuming that the null hypothesis is true, finding how likely the sample result would be if this assumption were correct, and then making a decision. If the sample result would be unlikely if the null hypothesis were true, then it is rejected in favor of the alternative hypothesis. If it would not be unlikely, then the null hypothesis is retained.
- The probability of obtaining the sample result if the null hypothesis were true (the p value) is based on two considerations: relationship strength and sample size. Reasonable judgments about whether a sample relationship is statistically significant can often be made by quickly considering these two factors.
- Statistical significance is not the same as relationship strength or importance. Even weak relationships can be statistically significant if the sample size is large enough. It is important to consider relationship strength and the practical significance of a result in addition to its statistical significance.

## Exercises

- Discussion: Imagine a study showing that people who eat more broccoli tend to be happier. Explain for someone who knows nothing about statistics why the researchers would conduct a null hypothesis test.
- Practice: Use Table 13.1 to decide whether each of the following results is statistically significant.
  - The correlation between two variables is r = −.78 based on a sample size of 137.
  - The mean score on a psychological characteristic for women is 25 (SD = 5) and the mean score for men is 24 (SD = 5). There were 12 women and 10 men in this study.
  - In a memory experiment, the mean number of items recalled by the 40 participants in Condition A was 0.50 standard deviations greater than the mean number recalled by the 40 participants in Condition B.
  - In another memory experiment, the mean scores for participants in Condition A and Condition B came out exactly the same!
  - A student finds a correlation of r = .04 between the number of units the students in his research methods class are taking and the students’ level of stress.

## References

1. Cohen, J. (1994). The world is round: p < .05. American Psychologist, 49, 997–1003.
2. Hyde, J. S. (2007). New directions in the study of gender similarities and differences. Current Directions in Psychological Science, 16, 259–263.


A null hypothesis is a statistical concept suggesting no significant difference or relationship between measured variables. It's the default assumption unless empirical evidence proves otherwise. The null hypothesis states no relationship exists between the two variables being studied (i.e., one variable does not affect the other).

There are two types of hypothesis:

- H1 - Research hypothesis
- H0 - Null hypothesis

H1 - The Research Hypothesis

This predicts a statistically significant effect of an IV on a DV (i.e. an experiment), or a significant relationship between variables (i.e. a correlation study), e.g.

Psychology requires the use of mathematical skills for handling data in investigations. The mathematical skills required for this qualification are set out in Section 11.2 of this Topic Guide. Guidance 11.1 Designing psychological research 11.1.1 Be able to identify: a. an independent variable (IV) b. a dependent variable (DV)

The basics: A definition: A hypothesis is a testable statement predicted by a psychologist based on a set of previous observations and/or theories. The wording should allow a precise test to be made to see if the prediction is correct. Hypothesis testing is one of the key features of science.

What is a null hypothesis? A null hypothesis predicts that there will be no pattern or trend in results. In other words, it predicts no difference and no correlation. (A correlation is a relationship between two or more things.)

A null hypothesis is a general statement that the observed variables will have no impact as there is no relationship between them.

application of the mark schemes in the GCSE Psychology assessment. The 1PS0/02 (Paper 2) assessment consists of a written examination of 1 hour and 20 minutes, which accounts for 45% of the qualification, with a total of 79 marks available on ... Candidates are required to give a null hypothesis for the research being undertaken in the scenario ...

First worksheet on alternate and null hypotheses (Psychology GCSE 1-9, age range 14-16): This is the first worksheet to examine the way of writing down scientific predictions in psychology.

Included is a brief explanation of aims; the differences between null and alternative hypothesis, with a task to practice this, then some small mark GCSE questions from Section D of paper 1 (2 marks), with a mark scheme reflection activity.

The lesson covers formulating an aim, null & alternative hypotheses and independent & dependent variables in psychological studies and investigations. LO1: Formulate an aim for an investigation. LO2: Identify independent and dependent variables in investigations. LO3: Compare alternative hypotheses and null hypotheses.

Three Different Hypotheses: (1) Directional Hypothesis: states that the IV will have an effect on the DV and what that effect will be (the direction of results). For example, eating smarties will significantly improve an individual's dancing ability. When writing a directional hypothesis, it is important that you state exactly how the IV will ...

This is the hypothesis that the researcher will attempt to reject, thus supporting the alternate hypothesis. An alternate hypothesis (also called an experimental hypothesis) is what you'd assume to be a hypothesis normally. It's called the 'alternate' hypothesis because it acts as the alternative to the null hypothesis, e.g.

The Logic of Null Hypothesis Testing. Null hypothesis testing (often called null hypothesis significance testing or NHST) is a formal approach to deciding between two interpretations of a statistical relationship in a sample. One interpretation is called the null hypothesis (often symbolized H₀ and read as "H-zero").

The Purpose of Null Hypothesis Testing As we have seen, psychological research typically involves measuring one or more variables in a sample and computing descriptive summary data (e.g., means, correlation coefficients) for those variables. These descriptive data for the sample are called statistics .
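As a minimal sketch of what "descriptive summary data" can look like, the snippet below computes two sample statistics mentioned above: a mean and a Pearson correlation coefficient. The variable names and the numbers are hypothetical, invented purely for illustration, and the helper `pearson_r` is written out by hand so the example depends only on the Python standard library.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (at least 2 values)")
    mean_x, mean_y = mean(xs), mean(ys)
    # Sum of cross-products of deviations from each mean
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)


# Hypothetical sample: hours slept and a memory test score for five people
hours_slept = [5, 6, 7, 8, 9]
memory_score = [11, 12, 15, 15, 17]

print("Mean hours slept:", mean(hours_slept))
print("Mean memory score:", mean(memory_score))
print("Correlation (r):", round(pearson_r(hours_slept, memory_score), 2))
```

These numbers describe only the sample. Whether the same relationship holds in the wider population is the question that null hypothesis testing addresses.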

A null hypothesis to test this belief would state "There is no difference between male and female driving ability." I would then carry out research to test that null hypothesis. At the end of the research, if I found that women were better drivers, this would show that the null hypothesis was wrong, so we would reject the null hypothesis ...

The hypothesis the researcher tests by conducting a study and collecting data, which attempts to show the null hypothesis is not supported. This often involves a prediction about how one variable will affect another variable. "There will be a difference in how much sleep an individual has depending on whether they watch scary films before bed".

The null hypothesis, which assumes that there is no meaningful relationship between two variables, may be the most valuable hypothesis for the scientific method because it is the easiest to test using a statistical analysis. This means you can support your hypothesis with a high level of confidence.

Null hypothesis: This has to be written as well as the experimental hypothesis. It states that there is no relationship between the IV and the DV. If a relationship does appear, it is merely due to chance and not because of the deliberate manipulation of the IV.

Explain how sleep is needed for emotional stability: lower levels of the stress hormone cortisol.

Features of sleep stage 1:

- Light sleep; eye and muscle activity slows; muscle spasms.
- Slower, synchronised brain waves (alpha), which slow down to theta.

Features of sleep stage 2:

- Eye and muscle movements slow down.

A crucial step in null hypothesis testing is finding the likelihood of the sample result if the null hypothesis were true. This probability is called the p value. A low p value means that the sample result would be unlikely if the null hypothesis were true and leads to the rejection of the null hypothesis. A p value that is not low means that ...
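To make the idea of a p value concrete, here is a small sketch of one way to obtain it: an exact permutation test for a difference between two group means. This is only one of several possible tests and is not necessarily the one the sources above have in mind. The data are hypothetical, loosely echoing the scary-film example, and the 0.05 cut-off is the conventional threshold, assumed here for illustration.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation p value for a difference in means.

    Under the null hypothesis the group labels are arbitrary, so every way
    of splitting the pooled scores into two groups is equally likely.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    at_least_as_extreme = 0
    total_splits = 0
    for chosen in combinations(range(len(pooled)), n_a):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        # Small tolerance guards against floating-point rounding
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            at_least_as_extreme += 1
        total_splits += 1
    return at_least_as_extreme / total_splits


# Hypothetical hours of sleep (illustrative numbers only)
scary_film = [5.0, 5.5, 6.0, 6.5]
no_film = [7.0, 7.5, 8.0, 8.5]

p = permutation_p_value(scary_film, no_film)
alpha = 0.05  # conventional significance level, assumed for this sketch
print("p value:", round(p, 4))
print("Reject the null hypothesis" if p < alpha else "Retain the null hypothesis")
```

Here only 2 of the 70 possible splits produce a difference as large as the observed one, so the p value is low and the null hypothesis of no difference would be rejected at the 0.05 level.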