
Case Study – Methods, Examples and Guide


Case Study Research

A case study is a research method that involves an in-depth examination and analysis of a particular phenomenon or case, such as an individual, organization, community, event, or situation.

It is a qualitative research approach that aims to provide a detailed and comprehensive understanding of the case being studied. Case studies typically involve multiple sources of data, including interviews, observations, documents, and artifacts, which are analyzed using various techniques, such as content analysis, thematic analysis, and grounded theory. The findings of a case study are often used to develop theories, inform policy or practice, or generate new research questions.

Types of Case Study

The main types of case study are as follows:

Single-Case Study

A single-case study is an in-depth analysis of a single case. This type of case study is useful when the researcher wants to understand a specific phenomenon in detail.

For example, a researcher might conduct a single-case study on a particular individual to understand their experiences with a particular health condition, or on a specific organization to explore its management practices. The researcher collects data from multiple sources, such as interviews, observations, and documents, and uses various techniques to analyze the data, such as content analysis or thematic analysis. The findings of a single-case study are often used to generate new research questions, develop theories, or inform policy or practice.

Multiple-Case Study

A multiple-case study involves the analysis of several cases that are similar in nature. This type of case study is useful when the researcher wants to identify similarities and differences between the cases.

For example, a researcher might conduct a multiple-case study on several companies to explore the factors that contribute to their success or failure. The researcher collects data from each case, compares and contrasts the findings, and uses various techniques to analyze the data, such as comparative analysis or pattern-matching. The findings of a multiple-case study can be used to develop theories, inform policy or practice, or generate new research questions.

Exploratory Case Study

An exploratory case study is used to explore a new or understudied phenomenon. This type of case study is useful when the researcher wants to generate hypotheses or theories about the phenomenon.

For example, a researcher might conduct an exploratory case study on a new technology to understand its potential impact on society. The researcher collects data from multiple sources, such as interviews, observations, and documents, and uses various techniques to analyze the data, such as grounded theory or content analysis. The findings of an exploratory case study can be used to generate new research questions, develop theories, or inform policy or practice.

Descriptive Case Study

A descriptive case study is used to describe a particular phenomenon in detail. This type of case study is useful when the researcher wants to provide a comprehensive account of the phenomenon.

For example, a researcher might conduct a descriptive case study on a particular community to understand its social and economic characteristics. The researcher collects data from multiple sources, such as interviews, observations, and documents, and uses various techniques to analyze the data, such as content analysis or thematic analysis. The findings of a descriptive case study can be used to inform policy or practice or generate new research questions.

Instrumental Case Study

An instrumental case study examines a particular case in order to gain insight into a broader issue or goal. This type of case study is useful when the researcher is less interested in the case itself than in what it reveals about that broader issue.

For example, a researcher might conduct an instrumental case study on a particular policy to understand its impact on achieving a particular goal, such as reducing poverty. The researcher collects data from multiple sources, such as interviews, observations, and documents, and uses various techniques to analyze the data, such as content analysis or thematic analysis. The findings of an instrumental case study can be used to inform policy or practice or generate new research questions.

Case Study Data Collection Methods

Here are some common data collection methods for case studies:

Interviews involve asking questions to individuals who have knowledge or experience relevant to the case study. Interviews can be structured (where the same questions are asked to all participants) or unstructured (where the interviewer follows up on the responses with further questions). Interviews can be conducted in person, over the phone, or through video conferencing.


Observations involve watching and recording the behavior and activities of individuals or groups relevant to the case study. Observations can be participant (where the researcher actively participates in the activities) or non-participant (where the researcher observes from a distance). Observations can be recorded using notes, audio or video recordings, or photographs.

Documents can be used as a source of information for case studies. Documents can include reports, memos, emails, letters, and other written materials related to the case study. Documents can be collected from the case study participants or from public sources.

Surveys involve asking a set of questions to a sample of individuals relevant to the case study. Surveys can be administered in person, over the phone, through mail or email, or online. Surveys can be used to gather information on attitudes, opinions, or behaviors related to the case study.

Artifacts are physical objects relevant to the case study. Artifacts can include tools, equipment, products, or other objects that provide insights into the case study phenomenon.

How to Conduct Case Study Research

Conducting case study research involves several steps that help ensure the quality and rigor of the study:

  • Define the research questions: The first step in conducting a case study research is to define the research questions. The research questions should be specific, measurable, and relevant to the case study phenomenon under investigation.
  • Select the case: The next step is to select the case or cases to be studied. The case should be relevant to the research questions and should provide rich and diverse data that can be used to answer the research questions.
  • Collect data: Data can be collected using various methods, such as interviews, observations, documents, surveys, and artifacts. The data collection method should be selected based on the research questions and the nature of the case study phenomenon.
  • Analyze the data: The data collected from the case study should be analyzed using various techniques, such as content analysis, thematic analysis, or grounded theory. The analysis should be guided by the research questions and should aim to provide insights and conclusions relevant to the research questions.
  • Draw conclusions: The conclusions drawn from the case study should be based on the data analysis and should be relevant to the research questions. The conclusions should be supported by evidence and should be clearly stated.
  • Validate the findings: The findings of the case study should be validated by reviewing the data and the analysis with participants or other experts in the field. This helps to ensure the validity and reliability of the findings.
  • Write the report: The final step is to write the report of the case study research. The report should provide a clear description of the case study phenomenon, the research questions, the data collection methods, the data analysis, the findings, and the conclusions. The report should be written in a clear and concise manner and should follow the guidelines for academic writing.

Examples of Case Study

Here are some examples of case study research:

  • The Hawthorne Studies: Conducted between 1924 and 1932, the Hawthorne Studies were a series of case studies conducted by Elton Mayo and his colleagues to examine the impact of the work environment on employee productivity. The studies were conducted at the Hawthorne Works plant of the Western Electric Company in Cicero, Illinois, near Chicago, and included interviews, observations, and experiments.
  • The Stanford Prison Experiment: Conducted in 1971, the Stanford Prison Experiment was a case study conducted by Philip Zimbardo to examine the psychological effects of power and authority. The study involved simulating a prison environment and assigning participants to the role of guards or prisoners. The study was controversial due to the ethical issues it raised.
  • The Challenger Disaster: The Challenger Disaster was a case study conducted to examine the causes of the Space Shuttle Challenger explosion in 1986. The study included interviews, observations, and analysis of data to identify the technical, organizational, and cultural factors that contributed to the disaster.
  • The Enron Scandal: The Enron Scandal was a case study conducted to examine the causes of the Enron Corporation’s bankruptcy in 2001. The study included interviews, analysis of financial data, and review of documents to identify the accounting practices, corporate culture, and ethical issues that led to the company’s downfall.
  • The Fukushima Nuclear Disaster: The Fukushima Nuclear Disaster was a case study conducted to examine the causes of the nuclear accident that occurred at the Fukushima Daiichi Nuclear Power Plant in Japan in 2011. The study included interviews, analysis of data, and review of documents to identify the technical, organizational, and cultural factors that contributed to the disaster.

Application of Case Study

Case studies have a wide range of applications across various fields and industries. Here are some examples:

Business and Management

Case studies are widely used in business and management to examine real-life situations and develop problem-solving skills. Case studies can help students and professionals to develop a deep understanding of business concepts, theories, and best practices.

Healthcare

Case studies are used in healthcare to examine patient care, treatment options, and outcomes. Case studies can help healthcare professionals to develop critical thinking skills, diagnose complex medical conditions, and develop effective treatment plans.

Education

Case studies are used in education to examine teaching and learning practices. Case studies can help educators to develop effective teaching strategies, evaluate student progress, and identify areas for improvement.

Social Sciences

Case studies are widely used in social sciences to examine human behavior, social phenomena, and cultural practices. Case studies can help researchers to develop theories, test hypotheses, and gain insights into complex social issues.

Law and Ethics

Case studies are used in law and ethics to examine legal and ethical dilemmas. Case studies can help lawyers, policymakers, and ethical professionals to develop critical thinking skills, analyze complex cases, and make informed decisions.

Purpose of Case Study

The purpose of a case study is to provide a detailed analysis of a specific phenomenon, issue, or problem in its real-life context. A case study is a qualitative research method that involves the in-depth exploration and analysis of a particular case, which can be an individual, group, organization, event, or community.

The primary purpose of a case study is to generate a comprehensive and nuanced understanding of the case, including its history, context, and dynamics. Case studies can help researchers to identify and examine the underlying factors, processes, and mechanisms that contribute to the case and its outcomes. This can help to develop a more accurate and detailed understanding of the case, which can inform future research, practice, or policy.

Case studies can also serve other purposes, including:

  • Illustrating a theory or concept: Case studies can be used to illustrate and explain theoretical concepts and frameworks, providing concrete examples of how they can be applied in real-life situations.
  • Developing hypotheses: Case studies can help to generate hypotheses about the causal relationships between different factors and outcomes, which can be tested through further research.
  • Providing insight into complex issues: Case studies can provide insights into complex and multifaceted issues, which may be difficult to understand through other research methods.
  • Informing practice or policy: Case studies can be used to inform practice or policy by identifying best practices, lessons learned, or areas for improvement.

Advantages of Case Study Research

There are several advantages of case study research, including:

  • In-depth exploration: Case study research allows for a detailed exploration and analysis of a specific phenomenon, issue, or problem in its real-life context. This can provide a comprehensive understanding of the case and its dynamics, which may not be possible through other research methods.
  • Rich data: Case study research can generate rich and detailed data, including qualitative data such as interviews, observations, and documents. This can provide a nuanced understanding of the case and its complexity.
  • Holistic perspective: Case study research allows for a holistic perspective of the case, taking into account the various factors, processes, and mechanisms that contribute to the case and its outcomes. This can help to develop a more accurate and comprehensive understanding of the case.
  • Theory development: Case study research can help to develop and refine theories and concepts by providing empirical evidence and concrete examples of how they can be applied in real-life situations.
  • Practical application: Case study research can inform practice or policy by identifying best practices, lessons learned, or areas for improvement.
  • Contextualization: Case study research takes into account the specific context in which the case is situated, which can help to understand how the case is influenced by the social, cultural, and historical factors of its environment.

Limitations of Case Study Research

There are several limitations of case study research, including:

  • Limited generalizability: Case studies are typically focused on a single case or a small number of cases, which limits the generalizability of the findings. The unique characteristics of the case may not be applicable to other contexts or populations, which may limit the external validity of the research.
  • Biased sampling: Case studies may rely on purposive or convenience sampling, which can introduce bias into the sample selection process. This may limit the representativeness of the sample and the generalizability of the findings.
  • Subjectivity: Case studies rely on the interpretation of the researcher, which can introduce subjectivity into the analysis. The researcher’s own biases, assumptions, and perspectives may influence the findings, which may limit the objectivity of the research.
  • Limited control: Case studies are typically conducted in naturalistic settings, which limits the control that the researcher has over the environment and the variables being studied. This may limit the ability to establish causal relationships between variables.
  • Time-consuming: Case studies can be time-consuming to conduct, as they typically involve a detailed exploration and analysis of a specific case. This may limit the feasibility of conducting multiple case studies or conducting case studies in a timely manner.
  • Resource-intensive: Case studies may require significant resources, including time, funding, and expertise. This may limit the ability of researchers to conduct case studies in resource-constrained settings.

About the author


Muhammad Hassan

Researcher, Academic Writer, Web developer


Qualitative case study data analysis: an example from practice


  • 1 School of Nursing and Midwifery, National University of Ireland, Galway, Republic of Ireland.
  • PMID: 25976531
  • DOI: 10.7748/nr.22.5.8.e1307

Aim: To illustrate an approach to data analysis in qualitative case study methodology.

Background: There is often little detail in case study research about how data were analysed. However, it is important that comprehensive analysis procedures are used because there are often large sets of data from multiple sources of evidence. Furthermore, the ability to describe in detail how the analysis was conducted ensures rigour in reporting qualitative research.

Data sources: The research example used is a multiple case study that explored the role of the clinical skills laboratory in preparing students for the real world of practice. Data analysis was conducted using a framework guided by the four stages of analysis outlined by Morse (1994): comprehending, synthesising, theorising and recontextualising. The specific strategies for analysis in these stages centred on the work of Miles and Huberman (1994), which has been successfully used in case study research. The data were managed using NVivo software.

Review methods: Literature examining qualitative data analysis was reviewed and strategies illustrated by the case study example provided.

Discussion: Each stage of the analysis framework is described with illustration from the research example for the purpose of highlighting the benefits of a systematic approach to handling large data sets from multiple sources.

Conclusion: By providing an example of how each stage of the analysis was conducted, it is hoped that researchers will be able to consider the benefits of such an approach to their own case study analysis.

Implications for research/practice: This paper illustrates specific strategies that can be employed when conducting data analysis in case study research and other qualitative research designs.

Keywords: Case study data analysis; case study research methodology; clinical skills research; qualitative case study methodology; qualitative data analysis; qualitative research.


Case Study Research in Software Engineering: Guidelines and Examples by Per Runeson, Martin Höst, Austen Rainer, Björn Regnell



5.1 Introduction

Once data has been collected, the focus shifts to data analysis. In this phase, the data is used to understand what actually happened in the studied case: the researcher works through the details of the case and seeks patterns in the data. Some analysis inevitably takes place already during the data collection phase, for example when data from an interview is transcribed. The understandings gained in the earlier phases are of course also valid and important, but this chapter focuses on the separate phase that starts after the data has been collected.

Data analysis is conducted differently for quantitative and qualitative data. Sections 5.2 – 5.5 describe how to analyze qualitative data and how to assess the validity of this type of analysis. In Section 5.6 , a short introduction to quantitative analysis methods is given. Since quantitative analysis is covered extensively in textbooks on statistical analysis, and case study research to a large extent relies on qualitative data, this section is kept short.


5.2.1 Introduction

As case study research is a flexible research method, qualitative data analysis methods are commonly used [176]. The basic objective of the analysis is, as in any other analysis, to derive conclusions from the data, keeping a clear chain of evidence. The chain of evidence means that a reader ...


10 Real World Data Science Case Studies Projects with Example

Top 10 Data Science Case Studies Projects with Examples and Solutions in Python to inspire your data science learning in 2023.


Data science has been a trending buzzword in recent times. With wide applications in sectors like healthcare, education, retail, transportation, media, and banking, data science is at the core of pretty much every industry out there. The possibilities are endless: fraud analysis in the finance sector or personalized recommendations in eCommerce. We have developed ten exciting data science case studies to explain how data science is leveraged across various industries to make smarter decisions and develop innovative personalized products tailored to specific customers.


Walmart Sales Forecasting Data Science Project



So, without much ado, let's get started with data science business case studies !

With humble beginnings as a simple discount retailer, today Walmart operates 10,500 stores and clubs in 24 countries along with eCommerce websites, employing around 2.2 million people around the globe. For the fiscal year ended January 31, 2021, Walmart's total revenue was $559 billion, a growth of $35 billion driven by the expansion of the eCommerce sector. Walmart is a data-driven company that works on the principle of 'Everyday low cost' for its consumers. To achieve this goal, it depends heavily on the advances of its data science and analytics department for research and development, also known as Walmart Labs. Walmart is home to the world's largest private cloud, which can manage 2.5 petabytes of data every hour! To analyze this humongous amount of data, Walmart has created 'Data Café,' a state-of-the-art analytics hub located within its Bentonville, Arkansas headquarters. The Walmart Labs team heavily invests in building and managing technologies like cloud, data, DevOps, infrastructure, and security.


Walmart is experiencing massive digital growth as the world's largest retailer. Walmart has been leveraging Big data and advances in data science to build solutions to enhance, optimize and customize the shopping experience and serve their customers in a better way. At Walmart Labs, data scientists are focused on creating data-driven solutions that power the efficiency and effectiveness of complex supply chain management processes. Here are some of the applications of data science at Walmart:

i) Personalized Customer Shopping Experience

Walmart analyzes customer preferences and shopping patterns to optimize the stocking and displaying of merchandise in its stores. Big data analysis also helps the company understand new item sales, decide which products to discontinue, and evaluate the performance of brands.

ii) Order Sourcing and On-Time Delivery Promise

Millions of customers view items on Walmart.com, and Walmart provides each customer a real-time estimated delivery date for the items purchased. Walmart runs a backend algorithm that estimates this based on the distance between the customer and the fulfillment center, inventory levels, and shipping methods available. The supply chain management system determines the optimum fulfillment center based on distance and inventory levels for every order. It also has to decide on the shipping method to minimize transportation costs while meeting the promised delivery date.
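The sourcing logic described above can be sketched in a few lines. This is a hypothetical simplification (the function name, the data, and the single distance criterion are all invented for illustration), not Walmart's actual system:

```python
# Hypothetical sketch of order sourcing: among fulfillment centers that
# have enough inventory, pick the one closest to the customer.
# All names and numbers are invented for illustration.

def pick_fulfillment_center(order_qty, centers):
    """centers: list of dicts with 'name', 'distance_km', 'inventory'."""
    eligible = [c for c in centers if c["inventory"] >= order_qty]
    if not eligible:
        return None  # the order would need to be split or back-ordered
    return min(eligible, key=lambda c: c["distance_km"])

centers = [
    {"name": "FC-A", "distance_km": 120, "inventory": 3},
    {"name": "FC-B", "distance_km": 40, "inventory": 0},
    {"name": "FC-C", "distance_km": 300, "inventory": 10},
]
best = pick_fulfillment_center(order_qty=2, centers=centers)
print(best["name"])  # FC-B is closest but out of stock, so FC-A is chosen
```

A production system would additionally trade off shipping cost, split shipments, carrier options, and the promised delivery date rather than a single distance figure.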


iii) Packing Optimization 

Packing optimization, also known as box recommendation, is a daily occurrence in the shipping of items in retail and eCommerce. Whenever the items of an order, or of multiple orders placed by the same customer, are picked from the shelf and are ready for packing, Walmart's recommender system determines the best-sized box that holds all the ordered items with the least in-box space wasted, within a fixed amount of time. This is the Bin Packing Problem, a classic NP-hard problem familiar to data scientists.
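As a rough illustration of the underlying problem, here is the classic first-fit-decreasing heuristic for bin packing, assuming one-dimensional item "sizes"; a production box-recommendation system would work with three-dimensional volumes, box inventories, and time limits:

```python
# First-fit-decreasing heuristic for one-dimensional bin packing.
# Items are sorted largest-first; each item goes into the first bin
# with room, and a new bin is opened only when none fits.

def first_fit_decreasing(sizes, capacity):
    bins = []  # each bin is a list of item sizes
    for size in sorted(sizes, reverse=True):
        for b in bins:
            if sum(b) + size <= capacity:
                b.append(size)
                break
        else:
            bins.append([size])  # no existing bin fits: open a new one
    return bins

items = [4, 8, 1, 4, 2, 1]
packed = first_fit_decreasing(items, capacity=10)
print(len(packed))  # 2 bins: [8, 2] and [4, 4, 1, 1]
```

First-fit-decreasing is a fast approximation: it is not guaranteed optimal, but it never uses more than about 11/9 of the optimal number of bins.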

Here is a link to a sales prediction data science case study to help you understand the applications of data science in the real world. The Walmart Sales Forecasting Project uses historical sales data for 45 Walmart stores located in different regions; each store contains many departments, and the goal is to build a predictive model that projects the sales of each department in each store. You can also try the hands-on Inventory Demand Forecasting Data Science Project to develop a machine learning model that forecasts inventory demand based on historical sales data.
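As a minimal baseline for this kind of forecasting task, one can predict next week's sales for a department as the mean of its last few observed weeks. The sales figures below are invented, and a serious solution would also model seasonality, holidays, and store features:

```python
# Toy sales-forecasting baseline: forecast next week's department sales
# as the mean of the last n observed weeks. The weekly totals are
# invented for illustration.

def moving_average_forecast(weekly_sales, n=4):
    window = weekly_sales[-n:]  # most recent n weeks
    return sum(window) / len(window)

dept_sales = [24000, 25500, 23800, 26100, 25000, 26500]
print(moving_average_forecast(dept_sales, n=4))  # 25350.0
```

Simple baselines like this are worth computing first: a complex model that cannot beat a moving average is not adding value.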


Amazon is an American multinational technology company headquartered in Seattle, USA. It started as an online bookseller, but today it focuses on eCommerce, cloud computing, digital streaming, and artificial intelligence. It hosts an estimated 1,000,000,000 gigabytes of data across more than 1,400,000 servers. Through its constant innovation in data science and big data, Amazon stays ahead in understanding its customers. Here are a few data analytics case study examples from Amazon:

i) Recommendation Systems

Data science models help Amazon understand customers' needs and recommend products before the customer even searches for them; this model uses collaborative filtering. Amazon uses data from 152 million customer purchases to help users decide which products to buy. The company generates 35% of its annual sales using the recommendation-based systems (RBS) method.

Here is a Recommender System Project to help you build a recommendation system using collaborative filtering. 
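To make the idea of collaborative filtering concrete, here is a toy item-based variant: each item is represented as the vector of which users bought it, and the item most similar (by cosine similarity) to something a customer already owns is recommended. The data and function names are invented for illustration and have nothing to do with Amazon's production system:

```python
# Toy item-based collaborative filtering: represent each item as the
# vector of users who bought it, then recommend the item most similar
# (by cosine similarity) to one the customer already owns.
import math

# purchase matrix: rows are users, columns are items 0, 1, 2
purchases = {
    "u1": [1, 1, 0],
    "u2": [1, 1, 1],
    "u3": [0, 1, 1],
    "u4": [1, 0, 0],
}

def item_vector(item_idx):
    return [purchases[u][item_idx] for u in sorted(purchases)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

# the customer owns item 0: which other item is most similar to it?
sims = {i: cosine(item_vector(0), item_vector(i)) for i in (1, 2)}
recommended = max(sims, key=sims.get)
print(recommended)  # item 1 (similarity 2/3) beats item 2 (about 0.41)
```

At Amazon's scale the same idea requires sparse matrices and approximate nearest-neighbor search, but the similarity computation is conceptually identical.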

ii) Retail Price Optimization

Amazon product prices are optimized based on a predictive model that determines the best price so that users are not put off buying by the price. The model weighs the customer's likelihood of purchasing the product at a given price and how that price will affect the customer's future buying patterns. The price of a product is determined according to your activity on the website, competitors' pricing, product availability, item preferences, order history, expected profit margin, and other factors.

Check Out this Retail Price Optimization Project to build a Dynamic Pricing Model.
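The core trade-off in such price optimization can be sketched as choosing, among candidate prices, the one that maximizes expected revenue (price times estimated purchase probability). The probabilities here are invented; a real model would estimate them from demand data:

```python
# Illustrative price optimization: among candidate prices, pick the one
# maximizing expected revenue = price * estimated purchase probability.
# The probabilities below are invented for illustration.

def best_price(candidates):
    """candidates: list of (price, purchase_probability) pairs."""
    return max(candidates, key=lambda pc: pc[0] * pc[1])

candidates = [(9.99, 0.30), (12.49, 0.22), (14.99, 0.15)]
price, prob = best_price(candidates)
print(price)  # 9.99: expected revenue ~3.00 beats ~2.75 and ~2.25
```

Real dynamic pricing also accounts for margin, inventory, competitor prices, and long-run customer behavior, not just single-transaction revenue.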

iii) Fraud Detection

Being a significant eCommerce business, Amazon remains at high risk of retail fraud. As a preemptive measure, the company collects historical and real-time data for every order and uses machine learning algorithms to find transactions with a higher probability of being fraudulent. This proactive measure has also helped the company restrict clients with an excessive number of product returns.

You can look at this Credit Card Fraud Detection Project to implement a fraud detection model to classify fraudulent credit card transactions.


Let us explore data analytics case study examples in the entertainment industry.


Netflix started as a DVD rental service in 1997 and has since expanded into the streaming business. Headquartered in Los Gatos, California, Netflix is the largest content streaming company in the world. Netflix currently has over 208 million paid subscribers worldwide, and with streaming now supported on thousands of smart devices, around 3 billion hours of content are watched on Netflix every month. The secret to this massive growth and popularity is Netflix's advanced use of data analytics and recommendation systems to provide personalized, relevant content recommendations to its users. Netflix collects data from over 100 billion events every day. Here are a few examples of data analysis case studies applied at Netflix:

i) Personalized Recommendation System

Netflix uses over 1,300 recommendation clusters based on consumer viewing preferences to provide a personalized experience. The data Netflix collects from its users includes viewing time, platform searches for keywords, and metadata related to content abandonment, such as pause time, rewinds, and rewatches. Using this data, Netflix can predict what a viewer is likely to watch and give each user a personalized watchlist. Some of the algorithms used by the Netflix recommendation system are the Personalized Video Ranker, the Trending Now ranker, and the Continue Watching ranker.

ii) Content Development using Data Analytics

Netflix uses data science to analyze the behavior and patterns of its users to recognize the themes and categories that audiences prefer to watch. This data was used to produce shows like The Umbrella Academy, Orange Is the New Black, and The Queen's Gambit. These shows may seem like a huge risk, but they were greenlit largely on the strength of data analytics, which assured Netflix that they would succeed with its audience. Data analytics is helping Netflix come up with content that its viewers want to watch even before they know they want to watch it.

iii) Marketing Analytics for Campaigns

Netflix uses data analytics to find the right time to launch shows and ad campaigns to have maximum impact on the target audience. Marketing analytics helps come up with different trailers and thumbnails for different groups of viewers. For example, the House of Cards Season 5 trailer with a giant American flag was launched during the American presidential elections, as it would resonate well with the audience.

Here is a Customer Segmentation Project using association rule mining to understand the primary grouping of customers based on various parameters.


In a world where purchasing music is a thing of the past and streaming is the current trend, Spotify has emerged as one of the most popular streaming platforms. With 320 million monthly users, around 4 billion playlists, and approximately 2 million podcasts, Spotify leads the pack among well-known streaming platforms like Apple Music, Wynk, Songza, and Amazon Music. Spotify's success depends heavily on data analytics: by analyzing massive volumes of listener data, it provides real-time, personalized services to its listeners. Most of Spotify's revenue comes from paid premium subscriptions. Here are some examples of data analytics case studies Spotify uses to provide enhanced services to its listeners:

i) Personalization of Content using Recommendation Systems

Spotify uses BaRT, or Bayesian Additive Regression Trees, to generate music recommendations for its listeners in real time. BaRT ignores any song a user listens to for less than 30 seconds, and the model is retrained every day to provide updated recommendations. A new patent granted to Spotify covers an AI application that identifies a user's musical tastes from audio signals and attributes such as gender, age, and accent, in order to make better music recommendations.

Based on listeners' taste profiles, Spotify creates daily playlists called 'Daily Mixes,' which contain songs the user has added to their playlists or songs by artists the user has included in their playlists. They also include new artists and songs the user might be unfamiliar with but that might fit the playlist. Similar are the weekly 'Release Radar' playlists, which contain newly released songs from artists the listener follows or has liked before.
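The "ignore plays under 30 seconds" rule mentioned above is easy to sketch. The event data here is invented for illustration; the real BaRT pipeline is far more involved, but some filter of this shape sits in front of it.

```python
# Keep only listening events counted as real listens: 30 seconds or longer.

SKIP_THRESHOLD_SECONDS = 30

def valid_plays(listening_events):
    return [e for e in listening_events if e["seconds"] >= SKIP_THRESHOLD_SECONDS]

events = [
    {"track": "song_a", "seconds": 210},
    {"track": "song_b", "seconds": 12},   # skipped quickly, so ignored
    {"track": "song_c", "seconds": 30},
]
kept = valid_plays(events)
print([e["track"] for e in kept])  # ['song_a', 'song_c']
```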

ii) Targeted Marketing through Customer Segmentation

Beyond enhancing personalized song recommendations, Spotify uses this massive dataset for targeted ad campaigns and personalized service recommendations. Spotify uses ML models to analyze listener behavior and group listeners based on music preferences, age, gender, ethnicity, and so on. These insights help create ad campaigns for specific target audiences. One of its well-known campaigns was the meme-inspired ads aimed at potential target customers, which were a huge success globally.

iii) CNNs for Classification of Songs and Audio Tracks

Spotify builds audio models to evaluate songs and tracks, which helps develop better playlists and recommendations for its users. These allow Spotify to filter new tracks based on their lyrics and rhythms and recommend them to users who like similar tracks (collaborative filtering). Spotify also uses NLP (natural language processing) to scan articles and blogs and analyze the words used to describe songs and artists. These analytical insights help group and identify similar artists and songs and can be leveraged to build playlists.
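A simplified stand-in for the audio-model similarity described above: compare tracks by cosine similarity over hypothetical audio feature vectors. The feature names and values are made up for illustration; real systems use learned embeddings with many more dimensions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

tracks = {
    "track_1": [0.9, 0.1, 0.4],   # e.g. [energy, acousticness, danceability]
    "track_2": [0.8, 0.2, 0.5],
    "track_3": [0.1, 0.9, 0.2],
}

def most_similar(seed, tracks):
    others = {t: v for t, v in tracks.items() if t != seed}
    return max(others, key=lambda t: cosine(tracks[seed], others[t]))

print(most_similar("track_1", tracks))  # 'track_2'
```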

Here is a Music Recommender System Project for you to start learning. We have also listed another music recommendation dataset for your projects: Dataset1. You can use this dataset of Spotify metadata to classify songs by artist, mood, and liveliness. Plot histograms and heatmaps to get a better understanding of the dataset, and use classification algorithms like logistic regression and SVM, along with principal component analysis, to generate valuable insights.


Below you will find case studies for data analytics in the travel and tourism industry.

Airbnb was born in 2007 in San Francisco and has since grown to 4 million hosts and 5.6 million listings worldwide, welcoming more than 1 billion guest arrivals in almost every country across the globe. Airbnb is active in every country on the planet except Iran, Sudan, Syria, and North Korea, around 97.95% of the world's countries. Treating data as the voice of its customers, Airbnb uses its large volume of customer reviews and host inputs to understand trends across communities, rate user experiences, and make informed decisions that build a better business model. The data scientists at Airbnb develop solutions to boost the business and find the best match between its customers and hosts. Airbnb's data servers serve approximately 10 million requests and process around one million search queries every day, enabling personalized services that create a perfect match between guests and hosts for a supreme customer experience.

i) Recommendation Systems and Search Ranking Algorithms

Airbnb helps people find 'local experiences' with the help of search algorithms that make searches and listings precise. Airbnb uses a 'listing quality score' to rank homes based on proximity to the searched location and previous guest reviews. It uses deep neural networks to build models that take a guest's earlier stays and area information into account to find a perfect match. The search algorithms are optimized based on guest and host preferences, rankings, pricing, and availability to understand users' needs and provide the best match possible.
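A hypothetical 'listing quality score' combining proximity and past reviews might look like the sketch below. The weights and the functional form are invented for illustration; Airbnb's production ranking is a learned deep model, not a hand-set formula.

```python
def listing_score(distance_km, avg_review, w_distance=0.6, w_review=0.4):
    """Closer and better-reviewed listings score higher.
    Distance becomes a 0-1 proximity term; reviews are on a 0-5 scale."""
    proximity = 1.0 / (1.0 + distance_km)
    return w_distance * proximity + w_review * (avg_review / 5.0)

listings = [
    {"id": "loft_a", "distance_km": 0.5, "avg_review": 4.8},
    {"id": "villa_b", "distance_km": 4.0, "avg_review": 4.9},
]
ranked = sorted(listings,
                key=lambda l: listing_score(l["distance_km"], l["avg_review"]),
                reverse=True)
print(ranked[0]["id"])  # 'loft_a' wins on proximity
```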

ii) Natural Language Processing for Review Analysis

Airbnb characterizes data as the voice of its customers. Customer and host reviews give direct insight into the experience, and star ratings alone cannot capture it quantitatively. Hence Airbnb uses natural language processing to understand reviews and the sentiments behind them; the NLP models are developed using convolutional neural networks.
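As a toy stand-in for those convolutional NLP models, a lexicon-based sentiment scorer shows why review text beats star ratings alone: it surfaces what guests actually liked or disliked. The word lists and reviews here are invented for illustration.

```python
POSITIVE = {"clean", "friendly", "great", "cozy", "helpful"}
NEGATIVE = {"dirty", "rude", "noisy", "broken", "late"}

def sentiment(review):
    """Classify a review by counting positive vs. negative lexicon hits."""
    words = review.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("Great host with clean and cozy room"))   # positive
print(sentiment("Noisy street and a broken heater"))      # negative
```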

Practice this Sentiment Analysis Project for analyzing product reviews to understand the basic concepts of natural language processing.

iii) Smart Pricing using Predictive Analytics

Many Airbnb hosts use the service as a supplementary income. Vacation homes and guest houses rented to customers raise local community earnings, as Airbnb guests stay 2.4 times longer and spend approximately 2.3 times more money than hotel guests, a significant positive impact on the local neighborhood. Airbnb uses predictive analytics to predict listing prices and help hosts set a competitive, optimal price. A host's overall profitability depends on factors like the time invested and responsiveness to changing demand across seasons. The factors that drive real-time smart pricing are the listing's location, proximity to transport options, the season, and the amenities available in the neighborhood.
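The pricing factors listed above can be sketched as a tiny feature-based estimate. The base price and weights are made up for illustration; a real smart-pricing model would learn them from historical booking data with regression or gradient-boosted trees.

```python
BASE_PRICE = 60.0  # hypothetical nightly base rate

def suggested_price(features):
    price = BASE_PRICE
    price += 25.0 if features["near_transport"] else 0.0   # transport access
    price += 15.0 * features["amenity_count"] / 10.0       # nearby amenities
    price *= 1.3 if features["peak_season"] else 1.0       # seasonal uplift
    return round(price, 2)

print(suggested_price({"near_transport": True,
                       "amenity_count": 8,
                       "peak_season": True}))
```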

Here is a Price Prediction Project to help you understand the concept of predictive analysis, which is widely used in data analytics case studies.

Uber is the biggest global taxi service provider. As of December 2018, Uber had 91 million monthly active consumers and 3.8 million drivers, completing 14 million trips each day. Uber uses data analytics and big-data-driven technologies to optimize its business processes and provide enhanced customer service. The data science team at Uber constantly explores new technologies to provide better service. Machine learning and data analytics help Uber make data-driven decisions that enable ride-sharing, dynamic price surges, better customer support, and demand forecasting. Here are some real-world data science projects used by Uber:

i) Dynamic Pricing for Price Surges and Demand Forecasting

Uber's prices change at peak hours based on demand. Uber uses surge pricing to encourage more cab drivers to sign up with the company and meet passenger demand; when prices increase, both the driver and the passenger are informed about the surge. Uber uses a patented predictive model for price surging called 'Geosurge,' based on the demand for the ride and the location.
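The core idea of demand-based surging can be sketched as a simple demand/supply multiplier. This is illustrative only; the patented Geosurge model is far more sophisticated and location-aware, and the cap value here is invented.

```python
def surge_multiplier(ride_requests, available_drivers, cap=3.0):
    """Scale prices with the demand/supply ratio, capped to avoid extremes."""
    if available_drivers == 0:
        return cap
    ratio = ride_requests / available_drivers
    return round(min(max(1.0, ratio), cap), 2)

print(surge_multiplier(40, 50))   # 1.0  (supply meets demand, no surge)
print(surge_multiplier(120, 60))  # 2.0  (peak hour, prices double)
```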

ii) One-Click Chat

Uber has developed a machine learning and natural language processing solution called One-Click Chat (OCC) for coordination between drivers and riders. The feature anticipates responses to commonly asked questions, making it easy for drivers to respond to customer messages with the click of just one button. One-Click Chat is built on Uber's machine learning platform, Michelangelo, to perform NLP on rider chat messages and generate appropriate responses.
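A toy version of the reply-suggestion step: match an incoming rider message to a canned response. This keyword lookup is a hedged stand-in for OCC's NLP models; the keywords and replies are invented for illustration.

```python
CANNED_REPLIES = {
    "where": "I'm on my way, arriving in a few minutes.",
    "cancel": "No problem, I'll cancel the trip.",
    "luggage": "Yes, there's room for your luggage.",
}

def suggest_reply(message):
    """Return the first canned reply whose keyword appears in the message."""
    text = message.lower()
    for keyword, reply in CANNED_REPLIES.items():
        if keyword in text:
            return reply
    return "Sorry, could you say that again?"

print(suggest_reply("Where are you right now?"))
```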

iii) Customer Retention

Failure to meet customer demand for cabs could lead users to opt for other services. Uber uses machine learning models to bridge this demand-supply gap: by predicting demand in any location, Uber retains its customers. Uber also uses a tier-based reward system that segments customers into levels based on usage; the higher the level, the better the perks. Uber also provides personalized destination suggestions based on a user's history and frequently traveled destinations.

You can take a look at this Python Chatbot Project and build a simple chatbot application to better understand the techniques used for natural language processing. You can also practice building a demand forecasting model with this project using time series analysis, or look at this project, which uses time series forecasting and clustering on a dataset containing geospatial data to forecast customer demand for Ola rides.


7) LinkedIn 

LinkedIn is the largest professional social networking site, with nearly 800 million members in more than 200 countries. Almost 40% of users access LinkedIn daily, clocking around 1 billion interactions per month. The data science team at LinkedIn works with this massive pool of data to generate insights that build strategies, applies algorithms and statistical inference to optimize engineering solutions, and helps the company achieve its goals. Here are some real-world data science projects at LinkedIn:

i) LinkedIn Recruiter: Search Algorithms and Recommendation Systems

LinkedIn Recruiter helps recruiters build and manage a talent pool to optimize the chances of hiring candidates successfully. This sophisticated product is built on search and recommendation engines: it handles complex queries and filters on a constantly growing dataset, and the results delivered must be relevant and specific. The initial search model was based on linear regression but was eventually upgraded to gradient-boosted decision trees to capture non-linear correlations in the data. In addition to these models, LinkedIn Recruiter uses a Generalized Linear Mixed model to improve prediction results and deliver personalized results.

ii) Recommendation Systems Personalized for News Feed

The LinkedIn news feed is the heart and soul of the professional community: a member's feed is a place to discover conversations among connections, career news, posts, suggestions, photos, and videos. Every time a member visits LinkedIn, machine learning algorithms identify the best content to display by sorting through posts and ranking the most relevant results on top. These algorithms help LinkedIn understand member preferences and provide personalized news feeds; they include logistic regression, gradient-boosted decision trees, and neural networks.
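Feed ranking of this kind can be sketched as engagement-probability scoring with a logistic link. The features and weights below are invented for illustration; LinkedIn's actual models learn them from member interaction data.

```python
import math

def engagement_probability(post):
    """Hand-set linear score over hypothetical features, squashed to (0, 1)."""
    z = (1.5 * post["from_connection"]
         + 0.8 * post["topic_match"]
         - 0.002 * post["age_minutes"])      # older posts decay
    return 1.0 / (1.0 + math.exp(-z))        # logistic link

posts = [
    {"id": "p1", "from_connection": 1, "topic_match": 0.9, "age_minutes": 30},
    {"id": "p2", "from_connection": 0, "topic_match": 0.2, "age_minutes": 5},
]
feed = sorted(posts, key=engagement_probability, reverse=True)
print(feed[0]["id"])  # 'p1'
```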

iii) CNNs to Detect Inappropriate Content

Providing a professional space where people can trust and express themselves safely has been a critical goal at LinkedIn. LinkedIn has invested heavily in building solutions to detect fake accounts and abusive behavior on its platform: any form of spam, harassment, or inappropriate content, ranging from profanity to advertisements for illegal services, is immediately flagged and taken down. LinkedIn uses a machine learning model based on convolutional neural networks. The classifier trains on a dataset of accounts labeled either "inappropriate" or "appropriate"; the inappropriate list consists of accounts containing "blocklisted" phrases or words, plus a small portion of manually reviewed accounts reported by the user community.
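The labeling step described above, generating "inappropriate" labels from blocklisted phrases, can be sketched as follows. The blocklist entries are invented for illustration, and this keyword filter is only the label source; the production classifier is a CNN trained on such labels.

```python
BLOCKLIST = {"buy followers", "escort service", "guaranteed profits"}

def label_account(profile_text):
    """Label an account from blocklisted phrases (training-label heuristic)."""
    text = profile_text.lower()
    if any(phrase in text for phrase in BLOCKLIST):
        return "inappropriate"
    return "appropriate"

print(label_account("Data engineer passionate about open source"))
print(label_account("DM me for guaranteed profits on crypto"))
```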

Here is a Text Classification Project to help you understand NLP basics for text classification. You can find a news recommendation system dataset to help you build a personalized news recommender system. You can also use this dataset to build a classifier using logistic regression, Naive Bayes, or Neural networks to classify toxic comments.


Pfizer is a multinational pharmaceutical company headquartered in New York, USA, and one of the largest pharmaceutical companies globally, known for developing a wide range of medicines and vaccines in disciplines like immunology, oncology, cardiology, and neurology. Pfizer became a household name in 2020 when it was the first company to receive FDA emergency use authorization for a COVID-19 vaccine, and in early November 2021 the CDC approved the Pfizer vaccine for kids aged 5 to 11. Pfizer has been using machine learning and artificial intelligence to develop drugs and streamline trials, which played a massive role in developing and deploying the COVID-19 vaccine. Here are a few data analytics case studies from Pfizer:

i) Identifying Patients for Clinical Trials

Artificial intelligence and machine learning are used to streamline and optimize clinical trials and increase their efficiency. Natural language processing and exploratory data analysis of patient records can help identify suitable patients for clinical trials, for instance those with distinct symptoms. They can also help examine interactions between potential trial members' specific biomarkers and predict drug interactions and side effects, helping avoid complications. Pfizer's AI implementation helped rapidly identify signals within the noise of millions of data points across its 44,000-candidate COVID-19 clinical trial.

ii) Supply Chain and Manufacturing

Data science and machine learning techniques help pharmaceutical companies better forecast demand for vaccines and drugs and distribute them efficiently. Machine learning models can help identify efficient supply systems by automating and optimizing production steps, and can eventually help supply drugs customized to small pools of patients with specific gene profiles. Pfizer also uses machine learning to predict the maintenance costs of equipment; predictive maintenance using AI is the next big step for pharmaceutical companies looking to reduce costs.
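As a minimal sketch of the demand-forecasting idea, here is a moving-average forecast over hypothetical weekly dose shipments. This is a deliberately simple stand-in; production forecasting models are far more elaborate.

```python
def forecast_next(demand_history, window=3):
    """Predict next period's demand as the mean of the last `window` periods."""
    recent = demand_history[-window:]
    return sum(recent) / len(recent)

weekly_doses = [1200, 1350, 1500, 1440, 1560]  # invented example data
print(forecast_next(weekly_doses))  # average of the last three weeks
```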

iii) Drug Development

Computer simulations of proteins, tests of their interactions, and yield analysis help researchers develop and test drugs more efficiently. In 2016, Watson Health and Pfizer announced a collaboration to use IBM Watson for Drug Discovery to help accelerate Pfizer's research in immuno-oncology, an approach to cancer treatment that uses the body's immune system to help fight cancer. Deep learning models have recently been used for bioactivity and synthesis prediction for drugs and vaccines, in addition to molecular design. Deep learning has been revolutionary for drug discovery because it factors in everything from new applications of medications to possible toxic reactions, which can save millions in drug trials.

You can create a machine learning model to predict molecular activity to help design medicine using this dataset. You may build a CNN or a deep neural network for this data analytics case study project.


9) Shell Data Analyst Case Study Project

Shell is a global group of energy and petrochemical companies with over 80,000 employees in around 70 countries. Shell uses advanced technologies and innovations to help build a sustainable energy future, and it is going through a significant transition, aiming to become a clean energy company by 2050 as the world needs more, and cleaner, energy solutions. This requires substantial changes in the way energy is used, and digital technologies, including AI and machine learning, play an essential role in the transformation: more efficient exploration and energy production, more reliable manufacturing, more nimble trading, and a personalized customer experience. Using AI across the organization will help achieve this goal and stay competitive in the market. Here are a few data analytics case studies in the petrochemical industry:

i) Precision Drilling

Shell is involved in the full oil and gas supply chain, from mining hydrocarbons to refining the fuel to retailing it to customers. Recently, Shell has applied reinforcement learning to control the drilling equipment used in mining. Reinforcement learning works on a reward system based on the outcomes of the AI model's actions. The algorithm is designed to guide the drills as they move through the subsurface, based on historical data from drilling records, including the size of drill bits, temperatures, pressures, and knowledge of seismic activity. This model helps the human operator understand the environment better, leading to better and faster results with less damage to the machinery used.
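The reward-based update at the heart of reinforcement learning can be shown in one function: tabular Q-learning. This is purely illustrative (states, actions, and rewards are invented); Shell's drilling controller is a proprietary system, not this toy.

```python
def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Move Q(s, a) toward reward + discounted best future value."""
    best_next = max(q.get((next_state, a), 0.0) for a in ("left", "right"))
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)

q_table = {}
# Drill steered 'left' from segment s0, hit soft rock (+1), arrived at s1.
q_update(q_table, "s0", "left", reward=1.0, next_state="s1")
print(q_table[("s0", "left")])  # 0.5
```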

ii) Efficient Charging Terminals

Due to climate change, governments have encouraged people to switch to electric vehicles to reduce carbon dioxide emissions, but the lack of public charging terminals has deterred many from switching to electric cars. Shell uses AI to monitor and predict the demand for charging terminals to provide an efficient supply. Multiple vehicles charging from a single terminal can create a considerable grid load, and demand predictions help make this process more efficient.

iii) Monitoring Service and Charging Stations

Another Shell initiative, trialed in Thailand and Singapore, uses computer vision cameras to watch for potentially hazardous activities, like lighting cigarettes in the vicinity of the pumps while refueling. The model processes the content of the captured images and labels and classifies it, so the algorithm can alert the staff and reduce the risk of fires. The model could be further trained to detect rash driving or theft in the future.

Here is a project to help you understand multiclass image classification. You can use the Hourly Energy Consumption Dataset to build an energy consumption prediction model. You can use time series with XGBoost to develop your model.

10) Zomato Case Study on Data Analytics

Zomato was founded in 2010 and is currently one of the most well-known food tech companies. Zomato offers services like restaurant discovery, home delivery, online table reservation, and online payments for dining. Zomato partners with restaurants to provide tools to acquire more customers while also providing delivery services and easy procurement of ingredients and kitchen supplies. Currently, Zomato has over 200,000 restaurant partners and around 100,000 delivery partners, and it has completed over 100 million delivery orders to date. Zomato uses ML and AI to boost its business growth, drawing on the massive amount of data collected over the years from food orders and user consumption patterns. Here are a few examples of data analytics case study projects developed by the data scientists at Zomato:

i) Personalized Recommendation System for Homepage

Zomato uses data analytics to create personalized homepages for its users, providing order personalization such as recommendations for specific cuisines, locations, prices, and brands. Restaurant recommendations are made based on a customer's past purchases, browsing history, and what other similar customers in the vicinity are ordering. This personalized recommendation system has led to a 15% improvement in order conversions and click-through rates for Zomato.

You can use the Restaurant Recommendation Dataset to build a restaurant recommendation system to predict what restaurants customers are most likely to order from, given the customer location, restaurant information, and customer order history.

ii) Analyzing Customer Sentiment

Zomato uses natural language processing and machine learning to understand customer sentiment from social media posts and customer reviews, which helps the company gauge how its customer base feels about the brand. Deep learning models analyze the sentiment of brand mentions on social networking sites like Twitter, Instagram, LinkedIn, and Facebook. These analytics give the company insights that help build the brand and understand the target audience.

iii) Predicting Food Preparation Time (FPT)

Food preparation time is an essential variable in the estimated delivery time of an order placed on Zomato. It depends on numerous factors, like the number of dishes ordered, the time of day, footfall in the restaurant, and the day of the week. Accurately predicting food preparation time enables a better estimated delivery time, making delivery partners less likely to breach it. Zomato uses a bidirectional-LSTM-based deep learning model that considers all these features and predicts the food preparation time for each order in real time.
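A back-of-envelope estimate built from the factors listed above looks like this. The minutes and weights are hand-tuned for illustration only; Zomato's bidirectional LSTM learns these effects from historical order data instead.

```python
def estimate_fpt_minutes(num_dishes, restaurant_load, is_weekend):
    """Crude food-preparation-time estimate from a few order features."""
    base = 10.0                            # kitchen overhead per order
    base += 4.0 * num_dishes               # each dish adds prep time
    base *= 1.0 + 0.5 * restaurant_load    # load in [0, 1]: footfall slowdown
    base += 5.0 if is_weekend else 0.0     # weekend rush
    return round(base, 1)

print(estimate_fpt_minutes(num_dishes=3, restaurant_load=0.8, is_weekend=True))
```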

Data scientists are companies' secret weapon when it comes to analyzing customer sentiment and behavior and leveraging them to drive conversion, loyalty, and profits. These 10 data science case study projects, with examples and solutions, show how various organizations use data science technologies to succeed and stay at the top of their field. To summarize, data science has not only accelerated companies' performance but has also made it possible to manage and sustain that performance with ease.

FAQs on Data Analysis Case Studies

A case study in data science is an in-depth analysis of a real-world problem using data-driven approaches. It involves collecting, cleaning, and analyzing data to extract insights and solve challenges, offering practical insights into how data science techniques can address complex issues across various industries.

To create a data science case study, identify a relevant problem, define objectives, and gather suitable data. Clean and preprocess data, perform exploratory data analysis, and apply appropriate algorithms for analysis. Summarize findings, visualize results, and provide actionable recommendations, showcasing the problem-solving potential of data science techniques.


About the Author


ProjectPro is the only online platform designed to help professionals gain practical, hands-on experience in big data, data engineering, data science, and machine learning technologies, with over 270 reusable project templates in data science and big data, each with step-by-step walkthroughs.



© 2024 Iconiq Inc.


The Convergence Blog

The Convergence is an online community space dedicated to empowering operators in the data industry by providing news and education about evergreen strategies, late-breaking data & AI developments, and free or low-cost upskilling resources that you need to thrive as a leader in the data & AI space.

Data Analysis Case Study: Learn from Humana's Automated Data Analysis Project


Lillian Pierson, P.E.


Got data? Great! Looking for that perfect data analysis case study to help you get started using it? You’re in the right place.

If you’ve ever struggled to decide what to do next with your data projects, to actually find meaning in the data, or even to decide what kind of data to collect, then KEEP READING…

Deep down, you know what needs to happen. You need to initiate and execute a data strategy that really moves the needle for your organization. One that produces seriously awesome business results.

But how you’re in the right place to find out..

As a data strategist who has worked with 10 percent of Fortune 100 companies, today I’m sharing with you a case study that demonstrates just how real businesses are making real wins with data analysis. 

In the post below, we’ll look at:

  • A shining data success story;
  • What went on ‘under-the-hood’ to support that successful data project; and
  • The exact data technologies used by the vendor to take this project from pure strategy to pure success

If you prefer to watch this information rather than read it, it’s captured in the video below:

Here’s the url too: https://youtu.be/xMwZObIqvLQ

3 Action Items You Need To Take

To actually use the data analysis case study you’re about to get – you need to take 3 main steps. Those are:

  • Reflect upon your organization as it is today (I left you some prompts below – to help you get started)
  • Review winning data case collections (starting with the one I'm sharing here) and identify 5 that seem the most promising for your organization given its current set-up
  • Assess your organization AND those 5 winning case collections. Based on that assessment, select the "QUICK WIN" data use case that offers your organization the most bang for its buck

Step 1: Reflect Upon Your Organization

Whenever you evaluate data case collections to decide if they’re a good fit for your organization, the first thing you need to do is organize your thoughts with respect to your organization as it is today.

Before moving into the data analysis case study, STOP and ANSWER THE FOLLOWING QUESTIONS – just to remind yourself:

  • What is the business vision for our organization?
  • What industries do we primarily support?
  • What data technologies do we already have up and running, that we could use to generate even more value?
  • What team members do we have to support a new data project? And what are their data skillsets like?
  • What type of data are we mostly looking to generate value from? Structured? Semi-Structured? Un-structured? Real-time data? Huge data sets? What are our data resources like?

Jot down some notes while you're here, and keep them in mind as you read on to find out how one company, Humana, used its data to achieve a 28 percent increase in customer satisfaction, along with a 63 percent increase in employee engagement. (That's a seriously impressive outcome, right?!)

Step 2: Review Data Case Studies

Here we are, already at step 2. It's time to start reviewing data analysis case studies (starting with the one I'm sharing below). Identify 5 that seem the most promising for your organization given its current set-up.

Humana’s Automated Data Analysis Case Study

The key thing to note here is that the approach to creating a successful data program varies from industry to industry.

Let’s start with one to demonstrate the kind of value you can glean from these kinds of success stories.

Humana has provided health insurance to Americans for over 50 years. It is a service company focused on fulfilling the needs of its customers. A great deal of Humana’s success as a company rides on customer satisfaction, and the frontline of that battle for customers’ hearts and minds is Humana’s customer service center.

Call centers are hard to get right. A lot of emotions can arise during a customer service call, especially one relating to health and health insurance. Sometimes people are frustrated. At times, they’re upset. Also, there are times the customer service representative becomes aggravated, and the overall tone and progression of the phone call goes downhill. This is of course very bad for customer satisfaction.

Humana wanted to find a way to use artificial intelligence to monitor its phone calls and help its agents connect better with customers, in order to improve customer satisfaction (and thus, customer retention rates and profits per customer).

In light of their business need, Humana worked with a company called Cogito, which specializes in voice analytics technology.

Cogito offers a piece of AI technology called Cogito Dialogue. It’s been trained to identify certain conversational cues as a way of helping call center representatives and supervisors stay actively engaged in a call with a customer.

The AI listens to cues like the customer’s voice pitch.

If it’s rising, or if the call representative and the customer talk over each other, then the dialogue tool will send out electronic alerts to the agent during the call.

Humana fed the dialogue tool customer service data from 10,000 calls and allowed it to analyze cues such as keywords, interruptions, and pauses; these cues were then linked with specific outcomes. For example, if a representative is receiving a particular pattern of cues, they are likely to get a specific customer satisfaction result.
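The cue-to-outcome linking can be sketched as estimating, from labeled calls, how often a given cue co-occurs with a poor satisfaction result. The call records and cue names below are invented for illustration; the real system learns richer models over far more data.

```python
calls = [  # tiny invented training set: cues observed + satisfaction label
    {"cues": {"interruptions", "rising_pitch"}, "satisfied": False},
    {"cues": {"interruptions"}, "satisfied": False},
    {"cues": {"long_pause"}, "satisfied": True},
    {"cues": set(), "satisfied": True},
]

def dissatisfaction_rate(cue, calls):
    """P(unsatisfied | cue occurred), or None if the cue was never seen."""
    with_cue = [c for c in calls if cue in c["cues"]]
    if not with_cue:
        return None
    unhappy = sum(not c["satisfied"] for c in with_cue)
    return unhappy / len(with_cue)

print(dissatisfaction_rate("interruptions", calls))  # 1.0
```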

The Outcome

Customers were happier, and customer service representatives were more engaged.

This automated solution for data analysis has now been deployed in 200 Humana call centers and the company plans to roll it out to 100 percent of its centers in the future.

The initiative was so successful, Humana has been able to focus on next steps in its data program. The company now plans to begin predicting the type of calls that are likely to go unresolved, so they can send those calls over to management before they become frustrating to the customer and customer service representative alike.

What does this mean for you and your business?

Well, if you’re looking for new ways to generate value by improving the quantity and quality of the decision support that you’re providing to your customer service personnel, then this may be a perfect example of how you can do so.

Humana’s Business Use Cases

Humana’s data analysis case study includes two key business use cases:

  • Analyzing customer sentiment; and
  • Suggesting actions to customer service representatives.

Analyzing Customer Sentiment

First things first, before you go ahead and collect data, you need to ask yourself who and what is involved in making things happen within the business.

In the case of Humana, the actors were:

  • The health insurance system itself
  • The customer, and
  • The customer service representative

As you can see in the use case diagram above, the relational aspect is pretty simple. You have a customer service representative and a customer. They are both producing audio data, and that audio data is being fed into the system.

Humana focused on collecting the key data points, shown in the image below, from their customer service operations.

By collecting data about speech style, pitch, silence, stress in customers' voices, length of call, speed of customers' speech, intonation, articulation, and representatives' manner of speaking, Humana was able to analyze customer sentiment and introduce techniques for improved customer satisfaction.

Having strategically defined these data points, the Cogito technology was able to generate reports about customer sentiment during the calls.

Suggesting Actions to Customer Service Representatives

The second use case for the Humana data program follows on from the data gathered in the first case.

In Humana’s case, Cogito generated a host of call analyses and reports about key call issues.

In the second business use case, Cogito was able to suggest actions to customer service representatives, in real time, to make use of incoming data and help improve customer satisfaction on the spot.

The technology Humana used provided suggestions via text message to the customer service representative, offering the following types of feedback:

  • The tone of voice is too tense
  • The speed of speaking is high
  • The customer representative and customer are speaking at the same time

These alerts allowed the Humana customer service representatives to alter their approach immediately, improving the quality of the interaction and, in turn, customer satisfaction.
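As an illustration, real-time suggestions like these can be generated with simple threshold rules over live call metrics. The feature names and thresholds below are hypothetical, purely to show the shape of such a rule engine; Cogito's actual models are far more sophisticated than this sketch.

```python
# A minimal sketch of rule-based, real-time coaching alerts.
# Feature names and thresholds are hypothetical illustrations,
# not Cogito's actual model or API.

def suggest_actions(call_features):
    """Map live call metrics to coaching alerts for the representative."""
    alerts = []
    if call_features.get("tension_score", 0.0) > 0.7:
        alerts.append("The tone of voice is too tense")
    if call_features.get("words_per_minute", 0) > 180:
        alerts.append("The speed of speaking is high")
    if call_features.get("overlap_ratio", 0.0) > 0.2:
        alerts.append("The customer representative and customer "
                      "are speaking at the same time")
    return alerts

print(suggest_actions(
    {"tension_score": 0.8, "words_per_minute": 150, "overlap_ratio": 0.3}))
```

In a production setting the features would arrive as a stream from the speech-analysis layer, and the alerts would be pushed to the representative's screen as they occur.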

The preconditions for success in this use case were:

  • The call-related data must be collected and stored
  • The AI models must be in place to generate analysis on the data points that are recorded during the calls

Evidence of success can subsequently be found in a system that offers real-time suggestions for courses of action that the customer service representative can take to improve customer satisfaction.

Thanks to this data-intensive business use case, Humana was able to increase customer satisfaction, improve customer retention rates, and drive profits per customer.

The Technology That Supports This Data Analysis Case Study

I promised to dip into the tech side of things. This is especially for those of you who are interested in the ins and outs of how projects like this one are actually rolled out.

Here’s a little rundown of the main technologies we discovered when we investigated how Cogito runs in support of its clients like Humana.

  • For cloud data management, Cogito uses AWS, specifically the Athena product
  • For on-premise big data management, the company uses Apache HDFS, the distributed file system for storing big data
  • It uses MapReduce for processing that data
  • Cogito also runs traditional systems and relational database management systems such as PostgreSQL
  • For analytics and data visualization, Cogito makes use of Tableau
  • And for its machine learning technology, these use cases required people with knowledge of Python, R, and SQL, as well as deep learning (Cogito uses the PyTorch and TensorFlow libraries)

These data science skill sets support the effective computing, deep learning, and natural language processing applications employed by Humana for this use case.

If you’re looking to hire people to help with your own data initiative, then people with the skills listed above, and with experience in these specific technologies, would be a huge help.
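To make the MapReduce step in the stack above concrete, here is a toy word count in plain Python. This only sketches the map/shuffle/reduce pattern; real deployments run it at scale on HDFS via Hadoop or similar engines.

```python
# A toy illustration of the MapReduce processing model:
# word counting in plain Python (a sketch, not Hadoop).
from itertools import groupby
from operator import itemgetter

def map_phase(line):
    # map: emit (key, 1) for every word
    return [(word.lower(), 1) for word in line.split()]

def reduce_phase(pairs):
    pairs.sort(key=itemgetter(0))          # shuffle/sort: group keys together
    return {key: sum(count for _, count in group)
            for key, group in groupby(pairs, key=itemgetter(0))}

lines = ["call resolved", "call escalated", "Call resolved"]
mapped = [pair for line in lines for pair in map_phase(line)]
print(reduce_phase(mapped))   # {'call': 3, 'escalated': 1, 'resolved': 2}
```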

Step 3: Select The “Quick Win” Data Use Case

Still there? Great!

It’s time to close the loop.

Remember those notes you took before you reviewed the study? I want you to STOP here and assess. Does this Humana case study seem applicable and promising as a solution, given your organization’s current set-up?

YES ▶ Excellent!

Earmark it and continue exploring other winning data use cases until you’ve identified 5 that seem like great fits for your business’s needs. Evaluate those against your organization’s needs, and select the very best fit to be your “quick win” data use case. Develop your data strategy around that.

NO, Lillian – It’s not applicable. ▶ No problem.

Discard the information and continue exploring the winning data use cases we’ve categorized for you according to business function and industry. Save time by drilling down into the business function you know your business really needs help with now. Identify 5 winning data use cases that seem like great fits for your business’s needs. Evaluate those against your organization’s needs, and select the very best fit to be your “quick win” data use case. Develop your data strategy around that data use case.


Ann Indian Acad Neurol, v.16(4); Oct-Dec 2013

Design and data analysis case-controlled study in clinical research

Sanjeev V. Thomas

Department of Neurology, Sree Chitra Tirunal Institute for Medical Sciences and Technology, Trivandrum, Kerala, India

Karthik Suresh

1 Department of Pulmonary and Critical Care Medicine, Johns Hopkins University School of Medicine, Louisville, USA

Geetha Suresh

2 Department of Justice Administration, University of Louisville, Louisville, USA

Clinicians, during their training period and in practice, are often called upon to conduct studies that explore the association between certain exposures and disease states, or between interventions and outcomes. More often, they need to interpret the results of research data published in the medical literature. Case-control studies are one of the most frequently used study designs for these purposes. This paper explains the basic features of case-control studies, the rationale for applying a case-control design with appropriate examples, and the limitations of this design. Analyses of sensitivity and specificity, along with templates to calculate various ratios, are explained with user-friendly tables and calculations in this article. The interpretation of some laboratory results requires a sound knowledge of the various risk ratios and positive or negative predictive values for correct identification and unbiased analysis. A major advantage of case-control studies is that they are small and retrospective, and so more economical than cohort studies and randomized controlled trials.


Clinicians think of a case-control study when they want to ascertain the association between a clinical condition and an exposure, or when a researcher wants to compare patients with a disease who were exposed to risk factors against a non-exposed control group. In other words, a case-control study compares subjects who have a disease or outcome (cases) with subjects who do not have the disease or outcome (controls). Historically, case-control studies came into fashion in the early 20th century, when great interest arose in the role of environmental factors (such as pipe smoke) in the pathogenesis of disease. In the 1950s, case-control studies were used to link cigarette smoke and lung cancer. Case-control studies look back in time to compare “what happened” in each group to determine the relationship between the risk factor and the disease. The case-control study has important advantages, including low cost and ease of deployment. However, it is important to note that a positive relationship between exposure and disease does not imply causality.

At the center of the case-control study is a collection of cases [Figure 1]. This explains why this type of study is often used to study rare diseases, where the prevalence of the disease may not be high enough to permit a cohort study. A cohort study identifies patients with and without an exposure and then “looks forward” to see whether or not greater numbers of patients with the exposure develop disease.

An external file that holds a picture, illustration, etc.
Object name is AIAN-16-483-g001.jpg

Comparison of cohort and case control studies

For instance, Yang et al. studied antiepileptic drug (AED)-associated rashes in Asians in a case-control study.[ 1 ] They collected cases of confirmed AED-induced severe cutaneous reactions (such as Stevens-Johnson syndrome) and then, using appropriate controls, analyzed various exposures (including the type of AED used) to look for risk factors for developing AED-induced skin disease.

Choosing controls is a very important aspect of case-control study design. The investigator must weigh the need for the controls to be relevant against the tendency to overmatch controls such that potential differences become muted. In general, one may consider three populations: the cases, the relevant control population, and the population at large. For the study above, the cases include patients with AED skin disease, and the relevant control population is a group of Asian patients without skin disease. It is important for controls to be relevant: in the antiepileptic study, it would not be appropriate to choose a control population across ethnicities, since one of the premises of the paper revolves around a particular susceptibility to AED drug rashes in Asian populations.

One popular method of choosing controls is to choose patients from a geographic population at large. In studying the relationship between non-steroidal anti-inflammatory drugs and Parkinson's disease (PD), Wahner et al. chose a control population from several rural California counties.[ 2 ] There are other methods of choosing controls (using patients without disease admitted to the hospital during the time of study, neighbors of disease-positive cases, using mail routes to identify disease-negative cases). However, one must be careful not to introduce bias into control selection. For instance, a study that enrolls cases from a clinic population should not use a hospital population as control. Studies looking at a geography-specific population (e.g., Neurocysticercosis in India) cannot use controls from large studies done in other populations (registries of patients from countries where disease prevalence may be drastically different than in India). In general, geographic clustering is probably the easiest way to choose controls for case-control studies.

Two popular ways of choosing controls include hospitalized patients and patients from the general population. Choosing hospitalized, disease negative patients offers several advantages, including good rates of response (patients admitted to the hospital are generally already being examined and evaluated and often tend to be available to further questioning for a study, compared with the general population, where rates of response may be much lower) and possibly less amnestic bias (patients who are already in the hospital are, by default, being asked to remember details of their presenting illnesses and as such, may more reliably remember details of exposures). However, using hospitalized patients has one large disadvantage; these patients have higher severity of disease since they required hospitalization in the first place. In addition, patients may be hospitalized for disease processes that may share features with diseases under study, thus confounding results.

Using a general population offers the advantage of being a true control group, random in its choosing and without any common features that may confound associations. However, disadvantages include poor response rates and biasing based on geography. Administering long histories and questions regarding exposures are often hard to accomplish in the general population due to the number of people willing (or rather, not willing) to undergo testing. In addition, choosing cases from the general population from particular geographic areas may bias the population toward certain characteristics (such as a socio-economic status) of that geographic population. Consider a study that uses cases from a referral clinic population that draws patients from across socio-economic strata. Using a control group selected from a population from a very affluent or very impoverished area may be problematic unless the socio-economic status is included in the final analysis.

In case-control studies, cases are usually available before controls. When studying specific diseases, cases are often collected from specialty clinics that see large numbers of patients with a specific disease. Consider, for example, the study by Garwood et al.,[ 3 ] which looked at patients with established PD and looked for associations between prior amphetamine use and the subsequent development of various neurologic disorders. Patients in this study were chosen from specialty clinics that see large numbers of patients with certain neurologic disorders. Case definitions are very important when planning to choose cases. For instance, in a hypothetical study aiming to study cases of peripheral neuropathy, will all patients who carry a diagnosis of peripheral neuropathy be included? Or will only patients with definite electromyography evidence of neuropathy be included? If a disease process with known histopathology is being studied, will tissue diagnosis be required for all cases? More stringent case definitions that require multiple pieces of data to be present may limit the number of cases that can be used in the study. Less stringent criteria (for instance, counting all patients with the diagnosis of “peripheral neuropathy” listed in the chart) may inadvertently choose a group of cases that is too heterogeneous.

The disease history status of the chosen cases must also be decided. Will the cases being chosen have newly diagnosed disease, or will cases of ongoing/longstanding disease also be included? Will decedent cases be included? This is important when looking at exposures in the following fashion: Consider exposure X that is associated with disease Y. Suppose that exposure X negatively affects disease Y such that patients that are X + have more severe disease. Now, a case-control study that used only patients with long-standing or ongoing disease might miss a potential association between X and Y because X + patients, due to their more aggressive course of disease, are no longer alive and therefore were not included in the analysis. If this particular confounding effect is of concern, it can be circumvented by using incident cases only.

Selection bias occurs when the exposure of interest results in more careful screening of a population, thus mimicking an association. The classic example of this phenomenon was noted in the 70s, when certain studies noted a relationship between estrogen use and endometrial cancer. However, on close analysis, it was noted that patients who used estrogen were more likely to experience vaginal bleeding, which in turn is often a cause for close examination by physicians to rule out endometrial cancer. This is often seen with certain drug exposures as well. A drug may produce various symptoms, which lead to closer physician evaluation, thus leading to more disease positive cases. Thus, when analyzed in a retrospective fashion, more of the cases may have a particular exposure only insofar as that particular exposure led to evaluations that resulted in a diagnosis, but without any direct association or causality between the exposure and disease.

One advantage of case-control studies is the ability to study multiple exposures and other risk factors within one study. In addition, the “exposure” being studied can be biochemical in nature. Consider the study that looked at a genetic variant of a kinase enzyme as a risk factor for the development of Alzheimer's disease.[ 4 ] Compare this with the study mentioned earlier by Garwood et al.,[ 3 ] where exposure data were collected by surveys and questionnaires. In this study, the authors drew blood work on cases and controls in order to assess their polymorphism status. Indeed, more than one exposure can be assessed in the same study, and with planning, a researcher may look at several variables, including biochemical ones, in a single case-control study.

Matching is one of three ways (along with exclusion and statistical adjustment) to adjust for differences. Matching attempts to make sure that the control group is sufficiently similar to the cases group with respect to variables such as age, sex, etc. Cases and controls should not be matched on variables that will be analyzed for possible associations to disease. Not only should exposure variables not be included, but neither should variables that are closely related to them. Lastly, overmatching should be avoided. If the control group is too similar to the cases group, the study may fail to detect the difference even if one exists. In addition, adding matching categories increases the expense of the study.

Among the measures of association derived from case-control studies are sensitivity and specificity. These measures help a researcher understand correct classification. A good understanding of sensitivity and specificity is essential to understanding the receiver operating characteristic curve and to distinguishing correct classification of positive exposure with disease from negative exposure with no disease. Table 1 explains a hypothetical example and the method of calculating specificity and sensitivity.

Hypothetical example of sensitivity, specificity and predictive values

An external file that holds a picture, illustration, etc.
Object name is AIAN-16-483-g002.jpg

Interpretation of sensitivity, specificity and predictive values

Sensitivity and specificity are statistical measures of the performance of a two-by-two classification of cases and controls (sick or healthy) against positives and negatives (exposed or non-exposed).[ 5 ] Sensitivity measures the proportion of actual positives identified: the percentage of sick people who are correctly identified as sick. Specificity measures the proportion of negatives identified: the percentage of healthy people who are correctly identified as healthy. Theoretically, optimum prediction aims at 100% sensitivity and specificity with a minimum margin of error. Table 1 also shows the false positive rate, which is referred to as Type I error and commonly stated as α (alpha); it is calculated as 100 − specificity, which equals 100 − 90.80 = 9.20% for the Table 1 example. A Type I error, also known as a false positive error or false alarm, indicates that a condition is present when it is actually not. In the above example, a false positive error is the percentage of healthy people falsely identified as sick. We want the Type I error to be as small as possible because healthy people should not receive treatment.

The false negative rate, which is referred to as Type II error and commonly stated as β (beta), is calculated as 100 − sensitivity, which equals 100 − 73.30 = 26.70% for the Table 1 example. A Type II error, also known as a false negative error, indicates that a condition is absent when it is actually present. In the above example, a false negative error is the percentage of sick people falsely identified as healthy. A Type I error unnecessarily treats a healthy person, which in turn increases the budget; a Type II error puts a sick person at risk, which acts against the study objectives. A researcher wants to minimize both errors, which is not a simple issue, because an effort to decrease one type of error increases the other. The only statistical way to minimize both types of error is to increase the sample size, which may be difficult, and sometimes not feasible or too expensive. If the sample size is too small, the study lacks precision; if it is too large, time and resources are wasted. Hence, the question is what the sample size should be so that the study has the power to generalize its results. The researcher has to decide whether the study has enough power to make a judgment about the population from the sample, and must decide this in the process of designing the experiment: how large a sample is needed to enable reliable judgment.

Statistical power is the same as sensitivity (73.30%). In this example, the large number of false positives and the few false negatives indicate that the test, conducted alone, is not the best test to confirm the disease. Higher statistical power increases statistical significance by reducing the Type II error and narrowing the confidence interval. In other words, the larger the power, the more accurately the study can mirror the behavior of the study population.

The positive predictive value (PPV), or precision rate, is the proportion of positive test results that are correct diagnoses. If the test correctly identified all positive conditions, the PPV would be 100%. The calculated PPV in Table 1 is 11.8%, which is not large enough to predict cases with this test alone. However, the negative predictive value (NPV) of 99.9% indicates that the test correctly identifies negative conditions.
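These quantities can all be reproduced from the four cells of a two-by-two table. The counts below are hypothetical, chosen so that the derived rates approximate those quoted from Table 1 (the table itself is only available as an image, so these are not the article's actual cell counts):

```python
# Sensitivity, specificity and predictive values from a 2x2 table.
# Hypothetical counts, chosen to approximate the rates quoted in the text.
tp, fn = 11, 4      # sick:    correctly / incorrectly classified
fp, tn = 82, 810    # healthy: incorrectly / correctly classified

sensitivity = tp / (tp + fn)   # true positive rate
specificity = tn / (tn + fp)   # true negative rate
fpr = 1 - specificity          # Type I error (alpha)
fnr = 1 - sensitivity          # Type II error (beta)
ppv = tp / (tp + fp)           # positive predictive value
npv = tn / (tn + fn)           # negative predictive value

print(f"sensitivity={sensitivity:.1%} specificity={specificity:.1%}")
print(f"alpha={fpr:.1%} beta={fnr:.1%} PPV={ppv:.1%} NPV={npv:.1%}")
```

With these counts the output shows sensitivity 73.3%, specificity 90.8%, α 9.2%, β 26.7% and PPV 11.8%, matching the text; the NPV comes out near (though not exactly at) the quoted 99.9%, since the true cell counts are unknown.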

Clinical interpretation of a test

In a sample, there are two groups: those who have the disease and those who do not. A test designed to detect that disease can have two results: a positive result that states that the disease is present, and a negative result that states that the disease is absent. In an ideal situation, we would want the test to be positive for all persons who have the disease and negative for all persons who do not. Unfortunately, reality is often far from ideal. The clinician who ordered the test receives the result as positive or negative. What conclusion can he or she make about the disease status of the patient? The first step is to examine the reliability of the test in statistical terms: (1) What is the sensitivity of the test? (2) What is the specificity of the test? The second step is to examine its applicability to the patient: (3) What is the PPV of the test? (4) What is the NPV of the test?

Suppose the test result had come back positive. In this example, the test has a sensitivity of 73.3% and a specificity of 90.8%. This test is capable of detecting the disease in only 73% of cases, and it has a false positive rate of 9.2%. The PPV of the test is 11.8%. In other words, there is a good possibility that the test result is a false positive and the person does not have the disease. We need to look at other test results and the clinical situation. Had the PPV of this test been close to 80 or 90%, one could conclude that the person most likely has the disease when the test result is positive.

Suppose the test result had come as negative. The NPV of this test is 99.9%, which means this test gave a negative result in a patient with the disease only very rarely. Hence, there is only 0.1% possibility that the person who tested negative has in fact the disease. Probably no further tests are required unless the clinical suspicion is very high.

It is very important how the clinician interprets the result of a test. The usefulness of a positive or negative result depends upon the PPV or NPV of the test, respectively. A screening test should have high sensitivity and a high NPV. A confirmatory test should have high specificity and a high PPV.

The case-control method is most efficient for the study of rare diseases and most common diseases. Other measures of association from case-control studies are the odds ratio (OR) and the risk ratio, which are presented in Table 2.

Different ratio calculation templates with sample calculation

An external file that holds a picture, illustration, etc.
Object name is AIAN-16-483-g003.jpg

Absolute risk means the probability of an event occurring and is not compared with any other type of risk. Absolute risk is expressed as a ratio or percent. In the example, the absolute risk reduction indicates a 27.37% decline in risk. Relative risk (RR), on the other hand, compares the risk among the exposed and the non-exposed. In the example provided in Table 2, the non-exposed control group is 69.93% less likely to experience the event than the exposed cases. The reader should keep in mind that RR does not mean an increase in risk: relative to the exposed cases, the unexposed controls are simply 69.93% less likely to experience the event. RR does not explain actual risk but is expressed as the relative increase or decrease in risk of the exposed compared to the non-exposed.

The OR helps the researcher conclude whether the odds of a certain event or outcome are the same for two groups. It calculates the odds of a health outcome when exposed compared to non-exposed. In our example, an OR of 0.207 can be interpreted as the non-exposed group being less likely to experience the event than the exposed group. An OR greater than 1 (for example, 1.11) means that the exposed are 1.11 times more likely to experience the event than the non-exposed.

The event rate for cases (E) and controls (C) in biostatistics measures how often a particular exposure results in the occurrence of disease within the experimental group (cases) of an experiment. In our example this value is 11.76%. This percentage explains the extent of risk to exposed patients compared with the non-exposed.
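These measures of association can be sketched from a two-by-two exposure/disease table. The counts below are hypothetical, purely for illustration; they do not reproduce the article's Table 2, which is only available as an image:

```python
# Measures of association from a hypothetical 2x2 exposure/disease table.
a, b = 20, 80   # exposed:     diseased / healthy
c, d = 10, 90   # non-exposed: diseased / healthy

eer = a / (a + b)               # event rate among exposed (cases)
cer = c / (c + d)               # event rate among non-exposed (controls)
arr = eer - cer                 # absolute risk difference
rr = eer / cer                  # relative risk
odds_ratio = (a * d) / (b * c)  # odds ratio

print(f"EER={eer:.2f} CER={cer:.2f} ARR={arr:.2f} "
      f"RR={rr:.2f} OR={odds_ratio:.2f}")
```

Here the exposed group has twice the event rate of the non-exposed (RR = 2.0), and the OR of 2.25 tells the same story in terms of odds; with an OR below 1, as in the article's 0.207, the direction of the association is reversed.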

The statistical tests that can be used to ascertain an association also depend on the characteristics of the variables. If the researcher wants to find the association between two categorical variables (e.g., a positive versus negative test result and a disease state expressed as present or absent), the Cochran-Armitage test, which is the same as the Pearson Chi-squared test here, can be used. When the objective is to find the association between two interval- or ratio-level (continuous) variables, correlation and regression analysis can be performed. To evaluate a statistically significant difference between the means of cases and controls, a test of group difference can be performed. If the researcher wants to find a statistically significant difference among the means of more than two groups, analysis of variance can be performed. A detailed explanation of how to calculate the various statistical tests will be published in later issues. The success of the research depends, directly and indirectly, on how the following biases, or systematic errors, are controlled.
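For instance, the Pearson Chi-squared statistic for a two-by-two table can be computed by hand, as sketched below. The counts are hypothetical; in practice a statistics package (e.g. `scipy.stats.chi2_contingency`) would also return the P value directly:

```python
# Pearson chi-squared statistic for a 2x2 table, in plain Python.
def chi_squared_2x2(a, b, c, d):
    n = a + b + c + d
    observed = [(a, b), (c, d)]
    # expected cell count = (row total * column total) / grand total
    expected = [((a + b) * (a + c) / n, (a + b) * (b + d) / n),
                ((c + d) * (a + c) / n, (c + d) * (b + d) / n)]
    return sum((o - e) ** 2 / e
               for row_o, row_e in zip(observed, expected)
               for o, e in zip(row_o, row_e))

stat = chi_squared_2x2(20, 80, 10, 90)
# Compare against the 5% critical value for 1 degree of freedom (3.841).
print(round(stat, 2), stat > 3.841)   # 3.92 True
```

With these counts the statistic just exceeds the 5% critical value, so the association would be declared statistically significant at that level.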

When cases and controls are selected based on exposed or non-exposed status, information on exposure is collected retrospectively, and the subjects' ability to recall that information often forms the basis for recall bias. Recall bias is a methodological issue: human ability to recall is limited, and cases may remember their exposure with more accuracy than controls. Another possible bias is selection bias. In case-control studies, the cases and controls should be selected from populations with the same inherent characteristics. For instance, cases collected from referral clinics are often subject to selection bias. If selection bias is not controlled, the findings of association may well be artifacts of the study design rather than true associations. Another possible bias is information bias, which arises from misclassification of the level of exposure or misclassification of the disease or other outcome itself.

Case-control studies are good for studying rare diseases, but they are not generally used to study rare exposures. As Kaelin and Bayona explain,[ 6 ] if a researcher wants to study the risk of asthma from working in a nuclear submarine shipyard, a case-control study may not be the best option, because only a very small proportion of people with asthma might have been exposed. Similarly, case-control studies are not the best option for studying multiple diseases or conditions, because the selection of the control group may not be comparable across the multiple diseases or conditions selected. The major advantage of case-control studies is that they are small and retrospective, and so more economical than cohort studies and randomized controlled trials.

Source of Support: Nil

Conflict of Interest: Nil

Humanities Data Analysis: Case Studies with Python



Humanities Data Analysis: Case Studies with Python is a practical guide to data-intensive humanities research using the Python programming language. The book, written by Folgert Karsdorp, Mike Kestemont and Allen Riddell, was originally published with Princeton University Press in 2021 (for a printed version of the book, see the publisher’s website), and is now available as an Open Access interactive Jupyter Book.

The book begins with an overview of the place of data science in the humanities, and proceeds to cover data carpentry: the essential techniques for gathering, cleaning, representing, and transforming textual and tabular data. Then, drawing from real-world, publicly available data sets that cover a variety of scholarly domains, the book delves into detailed case studies. Focusing on textual data analysis, the authors explore such diverse topics as network analysis, genre theory, onomastics, literacy, author attribution, mapping, stylometry, topic modeling, and time series analysis. Exercises and resources for further reading are provided at the end of each chapter.

What is the book about?

Learn how to effectively gather, read, store and parse different data formats, such as CSV, XML, HTML, PDF, and JSON data.

Construct Vector Space Models for texts and represent data in a tabular format. Learn how to use these and other representations (such as topics) to assess similarities and distances between texts.

Tell visual stories via data visualizations of character networks, patterns of cultural change, statistical distributions, and (shifts in) geographical distributions.

Work on real-world case studies using publicly available data sets. Dive into the world of historical cookbooks, French drama, Danish folktale collections, the Tate art gallery, mysterious medieval manuscripts, and many more.
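The Vector Space Model mentioned above can be sketched in a few lines of plain Python. This toy bag-of-words version (hypothetical one-line "documents", raw token counts rather than the weighting schemes a full treatment would use) shows how such vectors support similarity comparisons between texts:

```python
import math
from collections import Counter

def vectorize(text):
    """Toy bag-of-words vector: a Counter mapping each token to its count."""
    return Counter(text.lower().split())

def norm(vec):
    """Euclidean length of a sparse count vector."""
    return math.sqrt(sum(c * c for c in vec.values()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    return dot / (norm(a) * norm(b))

# Hypothetical mini-corpus: two related sentences and one unrelated one.
doc1 = vectorize("the queen spoke to the king")
doc2 = vectorize("the king answered the queen")
doc3 = vectorize("rain fell on the harbour")

print(round(cosine(doc1, doc2), 2))             # 0.8
print(cosine(doc1, doc2) > cosine(doc1, doc3))  # True
```

Real analyses would typically add tokenization, stop-word handling, and tf-idf or similar weighting, but the geometric intuition is the same: documents sharing more vocabulary point in more similar directions.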

Accompanying Data

The book features a large number of quality datasets. These datasets are published online under the DOI 10.5281/zenodo.891264 and can be downloaded from https://doi.org/10.5281/zenodo.891264 .

Citing HDA

If you use Humanities Data Analysis in an academic publication, please cite the original Princeton University Press publication.



Qualitative case study data analysis: an example from practice

Catherine Houghton, Lecturer, School of Nursing and Midwifery, National University of Ireland, Galway, Republic of Ireland; Kathy Murphy, Professor of Nursing, National University of Ireland, Galway, Ireland; David Shaw, Lecturer, Open University, Milton Keynes, UK; Dympna Casey, Senior Lecturer, National University of Ireland, Galway, Ireland

Aim To illustrate an approach to data analysis in qualitative case study methodology.

Background There is often little detail in case study research about how data were analysed. However, it is important that comprehensive analysis procedures are used because there are often large sets of data from multiple sources of evidence. Furthermore, the ability to describe in detail how the analysis was conducted ensures rigour in reporting qualitative research.

Data sources The research example used is a multiple case study that explored the role of the clinical skills laboratory in preparing students for the real world of practice. Data analysis was conducted using a framework guided by the four stages of analysis outlined by Morse ( 1994 ): comprehending, synthesising, theorising and recontextualising. The specific strategies for analysis in these stages centred on the work of Miles and Huberman ( 1994 ), which has been successfully used in case study research. The data were managed using NVivo software.

Review methods Literature examining qualitative data analysis was reviewed and strategies illustrated by the case study example provided.

Discussion Each stage of the analysis framework is described with illustration from the research example for the purpose of highlighting the benefits of a systematic approach to handling large data sets from multiple sources.

Conclusion By providing an example of how each stage of the analysis was conducted, it is hoped that researchers will be able to consider the benefits of such an approach to their own case study analysis.

Implications for research/practice This paper illustrates specific strategies that can be employed when conducting data analysis in case study research and other qualitative research designs.

Nurse Researcher . 22, 5, 8-12. doi: 10.7748/nr.22.5.8.e1307

This article has been subject to double blind peer review

None declared

Received: 02 February 2014

Accepted: 16 April 2014

Case study data analysis - case study research methodology - clinical skills research - qualitative case study methodology - qualitative data analysis - qualitative research




15 May 2015 / Vol 22 issue 5






‘The Real Data Set’: A Case of Challenging Power Dynamics and Questioning the Boundaries of Research Production




While the co-production of knowledge through community-engaged research is intended to be a reciprocally beneficial process, academic institutions have often devalued community expertise by treating community organizations as subjects rather than co-creators of knowledge. Drawing from Black Feminist Epistemology, this ethnographic study examines how one community-based organization, Los Angeles Community Action Network (LA CAN), partners with academic researchers, including their discourse around partnerships and how they challenged power dynamics between community and their university partners. This paper discusses key themes from their partnerships, including centering community members’ expertise through their lived experience and forming long-term mutual relationships rooted in abolition and the Black Radical Tradition. Drawing on an analysis of LA CAN’s organizing and research processes with academic partners, we discuss how the centering of community expertise and forming relationships with academics aligned on these values can help to challenge the traditional power dynamics in community-university partnerships, resulting in different ways of knowing or what LA CAN referred to as “the real data set.” 

Keywords: Abolition, Community-engaged research, Power, Knowledge production, Community partnerships

Copeland, V. & Wells, R., (2024) “‘The Real Data Set’: A Case of Challenging Power Dynamics and Questioning the Boundaries of Research Production”, Michigan Journal of Community Service Learning 30(1). doi: https://doi.org/10.3998/mjcsl.3676



Published on 26 April 2024. Peer reviewed. Licensed under Creative Commons Attribution-NonCommercial-NoDerivs 4.0.

Power (n.): the ability to define phenomena; the ability to make these phenomena act in a desired manner. (Huey P. Newton, text on the front of the Los Angeles Community Action Network T-shirt)

The co-production of knowledge in community-engaged research is intended to be an ethical and mutually beneficial process that engages researchers from multiple perspectives. However, this history of research production includes examples of unethical practices and power dynamics largely benefiting the academy (e.g., London et al., 2022 ). While research on service-learning and community engagement offer examples of partnerships grounded in trust and shared power (e.g., Reardon, 2006 ; Strand et al., 2003 ), university processes can place barriers on more equitable approaches to research. For example, academic processes, such as the Institutional Review Board, often fail to acknowledge community members as researchers, reinforcing the idea of community members as subjects and adding to this power hierarchy ( Fouche & Chubb, 2017 ). Academic researchers from decolonial, post-colonial, participatory, and feminist thought highlight the need to critically engage with and dismantle these power dynamics within community-engaged research ( Askins & Pain, 2011 ; Chatterton, 2006 ; Collins, 2013 ; Ritterbusch, 2019 ). Researchers in academic institutions must reflect on the ways that we uphold these power structures within engaged research, both intentionally and unintentionally. Drawing from McKittrick’s ( 2021 ) discussions on challenging hegemonic epistemologies and rethinking knowledge production, we focus on one community-based organization (CBO) that has challenged traditional power dynamics between community and their university partners.

This ethnographic study of Los Angeles Community Action Network (LA CAN) examines core aspects of their approach to partnerships with academic institutions and how their orientation to partnerships shapes knowledge creation and theory. First, we discuss how academic institutions and epistemologies have traditionally focused more on university expertise and neglect community expertise. We discuss scholarship on community engaged research that challenge these ideas and power dynamics, as well as barriers due to university processes and assumptions. We then present approaches that we have used in our research and how our approaches guide our work with community partners and led us to this study. We draw from an ethnographic study with LA CAN, discussing LA CAN’s approach to partnerships with academics. This includes discourse that emphasizes the role of community members as experts and equal partners in knowledge creation and long-term relationships with their academic partners rooted in shared value systems. Using McKittrick’s ( 2021 ) framework “that sharing stories is creative rigorous radical theory” (p. 73), LA CAN’s process of generating theory, which included interpreting problems and defining solutions, is deeply rooted in the Black Radical Tradition, using lived experiences to analyze and push back against institutions working against Black wellbeing and liberation. Moving from doing research on communities to doing research with communities requires us to de-center academic researchers as sole creators of knowledge, and instead build long-term mutual relationships with community researchers and theorists. This paper helps to show how shifting power dynamics can lead to new forms of knowledge production.

Literature Review: Epistemological Frameworks and Knowledge Production Within the Academy

The co-production of knowledge through community-engaged research faces many barriers when academia is the research partner. Academic institutions bring a set of common ideologies or values that drive academic research and shape research processes with communities ( Collins, 2000 ; McKittrick, 2021 ; Stockdill & Danico, 2012 ; Tuck & Yang, 2019 ). When thinking about community-engaged research, McKittrick urges researchers to examine how we come to know what we know, and where we know from. Academic institutions in the United States context have normalized a largely positivist and objectivist standard that treats “subjective” ways of knowing through experience as less rigorous (Collins). Knowledge produced by those who have been categorized as “other” has been largely ignored and often rendered insignificant or unreliable (Collins; Tuck & Yang). In contrast, McKittrick describes radical theory-making that occurs outside of existing systems of knowledge. This includes the “act of sharing stories as the theory and the methodology” for radical theory generation (p. 73) and imagining new ways of being. The role of epistemology, or the ways in which we “assess knowledge or why we believe what we believe to be true,” thus becomes essential for how we value knowledge from community sources (Collins, p. 328).

Patricia Hill Collins ( 2000 ), in Black Feminist Epistemology, writes about how normalized hegemonic practices have been perpetuated by the academy. She states that “White men control Western structures of knowledge validation, their interests pervade the themes, paradigms, and epistemologies of traditional scholarship” (Collins, p.328). She adds that “in the United States, the social institutions that legitimate knowledge as well as the Western or Eurocentric epistemologies that they uphold constitute two interrelated parts of the dominant knowledge validation processes” (Collins, p. 330). Because of this, research in the Western context has created and perpetuated barriers to conducting research that deviates from the Western, Eurocentric knowledge validation process. The prioritization and categorization of certain sources of knowledge are also fueled by academic and institutional processes such as the academic institutional review board (IRB) process. In this paper, we look at a specific type of knowledge creation that seeks to challenge these hierarchies—knowledge co-created with communities through community-engaged research.

In contrast to power hierarchies often present within academic research, principles within Participatory Action Research (PAR) require discussing and addressing power relations within research processes ( Tuck & Yang, 2019 ). This is an “ethical praxis of care” that prioritizes working with communities and ongoing negotiation between research collaborators ( Cahill, 2007, p.3 ). This is not only part of Participatory Action Research, but Tuck and Yang describe how other methods such as critical ethnography, public science, or community mapping also seek to deconstruct these power dynamics. In addition, service-learning and community engagement literature discuss the importance of challenging power dynamics, such as when they discuss the importance of shared power where campus and community partners are equal partners in decision-making ( Strand et al., 2003 ) and a trusting relationship that will develop over time ( Reardon, 2006 ). These relationships include a longer-term commitment and an understanding that relationships may not develop over a set timeline in order to be open to unexpected developments ( Enos & Morton, 2003 ) and include ongoing communication and negotiation ( Nelson et al., 2015 ).

While scholars who discuss PAR and other community-engaged research highlight mutuality and shared power, this contrasts with a biomedical research model of researcher-subject dichotomies, including assumptions about research embedded in the Institutional Review Board (IRB) process. Research methods such as PAR consider community members to be research collaborators, while IRB processes often treat community members solely as subjects of research ( Fouche & Chubb, 2017 ; Wood, 2017 ). IRB protocols such as the selection of subjects do not account for how relationships are built over time without a pre-imposed agenda (as described by Enos & Morton, 2003 ) or how research protocols are co-created. In addition to devaluing community members’ expertise, the focus on a researcher’s relationships with individual subjects within IRB processes does not consider liability and risks at the community level ( Tamariz et al., 2015 ; Wood, 2017 ). Wood argues that the Belmont principles, seen as central to the ethics of research on human subjects, are not sufficient for community-engaged research, as they do not account for these community-level risks and view community members solely as subjects. In addition, this focus on what Sabati ( 2019 ) calls “decontextualized” cases of ethical violations, rather than attention to a larger complex history of ethical violations, maintains harmful research processes. Ethics in community-engaged research includes not only a commitment to care, but also a commitment to challenging the institutional foundations that have built up racial hierarchies ( Cahill, 2007 ).

The IRB is one example of academic processes that are rooted in long-standing attachments to biomedical frameworks. In addition, outside agencies often prioritize and recognize academic institutions as the main sources of knowledge, so community members are often not recognized as a source of knowledge production ( London et al., 2022 ). Community partners recognized that their knowledge was seen as less credible by policy makers and identified an epistemic injustice where knowledge had more weight when it was coming from the university (London et al.). University partners can also bring assumptions about expertise and the role of the university when working with community members. Even in partnerships that emphasized mutual relationships, academics still faced challenges in letting go of the idea that they were supposed to “fix things” ( Morton & Bergbauer, 2015 ; p. 28). Bortolin ( 2011 ) describes how discourse around partnerships can still emphasize the universities’ role and see communities as a more passive recipient. To address these logics that prioritize university knowledge, McKittrick’s ( 2021 ) critical reflection on understanding how and where we generate knowledge calls us to address how our epistemological frameworks brought us to this work. We discuss how we ground ourselves in Black Feminist Epistemology and in Critical Theory and Critical Poverty Studies and then how our collective frameworks informed this paper.

Black Feminist Epistemology—Victoria Copeland

As a Black and Filipino researcher, I am highly influenced by Black Feminist Epistemology. Patricia Hill Collins’ conceptualization of Black Feminist Epistemology offers several tenets for conducting research, including an ethos of caring, an ethos of personal accountability, considering lived experiences as criterion of meaning, and use of dialogue in assessing knowledge claims ( Collins, 2000 ). Many of these ethics intersect with PAR praxis, decolonial thought, and abolitionist praxis. Black feminist epistemology has afforded me a way to consider and use a wide array of “non-normative” methodologies. In working from this epistemological framework, in contrast to views that treat community members as research subjects, I am reminded about how both academic and community researchers provide unique contributions to knowledge creation (Collins). These unique contributions not only include perspectives but also emotions that arise through co-producing knowledge. Acknowledging the emotions requires an understanding that emotion, ethics, and reason are all interconnected which simultaneously calls for personal accountability within the research process.

In discussing lived experiences as criteria of meaning, Collins ( 2000 ) further expands on a distinction between knowledge and wisdom. She refers to knowledge as adequate for those who are usually in power, but insufficient for those who have been impacted by structures of oppression and violence. She states that wisdom is a way to think about knowledge that can only be gained through lived experience (Collins). In addition to lived experience as a criterion of meaning, I also constantly think about the ways in which dialogue provides essential knowledge validation through community and connectedness. Dialogues, as described by Collins, are a way that individuals can interact harmoniously and are a form of collective empowerment. This ethic emphasizes the relationality of research and research practices. As stated by McKittrick ( 2021 ), we need to be suspicious of “how we come to know, where we know from, and the ways in which many academic methodologies refuse black life and relational thinking” (p. 120). She adds, that:

part of our intellectual task then, is to work out how different kinds and types of texts, voice, and geographies relate to each other and open up unexpected and surprising ways to talk about liberation, knowledge, history, race, gender, narrative, and blackness (McKittrick, p.121).

Critical Theory and Critical Poverty Studies—Rachel Wells

As a white woman coming to this research after nonprofit experiences, I have drawn from critical ethnography and critical poverty studies in order to examine how power and inequality are maintained, question surface assumptions and ideas of neutrality in social welfare, and to develop research that contributes to social change ( Madison, 2011 ). Critical poverty studies also include an awareness that one cannot separate our research from relationships of power that are often the subject of this research ( Crane, 2015 ). Through experiences working at nonprofit organizations, I was concerned with how community members’ ideas were not given the same weight as nonprofit professionals. This led to interest in this topic and in critical theory, specifically critical ethnography. This type of ethnography strives to give community members more authority throughout the research process, while also recognizing how a researcher’s positionality shapes this research ( Creswell, 2013 ). With this framework, I was mindful of my role and how I worked with community members in collecting data and theorizing. I was entering spaces led by Black and Latinx community members and an organization that described their work as rooted in the Black Radical Tradition, and I feel that I gained trust with community members and formed friendships. I also identified as a member of LA CAN after my time as part of multiple committees but I still had an outsider’s perspective as a white woman who had not experienced houselessness or faced eviction. Information was shared with me when I was actively involved with the organization, but as I spent more time writing, I was less involved with the organization. I write from this mixed LA CAN member/outsider position.

Within this role as an insider/outsider, I used Angen’s concept ( 2000 ) of ethical validation, which considers the ethical aspects of the research process, from the methods chosen to how findings will be used. As part of ethical validation, I examined political implications of publishing my findings and how findings could affect relationships. LA CAN had an explicit political orientation, so I considered how my representations and descriptions were aligned with this orientation. This research was not Participatory Action Research as we did not co-create research questions. However, I formed deep relationships and served as an active member, including assisting with community-led research. While I assisted with community-led research focused on action, as part of ethical validation, I reflected on how ethnographic research that was more theoretical could also lead to action. I have a deep appreciation for the time that staff and community members spent with me during this research, but interactions could be in unequal relationships. In some settings, I was also interacting with community members in a vulnerable moment. Through data analysis and writing, I am still interpreting their experiences. I aimed to be thoughtful in how I developed themes and findings with their experiences, but I must still reflect on questions of expertise and directions of knowledge production.

A Praxis of Relationality

Victoria’s roots in Black Feminist Epistemology and Rachel’s resonance with critical poverty studies provided a unique collective perspective and framework for this study. There were several parallels between both researchers’ value systems and foundational principles. Black Feminist epistemology and critical poverty studies both include a general understanding that researchers cannot separate themselves from the research. In acknowledging the embeddedness of researchers within the research process, we also understand the necessity of considering power dynamics. This acknowledgement of power and the role of the researchers is relevant to all aspects of the research process, including research design, utilization or creation of theory, collection of data, production of knowledge, and dissemination of findings. In addition, both Black Feminist Epistemology and the ethical validation process call attention to how we practice accountability. This call for accountability includes a reflection on shared value systems, which encourages the building of relationships between all researchers within the project.

The similarities between our shared value systems, and how we come to know what we know, is relevant to this study. We learn from one community-based organization, Los Angeles Community Action Network, that has challenged traditional power dynamics through their partnerships with academic researchers. Using data from a larger ethnographic study, we asked what were their key principles for partnerships and how did they view the ideas of power, expertise, and knowledge creation through these partnerships. Through these two research questions, we looked at how LA CAN’s shifting power dynamics helps to center community expertise and uplift CBOs as movement researchers.


This case is from a larger ethnographic study on two CBOs that combine community organizing and service provision within the city of Los Angeles in order to examine this approach to working with community members and how community-based organizations (CBOs) can challenge dominant narratives of poverty. Ethnography was chosen as an ideal method for understanding the context or setting for this frontline work. The multiple methods and ongoing interactions that are part of ethnography also help to develop a complex, detailed understanding ( Creswell, 2013 ). This paper focuses on one of the CBOs, Los Angeles Community Action Network (LA CAN). LA CAN was selected for the larger study due to their combination of services and organizing and their critique of traditional service provision. Through data collection, LA CAN was identified as having a distinct approach to partnerships and language that emphasized community expertise, leading to the research questions for this paper. From its founding, LA CAN has described an important connection to the Black Radical Tradition. Robinson and Robinson ( 2017 ) discuss how traditions of Black radicalism emerged from the combination of African culture, beliefs, and enslavement and that from this conjuncture “were powerful impulses to escape enslavement” (p. 16) and a tradition of resistance shaped by this history. As discussed later in the findings, LA CAN saw their work as grounded in this tradition, which shaped partnerships and discourse. Thus, we identified LA CAN as illustrative of ways in which power dynamics can be challenged, along with the different forms of knowledge that result from this shift. In addition, documents from LA CAN described long-term relationships with researchers from several Los Angeles academic institutions, so this was an opportunity to see how one CBO partnered with researchers from multiple universities. We focus not on a specific partnership, but instead on LA CAN’s broader approach to partnerships.

Los Angeles Community Action Network

Located in the Skid Row neighborhood in the city of Los Angeles, LA CAN is focused on community organizing and power building among low-income community members. At the time of data collection, LA CAN’s Twitter bio referred to “Fighting for Human Rights from the epicenter of Human Rights violations, Skid Row U.S.A.” The Skid Row neighborhood is a social service hub where community members face threats of over-policing and displacement. This neighborhood has a high concentration of unhoused residents 1 , both residents living in mega-shelters and unsheltered residents living in tents, as well as low-income tenants living in residential hotels or single-room occupancy (SRO) units. Due to downtown’s proximity to Skid Row, downtown business interests have formed partnerships with the police. As a result, residents in Skid Row have experienced a high police presence (e.g., Blasi, 2007 ).

LA CAN’s structure includes multiple committees where community members can get involved in organizing, including the housing committee, human and civil rights committee, food and wellness collaborative, and Downtown Women’s Action Coalition (DWAC). While LA CAN describes organizing as the core of their work, the LA CAN organizing model also includes community services and community building in support of their organizing, such as how their legal clinic was created to “remove the barriers to involvement that our members and constituents are facing on a daily basis.” This hybrid combination of services and community organizing is a way to respond to their political environment and conditions for community members ( Gates, 2014 ; Hyde, 1992 ). In this model, services can enhance organizing and political strategies, such as the Black Panther free breakfast program that highlighted contradictions of the state ( Heynen, 2009 ). LA CAN drew from this model, with both services and organizing as part of efforts for social change.

As an organization in the “epicenter of human rights violations,” LA CAN operates with the idea that community members who have faced structural racism and state violence should be the ones determining solutions. Through this approach, they follow principles that Sen ( 2003 ) described as new organizing. Unlike earlier organizing models that prioritized “winning issues” over the concerns of the most marginalized communities, this model centers the concerns of community members who have been marginalized. This approach includes political education ( Sen, 2003 ) and this political education happens through the long-term relationships and leadership development that are central to organizing ( Han, 2014 ) and the support and care for community members.

Methods of Data Collection

For this paper, we draw from participant observation and document review primarily collected between 2018 and 2019 and then additional social media posts from March–June 2020. Data was collected and analyzed by the first author as part of the larger study. As partnerships with universities arose as a distinctive characteristic, the first author conducted additional analysis and then findings were refined through conversations between both authors.

Participant Observation. Participant observation for the larger study primarily focused on frontline events and activities that included interactions with community members—such as community events, public actions, and organizing meetings with community members—during a 15-month period. This included a routine presence at the weekly housing committee meetings, where low-income tenants met to plan tenant outreach and housing justice campaigns, and a semi-regular presence at bi-monthly resident organizing meetings. Participant observation included public actions and events for both community residents and the larger public. The relationship with LA CAN extended beyond events that were observed for research, so field notes were not collected for events when there was no consent.

Document Review. Documents for this study include organizational websites and social media posts. While a full social media review was not feasible, social media related to events helped to capture additional perspectives on participant observation. Primary data collection was completed by January 2020, but we also include social media posts from March 15–June 1, 2020 to document their posts during the first few months of the COVID-19 pandemic. LA CAN was active in supporting community members during the COVID-19 pandemic and calling out government failures and inequities. Through this work, LA CAN maintained their distinctive approach to partnerships.

Analysis. As an ethnographic study, multiple methods of analysis—including memos, mapping, and identifying key events and examples that illustrated themes—happened concurrently ( Fetterman, 2010 ). The first step of analysis included close reading of field notes and interview transcripts and ongoing memo writing to document patterns. Memo-writing included memos of emergent findings and initial patterns from ongoing field work; memos that recorded impressions and key themes from interviews; and integrative memos that made connections between fieldnotes and interview excerpts and emerging findings ( Emerson et al., 2011 ). Memos provided an opportunity to identify key events or examples that illustrated themes ( Fetterman, 2010 ). As these themes emerged, the first author completed additional rounds of analysis to document additional instances in the data and compare across them. Through this process, LA CAN’s approach to partnerships and their discourse around partnerships arose as a distinct characteristic, so additional rounds of analysis documented key relationships with academic partners and compared LA CAN’s relationships with their academic partners to their discussion of the university as a whole. As part of ongoing analysis, we examined how LA CAN’s connection to Black radical thought and their abolitionist framework shaped their overall work and partnerships. Findings were refined through ongoing discussions among the two authors. These ongoing discussions also included discussing our positionalities and epistemological frameworks, as well as our individual relationships to LA CAN.

As an academic study, data for this study were collected in accordance with IRB protocols; however, we acknowledge that IRB principles are not sufficient for ethical community-engaged research (e.g., Wood, 2017) and that additional conversations and relationships are key to ethical research. Before research began, staff shared expectations for researchers based on prior experiences with doctoral students completing dissertations, and discussed the importance of volunteering before any research conversations. The first author then served as an active volunteer for nine months before beginning research. During participant observation, the first author shared emerging findings and communicated potential papers with staff as part of transparency, and also looked for ways to support the organization as part of reciprocity. The first author discussed confidentiality with the Executive Director, Pete White, before beginning data collection. Due to his public role and because his comments were often made in public settings, White gave permission to use his name when writing up findings, along with permission to name the organization. Participation and responses are confidential for all other participants, so we do not refer to other participants by name in this study.

We identified two core aspects of LA CAN’s approach to partnerships with universities: discourse that centered community expertise and partnerships built through long-term relationships. These two themes—centering community expertise and long-term mutual relationships—were shaped by LA CAN’s overarching commitment to Black liberation and abolitionist praxis. Before describing these themes, we first present an event, LA CAN’s October 2018 celebration of Fela Kuti’s birthday, to introduce LA CAN’s approach to partnerships and knowledge.

In 2018, LA CAN partnered with activists and artists to host Los Angeles’ celebration for Fela Kuti’s birthday. Fela Kuti was a musician and activist who launched a musical style, Afrobeat. The Facebook post advertising the event described Fela Kuti’s importance and the goals of this event:

This will be a night of culture, creation and community, held together with sounds from the motherland, in honor and celebration of the father of Afro-beat and pioneer of Pan Africanism, Fela Anikulapo Kuti and serve as a reminder that our struggle has seen us create the most fierce and beautiful forms of resistance.

Before the panel of community activists and artists, the Executive Director of LA CAN, Pete White, introduced a professor who then discussed the connection between social movements and music. After the talk, White commented that as this professor was speaking, one panelist was nodding. White described how community members, or event panelists in this case, were validating or approving of what faculty were saying. He commented that “usually academia will tell us about community.” He contrasted these traditional methods to this event where “here community members were able to speak truth to what academia is saying.”

Through this discussion, White contrasted their view with the usual approach of “academics telling us.” Instead, in this partnership, White centered community members’ contributions and emphasized their role in providing knowledge validation for what academics were saying. This reversed the direction of expertise and power so that community members’ experiences provided the expertise and served as the source of approval for academic information. This event also highlighted organizing and partnerships rooted in struggles for liberation, as the social media posting highlighted a “reminder that our struggle has seen us create the most fierce and beautiful forms of resistance.”

To understand how LA CAN developed partnerships where community members were “speaking truth,” as opposed to the traditional approach of “academics telling us,” we discuss the two themes—discourse that centers community members’ expertise and long-term mutual relationships—with each theme drawing from events and social media postings, and then discuss how their overarching liberatory praxis underlies their approach.

“The Real Data Set”: Discourse that Centers Community Expertise

LA CAN’s mission includes “serving as a vehicle to ensure we have voice, power & opinion in the decisions that are directly affecting us,” and this mission influences their philosophy of knowledge creation. Similar to the event discussed above, LA CAN frequently emphasized how community members were experts and producers of knowledge who held authority over external decision-making. This message was shared with community members through multiple meetings and events that emphasized their expertise. In 2019, community residents and active volunteers gathered for team leader training for an upcoming survey for a community-driven research project. LA CAN was working with university partners to collect surveys, but LA CAN members were leading the survey project. In his introduction to community members, White referred to this future survey data as “our stories, our truths.” He compared LA CAN’s community-driven survey to more traditional academic projects, commenting that, based on community members’ experiences and strengths, “we could get better data” (compared to academics). He was not referring to the metrics used by academics; rather, similar to how Collins (2000) described wisdom from lived experience, survey questions were informed by the experiences that community members lived each day. He described how the community was often used as a petri dish by academia, but this survey was designed with community knowledge to “interrogate and fix.” White differentiated how community members would use this data, commenting that “In academia, they call them recommendations, but we call them action steps.” In addition to being experts, community members held knowledge that included a vision of what should happen and was more likely to lead to actionable change.

LA CAN emphasized their core ideas about expertise and knowledge creation to multiple audiences, including policymakers and academic researchers. A preparation meeting and a subsequent 2019 town hall about that year’s Homeless Count, attended by Skid Row community members and the head of the Los Angeles Homeless Services Authority (LAHSA), show how they emphasized this message to multiple audiences. The 2019 Homeless Count documented an increase in Los Angeles City and County’s unhoused population. Shortly after the count was released, staff discussed its results at an organizing meeting with community members. A staff member asked each participant to share their reaction to the Homeless Count results. After each person shared, the staff member commented that “our own eyes told us it didn’t sit well,” as community members thought that the numbers were even higher than the report mentioned. Drawing on community members’ personal experiences, staff asked community members what LA CAN should do as an organization and prepared them for a town hall where they could ask questions of government officials and offer their expertise, with the goal that this knowledge and expertise would lead to action.

Following this preparation meeting, LA CAN held a town hall in June to discuss the results of the count, which included a presentation by LAHSA, a panel featuring community members who were part of the Downtown Women’s Action Coalition (DWAC), poetry, and a video. Before introducing this panel of DWAC members, White referred to different types of data and how this panel of community members sharing their experiences offered data that is rich, “data that jumps off the page, in this case, data that jumps off the stage” (referring to the stage that panelists were on). After this event, LA CAN used social media to highlight community members’ testimony. LA CAN’s Twitter post about this event showed a picture of DWAC members speaking and commented, “DWAC Respondents panel, this presentation is flawless, it’s the real data set.” This “real data set” that drew from community member experiences and their visions for change contrasted with the recent homeless count data. These posts highlighted the strength of community members as well as the importance of data from community knowledge. This language emphasized how this data showed a rich, full picture, highlighting community knowledge as “the real data set.”

Long-Term Mutual Relationships: Highlighting a Different Type of Partnership

LA CAN’s belief that community expertise should drive critical decisions was threaded through their long-term relationships with academic partners. LA CAN formed partnerships with researchers from multiple universities, but each relationship was built upon shared values, including abolitionist praxis, their mission that community members should be the decision-makers, and the recognition of community expertise described above. Relationships were not set up for one-time projects but for long-term collaboration. For example, when the first author met with staff to talk about prospective dissertation research, staff shared expectations of first being an active volunteer and active participant. The intent was to establish a long-term relationship that could lay the groundwork for research. LA CAN identified partners in line with their core values and who were committed to these longer-term goals. Partnerships included researchers from multiple universities, but larger audiences, including other members of that same university, did not always adopt these narratives. The following example shows how LA CAN formed a long-term relationship with researchers from the University of Southern California (USC), and then challenged partnership descriptions from that same university that overlooked community members’ contributions.

As concerns were increasing over the new COVID-19 pandemic in late February and early March 2020, LA CAN highlighted problems with access to hygiene and water for unhoused residents. Public health agencies were emphasizing the importance of washing hands, but many unhoused residents did not have access to water. LA CAN had been partnering with researchers from the USC Annenberg School for Communication and Journalism to create portable phone charging stations for a collaboration called Skid Row Power. Due to increasing concerns about the COVID-19 pandemic, this collaboration shifted its efforts to developing hand-washing stations. Because the city was not maintaining or refilling the existing stations, the Skid Row Power collaboration focused on hand-washing stations that could be refilled or maintained by community residents.

As a result of their efforts, USC’s Annenberg Media posted an article on how USC students were working with LA CAN. This article focused largely on the work of USC, leading LA CAN to respond on social media in order to reframe this portrayal. Using the Facebook feature to add an emotion and location, they started their post with “Feeling annoyed at Skid Row” and then stated:

We were pleased to see our Handwashing Stations Campaign covered by Annenberg Media. We really appreciate the spotlight it shines on the hard work of our USC allies. We were also annoyed… so allow us to vent a little. The framing of the article, as often happens, casts the academic side of our partnership as ‘saviors’, bringing their skills to bear on problems that plague less fortunate folks. In fact, our work is rooted from the start in deep collaboration between grassroots organizations and academics. The community partners’ expertise and ideas are just as critical to the project’s success as the academics’ expertise. Neither side could do it without the other. So when COVID-19 hit the streets, we were ready to hit the ground running.

LA CAN then described this as a long-term partnership.

This particular collaboration did not emerge out of the blue. LA CAN and USC researchers have been working together for many years, in settings ranging from research/action projects and classes to inter-disciplinary collaboratives. Together, we engage in participatory action research, leveraging co-design practices and wisdom from years of grassroots organizing… Any journalist out there interested in telling that story? We’d be happy to talk to you.

They called out this portrayal, in which academics were seen as “fixing things” (as discussed by Morton & Bergbauer, 2015) or as ‘saviors’, and instead offered a different narrative.

Nine days later, LA CAN worked with their USC partners on an article that was more in line with their approach, highlighting the long-term mutual relationship where both sides brought expertise, titled “How can the houseless fight the coronavirus? A community organization partners with academics to create a grassroots hand-washing infrastructure.” LA CAN’s new Facebook post described this article:

You may remember that we weren’t exactly thrilled by the way an article portrayed our handwashing campaign partnership with a USC class. We vented a bit, but then just decided to write the story ourselves…It tells the story of our community/academic collaboration over the past few weeks, explaining that “to create lasting change, we believe grassroots organizations and academics must work together to understand obstacles, design and test practical solutions, and develop community practices around those solutions.”

In addition to calling out larger university practices, LA CAN responded by focusing on their strong partnerships with researchers. Here, relationships between academic partners and community groups did not focus on a single project or research aim but were long-term. Even if media outlets described it differently, LA CAN identified partners with similar views on expertise and sharing power. This project focused on providing hand-washing stations, but partnerships were not bound to a single project. As LA CAN described “leveraging the wisdom from years of grassroots organizing,” this social media post also discussed how this wisdom resulted in new knowledge and solutions. In their partnerships with academics, LA CAN centered community members’ ideas and uplifted their key role in creating actionable change.

LA CAN partnered with researchers at the University of Southern California (USC) for specific projects, but LA CAN also had an analysis of the larger university’s role in their neighborhood. USC was located in Historic South Central, a historically Black neighborhood that is now primarily Latinx and Black and where housing costs were increasing. USC’s student body did not reflect the neighborhood’s demographics, and development associated with USC further increased concerns about gentrification. At the town hall about the 2019 Homeless Count discussed previously, USC and its role in the larger survey for the Homeless Count came up during discussions. When the acronym USC was mentioned, White referred to data “from the University of South Central, as we used to call it.” By referring to USC as the University of South Central rather than the University of Southern California, White emphasized its location within a Black and Latinx community. University administration was not necessarily the audience for this town hall; rather, he offered this critique and reminder to community members and government staff.

Partnerships Rooted in an Abolitionist Framework and the Black Radical Tradition

LA CAN recognized their foundational and ongoing work as rooted in the Black Radical Tradition. Rather than solely a utopian vision, Black radicalism includes striving for freedom alongside an understanding of the importance of struggles for liberation (Robinson & Robinson, 2017). This description can be seen in the earlier discussion of the Fela Kuti celebration, where LA CAN’s Facebook post commented that “our struggle has seen us create the most fierce and beautiful forms of resistance.” In a neighborhood where residents have dealt with the effects of structural racism, this connection was a critical part of their mission. As White described in an article in the Los Angeles Sentinel, “If Black folks aren’t picked up, lifted up, no one else will be” (quoted in Muhammed, 2017). In this article, White also described how this framework shaped their relationships.

We’ve been unapologetically Black from the get go, and because we sort of focus on the work on the ethos of the Black radical tradition, even when we have non-Black people, allies, accomplices and other members, they also understand that we’re moving and moved by the Black experience (Quoted in Muhammed).

White referred to how allies and accomplices understood “that we’re moving and moved by the Black experience,” and this included their relationships with academic partners. While a larger discussion of how this orientation shaped their day-to-day work and organizing strategies is outside the scope of this paper, we discuss how this commitment shaped their partnerships and emphasis on community knowledge.

The Black Radical Tradition and abolitionist praxis shaped LA CAN’s values of community expertise and collaborative knowledge creation. LA CAN’s abolitionist orientation was reiterated when a staff member described their stance towards policing methods at a March 2019 press conference: “We don’t want to reform, we want to abolish, we are not looking for kinder, gentler racism.” LA CAN approached research projects with an intent for action and to change the conditions of communities that were abandoned and harmed by the state. They recognized the institutional racism that led to these conditions and did not want a “kinder, gentler racism” when addressing these issues. This value system was carried through in the language that they used to demonstrate community expertise. At a 2019 event on housing justice that brought together academic researchers and community activists, White welcomed participants to the space. He described the history of LA CAN and the community members who came together to organize. In a community affected by structural racism and “organized abandonment” (Gilmore, as referenced by LA CAN staff), he described how community members were leading efforts: “We were canaries in the coal mine, but canaries with teeth.” Here, White recognized how community members bore the brunt of government decisions but were also drawing from community knowledge and fighting back.

This orientation also shaped their view on power and on knowledge. In the quote on the front of LA CAN’s t-shirt described at the beginning of this paper, LA CAN referenced Huey P. Newton, a Black revolutionary who helped found the Black Panther Party. In this quote, they expressed ideas on power as both “the ability to define phenomenon” and the “ability to make the phenomenon act in a desired manner.” Power was connected to defining the situation, and LA CAN’s commitment to community-centered expertise helped to define phenomena. In addition to emphasizing community knowledge, LA CAN recognized that being able to act on this knowledge was critical. While they had a critique of the larger university and its use of knowledge as “recommendations,” they formed partnerships with academics that recognized both aspects of this definition of power.

LA CAN formed partnerships with academics who shared this orientation, and with their emphasis on long-term relationships, they continued to deepen relationships around this commitment. At the annual member retreat in December 2019, staff presented a document with principles for engagement. Guiding principles included “Power is within our communities,” “Our struggles are interconnected,” and “Commitment to Black liberation.” After presenting these principles, staff asked for feedback, and one person reflected that “to abide by this, we need to know each other better.” Relationships were critical for following these principles and for questioning systemic racism. While this meeting primarily included community members, they carried this same perspective into their relationships with academics. These values created a foundation for collaborations with academic partners, allowing relationships to form around a common goal. Additionally, this commitment ensured that future relationships and research prioritized the knowledge of individuals most affected by multiple forms of oppression. This prioritizing of community expertise and setting of terms around partnerships diminished the power hierarchies that often exist within research partnerships.

This paper highlights LA CAN’s approach to knowledge production and power within partnerships and can offer lessons for academic researchers. As McKittrick (2019) states, part of the work in seeking liberation and “reimagining our world” is thinking with and across “knowledge systems” with the intention of “recalibrating who and what we are and what they think we are” (p. 243). LA CAN’s orientation to research ethics and partnerships was inextricably tied to intentional collective action and goals of creating change for and with community. LA CAN’s research was often a direct response to the social and political context that individuals lived in. Drawing from the idea of community wisdom from lived experience and knowledge validation through community as described by Collins (2000), research at LA CAN was a product of multiple voices and collective imaginations. As a result, they described how they “could get better data.” Transforming the larger university was not their focus or priority; rather, this paper discusses how LA CAN focused on partnerships that challenged traditional academic-community relationships. The findings from LA CAN’s approach to partnerships offer lessons for community-engaged research.

The Real Data Set: Centering Community Expertise

In contrast to the university as the expert or provider of knowledge, we identified how LA CAN reframed partnerships so that community members were “speaking truth.” This contrasts with assumptions, such as those embedded in IRB processes, that academics are the experts and community members are subjects. Even within some community partnerships intended to be reciprocal, academics struggled with letting go of the idea that they were there to “fix things” (Morton & Bergbauer, 2015; also described in Bortolin, 2011). Instead, the phrase “the real data set” as used by LA CAN emphasizes the central role of communities as changemakers, creators, and experts. This challenges academics to relinquish attachments to claiming “expertise” and a ‘savior’ role that are antithetical to relational research practices. Moreover, it raises questions about research ownership. Driven by countering racism and years of violence, LA CAN reiterated the importance of communities leading research. This expertise and lived experience led not only to “the real data set”; this community expertise also led to “action steps,” or data that would result in change.

Mutual Long-term Relationships for Collective Action

In addition to the language that LA CAN uses, we also found that partnerships included long-term relationships and action. Nelson et al. (2015) discuss how these relationships included a two-way exchange, ongoing communication, and an examination of power within relationships (also in Reardon, 2006). This is similar to discussions of relationships within the service-learning literature that emphasize a deeper commitment or openness to unanticipated developments (Enos & Morton, 2003) or the emphasis on slow, strategic relationships that Avila (2023) describes as part of community organizing. We also identified the importance of this two-way exchange and how it was key to building trust. In addition, this paper shows the importance of shared values for establishing these reciprocal relationships between academic and community researchers. In their social media posts, LA CAN described how researchers and grassroots organizations worked together for many years “in deep collaboration.” Lanz et al. (2021), who also describe a LA CAN research partnership, argue that the process and interactions were just as important as the end product. These relationships and mutual learning were themselves an outcome.

This process of mutual learning and establishing trust cannot be bound within biomedical models of research and does not often fit into standardized timelines or justifications. These challenges are also discussed in the literature on slow scholarship based on a feminist ethics of care. Wahab, Mehrotra, and Myers (2021) describe how slow scholarship centers relationships and considers the time required to cultivate relationships as part of knowledge production (also in Mountz et al., 2015). Here, slow is not just about time and building relationships but also about examining power and inequality (Mountz et al.). As a result, the collaborative form of knowledge production that happens through slow scholarship is a counter narrative to dominant forms of knowledge production (Wahab, Mehrotra, & Myers). In this paper, LA CAN’s relationships with researchers were personal, committed to care and action, and extended beyond a single project, and they used their relationships and actions as a counter narrative. While some projects had IRB approval, as researchers were working within an academic context, they first formed relationships outside of IRB processes and worked together on additional projects, such as supporting events and joint organizing and activism.

Key Values and Principles as Foundations of Relationships

From this study, we identified that relationships between researchers across various organizations and institutions were based upon trust, care, reflection, and a shared value system. Because these relationships are often formed in the context of various social injustices and violences, they require empathy and understanding as well as a shared commitment to a collective purpose (Ritterbusch, 2019). We also identified a shared understanding, such as the description of how allies, including academic partners, understood that they “were moving and moved by the Black experience.” We found that LA CAN’s commitment to abolitionist praxis and Black liberation created standards and expectations for partnerships, and they chose to partner with academic researchers who were aligned ethically and ideologically. Vakil et al. (2016) describe this as a politicized trust between academic and community partners, an ongoing process that includes examining the racialized tensions and power hierarchies present within partnerships. As LA CAN described principles for engagement, including that “to abide by this, we need to know each other better,” LA CAN also included this politicized trust and ongoing work in their long-term relationships.

When the larger universities did not share this commitment, LA CAN identified researchers with these values and offered a counter narrative through their work and discourse. As London et al. (2022) discuss, community organizations recognized that their knowledge is often not seen as legitimate and that many institutional practices and logics reinforce this epistemic injustice. London et al. argue that much needs to be done to challenge these practices and logics. Through discourse shared with community members as well as with larger audiences, LA CAN called out the injustices identified by London et al. LA CAN has advanced these ideas at the community level and through the relationships described in this paper; researchers who are also committed to these goals can carry out these tasks within the university.

Limitations and Conclusion

This study has some limitations that can affect the transferability of findings. LA CAN is a unique organization with powerful examples of discourse and a dynamic Executive Director. While Pete White was the only person referred to by name due to IRB requirements, many quotes come from his discourse during public events. The uniqueness of LA CAN could affect how findings transfer to other partnerships. However, LA CAN provides a model for what long-term relationships can look like when they are rooted in a shared understanding and when an organization has strong examples of discourse, whether through an event or social media. Literature on service-learning and civic engagement highlights the importance of a central office or university support for thoughtful community-engaged research (e.g., Enos & Morton, 2003; Strand et al., 2003). Because we examine this from the community partner’s side and look at their relationships with individual researchers, we cannot speak to the role of institutional support for the academics who partnered with LA CAN. Instead of examining partnerships from the university’s perspective, this provides a different view by examining multiple partnerships through the lens of a community organization. While this study draws from events from a specific period of data collection, relationships extended beyond data collection. For example, speakers from LA CAN have been guest speakers in courses, and the authors were involved in smaller ways after data collection finished. Because we hope that relationships continue after research, we cannot fully separate data collection from these relationships. We do not see these relationships as limitations, as they are an important part of accountability, but we acknowledge that they shape data collection and analysis.

Despite these limitations, the examples provided in this paper, guided by our collective epistemological frameworks, offer lessons for community-engaged research. While this specific study and the larger ethnography are not PAR, we write and analyze data with its core principles and ethics in mind and share findings as part of ongoing conversations. The example of LA CAN helps to compare their community-driven research to research projects that center academic expertise and adhere to more positivist standards. Following the ethics articulated by LA CAN as well as Black Feminist epistemology and PAR research praxis, we highlight the importance of engaging in discussion and taking personal accountability throughout the entire research process. Further, we consider how academic researchers can follow community-driven research projects and assist in a reciprocal way. LA CAN exemplified the importance of centering community voices through long-term reciprocal research relationships with shared value systems and action-oriented purposes. In addition to emphasizing community expertise, LA CAN challenged portrayals that identified academics as the primary experts. Furthermore, they highlighted the importance of being grounded in the Black Radical liberatory and abolitionist praxis that underlies their research endeavors and partnerships. Community-engaged research that happens through relationships aligned on core principles can result in a different type of knowledge production, where community members can “speak truth” to research and help to define the phenomenon and determine how research should be used for change. While the idea of “learning from the community” or “asking the community” can feel like a hollow phrase at times, LA CAN offers a different way of approaching community knowledge and key principles for partnerships.

  • As opposed to using the term homeless, LA CAN preferred the terms houseless or unhoused. As an LA CAN staff member described, for some people, their tent was their home, but they did not have a house. Thus, we use the terms unhoused or houseless in this paper.

Author Note

We are grateful to the staff and members of the Los Angeles Community Action Network for welcoming us into the organization and allowing us to learn and organize with them. We gratefully acknowledge Brenda Tully for comments on earlier drafts and the two anonymous reviewers for their thoughtful comments. This research was partially funded through the generous support of the UCLA Graduate Division Dissertation Year Fellowship and a UCSB Blum Center Research Fellowship on Poverty, Inequality, and Democracy.

Rachel Wells, PhD , is an Assistant Professor of Social Work and the MSW Program Director at Lewis University. Her research examines assumptions about poverty that shape social services and the role of community-based organizations in low-income neighborhoods; she has worked with grassroots organizations and housing justice efforts as part of her research.

Victoria Copeland, PhD , is a researcher, organizer, and senior policy analyst. Their current work focuses on the use of data and technology within the criminal legal system and social services.

Both authors contributed equally.

Corresponding author: Rachel Wells, Lewis University, Department of Social Welfare, One University Parkway, Romeoville IL 60446, United States. E-mail address: [email protected]

Harvard-Style Citation

Copeland, V. & Wells, R. (2024) '“The Real Data Set”: A Case of Challenging Power Dynamics and Questioning the Boundaries of Research Production', Michigan Journal of Community Service Learning , 30(1). doi: 10.3998/mjcsl.3676



  • Open access
  • Published: 17 April 2024

The economic commitment of climate change

  • Maximilian Kotz (ORCID: orcid.org/0000-0003-2564-5043) 1,2,
  • Anders Levermann (ORCID: orcid.org/0000-0003-4432-4704) 1,2 &
  • Leonie Wenz (ORCID: orcid.org/0000-0002-8500-1568) 1,3

Nature , volume 628, pages 551–557 (2024)

  • Environmental economics
  • Environmental health
  • Interdisciplinary studies
  • Projection and prediction

Global projections of macroeconomic climate-change damages typically consider impacts from average annual and national temperatures over long time horizons 1 , 2 , 3 , 4 , 5 , 6 . Here we use recent empirical findings from more than 1,600 regions worldwide over the past 40 years to project sub-national damages from temperature and precipitation, including daily variability and extremes 7 , 8 . Using an empirical approach that provides a robust lower bound on the persistence of impacts on economic growth, we find that the world economy is committed to an income reduction of 19% within the next 26 years independent of future emission choices (relative to a baseline without climate impacts, likely range of 11–29% accounting for physical climate and empirical uncertainty). These damages already outweigh the mitigation costs required to limit global warming to 2 °C by sixfold over this near-term time frame and thereafter diverge strongly dependent on emission choices. Committed damages arise predominantly through changes in average temperature, but accounting for further climatic components raises estimates by approximately 50% and leads to stronger regional heterogeneity. Committed losses are projected for all regions except those at very high latitudes, at which reductions in temperature variability bring benefits. The largest losses are committed at lower latitudes in regions with lower cumulative historical emissions and lower present-day income.


Projections of the macroeconomic damage caused by future climate change are crucial to informing public and policy debates about adaptation, mitigation and climate justice. On the one hand, adaptation against climate impacts must be justified and planned on the basis of an understanding of their future magnitude and spatial distribution 9 . This is also of importance in the context of climate justice 10 , as well as to key societal actors, including governments, central banks and private businesses, which increasingly require the inclusion of climate risks in their macroeconomic forecasts to aid adaptive decision-making 11 , 12 . On the other hand, climate mitigation policy such as the Paris Climate Agreement is often evaluated by balancing the costs of its implementation against the benefits of avoiding projected physical damages. This evaluation occurs both formally through cost–benefit analyses 1 , 4 , 5 , 6 , as well as informally through public perception of mitigation and damage costs 13 .

Projections of future damages meet challenges when informing these debates, in particular the human biases relating to uncertainty and remoteness that are raised by long-term perspectives 14 . Here we aim to overcome such challenges by assessing the extent of economic damages from climate change to which the world is already committed by historical emissions and socio-economic inertia (the range of future emission scenarios that are considered socio-economically plausible 15 ). Such a focus on the near term limits the large uncertainties about diverging future emission trajectories, the resulting long-term climate response and the validity of applying historically observed climate–economic relations over long timescales during which socio-technical conditions may change considerably. As such, this focus aims to simplify the communication and maximize the credibility of projected economic damages from future climate change.

In projecting the future economic damages from climate change, we make use of recent advances in climate econometrics that provide evidence for impacts on sub-national economic growth from numerous components of the distribution of daily temperature and precipitation 3 , 7 , 8 . Using fixed-effects panel regression models to control for potential confounders, these studies exploit within-region variation in local temperature and precipitation in a panel of more than 1,600 regions worldwide, comprising climate and income data over the past 40 years, to identify the plausibly causal effects of changes in several climate variables on economic productivity 16 , 17 . Specifically, macroeconomic impacts have been identified from changing daily temperature variability, total annual precipitation, the annual number of wet days and extreme daily rainfall that occur in addition to those already identified from changing average temperature 2 , 3 , 18 . Moreover, regional heterogeneity in these effects based on the prevailing local climatic conditions has been found using interaction terms. The selection of these climate variables follows micro-level evidence for mechanisms related to the impacts of average temperatures on labour and agricultural productivity 2 , of temperature variability on agricultural productivity and health 7 , as well as of precipitation on agricultural productivity, labour outcomes and flood damages 8 (see Extended Data Table 1 for an overview, including more detailed references). References  7 , 8 contain a more detailed motivation for the use of these particular climate variables and provide extensive empirical tests about the robustness and nature of their effects on economic output, which are summarized in Methods . By accounting for these extra climatic variables at the sub-national level, we aim for a more comprehensive description of climate impacts with greater detail across both time and space.

Constraining the persistence of impacts

A key determinant and source of discrepancy in estimates of the magnitude of future climate damages is the extent to which the impact of a climate variable on economic growth rates persists. The two extreme cases in which these impacts persist indefinitely or only instantaneously are commonly referred to as growth or level effects 19 , 20 (see Methods section ‘Empirical model specification: fixed-effects distributed lag models’ for mathematical definitions). Recent work shows that future damages from climate change depend strongly on whether growth or level effects are assumed 20 . Following refs.  2 , 18 , we provide constraints on this persistence by using distributed lag models to test the significance of delayed effects separately for each climate variable. Notably, and in contrast to refs.  2 , 18 , we use climate variables in their first-differenced form following ref.  3 , implying a dependence of the growth rate on a change in climate variables. This choice means that a baseline specification without any lags constitutes a model prior of purely level effects, in which a permanent change in the climate has only an instantaneous effect on the growth rate 3 , 19 , 21 . By including lags, one can then test whether any effects may persist further. This is in contrast to the specification used by refs.  2 , 18 , in which climate variables are used without taking the first difference, implying a dependence of the growth rate on the level of climate variables. In this alternative case, the baseline specification without any lags constitutes a model prior of pure growth effects, in which a change in climate has an infinitely persistent effect on the growth rate. Consequently, including further lags in this alternative case tests whether the initial growth impact is recovered 18 , 19 , 21 . Both of these specifications suffer from the limiting possibility that, if too few lags are included, one might falsely accept the model prior. 
The limitations of including a very large number of lags, including loss of data and increasing statistical uncertainty with an increasing number of parameters, mean that such a possibility is likely. By choosing a specification in which the model prior is one of level effects, our approach is therefore conservative by design, avoiding assumptions of infinite persistence of climate impacts on growth and instead providing a lower bound on this persistence based on what is observable empirically (see Methods section ‘Empirical model specification: fixed-effects distributed lag models’ for further exposition of this framework). The conservative nature of such a choice is probably the reason that ref.  19 finds much greater consistency between the impacts projected by models that use the first difference of climate variables, as opposed to their levels.
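The modelling choice described above can be illustrated with a toy fixed-effects distributed lag regression. The sketch below is not the authors' specification (which involves five climate variables, interaction terms and more than 1,600 regions); it uses a single synthetic first-differenced temperature variable, applies a within transformation for region and year fixed effects, and sums the contemporaneous and lagged coefficients to obtain a cumulative marginal effect. All data and variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Synthetic panel of regions x years (illustrative, not the paper's data)
rng = np.random.default_rng(0)
regions, years, n_lags = 50, 40, 4
idx = pd.MultiIndex.from_product([range(regions), range(years)],
                                 names=["region", "year"])
df = pd.DataFrame(index=idx)
df["temp"] = rng.normal(15, 2, len(df))
df["growth"] = rng.normal(0.02, 0.01, len(df))

# First-difference the climate variable: a no-lag model is then a prior of
# pure *level* effects (a permanent climate change has only an instantaneous
# growth impact); adding lags tests whether effects persist further.
df["d_temp"] = df.groupby("region")["temp"].diff()
for k in range(1, n_lags + 1):
    df[f"d_temp_l{k}"] = df.groupby("region")["d_temp"].shift(k)
df = df.dropna()

X_cols = ["d_temp"] + [f"d_temp_l{k}" for k in range(1, n_lags + 1)]

def demean(s):
    # Within transformation: remove region and year fixed effects
    return (s - s.groupby("region").transform("mean")
              - s.groupby("year").transform("mean") + s.mean())

Y = demean(df["growth"])
X = df[X_cols].apply(demean)

# OLS on the demeaned data; the cumulative marginal effect is the sum of
# the contemporaneous and lagged coefficients
beta, *_ = np.linalg.lstsq(X.values, Y.values, rcond=None)
cumulative_effect = float(beta.sum())
print(dict(zip(X_cols, np.round(beta, 4))), round(cumulative_effect, 4))
```

With real data, the significance of the lagged coefficients (rather than a prior assumption) determines how much persistence the model admits, which is what makes this specification a lower bound.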

We begin our empirical analysis of the persistence of climate impacts on growth using ten lags of the first-differenced climate variables in fixed-effects distributed lag models. We detect substantial effects on economic growth at time lags of up to approximately 8–10 years for the temperature terms and up to approximately 4 years for the precipitation terms (Extended Data Fig. 1 and Extended Data Table 2 ). Furthermore, evaluation by means of information criteria indicates that the inclusion of all five climate variables and the use of these numbers of lags provide a preferable trade-off between best-fitting the data and including further terms that could cause overfitting, in comparison with model specifications excluding climate variables or including more or fewer lags (Extended Data Fig. 3 , Supplementary Methods Section  1 and Supplementary Table 1 ). We therefore remove statistically insignificant terms at later lags (Supplementary Figs. 1 – 3 and Supplementary Tables 2 – 4 ). Further tests using Monte Carlo simulations demonstrate that the empirical models are robust to autocorrelation in the lagged climate variables (Supplementary Methods Section  2 and Supplementary Figs. 4 and 5 ), that information criteria provide an effective indicator for lag selection (Supplementary Methods Section  2 and Supplementary Fig. 6 ), that the results are robust to concerns of imperfect multicollinearity between climate variables and that including several climate variables is actually necessary to isolate their separate effects (Supplementary Methods Section  3 and Supplementary Fig. 7 ). We provide a further robustness check using a restricted distributed lag model to limit oscillations in the lagged parameter estimates that may result from autocorrelation, finding that it provides similar estimates of cumulative marginal effects to the unrestricted model (Supplementary Methods Section 4 and Supplementary Figs. 8 and 9 ). 
Finally, to explicitly account for any outstanding uncertainty arising from the precise choice of the number of lags, we include empirical models with marginally different numbers of lags in the error-sampling procedure of our projection of future damages. On the basis of the lag-selection procedure (the significance of lagged terms in Extended Data Fig. 1 and Extended Data Table 2 , as well as information criteria in Extended Data Fig. 3 ), we sample from models with eight to ten lags for temperature and four for precipitation (models shown in Supplementary Figs. 1 – 3 and Supplementary Tables 2 – 4 ). In summary, this empirical approach to constrain the persistence of climate impacts on economic growth rates is conservative by design in avoiding assumptions of infinite persistence, but nevertheless provides a lower bound on the extent of impact persistence that is robust to the numerous tests outlined above.
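The lag-selection step can be sketched with information criteria on a synthetic series. Here the simulated effects truly persist for two lags; models with increasing lag counts are fitted by OLS and compared by AIC and BIC under a Gaussian likelihood. This is a deliberately simplified stand-in for the panel-based selection reported in Extended Data Fig. 3.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
x = rng.normal(size=n)
# Synthetic growth series whose response to x persists for two lags only
y = 0.5 * x + 0.3 * np.roll(x, 1) + 0.2 * np.roll(x, 2) \
    + rng.normal(scale=0.5, size=n)

def fit_aic_bic(max_lag):
    # Design matrix with the contemporaneous term and max_lag lags;
    # slicing off the first max_lag rows drops the wrapped-around values
    cols = [np.roll(x, k) for k in range(max_lag + 1)]
    X = np.column_stack(cols)[max_lag:]
    Y = y[max_lag:]
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    nobs, k = len(Y), max_lag + 1
    sigma2 = resid @ resid / nobs
    ll = -0.5 * nobs * (np.log(2 * np.pi * sigma2) + 1)  # Gaussian log-likelihood
    return 2 * k - 2 * ll, k * np.log(nobs) - 2 * ll     # AIC, BIC

scores = {lag: fit_aic_bic(lag) for lag in range(6)}
best_aic = min(scores, key=lambda lag: scores[lag][0])
print(best_aic)  # lag order preferred by AIC
```

Because the true effects last two lags, a too-short model fits markedly worse, while the penalty terms guard against overfitting with excess lags, mirroring the trade-off described above.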

Committed damages until mid-century

We combine these empirical economic response functions (Supplementary Figs. 1 – 3 and Supplementary Tables 2 – 4 ) with an ensemble of 21 climate models (see Supplementary Table 5 ) from the Coupled Model Intercomparison Project Phase 6 (CMIP-6) 22 to project the macroeconomic damages from these components of physical climate change (see Methods for further details). Bias-adjusted climate models that provide a highly accurate reproduction of observed climatological patterns with limited uncertainty (Supplementary Table 6 ) are used to avoid introducing biases in the projections. Following a well-developed literature 2 , 3 , 19 , these projections do not aim to provide a prediction of future economic growth. Instead, they are a projection of the exogenous impact of future climate conditions on the economy relative to the baselines specified by socio-economic projections, based on the plausibly causal relationships inferred by the empirical models and assuming ceteris paribus. Other exogenous factors relevant for the prediction of economic output are purposefully assumed constant.

A Monte Carlo procedure that samples from climate model projections, empirical models with different numbers of lags and model parameter estimates (obtained by 1,000 block-bootstrap resamples of each of the regressions in Supplementary Figs. 1 – 3 and Supplementary Tables 2 – 4 ) is used to estimate the combined uncertainty from these sources. Given these uncertainty distributions, we find that projected global damages are statistically indistinguishable across the two most extreme emission scenarios until 2049 (at the 5% significance level; Fig. 1 ). As such, the climate damages occurring before this time constitute those to which the world is already committed owing to the combination of past emissions and the range of future emission scenarios that are considered socio-economically plausible 15 . These committed damages comprise a permanent income reduction of 19% on average globally (population-weighted average) in comparison with a baseline without climate-change impacts (with a likely range of 11–29%, following the likelihood classification adopted by the Intergovernmental Panel on Climate Change (IPCC); see caption of Fig. 1 ). Even though levels of income per capita generally still increase relative to those of today, this constitutes a permanent income reduction for most regions, including North America and Europe (each with median income reductions of approximately 11%) and with South Asia and Africa being the most strongly affected (each with median income reductions of approximately 22%; Fig. 1 ). Under a middle-of-the road scenario of future income development (SSP2, in which SSP stands for Shared Socio-economic Pathway), this corresponds to global annual damages in 2049 of 38 trillion in 2005 international dollars (likely range of 19–59 trillion 2005 international dollars). 
Compared with empirical specifications that assume pure growth or pure level effects, our preferred specification that provides a robust lower bound on the extent of climate impact persistence produces damages between these two extreme assumptions (Extended Data Fig. 3 ).
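A stripped-down analogue of this uncertainty propagation, using purely illustrative numbers, can combine the three sampled sources (climate-model choice, lag specification and bootstrapped parameters) as follows. The block bootstrap resamples contiguous segments to preserve serial correlation, in the spirit of the paper's 1,000-resample procedure; none of the values below are the paper's estimates.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative inputs (synthetic): an ensemble of 21 climate-model warming
# projections, three damage sensitivities from alternative lag choices, and
# a residual series standing in for regression-parameter uncertainty
climate_models = rng.normal(2.0, 0.3, size=21)   # warming by 2049 (deg C)
lag_specs = [0.08, 0.09, 0.10]                   # fractional damage per deg C
growth_resid = rng.normal(0, 0.01, size=40)

def block_bootstrap(series, block=5, rng=rng):
    """Resample contiguous blocks to preserve serial correlation."""
    n = len(series)
    starts = rng.integers(0, n - block, size=n // block + 1)
    return np.concatenate([series[s:s + block] for s in starts])[:n]

damages = []
for _ in range(1000):
    warming = rng.choice(climate_models)          # sample a climate model
    beta = rng.choice(lag_specs)                  # sample a lag specification
    noise = block_bootstrap(growth_resid).mean()  # bootstrapped-parameter proxy
    damages.append(beta * warming + noise)

damages = np.array(damages)
lo, med, hi = np.percentile(damages, [17, 50, 83])  # central 66% ("likely") range
print(round(med, 3), round(lo, 3), round(hi, 3))
```

The resulting percentile band is the analogue of the likely range quoted in the text; in the paper, each draw also involves a full regional projection rather than a single scalar.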

figure 1

Estimates of the projected reduction in income per capita from changes in all climate variables based on empirical models of climate impacts on economic output with a robust lower bound on their persistence (Extended Data Fig. 1 ) under a low-emission scenario compatible with the 2 °C warming target and a high-emission scenario (SSP2-RCP2.6 and SSP5-RCP8.5, respectively) are shown in purple and orange, respectively. Shading represents the 34% and 10% confidence intervals reflecting the likely and very likely ranges, respectively (following the likelihood classification adopted by the IPCC), having estimated uncertainty from a Monte Carlo procedure, which samples the uncertainty from the choice of physical climate models, empirical models with different numbers of lags and bootstrapped estimates of the regression parameters shown in Supplementary Figs. 1 – 3 . Vertical dashed lines show the time at which the climate damages of the two emission scenarios diverge at the 5% and 1% significance levels based on the distribution of differences between emission scenarios arising from the uncertainty sampling discussed above. Note that uncertainty in the difference of the two scenarios is smaller than the combined uncertainty of the two respective scenarios because samples of the uncertainty (climate model and empirical model choice, as well as model parameter bootstrap) are consistent across the two emission scenarios, hence the divergence of damages occurs while the uncertainty bounds of the two separate damage scenarios still overlap. Estimates of global mitigation costs from the three IAMs that provide results for the SSP2 baseline and SSP2-RCP2.6 scenario are shown in light green in the top panel, with the median of these estimates shown in bold.

Damages already outweigh mitigation costs

We compare the damages to which the world is committed over the next 25 years to estimates of the mitigation costs required to achieve the Paris Climate Agreement. Taking estimates of mitigation costs from the three integrated assessment models (IAMs) in the IPCC AR6 database 23 that provide results under comparable scenarios (SSP2 baseline and SSP2-RCP2.6, in which RCP stands for Representative Concentration Pathway), we find that the median committed climate damages are larger than the median mitigation costs in 2050 (six trillion in 2005 international dollars) by a factor of approximately six (note that estimates of mitigation costs are only provided every 10 years by the IAMs and so a comparison in 2049 is not possible). This comparison simply aims to compare the magnitude of future damages against mitigation costs, rather than to conduct a formal cost–benefit analysis of transitioning from one emission path to another. Formal cost–benefit analyses typically find that the net benefits of mitigation only emerge after 2050 (ref.  5 ), which may lead some to conclude that physical damages from climate change are simply not large enough to outweigh mitigation costs until the second half of the century. Our simple comparison of their magnitudes makes clear that damages are actually already considerably larger than mitigation costs and the delayed emergence of net mitigation benefits results primarily from the fact that damages across different emission paths are indistinguishable until mid-century (Fig. 1 ).

Although these near-term damages constitute those to which the world is already committed, we note that damage estimates diverge strongly across emission scenarios after 2049, conveying the clear benefits of mitigation from a purely economic point of view that have been emphasized in previous studies 4 , 24 . As well as the uncertainties assessed in Fig. 1 , these conclusions are robust to structural choices, such as the timescale with which changes in the moderating variables of the empirical models are estimated (Supplementary Figs. 10 and 11 ), as well as the order in which one accounts for the intertemporal and international components of currency comparison (Supplementary Fig. 12 ; see Methods for further details).

Damages from variability and extremes

Committed damages primarily arise through changes in average temperature (Fig. 2 ). This reflects the fact that projected changes in average temperature are larger than those in other climate variables when expressed as a function of their historical interannual variability (Extended Data Fig. 4 ). Because the historical variability is that on which the empirical models are estimated, larger projected changes in comparison with this variability probably lead to larger future impacts in a purely statistical sense. From a mechanistic perspective, one may plausibly interpret this result as implying that future changes in average temperature are the most unprecedented from the perspective of the historical fluctuations to which the economy is accustomed and therefore will cause the most damage. This insight may prove useful in terms of guiding adaptation measures to the sources of greatest damage.
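The comparison invoked here, expressing a projected change in units of historical interannual variability, reduces to a simple standardization. A minimal sketch with synthetic numbers (the values below are not the paper's):

```python
import numpy as np

rng = np.random.default_rng(3)
hist_temp = 14 + rng.normal(0, 0.4, size=40)  # 40 years of annual means (synthetic)
projected_temp = 16.0                          # projected future value (synthetic)

# Change expressed in units of historical interannual standard deviations
sigma_hist = hist_temp.std(ddof=1)
standardized_change = (projected_temp - hist_temp.mean()) / sigma_hist
print(round(standardized_change, 2))
```

A larger standardized change means the projection lies further outside the fluctuations on which the empirical response was estimated, which is the statistical sense in which average temperature dominates the damages.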

figure 2

Estimates of the median projected reduction in sub-national income per capita across emission scenarios (SSP2-RCP2.6 and SSP2-RCP8.5) as well as climate model, empirical model and model parameter uncertainty in the year in which climate damages diverge at the 5% level (2049, as identified in Fig. 1 ). a , Impacts arising from all climate variables. b – f , Impacts arising separately from changes in annual mean temperature ( b ), daily temperature variability ( c ), total annual precipitation ( d ), the annual number of wet days (>1 mm) ( e ) and extreme daily rainfall ( f ) (see Methods for further definitions). Data on national administrative boundaries are obtained from the GADM database version 3.6 and are freely available for academic use ( https://gadm.org/ ).

Nevertheless, future damages based on empirical models that consider changes in annual average temperature only and exclude the other climate variables constitute income reductions of only 13% in 2049 (Extended Data Fig. 5a , likely range 5–21%). This suggests that accounting for the other components of the distribution of temperature and precipitation raises net damages by nearly 50%. This increase arises through the further damages that these climatic components cause, but also because their inclusion reveals a stronger negative economic response to average temperatures (Extended Data Fig. 5b ). The latter finding is consistent with our Monte Carlo simulations, which suggest that the magnitude of the effect of average temperature on economic growth is underestimated unless accounting for the impacts of other correlated climate variables (Supplementary Fig. 7 ).

In terms of the relative contributions of the different climatic components to overall damages, we find that accounting for daily temperature variability causes the largest increase in overall damages relative to empirical frameworks that only consider changes in annual average temperature (4.9 percentage points, likely range 2.4–8.7 percentage points, equivalent to approximately 10 trillion international dollars). Accounting for precipitation causes smaller increases in overall damages, which are—nevertheless—equivalent to approximately 1.2 trillion international dollars: 0.01 percentage points (−0.37–0.33 percentage points), 0.34 percentage points (0.07–0.90 percentage points) and 0.36 percentage points (0.13–0.65 percentage points) from total annual precipitation, the number of wet days and extreme daily precipitation, respectively. Moreover, climate models seem to underestimate future changes in temperature variability 25 and extreme precipitation 26 , 27 in response to anthropogenic forcing as compared with that observed historically, suggesting that the true impacts from these variables may be larger.

The distribution of committed damages

The spatial distribution of committed damages (Fig. 2a ) reflects a complex interplay between the patterns of future change in several climatic components and those of historical economic vulnerability to changes in those variables. Damages resulting from increasing annual mean temperature (Fig. 2b ) are negative almost everywhere globally, and larger at lower latitudes in regions in which temperatures are already higher and economic vulnerability to temperature increases is greatest (see the response heterogeneity to mean temperature embodied in Extended Data Fig. 1a ). This occurs despite the amplified warming projected at higher latitudes 28 , suggesting that regional heterogeneity in economic vulnerability to temperature changes outweighs heterogeneity in the magnitude of future warming (Supplementary Fig. 13a ). Economic damages owing to daily temperature variability (Fig. 2c ) exhibit a strong latitudinal polarisation, primarily reflecting the physical response of daily variability to greenhouse forcing in which increases in variability across lower latitudes (and Europe) contrast decreases at high latitudes 25 (Supplementary Fig. 13b ). These two temperature terms are the dominant determinants of the pattern of overall damages (Fig. 2a ), which exhibits a strong polarity with damages across most of the globe except at the highest northern latitudes. Future changes in total annual precipitation mainly bring economic benefits except in regions of drying, such as the Mediterranean and central South America (Fig. 2d and Supplementary Fig. 13c ), but these benefits are opposed by changes in the number of wet days, which produce damages with a similar pattern of opposite sign (Fig. 2e and Supplementary Fig. 13d ). By contrast, changes in extreme daily rainfall produce damages in all regions, reflecting the intensification of daily rainfall extremes over global land areas 29 , 30 (Fig. 2f and Supplementary Fig. 13e ).

The spatial distribution of committed damages implies considerable injustice along two dimensions: culpability for the historical emissions that have caused climate change and pre-existing levels of socio-economic welfare. Spearman’s rank correlations indicate that committed damages are significantly larger in countries with smaller historical cumulative emissions, as well as in regions with lower current income per capita (Fig. 3 ). This implies that those countries that will suffer the most from the damages already committed are those that are least responsible for climate change and which also have the least resources to adapt to it.

Figure 3

Estimates of the median projected change in national income per capita across emission scenarios (RCP2.6 and RCP8.5), as well as across climate model, empirical model and model parameter uncertainty, in the year in which climate damages diverge at the 5% level (2049, as identified in Fig. 1) are plotted against cumulative national emissions per capita in 2020 (from the Global Carbon Project) and coloured by national income per capita in 2020 (from the World Bank) in a, and vice versa in b. In each panel, the size of each scatter point is weighted by the national population in 2020 (from the World Bank). Inset numbers indicate the Spearman's rank correlation ρ and P-values for a hypothesis test whose null hypothesis is no correlation, as well as the Spearman's rank correlation weighted by national population.

To further quantify this heterogeneity, we assess the difference in committed damages between the upper and lower quartiles of regions when ranked by present income levels and by historical cumulative emissions (using a population weighting both to define the quartiles and to estimate the group averages). On average, the quartile of countries with lower incomes is committed to an income loss that is 8.9 percentage points (or 61%) greater than that of the upper quartile (Extended Data Fig. 6), with a likely range of 3.8–14.7 percentage points across the uncertainty sampling of our damage projections (following the likelihood classification adopted by the IPCC). Similarly, the quartile of countries with lower historical cumulative emissions is committed to an income loss that is 6.9 percentage points (or 40%) greater than that of the upper quartile, with a likely range of 0.27–12 percentage points. These patterns reemphasize the prevalence of injustice in climate impacts 31,32,33 in the context of the damages to which the world is already committed by historical emissions and socio-economic inertia.
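The population-weighted quartile comparison described above can be sketched as follows. This is a toy illustration with made-up numbers, not the paper's data or code; the function name and inputs are assumptions:

```python
import numpy as np

def weighted_quartile_gap(rank_var, damages, pop):
    """Population-weighted gap in mean damages between the bottom and top
    quartile of regions ranked by `rank_var` (e.g. income per capita).
    Quartile boundaries are defined on the population-weighted CDF."""
    order = np.argsort(rank_var)
    rank_var, damages, pop = rank_var[order], damages[order], pop[order]
    cdf = np.cumsum(pop) / pop.sum()      # population-weighted CDF
    lower = cdf <= 0.25                   # bottom quartile by population
    upper = cdf > 0.75                    # top quartile by population
    mean = lambda mask: np.average(damages[mask], weights=pop[mask])
    return mean(lower) - mean(upper)

# toy example: poorer regions (low income) suffer larger income losses
income = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
loss = np.array([-20.0, -18.0, -15.0, -14.0, -12.0, -10.0, -8.0, -6.0])  # % change
pop = np.ones(8)
gap = weighted_quartile_gap(income, loss, pop)  # bottom minus top quartile
```

With equal populations, the bottom and top quartiles are simply the first and last two regions, and the gap is the difference of their mean losses.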

Contextualizing the magnitude of damages

The magnitude of projected economic damages exceeds previous literature estimates 2,3, owing to several developments on previous approaches. Our estimates are larger than those of ref. 2 (see first row of Extended Data Table 3), primarily because sub-national estimates typically show a steeper temperature response (see also refs. 3,34) and because accounting for other climatic components raises damage estimates (Extended Data Fig. 5). However, we note that our empirical approach using first-differenced climate variables is conservative compared with that of ref. 2 in regard to the persistence of climate impacts on growth (see introduction and Methods section 'Empirical model specification: fixed-effects distributed lag models'), an important determinant of the magnitude of long-term damages 19,21. Using an empirical specification similar to that of ref. 2, which assumes infinite persistence, while maintaining the rest of our approach (sub-national data and further climate variables), produces considerably larger damages (purple curve of Extended Data Fig. 3). Compared with studies that do take the first difference of climate variables 3,35, our estimates are also larger (see second and third rows of Extended Data Table 3). The inclusion of further climate variables (Extended Data Fig. 5) and of a sufficient number of lags to more adequately capture the extent of impact persistence (Extended Data Figs. 1 and 2) are the main sources of this difference, as is the use of specifications that capture nonlinearities in the temperature response when compared with ref. 35. In summary, our estimates build on previous studies by incorporating the latest data and empirical insights 7,8, as well as by providing a robust empirical lower bound on the persistence of impacts on economic growth, which constitutes a middle ground between the extremes of the growth-versus-levels debate 19,21 (Extended Data Fig. 3).

Compared with the fraction of variance explained by the empirical models historically (<5%), the projected reductions in income of 19% may seem large. They arise because projected changes in climatic conditions are much larger than those experienced historically, particularly for changes in average temperature (Extended Data Fig. 4). As such, any assessment of future climate-change impacts necessarily requires extrapolation outside the range of the historical data on which the empirical impact models were estimated. Nevertheless, these models constitute the state of the art for inference of plausibly causal climate impacts from observed data. Moreover, we take explicit steps to limit out-of-sample extrapolation by capping the moderating variables of the interaction terms at the 95th percentile of the historical distribution (see Methods). This avoids extrapolating the marginal effects outside what was observed historically. Given the nonlinear response of economic output to annual mean temperature (Extended Data Fig. 1 and Extended Data Table 2), this is a conservative choice that limits the magnitude of the damages that we project. Furthermore, back-of-the-envelope calculations indicate that the projected damages are consistent with the magnitude and patterns of historical economic development (see Supplementary Discussion Section 5).

Missing impacts and spatial spillovers

Despite assessing several climatic components whose economic impacts have recently been identified 3,7,8, this assessment of aggregate climate damages should not be considered comprehensive. Important channels such as impacts from heatwaves 31, sea-level rise 36, tropical cyclones 37 and tipping points 38,39, as well as non-market damages such as those to ecosystems 40 and human health 41, are not considered in these estimates. Sea-level rise is unlikely to be feasibly incorporated into empirical assessments such as this one because historical sea-level variability is mostly small. Non-market damages are inherently intractable within our estimates of impacts on aggregate monetary output, and estimates of these impacts could arguably be considered additional to those identified here. Recent empirical work suggests that accounting for these channels would probably raise estimates of committed damages, with larger damages continuing to arise in the global south 31,36,37,38,39,40,41,42.

Moreover, our main empirical analysis does not explicitly evaluate the potential for impacts in local regions to produce effects that 'spill over' into other regions. Such effects may further mitigate or amplify the impacts we estimate, for example, if companies relocate production from one affected region to another or if impacts propagate along supply chains. The current literature indicates that trade plays a substantial role in propagating spillover effects 43,44, making their assessment at the sub-national level challenging without data on sub-national trade dependencies. Studies accounting only for spatially adjacent neighbours indicate that negative impacts in one region induce further negative impacts in neighbouring regions 45,46,47,48, suggesting that our projected damages are probably conservative in excluding these effects. In Supplementary Fig. 14, we assess spillovers from neighbouring regions using a spatial-lag model. For simplicity, this analysis excludes temporal lags, focusing only on contemporaneous effects. The results show that accounting for spatial spillovers can amplify both the overall magnitude and the heterogeneity of impacts. Consistent with previous literature, this indicates that the overall magnitude (Fig. 1) and heterogeneity (Fig. 3) of the damages that we project in our main specification may be conservative in not explicitly accounting for spillovers. Further analysis that addresses both spatially connected and trade-connected spillovers, while also accounting for delayed impacts using temporal lags, would be necessary to address this question fully. These approaches offer fruitful avenues for further research but are beyond the scope of this manuscript, which primarily aims to explore the impacts of different climate conditions and their persistence.

Policy implications

We find that the economic damages resulting from climate change until 2049 are those to which the world economy is already committed and that they greatly outweigh the costs required to mitigate emissions in line with the 2 °C target of the Paris Climate Agreement (Fig. 1). This assessment is complementary to formal analyses of the net costs and benefits associated with moving from one emission path to another, which typically find that the net benefits of mitigation only emerge in the second half of the century 5. Our simple comparison of the magnitudes of damages and mitigation costs makes clear that this is primarily because damages are indistinguishable across emission scenarios (that is, committed) until mid-century (Fig. 1) and are already much larger than mitigation costs. For simplicity, and owing to data availability, we compare damages with mitigation costs at the global level. Regional estimates of mitigation costs may shed further light on the national incentives for mitigation at which our results already hint, which are of relevance for international climate policy. Although these damages are committed from a mitigation perspective, adaptation may provide an opportunity to reduce them. Moreover, the strong divergence of damages after mid-century reemphasizes the clear benefits of mitigation from a purely economic perspective, as highlighted in previous studies 1,4,6,24.

Historical climate data

Historical daily 2-m temperature and precipitation totals (in mm) are obtained for the period 1979–2019 from the W5E5 database. The W5E5 dataset is based on ERA-5, a state-of-the-art reanalysis of historical observations, but has been bias-adjusted using version 2.0 of the WATCH Forcing Data methodology applied to ERA-5 reanalysis data, together with precipitation data from version 2.3 of the Global Precipitation Climatology Project, to better reflect ground-based measurements 49,50,51. We obtain these data on a 0.5° × 0.5° grid from the Inter-Sectoral Impact Model Intercomparison Project (ISIMIP) database. Notably, these historical data have been used to bias-adjust the future climate projections from CMIP-6 (see the following section), ensuring consistency between the distribution of historical daily weather on which our empirical models were estimated and the climate projections used to estimate future damages. These data are publicly available from the ISIMIP database. See refs. 7,8 for robustness tests of the empirical models to the choice of climate data reanalysis products.

Future climate data

Daily 2-m temperature and precipitation totals (in mm) are taken from 21 climate models participating in CMIP-6 under a high (RCP8.5) and a low (RCP2.6) greenhouse gas emission scenario from 2015 to 2100. The data have been bias-adjusted and statistically downscaled to a common half-degree grid to reflect the historical distribution of daily temperature and precipitation of the W5E5 dataset using the trend-preserving method developed by the ISIMIP 50 , 52 . As such, the climate model data reproduce observed climatological patterns exceptionally well (Supplementary Table 5 ). Gridded data are publicly available from the ISIMIP database.

Historical economic data

Historical economic data come from the DOSE database of sub-national economic output 53. We use a recent revision of the DOSE dataset that provides data for 1,660 sub-national regions across 83 countries, with varying temporal coverage from 1960 to 2019. Sub-national units constitute the first administrative division below the national level, for example, states for the USA and provinces for China. Data come from measures of gross regional product per capita (GRPpc) or income per capita in local currencies, reflecting the values reported by national statistical agencies, yearbooks and, in some cases, the academic literature. We follow previous literature 3,7,8,54 and assess real sub-national output per capita by first converting values from local currencies to US dollars, to account for diverging national inflationary tendencies, and then accounting for US inflation using a US deflator. Alternatively, one might first account for national inflation and then convert between currencies. Supplementary Fig. 12 demonstrates that our conclusions are consistent when accounting for price changes in the reversed order, although the magnitude of estimated damages varies. See the documentation of the DOSE dataset for further discussion of these choices. Conversions between currencies are conducted using exchange rates from the FRED database of the Federal Reserve Bank of St. Louis 55 and national deflators from the World Bank 56.

Future socio-economic data

Baseline gridded gross domestic product (GDP) and population data for the period 2015–2100 are taken from the middle-of-the-road scenario SSP2 (ref.  15 ). Population data have been downscaled to a half-degree grid by the ISIMIP following the methodologies of refs.  57 , 58 , which we then aggregate to the sub-national level of our economic data using the spatial aggregation procedure described below. Because current methodologies for downscaling the GDP of the SSPs use downscaled population to do so, per-capita estimates of GDP with a realistic distribution at the sub-national level are not readily available for the SSPs. We therefore use national-level GDP per capita (GDPpc) projections for all sub-national regions of a given country, assuming homogeneity within countries in terms of baseline GDPpc. Here we use projections that have been updated to account for the impact of the COVID-19 pandemic on the trajectory of future income, while remaining consistent with the long-term development of the SSPs 59 . The choice of baseline SSP alters the magnitude of projected climate damages in monetary terms, but when assessed in terms of percentage change from the baseline, the choice of socio-economic scenario is inconsequential. Gridded SSP population data and national-level GDPpc data are publicly available from the ISIMIP database. Sub-national estimates as used in this study are available in the code and data replication files.

Climate variables

Following recent literature 3 , 7 , 8 , we calculate an array of climate variables for which substantial impacts on macroeconomic output have been identified empirically, supported by further evidence at the micro level for plausible underlying mechanisms. See refs.  7 , 8 for an extensive motivation for the use of these particular climate variables and for detailed empirical tests on the nature and robustness of their effects on economic output. To summarize, these studies have found evidence for independent impacts on economic growth rates from annual average temperature, daily temperature variability, total annual precipitation, the annual number of wet days and extreme daily rainfall. Assessments of daily temperature variability were motivated by evidence of impacts on agricultural output and human health, as well as macroeconomic literature on the impacts of volatility on growth when manifest in different dimensions, such as government spending, exchange rates and even output itself 7 . Assessments of precipitation impacts were motivated by evidence of impacts on agricultural productivity, metropolitan labour outcomes and conflict, as well as damages caused by flash flooding 8 . See Extended Data Table 1 for detailed references to empirical studies of these physical mechanisms. Marked impacts of daily temperature variability, total annual precipitation, the number of wet days and extreme daily rainfall on macroeconomic output were identified robustly across different climate datasets, spatial aggregation schemes, specifications of regional time trends and error-clustering approaches. They were also found to be robust to the consideration of temperature extremes 7 , 8 . 
Furthermore, these climate variables were identified as having independent effects on economic output 7 , 8 , which we further explain here using Monte Carlo simulations to demonstrate the robustness of the results to concerns of imperfect multicollinearity between climate variables (Supplementary Methods Section  2 ), as well as by using information criteria (Supplementary Table 1 ) to demonstrate that including several lagged climate variables provides a preferable trade-off between optimally describing the data and limiting the possibility of overfitting.

We calculate these variables from the distribution of daily temperature, \(T_{x,d}\), and precipitation, \(P_{x,d}\), at the grid-cell level, x, for both the historical and future climate data. As well as annual mean temperature, \({\bar{T}}_{x,y}\), and annual total precipitation, \(P_{x,y}\), we calculate annual, y, measures of daily temperature variability, \({\widetilde{T}}_{x,y}\):

$$\widetilde{T}_{x,y}=\frac{1}{12}\sum_{m=1}^{12}\sqrt{\frac{1}{D_{m}}\sum_{d=1}^{D_{m}}{\left(T_{x,d,m,y}-\bar{T}_{x,m,y}\right)}^{2}},\quad (1)$$

the number of wet days, \({\mathrm{Pwd}}_{x,y}\):

$$\mathrm{Pwd}_{x,y}=\sum_{d=1}^{D_{y}}H\left(P_{x,d}-1\,\mathrm{mm}\right),\quad (2)$$

and extreme daily rainfall:

$$\mathrm{Pext}_{x,y}=\sum_{d=1}^{D_{y}}P_{x,d}\,H\left(P_{x,d}-P99.9_{x}\right),\quad (3)$$

in which \(T_{x,d,m,y}\) is the grid-cell-specific daily temperature in month m and year y, \({\bar{T}}_{x,m,y}\) is the year- and grid-cell-specific monthly, m, mean temperature, \(D_{m}\) and \(D_{y}\) are the number of days in a given month m or year y, respectively, H is the Heaviside step function, 1 mm is the threshold used to define wet days and \(P99.9_{x}\) is the 99.9th percentile of historical (1979–2019) daily precipitation at the grid-cell level. Units of the climate measures are degrees Celsius for annual mean temperature and daily temperature variability, millimetres for total annual precipitation and extreme daily precipitation, and days for the annual number of wet days.
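The per-cell calculation of these annual measures can be sketched as follows. This is a minimal illustration of the definitions given in the text, with assumed array layouts (twelve per-month arrays of daily temperatures, one array of daily precipitation totals), not the paper's processing code:

```python
import numpy as np

def climate_variables(T_monthly, P_daily, p999_x):
    """Annual climate measures for a single grid cell and year.
    T_monthly: list of 12 arrays of daily temperatures (one per month);
    P_daily: the year's daily precipitation totals in mm;
    p999_x: historical 99.9th percentile of daily precipitation at this cell."""
    T_mean = np.concatenate(T_monthly).mean()          # annual mean temperature
    # daily variability: average over months of the within-month standard
    # deviation of daily temperature
    T_var = np.mean([np.sqrt(np.mean((Tm - Tm.mean()) ** 2)) for Tm in T_monthly])
    P_total = P_daily.sum()                            # total annual precipitation
    Pwd = int((P_daily > 1.0).sum())                   # number of wet days (>1 mm)
    Pext = P_daily[P_daily > p999_x].sum()             # rainfall on extreme days
    return T_mean, T_var, P_total, Pwd, Pext

# toy year: 12 identical two-day months, four rain events
T_monthly = [np.array([0.0, 2.0])] * 12
P_daily = np.array([0.5, 2.0, 3.0, 100.0])
T_mean, T_var, P_total, Pwd, Pext = climate_variables(T_monthly, P_daily, p999_x=50.0)
```

In the toy data, each month has mean 1 °C and within-month standard deviation 1 °C, three days exceed the 1-mm wet-day threshold and only the 100-mm event exceeds the extreme-rainfall percentile.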

We also calculated weighted standard deviations of monthly rainfall totals, as used in ref. 8, but do not include them in our projections because we find that, when accounting for delayed effects, their effect becomes statistically indistinguishable from zero and is better captured by changes in total annual rainfall.

Spatial aggregation

We aggregate grid-cell-level historical and future climate measures, as well as grid-cell-level future GDPpc and population, to the level of the first administrative unit below national level of the GADM database, using an area-weighting algorithm that estimates the portion of each grid cell falling within an administrative boundary. We use this as our baseline specification following previous findings that the effect of area or population weighting at the sub-national level is negligible 7 , 8 .
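The area-weighting step can be illustrated with a minimal sketch. The overlap fractions here are given directly as inputs; in practice they would come from intersecting grid-cell polygons with administrative boundaries:

```python
import numpy as np

def area_weighted_aggregate(cell_values, overlap_fraction, cell_area):
    """Aggregate grid-cell values to one administrative region, weighting each
    cell by the estimated portion of its area falling inside the boundary."""
    weights = overlap_fraction * cell_area
    return np.average(cell_values, weights=weights)

# two cells: one fully inside the region, one half inside
value = area_weighted_aggregate(
    cell_values=np.array([10.0, 22.0]),
    overlap_fraction=np.array([1.0, 0.5]),
    cell_area=np.array([1.0, 1.0]),
)
```

With these inputs the regional value is (10 × 1 + 22 × 0.5) / 1.5 = 14.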

Empirical model specification: fixed-effects distributed lag models

Following a wide range of climate econometric literature 16 , 60 , we use panel regression models with a selection of fixed effects and time trends to isolate plausibly exogenous variation with which to maximize confidence in a causal interpretation of the effects of climate on economic growth rates. The use of region fixed effects, μ r , accounts for unobserved time-invariant differences between regions, such as prevailing climatic norms and growth rates owing to historical and geopolitical factors. The use of yearly fixed effects, η y , accounts for regionally invariant annual shocks to the global climate or economy such as the El Niño–Southern Oscillation or global recessions. In our baseline specification, we also include region-specific linear time trends, k r y , to exclude the possibility of spurious correlations resulting from common slow-moving trends in climate and growth.
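The role of the region and year fixed effects can be illustrated with a small balanced-panel simulation. This is toy data and a plain within transformation, not the paper's estimation code (the paper's regressions use the fixest package in R):

```python
import numpy as np

def twoway_demean(x, region, year):
    """Two-way within transformation removing region and year fixed effects:
    x minus region means minus year means plus the grand mean (exact for
    balanced panels)."""
    out = x.astype(float).copy()
    for r in np.unique(region):
        out[region == r] -= x[region == r].mean()
    for t in np.unique(year):
        out[year == t] -= x[year == t].mean()
    return out + x.mean()

# balanced toy panel with known coefficient beta = 2 and no noise
rng = np.random.default_rng(0)
R, T = 5, 8
region = np.repeat(np.arange(R), T)
year = np.tile(np.arange(T), R)
c = rng.normal(size=R * T)                        # climate variable
mu, eta = rng.normal(size=R), rng.normal(size=T)  # region and year effects
y = mu[region] + eta[year] + 2.0 * c
yd, cd = twoway_demean(y, region, year), twoway_demean(c, region, year)
beta_hat = (cd @ yd) / (cd @ cd)                  # recovers beta exactly
```

Because the within transformation annihilates the region and year effects exactly in a balanced panel, the slope on the demeaned climate variable recovers the true coefficient.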

The persistence of climate impacts on economic growth rates is a key determinant of the long-term magnitude of damages. Methods for inferring the extent of persistence of impacts on growth rates have typically used lagged climate variables to evaluate the presence of delayed effects or catch-up dynamics 2,18. For example, consider starting from a model in which a climate condition, \(C_{r,y}\) (for example, annual mean temperature), affects the growth rate, \(\Delta\mathrm{lgrp}_{r,y}\) (the first difference of the logarithm of gross regional product), of region r in year y:

$$\Delta\mathrm{lgrp}_{r,y}=\mu_{r}+\eta_{y}+\alpha\,C_{r,y}+\epsilon_{r,y},\quad (4)$$

which we refer to as a 'pure growth effects' model in the main text. Typically, further lags are included,

$$\Delta\mathrm{lgrp}_{r,y}=\mu_{r}+\eta_{y}+\sum_{L=0}^{NL}\alpha_{L}\,C_{r,y-L}+\epsilon_{r,y},\quad (5)$$

and the cumulative effect of all lagged terms is evaluated to assess the extent to which climate impacts on growth rates persist. Following ref. 18, in the case that

$$\sum_{L=0}^{NL}\alpha_{L}\neq 0,\quad (6)$$

the implication is that impacts on the growth rate persist up to NL years after the initial shock (possibly to a weaker or a stronger extent), whereas if

$$\sum_{L=0}^{NL}\alpha_{L}=0,$$

then the initial impact on the growth rate is recovered after NL years and the effect is only one on the level of output. However, we note that such approaches are limited by the fact that, when including an insufficient number of lags to detect a recovery of the growth rates, one may find equation (6) to be satisfied and incorrectly assume that a change in climatic conditions affects the growth rate indefinitely. In practice, given a limited record of historical data, including too few lags and therefore erroneously concluding that the impact on the growth rate is infinitely persistent is likely, particularly over the long timescales over which future climate damages are often projected 2,24. To avoid this issue, we instead begin our analysis with a model in which the level of output, \(\mathrm{lgrp}_{r,y}\), depends on the level of a climate variable, \(C_{r,y}\):

$$\mathrm{lgrp}_{r,y}=\mu_{r}+\eta_{y}+\alpha\,C_{r,y}+\epsilon_{r,y}.\quad (7)$$

Given the non-stationarity of the level of output, we follow the literature 19 and estimate such an equation in first-differenced form as

$$\Delta\mathrm{lgrp}_{r,y}=\mu_{r}+\eta_{y}+\alpha\,\Delta C_{r,y}+\epsilon_{r,y},\quad (8)$$

which we refer to as a model of 'pure level effects' in the main text. This model constitutes a baseline specification in which a permanent change in the climate variable produces an instantaneous impact on the growth rate and a permanent effect only on the level of output. By including lagged variables in this specification,

$$\Delta\mathrm{lgrp}_{r,y}=\mu_{r}+\eta_{y}+\sum_{L=0}^{NL}\alpha_{L}\,\Delta C_{r,y-L}+\epsilon_{r,y},\quad (9)$$

we are able to test whether the impacts on the growth rate persist any further than instantaneously by evaluating whether the \(\alpha_{L}\) for L > 0 are statistically significantly different from zero. Even though this framework is also limited by the possibility of including too few lags, the choice of a baseline specification in which impacts on the growth rate do not persist means that, when too few lags are included, the framework reverts to the baseline specification of level effects. As such, this framework is conservative with respect to the persistence of impacts and the magnitude of future damages. It naturally avoids assumptions of infinite persistence, and we are able to interpret any persistence that we identify with equation (9) as a lower bound on the extent of climate impact persistence on growth rates. See the main text for further discussion of this specification choice, in particular its conservative nature compared with previous literature estimates, such as refs. 2,18.
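Why the levels-based specification with lags behaves this way can be seen in a toy simulation: when output's level (not its growth) responds to climate, the lagged first-differenced climate terms come out at zero. This is an assumed data-generating process for illustration, not the paper's estimation code:

```python
import numpy as np

rng = np.random.default_rng(1)
T, NL, alpha = 500, 3, -0.05
C = rng.normal(size=T)                      # climate variable
lgrp = 0.02 * np.arange(T) + alpha * C      # pure level effect on log output
dy, dC = np.diff(lgrp), np.diff(C)          # first differences (growth rates)

# regress growth on contemporaneous and lagged climate first differences
X = np.column_stack(
    [np.ones(len(dy) - NL)] + [dC[NL - L : len(dC) - L] for L in range(NL + 1)]
)
coef, *_ = np.linalg.lstsq(X, dy[NL:], rcond=None)
# coef[1] recovers alpha (instantaneous effect); coef[2:] (the lags) are zero,
# so no spurious growth persistence is inferred
```

Under this noiseless level-effect process the contemporaneous coefficient equals alpha exactly and the lag coefficients vanish, so the framework reverts to pure level effects as described in the text.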

We allow the response to climatic changes to vary across regions, using interactions of the climate variables with historical average (1979–2019) climatic conditions, reflecting heterogeneous effects identified in previous work 7,8. Following this previous work, the moderating variable of each interaction term is the historical average of the variable itself, except in the cases of daily temperature variability, which is moderated by the seasonal temperature difference, \({\hat{T}}_{r}\) (ref. 7), and extreme daily rainfall, which is moderated by annual mean temperature, \({\bar{T}}_{r}\) (ref. 8).

The resulting regression equation, with N and M lagged variables for the temperature and precipitation terms, respectively, reads:

$$\begin{array}{ll}\Delta\mathrm{lgrp}_{r,y}= & \mu_{r}+\eta_{y}+k_{r}\,y+\sum_{L=0}^{N}\left(\alpha_{1,L}\,\Delta\bar{T}_{r,y-L}+\alpha_{2,L}\,\Delta\bar{T}_{r,y-L}\,\bar{T}_{r}+\alpha_{3,L}\,\Delta\widetilde{T}_{r,y-L}+\alpha_{4,L}\,\Delta\widetilde{T}_{r,y-L}\,\hat{T}_{r}\right)\\ & +\sum_{L=0}^{M}\left(\alpha_{5,L}\,\Delta P_{r,y-L}+\alpha_{6,L}\,\Delta P_{r,y-L}\,\bar{P}_{r}+\alpha_{7,L}\,\Delta\mathrm{Pwd}_{r,y-L}+\alpha_{8,L}\,\Delta\mathrm{Pwd}_{r,y-L}\,\overline{\mathrm{Pwd}}_{r}\right.\\ & \left.+\alpha_{9,L}\,\Delta\mathrm{Pext}_{r,y-L}+\alpha_{10,L}\,\Delta\mathrm{Pext}_{r,y-L}\,\bar{T}_{r}\right)+\epsilon_{r,y},\end{array}\quad (10)$$

in which \(\Delta\mathrm{lgrp}_{r,y}\) is the annual, regional GRPpc growth rate, measured as the first difference of the logarithm of real GRPpc, following previous work 2,3,7,8,18,19. Fixed-effects regressions were run using the fixest package in R (ref. 61).

Estimates of the coefficients of interest, \(\alpha_{i,L}\), are shown in Extended Data Fig. 1 for N = M = 10 lags and for our preferred choice of the number of lags in Supplementary Figs. 1–3. In Extended Data Fig. 1, errors are shown clustered at the regional level, but for the construction of damage projections, we block-bootstrap the regressions by region 1,000 times to provide a range of parameter estimates with which to sample the projection uncertainty (following refs. 2,31).
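The region-level block bootstrap can be sketched as follows. Each draw resamples whole regions with replacement, keeping every observation of a sampled region together as one block (toy panel, illustrative function name):

```python
import numpy as np

def block_bootstrap_by_region(regions, rng):
    """One block-bootstrap draw: resample regions with replacement and return
    the row indices of all observations of the sampled regions, so each
    region's time series stays intact as a block."""
    unique = np.unique(regions)
    chosen = rng.choice(unique, size=unique.size, replace=True)
    return np.concatenate([np.flatnonzero(regions == r) for r in chosen])

rng = np.random.default_rng(0)
regions = np.array([0, 0, 1, 1, 2, 2])   # toy panel: 3 regions x 2 years
draws = [block_bootstrap_by_region(regions, rng) for _ in range(1000)]
```

Re-estimating the regression on each resampled index set yields the distribution of parameter estimates used to sample projection uncertainty.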

Spatial-lag model

In Supplementary Fig. 14, we present the results of a spatial-lag model that explores the potential for climate impacts to 'spill over' into spatially neighbouring regions. We measure the distance between the centroids of each pair of sub-national regions and construct spatial lags that take the average of the first-differenced climate variables and their interaction terms over neighbouring regions at distances of 0–500, 500–1,000, 1,000–1,500 and 1,500–2,000 km (spatial lags, 'SL', 1 to 4). For simplicity, we assess this spatial-lag model without temporal lags, focusing on spatial spillovers of contemporaneous climate impacts. Written schematically for climate variables \(C_{i}\) and their moderating variables \(MV_{i}\), this model takes the form:

$$\Delta\mathrm{lgrp}_{r,y}=\mu_{r}+\eta_{y}+k_{r}\,y+\sum_{i}\left(\alpha_{i}\,\Delta C_{i,r,y}+\beta_{i}\,\Delta C_{i,r,y}\,MV_{i,r}\right)+\sum_{s=1}^{4}\sum_{i}\left(\alpha_{i}^{\mathrm{SL}s}\,\mathrm{SL}s\left(\Delta C_{i,r,y}\right)+\beta_{i}^{\mathrm{SL}s}\,\mathrm{SL}s\left(\Delta C_{i,r,y}\,MV_{i,r}\right)\right)+\epsilon_{r,y},\quad (11)$$

in which SLs indicates the spatial lag of each climate variable and interaction term over neighbouring regions in distance band s. In Supplementary Fig. 14, we plot the cumulative marginal effect of each climate variable at different baseline climate conditions by summing the coefficients for each climate variable and interaction term, for example, for average temperature impacts as:

$$\frac{\partial\,\Delta\mathrm{lgrp}_{r,y}}{\partial\,\Delta\bar{T}}=\left(\alpha_{\bar{T}}+\beta_{\bar{T}}\,\bar{T}_{r}\right)+\sum_{s=1}^{4}\left(\alpha_{\bar{T}}^{\mathrm{SL}s}+\beta_{\bar{T}}^{\mathrm{SL}s}\,\bar{T}_{r}\right).\quad (12)$$

These cumulative marginal effects can be regarded as the overall spatially dependent impact on an individual region given a one-unit shock to a climate variable in that region and in all neighbouring regions, at a given value of the moderating variable of the interaction term.
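Construction of the distance-band spatial lags can be sketched as follows. Euclidean distance on the supplied coordinates is an assumption of this sketch; real centroids would require great-circle distances:

```python
import numpy as np

def spatial_lags(values, centroids,
                 bands=((0, 500), (500, 1000), (1000, 1500), (1500, 2000))):
    """Average `values` over neighbouring regions whose centroid distance (km)
    falls in each band; returns shape (n_regions, n_bands), NaN where a band
    contains no neighbours."""
    d = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)
    n = len(values)
    out = np.full((n, len(bands)), np.nan)
    for b, (lo, hi) in enumerate(bands):
        for i in range(n):
            mask = (d[i] > lo) & (d[i] <= hi)
            mask[i] = False                     # exclude the region itself
            if mask.any():
                out[i, b] = values[mask].mean()
    return out

# three regions on a line, 300 km and 1,200 km from the first
sl = spatial_lags(np.array([1.0, 2.0, 3.0]),
                  np.array([[0.0, 0.0], [300.0, 0.0], [1200.0, 0.0]]))
```

For the first region, the 0–500 km band averages only the second region's value and the 1,000–1,500 km band only the third's, while the intervening band is empty.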

Constructing projections of economic damage from future climate change

We construct projections of future climate damages by applying the coefficients estimated in equation ( 10 ) and shown in Supplementary Tables 2 – 4 (when including only lags with statistically significant effects in specifications that limit overfitting; see Supplementary Methods Section  1 ) to projections of future climate change from the CMIP-6 models. Year-on-year changes in each primary climate variable of interest are calculated to reflect the year-to-year variations used in the empirical models. 30-year moving averages of the moderating variables of the interaction terms are calculated to reflect the long-term average of climatic conditions that were used for the moderating variables in the empirical models. By using moving averages in the projections, we account for the changing vulnerability to climate shocks based on the evolving long-term conditions (Supplementary Figs. 10 and 11 show that the results are robust to the precise choice of the window of this moving average). Although these climate variables are not differenced, the fact that the bias-adjusted climate models reproduce observed climatological patterns across regions for these moderating variables very accurately (Supplementary Table 6 ) with limited spread across models (<3%) precludes the possibility that any considerable bias or uncertainty is introduced by this methodological choice. However, we impose caps on these moderating variables at the 95th percentile at which they were observed in the historical data to prevent extrapolation of the marginal effects outside the range in which the regressions were estimated. This is a conservative choice that limits the magnitude of our damage projections.
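The capping of the moderating variables described above amounts to a simple element-wise minimum against a historical percentile. The numbers below are illustrative, not values from the paper's data:

```python
import numpy as np

# toy "historical" moderator values (e.g. 30-year mean temperature, in °C)
hist = np.array([24.0, 25.0, 26.0, 27.0, 28.0, 29.0])
cap = np.percentile(hist, 95)            # 95th-percentile cap from history

# projected 30-year moving averages of the moderating variable
projected = np.array([27.5, 28.5, 30.2])
moderator = np.minimum(projected, cap)   # capped before evaluating marginal effects
```

Values below the cap pass through unchanged; any projected moderator beyond the historical 95th percentile is held at the cap, preventing marginal effects from being extrapolated outside the estimation range.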

Time series of primary climate variables and moderating climate variables are then combined with estimates of the empirical model parameters to evaluate the regression coefficients in equation ( 10 ), producing a time series of annual GRPpc growth-rate reductions for a given emission scenario, climate model and set of empirical model parameters. The resulting time series of growth-rate impacts reflects those occurring owing to future climate change. By contrast, a future scenario with no climate change would be one in which climate variables do not change (other than with random year-to-year fluctuations) and hence the time-averaged evaluation of equation ( 10 ) would be zero. Our approach therefore implicitly compares the future climate-change scenario to this no-climate-change baseline scenario.

The time series of growth-rate impacts owing to future climate change in region r and year y, \(\delta_{r,y}\), are then added to the future baseline growth rates, \(\pi_{r,y}\) (in log-diff form), obtained from the SSP2 scenario, to yield trajectories of damaged GRPpc growth rates, \(\rho_{r,y}=\pi_{r,y}+\delta_{r,y}\). These trajectories are accumulated over time to estimate the future trajectory of GRPpc with future climate impacts:

$$\mathrm{GRPpc}_{r,y}=\mathrm{GRPpc}_{r,y=2020}+\sum_{y'=2021}^{y}\rho_{r,y'},\quad (13)$$

in which \(\mathrm{GRPpc}_{r,y=2020}\) is the initial log level of GRPpc. We begin damage estimates in 2020 to reflect the damages occurring since the end of the period over which we estimate the empirical models (1979–2019) and to match the timing of the mitigation-cost estimates from most IAMs (see below).
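The accumulation of damaged growth rates onto the initial log level reduces to a cumulative sum, as this sketch with made-up numbers shows:

```python
import numpy as np

def grppc_trajectory(log_grppc_2020, baseline_growth, climate_impacts):
    """Damaged growth rates rho = pi + delta (in log-diff form) accumulated
    onto the initial log level of GRPpc."""
    rho = baseline_growth + climate_impacts
    return log_grppc_2020 + np.cumsum(rho)

# toy example: 2% baseline growth with a constant -0.5 pp climate impact
log_path = grppc_trajectory(np.log(10000.0), np.full(3, 0.02), np.full(3, -0.005))
levels = np.exp(log_path)   # back from log levels to GRPpc levels
```

Because the growth rates are log differences, exponentiating the accumulated path recovers the GRPpc levels.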

For each emission scenario, this procedure is repeated 1,000 times while randomly sampling from the selection of climate models, the selection of empirical models with different numbers of lags (shown in Supplementary Figs. 1 – 3 and Supplementary Tables 2 – 4 ) and bootstrapped estimates of the regression parameters. The result is an ensemble of future GRPpc trajectories that reflect uncertainty from both physical climate change and the structural and sampling uncertainty of the empirical models.
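The sampling loop over sources of uncertainty can be sketched as below. The model names, lag choices and parameter sets are placeholders, not those of the actual ensemble:

```python
import numpy as np

def sample_ensemble(n_draws, climate_models, lag_choices, boot_params, rng):
    """Draw (climate model, lag specification, bootstrapped parameter set)
    triples for the projection ensemble."""
    return [
        (rng.choice(climate_models), int(rng.choice(lag_choices)),
         boot_params[rng.integers(len(boot_params))])
        for _ in range(n_draws)
    ]

rng = np.random.default_rng(42)
ensemble = sample_ensemble(
    1000,
    climate_models=["model_a", "model_b", "model_c"],   # hypothetical names
    lag_choices=[8, 9, 10],
    boot_params=[{"alpha": a} for a in np.linspace(-0.1, 0.0, 5)],
    rng=rng,
)
```

Evaluating the damage projection once per drawn triple yields the ensemble of GRPpc trajectories from which likely ranges are computed.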

Estimates of mitigation costs

We obtain IPCC estimates of the aggregate costs of emission mitigation from the AR6 Scenario Explorer and Database hosted by IIASA (ref. 23). Specifically, we search the AR6 Scenarios Database World v1.1 for IAMs that provide estimates of global GDP and population under both an SSP2 baseline and an SSP2-RCP2.6 scenario, to maintain consistency with the socio-economic and emission scenarios of the climate damage projections. We find five IAMs that provide data for these scenarios: MESSAGE-GLOBIOM 1.0, REMIND-MAgPIE 1.5, AIM/CGE 2.0, GCAM 4.2 and WITCH-GLOBIOM 3.1. Of these five, we use results only from the first three, which passed the IPCC vetting procedure for reproducing historical emission and climate trajectories. We then estimate global mitigation costs as the percentage difference in global per capita GDP between the SSP2 baseline and the SSP2-RCP2.6 emission scenario. Mitigation-cost estimates begin in 2020 for one of these IAMs and in 2010 for the other two. The pre-2020 mitigation costs in the latter two IAMs are mostly negligible, so our choice to begin comparison with damage estimates in 2020 is conservative with respect to the relative weight of climate damages compared with mitigation costs for these two IAMs.
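The mitigation-cost definition reduces to a one-line calculation; the numbers below are illustrative, not IAM output:

```python
# Sketch of the stated definition: global mitigation cost as the percentage
# difference in per-capita GDP between the SSP2 baseline and SSP2-RCP2.6.
def mitigation_cost_pct(gdp_pc_baseline: float, gdp_pc_rcp26: float) -> float:
    """Percentage loss of per-capita GDP in the mitigation scenario."""
    return 100.0 * (gdp_pc_baseline - gdp_pc_rcp26) / gdp_pc_baseline

# e.g. baseline $20,000 vs $19,600 per capita -> 2% mitigation cost (assumed)
cost = mitigation_cost_pct(20_000.0, 19_600.0)
```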

Data availability

Data on economic production and ERA-5 climate data are publicly available at https://doi.org/10.5281/zenodo.4681306 (ref. 62) and https://www.ecmwf.int/en/forecasts/datasets/reanalysis-datasets/era5 , respectively. Data on mitigation costs are publicly available at https://data.ene.iiasa.ac.at/ar6/#/downloads . Processed climate and economic data, as well as all other necessary data for reproduction of the results, are available at the public repository https://doi.org/10.5281/zenodo.10562951 (ref. 63).

Code availability

All code necessary for reproduction of the results is available at the public repository https://doi.org/10.5281/zenodo.10562951 (ref. 63).

Glanemann, N., Willner, S. N. & Levermann, A. Paris Climate Agreement passes the cost-benefit test. Nat. Commun. 11 , 110 (2020).


Burke, M., Hsiang, S. M. & Miguel, E. Global non-linear effect of temperature on economic production. Nature 527 , 235–239 (2015).


Kalkuhl, M. & Wenz, L. The impact of climate conditions on economic production. Evidence from a global panel of regions. J. Environ. Econ. Manag. 103 , 102360 (2020).


Moore, F. C. & Diaz, D. B. Temperature impacts on economic growth warrant stringent mitigation policy. Nat. Clim. Change 5 , 127–131 (2015).


Drouet, L., Bosetti, V. & Tavoni, M. Net economic benefits of well-below 2°C scenarios and associated uncertainties. Oxf. Open Clim. Change 2 , kgac003 (2022).

Ueckerdt, F. et al. The economically optimal warming limit of the planet. Earth Syst. Dyn. 10 , 741–763 (2019).

Kotz, M., Wenz, L., Stechemesser, A., Kalkuhl, M. & Levermann, A. Day-to-day temperature variability reduces economic growth. Nat. Clim. Change 11 , 319–325 (2021).

Kotz, M., Levermann, A. & Wenz, L. The effect of rainfall changes on economic production. Nature 601 , 223–227 (2022).

Kousky, C. Informing climate adaptation: a review of the economic costs of natural disasters. Energy Econ. 46 , 576–592 (2014).

Harlan, S. L. et al. in Climate Change and Society: Sociological Perspectives (eds Dunlap, R. E. & Brulle, R. J.) 127–163 (Oxford Univ. Press, 2015).

Bolton, P. et al. The Green Swan (BIS Books, 2020).

Alogoskoufis, S. et al. ECB Economy-wide Climate Stress Test: Methodology and Results (European Central Bank, 2021).

Weber, E. U. What shapes perceptions of climate change? Wiley Interdiscip. Rev. Clim. Change 1 , 332–342 (2010).

Markowitz, E. M. & Shariff, A. F. Climate change and moral judgement. Nat. Clim. Change 2 , 243–247 (2012).

Riahi, K. et al. The shared socioeconomic pathways and their energy, land use, and greenhouse gas emissions implications: an overview. Glob. Environ. Change 42 , 153–168 (2017).

Auffhammer, M., Hsiang, S. M., Schlenker, W. & Sobel, A. Using weather data and climate model output in economic analyses of climate change. Rev. Environ. Econ. Policy 7 , 181–198 (2013).

Kolstad, C. D. & Moore, F. C. Estimating the economic impacts of climate change using weather observations. Rev. Environ. Econ. Policy 14 , 1–24 (2020).

Dell, M., Jones, B. F. & Olken, B. A. Temperature shocks and economic growth: evidence from the last half century. Am. Econ. J. Macroecon. 4 , 66–95 (2012).

Newell, R. G., Prest, B. C. & Sexton, S. E. The GDP-temperature relationship: implications for climate change damages. J. Environ. Econ. Manag. 108 , 102445 (2021).

Kikstra, J. S. et al. The social cost of carbon dioxide under climate-economy feedbacks and temperature variability. Environ. Res. Lett. 16 , 094037 (2021).


Bastien-Olvera, B. & Moore, F. Persistent effect of temperature on GDP identified from lower frequency temperature variability. Environ. Res. Lett. 17 , 084038 (2022).

Eyring, V. et al. Overview of the Coupled Model Intercomparison Project Phase 6 (CMIP6) experimental design and organization. Geosci. Model Dev. 9 , 1937–1958 (2016).

Byers, E. et al. AR6 scenarios database. Zenodo https://zenodo.org/records/7197970 (2022).

Burke, M., Davis, W. M. & Diffenbaugh, N. S. Large potential reduction in economic damages under UN mitigation targets. Nature 557 , 549–553 (2018).

Kotz, M., Wenz, L. & Levermann, A. Footprint of greenhouse forcing in daily temperature variability. Proc. Natl Acad. Sci. 118 , e2103294118 (2021).


Myhre, G. et al. Frequency of extreme precipitation increases extensively with event rareness under global warming. Sci. Rep. 9 , 16063 (2019).

Min, S.-K., Zhang, X., Zwiers, F. W. & Hegerl, G. C. Human contribution to more-intense precipitation extremes. Nature 470 , 378–381 (2011).

England, M. R., Eisenman, I., Lutsko, N. J. & Wagner, T. J. The recent emergence of Arctic Amplification. Geophys. Res. Lett. 48 , e2021GL094086 (2021).

Fischer, E. M. & Knutti, R. Anthropogenic contribution to global occurrence of heavy-precipitation and high-temperature extremes. Nat. Clim. Change 5 , 560–564 (2015).

Pfahl, S., O’Gorman, P. A. & Fischer, E. M. Understanding the regional pattern of projected future changes in extreme precipitation. Nat. Clim. Change 7 , 423–427 (2017).

Callahan, C. W. & Mankin, J. S. Globally unequal effect of extreme heat on economic growth. Sci. Adv. 8 , eadd3726 (2022).

Diffenbaugh, N. S. & Burke, M. Global warming has increased global economic inequality. Proc. Natl Acad. Sci. 116 , 9808–9813 (2019).

Callahan, C. W. & Mankin, J. S. National attribution of historical climate damages. Clim. Change 172 , 40 (2022).

Burke, M. & Tanutama, V. Climatic constraints on aggregate economic output. National Bureau of Economic Research, Working Paper 25779. https://doi.org/10.3386/w25779 (2019).

Kahn, M. E. et al. Long-term macroeconomic effects of climate change: a cross-country analysis. Energy Econ. 104 , 105624 (2021).

Desmet, K. et al. Evaluating the economic cost of coastal flooding. National Bureau of Economic Research, Working Paper 24918. https://doi.org/10.3386/w24918 (2018).

Hsiang, S. M. & Jina, A. S. The causal effect of environmental catastrophe on long-run economic growth: evidence from 6,700 cyclones. National Bureau of Economic Research, Working Paper 20352. https://doi.org/10.3386/w20352 (2014).

Ritchie, P. D. et al. Shifts in national land use and food production in Great Britain after a climate tipping point. Nat. Food 1 , 76–83 (2020).

Dietz, S., Rising, J., Stoerk, T. & Wagner, G. Economic impacts of tipping points in the climate system. Proc. Natl Acad. Sci. 118 , e2103081118 (2021).

Bastien-Olvera, B. A. & Moore, F. C. Use and non-use value of nature and the social cost of carbon. Nat. Sustain. 4 , 101–108 (2021).

Carleton, T. et al. Valuing the global mortality consequences of climate change accounting for adaptation costs and benefits. Q. J. Econ. 137 , 2037–2105 (2022).

Bastien-Olvera, B. A. et al. Unequal climate impacts on global values of natural capital. Nature 625 , 722–727 (2024).

Malik, A. et al. Impacts of climate change and extreme weather on food supply chains cascade across sectors and regions in Australia. Nat. Food 3 , 631–643 (2022).


Kuhla, K., Willner, S. N., Otto, C., Geiger, T. & Levermann, A. Ripple resonance amplifies economic welfare loss from weather extremes. Environ. Res. Lett. 16 , 114010 (2021).

Schleypen, J. R., Mistry, M. N., Saeed, F. & Dasgupta, S. Sharing the burden: quantifying climate change spillovers in the European Union under the Paris Agreement. Spat. Econ. Anal. 17 , 67–82 (2022).

Dasgupta, S., Bosello, F., De Cian, E. & Mistry, M. Global temperature effects on economic activity and equity: a spatial analysis. European Institute on Economics and the Environment, Working Paper 22-1 (2022).

Neal, T. The importance of external weather effects in projecting the macroeconomic impacts of climate change. UNSW Economics Working Paper 2023-09 (2023).

Deryugina, T. & Hsiang, S. M. Does the environment still matter? Daily temperature and income in the United States. National Bureau of Economic Research, Working Paper 20750. https://doi.org/10.3386/w20750 (2014).

Hersbach, H. et al. The ERA5 global reanalysis. Q. J. R. Meteorol. Soc. 146 , 1999–2049 (2020).

Cucchi, M. et al. WFDE5: bias-adjusted ERA5 reanalysis data for impact studies. Earth Syst. Sci. Data 12 , 2097–2120 (2020).

Adler, R. et al. The New Version 2.3 of the Global Precipitation Climatology Project (GPCP) Monthly Analysis Product 1072–1084 (University of Maryland, 2016).

Lange, S. Trend-preserving bias adjustment and statistical downscaling with ISIMIP3BASD (v1.0). Geosci. Model Dev. 12 , 3055–3070 (2019).

Wenz, L., Carr, R. D., Kögel, N., Kotz, M. & Kalkuhl, M. DOSE – global data set of reported sub-national economic output. Sci. Data 10 , 425 (2023).


Gennaioli, N., La Porta, R., Lopez De Silanes, F. & Shleifer, A. Growth in regions. J. Econ. Growth 19 , 259–309 (2014).

Board of Governors of the Federal Reserve System (US). U.S. dollars to euro spot exchange rate. https://fred.stlouisfed.org/series/AEXUSEU (2022).

World Bank. GDP deflator. https://data.worldbank.org/indicator/NY.GDP.DEFL.ZS (2022).

Jones, B. & O’Neill, B. C. Spatially explicit global population scenarios consistent with the Shared Socioeconomic Pathways. Environ. Res. Lett. 11 , 084003 (2016).

Murakami, D. & Yamagata, Y. Estimation of gridded population and GDP scenarios with spatially explicit statistical downscaling. Sustainability 11 , 2106 (2019).

Koch, J. & Leimbach, M. Update of SSP GDP projections: capturing recent changes in national accounting, PPP conversion and Covid 19 impacts. Ecol. Econ. 206 (2023).

Carleton, T. A. & Hsiang, S. M. Social and economic impacts of climate. Science 353 , aad9837 (2016).


Bergé, L. Efficient estimation of maximum likelihood models with multiple fixed-effects: the R package FENmlm. DEM Discussion Paper Series 18-13 (2018).

Kalkuhl, M., Kotz, M. & Wenz, L. DOSE - The MCC-PIK Database Of Subnational Economic output. Zenodo https://zenodo.org/doi/10.5281/zenodo.4681305 (2021).

Kotz, M., Wenz, L. & Levermann, A. Data and code for “The economic commitment of climate change”. Zenodo https://zenodo.org/doi/10.5281/zenodo.10562951 (2024).

Dasgupta, S. et al. Effects of climate change on combined labour productivity and supply: an empirical, multi-model study. Lancet Planet. Health 5 , e455–e465 (2021).

Lobell, D. B. et al. The critical role of extreme heat for maize production in the United States. Nat. Clim. Change 3 , 497–501 (2013).

Zhao, C. et al. Temperature increase reduces global yields of major crops in four independent estimates. Proc. Natl Acad. Sci. 114 , 9326–9331 (2017).

Wheeler, T. R., Craufurd, P. Q., Ellis, R. H., Porter, J. R. & Prasad, P. V. Temperature variability and the yield of annual crops. Agric. Ecosyst. Environ. 82 , 159–167 (2000).

Rowhani, P., Lobell, D. B., Linderman, M. & Ramankutty, N. Climate variability and crop production in Tanzania. Agric. For. Meteorol. 151 , 449–460 (2011).

Ceglar, A., Toreti, A., Lecerf, R., Van der Velde, M. & Dentener, F. Impact of meteorological drivers on regional inter-annual crop yield variability in France. Agric. For. Meteorol. 216 , 58–67 (2016).

Shi, L., Kloog, I., Zanobetti, A., Liu, P. & Schwartz, J. D. Impacts of temperature and its variability on mortality in New England. Nat. Clim. Change 5 , 988–991 (2015).

Xue, T., Zhu, T., Zheng, Y. & Zhang, Q. Declines in mental health associated with air pollution and temperature variability in China. Nat. Commun. 10 , 2165 (2019).


Liang, X.-Z. et al. Determining climate effects on US total agricultural productivity. Proc. Natl Acad. Sci. 114 , E2285–E2292 (2017).

Desbureaux, S. & Rodella, A.-S. Drought in the city: the economic impact of water scarcity in Latin American metropolitan areas. World Dev. 114 , 13–27 (2019).

Damania, R. The economics of water scarcity and variability. Oxf. Rev. Econ. Policy 36 , 24–44 (2020).

Davenport, F. V., Burke, M. & Diffenbaugh, N. S. Contribution of historical precipitation change to US flood damages. Proc. Natl Acad. Sci. 118 , e2017524118 (2021).

Dave, R., Subramanian, S. S. & Bhatia, U. Extreme precipitation induced concurrent events trigger prolonged disruptions in regional road networks. Environ. Res. Lett. 16 , 104050 (2021).

Acknowledgements


We gratefully acknowledge financing from the Volkswagen Foundation and the Deutsche Gesellschaft für Internationale Zusammenarbeit (GIZ) GmbH on behalf of the Government of the Federal Republic of Germany and Federal Ministry for Economic Cooperation and Development (BMZ).

Open access funding provided by Potsdam-Institut für Klimafolgenforschung (PIK) e.V.

Author information

Authors and affiliations

Research Domain IV, Potsdam Institute for Climate Impact Research, Potsdam, Germany

Maximilian Kotz, Anders Levermann & Leonie Wenz

Institute of Physics, Potsdam University, Potsdam, Germany

Maximilian Kotz & Anders Levermann

Mercator Research Institute on Global Commons and Climate Change, Berlin, Germany

Leonie Wenz



Contributions

All authors contributed to the design of the analysis. M.K. conducted the analysis and produced the figures. All authors contributed to the interpretation and presentation of the results. M.K. and L.W. wrote the manuscript.

Corresponding author

Correspondence to Leonie Wenz .

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature thanks Xin-Zhong Liang, Chad Thackeray and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Constraining the persistence of historical climate impacts on economic growth rates.

The results of a panel-based fixed-effects distributed lag model for the effects of annual mean temperature (a), daily temperature variability (b), total annual precipitation (c), the number of wet days (d) and extreme daily precipitation (e) on sub-national economic growth rates. Point estimates show the effects of a 1 °C or one standard deviation increase (for temperature and precipitation variables, respectively) at the lower quartile, median and upper quartile of the relevant moderating variable (green, orange and purple, respectively) at different lagged periods after the initial shock (note that these are not cumulative effects). Climate variables are used in their first-differenced form (see main text for discussion) and the moderating climate variables are the annual mean temperature, seasonal temperature difference, total annual precipitation, number of wet days and annual mean temperature, respectively, in panels a–e (see Methods for further discussion). Error bars show the 95% confidence intervals having clustered standard errors by region. The within-region R², Bayesian and Akaike information criteria for the model are shown at the top of the figure. This figure shows results with ten lags for each variable to demonstrate the observed levels of persistence, but our preferred specifications remove later lags based on the statistical significance of terms shown above and the information criteria shown in Extended Data Fig. 2. The resulting models without later lags are shown in Supplementary Figs. 1–3.

Extended Data Fig. 2 Incremental lag-selection procedure using information criteria and within-region R².

Starting from a panel-based fixed-effects distributed lag model estimating the effects of climate on economic growth using the real historical data (as in equation (4)) with ten lags for all climate variables (as shown in Extended Data Fig. 1), lags are incrementally removed for one climate variable at a time. The resulting Bayesian and Akaike information criteria are shown in a–e and f–j, respectively, and the within-region R² and number of observations in k–o and p–t, respectively. Different rows show the results when removing lags from different climate variables, ordered from top to bottom as annual mean temperature, daily temperature variability, total annual precipitation, the number of wet days and extreme daily precipitation. Information criteria show minima at approximately four lags for precipitation variables and eight to ten for temperature variables, indicating that including these numbers of lags does not lead to overfitting. See Supplementary Table 1 for an assessment using information criteria to determine whether including further climate variables causes overfitting.
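A toy version of such an information-criterion lag search, using plain OLS on synthetic data rather than the paper's fixed-effects panel estimator, can illustrate the mechanic: models with too few lags fit worse and score higher on AIC/BIC, while extra spurious lags are penalized.

```python
import numpy as np

# Toy lag-selection sketch (synthetic data; NOT the paper's estimator).
rng = np.random.default_rng(2)
n = 500
x = rng.normal(size=n + 10)
# True process: effects at lags 0 and 1 only.
y = 0.5 * x[10:] + 0.3 * x[9:-1] + rng.normal(scale=0.5, size=n)

def info_criteria(n_lags: int):
    """AIC and BIC of an OLS regression of y on lags 0..n_lags of x."""
    X = np.column_stack([x[10 - k : 10 - k + n] for k in range(n_lags + 1)])
    X = np.column_stack([np.ones(n), X])            # intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    k = X.shape[1]
    ll = -0.5 * n * (np.log(2 * np.pi * resid.var()) + 1)  # Gaussian log-lik
    return 2 * k - 2 * ll, k * np.log(n) - 2 * ll          # AIC, BIC

aic = [info_criteria(L)[0] for L in range(6)]
# Omitting the true lag-1 effect (L=0) should raise the AIC markedly.
```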

Extended Data Fig. 3 Damages in our preferred specification that provides a robust lower bound on the persistence of climate impacts on economic growth versus damages in specifications of pure growth or pure level effects.

Estimates of future damages as shown in Fig. 1 but under the emission scenario RCP8.5 for three separate empirical specifications: in orange, our preferred specification, which provides an empirical lower bound on the persistence of climate impacts on economic growth rates while avoiding assumptions of infinite persistence (see main text for further discussion); in purple, a specification of 'pure growth effects' in which the first difference of climate variables is not taken and no lagged climate variables are included (the baseline specification of ref. 2); and in pink, a specification of 'pure level effects' in which the first difference of climate variables is taken but no lagged terms are included.

Extended Data Fig. 4 Climate changes in different variables as a function of historical interannual variability.

Changes in each climate variable of interest from 1979–2019 to 2035–2065 under the high-emission scenario SSP5-RCP8.5, expressed as a percentage of the historical variability of each measure. Historical variability is estimated as the standard deviation of each detrended climate variable over the period 1979–2019 during which the empirical models were identified (detrending is appropriate because of the inclusion of region-specific linear time trends in the empirical models). See Supplementary Fig. 13 for changes expressed in standard units. Data on national administrative boundaries are obtained from the GADM database version 3.6 and are freely available for academic use (https://gadm.org/).

Extended Data Fig. 5 Contribution of different climate variables to overall committed damages.

a , Climate damages in 2049 when using empirical models that account for all climate variables, changes in annual mean temperature only or changes in both annual mean temperature and one other climate variable (daily temperature variability, total annual precipitation, the number of wet days and extreme daily precipitation, respectively). b , The cumulative marginal effects of an increase in annual mean temperature of 1 °C, at different baseline temperatures, estimated from empirical models including all climate variables or annual mean temperature only. Estimates and uncertainty bars represent the median and 95% confidence intervals obtained from 1,000 block-bootstrap resamples from each of three different empirical models using eight, nine or ten lags of temperature terms.

Extended Data Fig. 6 The difference in committed damages between the upper and lower quartiles of countries when ranked by GDP and cumulative historical emissions.

Quartiles are defined using a population weighting, as are the average committed damages across each quartile group. The violin plots indicate the distribution of differences between quartiles across the two extreme emission scenarios (RCP2.6 and RCP8.5) and the uncertainty sampling procedure outlined in Methods , which accounts for uncertainty arising from the choice of lags in the empirical models, uncertainty in the empirical model parameter estimates, as well as the climate model projections. Bars indicate the median, as well as the 10th and 90th percentiles and upper and lower sixths of the distribution reflecting the very likely and likely ranges following the likelihood classification adopted by the IPCC.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article

Kotz, M., Levermann, A. & Wenz, L. The economic commitment of climate change. Nature 628, 551–557 (2024). https://doi.org/10.1038/s41586-024-07219-0


Received: 25 January 2023

Accepted: 21 February 2024

Published: 17 April 2024

Issue date: 18 April 2024

DOI: https://doi.org/10.1038/s41586-024-07219-0



A New Use for Wegovy Opens the Door to Medicare Coverage for Millions of People with Obesity

Juliette Cubanski , Tricia Neuman , Nolan Sroczynski , and Anthony Damico Published: Apr 24, 2024

The FDA recently approved a new use for Wegovy (semaglutide), the blockbuster anti-obesity drug, to reduce the risk of heart attacks and stroke in people with cardiovascular disease who are overweight or obese. Wegovy belongs to a class of medications called GLP-1 (glucagon-like peptide-1) agonists that were initially approved to treat type 2 diabetes but are also highly effective anti-obesity drugs. The new FDA-approved indication for Wegovy paves the way for Medicare coverage of this drug and broader coverage by other insurers. Medicare is currently prohibited by law from covering Wegovy and other medications when used specifically for obesity. However, semaglutide is covered by Medicare as a treatment for diabetes, branded as Ozempic.

What does the FDA’s decision mean for Medicare coverage of Wegovy?

The FDA’s decision opens the door to Medicare coverage of Wegovy, which was first approved by the FDA as an anti-obesity medication. Soon after the FDA’s approval of the new use for Wegovy, the Centers for Medicare & Medicaid Services (CMS) issued a memo indicating that Medicare Part D plans can add Wegovy to their formularies now that it has a medically accepted indication that is not specifically excluded from Medicare coverage. Because Wegovy is a self-administered injectable drug, coverage will be provided under Part D, Medicare’s outpatient drug benefit offered by private stand-alone drug plans and Medicare Advantage plans, not Part B, which covers physician-administered drugs.

How many Medicare beneficiaries could be eligible for coverage of Wegovy for its new use?

Figure 1: An Estimated 1 in 4 Medicare Beneficiaries With Obesity or Overweight Could Be Eligible for Medicare Part D Coverage of Wegovy to Reduce the Risk of Serious Heart Problems

Based on the KFF analysis summarized in Figure 1, an estimated 3.6 million Medicare beneficiaries with obesity or overweight (about 1 in 4) also had established cardiovascular disease and could be eligible for Part D coverage of Wegovy for its new use. Of these 3.6 million beneficiaries, 1.9 million also had diabetes (other than Type 1) and may already have been eligible for Medicare coverage of GLP-1s as diabetes treatments prior to the FDA’s approval of the new use of Wegovy.

Not all people who are eligible based on the new indication are likely to take Wegovy, however. Some might be dissuaded by the potential side effects and adverse reactions. Out-of-pocket costs could also be a barrier. Based on the list price of $1,300 per month (not including rebates or other discounts negotiated by pharmacy benefit managers), Wegovy could be covered as a specialty tier drug, where Part D plans are allowed to charge coinsurance of 25% to 33%. Because coinsurance amounts are pegged to the list price, Medicare beneficiaries required to pay coinsurance could face monthly costs of $325 to $430 before they reach the new cap on annual out-of-pocket drug spending established by the Inflation Reduction Act (around $3,300 in 2024, based on brand drugs only, and $2,000 in 2025). But even paying $2,000 out of pocket would still be beyond the reach of many people with Medicare who live on modest incomes. Ultimately, how much beneficiaries pay out of pocket will depend on Part D plan coverage and formulary tier placement of Wegovy.
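The stated coinsurance range follows directly from the list price; a quick arithmetic check:

```python
# Check of the article's coinsurance arithmetic: 25%-33% of a $1,300 monthly
# list price gives roughly $325 to $430 per month (before the annual cap).
list_price_monthly = 1300.0
low = 0.25 * list_price_monthly   # lower bound of the specialty-tier range
high = 0.33 * list_price_monthly  # upper bound, ~ $430 as stated
```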

Further, some people may have difficulty accessing Wegovy if Part D plans apply prior authorization and step therapy tools to manage costs and ensure appropriate use. These factors could have a dampening effect on use by Medicare beneficiaries, even among the target population.

When will Medicare Part D plans begin covering Wegovy?

Some Part D plans have already announced that they will begin covering Wegovy this year, although it is not yet clear how widespread coverage will be in 2024. While Medicare drug plans can add new drugs to their formularies during the year to reflect new approvals and expanded indications, plans are not required to cover every new drug that comes to market. Part D plans are required to cover at least two drugs in each category or class and all or substantially all drugs in six protected classes. However, facing a relatively high price and potentially large patient population for Wegovy, many Part D plans might be reluctant to expand coverage now, since they can’t adjust their premiums mid-year to account for higher costs associated with use of this drug. So, broader coverage in 2025 could be more likely.

How might expanded coverage of Wegovy affect Medicare spending?

The impact on Medicare spending associated with expanded coverage of Wegovy will depend in part on how many Part D plans add coverage for it and the extent to which plans apply restrictions on use like prior authorization; how many people who qualify to take the drug use it; and negotiated prices paid by plans. For example, if plans receive a 50% rebate on the list price of $1,300 per month (or $15,600 per year), that could mean annual net costs per person around $7,800. If 10% of the target population (an estimated 360,000 people) uses Wegovy for a full year, that would amount to additional net Medicare Part D spending of $2.8 billion for one year for this one drug alone.
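The spending scenario above can be made explicit; every input below is one of the article's stated assumptions (50% rebate, 10% uptake of the estimated 3.6 million target population):

```python
# Explicit version of the article's spending scenario (stated assumptions).
list_price_annual = 1300.0 * 12               # $15,600 per year at list price
net_cost_per_person = 0.5 * list_price_annual # $7,800/year after a 50% rebate
users = 360_000                               # 10% of the ~3.6M target population
added_spending = users * net_cost_per_person  # ~$2.8 billion per year
```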

It’s possible that Medicare could select semaglutide for drug price negotiation as early as 2025, based on the earliest FDA approval of Ozempic in late 2017. For small-molecule drugs like semaglutide, at least seven years must have passed from its FDA approval date to be eligible for selection, and for drugs with multiple FDA approvals, CMS will use the earliest approval date to make this determination. If semaglutide is selected for negotiation next year, a negotiated price would be available beginning in 2027. This could help to lower Medicare and out-of-pocket spending on semaglutide products, including Wegovy as well as Ozempic and Rybelsus, the oral formulation approved for type 2 diabetes. As of 2022, gross Medicare spending on Ozempic alone placed it sixth among the 10 top-selling drugs in Medicare Part D, with annual gross spending of $4.6 billion, based on KFF analysis. This estimate does not include rebates, which Medicare’s actuaries estimated to be 31.5% overall in 2022 but could be as high as 69% for Ozempic, according to one estimate.

What does this mean for Medicare coverage of anti-obesity drugs?

For now, use of GLP-1s specifically for obesity continues to be excluded from Medicare coverage by law. But the FDA’s decision signals a turning point for broader Medicare coverage of GLP-1s, since Wegovy can now be used to reduce the risk of heart attack and stroke by people with cardiovascular disease and obesity or overweight, and not only as an anti-obesity drug. And more pathways to Medicare coverage could open up if these drugs gain FDA approval for other uses. For example, Eli Lilly has just reported clinical trial results showing the benefits of its GLP-1, Zepbound (tirzepatide), in reducing the occurrence of sleep apnea events among people with obesity or overweight. Lilly reportedly plans to seek FDA approval for this use and, if approved, the drug would be the first pharmaceutical treatment on the market for sleep apnea.

If more Medicare beneficiaries with obesity or overweight gain access to GLP-1s based on other approved uses for these medications, that could reduce the cost of proposed legislation to lift the statutory prohibition on Medicare coverage of anti-obesity drugs. This is because the Congressional Budget Office (CBO), Congress’s official scorekeeper for proposed legislation, would incorporate the cost of coverage for these other uses into its baseline estimates for Medicare spending, which means that the incremental cost of changing the law to allow Medicare coverage for anti-obesity drugs would be lower than it would be without FDA’s approval of these drugs for other uses. Ultimately how widely Medicare Part D coverage of GLP-1s expands could have far-reaching effects on people with obesity and on Medicare spending.


Interpretation of Audio Forensic Information from the Shooting of Journalist Shireen Abu Akleh

This study discusses the interpretation of audio forensic information based on a case study of the shooting of journalist Shireen Abu Akleh.

In this paper, researchers describe the acoustic evidence from the shooting of journalist Shireen Abu Akleh to provide a forensic estimate of the distance between the firearm and the recording microphones. The evidence includes the estimates of the various geometric and physical parameters and the likely range of uncertainty of those measurements. User generated recordings (UGRs) are increasingly presented as evidence in forensic investigations due to the widespread use of handheld cameras, smartphones, and other portable field recording devices. A case study comes from the shooting death of well-known Al Jazeera television correspondent Shireen Abu Akleh on May 11, 2022. Ms. Abu Akleh was killed by a gunshot while reporting from the West Bank city of Jenin during a clash between Israeli Defense Forces and armed Palestinian militants. The fatal gunshot was not captured on video, but the microphones of at least two cameras at the scene recorded the sound of multiple gunshots. (Published Abstract Provided)
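The paper’s full method involves many geometric and physical parameters, but one widely known acoustic ranging idea behind such analyses can be sketched simply: a supersonic bullet’s ballistic shock wave ("crack") reaches a microphone before the slower muzzle blast ("bang"), and the gap between them grows with distance. The model below is a deliberately crude sketch (constant bullet speed, bullet path passing near the microphone); the numbers are illustrative and are not the paper’s data.

```python
# Simplified crack-to-bang ranging sketch, NOT the authors' method.
# Assumes a constant supersonic bullet speed and a microphone near the
# bullet's path, so the crack arrives ~D/v_bullet after the shot and
# the muzzle blast ~D/c, giving dt ~ D * (1/c - 1/v_bullet).

SPEED_OF_SOUND = 343.0  # m/s, dry air at ~20 C

def distance_from_crack_bang(dt: float, bullet_speed: float,
                             c: float = SPEED_OF_SOUND) -> float:
    """Estimate firearm-to-microphone distance (m) from the
    crack-to-bang interval dt (s)."""
    if bullet_speed <= c:
        raise ValueError("model requires a supersonic bullet")
    return dt / (1.0 / c - 1.0 / bullet_speed)

# Example: a 0.3 s gap with a bullet at 900 m/s (a typical rifle
# muzzle velocity) implies a range on the order of 165 m.
print(round(distance_from_crack_bang(0.3, 900.0)))
```

In practice, bullet deceleration, microphone geometry, and measurement uncertainty all widen the estimate, which is why the paper reports likely uncertainty ranges for each parameter.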


Fewer than 1% of federal criminal defendants were acquitted in 2022

Former President Donald Trump pleaded not guilty this week to federal criminal charges related to his alleged mishandling of classified documents after his departure from the White House in 2021. The unprecedented charges against Trump and his subsequent plea raise the question: How common is it for defendants in federal criminal cases to plead not guilty, go to trial and ultimately be acquitted?

The U.S. Justice Department’s indictment of former President Donald Trump, and his subsequent plea of not guilty, prompted Pew Research Center to examine how many defendants in federal criminal cases are acquitted in a typical year. The analysis builds on an earlier Center analysis that examined trial and acquittal rates in federal and state courts.

All statistics cited in this analysis come from the Judicial Business 2022 report by the Administrative Office of the U.S. Courts. Information about the total number of defendants in federal criminal cases in the United States, as well as how their cases ended, is drawn from Table D-4. Information about defendants in the Southern District of Florida is drawn from Table D-7 and Table D-9.

The statistics in this analysis include all defendants charged in U.S. district courts with felonies and serious misdemeanors, as well as some defendants charged with petty offenses. They do not include federal defendants whose cases were handled by magistrate judges or the much broader universe of defendants in state courts. Defendants who enter pleas of “no contest,” in which they accept criminal punishment but do not admit guilt, are also excluded.

This analysis is based on the 2022 federal fiscal year, which began Oct. 1, 2021, and ended Sept. 30, 2022.

In fiscal year 2022, only 290 of 71,954 defendants in federal criminal cases – about 0.4% – went to trial and were acquitted, according to a Pew Research Center analysis of the latest available statistics from the federal judiciary. Another 1,379 went to trial and were found guilty (1.9%).

Chart: Trials are rare in the federal criminal justice system, and acquittals are even rarer.

The overwhelming majority of defendants in federal criminal cases that year did not go to trial at all. About nine-in-ten (89.5%) pleaded guilty, while another 8.2% had their case dismissed at some point in the judicial process, according to the data from the Administrative Office of the U.S. Courts.
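The headline shares above follow directly from the counts cited in the text, as a line of arithmetic confirms:

```python
# Sanity check of the fiscal 2022 case-outcome shares cited above
# (counts taken from the text).
total = 71_954
acquitted = 290
convicted_at_trial = 1_379

print(f"{acquitted / total:.1%}")           # 0.4%
print(f"{convicted_at_trial / total:.1%}")  # 1.9%
# The remaining ~97.7% pleaded guilty (89.5%) or had their cases
# dismissed (8.2%).
```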


The U.S. Justice Department indicted Trump earlier this month on 37 counts relating to seven criminal charges: willful retention of national defense information, conspiracy to obstruct justice, withholding a document or record, corruptly concealing a document or record, concealing a document in a federal investigation, scheme to conceal, and false statements and representations.

Trump’s case is being heard in the U.S. District Court for the Southern District of Florida, where acquittal rates look similar to the national average. In fiscal 2022, only 12 of 1,944 total defendants in the Southern District of Florida – about 0.6% – were acquitted at trial. As was the case nationally, the vast majority of defendants in Florida’s Southern District (86.2%) pleaded guilty that year, while 10.7% had their cases dismissed.

It’s not clear from the federal judiciary’s statistics how many other defendants nationally or in the Southern District of Florida faced the same or similar charges that Trump is facing or how those cases ended.

Broadly speaking, however, the charges against Trump are rare. In fiscal 2022, more than eight-in-ten federal criminal defendants in the United States faced charges related to one of four other broad categories of crime: drug offenses (31%), immigration offenses (25%), firearms and explosives offenses (16%) or property offenses (11%). In Florida’s Southern District, too, more than eight-in-ten defendants faced charges related to these four categories.

Trump, of course, is not a typical federal defendant. He is the first former president ever to face federal criminal charges and is running for president again in 2024. The federal case against Trump is still in its early stages, and it’s unclear when – or whether – it will proceed to trial.

John Gramlich is an associate director at Pew Research Center.


  • Inappropriate use of proton pump inhibitors in clinical practice globally: a systematic review and meta-analysis

  • http://orcid.org/0000-0002-5111-7861 Amit K Dutta 1 ,
  • http://orcid.org/0000-0003-2472-3409 Vishal Sharma 2 ,
  • Abhinav Jain 3 ,
  • Anshuman Elhence 4 ,
  • Manas K Panigrahi 5 ,
  • Srikant Mohta 6 ,
  • Richard Kirubakaran 7 ,
  • Mathew Philip 8 ,
  • http://orcid.org/0000-0003-1700-7543 Mahesh Goenka 9 ,
  • Shobna Bhatia 10 ,
  • http://orcid.org/0000-0002-9435-3557 Usha Dutta 2 ,
  • D Nageshwar Reddy 11 ,
  • Rakesh Kochhar 12 ,
  • http://orcid.org/0000-0002-1305-189X Govind K Makharia 4
  • 1 Gastroenterology , Christian Medical College and Hospital Vellore , Vellore , India
  • 2 Gastroenterology , Post Graduate Institute of Medical Education and Research , Chandigarh , India
  • 3 Gastroenterology , Gastro 1 Hospital , Ahmedabad , India
  • 4 Gastroenterology and Human Nutrition , All India Institute of Medical Sciences , New Delhi , India
  • 5 Gastroenterology , All India Institute of Medical Sciences - Bhubaneswar , Bhubaneswar , India
  • 6 Department of Gastroenterology , Narayana Superspeciality Hospital , Kolkata , India
  • 7 Center of Biostatistics and Evidence Based Medicine , Vellore , India
  • 8 Lisie Hospital , Cochin , India
  • 9 Apollo Gleneagles Hospital , Kolkata , India
  • 10 Gastroenterology , National Institute of Medical Science , Jaipur , India
  • 11 Asian Institute of Gastroenterology , Hyderabad , India
  • 12 Gastroenterology , Paras Hospitals, Panchkula , Chandigarh , India
  • Correspondence to Dr Amit K Dutta, Gastroenterology, Christian Medical College and Hospital Vellore, Vellore, Tamil Nadu, India; akdutta1995{at}gmail.com




We read with interest the population-based cohort studies by Abrahami et al on proton pump inhibitors (PPI) and the risk of gastric and colon cancers. 1 2 PPI are used at all levels of healthcare and across different subspecialties for various indications. 3 4 A recent systematic review on the global trends and practices of PPI recognised 28 million PPI users from 23 countries, suggesting that 23.4% of the adults were using PPI. 5 Inappropriate use of PPI appears to be frequent, although there is a lack of compiled information on the prevalence of inappropriate overuse of PPI. Hence, we conducted a systematic review and meta-analysis on the inappropriate overuse of PPI globally.


Overall, 79 studies, including 20 050 patients, reported on the inappropriate overuse of PPI and were included in this meta-analysis. The pooled proportion of inappropriate overuse of PPI was 0.60 (95% CI 0.55 to 0.65, I² 97%, figure 1). The proportion of inappropriate overuse by dose was 0.17 (0.08 to 0.33) and by duration of use was 0.17 (0.07 to 0.35). Subgroup analysis was done to assess for heterogeneity (figure 2A). No significant differences in the pooled proportion of inappropriate overuse were noted based on the study design, setting (inpatient or outpatient), data source, human development index of the country, indication for use, sample size estimation, year of publication and study quality. However, regional differences were noted (p<0.01): Australia—40%, North America—56%, Europe—61%, Asia—62% and Africa—91% (figure 2B). The quality of studies was good in 27.8%, fair in 62.03% and low in 10.12%. 6
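For readers unfamiliar with pooled proportions, the general idea can be sketched as follows. This is a minimal fixed-effect illustration of inverse-variance pooling on the logit scale, with made-up (events, n) pairs rather than the review’s data; with heterogeneity as high as I² 97%, a real analysis would use a random-effects model that adds a between-study variance term to each weight.

```python
# Minimal sketch of inverse-variance pooling of proportions on the
# logit scale. The (events, n) pairs are hypothetical, NOT data from
# the review, and this fixed-effect version omits the between-study
# variance a random-effects analysis would include.
import math

studies = [(55, 100), (70, 110), (48, 90)]  # hypothetical (overuse, sample)

weights, logits = [], []
for events, n in studies:
    p = events / n
    logits.append(math.log(p / (1 - p)))
    var = 1 / events + 1 / (n - events)  # approx. variance of the logit
    weights.append(1 / var)              # inverse-variance weight

pooled_logit = sum(w * lg for w, lg in zip(weights, logits)) / sum(weights)
pooled_p = 1 / (1 + math.exp(-pooled_logit))          # back-transform
se = math.sqrt(1 / sum(weights))
lo = 1 / (1 + math.exp(-(pooled_logit - 1.96 * se)))  # 95% CI bounds
hi = 1 / (1 + math.exp(-(pooled_logit + 1.96 * se)))
print(f"pooled proportion {pooled_p:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```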


Figure 1. Forest plot showing inappropriate overuse of proton pump inhibitors.

Figure 2. (A) Subgroup analysis of inappropriate overuse of proton pump inhibitors (PPI). (B) Prevalence of inappropriate overuse of PPI across different countries of the world. NA, data not available.

This is the first systematic review and meta-analysis of inappropriate PPI prescribing globally. The results are concerning and suggest that about 60% of PPI prescriptions in clinical practice do not have a valid indication. The overuse of PPI appears to be a global problem affecting all age groups, including geriatric subjects (63%). Overprescription increases the patient’s cost, pill burden and risk of adverse effects. 7–9 The heterogeneity in the outcome data persisted after subgroup analysis; hence, it may be inherent to the practice of PPI use rather than related to factors such as study design, setting or study quality.

Several factors (both physician and patient-related) may contribute to the high magnitude of PPI overuse. These include a long list of indications for use, availability of the drug ‘over the counter’, an exaggerated sense of safety, and lack of awareness about the correct indications, dose and duration of therapy. A recently published guideline makes detailed recommendations on the accepted indications for the use of PPI, including the dose and duration, and further such documents may help to promote its rational use. 3 Overall, there is a need for urgent adoption of PPI stewardship practices, as is done for antibiotics. Apart from avoiding prescription when there is no indication, effective deprescription strategies are also required. 10 We hope the result of the present systematic review and meta-analysis will create awareness about the current situation and translate into a change in clinical practice globally.

Ethics statements

Patient consent for publication: Not applicable.

Ethics approval

  • Abrahami D ,
  • McDonald EG ,
  • Schnitzer ME , et al
  • Jearth V , et al
  • Malfertheiner P ,
  • Megraud F ,
  • Rokkas T , et al
  • Shanika LGT ,
  • Reynolds A ,
  • Pattison S , et al
  • O’Connell D , et al
  • Choudhury A ,
  • Gillis KA ,
  • Lees JS , et al
  • Paynter S , et al
  • Targownik LE ,
  • Fisher DA ,

X @drvishal82

Contributors AKD: concept, study design, data acquisition and interpretation, drafting the manuscript and approval of the manuscript. VS: study design, data acquisition, analysis and interpretation, drafting the manuscript and approval of the manuscript. AJ, AE, MKP, SM: data acquisition and interpretation, critical revision of the manuscript, and approval of the manuscript. RK: study design, data analysis and interpretation, critical revision of the manuscript and approval of the manuscript. MP, MG, SB, UD, DNR, RK: data interpretation, critical revision of the manuscript and approval of the manuscript. GKM: concept, study design, data interpretation, drafting the manuscript, critical revision and approval of the manuscript.

Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

Competing interests None declared.

Provenance and peer review Not commissioned; internally peer reviewed.

Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.




  1. Case Study

    A case study is a research method that involves an in-depth examination and analysis of a particular phenomenon or case, such as an individual, organization, community, event, or situation. It is a qualitative research approach that aims to provide a detailed and comprehensive understanding of the case being studied.

  2. Qualitative case study data analysis: an example from practice

    Data sources: The research example used is a multiple case study that explored the role of the clinical skills laboratory in preparing students for the real world of practice. Data analysis was conducted using a framework guided by the four stages of analysis outlined by Morse ( 1994 ): comprehending, synthesising, theorising and recontextualising.

  3. What Is a Case Study?

    Revised on November 20, 2023. A case study is a detailed study of a specific subject, such as a person, group, place, event, organization, or phenomenon. Case studies are commonly used in social, educational, clinical, and business research. A case study research design usually involves qualitative methods, but quantitative methods are ...

  4. Case Study Methodology of Qualitative Research: Key Attributes and

    A case study is one of the most commonly used methodologies of social research. This article attempts to look into the various dimensions of a case study research strategy, the different epistemological strands which determine the particular case study type and approach adopted in the field, discusses the factors which can enhance the effectiveness of a case study research, and the debate ...

  5. Case Study Method: A Step-by-Step Guide for Business Researchers

    Case study protocol is a formal document capturing the entire set of procedures involved in the collection of empirical material . It extends direction to researchers for gathering evidences, empirical material analysis, and case study reporting . This section includes a step-by-step guide that is used for the execution of the actual study.


    As case study research is a flexible research method, qualitative data analysis methods are commonly used [176]. The basic objective of the analysis is, as in any other analysis, to derive conclusions from the data, keeping a clear chain of evidence.

  7. Learning to Do Qualitative Data Analysis: A Starting Point

    In this article, we take up this open question as a point of departure and offer thematic analysis, an analytic method commonly used to identify patterns across language-based data (Braun & Clarke, 2006), as a useful starting point for learning about the qualitative analysis process. In doing so, we do not advocate for only learning the nuances of thematic analysis, but rather see it as a ...

  8. What Is Data Interpretation? Meaning & Analysis Examples

    7) The Use of Dashboards For Data Interpretation. 8) Business Data Interpretation Examples. Data analysis and interpretation have now taken center stage with the advent of the digital age… and the sheer amount of data can be frightening. In fact, a Digital Universe study found that the total data supply in 2012 was 2.8 trillion gigabytes!

  9. Qualitative Case Study Data Analysis: An Example from Practice

    Qualitative case study methodology is an appropriate strategy for exploring phenomena such as lived experiences, events, and the contexts in which they occur (Houghton et al. 2014;Miles and ...

  10. PDF Analyzing Case Study Evidence

    For case study analysis, one of the most desirable techniques is to use a pattern-matching logic. Such a logic (Trochim, 1989) compares an empirically based pattern with a predicted one (or with several alternative predictions). If the patterns coincide, the results can help a case study to strengthen its internal validity. If the case study ...

  11. Data Analysis Techniques for Case Studies

    Qualitative analysis involves analyzing non-numerical data from sources like interviews, observations, documents, and images in a case study. It helps explore context, meaning, and patterns to ...

  12. The case study approach

    Crucially, each case should have a pre-defined boundary which clarifies the nature and time period covered by the case study (i.e. its scope, beginning and end), the relevant social group, organisation or geographical area of interest to the investigator, the types of evidence to be collected, and the priorities for data collection and analysis ...

  13. 10 Real World Data Science Case Studies Projects with Example

    A case study in data science is an in-depth analysis of a real-world problem using data-driven approaches. It involves collecting, cleaning, and analyzing data to extract insights and solve challenges, offering practical insights into how data science techniques can address complex issues across various industries.

  14. PDF Open Case Studies: Statistics and Data Science Education through Real

    question and to create an illustrative data analysis - and the domain expertise needed. As a result, case studies based on realistic challenges, not toy examples, are scarce. To address this, we developed the Open Case Studies (opencasestudies.org) project, which offers a new statistical and data science education case study model.

  15. Data Analysis Case Study: Learn From These Winning Data Projects

    Humana's Automated Data Analysis Case Study. The key thing to note here is that the approach to creating a successful data program varies from industry to industry. Let's start with one to demonstrate the kind of value you can glean from these kinds of success stories. Humana has provided health insurance to Americans for over 50 years.

  16. Four Steps to Analyse Data from a Case Study Method

    propose an approach to the analysis of case study data by logically linking the data to a series of propositions and then interpreting the subsequent information. Like the Yin (1994) strategy, the Miles and Huberman (1994) process of analysis of case study data, although quite detailed, may still be insufficient to guide the novice researcher.

  17. Design and data analysis case-controlled study in clinical research

    Introduction. Clinicians think of case-control study when they want to ascertain association between one clinical condition and an exposure or when a researcher wants to compare patients with disease exposed to the risk factors to non-exposed control group. In other words, case-control study compares subjects who have disease or outcome (cases ...

  18. Humanities Data Analysis: Case Studies with Python

    Humanities Data Analysis: Case Studies with Python is a practical guide to data-intensive humanities research using the Python programming language. The book, written by Folgert Karsdorp, Mike Kestemont and Allen Riddell, was originally published with Princeton University Press in 2021 (for a printed version of the book, see the publisher's website), and is now available as an Open Access ...

  19. Communications in Statistics: Case Studies, Data Analysis and

    The three journals in this series are: Communications in Statistics: Case Studies, Data Analysis and Applications. Communications in Statistics - Simulation and Computation. Communications in Statistics - Theory and Methods. The prestigious and experienced members of our international Editorial Board will guide you from submission to publication.

  20. Data Analysis and Interpretation

    Summary. This chapter contains sections titled: Introduction; Analysis of Data in Flexible Research; Process for Qualitative Data Analysis; Validity; Improving Validity; Quantitative Data Analysis; Conclusion.

  21. Qualitative case study data analysis: an example from practice

    Data sources The research example used is a multiple case study that explored the role of the clinical skills laboratory in preparing students for the real world of practice. Data analysis was conducted using a framework guided by the four stages of analysis outlined by Morse ( 1994 ): comprehending, synthesising, theorising and recontextualising.

  22. 'The Real Data Set': A Case of Challenging Power Dynamics and

    Analysis. As an ethnographic study, multiple methods of analysis—including memos, mapping, and identifying key events and examples that illustrated themes—happened concurrently (Fetterman, 2010). The first step of analysis included close reading of field notes and interview transcripts and ongoing memo writing to document patterns.

  23. Multi-source data-driven estimation of urban net primary productivity

    The vector boundary data of each administrative district in Wuhan was utilized as auxiliary data in our analysis. 2.1.3. Field sampling data. The study referred to the vegetation type map and land use status map of Wuhan, selected a typical urban vegetation type, and established a multi-scale sample plot for urban vegetation (Zhang and Shao ...

  24. The economic commitment of climate change

    In summary, our estimates develop on previous studies by incorporating the latest data and empirical insights 7,8, as well as in providing a robust empirical lower bound on the persistence of ...

  25. Capturing Lived Experience: Methodological Considerations for

    Data analysis is an interpretive process, and imprinting lived experience in writing opens up the possibility for interpretation (Ricoeur, 1981). ... In the case of the exemplar study, the principal researcher wrote a preliminary analysis and a synthesis for each data collection episode (see Figure 2). Figure 2. Participant observation template.

  26. Food additive emulsifiers and the risk of type 2 diabetes: analysis of

    We found direct associations between the risk of type 2 diabetes and exposures to various food additive emulsifiers widely used in industrial foods, in a large prospective cohort of French adults. Further research is needed to prompt re-evaluation of regulations governing the use of additive emulsifiers in the food industry for better consumer protection.

  27. A New Use for Wegovy Opens the Door to Medicare Coverage for ...

    We used the ICD10Data website to convert the ICD-9 codes used in the Hirsch and Jaff studies to corresponding ICD-10 codes for our analysis based on the 2020 data (ICD-9 codes were replaced by ICD ...

  28. Interpretation of Audio Forensic Information from the Shooting of

    User generated recordings (UGRs) are increasingly presented as evidence in forensic investigations due to the widespread use of handheld cameras, smartphones, and other portable field recording devices. A case study comes from the shooting death of well-known Al Jazeera television correspondent Shireen Abu Akleh on May 11, 2022. Ms.

  29. Few federal criminal defendants go to trial and even fewer are

    This analysis is based on the 2022 federal fiscal year, which began Oct. 1, 2021, and ended Sept. 30, 2022. In fiscal year 2022, only 290 of 71,954 defendants in federal criminal cases - about 0.4% - went to trial and were acquitted, according to a Pew Research Center analysis of the latest available statistics from the federal judiciary .

  30. Inappropriate use of proton pump inhibitors in clinical practice

    We read with interest the population-based cohort studies by Abrahami et al on proton pump inhibitors (PPI) and the risk of gastric and colon cancers.1 2 PPI are used at all levels of healthcare and across different subspecialties for various indications.3 4 A recent systematic review on the global trends and practices of PPI recognised 28 million PPI users from 23 countries, suggesting that ...