6.894 : Interactive Data Visualization

Assignment 2: exploratory data analysis.

In this assignment, you will identify a dataset of interest and perform an exploratory analysis to better understand the shape & structure of the data, investigate initial questions, and develop preliminary insights & hypotheses. Your final submission will take the form of a report consisting of captioned visualizations that convey key insights gained during your analysis.

Step 1: Data Selection

First, you will pick a topic area of interest to you and find a dataset that can provide insights into that topic. To streamline the assignment, we've pre-selected a number of datasets for you to choose from.

However, if you would like to investigate a different topic and dataset, you are free to do so. If working with a self-selected dataset, please check with the course staff to ensure it is appropriate for the course. Be advised that data collection and preparation (also known as data wrangling) can be a very tedious and time-consuming process. Be sure you have sufficient time to conduct exploratory analysis after preparing the data.

After selecting a topic and dataset – but prior to analysis – you should write down an initial set of at least three questions you'd like to investigate.

Step 2: Exploratory Visual Analysis

Next, you will perform an exploratory analysis of your dataset using a visualization tool such as Tableau. You should consider two different phases of exploration.

In the first phase, you should seek to gain an overview of the shape & structure of your dataset. What variables does the dataset contain? How are they distributed? Are there any notable data quality issues? Are there any surprising relationships among the variables? Be sure to also perform "sanity checks" for patterns you expect to see!

In the second phase, you should investigate your initial questions, as well as any new questions that arise during your exploration. For each question, start by creating a visualization that might provide a useful answer. Then refine the visualization (by adding additional variables, changing sorting or axis scales, filtering or subsetting data, etc.) to develop better perspectives, explore unexpected observations, or sanity check your assumptions. You should repeat this process for each of your questions, but feel free to revise your questions or branch off to explore new questions if the data warrants.

Step 3: Final Deliverable

Your final submission should take the form of a Google Docs report – similar to a slide show or comic book – that consists of 10 or more captioned visualizations detailing your most important insights. Your "insights" can include important surprises or issues (such as data quality problems affecting your analysis) as well as responses to your analysis questions. To help you gauge the scope of this assignment, see this example report analyzing data about motion pictures. We've annotated and graded this example to help you calibrate for the breadth and depth of exploration we're looking for.

Each visualization image should be a screenshot exported from a visualization tool, accompanied by a title and descriptive caption (1-4 sentences long) describing the insight(s) learned from that view. Provide sufficient detail for each caption such that anyone could read through your report and understand what you've learned. You are free, but not required, to annotate your images to draw attention to specific features of the data. You may perform highlighting within the visualization tool itself, or draw annotations on the exported image. To easily export images from Tableau, use the Worksheet > Export > Image... menu item.

The end of your report should include a brief summary of main lessons learned.

Recommended Data Sources

To get up and running quickly with this assignment, we recommend exploring one of the following provided datasets:

World Bank Indicators, 1960–2017. The World Bank has tracked global human development through indicators covering climate change, economy, education, environment, gender equality, health, and science and technology since 1960. The linked repository contains indicators that have been formatted to facilitate use with Tableau and other data visualization tools. However, you're also welcome to browse and use the original data by indicator or by country. Click on an indicator category or country to download the CSV file.

Chicago Crimes, 2001–present (click Export to download a CSV file). This dataset reflects reported incidents of crime (with the exception of murders where data exists for each victim) that occurred in the City of Chicago from 2001 to present, minus the most recent seven days. Data is extracted from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system.

Daily Weather in the U.S., 2017. This dataset contains daily U.S. weather measurements in 2017, provided by the NOAA Daily Global Historical Climatology Network. This data has been transformed: some weather stations with only sparse measurements have been filtered out. See the accompanying weather.txt for descriptions of each column.

Social mobility in the U.S. Raj Chetty's group at Harvard studies the factors that contribute to (or hinder) upward mobility in the United States (i.e., will our children earn more than we do?). Their work has been extensively featured in The New York Times. This page lists data from all of their papers, broken down by geographic level or by topic. We recommend downloading data in the CSV/Excel format, and encourage you to consider joining multiple datasets from the same paper (under the same heading on the page) for a sufficiently rich exploratory process.

The Yelp Open Dataset provides information about businesses, user reviews, and more from Yelp's database. The data is split into separate files (business, checkin, photos, review, tip, and user), and is available in either JSON or SQL format. You might use this to investigate the distributions of scores on Yelp, look at how many reviews users typically leave, or look for regional trends about restaurants. Note that this is a large, structured dataset and you don't need to look at all of the data to answer interesting questions. In order to download the data you will need to enter your email and agree to Yelp's Dataset License.

Additional Data Sources

If you want to investigate datasets other than those recommended above, here are some possible sources to consider. You are also free to use data from a source different from those included here. If you have any questions on whether your dataset is appropriate, please ask the course staff ASAP!

  • data.boston.gov - City of Boston Open Data
  • MassData - State of Massachusetts Open Data
  • data.gov - U.S. Government Open Datasets
  • U.S. Census Bureau - Census Datasets
  • IPUMS.org - Integrated Census & Survey Data from around the World
  • Federal Elections Commission - Campaign Finance & Expenditures
  • Federal Aviation Administration - FAA Data & Research
  • fivethirtyeight.com - Data and Code behind the Stories and Interactives
  • Buzzfeed News
  • Socrata Open Data
  • 17 places to find datasets for data science projects

Visualization Tools

You are free to use one or more visualization tools in this assignment. However, in the interest of time and for a friendlier learning curve, we strongly encourage you to use Tableau . Tableau provides a graphical interface focused on the task of visual data exploration. You will (with rare exceptions) be able to complete an initial data exploration more quickly and comprehensively than with a programming-based tool.

  • Tableau - Desktop visual analysis software. Available for both Windows and macOS; register for a free student license.
  • Data Transforms in Vega-Lite. A tutorial on the various built-in data transformation operators available in Vega-Lite.
  • Data Voyager, a research prototype from the UW Interactive Data Lab, combines a Tableau-style interface with visualization recommendations. Use at your own risk!
  • R, using the ggplot2 library or with R's built-in plotting functions.
  • Jupyter Notebooks (Python), using libraries such as Altair or Matplotlib.

Data Wrangling Tools

The data you choose may require reformatting, transformation or cleaning prior to visualization. Here are tools you can use for data preparation. We recommend first trying to import and process your data in the same tool you intend to use for visualization. If that fails, pick the most appropriate option among the tools below. Contact the course staff if you are unsure what might be the best option for your data!

Graphical Tools

  • Tableau Prep - Tableau itself provides basic facilities for data import, transformation & blending; Tableau Prep is a more sophisticated data preparation tool.
  • Trifacta Wrangler - Interactive tool for data transformation & visual profiling.
  • OpenRefine - A free, open source tool for working with messy data.

Programming Tools

  • JavaScript data utilities and/or the Datalib JS library.
  • Pandas - Data table and manipulation utilities for Python.
  • dplyr - A library for data manipulation in R.
  • Or, the programming language and tools of your choice...

The assignment score is out of a maximum of 10 points. Submissions that squarely meet the requirements will receive a score of 8. We will determine scores by judging the breadth and depth of your analysis, whether visualizations meet the expressiveness and effectiveness principles, and how well-written and synthesized your insights are.

We will use the following rubric to grade your assignment. Note that rubric cells may not map exactly to specific point scores.

Submission Details

This is an individual assignment. You may not work in groups.

Your completed exploratory analysis report is due by noon on Wednesday 2/19 . Submit a link to your Google Doc report using this submission form . Please double check your link to ensure it is viewable by others (e.g., try it in an incognito window).

Resubmissions. Resubmissions will be regraded by teaching staff, and you may earn back up to 50% of the points lost in the original submission. To resubmit this assignment, please use this form and follow the same submission process described above. Include a short one-paragraph description summarizing the changes from the initial submission. Resubmissions without this summary will not be regraded. Resubmissions will be due by 11:59pm on Saturday, 3/14. Slack days may not be applied to extend the resubmission deadline. The teaching staff will only begin to regrade assignments once the Final Project phase begins, so please be patient.


Python Data Analysis Example: A Step-by-Step Guide for Beginners


Doing real data analysis exercises is a great way to learn. But data analysis is a broad topic, and knowing how to proceed can be half the battle. In this step-by-step guide, we’ll show you a Python data analysis example and demonstrate how to analyze a dataset.

A great way to get practical experience in Python and accelerate your learning is by doing data analysis challenges. These will expose you to several key Python concepts, such as working with different file types, manipulating various data types (e.g. integers and strings), looping, and data visualization. You'll also learn important data analysis techniques like cleaning data, smoothing noisy data, performing statistical tests and correlation analyses, and more. Along the way, you'll pick up many built-in functions and Python libraries that make your work easier.

Knowing what steps to take in the data analysis process requires a bit of experience. For those wanting to explore data analysis, this article will show you a step-by-step guide to data analysis using Python. We’ll download a dataset, read it in, and start some exploratory data analysis to understand what we’re working with. Then we’ll be able to choose the best analysis technique to answer some interesting questions about the data.

This article is aimed at budding data analysts who already have a little experience in programming and analysis. If you're looking for some learning material to get up to speed, consider our Introduction to Python for Data Science course, which contains 141 interactive exercises. For more in-depth material, our Python for Data Science track includes 5 interactive courses.

Python for Data Analysis

The process of examining, cleansing, transforming, and modeling data to discover useful information plays a crucial role in business, finance, academia, and other fields. Whether it's understanding customer behavior, optimizing business processes, or making informed decisions, data analysis provides you with the tools to unlock valuable insights from data.

Python has emerged as a preferred tool for data analysis due to its simplicity, versatility, and many open-source libraries. With its intuitive syntax and large online community, Python enables both beginners and experts to perform complex data analysis tasks efficiently. Libraries such as pandas, NumPy, and Matplotlib make this possible by providing essential functionalities for all aspects of the data analysis process.

The pandas library simplifies the process of working with structured data (e.g. tabular data, time series). NumPy, which is used for scientific computing in Python, provides powerful array objects and functions for numerical operations. It is essential for the mathematical computations involved in data analysis. It's particularly useful for working with Big Data, as it is very efficient. Matplotlib is a comprehensive library for creating visualizations in Python; it facilitates the exploration and communication of data insights.

In the following sections, we’ll leverage these libraries to analyze a real-world dataset and demonstrate the process of going from raw data to useful conclusions.

The Sunspots Dataset

For this Python data analysis example, we'll be working with the Sunspots dataset, which can be downloaded from Kaggle. The data includes a row number, a date, and an observation of the total number of sunspots for each month from 1749 to 2021.

Sunspots are regions of the sun's photosphere that are temporarily cooler than the surrounding material due to a reduction in convective transport of energy. As such, they appear darker and can be relatively easily observed – which accounts for the impressively long time period of the dataset. Sunspots can last anywhere from a few days to a few months, and have diameters ranging from around 16 km to 160,000 km. They can also be associated with solar flares and coronal mass ejections, which makes understanding them important for life on Earth.


Some interesting questions that could be investigated are:

  • What is the period of sunspot activity?
  • When can we expect the next peak in solar activity?

Python Data Analysis Example

Step 1: Import Data

Once you have downloaded the Sunspots dataset, the next step is to import the data into Python. There are several ways to do this; the one you choose depends on the format of your data.

If you have data in a text file, you may need to read the data in line-by-line using a for loop. As an example, take a look at how we imported the atmospheric sounding dataset in the article 7 Datasets to Practice Data Analysis in Python.

Alternatively, the data could be in the JSON format. In this case, you can use Python's json library. This is covered in the How to Read and Write JSON Files in Python course.

A common way to store data is in either Excel (.xlsx) or comma-separated values (.csv) files. In both of these cases, you can read the data directly into a pandas DataFrame. This is a useful way to parse data, since you can directly use many helpful pandas functions to manipulate and process the data. The How to Read and Write CSV Files in Python and How to Read and Write Excel Files in Python courses include interactive exercises to demonstrate this functionality.

Since the Sunspots dataset is in the CSV format, we can read it in using pandas. If you haven’t installed pandas yet, you can do so with a quick command:
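```
pip install pandas
```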

Now, you can import the data into a DataFrame:
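```python
import pandas as pd

# Reconstructed sketch: the filename is an assumption; point it at
# wherever you saved the Kaggle download. The first (unnamed) column
# is used as the row label, and the 'Date' column is parsed as dates.
df = pd.read_csv('Sunspots.csv', index_col=0, parse_dates=['Date'])
```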

The read_csv() function automatically parses the data. It comes with many arguments to customize how the data is imported. For example, the index_col argument defines which column to use as the row label. The parse_dates argument defines which column holds dates. Our DataFrame, called df, holds our sunspots data under the variable name Monthly Mean Total Sunspot Number and the date of observation under the variable name Date.

Step 2: Data Cleaning and Preparation

Cleaning the data involves handling missing values, converting variables into the correct data types, and applying any filters.

If your data has missing values, there are a number of possible ways to handle them. You could simply convert them to NaN (not a number). Alternatively, you could do a forward (backward) fill, which copies the previous (next) value into the missing position. Or you could interpolate, using neighboring values to estimate a value for the missing position. The method you choose depends on your use case.

You should also check to see that numerical data is stored as a float or integer; if not, you need to convert it to the correct data type. If there are outliers in your data, you may consider removing them so as not to bias your results.

Or maybe you’re working with text data and you need to remove punctuation and numbers from your text and convert everything to lowercase. All these considerations fall under the umbrella of data cleaning. For some concrete examples, see our article Python Data Cleaning: A How-to Guide for Beginners .

Let’s start by getting an overview of our dataset:
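```python
# Reconstructed from the description below: show the first rows
print(df.head())
```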

The df.head() function prints the first 5 rows of data. You can see the row number (starting from zero), the date (in yyyy-mm-dd format), and the observation of the number of sunspots for the month. To check the datatypes of the variables, execute the following command:
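```python
# One way to check the datatype of each column
print(df.dtypes)
```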

The date has the datatype datetime64, which is used to store dates in pandas, and the number of sunspots variable is a float.

Next, here's how to check if there are any missing data points in the Monthly Mean Total Sunspot Number variable:
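```python
# True would mean at least one value in the column is missing
print(df['Monthly Mean Total Sunspot Number'].isna().any())
```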

This takes advantage of the isna() function, which checks for missing values. It returns a series of booleans – True if a value is missing, False if not. Then, we use the any() function to check whether any of the booleans are True. This returns False, which indicates there are no missing values in our data. You can find more details about this important step in The Most Helpful Python Data Cleaning Modules.

Step 3: Exploratory Data Analysis

The next stage is to start analyzing your data by calculating summary statistics, plotting histograms and scatter plots, or performing statistical tests. The goal is to gain a better understanding of the variables, and then use this understanding to guide the rest of the analysis. After performing exploratory data analysis, you will have a better understanding of what your data looks like and how to use it to answer questions. Our article Python Exploratory Data Analysis Cheat Sheet contains many more details, examples, and ideas about how to proceed.

A good starting point is to do a basic statistical analysis to determine the mean, median, standard deviation, etc. This can easily be achieved by using the df.describe() function:
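```python
# Summary statistics for the numeric columns
print(df.describe())
```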

We have a total of 3,265 observations and a mean of over 81 sunspots per month. The minimum is zero and the maximum is 398. This gives us an idea of the range of typical values. The standard deviation is about 67, which gives us an idea about how much the number of sunspots varies.

Notice that the 50th percentile (the median) is less than the mean. This implies the distribution is skewed, with most values at the lower end. This is very useful information if we want to do more advanced statistics, since some tests assume a normal distribution.

We can confirm this by plotting a histogram of the number of sunspots per month. Visualization is an important skill in Python data analysis. Check out our article The Top 5 Python Libraries for Data Visualization. For our purposes, we'll use matplotlib. This too can easily be installed with a quick pip install command. The code to plot a histogram looks like this:
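```python
import matplotlib.pyplot as plt

# Reconstructed sketch: the bin count and styling of the original
# plot aren't shown, so these choices are assumptions
plt.hist(df['Monthly Mean Total Sunspot Number'], bins=40)
plt.xlabel('Monthly Mean Total Sunspot Number')
plt.ylabel('Frequency')
plt.show()
```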

[Figure: histogram of the monthly mean total sunspot number]

Now we can see the most common value is less than 20 sunspots for the month, and numbers above 200 are quite rare. Finally, let’s plot the time series to see the full dataset:
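```python
# Plot the full monthly time series, 1749-2021
plt.plot(df['Date'], df['Monthly Mean Total Sunspot Number'])
plt.xlabel('Date')
plt.ylabel('Monthly Mean Total Sunspot Number')
plt.show()
```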

[Figure: time series of the monthly mean total sunspot number, 1749–2021]

We can see from the above plot there is a periodic increase and decrease in the number of sunspots. It looks like the maximum occurs roughly every 9–12 years. A natural question arises as to exactly how long that period is.

Signal processing is a detailed topic, so we'll skim over some of the hairy details. To keep it simple, we need to decompose the above signal into a frequency spectrum, then find the dominant frequency. From this we can then compute the period. To compute the frequency spectrum, we can use the Fourier Transform, which is implemented in NumPy:
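```python
import numpy as np

sunspots = df['Monthly Mean Total Sunspot Number'].to_numpy()
fft_result = np.fft.fft(sunspots)

# Samples are one month apart, i.e. 1/12 of a year, so these
# frequencies come out in cycles per year
frequencies = np.fft.fftfreq(len(sunspots), d=1/12)
```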

Try plotting the frequency spectrum and you'll notice many peaks. One of those hairy details of signal processing is the presence of spurious peaks at the start and end of the array np.abs(fft_result). We can see from the time series we plotted above that the period should be somewhere between 9 and 12 years, so we can safely exclude these peaks by slicing the magnitude array to filter out unwanted frequencies:
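```python
magnitude = np.abs(fft_result)

# One way to slice out the band of interest: keep only frequencies
# corresponding to periods between roughly 9 and 12 years. A boolean
# mask is used here; the article's exact slice bounds aren't shown.
mask = (frequencies > 1/12) & (frequencies < 1/9)
peak_frequency = frequencies[mask][np.argmax(magnitude[mask])]
period = 1 / peak_frequency
print(f'Period of sunspot activity: {period:.1f} years')
```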

The output reports a period of about 11 years.

We used NumPy's argmax() function to find the index of the largest peak in the magnitude spectrum, looked up the corresponding frequency, and then converted this to a period, which we finally print in years.

This is a great example of using the understanding gained from exploratory data analysis to inform our data processing so we get a result that makes sense.

Step 4: Drawing Conclusions from Data

We were able to learn that the average number of sunspots per month is around 81, but the distribution is highly skewed to lower numbers. Indeed, the most common number of sunspots per month is less than 20, but in a period of high solar activity (75th percentile), there could be over 120.

By plotting the time series, we could see the signal is periodic and get an idea that there is a regular maximum and minimum in the number of sunspots. By doing some signal processing, we determined the maximum number of sunspots occurs about every 11 years. From the time series plot we can see the last maximum was around 2014, meaning the next should be around 2025.

Further Python Data Analysis Examples

Working with the Sunspots dataset presents some unique advantages – e.g. it's not a common dataset. We discuss this in our article 11 Tips for Building a Strong Data Science Portfolio with Python. This example of Python data analysis can also teach us a lot about programming in Python. We learned how to read data into a pandas DataFrame and summarize our data using built-in functions. We did some plotting with Matplotlib and got a taste of signal processing with NumPy. We also did a little array slicing to get results that make sense. You'll learn many of these important topics in the Introduction to Python for Data Science course and the Python for Data Science track.

We just scratched the surface of this analysis of sunspot data in Python. There are many more interesting questions which could be answered. For example, is there a trend in the number of sunspots over the 272 years of data? How long does the maximum last? How many sunspots should there be during our predicted next maximum? These questions can all be answered with Python.

There’s always more to learn on your Python data analysis journey, and books are a great resource. Our article The Best Python Books for Data Science has some great suggestions for your next trip to a bookstore. All the suggestions there will give you the tools to delve deeper into Python and data analysis techniques. Then, it’s a matter of practicing what you learn by starting a new data science project. Here are some Python Data Science Project Ideas . Happy coding!



IBM

Introduction to Data Analytics

This course is part of multiple programs.

Taught in English

Instructor: Rav Ahuja

515,636 already enrolled • 14,880 reviews

Recommended experience

Beginner level

All you need to get started is basic computer literacy, high school level math, and access to a modern web browser such as Chrome or Firefox.

What you'll learn

  • Explain what Data Analytics is and the key steps in the Data Analytics process
  • Differentiate between different data roles such as Data Engineer, Data Analyst, Data Scientist, Business Analyst, and Business Intelligence Analyst
  • Describe the different types of data structures, file formats, and sources of data
  • Describe the data analysis process involving collecting, wrangling, mining, and visualizing data

Skills you'll gain

  • Data Science
  • Spreadsheet
  • Data Analysis
  • Microsoft Excel
  • Data Visualization


There are 5 modules in this course

Ready to start a career in Data Analysis but don’t know where to begin? This course presents you with a gentle introduction to Data Analysis, the role of a Data Analyst, and the tools used in this job. You will learn about the skills and responsibilities of a data analyst and hear from several data experts sharing their tips & advice to start a career. This course will help you to differentiate between the roles of Data Analysts, Data Scientists, and Data Engineers.

You will familiarize yourself with the data ecosystem, alongside Databases, Data Warehouses, Data Marts, Data Lakes and Data Pipelines. Continue this exciting journey and discover Big Data platforms such as Hadoop, Hive, and Spark. By the end of this course you'll be able to understand the fundamentals of the data analysis process, including gathering, cleaning, analyzing and sharing data and communicating your insights with the use of visualizations and dashboard tools. This all comes together in the final project, which will test your knowledge of the course material and provide a real-world scenario of data analysis tasks. This course does not require any prior data analysis, spreadsheet, or computer science experience.

What is Data Analytics

In this module, you will learn about the different types of data analysis and the key steps in a data analysis process. You will gain an understanding of the different components of a modern data ecosystem, and the role Data Engineers, Data Analysts, Data Scientists, Business Analysts, and Business Intelligence Analysts play in this ecosystem. You will also learn about the role, responsibilities, and skillsets required to be a Data Analyst, and what a typical day in the life of a Data Analyst looks like.

What's included

9 videos 3 readings 4 quizzes 1 discussion prompt

9 videos • Total 39 minutes

  • Course Introduction • 2 minutes • Preview module
  • Modern Data Ecosystem • 4 minutes
  • Key Players in the Data Ecosystem • 5 minutes
  • Defining Data Analysis • 5 minutes
  • Viewpoints: What is Data Analytics? • 3 minutes
  • Responsibilities of a Data Analyst • 4 minutes
  • Viewpoints: Qualities and Skills to be a Data Analyst • 4 minutes
  • A Day in the Life of a Data Analyst • 5 minutes
  • Viewpoints: Applications of Data Analytics • 2 minutes

3 readings • Total 22 minutes

  • Data Analytics vs. Data Analysis • 2 minutes
  • Summary and Highlights • 10 minutes

4 quizzes • Total 45 minutes

  • Practice Quiz • 9 minutes
  • Practice Quiz • 6 minutes
  • Graded Quiz • 15 minutes

1 discussion prompt • Total 5 minutes

  • Introduce yourself • 5 minutes

The Data Ecosystem

In this module, you will learn about the different types of data structures, file formats, sources of data, and the languages data professionals use in their day-to-day tasks. You will gain an understanding of various types of data repositories such as Databases, Data Warehouses, Data Marts, Data Lakes, and Data Pipelines. In addition, you will learn about the Extract, Transform, and Load (ETL) Process, which is used to extract, transform, and load data into data repositories. You will gain a basic understanding of Big Data and Big Data processing tools such as Hadoop, Hadoop Distributed File System (HDFS), Hive, and Spark.

11 videos 2 readings 4 quizzes

11 videos • Total 67 minutes

  • Overview of the Data Analyst Ecosystem • 3 minutes • Preview module
  • Types of Data • 4 minutes
  • Understanding Different Types of File Formats • 4 minutes
  • Sources of Data • 7 minutes
  • Languages for Data Professionals • 8 minutes
  • Overview of Data Repositories • 4 minutes
  • RDBMS • 7 minutes
  • NoSQL • 7 minutes
  • Data Marts, Data Lakes, ETL, and Data Pipelines • 6 minutes
  • Foundations of Big Data • 5 minutes
  • Big Data Processing Tools • 6 minutes

2 readings • Total 20 minutes

4 quizzes • Total 66 minutes

  • Practice Quiz • 15 minutes
  • Practice Quiz • 18 minutes
  • Graded Quiz • 18 minutes

Gathering and Wrangling Data

In this module, you will learn about the process and steps involved in identifying, gathering, and importing data from disparate sources. You will learn about the tasks involved in wrangling and cleaning data in order to make it ready for analysis. In addition, you will gain an understanding of the different tools that can be used for gathering, importing, wrangling, and cleaning data, along with some of their characteristics, strengths, limitations, and applications.

7 videos 2 readings 4 quizzes

7 videos • Total 39 minutes

  • Identifying Data for Analysis • 5 minutes • Preview module
  • Data Sources • 4 minutes
  • How to Gather and Import Data • 6 minutes
  • What is Data Wrangling? • 6 minutes
  • Tools for Data Wrangling • 5 minutes
  • Data Cleaning • 6 minutes
  • Viewpoints: Data Preparation and Reliability • 4 minutes

4 quizzes • Total 48 minutes

Mining & Visualizing Data and Communicating Results

In this module, you will learn about the role of Statistical Analysis in mining and visualizing data. You will learn about the various statistical and analytical tools and techniques you can use in order to gain a deeper understanding of your data. These tools help you to understand the patterns, trends, and correlations that exist in data. In addition, you will learn about the various types of data visualizations that can help you communicate and tell a compelling story with your data. You will also gain an understanding of the different tools that can be used for mining and visualizing data, along with some of their characteristics, strengths, limitations, and applications.

8 videos 2 readings 4 quizzes

8 videos • Total 44 minutes

  • Overview of Statistical Analysis • 8 minutes • Preview module
  • What is Data Mining? • 5 minutes
  • Tools for Data Mining • 6 minutes
  • Overview of Communicating and Sharing Data Analysis Findings • 5 minutes
  • Viewpoints: Storytelling in Data Analysis • 3 minutes
  • Introduction to Data Visualization • 5 minutes
  • Introduction to Visualization and Dashboarding Software • 7 minutes
  • Viewpoints: Visualization Tools • 3 minutes

Career Opportunities and Data Analysis in Action

In this module, you will learn about the different career opportunities in the field of Data Analysis and the different paths that you can take for getting skilled as a Data Analyst. At the end of the module, you will demonstrate your understanding of some of the basic tasks involved in gathering, wrangling, mining, analyzing, and visualizing data.

7 videos 4 readings 2 quizzes 1 peer review

7 videos • Total 28 minutes

  • Career Opportunities in Data Analysis • 5 minutes • Preview module
  • Viewpoints: Get into Data Profession • 3 minutes
  • Viewpoints: What do Employers look for in a Data Analyst? • 5 minutes
  • The Many Paths to Data Analysis • 4 minutes
  • Viewpoints: Career Options for Data Professionals • 3 minutes
  • Viewpoints: Advice for aspiring Data Analysts • 3 minutes
  • Viewpoints: Women in Data Professions • 3 minutes

4 readings • Total 32 minutes

  • Using Data Analysis for Detecting Credit Card Fraud • 10 minutes
  • Congratulations and Next Steps • 2 minutes
  • Course Credits and Acknowledgements • 10 minutes

2 quizzes • Total 21 minutes

1 peer review • Total 60 minutes

  • Peer-Graded Final Assignment • 60 minutes


IBM is the global leader in business transformation through an open hybrid cloud platform and AI, serving clients in more than 170 countries around the world. Today 47 of the Fortune 50 Companies rely on the IBM Cloud to run their business, and IBM Watson enterprise AI is hard at work in more than 30,000 engagements. IBM is also one of the world’s most vital corporate research organizations, with 28 consecutive years of patent leadership. Above all, guided by principles for trust and transparency and support for a more inclusive society, IBM is committed to being a responsible technology innovator and a force for good in the world. For more information about IBM visit: www.ibm.com

Recommended if you're interested in Data Analysis

  • Excel Basics for Data Analysis
  • Data Visualization and Dashboards with Excel and Cognos
  • IBM Data Analyst (Professional Certificate)
  • IBM Data Analyst Capstone Project


Learner reviews

Showing 3 of 14,880 reviews

Reviewed on Sep 3, 2022

Good informative course, could be a little more interactive. While each section had quick test at the end, it would've been nice to have had more engaging questions and activities throughout.

Reviewed on May 11, 2021

I must say as a Coursera learner, this course is unmatched in its rigor, vividness of concepts and precision of demonstration and layout. Thanks to all those who put this piece of art together.

Reviewed on Mar 13, 2021

Great general and broad information on data analytics. Gives good ideas and examples of career paths that can be followed. I especially liked how it ranked the various careers and specializations.


Frequently asked questions

When will I have access to the lectures and assignments?

Access to lectures and assignments depends on your type of enrollment. If you take a course in audit mode, you will be able to see most course materials for free. To access graded assignments and to earn a Certificate, you will need to purchase the Certificate experience, during or after your audit. If you don't see the audit option:

The course may not offer an audit option. You can try a Free Trial instead, or apply for Financial Aid.

The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.

What will I get if I subscribe to this Certificate?

When you enroll in the course, you get access to all of the courses in the Certificate, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile. If you only want to read and view the course content, you can audit the course for free.

What is the refund policy?

If you subscribed, you get a 7-day free trial during which you can cancel at no penalty. After that, we don't give refunds, but you can cancel your subscription at any time. See our full refund policy.


A Step-by-Step Guide to the Data Analysis Process

Like any scientific discipline, data analysis follows a rigorous step-by-step process. Each stage requires different skills and know-how. To get meaningful insights, though, it’s important to understand the process as a whole. An underlying framework is invaluable for producing results that stand up to scrutiny.

In this post, we’ll explore the main steps in the data analysis process. This will cover how to define your goal, collect data, and carry out an analysis. Where applicable, we’ll also use examples and highlight a few tools to make the journey easier. When you’re done, you’ll have a much better understanding of the basics. This will help you tweak the process to fit your own needs.

Here are the steps we’ll take you through:

  • Defining the question
  • Collecting the data
  • Cleaning the data
  • Analyzing the data
  • Sharing your results
  • Embracing failure


Ready? Let’s get started with step one.

1. Step one: Defining the question

The first step in any data analysis process is to define your objective. In data analytics jargon, this is sometimes called the ‘problem statement’.

Defining your objective means coming up with a hypothesis and figuring out how to test it. Start by asking: What business problem am I trying to solve? While this might sound straightforward, it can be trickier than it seems. For instance, your organization's senior management might pose an issue, such as: "Why are we losing customers?" It's possible, though, that this doesn't get to the core of the problem. A data analyst's job is to understand the business and its goals in enough depth that they can frame the problem the right way.

Let’s say you work for a fictional company called TopNotch Learning. TopNotch creates custom training software for its clients. While it is excellent at securing new clients, it has much lower repeat business. As such, your question might not be, “Why are we losing customers?” but, “Which factors are negatively impacting the customer experience?” or better yet: “How can we boost customer retention while minimizing costs?”

Now that you've defined a problem, you need to determine which sources of data will best help you solve it. This is where your business acumen comes in again. For instance, perhaps you've noticed that the sales process for new clients is very slick, but that the production team is inefficient. Knowing this, you could hypothesize that the sales process wins lots of new clients, but the subsequent customer experience is lacking. Could this be why customers don't come back? Which sources of data will help you answer this question?

Tools to help define your objective

Defining your objective is mostly about soft skills, business knowledge, and lateral thinking. But you'll also need to keep track of business metrics and key performance indicators (KPIs). Monthly reports can allow you to track problem points in the business. Some KPI dashboards come with a fee, like Databox and DashThis. However, you'll also find open-source software like Grafana, Freeboard, and Dashbuilder. These are great for producing simple dashboards, both at the beginning and the end of the data analysis process.

2. Step two: Collecting the data

Once you’ve established your objective, you’ll need to create a strategy for collecting and aggregating the appropriate data. A key part of this is determining which data you need. This might be quantitative (numeric) data, e.g. sales figures, or qualitative (descriptive) data, such as customer reviews. All data fit into one of three categories: first-party, second-party, and third-party data. Let’s explore each one.

What is first-party data?

First-party data is data that you, or your company, have collected directly from customers. It might come in the form of transactional tracking data or information from your company's customer relationship management (CRM) system. Whatever its source, first-party data is usually structured and organized in a clear, defined way. Other sources of first-party data might include customer satisfaction surveys, focus groups, interviews, or direct observation.

What is second-party data?

To enrich your analysis, you might want to secure a secondary data source. Second-party data is the first-party data of other organizations. This might be available directly from the company or through a private marketplace. The main benefit of second-party data is that it is usually structured, and although it will be less relevant than first-party data, it also tends to be quite reliable. Examples of second-party data include website, app or social media activity, like online purchase histories, or shipping data.

What is third-party data?

Third-party data is data that has been collected and aggregated from numerous sources by a third-party organization. Often (though not always) third-party data contains a vast amount of unstructured data points (big data). Many organizations collect big data to create industry reports or to conduct market research. The research and advisory firm Gartner is a good real-world example of an organization that collects big data and sells it on to other companies. Open data repositories and government portals are also sources of third-party data.

Tools to help you collect data

Once you've devised a data strategy (i.e. you've identified which data you need, and how best to go about collecting them) there are many tools you can use to help you. One thing you'll need, regardless of industry or area of expertise, is a data management platform (DMP). A DMP is a piece of software that allows you to identify and aggregate data from numerous sources, before manipulating them, segmenting them, and so on. There are many DMPs available. Some well-known enterprise DMPs include Salesforce DMP, SAS, and the data integration platform, Xplenty. If you want to play around, you can also try some open-source platforms like Pimcore or D:Swarm.

Want to learn more about what data analytics is and the process a data analyst follows? We cover this topic (and more) in our free introductory short course for beginners. Check out tutorial one: An introduction to data analytics.

3. Step three: Cleaning the data

Once you've collected your data, the next step is to get it ready for analysis. This means cleaning, or 'scrubbing' it, and is crucial in making sure that you're working with high-quality data. Key data cleaning tasks include:

  • Removing major errors, duplicates, and outliers—all of which are inevitable problems when aggregating data from numerous sources.
  • Removing unwanted data points—extracting irrelevant observations that have no bearing on your intended analysis.
  • Bringing structure to your data—general 'housekeeping', i.e. fixing typos or layout issues, which will help you map and manipulate your data more easily.
  • Filling in major gaps—as you're tidying up, you might notice that important data are missing. Once you've identified gaps, you can go about filling them.

A good data analyst will spend around 70-90% of their time cleaning their data. This might sound excessive. But focusing on the wrong data points (or analyzing erroneous data) will severely impact your results. It might even send you back to square one…so don't rush it! You'll find a step-by-step guide to data cleaning here. You may also be interested in this introductory tutorial to data cleaning, hosted by Dr. Humera Noor Minhas.
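To make a few of these tasks concrete, here is a minimal pandas sketch. It is only illustrative: the file and column names (customers.csv, customer_id, signup_date, annual_spend) are hypothetical.

```python
import pandas as pd

df = pd.read_csv('customers.csv')                      # hypothetical file

df = df.drop_duplicates()                              # remove duplicate rows
df = df.dropna(subset=['customer_id'])                 # drop rows missing a key field
df['signup_date'] = pd.to_datetime(df['signup_date'])  # fix a column's data type
df = df[df['annual_spend'] < 1_000_000]                # filter an implausible outlier
```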

Carrying out an exploratory analysis

Another thing many data analysts do (alongside cleaning data) is to carry out an exploratory analysis. This helps identify initial trends and characteristics, and can even refine your hypothesis. Let’s use our fictional learning company as an example again. Carrying out an exploratory analysis, perhaps you notice a correlation between how much TopNotch Learning’s clients pay and how quickly they move on to new suppliers. This might suggest that a low-quality customer experience (the assumption in your initial hypothesis) is actually less of an issue than cost. You might, therefore, take this into account.

Tools to help you clean your data

Cleaning datasets manually—especially large ones—can be daunting. Luckily, there are many tools available to streamline the process. Open-source tools, such as OpenRefine, are excellent for basic data cleaning, as well as high-level exploration. However, free tools offer limited functionality for very large datasets. Python libraries (e.g. Pandas) and some R packages are better suited for heavy data scrubbing. You will, of course, need to be familiar with the languages. Alternatively, enterprise tools are also available; one example is Data Ladder, one of the highest-rated data-matching tools in the industry. There are many more. Why not see which free data cleaning tools you can find to play around with?

4. Step four: Analyzing the data

Finally, you’ve cleaned your data. Now comes the fun bit—analyzing it! The type of data analysis you carry out largely depends on what your goal is. But there are many techniques available. Univariate or bivariate analysis, time-series analysis, and regression analysis are just a few you might have heard of. More important than the different types, though, is how you apply them. This depends on what insights you’re hoping to gain. Broadly speaking, all types of data analysis fit into one of the following four categories.

Descriptive analysis

Descriptive analysis identifies what has already happened. It is a common first step that companies carry out before proceeding with deeper explorations. As an example, let's refer back to our fictional learning provider once more. TopNotch Learning might use descriptive analytics to analyze course completion rates for their customers. Or they might identify how many users access their products during a particular period. Perhaps they'll use it to measure sales figures over the last five years. While the company might not draw firm conclusions from any of these insights, summarizing and describing the data will help them to determine how to proceed.
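In pandas, the course-completion example might start out as simply as this sketch (the file and column names, enrollments.csv, course, and completed, are hypothetical):

```python
import pandas as pd

enrollments = pd.read_csv('enrollments.csv')  # hypothetical export

# Share of enrollments marked completed (a 0/1 flag), per course
completion_rates = enrollments.groupby('course')['completed'].mean()
print(completion_rates.sort_values(ascending=False))
```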

Learn more: What is descriptive analytics?

Diagnostic analysis

Diagnostic analytics focuses on understanding why something has happened. It is literally the diagnosis of a problem, just as a doctor uses a patient's symptoms to diagnose a disease. Remember TopNotch Learning's business problem? 'Which factors are negatively impacting the customer experience?' A diagnostic analysis would help answer this. For instance, it could help the company draw correlations between the issue (struggling to gain repeat business) and factors that might be causing it (e.g. project costs, speed of delivery, customer sector, etc.) Let's imagine that, using diagnostic analytics, TopNotch realizes its clients in the retail sector are departing at a faster rate than other clients. This might suggest that they're losing customers because they lack expertise in this sector. And that's a useful insight!

Predictive analysis

Predictive analysis allows you to identify future trends based on historical data. In business, predictive analysis is commonly used to forecast future growth, for example. But it doesn't stop there. Predictive analysis has grown increasingly sophisticated in recent years. The speedy evolution of machine learning allows organizations to make surprisingly accurate forecasts. Take the insurance industry. Insurance providers commonly use past data to predict which customer groups are more likely to get into accidents. As a result, they'll hike up customer insurance premiums for those groups. Likewise, the retail industry often uses transaction data to predict where future trends lie, or to determine seasonal buying habits to inform their strategies. These are just a few simple examples, but the untapped potential of predictive analysis is pretty compelling.

Prescriptive analysis

Prescriptive analysis allows you to make recommendations for the future. This is the final step in the analytics part of the process. It’s also the most complex. This is because it incorporates aspects of all the other analyses we’ve described. A great example of prescriptive analytics is the algorithms that guide Google’s self-driving cars. Every second, these algorithms make countless decisions based on past and present data, ensuring a smooth, safe ride. Prescriptive analytics also helps companies decide on new products or areas of business to invest in.

Learn more: What are the different types of data analysis?

5. Step five: Sharing your results

You’ve finished carrying out your analyses. You have your insights. The final step of the data analytics process is to share these insights with the wider world (or at least with your organization’s stakeholders!) This is more complex than simply sharing the raw results of your work—it involves interpreting the outcomes, and presenting them in a manner that’s digestible for all types of audiences. Since you’ll often present information to decision-makers, it’s very important that the insights you present are 100% clear and unambiguous. For this reason, data analysts commonly use reports, dashboards, and interactive visualizations to support their findings.

How you interpret and present results will often influence the direction of a business. Depending on what you share, your organization might decide to restructure, to launch a high-risk product, or even to close an entire division. That’s why it’s very important to provide all the evidence that you’ve gathered, and not to cherry-pick data. Ensuring that you cover everything in a clear, concise way will prove that your conclusions are scientifically sound and based on the facts. On the flip side, it’s important to highlight any gaps in the data or to flag any insights that might be open to interpretation. Honest communication is the most important part of the process. It will help the business, while also helping you to excel at your job!

Tools for interpreting and sharing your findings

There are tons of data visualization tools available, suited to different experience levels. Popular tools requiring little or no coding skills include Google Charts, Tableau, Datawrapper, and Infogram. If you're familiar with Python and R, there are also many data visualization libraries and packages available. For instance, check out the Python libraries Plotly, Seaborn, and Matplotlib. Whichever data visualization tools you use, make sure you polish up your presentation skills, too. Remember: Visualization is great, but communication is key!

You can learn more about storytelling with data in this free, hands-on tutorial. We show you how to craft a compelling narrative for a real dataset, resulting in a presentation to share with key stakeholders. This is an excellent insight into what it's really like to work as a data analyst!

6. Step six: Embrace your failures

The last ‘step’ in the data analytics process is to embrace your failures. The path we’ve described above is more of an iterative process than a one-way street. Data analytics is inherently messy, and the process you follow will be different for every project. For instance, while cleaning data, you might spot patterns that spark a whole new set of questions. This could send you back to step one (to redefine your objective). Equally, an exploratory analysis might highlight a set of data points you’d never considered using before. Or maybe you find that the results of your core analyses are misleading or erroneous. This might be caused by mistakes in the data, or human error earlier in the process.

While these pitfalls can feel like failures, don’t be disheartened if they happen. Data analysis is inherently chaotic, and mistakes occur. What’s important is to hone your ability to spot and rectify errors. If data analytics was straightforward, it might be easier, but it certainly wouldn’t be as interesting. Use the steps we’ve outlined as a framework, stay open-minded, and be creative. If you lose your way, you can refer back to the process to keep yourself on track.

In this post, we’ve covered the main steps of the data analytics process. These core steps can be amended, re-ordered and re-used as you deem fit, but they underpin every data analyst’s work:

  • Define the question—What business problem are you trying to solve? Frame it as a question to help you focus on finding a clear answer.
  • Collect data—Create a strategy for collecting data. Which data sources are most likely to help you solve your business problem?
  • Clean the data—Explore, scrub, tidy, de-dupe, and structure your data as needed. Do whatever you have to! But don't rush…take your time!
  • Analyze the data—Carry out various analyses to obtain insights. Focus on the four types of data analysis: descriptive, diagnostic, predictive, and prescriptive.
  • Share your results—How best can you share your insights and recommendations? A combination of visualization tools and communication is key.
  • Embrace your mistakes—Mistakes happen. Learn from them. This is what transforms a good data analyst into a great one.

What next? From here, we strongly encourage you to explore the topic on your own. Get creative with the steps in the data analysis process, and see what tools you can find. As long as you stick to the core principles we’ve described, you can create a tailored technique that works for you.

To learn more, check out our free, 5-day data analytics short course. You might also be interested in the following:

  • These are the top 9 data analytics tools
  • 10 great places to find free datasets for your next project
  • How to build a data analytics portfolio

DATA 275 Introduction to Data Analytics

  • Getting Started with SPSS
  • Variable View
  • Option Suggestions
  • SPSS Viewer
  • Entering Data
  • Cleaning & Checking Your SPSS Database
  • Recoding Data: Collapsing Continuous Data
  • Constructing Scales and Checking Their Reliability
  • Formatting Tables in APA style
  • Creating a syntax
  • Public Data Sources

Data Analytics Project Assignment

  • Literature Review

For your research project you will conduct data analysis and write a report summarizing your analysis and the findings from your analysis. You will accomplish this by completing a series of assignments.

Data 275 Research Project Assignment

In this week's assignment, you are required to accomplish the following tasks:

1. Propose a topic for your project

The topic you select for your capstone depends on your interest and the data problem you want to address. Try to pick a topic that you would enjoy researching and writing about.

Your topic selection will also be influenced by data availability. Because this is a data analytics project, you will need to have access to data. If you have access to your organization's data, you are free to use it. If you choose to do so, all information presented must be in secure form because Davenport University does not assume any responsibility for the security of corporate data. Otherwise, you can select a topic that is amenable to publicly available data.

Click the link for some useful suggestions: Project Proposal Suggestions 

2. Find a data set of your interest and download it

There are many publicly available data sets that you can use for your project. The library has compiled a list of many possible sources of data. Click on the link below to explore these sources. 

Public Data Sources 

The data set you select must have:

  • At least 50 observations (50 rows) and at least 4 variables (columns), excluding identification variables
  • At least one dependent variable

You must provide:

  • A proper citation of the data source, using APA style format
  • A discussion of how the data was collected and by whom
  • The number of variables in the data set
  • The number of observations/subjects in the data set
  • A description of each variable, together with an explanation of how it is measured (e.g., the unit of measurement)

Deliverable

A one-page (minimum) description of your data analytics project, which must include the following:

  • A title for your project
  • A brief description of the project
  • The major stakeholders who would use the information generated by your analysis, and how they would use/benefit from that information
  • A description of the dataset you will use for your project


101 Pandas Exercises for Data Analysis

  • April 27, 2018
  • Selva Prabhakaran

These 101 Python pandas exercises are designed to challenge your logical muscle and to help you internalize data manipulation with Python's favorite package for data analysis. The questions come in 3 levels of difficulty, with L1 being the easiest and L3 the hardest.


You might also like to practice the 101 NumPy exercises , they are often used together.

1. How to import pandas and check the version?
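A minimal solution (the version string will vary by install):

```python
import pandas as pd

print(pd.__version__)   # e.g. '2.2.1'
pd.show_versions()      # optional: full build and dependency details
```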

2. How to create a series from a list, numpy array and dict?

Create a pandas series from each of the items below: a list, numpy and a dictionary
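One possible solution, using made-up inputs (the names `mylist`, `myarr`, and `mydict` are illustrative, not given by the exercise):

```python
import numpy as np
import pandas as pd

mylist = list('abcde')
myarr = np.arange(5)
mydict = dict(zip(mylist, myarr))

ser1 = pd.Series(mylist)   # from a list
ser2 = pd.Series(myarr)    # from a numpy array
ser3 = pd.Series(mydict)   # from a dict; keys become the index
print(ser3)
```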

3. How to convert the index of a series into a column of a dataframe?

Difficulty Level: L1

Convert the series ser into a dataframe with its index as another column on the dataframe.
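A sketch, assuming ser is a numeric series indexed by letters:

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.arange(5), index=list('abcde'))

# to_frame keeps the index; reset_index moves it into a regular column
df = ser.to_frame().reset_index()
print(df)
```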

4. How to combine many series to form a dataframe?

Combine ser1 and ser2 to form a dataframe.
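A sketch with invented contents for ser1 and ser2:

```python
import numpy as np
import pandas as pd

ser1 = pd.Series(list('abcde'))
ser2 = pd.Series(np.arange(5))

# concatenate side by side, or name the columns explicitly
df = pd.concat([ser1, ser2], axis=1)
df = pd.DataFrame({'col1': ser1, 'col2': ser2})
print(df)
```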


5. How to assign name to the series’ index?

Give a name to the series ser calling it ‘alphabets’.

6. How to get the items of series A not present in series B?

Difficulty Level: L2

From ser1 remove items present in ser2 .

7. How to get the items not common to both series A and series B?

Get all items of ser1 and ser2 not common to both.

8. How to get the minimum, 25th percentile, median, 75th, and max of a numeric series?

Difficulty Level: L2

Compute the minimum, 25th percentile, median, 75th, and maximum of ser .
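One approach, assuming a randomly generated numeric ser:

```python
import numpy as np
import pandas as pd

state = np.random.RandomState(100)
ser = pd.Series(state.normal(10, 5, 25))

# min, 25th percentile, median, 75th percentile, max in one call
print(np.percentile(ser, q=[0, 25, 50, 75, 100]))
# equivalently: ser.quantile([0, .25, .5, .75, 1])
```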

9. How to get frequency counts of unique items of a series?

Calculate the frequency counts of each unique value in ser .

10. How to keep only top 2 most frequent values as it is and replace everything else as ‘Other’?

From ser , keep the top 2 most frequent items as it is and replace everything else as ‘Other’.
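A sketch with random sample data:

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.random.randint(1, 5, 12))

top2 = ser.value_counts().index[:2]
# keep values in the top 2; replace everything else with 'Other'
result = ser.where(ser.isin(top2), other='Other')
print(result)
```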

11. How to bin a numeric series to 10 groups of equal size?

Bin the series ser into 10 equal deciles and replace the values with the bin name.
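One way, assuming random data; the decile labels are arbitrary names:

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.random.random(20))

labels = ['1st', '2nd', '3rd', '4th', '5th',
          '6th', '7th', '8th', '9th', '10th']
deciles = pd.qcut(ser, q=10, labels=labels)  # 10 equal-sized bins
print(deciles.head())
```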


12. How to convert a numpy array to a dataframe of given shape? (L1)

Reshape the series ser into a dataframe with 7 rows and 5 columns

13. How to find the positions of numbers that are multiples of 3 from a series?

Find the positions of numbers that are multiples of 3 from ser .

14. How to extract items at given positions from a series?

From ser , extract the items at positions in list pos .

15. How to stack two series vertically and horizontally?

Stack ser1 and ser2 vertically and horizontally (to form a dataframe).

16. How to get the positions of items of series A in another series B?

Get the positions of items of ser2 in ser1 as a list.

17. How to compute the mean squared error on a truth and predicted series?

Compute the mean squared error of truth and pred series.
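A minimal sketch, with pred simulated as truth plus noise:

```python
import numpy as np
import pandas as pd

truth = pd.Series(range(10))
pred = truth + np.random.random(10)

# mean of the squared differences
mse = np.mean((truth - pred) ** 2)
print(mse)
```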

18. How to convert the first character of each element in a series to uppercase?

Convert the first character of each word in ser to upper case.

19. How to calculate the number of characters in each word in a series?

20. How to compute the difference of differences between consecutive numbers of a series?

Compute the difference of differences between the consecutive numbers of ser .

21. How to convert a series of date-strings to a timeseries?

Difficulty Level: L2

22. How to get the day of month, week number, day of year and day of week from a series of date strings?

Get the day of month, week number, day of year and day of week from ser .
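A sketch using unambiguous ISO date strings (isocalendar on the .dt accessor requires pandas 1.1+):

```python
import pandas as pd

ser = pd.Series(['2010-01-01', '2011-02-02', '2012-03-03', '2013-04-04'])
ts = pd.to_datetime(ser)

print('Day of month:', ts.dt.day.tolist())
print('Week number:', ts.dt.isocalendar().week.tolist())
print('Day of year:', ts.dt.dayofyear.tolist())
print('Day of week:', ts.dt.day_name().tolist())
```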


23. How to convert year-month string to dates corresponding to the 4th day of the month?

Change ser to dates that start with 4th of the respective months.

24. How to filter words that contain at least 2 vowels from a series?

Difficulty Level: L3

From ser , extract words that contain at least 2 vowels.

25. How to filter valid emails from a series?

Extract the valid emails from the series emails . The regex pattern for valid emails is provided as reference.

26. How to get the mean of a series grouped by another series?

Compute the mean of weights of each fruit .

27. How to compute the euclidean distance between two series?

Compute the euclidean distance between series (points) p and q, without using a packaged formula.
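A sketch using two invented 10-point series:

```python
import numpy as np
import pandas as pd

p = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
q = pd.Series([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])

# square root of the sum of squared differences
dist = np.sqrt(((p - q) ** 2).sum())
print(dist)
```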

28. How to find all the local maxima (or peaks) in a numeric series?

Get the positions of peaks (values surrounded by smaller values on both sides) in ser .
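One way, using the sign of the first differences; the sample values are invented:

```python
import numpy as np
import pandas as pd

ser = pd.Series([2, 10, 3, 4, 9, 10, 2, 7, 3])

# a peak is where the slope sign flips from +1 to -1 (a difference of -2)
doublediff = np.diff(np.sign(np.diff(ser)))
peaks = np.where(doublediff == -2)[0] + 1
print(peaks)   # -> [1 5 7]
```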

29. How to replace missing spaces in a string with the least frequent character?

Replace the spaces in my_str with the least frequent character.

30. How to create a TimeSeries starting ‘2000-01-01’ and 10 weekends (saturdays) after that having random numbers as values?

31. How to fill an intermittent time series so all missing dates show up with values of the previous non-missing date?

ser has missing dates and values. Make all missing dates appear and fill up with value from previous date.

32. How to compute the autocorrelations of a numeric series?

Compute autocorrelations for the first 10 lags of ser . Find out which lag has the largest correlation.
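A sketch with normally distributed sample data:

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.random.normal(10, 4, 30))

# autocorrelation at each lag from 1 to 10
autocorrs = [round(ser.autocorr(lag), 2) for lag in range(1, 11)]
print(autocorrs)
print('Lag with largest |autocorrelation|:',
      int(np.argmax(np.abs(autocorrs))) + 1)
```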

33. How to import only every nth row from a csv file to create a dataframe?

Import every 50th row of BostonHousing dataset as a dataframe.

34. How to change column values when importing csv to a dataframe?

Import the boston housing dataset , but while importing change the 'medv' (median house value) column so that values < 25 becomes ‘Low’ and > 25 becomes ‘High’.

35. How to create a dataframe with rows as strides from a given series?

36. How to import only specified columns from a csv file?

Import ‘crim’ and ‘medv’ columns of the BostonHousing dataset as a dataframe.

37. How to get the n rows, n columns, datatype, summary stats of each column of a dataframe? Also get the array and list equivalent.

Get the number of rows, columns, datatype and summary statistics of each column of the Cars93 dataset. Also get the numpy array and list equivalent of the dataframe.

38. How to extract the row and column number of a particular cell with given criterion?

Which manufacturer, model and type has the highest Price ? What is the row and column number of the cell with the highest Price value?

39. How to rename a specific columns in a dataframe?

Rename the column Type as CarType in df and replace the ‘.’ in column names with ‘_’.


40. How to check if a dataframe has any missing values?

Check if df has any missing values.

41. How to count the number of missing values in each column?

Count the number of missing values in each column of df . Which column has the maximum number of missing values?

42. How to replace missing values of multiple numeric columns with the mean?

Replace missing values in Min.Price and Max.Price columns with their respective mean.
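A self-contained sketch with a tiny stand-in for the Cars93 data (the real exercise reads a CSV):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Min.Price': [12.9, np.nan, 14.9, np.nan],
                   'Max.Price': [18.8, 21.7, np.nan, 23.5]})

cols = ['Min.Price', 'Max.Price']
# fillna with a Series of means fills each column with its own mean
df[cols] = df[cols].fillna(df[cols].mean())
print(df)
```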

43. How to use apply function on existing columns with global variables as additional arguments?

Difficulty Level: L3

In df , use apply method to replace the missing values in Min.Price with the column’s mean and those in Max.Price with the column’s median.

Use Hint from StackOverflow

44. How to select a specific column from a dataframe as a dataframe instead of a series?

Get the first column ( a ) in df as a dataframe (rather than as a Series).

45. How to change the order of columns of a dataframe?

This is actually a set of related questions:

Create a generic function to interchange two columns, without hardcoding column names.

Sort the columns in reverse alphabetical order, that is, column 'e' first through column 'a' last.

46. How to set the number of rows and columns displayed in the output?

Change the pandas display settings so that printing the dataframe df shows a maximum of 10 rows and 10 columns.

47. How to format or suppress scientific notations in a pandas dataframe?

Suppress scientific notations like 'e-03' in df and print up to 4 digits after the decimal point.

48. How to format all the values in a dataframe as percentages?

Format the values in column 'random' of df as percentages.

49. How to filter every nth row in a dataframe?

From df , filter the 'Manufacturer' , 'Model' and 'Type' for every 20th row starting from 1st (row 0).

50. How to create a primary key index by combining relevant columns?

In df , replace NaNs with 'missing' in columns 'Manufacturer' , 'Model' and 'Type' , then create an index as a combination of these three columns and check if the index is a primary key (i.e., unique).

51. How to get the row number of the nth largest value in a column?

Find the row position of the 5th largest value of column 'a' in df .

52. How to find the position of the nth largest value greater than a given value?

In ser , find the position of the 2nd largest value greater than the mean.

53. How to get the last n rows of a dataframe with row sum > 100?

Get the last two rows of df whose row sum is greater than 100.

54. How to find and cap outliers from a series or dataframe column?

Replace all values of ser in the lower 5%ile and greater than 95%ile with respective 5th and 95th %ile value.

55. How to reshape a dataframe to the largest possible square after removing the negative values?

Reshape df to the largest possible square with negative values removed. Drop the smallest values if need be. The order of the positive numbers in the result should remain the same as the original.

56. How to swap two rows of a dataframe?

Swap rows 1 and 2 in df .

57. How to reverse the rows of a dataframe?

Reverse all the rows of dataframe df .

58. How to create one-hot encodings of a categorical variable (dummy variables)?

Get one-hot encodings for column 'a' in the dataframe df and append it as columns.

59. Which column contains the highest number of row-wise maximum values?

Obtain the column name with the highest number of row-wise maximums in df .

60. How to create a new column that contains the row number of nearest column by euclidean distance?

Create a new column such that each row contains the row number of the nearest row (by euclidean distance).

61. How to know the maximum possible correlation value of each column against other columns?

Compute maximum possible absolute correlation value of each column against other columns in df .

62. How to create a column containing the minimum by maximum of each row?

Compute the ratio of the minimum to the maximum for every row of df .

63. How to create a column that contains the penultimate value in each row?

Create a new column 'penultimate' which has the second largest value of each row of df .

64. How to normalize all columns in a dataframe?

  • Normalize all columns of df by subtracting the column mean and divide by standard deviation.
  • Range all columns of df such that the minimum value in each column is 0 and max is 1.

Don’t use external packages like sklearn.
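A minimal sketch of both normalizations, with random sample data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(1, 100, 40).reshape(8, -1),
                  columns=list('abcde'))

zscored = (df - df.mean()) / df.std()             # mean 0, sd 1
minmax = (df - df.min()) / (df.max() - df.min())  # range [0, 1]
print(zscored.round(2))
print(minmax.round(2))
```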

65. How to compute the correlation of each row with the succeeding row?

Compute the correlation of each row of df with its succeeding row.

66. How to replace both the diagonals of dataframe with 0?

Replace the values in both diagonals of df with 0.

67. How to get the particular group of a groupby dataframe by key?

This is a question related to understanding of grouped dataframe. From df_grouped , get the group belonging to 'apple' as a dataframe.

68. How to get the n’th largest value of a column when grouped by another column?

In df , find the second largest value of 'taste' for 'banana' .

69. How to compute grouped mean on pandas dataframe and keep the grouped column as another column (not index)?

In df , compute the mean price of every fruit , while keeping fruit as another column instead of an index.

70. How to join two dataframes by 2 columns so they have only the common rows?

Join dataframes df1 and df2 by ‘fruit-pazham’ and ‘weight-kilo’.

71. How to remove rows from a dataframe that are present in another dataframe?

From df1 , remove the rows that are present in df2 . All three columns must be the same.

72. How to get the positions where values of two columns match?

73. How to create lags and leads of a column in a dataframe?

Create two new columns in df , one of which is a lag1 (shift column a down by 1 row) of column ‘a’ and the other is a lead1 (shift column b up by 1 row).
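A sketch with a small invented dataframe:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(20).reshape(-1, 4), columns=list('abcd'))

df['a_lag1'] = df['a'].shift(1)    # previous row's value of 'a'
df['b_lead1'] = df['b'].shift(-1)  # next row's value of 'b'
print(df)
```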

74. How to get the frequency of unique values in the entire dataframe?

Get the frequency of unique values in the entire dataframe df .

75. How to split a text column into two separate columns?

Split the string column in df to form a dataframe with 3 columns as shown.

To be continued...



  • Data Selection
  • Deliverables
  • Data Sources
  • Data Wrangling
  • << Back to home

Assignment 2: Exploratory Data Analysis

In this assignment, you will identify a dataset of interest and perform exploratory analysis to better understand the shape & structure of the data, identify data quality issues, investigate initial questions, and develop preliminary insights & hypotheses. Your final submission will take the form of a report consisting of annotated and/or captioned visualizations that convey key insights gained during your analysis process.

Step 1: Data Selection

First, pick a topic area of interest to you and find a dataset that can provide insights into that topic. To streamline the assignment, we've pre-selected a number of datasets included below for you to choose from (see the Recommended Data Sources section below).

However, if you would like to investigate a different topic and dataset, you are free to do so. If working with a self-selected dataset and you have doubts about its appropriateness for the course, please check with the course staff. Be advised that data collection and preparation (also known as data wrangling ) can be a very tedious and time-consuming process. Be sure you have sufficient time to conduct exploratory analysis, after preparing the data.

After selecting a topic and dataset – but prior to analysis – you should write down an initial set of at least three questions you'd like to investigate. These questions should be clearly listed at the top of your final submission report.

Part 2: Exploratory Visual Analysis

Next, you will perform an exploratory analysis of your dataset using a visualization tool such as Vega-Lite or Tableau. You should consider two different phases of exploration.

In the first phase, you should seek to gain an overview of the shape & structure of your dataset. What variables does the dataset contain? How are they distributed? Are there any notable data quality issues? Are there any surprising relationships among the variables? Be sure to perform "sanity checks" for any patterns you expect the data to contain.

In the second phase, you should investigate your initial questions, as well as any new questions that arise during your exploration. For each question, start by creating a visualization that might provide a useful answer. Then refine the visualization (by adding additional variables, changing sorting or axis scales, filtering or subsetting data, etc. ) to develop better perspectives, explore unexpected observations, or sanity check your assumptions. You should repeat this process for each of your questions, but feel free to revise your questions or branch off to explore new questions if the data warrants.

Final Deliverable

Your final submission should take the form of a sequence of images – similar to a comic book – that consists of 8 or more visualizations detailing your most important insights.

Your "insights" can include surprises or issues (such as data quality problems affecting your analysis) as well as responses to your analysis questions. Where appropriate, we encourage you to include annotated visualizations to guide viewers' attention and provide interpretive context. (See this page for some examples of what we mean by "annotated visualizations.")

Each image should be a visualization, including any titles or descriptive annotations highlighting the insight(s) shown in that view. For example, annotations could take the form of guidelines and text labels, differential coloring, and/or fading of non-focal elements. You are also free to include a short caption for each image, though no more than 2 sentences: be concise! You may create annotations using the visualization tools of your choice (see our tool recommendations below), or by adding them using image editing or vector graphics tools.

Provide sufficient detail such that anyone can read your report and understand what you've learned without already being familiar with the dataset. For example, be sure to provide a clear overview of what data is being visualized and what the data variables mean. To help gauge the scope of this assignment, see this example report analyzing motion picture data .

You must write up your report as an Observable notebook, similar to the example above. From a private notebook, click the "..." menu button in the upper right and select "Enable link sharing". Then submit the URL of your notebook on the Canvas A2 submission page .

Be sure to enable link sharing, otherwise the course staff will not be able to view your submission and you may face a late submission penalty! Also note that if you make changes to the page after link sharing is enabled, you must reshare the link from Observable.

  • To export a Vega-Lite visualization, be sure you are using the "canvas" renderer, right click the image, and select "Save Image As...".
  • To export images from Tableau, use the Worksheet > Export > Image... menu item.
  • To add an image to an Observable notebook, first add your image as a notebook file attachment: click the "..." menu button and select "File attachments". Then load the image in a new notebook cell: FileAttachment("your-file-name.png").image() .

Recommended Data Sources

To get up and running quickly with this assignment, we recommend using one of the following provided datasets or sources, but you are free to use any dataset of your choice.

The World Bank Data, 1960-2017

The World Bank has tracked global human development by indicators such as climate change, economy, education, environment, gender equality, health, and science and technology since 1960. We have 20 indicators from the World Bank for you to explore . Alternatively, you can browse the original data by indicators or by countries . Click on an indicator category or country to download the CSV file.

Data: https://github.com/ZeningQu/World-Bank-Data-by-Indicators

Daily Weather in the U.S., 2017

This dataset contains daily U.S. weather measurements in 2017, provided by the NOAA Daily Global Historical Climatology Network . This data has been transformed: some weather stations with only sparse measurements have been filtered out. See the accompanying weather.txt for descriptions of each column .

Data: weather.csv.gz (gzipped CSV)

Yelp Open Dataset

This dataset provides information about businesses, user reviews, and more from Yelp's database. The data is split into separate files ( business , checkin , photos , review , tip , and user ), and is available in either JSON or SQL format. You might use this to investigate the distributions of scores on Yelp, look at how many reviews users typically leave, or look for regional trends about restaurants. Note that this is a large, structured dataset and you don't need to look at all of the data to answer interesting questions.

Important Note: In order to download and use this data you will need to enter your email and agree to Yelp's Dataset License .

Data: Yelp Access Page (data available in JSON & SQL formats)

Additional Data Sources

Here are some other possible sources to consider. You are also free to use data from a source different from those included here. If you have any questions on whether a dataset is appropriate, please ask the course staff ASAP!

  • data.seattle.gov - City of Seattle Open Data
  • data.wa.gov - State of Washington Open Data
  • nwdata.org - Open Data & Civic Tech Resources for the Pacific Northwest
  • data.gov - U.S. Government Open Datasets
  • U.S. Census Bureau - Census Datasets
  • IPUMS.org - Integrated Census & Survey Data from around the World
  • Federal Elections Commission - Campaign Finance & Expenditures
  • Federal Aviation Administration - FAA Data & Research
  • fivethirtyeight.com - Data and Code behind the Stories and Interactives
  • Buzzfeed News - Open-source data from BuzzFeed's newsroom
  • Kaggle Datasets - Datasets for Kaggle contests
  • List of datasets useful for course projects - curated by Mike Freeman

Data Wrangling Tools

The data you choose may require reformatting, transformation, or cleaning prior to visualization. Here are tools you can use for data preparation. We recommend first trying to import and process your data in the same tool you intend to use for visualization. If that fails, pick the most appropriate option among the tools below. Contact the course staff if you are unsure what might be the best option for your data!

Graphical Tools

  • Tableau - Tableau provides basic facilities for data import, transformation & blending.
  • Trifacta Wrangler - Interactive tool for data transformation & visual profiling.
  • OpenRefine - A free, open source tool for working with messy data.

Programming Tools

  • Arquero : JavaScript library for wrangling and transforming data tables.
  • JavaScript basics for manipulating data in the browser .
  • Pandas - Data table and manipulation utilites for Python.
  • dplyr - A library for data manipulation in R.
  • Or, the programming language and tools of your choice...

Visualization Tools

You are free to use one or more visualization tools in this assignment. However, in the interest of time and for a friendlier learning curve, we strongly encourage you to use Tableau and/or Vega-Lite . To help you get up and running, the quiz sections on Thursday 1/14 will provide an introductory Tableau tutorial, so you may want to come prepared with questions.

  • Tableau - Desktop visual analysis software. Available for both Windows and MacOS; you can register for a free student license from the Tableau website.
  • Vega-Lite is a high-level grammar of interactive graphics. It provides a concise, declarative JSON syntax to create a large range of visualizations for data analysis and presentation.
  • Jupyter Notebooks (Python) , using libraries such as Altair or Matplotlib .
  • Voyager - Research prototype from the UW Interactive Data Lab . Voyager combines a Tableau-style interface with visualization recommendations. Use at your own risk!

Grading Criteria

Each submission will be graded based on both the analysis process and included visualizations. Here are our grading criteria:

  • Poses clear questions applicable to the chosen dataset.
  • Appropriate data quality assessment and transformation.
  • Sufficient breadth of analysis, exploring multiple questions.
  • Sufficient depth of analysis, with appropriate follow-up questions.
  • Expressive & effective visualizations crafted to investigate analysis questions.
  • Clearly written, understandable annotations that communicate primary insights.

Submission Details

Your completed exploratory analysis report is due Monday 1/25, 11:59pm . As described above, your report should take the form of an Observable notebook. Submit the URL of your notebook ( with link sharing enabled! ) on the Canvas A2 page .

Note: If you enabled link sharing and/or submitted your link early, be sure to reshare the link from Observable when you are finished making changes to ensure all the changes are visible.

Statistical Thinking and Data Analysis

Computer Assignments

There are three computer homework assignments, which comprise a total of 20% of the course grade and should be completed independently. The MATLAB® tutorial below covers the basics of MATLAB.

MATLAB Tutorial ( PDF )

Optional Assignments

The optional homework assignments correspond to problems from the course textbook:

Tamhane, Ajit C., and Dorothy D. Dunlop. Statistics and Data Analysis: From Elementary to Intermediate . Prentice Hall, 1999. ISBN: 9780137444267.


How to Write Data Analysis Reports in 9 Easy Steps


Imagine a bunch of bricks. They don’t have a purpose until you put them together into a house, do they?

In business intelligence, data is your building material, and a quality data analysis report is what you want to see as the result.

But if you’ve ever tried to use the collected data and assemble it into an insightful report, you know it’s not an easy job to do. Data is supposed to tell a story about your performance, but there’s a long way from unprocessed, raw data to a meaningful narrative that you can use to create an actionable plan for making steady progress towards your goals.

This article will help you improve the quality of your data analysis reports and build them effortlessly and fast. Let’s jump right in.

What Is a Data Analysis Report?



A data analysis report is a type of business report in which you present quantitative and qualitative data to evaluate your strategies and performance. Based on this data, you give recommendations for further steps and business decisions while using the data as evidence that backs up your evaluation.

Today, data analysis is one of the most important elements of business intelligence strategies as companies have realized the potential of having data-driven insights at hand to help them make data-driven decisions.

Just like you’ll look at your car’s dashboard if something’s wrong, you’ll pull your data to see what’s causing drops in website traffic, conversions, or sales – or any other business metric you may be following. This unprocessed data still doesn’t give you a diagnosis – it’s the first step towards a quality analysis. Once you’ve extracted and organized your data, it’s important to use graphs and charts to visualize it and make it easier to draw conclusions.

Once you add meaning to your data and create suggestions based on it, you have a data analysis report.

A vital detail everyone should know about data analysis reports is their accessibility for everyone in your team, and the ability to innovate. Your analysis report will contain your vital KPIs, so you can see where you’re reaching your targets and achieving goals, and where you need to speed up your activities or optimize your strategy. If you can uncover trends or patterns in your data, you can use it to innovate and stand out by offering even more valuable content, services, or products to your audience.

Why Is Data Analysis Reporting Important?

Data analysis is vital for companies for several reasons.

A reliable source of information

Trusting your intuition is fine, but relying on data is safer. When you can base your action plan on data that clearly shows that something is working or failing, you won’t only justify your decisions in front of the management, clients, or investors, but you’ll also be sure that you’ve taken appropriate steps to fix an issue or seize an important opportunity.

A better understanding of your business

According to Databox’s State of Business Reporting , most companies stated that regular monitoring and reporting improved progress monitoring, increased team effectiveness, allowed them to identify trends more easily, and improved financial performance. Data analysis makes it easier to understand your business as a whole, and each aspect individually. You can see how different departments analyze their workflow and how each step impacts their results in the end, by following their KPIs over time. Then, you can easily conclude what your business needs to grow – to boost your sales strategy, optimize your finances, or up your SEO game, for example.

An additional way to understand your business better is to compare your most important metrics and KPIs against companies that are just like yours. With Databox Benchmarks , you will need only one spot to see how all of your teams stack up against your peers and competitors.

Instantly and Anonymously Benchmark Your Company’s Performance Against Others Just Like You

If you ever asked yourself:

  • How does our marketing stack up against our competitors?
  • Are our salespeople as productive as reps from similar companies?
  • Are our profit margins as high as our peers?

Databox Benchmark Groups can finally help you answer these questions and discover how your company measures up against similar companies based on your KPIs.

When you join Benchmark Groups, you will:

  • Get instant, up-to-date data on how your company stacks up against similar companies based on the metrics most important to you. Explore benchmarks for dozens of metrics, built on anonymized data from thousands of companies and get a full 360° view of your company’s KPIs across sales, marketing, finance, and more.
  • Understand where your business excels and where you may be falling behind so you can shift to what will make the biggest impact. Leverage industry insights to set more effective, competitive business strategies. Explore where exactly you have room for growth within your business based on objective market data.
  • Keep your clients happy by using data to back up your expertise. Show your clients where you’re helping them overperform against similar companies. Use the data to show prospects where they really are… and the potential of where they could be.
  • Get a valuable asset for improving yearly and quarterly planning . Get valuable insights into areas that need more work. Gain more context for strategic planning.

The best part?

  • Benchmark Groups are free to access.
  • The data is 100% anonymized. No other company will be able to see your performance, and you won’t be able to see the performance of individual companies either.

When it comes to showing you how your performance compares to others, here is what it might look like for the metric Average Session Duration:

assignment in data analysis

And here is an example of an open group you could join:

assignment in data analysis

And this is just a fraction of what you’ll get. With Databox Benchmarks, you will need only one spot to see how all of your teams stack up — marketing, sales, customer service, product development, finance, and more. 

  • Choose criteria so that the Benchmark is calculated using only companies like yours
  • Narrow the benchmark sample using criteria that describe your company
  • Display benchmarks right on your Databox dashboards

Sounds like something you want to try out? Join a Databox Benchmark Group today!

It makes data accessible to everyone

Data doesn’t represent a magical creature reserved for data scientists only anymore. Now that you have streamlined and easy-to-follow data visualizations and tools that automatically show the latest figures, you can include everyone in the decision-making process as they’ll understand what means what in the charts and tables. The data may be complex, but it becomes easy to read when combined with proper illustrations. And when your teams gain such useful and accessible insight, they will feel motivated to act on it immediately.

Better collaboration

Data analysis reports help teams collaborate better, as well. You can apply the SMART technique to your KPIs and goals, because your KPIs become assignable. When they're easy to interpret for your whole team, you can assign each person one or multiple KPIs to be in charge of. That takes a lot off a team leader's plate, so they can focus on making other improvements in the business. At the same time, removing inaccurate data from your day-to-day operations will reduce friction between different departments, like marketing and sales, for instance.

More productivity

You can also expect increased productivity, since you'll save the time you'd otherwise spend waiting for specialists to translate data for other departments. This means your internal procedures will run more smoothly, too.

Want to deliver value with your data analysis report? Then it's critical to master the skill of writing one well. We'll share our secret in the following section.

How to Write a Data Analysis Report? 9 Simple Steps

  • Start with an Outline
  • Make a Selection of Vital KPIs
  • Pick the Right Charts for Appealing Design
  • Use a Narrative
  • Organize the Information
  • Include a Summary
  • Careful with Your Recommendations
  • Double-Check Everything
  • Use Interactive Dashboards

1. Start with an Outline

If you start writing without having a clear idea of what your data analysis report is going to include, it may get messy. Important insights may slip through your fingers, and you may stray away too far from the main topic. To avoid this, start the report by writing an outline first. Plan the structure and contents of each section first to make sure you’ve covered everything, and only then start crafting the report.

2. Make a Selection of Vital KPIs

Don’t overwhelm the audience by including every single metric there is. You can discuss your whole dashboard in a meeting with your team, but if you’re creating data analytics reports or marketing reports for other departments or the executives, it’s best to focus on the most relevant KPIs that demonstrate the data important for the overall business performance.

PRO TIP: How Well Are Your Marketing KPIs Performing?

Like most marketers and marketing managers, you want to know how well your efforts are translating into results each month. How much traffic and new contact conversions do you get? How many new contacts do you get from organic sessions? How are your email campaigns performing? How well are your landing pages converting? You might have to scramble to put all of this together in a single report, but now you can have it all at your fingertips in a single Databox dashboard.

Our Marketing Overview Dashboard includes data from Google Analytics 4 and HubSpot Marketing with key performance metrics like:

  • Sessions . The number of sessions can tell you how many times people are returning to your website. Obviously, the higher the better.
  • New Contacts from Sessions . How well is your campaign driving new contacts and customers?
  • Marketing Performance KPIs . Tracking the number of MQLs, SQLs, New Contacts and similar will help you identify how your marketing efforts contribute to sales.
  • Email Performance . Measure the success of your email campaigns from HubSpot. Keep an eye on your most important email marketing metrics such as number of sent emails, number of opened emails, open rate, email click-through rate, and more.
  • Blog Posts and Landing Pages . How many people have viewed your blog recently? How well are your landing pages performing?

Now you can benefit from the experience of our Google Analytics and HubSpot Marketing experts, who have put together a plug-and-play Databox template that contains all the essential metrics for monitoring your leads. It’s simple to implement and start using as a standalone dashboard or in marketing reports, and best of all, it’s free!


You can easily set it up in just a few clicks – no coding required.

To set up the dashboard, follow these 3 simple steps:

Step 1: Get the template 

Step 2: Connect your HubSpot and Google Analytics 4 accounts with Databox. 

Step 3: Watch your dashboard populate in seconds.

3. Pick the Right Charts for Appealing Design

If you’re showing historical data – for instance, how you’ve performed now compared to last month – it’s best to use timelines or graphs. For other data, pie charts or tables may be more suitable. Make sure you use the right data visualization to display your data accurately and in an easy-to-understand manner.

4. Use a Narrative

Do you work on analytics and reporting? Just exporting your data into a spreadsheet doesn't qualify as either of them. The fact that you're dealing with data may sound too technical, but actually, your report should tell a story about your performance. What happened on a specific day? Did your organic traffic increase or suddenly drop? Why? And more. There are a lot of questions to answer and you can put all the responses together in a coherent, understandable narrative.

5. Organize the Information

Before you start writing or building your dashboard, choose how you're going to organize your data. Will you present the most relevant and general information first? That's typically the best way to start the report: best practices usually involve starting with more general information and then diving into details where necessary.

6. Include a Summary

Some people in your audience won’t have the time to read the whole report, but they’ll want to know about your findings. Besides, a summary at the beginning of your data analytics report will help the reader get familiar with the topic and the goal of the report. And a quick note: although the summary should be placed at the beginning, you usually write it when you’re done with the report. When you have the whole picture, it’s easier to extract the key points that you’ll include in the summary.

7. Careful with Your Recommendations

Your communication skills may be critical in data analytics reports. Know that some of the results probably won’t be satisfactory, which means that someone’s strategy failed. Make sure you’re objective in your recommendations and that you’re not looking for someone to blame. Don’t criticize, but give suggestions on how things can be improved. Being solution-oriented is much more important and helpful for the business.

8. Double-Check Everything

The whole point of using data analytics tools and data, in general, is to achieve as much accuracy as possible. Avoid manual mistakes by proofreading your report when you finish, and if possible, give it to another person so they can confirm everything’s in place.

9. Use Interactive Dashboards

Using the right tools is just as important as the contents of your data analysis. The way you present it can make or break a good report, regardless of how valuable the data is. That said, choose a great reporting tool that can automatically update your data and display it in a visually appealing manner. Make sure it offers streamlined interactive dashboards that you can also customize depending on the purpose of the report.

Data Analysis Report Examples

To wrap up the guide, we decided to share nine excellent examples of what awesome data analysis reports can look like. You'll learn what metrics you should include and how to organize them in logical sections to make your report beautiful and effective.

  • Marketing Data Analysis Report Example
  • SEO Data Analysis Report Example
  • Sales Data Analysis Report Example
  • Customer Support Data Analysis Report Example
  • Help Desk Data Analysis Report Example
  • Ecommerce Data Analysis Report Example
  • Project Management Data Analysis Report Example
  • Social Media Data Analysis Report Example
  • Financial KPI Data Analysis Report Example

If you need an intuitive dashboard that allows you to track your website performance effortlessly and monitor all the relevant metrics such as website sessions, pageviews, or CTA engagement, you’ll love this free HubSpot Marketing Website Overview dashboard template .


Tracking the performance of your SEO efforts is important. You can easily monitor relevant SEO KPIs like clicks by page, engaged sessions, or views by session medium by downloading this Google Organic SEO Dashboard .


How successful is your sales team? It’s easy to analyze their performance and predict future growth if you choose this HubSpot CRM Sales Analytics Overview dashboard template and track metrics such as average time to close the deal, new deals amount, or average revenue per new client.



Customer support is one of the essential factors that impact your business growth. You can use this streamlined, customizable Customer Success dashboard template . In a single dashboard, you can monitor metrics such as customer satisfaction score, new MRR, or time to first response.


Other than being free and intuitive, this HelpScout for Customer Support dashboard template is also customizable and enables you to track the most vital metrics that indicate your customer support agents’ performance: handle time, happiness score, interactions per resolution, and more.


Is your online store improving or failing? You can easily collect relevant data about your store and monitor the most important metrics like total sales, orders placed, and new customers by downloading this WooCommerce Shop Overview dashboard template .


Does your IT department need feedback on their project management performance? Download this Jira dashboard template to track vital metrics such as issues created or resolved, issues by status, etc. Jira enables you to gain valuable insights into your teams’ productivity.


Need to know if your social media strategy is successful? You can find that out by using this easy-to-understand Social Media Awareness & Engagement dashboard template . Here you can monitor and analyze metrics like sessions by social source, track the number of likes and followers, and measure the traffic from each source.


Tracking your finances is critical for keeping your business profitable. If you want to monitor metrics such as the number of open invoices, open deals amount by stage by pipeline, or closed-won deals, use this free QuickBooks + HubSpot CRM Financial Performance dashboard template .


Rely on Accurate Data with Databox

“I don’t have time to build custom reports from scratch.”

“It takes too long and becomes daunting very soon.”

“I’m not sure how to organize the data to make it effective and prove the value of my work.”

Does this sound like you?

Well, it’s something we all said at some point – creating data analytics reports can be time-consuming and tiring. And you’re still not sure if the report is compelling and understandable enough when you’re done.

That’s why we decided to create Databox dashboards – a world-class solution for saving your money and time. We build streamlined and easy-to-follow dashboards that include all the metrics that you may need and allow you to create custom ones if necessary. That way, you can use templates and adjust them to any new project or client without having to build a report from scratch.

You can skip the setup and get your first dashboard for free in just 24 hours, with our fantastic customer support team on the line to assist you with the metrics you should track and the structure you should use.

Enjoy crafting brilliant data analysis reports that will improve your business – it’s never been faster and more effortless. Sign up today and get your free dashboard in no time.


Stefana Zarić is a freelance writer & content marketer. Other than writing for SaaS and fintech clients, she educates future writers who want to build a career in marketing. When not working, Stefana loves to read books, play with her kid, travel, and dance.


Practice And Learn Excel Online For Free

Data Analyst Practice Test Number 1

  • Post published: July 28, 2022

This is an Excel Data Analyst exam: you will be challenged to solve various data analysis issues that Excel Data Analysts face in their everyday work!

You will be using functions such as:

And more…

You can view the answers in the Solution tab! 🙂

Exam level – Intermediate-Advanced

If you prefer to work on this exam using regular Excel – click here to download the Data Analyst Practice exam no. 1

Looking to be a pro? Check Out Coursera’s Google Data Analytics Professional Certificate Here

Having an issue with the formulas' language? Check out this post


Can J Hosp Pharm. 2015 Jul-Aug;68(4).


Creating a Data Analysis Plan: What to Consider When Choosing Statistics for a Study

There are three kinds of lies: lies, damned lies, and statistics. – Mark Twain 1

INTRODUCTION

Statistics represent an essential part of a study because, regardless of the study design, investigators need to summarize the collected information for interpretation and presentation to others. It is therefore important for us to heed Mr Twain’s concern when creating the data analysis plan. In fact, even before data collection begins, we need to have a clear analysis plan that will guide us from the initial stages of summarizing and describing the data through to testing our hypotheses.

The purpose of this article is to help you create a data analysis plan for a quantitative study. For those interested in conducting qualitative research, previous articles in this Research Primer series have provided information on the design and analysis of such studies. 2 , 3 Information in the current article is divided into 3 main sections: an overview of terms and concepts used in data analysis, a review of common methods used to summarize study data, and a process to help identify relevant statistical tests. My intention here is to introduce the main elements of data analysis and provide a place for you to start when planning this part of your study. Biostatistical experts, textbooks, statistical software packages, and other resources can certainly add more breadth and depth to this topic when you need additional information and advice.

TERMS AND CONCEPTS USED IN DATA ANALYSIS

When analyzing information from a quantitative study, we are often dealing with numbers; therefore, it is important to begin with an understanding of the source of the numbers. Let us start with the term variable , which defines a specific item of information collected in a study. Examples of variables include age, sex or gender, ethnicity, exercise frequency, weight, treatment group, and blood glucose. Each variable will have a group of categories, which are referred to as values , to help describe the characteristic of an individual study participant. For example, the variable “sex” would have values of “male” and “female”.

Although variables can be defined or grouped in various ways, I will focus on 2 methods at this introductory stage. First, variables can be defined according to the level of measurement. The categories in a nominal variable are names, for example, male and female for the variable “sex”; white, Aboriginal, black, Latin American, South Asian, and East Asian for the variable “ethnicity”; and intervention and control for the variable “treatment group”. Nominal variables with only 2 categories are also referred to as dichotomous variables because the study group can be divided into 2 subgroups based on information in the variable. For example, a study sample can be split into 2 groups (patients receiving the intervention and controls) using the dichotomous variable “treatment group”. An ordinal variable implies that the categories can be placed in a meaningful order, as would be the case for exercise frequency (never, sometimes, often, or always). Nominal-level and ordinal-level variables are also referred to as categorical variables, because each category in the variable can be completely separated from the others. The categories for an interval variable can be placed in a meaningful order, with the interval between consecutive categories also having meaning. Age, weight, and blood glucose can be considered as interval variables, but also as ratio variables, because the ratio between values has meaning (e.g., a 15-year-old is half the age of a 30-year-old). Interval-level and ratio-level variables are also referred to as continuous variables because of the underlying continuity among categories.

As we progress through the levels of measurement from nominal to ratio variables, we gather more information about the study participant. The amount of information that a variable provides will become important in the analysis stage, because we lose information when variables are reduced or aggregated—a common practice that is not recommended. 4 For example, if age is reduced from a ratio-level variable (measured in years) to an ordinal variable (categories of < 65 and ≥ 65 years), we lose the ability to make comparisons across the entire age range and introduce error into the data analysis. 4

A second method of defining variables is to consider them as either dependent or independent. As the terms imply, the value of a dependent variable depends on the value of other variables, whereas the value of an independent variable does not rely on other variables. In addition, an investigator can influence the value of an independent variable, such as treatment-group assignment. Independent variables are also referred to as predictors because we can use information from these variables to predict the value of a dependent variable. Building on the group of variables listed in the first paragraph of this section, blood glucose could be considered a dependent variable, because its value may depend on values of the independent variables age, sex, ethnicity, exercise frequency, weight, and treatment group.

Statistics are mathematical formulae that are used to organize and interpret the information that is collected through variables. There are 2 general categories of statistics, descriptive and inferential. Descriptive statistics are used to describe the collected information, such as the range of values, their average, and the most common category. Knowledge gained from descriptive statistics helps investigators learn more about the study sample. Inferential statistics are used to make comparisons and draw conclusions from the study data. Knowledge gained from inferential statistics allows investigators to make inferences and generalize beyond their study sample to other groups.

Before we move on to specific descriptive and inferential statistics, there are 2 more definitions to review. Parametric statistics are generally used when values in an interval-level or ratio-level variable are normally distributed (i.e., the entire group of values has a bell-shaped curve when plotted by frequency). These statistics are used because we can define parameters of the data, such as the centre and width of the normally distributed curve. In contrast, interval-level and ratio-level variables with values that are not normally distributed, as well as nominal-level and ordinal-level variables, are generally analyzed using nonparametric statistics.

METHODS FOR SUMMARIZING STUDY DATA: DESCRIPTIVE STATISTICS

The first step in a data analysis plan is to describe the data collected in the study. This can be done using figures to give a visual presentation of the data and statistics to generate numeric descriptions of the data.

Selection of an appropriate figure to represent a particular set of data depends on the measurement level of the variable. Data for nominal-level and ordinal-level variables may be interpreted using a pie graph or bar graph. Both options allow us to examine the relative number of participants within each category (by reporting the percentages within each category), whereas a bar graph can also be used to examine absolute numbers. For example, we could create a pie graph to illustrate the proportions of men and women in a study sample and a bar graph to illustrate the number of people who report exercising at each level of frequency (never, sometimes, often, or always).

Interval-level and ratio-level variables may also be interpreted using a pie graph or bar graph; however, these types of variables often have too many categories for such graphs to provide meaningful information. Instead, these variables may be better interpreted using a histogram. Unlike a bar graph, which displays the frequency for each distinct category, a histogram displays the frequency within a range of continuous categories. Information from this type of figure allows us to determine whether the data are normally distributed. In addition to pie graphs, bar graphs, and histograms, many other types of figures are available for the visual representation of data. Interested readers can find additional types of figures in the books recommended in the “Further Reading” section.

Figures are also useful for visualizing comparisons between variables or between subgroups within a variable (for example, the distribution of blood glucose according to sex). Box plots are useful for summarizing information for a variable that does not follow a normal distribution. The lower and upper limits of the box identify the interquartile range (or 25th and 75th percentiles), while the midline indicates the median value (or 50th percentile). Scatter plots show how the values of one continuous variable relate to the values of a second continuous variable; they are often helpful in the analysis of correlations.
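For readers working in code rather than a point-and-click tool, a minimal sketch of these three figure types in Python with matplotlib (simulated data standing in for study variables):

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(1)
    age = rng.normal(50, 12, 200)                        # roughly normal
    glucose = 4 + 0.03 * age + rng.exponential(1, 200)   # skewed outcome

    fig, axes = plt.subplots(1, 3, figsize=(12, 3))
    axes[0].hist(age, bins=20)          # is the variable normally distributed?
    axes[0].set_title("Histogram of age")
    axes[1].boxplot(glucose)            # median and interquartile range
    axes[1].set_title("Box plot of blood glucose")
    axes[2].scatter(age, glucose)       # relationship between 2 continuous variables
    axes[2].set_title("Age vs blood glucose")
    plt.tight_layout()
    plt.show()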

In addition to using figures to present a visual description of the data, investigators can use statistics to provide a numeric description. Regardless of the measurement level, we can find the mode by identifying the most frequent category within a variable. When summarizing nominal-level and ordinal-level variables, the simplest method is to report the proportion of participants within each category.

The choice of the most appropriate descriptive statistic for interval-level and ratio-level variables will depend on how the values are distributed. If the values are normally distributed, we can summarize the information using the parametric statistics of mean and standard deviation. The mean is the arithmetic average of all values within the variable, and the standard deviation tells us how widely the values are dispersed around the mean. When values of interval-level and ratio-level variables are not normally distributed, or we are summarizing information from an ordinal-level variable, it may be more appropriate to use the nonparametric statistics of median and range. The first step in identifying these descriptive statistics is to arrange study participants according to the variable categories from lowest value to highest value. The range is used to report the lowest and highest values. The median or 50th percentile is located by dividing the number of participants into 2 groups, such that half (50%) of the participants have values above the median and the other half (50%) have values below the median. Similarly, the 25th percentile is the value with 25% of the participants having values below and 75% of the participants having values above, and the 75th percentile is the value with 75% of participants having values below and 25% of participants having values above. Together, the 25th and 75th percentiles define the interquartile range.
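A worked example of these descriptive statistics in Python (the blood glucose values are invented for illustration):

    import numpy as np

    glucose = np.array([4.2, 4.8, 5.1, 5.3, 5.6, 6.0, 6.4, 7.9])

    print(glucose.mean())                    # mean
    print(glucose.std(ddof=1))               # sample standard deviation
    print(glucose.min(), glucose.max())      # range
    print(np.median(glucose))                # median (50th percentile)
    print(np.percentile(glucose, [25, 75]))  # 25th and 75th percentiles (interquartile range)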

PROCESS TO IDENTIFY RELEVANT STATISTICAL TESTS: INFERENTIAL STATISTICS

One caveat about the information provided in this section: selecting the most appropriate inferential statistic for a specific study should be a combination of following these suggestions, seeking advice from experts, and discussing with your co-investigators. My intention here is to give you a place to start a conversation with your colleagues about the options available as you develop your data analysis plan.

There are 3 key questions to consider when selecting an appropriate inferential statistic for a study: What is the research question? What is the study design? and What is the level of measurement? It is important for investigators to carefully consider these questions when developing the study protocol and creating the analysis plan. The figures that accompany these questions show decision trees that will help you to narrow down the list of inferential statistics that would be relevant to a particular study. Appendix 1 provides brief definitions of the inferential statistics named in these figures. Additional information, such as the formulae for various inferential statistics, can be obtained from textbooks, statistical software packages, and biostatisticians.

What Is the Research Question?

The first step in identifying relevant inferential statistics for a study is to consider the type of research question being asked. You can find more details about the different types of research questions in a previous article in this Research Primer series that covered questions and hypotheses.5 A relational question seeks information about the relationship among variables; in this situation, investigators will be interested in determining whether there is an association (Figure 1). A causal question seeks information about the effect of an intervention on an outcome; in this situation, the investigator will be interested in determining whether there is a difference (Figure 2).

Figure 1. Decision tree to identify inferential statistics for an association.

Figure 2. Decision tree to identify inferential statistics for measuring a difference.

What Is the Study Design?

When considering a question of association, investigators will be interested in measuring the relationship between variables (Figure 1). A study designed to determine whether there is consensus among different raters will be measuring agreement. For example, an investigator may be interested in determining whether 2 raters, using the same assessment tool, arrive at the same score. Correlation analyses examine the strength of a relationship or connection between 2 variables, like age and blood glucose. Regression analyses also examine the strength of a relationship or connection; however, in this type of analysis, one variable is considered an outcome (or dependent variable) and the other variable is considered a predictor (or independent variable). Regression analyses often consider the influence of multiple predictors on an outcome at the same time. For example, an investigator may be interested in examining the association between a treatment and blood glucose, while also considering other factors, like age, sex, ethnicity, exercise frequency, and weight.
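A minimal sketch of the distinction in Python with SciPy (simulated data; a full multivariable regression would typically use a package such as statsmodels instead):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    age = rng.normal(50, 10, 100)
    glucose = 4 + 0.02 * age + rng.normal(0, 0.5, 100)

    # Correlation: strength of the relationship, with no outcome/predictor roles
    r, p = stats.pearsonr(age, glucose)

    # Simple regression: glucose treated as the outcome, age as the predictor
    fit = stats.linregress(age, glucose)
    print(r, p, fit.slope, fit.intercept)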

When considering a question of difference, investigators must first determine how many groups they will be comparing. In some cases, investigators may be interested in comparing the characteristic of one group with that of an external reference group. For example, is the mean age of study participants similar to the mean age of all people in the target group? If more than one group is involved, then investigators must also determine whether there is an underlying connection between the sets of values (or samples) to be compared. Samples are considered independent or unpaired when the information is taken from different groups. For example, we could use an unpaired t test to compare the mean age between 2 independent samples, such as the intervention and control groups in a study. Samples are considered related or paired if the information is taken from the same group of people, for example, measurement of blood glucose at the beginning and end of a study. Because blood glucose is measured in the same people at both time points, we could use a paired t test to determine whether there has been a significant change in blood glucose.
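A minimal sketch of both designs in Python with SciPy (all numbers invented):

    import numpy as np
    from scipy import stats

    # Unpaired: mean age in 2 independent samples (intervention vs control)
    intervention_age = np.array([54, 61, 47, 58, 66, 52])
    control_age = np.array([50, 63, 55, 49, 60, 57])
    t_unpaired, p_unpaired = stats.ttest_ind(intervention_age, control_age)

    # Paired: blood glucose measured twice in the same people
    glucose_start = np.array([6.1, 5.8, 7.0, 6.4, 5.9])
    glucose_end = np.array([5.6, 5.7, 6.3, 6.0, 5.5])
    t_paired, p_paired = stats.ttest_rel(glucose_start, glucose_end)

    print(p_unpaired, p_paired)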

What Is the Level of Measurement?

As described in the first section of this article, variables can be grouped according to the level of measurement (nominal, ordinal, or interval). In most cases, the independent variable in an inferential statistic will be nominal; therefore, investigators need to know the level of measurement for the dependent variable before they can select the relevant inferential statistic. Two exceptions to this consideration are correlation analyses and regression analyses (Figure 1). Because a correlation analysis measures the strength of association between 2 variables, we need to consider the level of measurement for both variables. Regression analyses can consider multiple independent variables, often with a variety of measurement levels. However, for these analyses, investigators still need to consider the level of measurement for the dependent variable.

Selection of inferential statistics to test interval-level variables must include consideration of how the data are distributed. An underlying assumption for parametric tests is that the data approximate a normal distribution. When the data are not normally distributed, information derived from a parametric test may be wrong.6 When the assumption of normality is violated (for example, when the data are skewed), then investigators should use a nonparametric test. If the data are normally distributed, then investigators can use a parametric test.
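One way to put that check into practice, sketched in Python with SciPy (the Shapiro–Wilk test and the 0.05 cut-off are conventional choices, not the only defensible ones):

    from scipy import stats

    def compare_two_groups(a, b, alpha=0.05):
        """Pick an unpaired test based on a normality check of both samples."""
        normal = (stats.shapiro(a).pvalue > alpha
                  and stats.shapiro(b).pvalue > alpha)
        if normal:
            return stats.ttest_ind(a, b)   # parametric
        return stats.mannwhitneyu(a, b)    # nonparametric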

ADDITIONAL CONSIDERATIONS

What Is the Level of Significance?

An inferential statistic is used to calculate a p value, the probability of obtaining data at least as extreme as those observed, by chance alone, if the null hypothesis is true. Investigators can then compare this p value against a prespecified level of significance, which is often chosen to be 0.05. This level of significance represents a 1 in 20 chance of wrongly concluding that an effect exists (a false-positive result), which is conventionally considered an acceptable level of error.

What Are the Most Commonly Used Statistics?

In 1983, Emerson and Colditz7 reported the first review of statistics used in original research articles published in the New England Journal of Medicine. This review of statistics used in the journal was updated in 1989 and 2005,8 and this type of analysis has been replicated in many other journals.9–13 Collectively, these reviews have identified 2 important observations. First, the overall sophistication of statistical methodology used and reported in studies has grown over time, with survival analyses and multivariable regression analyses becoming much more common. Second, despite this trend, 1 in 4 articles describes no statistical methods or reports only simple descriptive statistics. When inferential statistics are used, the most common are t tests, contingency table tests (for example, the χ² test and Fisher exact test), and simple correlation and regression analyses. This information is important for educators, investigators, reviewers, and readers because it suggests that a good foundational knowledge of descriptive statistics and common inferential statistics will enable us to correctly evaluate the majority of research articles.11–13 However, to take full advantage of all research published in high-impact journals, we need to become acquainted with some of the more complex methods, such as multivariable regression analyses.8,13

What Are Some Additional Resources?

As an investigator and Associate Editor with CJHP, I have often relied on the advice of colleagues to help create my own analysis plans and review the plans of others. Biostatisticians have a wealth of knowledge in the field of statistical analysis and can provide advice on the correct selection, application, and interpretation of these methods. Colleagues who have “been there and done that” with their own data analysis plans are also valuable sources of information. Identify these individuals and consult with them early and often as you develop your analysis plan.

Another important resource to consider when creating your analysis plan is textbooks. Numerous statistical textbooks are available, differing in levels of complexity and scope. The titles listed in the “Further Reading” section are just a few suggestions. I encourage interested readers to look through these and other books to find resources that best fit their needs. However, one crucial book that I highly recommend to anyone wanting to be an investigator or peer reviewer is Lang and Secic’s How to Report Statistics in Medicine (see “Further Reading”). As the title implies, this book covers a wide range of statistics used in medical research and provides numerous examples of how to correctly report the results.

CONCLUSIONS

When it comes to creating an analysis plan for your project, I recommend following the sage advice of Douglas Adams in The Hitchhiker’s Guide to the Galaxy: Don’t panic!14 Begin with simple methods to summarize and visualize your data, then use the key questions and decision trees provided in this article to identify relevant statistical tests. Information in this article will give you and your co-investigators a place to start discussing the elements necessary for developing an analysis plan. But do not stop there! Use advice from biostatisticians and more experienced colleagues, as well as information in textbooks, to help create your analysis plan and choose the most appropriate statistics for your study. Making careful, informed decisions about the statistics to use in your study should reduce the risk of confirming Mr Twain’s concern.

Appendix 1. Glossary of statistical terms

  • 1-way ANOVA: Uses 1 variable to define the groups for comparing means. Equivalent to the Student t test when comparing the means of 2 groups.
  • Kruskal–Wallis 1-way ANOVA: Nonparametric alternative to the 1-way ANOVA. Used to determine the difference in medians between 3 or more groups.
  • n-way ANOVA: Uses 2 or more variables to define groups when comparing means. Also called a “between-subjects factorial ANOVA”.
  • Repeated-measures ANOVA: A method for analyzing whether the means of 3 or more measures from the same group of participants are different.
  • Friedman ANOVA: Nonparametric alternative to the repeated-measures ANOVA. Often used to compare rankings and preferences that are measured 3 or more times.
  • Fisher exact test: Variation of the chi-square test that accounts for cell counts below 5.
  • McNemar test: Variation of the chi-square test that assesses the statistical significance of changes in 2 paired measurements of dichotomous variables.
  • Cochran Q test: An extension of the McNemar test for testing differences between 3 or more matched sets of frequencies or proportions. Often used as a measure of heterogeneity in meta-analyses.
  • 1-sample t test: Used to determine whether the mean of a sample is significantly different from a known or hypothesized value.
  • Independent-samples t test (also referred to as the Student t test): Used when the independent variable is a nominal-level variable that identifies 2 groups and the dependent variable is an interval-level variable.
  • Paired t test: Used to compare 2 pairs of scores between 2 groups (e.g., baseline and follow-up blood pressure in the intervention and control groups).


This article is the 12th in the CJHP Research Primer Series, an initiative of the CJHP Editorial Board and the CSHP Research Committee. The planned 2-year series is intended to appeal to relatively inexperienced researchers, with the goal of building research capacity among practising pharmacists. The articles, presenting simple but rigorous guidance to encourage and support novice researchers, are being solicited from authors with appropriate expertise.

Previous articles in this series:

  • Bond CM. The research jigsaw: how to get started. Can J Hosp Pharm. 2014;67(1):28–30.
  • Tully MP. Research: articulating questions, generating hypotheses, and choosing study designs. Can J Hosp Pharm. 2014;67(1):31–4.
  • Loewen P. Ethical issues in pharmacy practice research: an introductory guide. Can J Hosp Pharm. 2014;67(2):133–7.
  • Tsuyuki RT. Designing pharmacy practice research trials. Can J Hosp Pharm. 2014;67(3):226–9.
  • Bresee LC. An introduction to developing surveys for pharmacy practice research. Can J Hosp Pharm. 2014;67(4):286–91.
  • Gamble JM. An introduction to the fundamentals of cohort and case–control studies. Can J Hosp Pharm. 2014;67(5):366–72.
  • Austin Z, Sutton J. Qualitative research: getting started. Can J Hosp Pharm. 2014;67(6):436–40.
  • Houle S. An introduction to the fundamentals of randomized controlled trials in pharmacy research. Can J Hosp Pharm. 2014;68(1):28–32.
  • Charrois TL. Systematic reviews: What do you need to know to get started? Can J Hosp Pharm. 2014;68(2):144–8.
  • Sutton J, Austin Z. Qualitative research: data collection, analysis, and management. Can J Hosp Pharm. 2014;68(3):226–31.
  • Cadarette SM, Wong L. An introduction to health care administrative data. Can J Hosp Pharm. 2014;68(3):232–7.

Competing interests: None declared.

Further Reading

  • Devor J, Peck R. Statistics: the exploration and analysis of data. 7th ed. Boston (MA): Brooks/Cole Cengage Learning; 2012.
  • Lang TA, Secic M. How to report statistics in medicine: annotated guidelines for authors, editors, and reviewers. 2nd ed. Philadelphia (PA): American College of Physicians; 2006.
  • Mendenhall W, Beaver RJ, Beaver BM. Introduction to probability and statistics. 13th ed. Belmont (CA): Brooks/Cole Cengage Learning; 2009.
  • Norman GR, Streiner DL. PDQ statistics. 3rd ed. Hamilton (ON): B.C. Decker; 2003.
  • Plichta SB, Kelvin E. Munro’s statistical methods for health care research. 6th ed. Philadelphia (PA): Wolters Kluwer Health/Lippincott, Williams & Wilkins; 2013.


Data Analysis in Excel (A Comprehensive Guideline)


In this article, we will learn how to analyze data in Excel, including:

  • Excel functions such as VLOOKUP, INDEX-MATCH, SUMIFS, CONCAT, and LEN.
  • Excel charts – creating various chart types, customizing them, and interpreting the insights they offer.
  • Conditional formatting, applied effectively for data analysis purposes.
  • Pivot tables – creating them, performing calculations, and generating insightful reports.
  • Excel’s sorting and filtering capabilities.
  • The What-If Analysis feature – exploring different scenarios by changing input values and observing the resulting outputs.
  • Data validation techniques to maintain data accuracy.
  • The benefits of using tables and the built-in Analyze Data feature, which provides insights and recommendations based on your data.
  • The Analysis ToolPak add-in, which offers a wide range of statistical tools, including descriptive analysis and ANOVA (Analysis of Variance).

Let’s use the following dataset as a demonstration of analyzing data in Excel.

Overview of Analyze Data in Excel

Download Practice Workbook

Download the workbook and practice.

Analyze Data in Excel.xlsx

How to Analyze Data in Excel

Method 1 – Use Excel Functions to Analyze Data

Case 1.1 – The VLOOKUP Function

The VLOOKUP function is frequently used to look up particular data in a dataset. In the following example, we want to know how many goals an individual (for instance, Alex) has scored.

  • The formula in cell F5, reconstructed from the description below (the exact-match argument is an assumption), is:

    =VLOOKUP(E5, B5:C14, 2, FALSE)

Here, Excel is looking for the value in cell E5 within the range B5:C14 and retrieving the corresponding value from the second column of that range.

Case 1.2 – INDEX and MATCH Functions

  • The formula in this case, reconstructed from the formula breakdown below (the INDEX range is assumed to be the goals column), is:

    =INDEX(C5:C14, MATCH(E5, B5:B14, 0))

Formula Breakdown

MATCH(E5, B5:B14, 0) → The MATCH function searches for the value in cell E5 within the range B5:B14. The 0 as the third argument indicates an exact match. Output: 1

INDEX(C5:C14, 1) → The INDEX function then returns the value at that position in the range C5:C14, i.e., the matching player’s goals.

Case 1.3 – The SUMIFS Function

The SUMIFS function sums a range of cells that meet a set of conditions.

  • To get the goals scored by the players from Group A and Group B separately, the formula in cell G5 (reconstructed from the description below) is:

    =SUMIFS($D$5:$D$14, $C$5:$C$14, F5)

The formula sums the values in the range $D$5:$D$14 but only includes values where the corresponding cells in the range $C$5:$C$14 match the value in cell F5 .

Case 1.4 – The CONCAT Function

Let’s join the first and last names of certain individuals using the CONCAT function in Excel.

  • The formula in cell D5 (reconstructed from the description below) is:

    =CONCAT(B5, " ", C5)

The formula joins the values in cells B5 and C5 , with a space between them, resulting in a single combined text string.

Case 1.5 – The LEN Function

You can count the number of characters in a cell or an array using the LEN function.

The formula in cell E5 is of the form (the referenced cell is an assumption, since the original screenshot is not available):

    =LEN(D5)

Method 2 – Data Analysis Using Excel Charts

  • Select the range F4:G6.
  • Go to the Insert tab and select any column chart.

Inserting chart

  • Excel will create a column chart for you.

Column chart created

Method 3 – Apply Conditional Formatting to Analyze Data

  • Select the dataset in the range C5:C14.
  • Go to the Home tab and choose Conditional Formatting, then select a set of Data Bars.

Adding Data Bars

  • Excel will add data bars.

Data bars with data

Method 4 – A Pivot Table

Let’s calculate the number of goals scored by Group A and Group B players using a pivot table.

  • Select the dataset in the range B4:D14, so that both the Group and Goal columns are included.
  • Go to the Insert tab and select PivotTable.

Creating Pivot table

  • A box will appear. We have chosen New Worksheet as the destination of the pivot table.

Setting input and output

  • Drag the fields into the areas (Group in Rows and Goal in Values) shown in the image.

Analysis using pivot table

  • Excel calculates the sum of goals.
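For comparison, the same group-wise sum can be computed outside Excel; a minimal sketch in Python with pandas (hypothetical values mirroring this example’s Group and Goal columns):

    import pandas as pd

    df = pd.DataFrame({
        "Group": ["A", "A", "B", "B", "A"],
        "Goal": [3, 1, 2, 4, 2],
    })

    # Equivalent of the pivot table: Group in Rows, sum of Goal in Values
    print(df.pivot_table(index="Group", values="Goal", aggfunc="sum"))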

Method 5 – Sorting Data in Excel

Suppose you want to sort the dataset in descending order (Largest to Smallest).

  • Select the range C5:C14.
  • Go to the Data tab and select the Sort Z to A icon for descending order.

Data sorting

  • Select Expand the selection option from the warning window.

Expanding selection

  • Your data will be sorted.

Sorted Data

Method 6 – Filtering Data in Excel

Suppose you want to see the performance of the players of Group A .

  • Select the range B4:D14.
  • Go to the Data tab and activate the Filter option.

Activating Filter option

  • Filter your dataset from the drop-down icon in the column heading. We have selected Group A in the Group column.

Selecting a specific set of data

  • Excel will show the list of all Group A players and their performance.

Filtered data

Method 7 – Excel What-If Analysis Feature

What-If Analysis in Excel refers to a set of tools and techniques that allow you to explore different scenarios and observe the potential impact on the results of your formulas or models. Excel provides several features for performing what-if analysis, including:

  • Data Tables: Data Tables allow you to create a table displaying multiple results based on different input values. You can build either one-variable or two-variable data tables to see how changing inputs affects the final results.
  • Goal Seek: Goal Seek helps you determine the input value needed to achieve a specific result. You specify a target value, and Excel automatically adjusts the input value until it reaches the desired outcome.
  • Scenario Manager: Scenario Manager enables you to create and compare different sets of input values for your model. You can define multiple scenarios with varying inputs and switch between them to see the impact on the calculated results.

We will show an example of the Goal Seek feature. Suppose you have 100 units of a product to sell, and you want to find the unit price needed to reach a revenue of $200.

The formula in C6 is (reconstructed; Units Sold is assumed to sit in C4 and Unit Price in C5):

    =C4*C5

This is simple enough to work out by hand: the unit price must be $2. The point of the Goal Seek feature, however, is that you do not have to find the input manually; Excel finds it for you.
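Under the hood, Goal Seek performs an iterative numeric search for the input that produces the target output. A minimal sketch of the idea in Python (plain bisection, assuming the revenue formula from this example and an input that increases the output):

    def goal_seek(f, target, lo, hi, tol=1e-6):
        """Find x in [lo, hi] with f(x) close to target, for increasing f."""
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if f(mid) < target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    units_sold = 100
    revenue = lambda unit_price: units_sold * unit_price
    print(round(goal_seek(revenue, 200, 0, 100), 2))  # 2.0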

  • Go to the Data tab and select What-If Analysis, then select Goal Seek.

Accessing Goal seek feature

  • You want the revenue (To value) to be $200 by changing the unit price in cell C5. So the Set cell is C6, and the By changing cell is C5. Enter those values in the dialog box and click OK.

Putting inputs in Goal Seek window

  • Excel will put the unit price in C5. Format the revenue as currency if you want.
  • Modify the Units Sold value and repeat the process to see how it affects the result.

Result of goal seek operation


Method 8 – Data Validation

Let’s get back to our previous example (from the VLOOKUP section). We want to select a player’s name from all the available options rather than manually typing their names.

  • Select cell E5.
  • Go to the Data tab and select the Data Validation option.

Applying data validation

  • A Data Validation box will pop up. Choose List in the Allow field.
  • Set the source to =$B$5:$B$14.

Set parameters

  • You can now select the names from the drop-down icon.

Selecting data from drop-down

  • Once you select a name, you will get the number of goals the player scored.

Data Validation functioning

Method 9 – Excel Table

  • Select the dataset in range D5:D14.
  • Press CTRL + T.

Creating Excel table

  • Excel will create a table.

Excel table formed

Let’s see how you can get the total goals scored by these players without using any Excel function.

  • Click on any cell of the table.
  • Go to the Table Design tab (this tab will be seen only if you select a cell of the table first).
  • Select Table Style Options and check the Total Row box.

Application of Excel tables

  • Excel shows the total goals scored.


Method 10 – The Analyze Data Feature

  • Add this feature to your ribbon: put the cursor on the Home ribbon, right-click, and select Customize the Ribbon.

Customizing the ribbon

  • Select New Group and set its position on the Home ribbon.
  • Select All Commands and add Analyze Data to this newly created group.

Adding Analyze Data feature

  • Go to the Home tab and select Analyze Data.

Excel-recommended options in Analyzing Data feature

  • Excel will recommend several options for data analysis.

Method 11 – Using the Analysis ToolPak Add-in

  • Go to the File tab and select Options . The Excel Options box will open.
  • Go to Add-ins and select Excel Add-ins in the Manage field, then click Go.

Activating Analysis ToolPak

  • Check the box for Analysis ToolPak and click OK.

Checking Analysis ToolPak

  • Let’s do some analysis using this add-in.


Descriptive Analysis with the ToolPak

  • Select range C5:C14.
  • Go to the Data tab and select Data Analysis (available once you activate the Analysis ToolPak add-in).

Performing descriptive analysis

  • A Data Analysis box will pop up. Select the Descriptive Statistics option and click OK.

Selecting descriptive statistics

  • Set the input range and the output range, check Summary statistics, and click OK.

Setting inputs

  • You will get the descriptive statistics of the selected input range in your Excel workbook.

Sample analysis


ANOVA Analysis in Excel with ToolPak

ANOVA stands for Analysis of Variance . It is a statistical method used to compare the means of two or more groups to determine if there are any significant differences between them.
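For readers who want to reproduce the result outside Excel, a minimal one-way (single-factor) ANOVA in Python with SciPy (group values invented):

    from scipy import stats

    group_a = [3, 1, 2, 4, 2]
    group_b = [5, 4, 6, 5, 7]
    group_c = [2, 3, 2, 1, 3]

    f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
    print(f_stat, p_value)  # a small p value suggests at least one group mean differs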

  • Go to the Data tab and select Data Analysis .
  • Select Anova: Single Factor from the Data Analysis box and click OK.

Selecting ANOVA single factor

  • Set the input and output ranges.

Setting inputs in ANOVA

  • Excel will perform the analysis for you.

Data analysis with ANOVA Single factor


Things to Remember

  • Data Validation ensures accuracy.
  • The INDEX-MATCH combination is more flexible than VLOOKUP: it can look up to the left and keeps working when columns are inserted or deleted.
  • You need to refresh the Pivot Table when you change your dataset.

Frequently Asked Questions

1. What are the advantages of using the Analyze Data feature in Excel over manual analysis techniques?

Compared with manual analysis techniques, the Analyze Data feature in Excel saves time by automating tasks, offers an easy-to-use interface with many helpful tools and functions, can be customized, and works well with other Excel features.

2. What is the difference between descriptive and inferential statistics?

Descriptive statistics help describe data by summarizing it, while inferential statistics help make predictions about a larger group based on a smaller sample.

3. What are the uses of ANOVA?

ANOVA is used to compare the averages of different groups, see how categorical variables affect outcomes, analyze experiments, and understand different sources of variation in data.



shanuhalli/Data-Analyst-Assignment

Find all ExcelR Data Analyst Assignment solutions here: 1. Advanced Excel 2. MySQL 3. Python 4. Tableau 5. Power BI

ExcelR Data Analyst Assignments

1. Advanced Excel

For more details on the Advanced Excel assignment, check out the Advanced Excel Assignment section.

2. MySQL

For more details on the MySQL assignment, check out the MySQL Assignment section.

3. Python

For more details on the Python assignment, check out the Python Assignment section.

4. Tableau

For more details on the Tableau assignment, check out the Tableau Public Profile.

5. Power BI

Coming soon...



What Does a Data Analyst Do? A Job Breakdown

April 22, 2024

The surge of digitalization across industries has sparked a revolution in how companies understand and use their data. At the heart of this revolution are data analysts, the professionals who meticulously explore, process, and interpret data to unearth valuable insights.

If you’re looking to ride this wave, understanding the ins and outs of a data analyst's role is crucial. In this article, we’ll shed light on what a data analyst does, the skills they need to be effective, and the industries in need of their services.

First, what is data analysis?

Data analysis is the foundation upon which modern business strategies are built. It involves scrutinizing, cleansing, transforming, and modeling data with the goal of discovering useful information to support decision-making.

In the current business landscape, data analysis acts as a compass, guiding companies through complex market dynamics, customer preferences, and operational efficiency. The insights gleaned from data analysis enable businesses to innovate and stay competitive in their respective markets.

What does a data analyst do?

In the simplest terms, a data analyst's core responsibility is to make sense of data. They collect, process, and model data to discover trends and patterns, which ultimately guide strategic business decisions. The data analyst role is interdisciplinary in that it sits at the intersection of statistics, technology, and business acumen.

Common data analyst duties:

  • Data collection: Data analysts often develop and manage databases, collecting raw data from various internal and external sources.
  • Data cleaning and transformation: The gathered data is often plagued with errors and redundancies. Data analysts clean and transform it into a reliable format for analysis.
  • Data modeling: Using statistical methods, analysts build models that can predict outcomes or uncover trends in the data.
  • Data visualization: Once meaningful trends are discovered, analysts create reports and dashboards to visualize them for stakeholders using tools like Tableau or Power BI.
  • Data insights: The final and arguably most important step is to extract actionable insights from the data and present these findings in a business context.

What skills do data analysts need to succeed?

The data analyst job description demands a unique combination of skills that span both technical prowess and interpersonal abilities. To excel in this position, one must have a deep understanding of data manipulation tools and programming languages. Equally important are the soft skills that support the technical aspects of the job.

Let's take a closer look at some important data analyst skills:

Technical skills for data analysts:

  • Statistics: A firm grasp of statistical techniques is non-negotiable. It includes understanding distributions, hypothesis testing, and regression analysis.
  • Data tools: Mastering tools like R, Python, and SQL is pivotal for managing, analyzing, and reporting on data.
  • Database knowledge: Familiarity with database technologies (relational or NoSQL) is important for extracting and managing data.
  • Machine learning: An understanding of machine learning models can elevate data predictions and insights.

Transferable skills for data analysts:

  • Analytical thinking: The ability to break down complex problems and execute projects with precise logic is important for this role.
  • Communication: Data analysts must be adept at translating technical findings into actionable insights for non-technical stakeholders.
  • Teamwork: Collaborating with team members on various aspects of data projects is common, so strong collaboration skills are key.
  • Curiosity: A thirst for understanding the 'why' behind the data trends is essential for continuous learning and improved analysis.

Where do data analysts work?

An appealing aspect of a career in data analytics is its extreme versatility and broad applicability. Virtually every organization has access to some type of data that could help inform business decisions.

This universal need for data-driven insights opens up a variety of opportunities for professionals in the field. In fact, jobs in this sector are projected to grow at a rate much faster than average, according to the U.S. Bureau of Labor Statistics.

While every field could benefit from this role, there are some that are especially reliant on data analyst skills. Here’s a look at five industries where data analysts are in high demand.

Top industries for data analysts:

  • Finance: In banking and investment firms, data analysts assess financial risks and predict market trends. They utilize advanced statistical tools and models to help make data-driven decisions about where to allocate resources and how to mitigate potential financial losses.
  • Healthcare: Data analysts mine health records to identify patterns and trends that can significantly improve patient outcomes and increase operational efficiencies. This analytical approach enables healthcare providers to offer more personalized and effective care, ultimately enhancing the overall quality of healthcare services.
  • Retail: Analytics play a crucial role in deciphering customer behavior patterns, optimizing inventory levels, and accurately forecasting future sales. By analyzing vast amounts of data, retailers can make informed decisions about product placement, promotional strategies, and stock management.
  • Supply Chain: Data analysts are critical for optimizing logistics and operations. They use data to forecast demand, manage inventory levels, streamline shipping and delivery processes, and minimize costs. By analyzing supplier performance, transportation patterns, and warehouse efficiency, they help ensure that products move efficiently from point A to point B.
  • Energy: In the energy sector, data analysts help predict consumption patterns, optimize resource allocation, and enhance sustainability efforts. By analyzing data from a variety of sources, they inform strategies for renewable energy development, efficient grid management, and reducing environmental impact.

Build a career in data analytics

The role of a data analyst is vital in today’s landscape where data is an indispensable asset. It's a career that offers both high demand and high job satisfaction for those with an analytical mind and appetite for continual learning.

If this describes you and you’re interested in capitalizing on the demand for data analysts, take the first step on this promising career path by exploring education opportunities. Utica University offers industry-relevant online programs that blend academic coursework with practical, hands-on learning experiences – all administered by faculty who are experts in the industry.

Learn more by exploring our Business Data Analytics program today!



Freight Analysis Framework Modernization: Overview & Feedback*

June 25, 2024 at 3:00 PM Eastern

The Bureau of Transportation Statistics (BTS) is developing two new products as part of the Freight Analysis Framework (FAF) commodity flow data program: a multimodal assignment and flow visualization tool and a county-level origin-destination flow database.

BTS will share updates on these products and on its plans to modernize the FAF forecasting process. Presenters will seek input from attendees to shape future product features for the benefit of freight data users across the United States. 

*The webinar will cover concepts similar to those BTS presented at the Transportation Research Board meeting in January 2024. **Click here to sign up for future BTS freight data announcements.

This presentation will be moderated by Stephanie Lawrence, Director of the Office of Statistical and Economic Analysis.

To register for the webinar, please click HERE.

Presentations:

  • Ongoing Improvements to the FAF by Monique Stinson
  • FAF Multimodal Network Development and Flow Visualization Tools by Laura Dods  

Snapshot of Freight flows by highway, railway, and waterway map; and the routing tool

Assigning Analysis Authorization 0BI_ALL for All Data

After completing this lesson, you will be able to:

  • Assign analysis authorization 0BI_ALL for all data

Business Example


This lesson covers a common scenario where a client integrates analysis authorizations but needs a straightforward method for granting users access to all data, such as for testing purposes.

Analysis Authorization for All Data

When a query is initially executed, the check by the authorization objects S_RS_COMP and S_RS_COMP1 must be successful.

If the InfoProvider on which the query is based contains characteristics flagged as authorization-relevant, the user needs analysis authorizations for the related authorization-relevant data.

For simple analysis authorization requirements, or when a user needs analysis authorizations for all data, for example for test purposes, there is the Special Analysis Authorization: 0BI_ALL.

Screenshots: the configuration of the authorization 0BI_ALL, expanded into two views showing the characteristic 0CO_AREA and the navigation attribute P_CATEG, each with the authorization-relevant setting highlighted.

In the figure shown, we can see the details of the analysis authorization 0BI_ALL.

  • The analysis authorization 0BI_ALL consists of all characteristics and navigation attributes flagged as authorization-relevant, each authorized for all values (*), and therefore grants access to all authorization-relevant data.
  • 0BI_ALL is a generated analysis authorization that cannot be edited manually. It is automatically updated whenever a characteristic or navigation attribute is set to authorization-relevant.

Occasionally, the automatic update fails. Then you can perform the update manually. On the Management of Analysis Authorizations screen (transaction code RSECADMIN ), choose Extras → Update Authorization 0BI_ALL .

Screenshot: the 0BI_ALL display, with a process diagram showing the two ways in which 0BI_ALL can be assigned to users.

As shown here, analysis authorizations (and thus also the special analysis authorization 0BI_ALL) can be assigned to users in one of the following ways:

  • Directly with the user assignment in transaction RSECADMIN
  • Recommended method: indirectly, by entering them in the authorization object S_RS_AUTH, assigning that object to a profile, and then assigning the profile to a role.

Analysis Authorization Assignment by Role

Screenshots showing the authorization object S_RS_AUTH and an example of its use in a role

As shown in this figure, the authorization object S_RS_AUTH is used to include analysis authorizations in roles. It consists of only one field, BIAUTH, where previously defined analysis authorizations are entered. The generated special analysis authorization 0BI_ALL can also be entered here. The steps to assign an analysis authorization to a role are as follows:

  • Use transaction PFCG (Role Maintenance) to create a new role or enhance an existing one, and add the authorization object S_RS_AUTH to the profile of the role.
  • In the BIAUTH field of this authorization object, use the input help button to select the relevant previously defined or generated analysis authorizations.
  • Assign the role to users using the PFCG transaction (Role Maintenance) on the User tab or the SU01 transaction (User Maintenance) on the Roles tab.

Creating a Basic Role for a Reporting User

In the following three videos, a new reporting user is created and gradually provided with authorizations so that this reporting user can execute a BW query and display data.

The Student11 user has all authorizations in our SAP BW/4HANA system and acts as a user and authorization administrator. This user therefore creates new users and roles and assigns authorizations.

1. Granting Authorization to Connect to SAP BW/4HANA 

In this video, you learn how to grant authorization to connect to SAP BW/4HANA using SAP Analysis for Microsoft Excel to a reporting user.

2. Maintaining the Authorization Settings for Executing Queries

In this video, you watch a demonstration that shows how to maintain the authorization settings for executing queries.

3. Addressing and Troubleshooting Authorization Settings

In this video, you watch a demonstration that shows how to troubleshoot and address the error message shown in video 2.


America’s best decade, according to data

One simple variable, more than anything, determines when you think the nation peaked.


How do you define the good old days?

Department of Data


The plucky poll slingers at YouGov, who are consistently willing to use their elite-tier survey skills in service of measuring the unmeasurable, asked 2,000 adults which decade had the best and worst music, movies, economy and so forth, across 20 measures. But when we charted them, no consistent pattern emerged.

We did spot some peaks: When asked which decade had the most moral society, the happiest families or the closest-knit communities, White people and Republicans were about twice as likely as Black people and Democrats to point to the 1950s. The difference probably depends on whether you remember that particular decade for “Leave it to Beaver,” drive-in theaters and “12 Angry Men” — or the Red Scare, the murder of Emmett Till and massive resistance to school integration.

“This was a time when Repubs were pretty much running the show and had reason to be happy,” pioneering nostalgia researcher Morris Holbrook told us via email. “Apparently, you could argue that nostalgia is colored by political preferences. Surprise, surprise.”

And he’s right! But any political, racial or gender divides were dwarfed by what happened when we charted the data by generation. Age, more than anything, determines when you think America peaked.

So, we looked at the data another way, measuring the gap between each person’s birth year and their ideal decade. The consistency of the resulting pattern delighted us: It shows that Americans feel nostalgia not for a specific era, but for a specific age.

The good old days when America was “great” aren’t the 1950s. They’re whatever decade you were 11, your parents knew the correct answer to any question, and you’d never heard of war crimes tribunals, microplastics or improvised explosive devices. Or when you were 15 and athletes and musicians still played hard and hadn’t sold out.

Not every flavor of nostalgia peaks as sharply as music does. But by distilling them to the most popular age for each question, we can chart a simple life cycle of nostalgia.

The closest-knit communities were those in our childhood, ages 4 to 7. The happiest families, most moral society and most reliable news reporting came in our early formative years — ages 8 through 11. The best economy, as well as the best radio, television and movies, happened in our early teens — ages 12 through 15.


Slightly spendier activities such as fashion, music and sporting events peaked in our late teens — ages 16 through 19 — matching research from the University of South Australia’s Ehrenberg-Bass Institute, which shows music nostalgia centers on age 17.

YouGov didn’t just ask about the best music and the best economy. The pollsters also asked about the worst music and the worst economy. But almost without exception, if you ask an American when times were worst, the most common response will be “right now!”

This holds true even when “now” is clearly not the right answer. For example, when we ask which decade had the worst economy, the most common answer is today. The Great Depression — when, for much of a decade, unemployment exceeded what we saw in the worst month of pandemic shutdowns — comes in a grudging second.

To be sure, other forces seem to be at work. Democrats actually thought the current economy wasn’t as bad as the Great Depression. Republicans disagreed. In fact, measure after measure, Republicans were more negative about the current decade than any other group — even low-income folks in objectively difficult situations.

So, we called the brilliant Joanne Hsu, director of the University of Michigan’s Surveys of Consumers who regularly wrestles with partisan bias in polling.

Hsu said that yes, she sees a huge partisan split in the economy, and yes, Republicans are far more negative than Democrats. But it hasn’t always been that way.

“People whose party is in the White House always have more favorable sentiment than people who don’t,” she told us. “And this has widened over time.”

In a recent analysis, Hsu — who previously worked on some of our favorite surveys at the Federal Reserve — found that while partisanship drove wider gaps in economic expectations than income, age or education did, even in the George W. Bush and Barack Obama years, those gaps more than doubled under Donald Trump as Republicans’ optimism soared and Democrats’ hopes fell.

Our attitudes reversed almost the instant President Biden took office, but the gap remains nearly as wide. That is to say, if we’d asked the same questions about the worst decades during the Trump administration, Hsu’s work suggests the partisan gap could have shriveled or even flipped eyeglasses over teakettle.

To understand the swings, Hsu and her friends spent the first part of 2024 asking 2,400 Americans where they get their information about the economy. In a new analysis , she found Republicans who listen to partisan outlets are more likely to be negative, and Democrats who listen to their own version of such news are more positive — and that Republicans are a bit more likely to follow partisan news.

But while Fox and friends drive some negativity, only a fifth of Republicans get their economic news from partisan outlets. And Democrats and independents give a thumbs down to the current decade, too, albeit at much lower rates.

There’s clearly something more fundamental at work. As YouGov’s Carl Bialik points out, when Americans were asked last year which decade they’d most want to live in, the most common answer was now. At some level then, it seems unlikely that we truly believe this decade stinks by almost every measure.

A deeper explanation didn’t land in our laps until halfway through a Zoom call with four well-caffeinated Australian marketing and consumer-behavior researchers: the Ehrenberg-Bass folks behind the music study we cited above. (Their antipodean academic institute has attracted massive sponsorships by replacing typical corporate marketing fluffery with actual evidence.)

Their analysis began when Callum Davies needed to better understand the demographics of American music tastes to interpret streaming data for his impending dissertation. Since they were already asking folks about music, Davies and his colleagues decided they might as well seize the opportunity to update landmark research from Holbrook and Robert Schindler about music nostalgia.

Building on the American scholars’ methods, they asked respondents to listen to a few seconds each of 34 songs, including Justin Timberlake’s “Sexy Back” and Johnny Preston’s “Running Bear.” Then respondents were asked to rate each song on a zero-to-10 scale. (In the latter case, we can’t imagine the high end of the scale got much use, especially if the excerpt included that song’s faux-tribal “hooga-hooga” chant and/or its climactic teen drownings.)

Together, the songs represented top-10 selections from every even-numbered year from 1950 (Bing and Gary Crosby’s “Play a Simple Melody”) to 2016 (Rihanna’s “Work”), allowing researchers to gather our preferences for music released throughout our lives.

Like us, they found that you’ll forever prefer the music of your late teens. But their results show one big difference: There’s no sudden surge of negative ratings for the most recent music.

Marketing researcher Bill Page said that by broadly asking when music, sports or crime were worst, instead of getting ratings for specific years or items, YouGov got answers to a question they didn’t ask.

“When you ask about ‘worst,’ you’re not asking for an actual opinion,” Page said. “You’re asking, ‘Are you predisposed to think things get worse?’”

“There’s plenty of times surveys unintentionally don’t measure what they claim to,” his colleague Zac Anesbury added.

YouGov actually measured what academics call “declinism,” his bigwig colleague Carl Driesener explained. He looked a tiny bit offended when we asked if that was a real term or slang they’d coined on the spot. But in our defense, only a few minutes had passed since they had claimed “cozzie livs” was Australian for “the cost of living crisis.”

Declinists believe the world keeps getting worse. It’s often the natural result of rosy retrospection, or the idea that everything — with the possible exception of “Running Bear” — looks better in memory than it did at the time. This may happen in part because remembering the good bits of the past can help us through difficult times, Page said.

It’s a well-established phenomenon in psychology, articulated by Leigh Thompson, Terence Mitchell and their collaborators in a set of analyses. They found that when asked to rate a trip mid-vacation, we often sound disappointed. But after we get home — when the lost luggage has been found and the biting-fly welts have stopped itching — we’re as positive about the trip as we were in the early planning stage. Sometimes even more so.

So saying the 2020s are the worst decade ever is akin to sobbing about “the worst goldang trip ever” at 3 a.m. in a sketchy flophouse full of Russian-speaking truckers after you’ve run out of cash and spent three days racing around Urumqi looking for the one bank in Western China that takes international cards.

A few decades from now, our memories shaped by grainy photos of auroras and astrolabes, we’ll recall only the bread straight from streetside tandoor-style ovens and the locals who went out of their way to bail out a couple of distraught foreigners.

In other words, the 2020s will be the good old days.

Greetings! The Department of Data craves queries. What are you curious about: How many islands have been completely de-ratted? Where is America’s disc-golf heartland? Who goes to summer camp? Just ask!

If your question inspires a column, we’ll send you an official Department of Data button and ID card. This week’s buttons go to YouGov’s Taylor Orth, who correctly deduced we’d be fascinated by decade-related polls, and Stephanie Killian in Kennesaw, Ga., who also got a button for our music column , with her questions about how many people cling to the music of their youth.



SUVs are setting new sales records each year – and so are their emissions

Laura Cozzi


IEA (2024), SUVs are setting new sales records each year – and so are their emissions, IEA, Paris. https://www.iea.org/commentaries/suvs-are-setting-new-sales-records-each-year-and-so-are-their-emissions. Licence: CC BY 4.0.


The large, heavy passenger vehicles were responsible for over 20% of the growth in global energy-related CO2 emissions last year

SUVs accounted for 48% of global car sales in 2023, reaching a new record and further strengthening the defining automobile trend of the early 21st century – the shift towards ever larger and heavier cars. There are various driving forces behind this trend, from the appeal of SUVs as a status symbol and their potential enhancements in comfort, to the marketing strategies of leading automakers.

New car registrations by size and powertrain, 2010-2023

In advanced economies, SUV sales reached around 20 million last year, surpassing a market share of 50% for the first time. This preference for larger vehicles extends to emerging and developing economies, too, where the share of SUVs in total car sales mirrors this trend. Today, more than one in four cars on the road worldwide are SUVs, the majority of which are conventional internal combustion engine (ICE) vehicles. While only 5% of SUVs currently on the road are electric, they account for a growing share of electric car sales. In 2023, more than 55% of new electric car registrations were SUVs. 

If SUVs were a country, they would be the world’s fifth largest emitter of CO2

SUVs weigh 200-300 kg more than an average medium-sized car and typically take up nearly 0.3 m² more space, emitting roughly 20% more carbon dioxide (CO2). The trend towards heavier and less fuel-efficient cars increases energy demand, including oil and electricity use, as well as demand for basic metals and critical minerals needed for battery production. Over the course of 2022 and 2023, global oil consumption directly related to SUVs rose by a total of over 600 000 barrels per day, accounting for more than a quarter of the overall annual growth in oil demand.
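As a back-of-envelope check, the share quoted above implies a ceiling on total oil demand growth over the same period. The sketch below is our own arithmetic on the commentary’s numbers, not an IEA-published calculation.

```python
# Rough arithmetic implied by the figures above (our inference, not IEA data).
suv_oil_growth_kbd = 600     # SUV-related oil demand growth, 2022-2023 (thousand barrels/day)
suv_share_of_growth = 0.25   # "more than a quarter" of overall growth, taken at face value

# If 600 kb/d is at least a quarter of the total, the total is at most:
implied_total_growth_kbd = suv_oil_growth_kbd / suv_share_of_growth
print(f"Implied total oil demand growth: up to ~{implied_total_growth_kbd:,.0f} kb/d")
# -> ~2,400 kb/d, i.e. roughly 2.4 million barrels per day over the two years
```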

Combustion-related CO2 emissions from SUVs and the 10 highest-emitting countries, 2023

In 2023, there were more than 360 million SUVs on the roads worldwide, resulting in combustion-related CO2 emissions of 1 billion tonnes, an increase of around 100 million tonnes from the previous year. This accounted for more than 20% of the growth in global energy-related CO2 emissions last year. The annual increase in CO2 emissions attributed to the rise of SUVs is equivalent to about half of the emissions growth stemming from the global electricity sector. Compared with smaller cars, SUVs are also associated with higher indirect emissions resulting from producing the materials used to manufacture them. If ranked among countries, the global fleet of SUVs would be the world’s fifth largest emitter of CO2, exceeding the emissions of Japan and various other major economies.
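The fleet-level figures above imply a couple of magnitudes worth sanity-checking. The sketch below derives them from the numbers quoted in the text; the results are our own estimates, not IEA-published values.

```python
# Derived magnitudes implied by the quoted figures (our arithmetic, not IEA data).
suv_fleet = 360e6           # SUVs on the road worldwide, 2023
suv_co2_t = 1.0e9           # combustion-related CO2 from SUVs in 2023, tonnes
suv_co2_growth_t = 100e6    # increase over the previous year, tonnes
suv_share_of_growth = 0.20  # "more than 20%" of global emissions growth

# Average combustion emissions per SUV per year:
print(f"Per SUV: ~{suv_co2_t / suv_fleet:.1f} t CO2/year")  # ~2.8 t

# If 100 Mt is at least 20% of global growth, global growth was at most:
print(f"Implied global emissions growth: <= ~{suv_co2_growth_t / suv_share_of_growth / 1e6:.0f} Mt")
```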

Global electric vehicle fleet and sales of SUVs, 2023

Around 30 million SUVs with internal combustion engines were added to the global fleet in 2023, comparable to the total number of electric cars on the roads today. In 2023, there were 500 electric car models available worldwide, of which 60% fell under the SUV category, marking a significant increase from previous years. This trend is further reinforced as automakers plan to introduce a greater number of electrified SUV models in the near future.

Globally, SUVs now account for approximately 45% of the electric car fleet, a share that would be even higher were it not for the strong growth of small electric cars in urban areas in China. In advanced economies, the share of SUVs among electric cars is even higher, at 55%, owing partly to the limited availability of smaller, more affordable compact models.

Despite advances in fuel efficiency and electrification, the trend toward heavier and less efficient vehicles such as SUVs, which emit roughly 20% more CO2 than an average medium-sized car, has largely nullified the improvements in energy consumption and emissions achieved elsewhere in the world’s passenger car fleet in recent decades. Larger vehicles also pose challenges related to their increased use of critical minerals, as they are equipped with larger batteries. Additionally, their higher front ends raise questions around pedestrian safety in constrained urban environments. On top of this, their greater parking space requirements, approximately 10% more than for medium-sized cars, can limit the use of valuable space in dense urban areas for other purposes.

To respond to some of these challenges, countries such as France, Norway, and Ireland have either established or are exploring legislative frameworks to rein in demand for SUVs. Major cities like Paris and Lyon are implementing higher parking charges specifically targeting SUVs in urban areas.

Shifting from fossil-fuelled cars to electric vehicles is a key strategy for reaching international energy and climate goals. However, measures such as rightsizing EV battery packs, tailoring fuel efficiency standards based on car size, and investing in innovative battery technologies with enhanced performance and durability, as well as lower material demand requirements, are also essential for a sustainable future. 

This analysis was supported by the work of IEA Energy Analysts Mathilde Huismans and Jules Sery.



Personal Income and Outlays, April 2024


Personal income increased $65.3 billion (0.3 percent at a monthly rate) in April, according to estimates released today by the Bureau of Economic Analysis (tables 2 and 3). Disposable personal income (DPI)—personal income less personal current taxes—increased $40.2 billion (0.2 percent) and personal consumption expenditures (PCE) increased $39.1 billion (0.2 percent).

The PCE price index increased 0.3 percent. Excluding food and energy, the PCE price index increased 0.2 percent (table 5). Real DPI decreased 0.1 percent in April and real PCE decreased 0.1 percent; goods decreased 0.4 percent and services increased 0.1 percent (tables 3 and 4).

The increase in current-dollar personal income in April primarily reflected increases in compensation, personal income receipts on assets, and government social benefits to persons (table 2).

The $39.1 billion increase in current-dollar PCE in April reflected an increase of $49.1 billion in spending for services that was partly offset by a $10.0 billion decrease in spending for goods (table 2). Within services, the largest contributors to the increase were housing and utilities (led by housing), health care (both outpatient services and hospitals), and financial services and insurance (led by financial service charges, fees, and commissions). These increases were partly offset by a decrease in transportation services (led by air transportation). Within goods, the largest contributors to the decrease were spending for recreational goods and vehicles (led by information processing equipment) and other nondurable goods (led by recreational items). Detailed information on monthly PCE spending can be found on Table 2.4.5U.

Personal outlays—the sum of PCE, personal interest payments, and personal current transfer payments—increased $42.8 billion in April (table 2). Personal saving was $744.5 billion in April and the personal saving rate—personal saving as a percentage of disposable personal income—was 3.6 percent (table 1).
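Since the release reports both the saving level and the saving rate, the DPI level they imply can be backed out from the definition of the saving rate. A minimal sketch of that arithmetic, using only the figures quoted above (the published rate is rounded, so the result is approximate):

```python
# Back out the implied DPI level from the published saving figures.
personal_saving_bn = 744.5   # April personal saving, billions of dollars (SAAR)
saving_rate = 0.036          # personal saving rate: saving / DPI

implied_dpi_bn = personal_saving_bn / saving_rate
print(f"Implied DPI: ~${implied_dpi_bn:,.0f} billion at an annual rate")
# -> roughly $20,700 billion (about $20.7 trillion); approximate because 3.6% is rounded
```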

From the preceding month, the PCE price index for April increased 0.3 percent (table 5). Prices for goods increased 0.2 percent, and prices for services increased 0.3 percent. Food prices decreased 0.2 percent and energy prices increased 1.2 percent. Excluding food and energy, the PCE price index increased 0.2 percent. Detailed monthly PCE price indexes can be found on Table 2.4.4U.

From the same month one year ago, the PCE price index for April increased 2.7 percent (table 7). Prices for goods increased 0.1 percent and prices for services increased 3.9 percent. Food prices increased 1.3 percent and energy prices increased 3.0 percent. Excluding food and energy, the PCE price index increased 2.8 percent from one year ago.

The 0.1 percent decrease in real PCE in April reflected a decrease of 0.4 percent in spending on goods and an increase of 0.1 percent in spending on services (table 4). Within goods, the largest contributors to the decrease were gasoline and other energy goods (led by motor vehicle fuels, lubricants, and fluids), recreational goods and vehicles (led by information processing equipment), and other nondurable goods (led by recreational items). Within services, the largest contributor to the increase was health care (led by outpatient services). Detailed information on monthly real PCE spending can be found on Table 2.4.6U.

Updates to Personal Income and Outlays

Estimates have been updated for October through March. For October through December, estimates for compensation, personal taxes, and contributions for government social insurance reflect the incorporation of updated fourth-quarter wage and salary data from the Bureau of Labor Statistics Quarterly Census of Employment and Wages program. Revised and previously published changes from the preceding month for current-dollar personal income, and for current-dollar and chained (2017) dollar DPI and PCE, are provided below for February and March.

*          *          *

Next release: June 28, 2024, at 8:30 a.m. EDT
Personal Income and Outlays, May 2024

Full Release & Tables (PDF)

Tables only (Excel), release highlights (PDF), historical comparisons (PDF).

Additional Resources available at www.bea.gov:

  • Stay informed about BEA developments by reading The BEA Wire, signing up for BEA's email subscription service, or following BEA on X, formerly known as Twitter, @BEA_News.
  • Historical time series for these estimates can be accessed in BEA's Interactive Data Application.
  • Access BEA data by registering for BEA's Data Application Programming Interface (API).
  • For more on BEA's statistics, see BEA's online journal, the Survey of Current Business.
  • BEA's news release schedule
  • NIPA Handbook: Concepts and Methods of the U.S. National Income and Product Accounts

Definitions

Personal income is the income received by, or on behalf of, all persons from all sources: from participation as laborers in production, from owning a home or business, from the ownership of financial assets, and from government and business in the form of transfers. It includes income from domestic sources as well as from the rest of the world. It does not include realized or unrealized capital gains or losses.

Disposable personal income is the income available to persons for spending or saving. It is equal to personal income less personal current taxes.

Personal consumption expenditures (PCE) is the value of the goods and services purchased by, or on behalf of, "persons" who reside in the United States.

Personal outlays is the sum of PCE, personal interest payments, and personal current transfer payments.

Personal saving is personal income less personal outlays and personal current taxes.

The personal saving rate is personal saving as a percentage of disposable personal income.

Current-dollar estimates are valued in the prices of the period when the transactions occurred—that is, at "market value." Also referred to as "nominal estimates" or as "current-price estimates."

Real values are inflation-adjusted estimates—that is, estimates that exclude the effects of price changes.

For more definitions, refer to the Glossary: National Income and Product Accounts.

Statistical conventions

Annual rates. Monthly and quarterly values are expressed at seasonally adjusted annual rates (SAAR). Dollar changes are calculated as the difference between these SAAR values. For detail, refer to the FAQ "Why does BEA publish estimates at annual rates?"

Month-to-month percent changes are calculated from unrounded data and are not annualized.

Quarter-to-quarter percent changes are calculated from unrounded data and are displayed at annual rates. For detail, refer to the FAQs "How is average annual growth calculated?" and "Why does BEA publish percent changes in quarterly series at annual rates?"
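To make the annualization convention concrete, here is a small sketch with a made-up 0.5 percent quarter-over-quarter change (the number is hypothetical, purely for illustration):

```python
# Annualizing a quarterly percent change by compounding over four quarters.
q_change = 0.005  # hypothetical 0.5% quarter-over-quarter change

not_annualized = q_change * 100                # the month-to-month style: as-is
annualized = ((1 + q_change) ** 4 - 1) * 100   # compounded to an annual rate

print(f"Quarter-over-quarter: {not_annualized:.1f}%")  # 0.5%
print(f"At an annual rate:    {annualized:.1f}%")      # ~2.0%
```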

Quantities and prices. Quantities, or "real" volume measures, and prices are expressed as index numbers with a specified reference year equal to 100 (currently 2017). Quantity and price indexes are calculated using a Fisher chained-weighted formula that incorporates weights from two adjacent periods (months for monthly data, quarters for quarterly data, and years for annual data). For details on the calculation of quantity and price indexes, refer to Chapter 4: Estimating Methods in the NIPA Handbook.
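The Fisher formula takes the geometric mean of a Laspeyres index (base-period weights) and a Paasche index (current-period weights). A minimal sketch with toy prices and quantities for two goods (illustrative numbers, not BEA data):

```python
from math import sqrt

# Fisher quantity index between two adjacent periods, toy data for two goods.
p0, q0 = [2.0, 5.0], [10.0, 4.0]   # period-0 prices and quantities
p1, q1 = [2.2, 4.8], [11.0, 5.0]   # period-1 prices and quantities

# Laspeyres values quantity change at period-0 prices; Paasche at period-1 prices.
laspeyres = sum(p * q for p, q in zip(p0, q1)) / sum(p * q for p, q in zip(p0, q0))
paasche   = sum(p * q for p, q in zip(p1, q1)) / sum(p * q for p, q in zip(p1, q0))
fisher    = sqrt(laspeyres * paasche)   # geometric mean of the two

print(f"Fisher quantity relative: {fisher:.4f}")
# Chaining multiplies these period-to-period relatives to build the full index.
```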

Chained-dollar values are calculated by multiplying the quantity index by the current-dollar value in the reference year (2017) and then dividing by 100. Percent changes calculated from real quantity indexes and chained-dollar levels are conceptually the same; any differences are due to rounding. Chained-dollar values are not additive because the relative weights for a given period differ from those of the reference year. In tables that display chained-dollar values, a "residual" line shows the difference between the sum of detailed chained-dollar series and its corresponding aggregate.
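A short sketch of the chained-dollar arithmetic and the non-additivity residual described above, with hypothetical numbers (the index value and components are invented for illustration):

```python
# Chained-dollar level: quantity index times reference-year value, divided by 100.
ref_2017_value_bn = 1000.0   # hypothetical current-dollar value in 2017, billions
quantity_index = 112.5       # hypothetical quantity index, 2017 = 100

chained_bn = quantity_index * ref_2017_value_bn / 100
print(f"Chained (2017) dollars: {chained_bn:,.1f} billion")   # 1,125.0

# Non-additivity: separately chained components need not sum to the aggregate.
aggregate_bn = chained_bn
components_bn = [600.0, 510.0]   # hypothetical chained-dollar components
residual_bn = aggregate_bn - sum(components_bn)
print(f"Residual: {residual_bn:,.1f} billion")   # the 'residual' line in the tables
```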
