9 Project Ideas for Your Data Analytics Portfolio
Finding projects for your data analytics portfolio can be tricky, especially when you’re new to the field. You might also think that your data projects need to be especially complex or showy, but that’s not the case. The most important thing is to demonstrate your skills, ideally using a dataset that interests you. And the good news? Data is everywhere—you just need to know where to find it and what to do with it.
In this post, we'll highlight the key elements that your data analytics portfolio should demonstrate. We'll then share nine project ideas that will help you build your portfolio from scratch, focusing on three key areas: data scraping, exploratory analysis, and data visualization.
- What should you include in your data analytics portfolio?
- Data scraping project ideas
- Exploratory data analysis project ideas
- Data visualization project ideas
- What's next?
Ready to get inspired? Let’s go!
1. What should you include in your data analytics portfolio?
Data analytics is all about finding insights that inform decision-making. But that’s just the end goal. As any experienced data analyst will tell you, the insights we see as consumers are the result of a great deal of work. In fact, about 80% of all data analytics tasks involve preparing data for analysis. This makes sense when you think about it—after all, our insights are only as good as the quality of our data.
Yes, your portfolio needs to show that you can carry out different types of data analysis . But it also needs to show that you can collect data, clean it, and report your findings in a clear, visual manner. As your skills improve, your portfolio will grow in complexity. As a beginner though, you’ll need to show that you can:
- Scrape the web for data
- Carry out exploratory analyses
- Clean untidy datasets
- Communicate your results using visualizations
If you’re inexperienced, it can help to present each item as a mini-project of its own. This makes life easier since you can learn the individual skills in a controlled way. With that in mind, we’ll keep it nice and simple with some basic ideas, and a few tools you might want to explore to help you along the way.
2. Data scraping project ideas for your portfolio
What is data scraping?
Data scraping is the first step in any data analytics project. It involves pulling data (usually from the web) and compiling it into a usable format. While there’s no shortage of great data repositories available online, scraping and cleaning data yourself is a great way to show off your skills.
The process of web scraping can be automated using tools like Parsehub, ScraperAPI, or Octoparse (for non-coders), or by using libraries like Beautiful Soup or Scrapy (for developers). Whichever tool you use, the important thing is to show that you understand how it works and can apply it effectively.
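To make the idea concrete, here's a minimal, dependency-free sketch of the parsing half of a scraper, using Python's built-in html.parser module (libraries like Beautiful Soup wrap this kind of work in a much friendlier API). The HTML snippet and the "title" class are invented for the example:

```python
from html.parser import HTMLParser

class TitleScraper(HTMLParser):
    """Collect the text of every <h2 class="title"> element."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if tag == "h2" and ("class", "title") in attrs:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.titles.append(data.strip())

# A stand-in for the HTML you'd fetch with urllib or requests:
html = """
<div><h2 class="title">The Shawshank Redemption</h2>
<h2 class="title">The Godfather</h2><p>Other text</p></div>
"""
scraper = TitleScraper()
scraper.feed(html)
print(scraper.titles)  # ['The Shawshank Redemption', 'The Godfather']
```

A real scraper adds polite delays, error handling, and pagination, but the core loop of "find the repeating element, extract its text" is the same.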
Before scraping a website, be sure that you have permission to do so. If you're not certain, you can always search for a dataset on a repository site like Kaggle. If it exists there, it's a good bet you can go straight to the source and scrape it yourself. Bear in mind, though, that data scraping can be challenging if you're mining complex, dynamic websites. We recommend starting with something easy: a mostly static site. Here are some ideas to get you started.
The Internet Movie Database
A good beginner’s project is to extract data from IMDb. You can collect details about popular TV shows, movie reviews and trivia, the heights and weights of various actors, and so on. Data on IMDb is stored in a consistent format across all its pages, making the task a lot easier. There’s also a lot of potential here for further analysis.
Job portals
Many beginners like scraping data from job portals since they often contain standard data types. You can also find lots of online tutorials explaining how to proceed. To keep it interesting, why not focus on your local area? Collect job titles, companies, salaries, locations, required skills, and so on. This offers great potential for later visualization, such as graphing skillsets against salaries.
E-commerce sites
Another popular one is to scrape product and pricing data from e-commerce sites. For instance, extract product information about Bluetooth speakers on Amazon, or collect reviews and prices on various tablets and laptops. Once again, this is relatively straightforward to do, and it is scalable. This means you can start with a product that has a small number of reviews, and then upscale once you're comfortable using the algorithms.
Reddit
For something a bit less conventional, another option is to scrape a site like Reddit. You could search for particular keywords, upvotes, user data, and more. Much of Reddit's old interface is static, making the task nice and straightforward. Later, you can carry out interesting exploratory analyses, for instance to see if there are any correlations between popular posts and particular keywords. Which brings us to our next section.
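As a rough sketch of the keyword idea: Reddit also serves most listings as JSON (append .json to a listing URL), so you can often skip HTML parsing entirely. The listing below is invented, but it mirrors the shape of Reddit's real response:

```python
import json
from collections import Counter

# Invented sample mirroring the shape of a Reddit listing response:
listing = json.loads("""
{"data": {"children": [
  {"data": {"title": "Python tips for data analysis", "ups": 420}},
  {"data": {"title": "Learning SQL vs Python", "ups": 150}},
  {"data": {"title": "Show off your dashboard", "ups": 890}}
]}}
""")

keywords = {"python", "sql"}
hits = Counter()
for child in listing["data"]["children"]:
    title_words = set(child["data"]["title"].lower().split())
    for kw in keywords & title_words:
        hits[kw] += 1  # count posts mentioning each keyword

print(sorted(hits.items()))  # [('python', 2), ('sql', 1)]
```

From here it's a short step to cross-tabulating keywords against upvotes to look for the correlations mentioned above.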
3. Exploratory data analysis project ideas
What is exploratory data analysis?
The next step in any data analyst's skillset is the ability to carry out an exploratory data analysis (EDA). An EDA examines the structure of a dataset, allowing you to determine its patterns and characteristics. It also helps you clean your data: you can extract important variables, detect outliers and anomalies, and generally test your underlying assumptions.
While this process is one of the most time-consuming tasks for a data analyst, it can also be one of the most rewarding. Later modeling focuses on generating answers to specific questions. An EDA, meanwhile, helps you do one of the most exciting bits—generating those questions in the first place.
Languages like R and Python are often used to carry out these tasks, and both have many pre-existing libraries that can do much of the heavy lifting for you. The real skill lies in presenting your project and its results. How you decide to do this is up to you, but one popular method is to use an interactive documentation tool like Jupyter Notebook. This lets you capture elements of code, along with explanatory text and visualizations, all in one place. Here are some ideas for your portfolio.
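To illustrate the mechanics, here's a first-pass EDA sketch using only Python's standard library statistics module; in practice, pandas' describe() and plotting functions do the same with less code. The sample values are invented:

```python
import statistics

# Invented sample: a rate per 100k for one country over 12 years
rates = [10.2, 10.8, 11.1, 10.5, 11.6, 12.0, 11.4, 10.9, 38.0, 11.2, 11.7, 12.1]

mean = statistics.mean(rates)
median = statistics.median(rates)
q1, q2, q3 = statistics.quantiles(rates, n=4)  # quartiles
iqr = q3 - q1

# Tukey's rule: flag anything beyond 1.5 * IQR outside the quartiles
outliers = [x for x in rates if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

print(f"mean={mean:.1f}, median={median:.1f}, IQR={iqr:.2f}")
print("outliers:", outliers)  # the 38.0 entry deserves a closer look
```

Notice how the mean (pulled up by the outlier) and the median disagree; spotting exactly this kind of discrepancy is what EDA is for.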
Global suicide rates
This global suicide rates dataset covers suicide rates in various countries, with additional data including year, gender, age, population, GDP, and more. When carrying out your EDA, ask yourself: What patterns can you see? Are suicide rates climbing or falling in various countries? What variables (such as gender or age) might correlate with suicide rates?
World Happiness Report
On the other end of the scale, the World Happiness Report tracks six factors to measure happiness across the world’s citizens: life expectancy, economics, social support, absence of corruption, freedom, and generosity. So, which country is the happiest? Which continent? Which factor appears to have the greatest (or smallest) impact on a nation’s happiness? Overall, is happiness increasing or decreasing?
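Questions like "which factor has the greatest impact" usually start with a correlation check. Here's a hand-rolled Pearson correlation so the formula is visible (the numbers are invented; in practice pandas' DataFrame.corr() does this in one call):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented: GDP per capita (thousands USD) and happiness score for 5 countries
gdp = [55, 48, 30, 12, 8]
happiness = [7.5, 7.3, 6.1, 4.8, 4.5]

print(f"r = {pearson(gdp, happiness):.2f}")
```

A value near +1 suggests the factor moves with happiness; near 0, that it tells you little. Correlation isn't causation, of course, but it tells you which factors deserve a deeper look.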
Aside from the two ideas above, you could also use your own datasets . After all, if you’ve already scraped your own data, why not use them? For instance, if you scraped a job portal, which locations or regions offer the best-paid jobs? Which offer the least well-paid ones? Why might that be? Equally, with e-commerce data, you could look at which prices and products offer the best value for money.
Ultimately, whichever dataset you’re using, it should grab your attention. If the data are too complex or don’t interest you, you’re likely to run out of steam before you get very far. Keep in mind what further probing you can do to spot interesting trends or patterns, and to extract the insights you need.
We've compiled a list of ten great places to find free datasets for your next project here.
4. Data visualization project ideas
What is data visualization?
Scraping, tidying, and analyzing data is one thing. Communicating your findings is another. Our brains don’t like looking at numbers and figures, but they love visuals. This is where the ability to create effective data visualizations comes in. Good visualizations—whether static or interactive—make a great addition to any data analytics portfolio. Showing that you can create visualizations that are both effective and visually appealing will go a long way towards impressing a potential employer.
Check out this video with Dr. Humera, where she explains how visualization helps tell a story with data.
Some free visualization tools include Google Charts, Canva Graph Maker, and Tableau Public. Meanwhile, if you want to show off your coding abilities, use a Python library such as Seaborn, or flex your R skills with Shiny. Needless to say, there are many tools available to help you; the one you choose depends on what you're looking to achieve. Here's a bit of inspiration…
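If you'd rather code your charts, even a few lines of matplotlib (the plotting engine underneath Seaborn) go a long way. A sketch with invented job-portal salary data, rendered off-screen so it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

# Invented data: average advertised salary (USD) by required skill
skills = ["SQL", "Python", "Tableau", "Excel"]
salaries = [78000, 85000, 72000, 61000]

fig, ax = plt.subplots(figsize=(6, 4))
ax.barh(skills, salaries, color="steelblue")
ax.set_xlabel("Average advertised salary (USD)")
ax.set_title("Salary by required skill (sample data)")
fig.tight_layout()
fig.savefig("salary_by_skill.png")
print(f"bars drawn: {len(ax.patches)}")  # 4
```

Swap in the data you scraped earlier and you already have a portfolio-ready chart; Seaborn adds nicer defaults on top of exactly this API.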
Covid-19 data
Topical subject matter looks great on any portfolio, and the pandemic is nothing if not topical! What's more, sites like Kaggle already have thousands of Covid-19 datasets available. How can you represent the data? Could you use a global heatmap to show where cases have spiked, versus where there are very few? Perhaps you could create two overlapping bar charts to show known infections versus predicted infections. Here's a handy tutorial to help you visualize Covid-19 data using R, Shiny, and Plotly.
Most followed on Instagram
Whether you’re interested in social media, or celebrity and brand culture, this dataset of the most-followed people on Instagram has great potential for visualization. You could create an interactive bar chart that tracks changes in the most followed accounts over time. Or you could explore whether brand or celebrity accounts are more effective at influencer marketing. Otherwise, why not find another social media dataset to create a visualization? For instance, this map of the USA by data scientist Greg Rafferty nicely highlights the geographical source of trending topics on Instagram.
Transport data
Another topic that lends itself well to visualization is transport data. Here's a great project by Chen Chen on GitHub, using Python to visualize the top tourist destinations worldwide, and the correlation between inbound/outbound tourists and gross domestic product (GDP).
5. What’s next?
In this post, we’ve explored which skills every beginner needs to demonstrate in their data analytics portfolio. Regardless of the dataset you’re using, you should be able to demonstrate the following abilities:
- Web scraping —using tools like Parsehub, Beautiful Soup, or Scrapy to extract data from websites (remember: static ones are easier!)
- Exploratory data analysis and data cleaning —manipulating data with tools like R and Python, before drawing some initial insights.
- Data visualization —utilizing tools like Tableau, Shiny, or Plotly to create crisp, compelling dashboards and visualizations.
Once you’ve mastered the basics, you can start getting more ambitious with your data analytics projects. For example, why not introduce some machine learning projects, like sentiment analysis or predictive analysis? The key thing is to start simple and to remember that a good data analytics portfolio needn’t be flashy, just competent.
To further develop your skills, there are loads of online courses designed to set you on the right track. To start with, why not try our free, five-day data analytics short course ?
And, if you’d like to learn more about becoming a data analyst and building your portfolio, check out the following:
- How to build a data analytics portfolio
- The best data analytics certification programs on the market right now
- These are the most common data analytics interview questions
- Oct 18, 2021
Google Data Analytics Capstone Project
Updated: Apr 24
I worked on the Google Data Analytics Capstone Project, Track 1, Case Study 1. I will be diving into the background; my full process of cleaning, analyzing, and visualizing the data; and my final suggestions and summary of the data.
Quick Links :
Tableau Dashboard | Github R Code for Analysis | Github R Code for Tableau Visualization | LinkedIn Post
Below is a table of contents in case you want to go to a specific section.
Table of Contents:
Summary of Data
What I Learned
Cyclistic is a bike sharing program which features more than 5,800 bikes and 600 docking stations. It offers reclining bikes, hand tricycles, and cargo bikes, making it more inclusive to people with disabilities and riders who can't use a standard two-wheeled bike. It was founded in 2016 and has grown tremendously into a fleet of bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime.
Previously, Cyclistic's marketing strategy relied on building general awareness and appealing to broad consumer segments. It has flexible pricing plans: single-ride passes, full-day passes, and annual memberships. Those who purchase single-ride or full-day passes are referred to as casual riders, while those who purchase annual memberships are Cyclistic members.
My Role: In this scenario, I am a junior data analyst at Cyclistic, and my team has been tasked with the overall goal (see below) of designing marketing strategies.
Overall Goal : Design marketing strategies aimed at converting casual riders into annual members.
Business Question : "How do annual members and casual riders use Cyclistic bikes differently?"
Below I will describe, step by step, the process I used for this project. If you want to skip ahead to the business suggestions, move on to the section "Insights".
Overview : I first analyzed the data separately (each month) in Excel, then used R to analyze the data as a whole (one year). Finally I created a dashboard in Tableau and used Figma to support the design elements.
I initially wanted to gather and analyze my data in Excel because it was the tool I was most familiar with and I could get a general understanding of the data quicker. I did not combine all of the spreadsheets into one because that would've taken more processing power than my computer had.
I began by downloading the data from divvy-tripdata and turning the .csv files into Excel spreadsheets. I downloaded the most recent year of data available at the time I started my project.
Added two columns to all of the months:
ride_length calculated the total ride length for each trip: the ending time (ended_at) minus the starting time (started_at).
day_of_week calculated the day of the week for each trip from the started_at date.
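The author derived these two columns in Excel; purely as an illustration, the same derivation looks like this in pandas (column names follow the Divvy schema, and the timestamps are invented):

```python
import pandas as pd

# Invented sample rows in the Divvy trip-data schema
trips = pd.DataFrame({
    "started_at": ["2020-08-01 08:00:00", "2020-08-02 17:15:00"],
    "ended_at":   ["2020-08-01 08:25:00", "2020-08-02 17:45:00"],
})
trips["started_at"] = pd.to_datetime(trips["started_at"])
trips["ended_at"] = pd.to_datetime(trips["ended_at"])

# ride_length: ending time minus starting time, in minutes
trips["ride_length"] = (trips["ended_at"] - trips["started_at"]).dt.total_seconds() / 60

# day_of_week: derived from the trip's start date
trips["day_of_week"] = trips["started_at"].dt.day_name()

print(trips[["ride_length", "day_of_week"]])
```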
Went over the business task, the information I had at hand, and how it could be used to figure out how members and casual riders use the bike service differently.
Came up with metrics to look at, such as:
total number of rides per hour, per day of the month, per season, per day of the week, and for different bike types
Average ride length for members versus casual riders
For every month, I created pivot tables in Excel, along with charts to go with the analysis (this took the longest):
Total Rides per Weekday - calculated the total rides for members and casual riders, separated by day of the week; used a clustered column chart
Average Ride Length - calculated the average ride length for members and casual riders, separated by day of the week; used a clustered column chart
Total Rides per Hour - calculated the total rides for members and casual riders, separated by the time of day (24hr); used a line comparison chart
Total Rides per Day - calculated the total rides for members and casual riders, separated by the day of the month; used a line comparison chart
Total Rides per Bike Type - calculated the total rides for members and casual riders, separated by bike type; used a stacked column chart
I also created a Google Docs notes list where I wrote down the exact steps for each month (as a checklist) and included my insights for each month.
This phase took 535 minutes, or just under 9 hours, to complete.
I originally wanted to use SQL, but the files were too big to upload and I couldn't figure out how to utilize Google Cloud Platform. Instead I used R to analyze the data, because it could handle all of the information quicker than Excel, and I wanted to work on my R skills. Below is my general process in R; I didn't include my mistakes, missteps, or errors for the sake of brevity. If you are interested in my full process, including my mistakes, you can email me at [email protected] and I would be happy to discuss it.
View my full code on my Github for this capstone project here .
Load all of the libraries I used: tidyverse, lubridate, hms, data.table
Uploaded all of the original data from the data source divvy-tripdata into R, using the read_csv() function to load each individual .csv file and save it in a separate data frame. I saved the August 2020 data into aug08_df, September 2020 into sep09_df, and so on.
Merged the 12 months of data together using rbind to create a one-year view
Created a new data frame called cyclistic_date that would contain all of my new columns
Created new columns for:
Ride Length - by subtracting the started_at time from the ended_at time
Day of the Week
Time - converted the time to HH:MM:SS format
Season - Spring, Summer, Winter, or Fall
Time of Day - Night, Morning, Afternoon, or Evening
Cleaned the data by:
Removing duplicate rows
Removing rows with NA values (blank rows)
Removing rows where ride_length is 0 or negative (ride_length should be a positive number)
Removing unnecessary columns: ride_id, start_station_id, end_station_id, start_lat, start_lng, end_lat, end_lng
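The author's cleaning was done in R; as an illustrative sketch only, the same four steps in pandas on an invented toy table look like this:

```python
import pandas as pd

# Invented toy data with one duplicate, one NA, and one negative ride_length
df = pd.DataFrame({
    "ride_id": ["a1", "a1", "b2", "c3"],
    "ride_length": [25.0, 25.0, -5.0, None],
    "member_casual": ["member", "member", "casual", "casual"],
    "start_lat": [41.9, 41.9, 41.8, 41.7],
})

df = df.drop_duplicates()                       # remove duplicate rows
df = df.dropna()                                # remove rows with NA values
df = df[df["ride_length"] > 0]                  # ride_length must be positive
df = df.drop(columns=["ride_id", "start_lat"])  # drop columns not needed here

print(df)
```

Only the one valid row survives, which is exactly what you want the cleaning stage to guarantee before any counting begins.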
Calculated Total Rides for:
Total number of rides, which was just the row count: 4,152,139
Member type - casual riders vs. annual members
Type of Bike - classic vs docked vs electric; separated by member type and total rides for each bike type
Hour - separated by member type and total rides for each hour in a day
Time of Day - separated by member type and total rides for each time of day (morning, afternoon, evening, night)
Day of the Week - separated by member type and total rides for each day of the week
Day of the Month - separated by member type and total rides for each day of the month
Month - separated by member type and total rides for each month
Season - separated by member type and total rides for each season (spring, summer, fall, winter)
Calculated Average Ride Length for:
Total average ride length
Type of Bike - separated by member type and average ride length for each bike type
Hour - separated by member type and average ride length for each hour in a day
Time of Day - separated by member type and average ride length for each time of day (morning, afternoon, evening, night)
Day of the Week - separated by member type and average ride length for each day of the week
Day of the Month - separated by member type and average ride length for each day of the month
Month - separated by member type and average ride length for each month
Season - separated by member type and average ride lengths for each season (spring, summer, fall, winter)
Then, using all of this data, I created my own summary in my case notes, noting the total rides for each variable, the average ride length for each variable, and the differences between members and casual riders. I originally wanted to create a report using R Markdown as well, but for the sake of time (I had already spent over 20 hours on the project), I decided to skip this step and write this article instead.
This phase took 1,045 minutes, or about 17 and a half hours, to complete.
While I learned the basics of Tableau in the Google course, I wanted more practice with visualizing data and creating dashboards.
To view my completed dashboard click here .
I created a separate R code (you can view it here on Github) that made some changes for specifically the Tableau portion.
For ride length, I rounded to one decimal place, so my numbers were 29.8 or 12.5.
Revised how I created my "month" column. I used mutate() to create a column with the month name rather than its number, so instead of 01 it would say "January".
Cleaned the data: removed rows with NA values, removed duplicate rows, removed rows where ride_length was 0 or negative, and removed unnecessary columns: ride_id, start_station_id, end_station_id, start_lat, start_lng, end_lat, end_lng
Created a new dataframe with this information so I could test the difference between the original data frame (cyclistic_date) that I used for my analysis and the data frame I would use for Tableau (cyclistic_tableau).
In this new data frame I removed more columns to make calculations quicker in Tableau. I removed: start_station_name, end_station_name, time, started_at, ended_at
Downloaded this data frame into a .csv file which I uploaded to Tableau
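Again as a pandas sketch rather than the author's actual R code, the Tableau-prep transformations (rounding, month names, CSV export) on invented rows might look like:

```python
import pandas as pd

# Invented rows standing in for the cleaned trip data
df = pd.DataFrame({
    "started_at": pd.to_datetime(["2020-08-15 10:00", "2021-01-03 09:30"]),
    "ride_length": [29.8412, 12.5391],
})

df["ride_length"] = df["ride_length"].round(1)  # e.g. 29.8, 12.5
df["month"] = df["started_at"].dt.month_name()  # "August", not "08"

df.to_csv("cyclistic_tableau.csv", index=False)  # ready to load into Tableau
print(df[["ride_length", "month"]])
```

Slimming the export down to just the columns Tableau needs, as described above, keeps the dashboard's calculations fast.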
Created graphs similar to those I created in Excel but added a few:
Total Rides by Bike Type
Ride Length by Weekday
Total Rides by Weekday
Total Rides by Hour
Total Rides by Month
Then I created a basic dashboard with all of that information, a prototype for me to view while I was creating the final dashboard ( Figure 1 below).
Created a prototype mockup in Figma
Created a final version of the mockup in Figma
Edited Dashboard in Tableau to reflect design in Figma
Edited graphs in Tableau
Made bar graphs round
Added highlights to specific important notes
Got rid of labels for visual purposes
Combined Figma and Tableau (used dashboard created in Figma as the background for my Tableau Dashboard) to create a final prototype ( Figure 2 below)
Made minor edits to design elements and created final dashboard ( Figure 3 - Cyclistic Dashboard V1 )
On April 24, 2023, I decided to update my dashboard (see Finished Project, image Final Dashboard - Cyclistic Dashboard V2). All of the analysis is the same; the only changes have been to the dashboard, which include:
Adding horizontal grid lines to a few of the charts
Updating the tool tips.
Making all of the top metric values (e.g. Total Rides, Average Ride Length, etc.) interactive in Tableau instead of in Figma.
This phase took 765 minutes, or almost 13 hours, to complete.
Below was my first draft of the dashboard only using Tableau.
Prototype using Figma Background
Combined Figma and Tableau (used dashboard created in Figma as the background for my Tableau Dashboard) to create a final prototype.
Final Dashboard V1
Made minor edits to design elements and created final dashboard. This was the original final dashboard.
Other tools I used:
Figma to create my background and help develop the dashboard aesthetics.
Google Docs helped me keep track of all of my documents for this project like:
Date Log - I wrote down what I did that day related to my project
Resources - A list of resources I frequently used
Case Notes - Notes for the case study including the final insights, what I was looking for, and anything else having to do with the case
Evernote to draft this article before I uploaded it here.
Here is my finished project: Google Capstone Project (V2) . You can view the links to my R code on Github used for analysis here and the code for Tableau here .
Note: This is V2, with a few minor changes to the dashboard (listed in the Tableau section above).
SUMMARY OF DATA
Total Rides by User Type
Members had more rides with 2,328,763 total rides or 56% and casual riders had 1,823,376 total rides or 43%.
Total Rides per Bike Type
Both casual riders and members used the classic bike the most with 1,777,593 rides or 43% of total rides, followed by docked bikes with 1,545,936 rides or 37% of total rides, and lastly with electric bikes at 828,610 rides or 20% of total rides.
Average Ride Length by User Type
The total average ride length was 24 minutes. For casual riders it was longer, at 27 minutes, while for members it was 14 minutes.
Average Ride Length per Weekday
For the average ride length per weekday, both casual riders and members had an increase on the weekends. For both, Sunday was the longest, at 31 minutes.
Saturday was the most popular weekday, combining casual rider and member rides, with 784,239 rides or 19% of total rides. For member rides only, though, Wednesday was the most popular day, with 356,060 rides (5,407 rides more than Saturday).
5PM (17:00) was the busiest hour for both members and casual riders, with 426,685 rides or 10% of the total rides. Typically, rides began increasing at 6AM and rose until 5PM, then dropped off afterwards. The afternoon was the busiest period for both rider types, with 1,905,797 rides or 45% of total rides. 4AM was the least popular hour.
July was the busiest month, combining casual rider and member rides, at 691,476 rides or 16% of total rides, while summer was the most popular season for both, at 1,903,446 rides or 46% of total rides. Looking at just members, August is actually the busiest month, with 323,140 rides (816 rides more than July). Winter is the least popular season and February is the least popular month.
The most popular bike among both rider types was the classic.
Busiest time was afternoon and the peak time was at 5PM for both casual riders and members.
Busiest weekday was Saturday, casual riders used the service the most on the weekends.
Busiest season was Summer for both types of riders.
Members had the most rides by user type, but casual riders weren't far behind.
The average ride length was 24 minutes, but casual riders on average rode 13 minutes longer than members.
This was the hardest part of the whole project for me. I have never provided suggestions for a business, nor have I worked in marketing. Any feedback here would be appreciated.
These are my suggestions for the marketing team to convert casual riders to annual members:
Personalize discounts and show perks in the membership program based on their preferences and riding habits.
Emphasize the benefits of memberships, including discounts during busy times of the year like during Summer, or on the weekends.
Have existing members share their stories about how using Cyclistic's system has changed their lives, to create a sense of community, and offer a discount when they do; this will help encourage new riders to join the program.
WHAT I LEARNED
Below is what I learned and practiced over the 40+ hours spent on this project:
Pivot Tables in Microsoft Excel
Practice using R for data analysis and cleaning, specifically with the tidyverse package
Creating graphs in Tableau, editing visual elements, and building different charts and filters
Design elements of an effective dashboard
Combining the design feature of Figma with the functionality of Tableau
For the R portion of my project, I found Itamar's case study on Kaggle (also using R) a helpful resource.
For the Tableau portion, I used Navneet Singh's Tableau Dashboard as inspiration.
Data Analytics Student Capstone Projects, 2020-2021
Bitcoin Sentiment Analysis
Addison McFeely, Jordan Fujioka, Myles Lewis, and Ning Li
In this paper, we analyze how public sentiment, remarks by celebrities, and news reports influence the Bitcoin price. We build models based on Bitcoin price data and text datasets from Kaggle, Twitter, and OpenBlender. We conclude that prediction models incorporating sentiment analysis perform better than price-only prediction. Even so, the Bitcoin price is very hard to predict, and given the models' accuracy rates, they are not reliable enough to forecast it.
Does Happiness Matter?
The aim of this project was to build a deep learning model to predict employee turnover based on employees' self-reported happiness and satisfaction. The data comes from 34 companies in Barcelona that utilize HappyForce, an informative system that aims to collect data on employee satisfaction and daily happiness.
Energy in California: The Path to Renewable Energy
Jenna Vogelsang, Christa Copenhaver, Peichen Geng, and Nabil Syed
California is currently taking an initiative to adopt clean renewable energy resources by 2035. Our project aims to examine the future of energy usage in the state of California and how different methods of electricity generation can impact the state in upcoming years. Our data was extracted from the U.S. Energy Information Administration and cleaned using Python. We chose to do a time series analysis in hopes of predicting how energy usage will change as more renewable methods are adopted by 2035. The model chosen was the Prophet model, created by Facebook in 2017, because it accounts for seasonality and holiday effects, both of which impact energy usage. The findings of our study indicate a steady upward trend in both electricity consumption and generation, with the largest sources being natural gas and solar energy.
FOMO: Can Twitter Sentiment Predict Crypto Prices?
RJ Copeland and Robert Spring
The intent of our capstone project is to observe the effect of Twitter sentiment on crypto prices. The coins of interest are Bitcoin, Ethereum, and Litecoin. The data for sentiment analysis is pulled from Twitter, while the crypto data comprises static files showing minute-based price changes for the coins listed above. The model used is a Long Short-Term Memory (LSTM) model. Both the model and the exploratory analysis are implemented in Python.
Forecasting Web Traffic
This project looks at the problem of forecasting future values of time-series data. Wikipedia has over 145,000 articles with view counts available to analyze. Following the Exploratory Data Analysis (EDA) process, I developed an Auto-Regressive Integrated Moving Average (ARIMA) model that predicted view counts for the following 45-60 days. Web forecasting is gaining popularity and has many applications, including load balancing for cloud services and understanding user behavior.
Major League Soccer Analysis
With highly influential salary restrictions in Major League Soccer, organizations must navigate limited high-dollar signings and maximize effectiveness of lower cost players while translating their signings to positive team performance. My goal was to use historical player performances, salary data, and team results to find trends and understand the key metrics that result in successful team performances. This analysis can be leveraged by MLS organizations and scouts to sign the most effective players and increase chances of team success.
MLB Optimizing Outfield Alignments
This project provides a program that uses a variable Euclidean-distance means model to create optimized outfield shifts for MLB teams, utilizing variables such as the hang time of the ball and the outfielder's reaction, jump, and speed. The result is a GUI developed in R that could be used both in a gameday setting and by front offices to determine which players fit into their defensive schemes better.
NFL Combine Research
Taking results from combine participants across several years and testing the data with a few different methods to find what kinds of scores are needed to perform successfully in the NFL.
Last Updated March 24, 2022
Students in our semester-based bachelor’s and master’s degree programs complete a final capstone project which synthesizes and applies information from their coursework. Students serve in a consultant role, identifying a business need of the partner site/organization, assessing it, analyzing the data, and providing recommendations for implementation that are grounded in research and best practices.
Capstone projects take place at the host site and can be completed on site or remotely, as needed by the host. In some cases, students can complete their capstone through their current place of employment.
Capstone students build relationships and gain experience with their host organization while working directly with stakeholders to solve a timely business problem or question.
Explore our database of completed capstone projects or refer to the capstone course description for your program of interest to learn more about the capstone process.
3M Menomonie Fibers Water Use Analysis and Reduction Proposal
- A Case Study on Machine Learning Assisted Compression: Using Autoencoders for Image Compression on Commodity Hardware
- A Case Study on Modeling an NBA Expansion Starting Five
- A Case Study on Recommendation Systems Using Implicit Feedback
- A Case Study on Stock Trading Sentiment Analysis
- A Case Study on Utilizing Predictive Analytics in CPM Applications
- A Comparative Assessment of Recommendation Systems and Its Ethical Implications
- A Connected Sustainable Future
- A Dedicated Program Manager Increases Employee Engagement
- A Deep Dive into Deep Learning and Its Applications
M.S. in Data Science students are required to complete a capstone project. Capstone projects challenge students to acquire and analyze data to solve real-world problems. Project teams consist of two to four students and a faculty advisor. Teams select their capstone project at the beginning of the year and work on the project over the course of two semesters.
Most projects are sponsored by an organization—academic, commercial, non-profit, and government—seeking valuable recommendations to address strategic and operational issues. Depending on the needs of the sponsor, teams may develop web-based applications that can support ongoing decision-making. The capstone project concludes with a paper and presentation.
- Synthesizing the concepts you have learned throughout the program in various courses (this requires that the question posed by the project be complex enough to require the application of appropriate analytical approaches learned in the program and that the available data be of sufficient size to qualify as ‘big’)
- Experience working with ‘raw’ data exposing you to the data pipeline process you are likely to encounter in the ‘real world’
- Demonstrating oral and written communication skills through a formal paper and presentation of project outcomes
- Acquisition of team building skills on a long-term, complex, data science project
Capstone projects have been sponsored by a variety of organizations and industries, including: Capital One, City of Charlottesville, Deloitte Consulting LLP, Metropolitan Museum of Art, MITRE Corporation, a multinational banking firm, The Public Library of Science, S&P Global Market Intelligence, UVA Brain Institute, UVA Center for Diabetes Technology, UVA Health System, U.S. Army Research Laboratory, Virginia Department of Health, Virginia Department of Motor Vehicles, Virginia Office of the Governor, Wikipedia, and more.
Sponsor a Capstone Project
View previous examples of capstone projects and check out answers to frequently asked questions.
What does the process look like?
- The School of Data Science periodically puts out a Call for Proposals. Prospective project sponsors submit official proposals, vetted by the SDS Associate Director for Research Development.
- Sponsors present their projects to students at “Pitch Day” near the start of the Fall term, where students have the opportunity to ask questions.
- Students individually rank their top project choices. An algorithm sorts students into capstone groups of approximately 3 to 4 students per group.
- Each group is assigned a faculty mentor, who will meet groups each week in a seminar-style format.
What is the seminar approach to mentoring capstones?
We utilize a seminar approach to managing capstones to provide faculty mentorship and streamlined logistics. This approach involves one mentor supervising three to four loosely related projects and meeting with these groups on a regular basis. Project teams often encounter similar roadblocks and issues so meeting together to share information and report on progress toward key milestones is highly beneficial.
Do all capstone projects have sponsors?
Not necessarily. Generally, each group works with a sponsor from outside the School of Data Science. Some sponsors are corporations, some are from nonprofit and governmental organizations, and some are from other departments at UVA.
Why do we have to work in groups?
Because data science is a team sport!
All capstone projects are completed through group work. While this requires additional coordination, this collaborative component of the program reflects the way companies expect their employees to work. Building this skill is one of our core learning objectives for the program.
I didn’t get my first choice of capstone project from the algorithm matching. What can I do?
Remember that the point of the capstone projects isn’t the subject matter; it’s the data science. Professional data scientists may find themselves in positions in which they work on topics assigned to them, but they use methods they enjoy and still learn much through the process. That said, there are many ways to tackle a subject, and we are more than happy to work with you to find an approach to the work that most aligns with your interests.
Can I work on a project for my current employer?
Each spring, we put forward a public call for capstone projects. You are encouraged to share this call widely with your community, including your employer, non-profit organizations, or any entity that might have a big data problem that we can help solve. As a reminder, capstone projects are group projects so the project would require sufficient student interest after ‘pitch day’. In addition, you (the student) cannot serve as the project sponsor (someone else within your employer organization must serve in that capacity).
If my project doesn’t have a corporate sponsor, am I losing out on a career opportunity?
The capstone project will provide you with the opportunity to do relevant, high-quality work which can be included on a resume and discussed during job interviews. The project paper and your code on GitHub will provide more career opportunities than the sponsor of the project. Although it does happen from time to time, it is rare that capstones lead to a direct job offer with the capstone sponsor's company. Capstone projects are just one networking opportunity available to you in the program.
Capstone Project Reflections From Alumni
“Capstone projects are opportunities for you to deliver valuable, quantifiable results that you can use as a testimony of your long-term project success to the company you work for and other companies in future interviews.” — Gabriel Rushin, MSDS 2017, Procter & Gamble, Senior Machine Learning Engineer Manager
“For my capstone project, I worked to develop a clustering model to assess biogeographic ancestry, using DNA profiles. I felt like I was finally doing real-world data science and loved working with such an important organization as the Department of Defense.” — Colleen Callahan, Online MSDS 2021, Associate Research Analyst, CNA (Arlington, Virginia)
Capstone Project Reflections From Sponsors
“For us, the level of expertise, and special expertise, of the capstone students gives us ‘extra legs’ and an extra push to move a project forward. The team was asked to provide a replicable prototype air quality sensor that connected to the Cville Things Network, a free and community supported IoT network in Charlottesville. Their final product was a fantastic example that included clear circuit diagrams for replication by citizen scientists.” — Lucas Ames, Founder, Smart Cville
“Working with students on an exploratory project allowed us to focus on the data part of the problem rather than the business part, while testing with little risk. If our hypothesis falls flat, we gain valuable information; if it is validated or exceeded, we gain valuable information and are a few steps closer to a new product offering than when we started.” — Ellen Loeshelle, Senior Director of Product Management, Clarabridge
MSDS Capstone Projects Give Students Exposure to Industry While in Academia
Master's Students' Capstone Presentations
Featured Student Projects
Bank Loan Payment Analysis
Data Analytics Capstone Project by Ng Shao Zhi
Bank Marketing Campaign
Data Analytics Capstone Project by Nur Filzah Bte Jusmani
Bank Customer Identifying Analysis
Data Analytics Capstone Project by Lim Shue Ling
Credit Default Risk Analysis
Data Analytics Capstone Project by Jermaine Lee
Analyzing Customizing Solutions
Data Analytics Capstone Project by Jasmine Teo
Data Analytics Capstone Project by Claudia Lim
Insurance Fraud Analysis
Data Analytics Capstone Project by Michelle Eng
Credit Card Attrition
Data Analytics Capstone Project by Lee Ying Chia
Reducing Fraudulent Claims Analysis
Data Analytics Capstone Project by Lim Si Xian
Entry Points Analysis
Data Analytics Capstone Project by Zeph Han
Credit Card Customers
Data Analytics Capstone Project by Marissa Goh
Customer Service Analysis
Retaining Customer Analysis
Data Analytics Capstone Project by Joel Lim
Analysis on Wealth Management
Data Analytics Capstone Project by Tam Jie Qi
Customer Retention Analysis
Data Analytics Capstone Project by Su Wei Ng
Credit Card Department
Data Analytics Capstone Project by Tan Jin Hui
Bank Service Analysis
Credit Card Fraud Prediction
Data Analytics Capstone Project by Charmaine Neo
Ireland Loan Default Analysis
Data Analytics Capstone Project by Sophia Lim
US Credit Card Fraud Report
Data Analytics Capstone Project by Lili Loi
Online Banking Scams
Data Analytics Capstone Project by Felicia Chua
Identifying Customer Segments
Data Analytics Capstone Project by Joey Tan
Credit Customer Attrition
Health Insurance Analysis
Data Analytics Capstone Project by Victoria Leong
European Bank Customer Retention
Data Analytics Capstone Project by Michelle Leong
S&P 500 Exchange-Traded Fund (ETF)
Data Analytics Capstone Project by Daniel C Lim
How to Increase Credit Card Retention Rate
Data Analytics Capstone Project by Eugina Pek
Products Streamline for Our Customers
Data Analytics Capstone Project by Tracy Bay
Predicting Annual Premiums for Customers
Data Analytics Capstone Project by Suhashini
Home Loan Eligibility Analysis
Data Analytics Capstone Project by Jasmine Yeo
Google Data Analytics Capstone Project
Applied Data Science (MS) Student Capstone Projects
Case Analysis Capstone (ADS670) aims to develop both technical and soft skills that are not directly taught in the traditional courses in the program, but are relevant and critical in order to develop, innovate and communicate in modern data science. This is a project-oriented capstone that will harness the skills gained throughout the program.
Below are some examples of original research studies done by students in our master's in Applied Data Science program for their completed capstone projects.
This course is part of the Data Analysis and Interpretation Specialization
Data Analysis and Interpretation Capstone
About this Course
The Capstone project will allow you to continue to apply and refine the data analytic techniques learned from the previous courses in the Specialization to address an important issue in society. You will use real world data to complete a project with our industry and academic partners. For example, you can work with our industry partner, DRIVENDATA, to help them solve some of the world's biggest social challenges! DRIVENDATA at www.drivendata.org, is committed to bringing cutting-edge practices in data science and crowdsourcing to some of the world's biggest social challenges and the organizations taking them on.
Or, you can work with our other industry partner, The Connection (www.theconnectioninc.org) to help them better understand recidivism risk for people on parole seeking substance use treatment. For more than 40 years, The Connection has been one of Connecticut’s leading private, nonprofit human service and community development agencies. Each month, thousands of people are assisted by The Connection’s diverse behavioral health, family support and community justice programs. The Connection’s Institute for Innovative Practice was created in 2010 to bridge the gap between researchers and practitioners in the behavioral health and criminal justice fields with the goal of developing maximally effective, evidence-based treatment programs. A major component of the Capstone project is for you to be able to choose the information from your analyses that best conveys results and implications, and to tell a compelling story with this information. By the end of the course, you will have a professional quality report of your findings that can be shown to colleagues and potential employers to demonstrate the skills you learned by completing the Specialization.
Wesleyan University, founded in 1831, is a diverse, energetic liberal arts community where critical thinking and practical idealism go hand in hand. With our distinctive scholar-teacher culture, creative programming, and commitment to interdisciplinary learning, Wesleyan challenges students to explore new ideas and change the world. Our graduates go on to lead and innovate in a wide variety of industries, including government, business, entertainment, and science.
Syllabus - What you will learn from this course
Module 1. Identify Your Data and Research Question
In this Module, your goal is to review the lectures and readings in the Overview of the Capstone Project and then 1) decide which data set you will use to complete your capstone project, 2) identify your research question, 3) propose a title for your final report, and 4) complete Milestone Assignment 1 as described in the assignment. By the end of this Module you will have drafted a final report Title and Introduction to the Research Question. Your Introduction to the Research Question should include a statement of your research question, your motivation or rationale for testing the research question, and some potential implications of answering your research question.
Module 2. Data Management
In this Module, your goals are to 1) complete the majority of your data management so that you are ready to begin your preliminary statistical analyses; and 2) complete Milestone Assignment 2 as described in the assignment. By the end of this Module you will have drafted a final report Methods section. Your Methods section should include a description of your sample, measures, and the analyses you plan to use to test your research question.
Module 3. Exploratory Data Analysis
In this Module, your goals are to 1) explore your data more extensively through descriptive and basic statistical analyses and data visualization; and 2) complete Milestone Assignment 3 as described in the Assignment. By the end of this module, you will have begun to draft your final report Results section, including some figures.
Module 4. Complete Your Final Report
In this Module, you 1) will complete your analyses; 2) finish writing your final report, and 3) submit your completed Final Report as the fourth and final assignment. A complete description of what is required for your final report and a detailed grading rubric can be found with the assignment; a sample final report is provided with the materials in the first module.
TOP REVIEWS FROM DATA ANALYSIS AND INTERPRETATION CAPSTONE
The Data Analysis and Interpretation Specialization is one of the best courses on Coursera.
It really helped me applying what I've learned in the specialization.
Well-thought-out material and assignments that help you understand and implement the learning from previous courses.
Great capstone course.
The instructor was always available for help and guidance throughout the course; she also held many Q&A live video conferences.
I highly recommend taking the course.
About the Data Analysis and Interpretation Specialization
Learn SAS or Python programming, expand your knowledge of analytical methods and applications, and conduct original research to inform complex decisions.
The Data Analysis and Interpretation Specialization takes you from data novice to data expert in just four project-based courses. You will apply basic data science tools, including data management and visualization, modeling, and machine learning using your choice of either SAS or Python, including pandas and Scikit-learn. Throughout the Specialization, you will analyze a research question of your choice and summarize your insights. In the Capstone Project, you will use real data to address an important issue in society, and report your findings in a professional-quality report. You will have the opportunity to work with our industry partners, DRIVENDATA and The Connection. Help DRIVENDATA solve some of the world's biggest social challenges by joining one of their competitions, or help The Connection better understand recidivism risk for people on parole in substance use treatment. Regular feedback from peers will provide you a chance to reshape your question. This Specialization is designed to help you whether you are considering a career in data, work in a context where supervisors are looking to you for data insights, or you just have some burning questions you want to explore. No prior experience is required. By the end you will have mastered statistical methods to conduct original research to inform complex decisions.
Frequently Asked Questions
When will I have access to the lectures and assignments?
Access to lectures and assignments depends on your type of enrollment. If you take a course in audit mode, you will be able to see most course materials for free. To access graded assignments and to earn a Certificate, you will need to purchase the Certificate experience, during or after your audit. If you don't see the audit option:
The course may not offer an audit option. You can try a Free Trial instead, or apply for Financial Aid.
The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.
What will I get if I subscribe to this Specialization?
When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile. If you only want to read and view the course content, you can audit the course for free.
Is financial aid available?
Yes. In select learning programs, you can apply for financial aid or a scholarship if you can’t afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you’ll find a link to apply on the description page.
More questions? Visit the Learner Help Center .
Data Analytics and Visualization Capstone Project
Accelerate the knowledge you gain from previous courses in the IBM Data Analyst Professional Certificate program. Assume the role of an Associate Data Analyst and use various skills and techniques on real-world datasets to accomplish a task.
About This Course
Please Note: Learners who successfully complete this IBM course can earn a skill badge — a detailed, verifiable and digital credential that profiles the knowledge and skills you’ve acquired in this course. Enroll to learn more, complete the course and claim your badge!
This course provides students with the opportunity to assume the role of an Associate Data Analyst who has recently joined an organization. In this role, you will use Data Analytics skills and techniques on real-world datasets to complete a business task.
You will gain useful experience collecting, analyzing, and preparing data to be used for making charts and constructing an interactive dashboard. The course ends with a presentation of your data analysis report that tests your ability to create a compelling story for the various stakeholders in the organization.
This course is a great way to display your Data Analytics skills and prove your proficiency to potential employers.
At a glance
- Institution: IBM
- Subject: Data Analysis & Statistics
- Level: Introductory
- Prerequisites: None
- Language: English
- Video Transcript: English
- Professional Certificate in Data Analyst
- Associated skills: Data Analysis, Presentations
What you'll learn
- Data Collection: scraping the internet and using APIs
- Data Wrangling: using various techniques to identify duplicate rows, find missing values, and normalize data
- Exploratory Analysis: finding the distribution of data, the presence of outliers, and the correlation between different columns
- Data Visualization: using developer survey data to create visualizations highlighting the distribution of data, relationships between data, and the composition and comparison of data
- Dashboard Construction: using IBM Cognos Analytics as a platform to create a dashboard that is intuitive, appealing, and easy to understand
- Presentations: demonstrating your ability to clarify your analysis and relay your findings in a way that is easy to understand
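The wrangling bullet point above can be sketched in a few lines of pandas; the survey-style column names and values here are invented for illustration.

```python
import pandas as pd

# Invented survey-style extract illustrating three wrangling steps:
# dropping duplicate rows, filling missing values, and min-max
# normalizing a numeric column.
df = pd.DataFrame({
    "respondent": [1, 2, 2, 3, 4],
    "years_coding": [5.0, 3.0, 3.0, None, 12.0],
})

df = df.drop_duplicates()  # respondent 2 appears twice
df["years_coding"] = df["years_coding"].fillna(df["years_coding"].median())
col = df["years_coding"]
df["years_norm"] = (col - col.min()) / (col.max() - col.min())
```

Min-max normalization maps the column onto [0, 1], which keeps features on comparable scales before charting or modeling.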
Here are 10 fun and free datasets to get you started in your explorations. 1. National Centers for Environmental Information: Dig into the world's largest provider of weather and climate data. 2. World Happiness Report 2021: What makes the world's happiest countries so happy?
For example, to explore the association between FICO score and default likelihood, Paul first visualized the distribution of FICO scores for individuals with fully paid vs. defaulted loans. From this figure, Paul hypothesized that individuals with charged-off loans would have a lower credit score. To quantify this, Paul performed a t-test.
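A two-sample comparison like Paul's can be sketched without any statistics library; the FICO numbers below are invented for illustration, and a real analysis would also compute a p-value (for example with scipy.stats.ttest_ind).

```python
from math import sqrt
from statistics import mean, variance

# Invented FICO scores for two loan outcomes. Welch's t statistic
# measures how far apart the group means are relative to their
# sampling variability; a large |t| is evidence the groups differ.

def welch_t(a, b):
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

fully_paid = [720, 735, 700, 715, 742, 728]
charged_off = [640, 655, 630, 662, 645, 650]

t = welch_t(fully_paid, charged_off)
```

Welch's variant is the common default because it does not assume the two groups have equal variances.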
The Internet Movie Database: a good beginner's project is to extract data from IMDb. You can collect details about popular TV shows, movie reviews and trivia, the heights and weights of various actors, and so on. Data on IMDb is stored in a consistent format across all its pages, which makes the task a lot easier.
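As a sketch of the extraction step in such a project, the snippet below parses a made-up HTML fragment with Python's standard-library HTMLParser. A real project would first fetch live pages (for example with the requests library) and should respect the site's terms of use and robots.txt.

```python
from html.parser import HTMLParser

# Made-up fragment shaped like a typical listing page; the class name
# "title" is hypothetical, not IMDb's actual markup.
PAGE = """
<ul>
  <li class="title">The Example Show (2019)</li>
  <li class="title">Another Example (2021)</li>
</ul>
"""

class TitleScraper(HTMLParser):
    """Collect the text of every <li class="title"> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "li" and ("class", "title") in attrs:
            self.in_title = True

    def handle_data(self, data):
        if self.in_title and data.strip():
            self.titles.append(data.strip())
            self.in_title = False

scraper = TitleScraper()
scraper.feed(PAGE)
```

In practice most analysts reach for BeautifulSoup, which wraps this kind of tag-walking in a friendlier API, but the underlying idea is the same: find the elements whose markup identifies the data you want, then pull out their text.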
Sentiment analysis (AKA "opinion mining") entails using natural language processing (NLP) to determine how people feel about a product, public figure, or political party, for example. Each input is assigned a sentiment score, which classifies it as positive, negative, or neutral.
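A toy illustration of the scoring idea follows, using a tiny hand-made lexicon; the word weights are invented, and production work would rely on a trained model or an established lexicon such as VADER.

```python
# Toy lexicon-based sentiment scorer: each word carries a polarity
# weight, and the sign of the sum classifies the text.
LEXICON = {"love": 2, "great": 1, "good": 1, "bad": -1, "awful": -2, "hate": -2}

def sentiment(text):
    score = sum(LEXICON.get(w.strip(".,!?").lower(), 0) for w in text.split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Real sentiment models handle negation ("not good"), sarcasm, and context, which is exactly what makes this a meaty NLP portfolio project.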
I will be diving into the background, my full process of cleaning, analyzing, and visualizing the data, along with my final suggestions and summary of the data.
I completed this Data Analytics Capstone Project as part of the Google Data Analytics Professional Certificate on Coursera. Check out this blog for more about Business Intelligence vs. Business Analytics…
- A Data-Driven Approach to Healthcare
- IBM: Intent Classification from Unlabelled Dataset
- Explainability and Bias Removal in Natural Language
- INVITAE: Prediction and Optimization of Medical Billing Operations
- LINCOLN LABORATORY: USTRANSCOM Flight Data Analysis
- Optimizing Lab Procurement with Sparse Vendor Selection
If you are looking to improve your data analysis and data visualization skills, this is a great data science project. For this, FiveThirtyEight obtained Uber's rideshare data and analyzed it to understand ridership patterns, how it interacts with public transport, and how it affects taxis.
This exploratory analysis case study fulfills the capstone project requirement for the Google Data Analytics Professional Certificate. The case study involves a bikeshare company's data on its customers' trips over a 12-month period (November 2020 - October 2021). The data has been made available by Motivate International Inc. under this license.
The coins of interest are Bitcoin, Ethereum, and Litecoin. The data for sentiment analysis is pulled from Twitter. The crypto price data comprises static files showing minute-by-minute price changes for the coins listed above. The model used is a Long Short-Term Memory (LSTM) network.
In their final semester of the UW Data Science program, students are required to take DS 785, the capstone course. Below are example capstone projects to give you an idea of the types of opportunities available to our students.
Here are some ways to get more leads, either in the form of project ideas or datasets to use:
1. Academic papers
2. Kaggle Competitions
3. Kaggle Datasets
4. reddit.com/r/datasets
5. Awesome...
In this part of the course, you'll be introduced to capstone projects, case studies, and portfolios, as well as how they help employers better understand your skills and capabilities. You'll also have an opportunity to explore online portfolios of real data analysts. 3 videos (total 15 min), 5 readings, 2 quizzes.
The project will culminate with a presentation of your data analysis report for various stakeholders in the organization. The report will include an executive summary, your analysis, and a conclusion. You will be assessed on both your work for the various stages in the Data Analysis process, as well as the final deliverable.
Example studies include Predicting Store Weekly Sales: A Case Study of Walmart Historical Sales Data; Predicting Burnout: A Workplace Calculator; and Beyond Artist: Text-to-Image AI.
The MS Analytics program capstone is a career enhancer. Students utilize real data from their organization (or another) and partner with a project coach to build a predictive model. The individual project is completed over the five semesters. Graduates have earned significant raises because of their capstone projects.