14 Popular Data Science Project Ideas for Beginners
The best way to get good at Data Science tools and technologies as a beginner is to build projects that solve real-world problems. With that in mind, this blog looks at 14 Data Science project ideas that you can undertake to upskill yourself.

As a beginner, it can be extremely daunting to understand Data Science, have a good understanding of the concepts involved, and gain hands-on experience in them. One of the best ways to become good at Data Science or anything creative is by deliberately practicing the acquired skills to reinforce them in your brain. For this, you may have to work on various projects but, as a beginner, it can be quite difficult to choose not-very-complicated Data Science projects—some projects may be difficult to implement and some may not help you push yourself to the limits. If all this sounds familiar to you, then this blog is for you.
In this blog, we will discuss the best projects in Data Science for beginners to try out and expand their knowledge and skill set. These Data Science project ideas will also help you get a taste of how to deal with real-world Data Science problems.
This blog will discuss the following topics:

- Recommendation System Project
- Data Analysis Project
- Sentiment Analysis Project
- Fraud Detection Project
- Image Classification Project
- Image Caption Generator Project in Python
- Chatbot Project in Python
- Brain Tumor Detection with Data Science
- Traffic Sign Recognition
- Fake News Detection
- Forest Fire Prediction
- Human Action Recognition
- Classifying Breast Cancer
- Gender Detection and Age Prediction
- Tips for a Good Data Science Project
Data Science Project Ideas
Without delay, let us start exploring the most interesting Data Science projects for beginners.

Recommendation System Project
A recommendation system is one of the most important parts of any content-based application, such as a blog, an e-commerce website, or a streaming platform. It suggests new content to users from the site’s content library or database based on what they have already viewed and liked. To do this, it needs data about users and their activity on the site, as well as information about the content, so that the content can be classified and recommended according to each user’s tastes and preferences. A recommendation system is also one of the most popular Data Science project ideas.
These systems can be built by using the following techniques:
- Collaborative filtering: In this technique, the system generates recommendations for users based on other users who have viewed and liked similar things. This technique works well but can produce poor recommendations when the users it relies on have since changed their opinion, for example about a movie they liked in the past, which may lead the engine to recommend a movie that similar users no longer like. Moreover, differences in the geographical and cultural context of users may make some recommendations undesirable.
- Content-based filtering: In this technique, the system generates recommendations for users by recommending content similar to what they have previously viewed and liked. This technique tends to be more stable and consistent than collaborative filtering, as it relies on the users’ own preferences and on the attributes of the available content, which do not usually change over time.
This is one of the most interesting projects. There are many other techniques that are quite advanced and complicated, but these two techniques would be enough for you to build your own recommendation engine as a beginner. You can train the engine to be used for recommending movies, blog posts, products, etc.
Use Cases:
- Movie or web show recommendation system
- Product recommendation system
- Blog post recommendation system
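To make content-based filtering concrete, here is a minimal sketch that uses only the Python standard library. The catalog, the tags, and the function names are made up for illustration; a real project would derive item attributes from a dataset and would likely use a library such as scikit-learn.

```python
import math
from collections import Counter

# Hypothetical catalog: title -> descriptive tags (illustrative data only).
CATALOG = {
    "Space Wars": ["sci-fi", "action", "space"],
    "Galaxy Quest": ["sci-fi", "comedy", "space"],
    "Love Letters": ["romance", "drama"],
    "Robot Uprising": ["sci-fi", "action", "robots"],
}


def cosine(a, b):
    """Cosine similarity between two sparse tag-count vectors."""
    dot = sum(a[tag] * b[tag] for tag in a if tag in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(liked, catalog, top_n=2):
    """Rank unseen items by similarity to the combined tags of liked items."""
    profile = Counter(tag for title in liked for tag in catalog[title])
    scores = {
        title: cosine(profile, Counter(tags))
        for title, tags in catalog.items()
        if title not in liked
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend(["Space Wars"], CATALOG))
```

The user profile here is simply the sum of the tags of everything the user liked, so items that share more tags with that profile rank higher.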

Data Analysis Project
Data analysis is one of the core skills that a data scientist needs. In data analysis, you take some data and analyze it to gain insights that support better decisions. One way to simplify the analysis is to generate visualizations that can be interpreted easily. The scope of data analysis is vast, but it remains one of the most useful Data Science projects.
Today, data is often described as more valuable than oil. Most companies store data about their users and how they interact with the products. This data allows companies to craft better policies and features that help solve customer problems and attract more user engagement with the platform.
For example, if you are working on the data of an e-commerce company and find that users from a particular country buy only specific kinds of products, then you can use this information to get a better understanding of why it is happening and to generate better product recommendations for more engagement.
Companies, such as Uber, Amazon, Flipkart, etc., use data analysis to create better offers and generate better quotes to meet customer expectations in the best way possible. It is one of the projects in Data Science that many companies implement in their own ways.
For Data Science projects on data analysis, you can use e-commerce datasets or datasets from ride-hailing apps, such as Uber, Lyft, etc.
Use Cases:
- Analysis of cab and weather data
- Analysis of store sales data
- Generate offers using association rule mining
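As a rough illustration of association rule mining for offer generation, the sketch below computes support and confidence for pairs of items. The baskets, thresholds, and function names are hypothetical, and only item pairs are considered; full algorithms such as Apriori also handle larger itemsets.

```python
from itertools import combinations

# Hypothetical shopping baskets (illustrative data only).
BASKETS = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def pair_rules(baskets, min_support=0.4, min_confidence=0.7):
    """Return (antecedent, consequent, support, confidence) for item pairs."""
    items = sorted(set().union(*baskets))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, baskets)
        if pair_support < min_support:
            continue
        # Check the rule in both directions: a -> b and b -> a.
        for x, y in ((a, b), (b, a)):
            confidence = pair_support / support({x}, baskets)
            if confidence >= min_confidence:
                rules.append((x, y, pair_support, confidence))
    return rules


for x, y, s, c in pair_rules(BASKETS):
    print(f"{x} -> {y}: support={s:.2f}, confidence={c:.2f}")
```

A rule such as "jam -> bread" with high confidence suggests that a bundled offer on bread may appeal to customers who buy jam.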

Sentiment Analysis Project
Sentiment analysis is used to add emotional intelligence to systems. It is one of the Data Science projects that people start with when they wish to learn how to process text. For example, when a user types a comment on a video or blog post, sentiment analysis can be used to determine whether the comment is appreciative, disparaging, critical, etc. It can also be used to classify emails, messages, reviews, queries, etc.
One of the major applications of these kinds of Data Science projects can be seen on public platforms, such as Twitter, Reddit, etc., where users post things that are tagged to indicate the type of content contained in them, i.e., positive or negative, with the help of sentiment analysis. This technique helps companies to understand, process, and tag even unstructured text.
These projects on sentiment analysis can be quite useful for various companies. Sentiment analysis can also be used to analyze and make sense of reviews, complaints, queries, emails, product descriptions, etc. For instance, you can use sentiment analysis to generate tags for such content as being negative, positive, neutral, etc.

Use Cases:
- For classifying emails as positive or negative
- For labeling tweets as positive or negative
- For categorizing emotions in an audio clip based on speech patterns
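For a first feel of how text can be tagged as positive, negative, or neutral, here is a deliberately simple lexicon-based sketch. The word lists and the function name are assumptions made for illustration; a real sentiment project would train a model on labeled text rather than count hand-picked words.

```python
import re

# Tiny hand-made lexicon (an assumption for this sketch, not a real resource).
POSITIVE = {"good", "great", "love", "excellent", "helpful"}
NEGATIVE = {"bad", "terrible", "hate", "poor", "useless"}


def label_sentiment(text):
    """Label text by counting positive and negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for comment in ["I love this great video", "Terrible and useless", "It is a video"]:
    print(comment, "->", label_sentiment(comment))
```

This approach ignores negation and sarcasm ("not good" would count as positive), which is exactly the kind of limitation a trained model is meant to address.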

Fraud Detection Project
Fraud detection is one of the most important Data Science projects and also one of the most challenging for final-year students. With so many forms of online and digital transactions in wide use, the risk of fraud is increasingly high. Since any digital transaction generates data about current and previous transactions, as well as customer purchase records, you can use this data and Data Science techniques to identify whether a transaction is potentially fraudulent.
Any transaction done digitally is bound to create some data. When a customer uses a digital medium to make a payment, you can use this generated data with the trained model to flag the transaction as potentially fraudulent, which can later be dealt with and reviewed. This is one of the most important projects to practice in case you wish to be able to build Machine Learning models based on data about user activity.
Large amounts of money are being digitally transferred every day; thus, you should be able to classify if these records are fraudulent or not. To do this, you have to create models that are trained on the data collected from previous transactions. These models use and analyze factors such as the amount transferred, the location it is transferred from, the location to which it is transferred, etc. These factors are taken into account when new transactions take place, and then, based on these factors, they are flagged as fraudulent or authentic transactions.
Use Cases:
- Credit card fraud detection
- Transaction records fraud detection
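The idea of flagging a new transaction based on earlier ones can be sketched with a simple rule-based check. This is a stand-in for a trained model, not a trained model itself: it flags amounts that are far from the customer's historical mean (a z-score test) and locations the customer has not used before. The field names, threshold, and sample data are hypothetical.

```python
from statistics import mean, stdev


def flag_transaction(history, new_tx, z_threshold=3.0):
    """Return a list of reasons why new_tx looks suspicious (empty if none).

    history and new_tx use the hypothetical keys 'amount' and 'location'.
    """
    amounts = [tx["amount"] for tx in history]
    known_locations = {tx["location"] for tx in history}
    reasons = []
    if len(amounts) >= 2:
        spread = stdev(amounts)
        # Flag amounts more than z_threshold standard deviations from the mean.
        if spread > 0 and abs(new_tx["amount"] - mean(amounts)) / spread > z_threshold:
            reasons.append("unusual amount")
    if new_tx["location"] not in known_locations:
        reasons.append("new location")
    return reasons


# Illustrative history for one customer.
HISTORY = [{"amount": a, "location": "Mumbai"} for a in (20, 25, 22, 30, 24)]

print(flag_transaction(HISTORY, {"amount": 900, "location": "Oslo"}))
print(flag_transaction(HISTORY, {"amount": 23, "location": "Mumbai"}))
```

Flagged transactions would then be sent for review rather than blocked outright, which matches the "flag now, review later" flow described above.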

Image Classification Project
Image classification is one of the Data Science projects that can be used to classify and tag images based on their content. It is widely used in fields such as science and security. It is also among the most important applications of Data Science, as it is very difficult to classify images with traditional application programming. Earlier, a lot of time and research was required to craft complicated rules and image transformations to classify images, and the result was still quite prone to errors. With Data Science, you can create models by training them on a large number of labeled images. These models learn classification rules on their own, and you can then feed them new images to classify.
In Data Science projects like these, classification can be done with several algorithms, and it is better to try more than one to find the algorithm that performs best for your dataset. Make sure to use a large collection of images with good resolution for training and testing. Image classification also requires a good grasp of fundamental image concepts and manipulation techniques such as image reshaping, resizing, edge detection, etc.

Use Cases:
- Digit recognition system
- Facial detection system
- Gender and age detection system
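To show the "train on labeled images, then classify new ones" loop in the smallest possible form, here is a nearest-centroid classifier in plain Python. It is a toy stand-in for the neural networks normally used for images, and the 2x2 grayscale "images", labels, and function names are invented for illustration.

```python
def flatten(image):
    """Reshape a 2D grid of pixel values into a flat list."""
    return [pixel for row in image for pixel in row]


def train_centroids(labeled_images):
    """Average the flattened pixels of all training images per label."""
    sums, counts = {}, {}
    for image, label in labeled_images:
        pixels = flatten(image)
        if label not in sums:
            sums[label] = [0.0] * len(pixels)
            counts[label] = 0
        sums[label] = [s + p for s, p in zip(sums[label], pixels)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def classify(image, centroids):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    pixels = flatten(image)

    def distance(label):
        return sum((p - c) ** 2 for p, c in zip(pixels, centroids[label]))

    return min(centroids, key=distance)


# Illustrative 2x2 grayscale images with values from 0 to 255.
TRAIN = [
    ([[0, 10], [5, 0]], "dark"),
    ([[20, 0], [10, 15]], "dark"),
    ([[250, 240], [255, 230]], "bright"),
    ([[200, 220], [245, 235]], "bright"),
]

centroids = train_centroids(TRAIN)
print(classify([[30, 25], [10, 5]], centroids))
```

Trying this next to a k-nearest-neighbors or CNN model on the same data is one way to follow the advice above about comparing several algorithms.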

Image Caption Generator Project in Python
Any social media application that allows storing and sharing images lets users provide captions to those images. The captions are given to provide more context and necessary information about the images. The captions also help in things such as Search Engine Optimization (SEO), content ranking, etc. In blogs, having a caption or good description of what a particular image contains can be very helpful for the readers. Captions also help with accessibility and allow screen reader software to help people with disabilities get a better understanding of the content of the image. Generating these captions can be one of the most challenging Data Science projects.
However, in many cases, generating captions is a long and tedious process, especially when there are a lot of images. To solve this issue, you can generate captions based on what is actually shown in the image. The captions will serve as descriptions of what the images have in them, e.g., a man surfing, a dog smiling, etc.
To do this, you need to understand and use neural networks, especially convolutional neural networks (CNNs) and long short-term memory (LSTM) networks. There are large datasets available for this task, such as the Flickr8k dataset. If training a new model is not possible on your current machine, you can use available pretrained models instead. An image caption generator is one of the best Data Science projects for understanding how to process images using neural networks.
Use Cases:
- Twitter hashtag generator for images
- Facebook image caption generator
- Blog post image alt-text generator

Chatbot Project in Python
Chatbots are one of the most essential parts of any customer-centric app today. They help with better tracking of customer issues, faster issue resolution, and running commands written as normal text. For example, many bots on platforms such as Slack and GitHub allow you to perform certain tasks just by typing your request in the chat box. Chatbots also help customers resolve their grievances without any human interaction. For example, food delivery apps, such as Uber Eats and DoorDash, use chatbots to help users resolve common issues including refunds, missing food items, incorrect items, etc.
There are two types of chatbots:
- Domain-specific chatbots: A domain-specific chatbot is a chatbot that can be used to answer questions based on a particular domain, such as healthcare, engineering, etc. So, it needs to be customized quite effectively to suit our needs.
- Open-domain chatbots: An open-domain chatbot is a chatbot that can answer questions about any domain, which means that it does not require careful customization. However, it does need a large volume of data to learn from.
Data Science projects like these make extensive use of Natural Language Processing (NLP). Implementing a chatbot requires a good grasp of NLP concepts and access to a dataset that contains the patterns you need to find and the responses you have to return to the user.
Use Cases:
- Customer care using a chatbot
- Customer feedback using a chatbot
- Quote generation using a chatbot
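The "patterns to find and responses to return" idea can be sketched as a tiny domain-specific bot built on regular expressions. The intents, replies, and function name below are hypothetical examples for a food-delivery support bot; an NLP-based chatbot would replace the regular expressions with a trained intent classifier.

```python
import re

# Hypothetical (pattern, response) pairs; checked in order, first match wins.
INTENTS = [
    (re.compile(r"\b(refund|money back)\b", re.I),
     "I can help with a refund. Please share your order number."),
    (re.compile(r"\b(missing|incorrect|wrong)\b", re.I),
     "Sorry about that. Which item was missing or incorrect?"),
    (re.compile(r"\b(hi|hello|hey)\b", re.I),
     "Hello! How can I help you today?"),
]
FALLBACK = "I did not understand that. Let me connect you to a human agent."


def reply(message):
    """Return the response of the first intent whose pattern matches."""
    for pattern, response in INTENTS:
        if pattern.search(message):
            return response
    return FALLBACK


print(reply("Hello there"))
print(reply("I want my money back"))
print(reply("Where is the moon?"))
```

Because the intents are checked in order, more specific patterns should come before general ones such as greetings.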

Brain Tumor Detection with Data Science
There are many applications of Data Science in the healthcare field as well. One of these is brain tumor detection. In this project, you will take a large number of labeled MRI scan images and train a model on them. Once the model is well trained, you will use it to check whether a new MRI image shows signs of a brain tumor.
To implement these kinds of Data Science projects, you need access to MRI scan images of the human brain. Thankfully, there are datasets available on Kaggle. All you have to do is use these images to train your model so that, when fed similar images, it can classify them as showing a brain tumor or not. Though such models do not remove the need for consultation with a domain expert, they can help doctors get a quick second opinion.
Use Cases:
- Brain tumor detection using MRI images
- Brain tumor detection using vital information
- Brain tumor detection using patient history

Traffic Sign Recognition
Nowadays, one of the most popular applications of Data Science is self-driving cars. Although a complete self-driving car would be very difficult and expensive to work on, you can implement one specific and important feature that such a car needs: traffic sign recognition.
In this project, you will use images of different traffic signs, each labeled with what the sign indicates. In general, the more images you have, the more accurate the model will be, though it will take longer to train. You will use convolutional neural networks (CNNs) to build the model, which learns from these images and labels. Then, when a new image is given as input, the model will be able to classify it.
Use Cases:
- Gesture recognition system
- Sign language translator
- Product quality checking system

Fake News Detection
A recent study done by MIT claims that fake news spreads six times faster than real news. Fake news is becoming a great source of trouble in all spheres of life. It leads to a lot of problems around the globe, ranging from political polarization, violence, and propagation of misinformation to religious and cultural conflicts. It is also troubling that more and more unverified sources of information, especially social media platforms, are gaining traction; this is doubly concerning as these platforms do not have systems in place to distinguish between fake news and real news.
To tackle a problem like this, especially on a smaller scale, you can use a dataset of news text in which each item is labeled as fake or real. You can use NLP and techniques such as a Term Frequency-Inverse Document Frequency (TF-IDF) vectorizer to turn the text into features for a classifier. This allows you to enter some text from a news article and get a label that tells whether it is fake news or real news. It is important to note that these labels will not be 100 percent accurate, but they can give a good indication of whether an article is trustworthy.
Use Cases:
- Fake news checker
- Fact checker
- Information verification system
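To see what a TF-IDF vectorizer computes, here is a minimal sketch of the textbook formula, weight = term frequency x log(N / document frequency), in plain Python. The sample headlines and the function name are illustrative; scikit-learn's TfidfVectorizer uses a smoothed variant of this formula, so its numbers will differ.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Illustrative headlines (made up for this sketch).
DOCS = [
    "aliens landed in ohio",
    "council approves budget in ohio",
]

for doc, vector in zip(DOCS, tf_idf(DOCS)):
    top_term = max(vector, key=vector.get)
    print(f"{doc!r}: most distinctive term = {top_term!r}")
```

Words that appear in every document get a weight of zero, so the classifier trained on these vectors focuses on the terms that set one article apart from the others.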
Forest Fire Prediction
Building a forest fire prediction model can be a great Data Science project. Forest fires, or wildfires, are known to be hard to control and capable of causing a large amount of damage. You can apply k-means clustering to records of past wildfires to spot the major fire hotspots and their severity, which helps in anticipating how disruptive a fire may be.
This model can also be useful in the proper allocation of resources. Meteorological data can be used to identify the specific periods and seasons in which wildfires are more likely, which can increase the accuracy of the model.
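As a rough sketch of how k-means can locate fire hotspots, the code below clusters (x, y) fire locations in plain Python. The coordinates, the number of clusters, and the function name are assumptions for illustration; a real project would work with geographic coordinates from a wildfire dataset and would likely use a library implementation.

```python
import random


def k_means(points, k, iterations=20, seed=0):
    """Cluster 2D points into k groups; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: (p[0] - centroids[i][0]) ** 2 + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Illustrative fire locations forming two separate hotspots.
FIRES = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (10.0, 10.0), (10.1, 9.8), (9.9, 10.2)]

hotspots, groups = k_means(FIRES, k=2)
for center, group in zip(hotspots, groups):
    print(f"hotspot near ({center[0]:.1f}, {center[1]:.1f}) with {len(group)} fires")
```

The number of fires in each cluster gives a crude severity signal, which could inform how resources are split between hotspots.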
Human Action Recognition
This model attempts to classify human actions. The human action recognition model analyzes short videos of human beings performing specific actions.
This Data Science project requires a complex neural network trained on a dataset of short videos. The dataset also comes with accelerometer data, which is first converted into a time-sliced representation. The Keras library is then used to train, validate, and test the network on this data.
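One common way to build a time-sliced representation is to cut the accelerometer stream into fixed-length, overlapping windows, each of which becomes one training example. The sketch below assumes readings are (x, y, z) tuples; the window size, step, and function name are illustrative choices, not values from any particular dataset.

```python
def sliding_windows(samples, window_size, step):
    """Split a sequence of readings into fixed-length, possibly overlapping windows."""
    return [
        samples[start:start + window_size]
        for start in range(0, len(samples) - window_size + 1, step)
    ]


# Ten made-up (x, y, z) accelerometer readings.
READINGS = [(i * 0.1, i * 0.2, 9.8) for i in range(10)]

windows = sliding_windows(READINGS, window_size=4, step=2)
print(len(windows), "windows of", len(windows[0]), "readings each")
```

A step smaller than the window size makes consecutive windows overlap, which yields more training examples from the same recording.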
Classifying Breast Cancer
Breast cancer cases are on the rise, and early detection is the best way to enable suitable measures. A breast cancer detection system can be built using Python. You can use the Invasive Ductal Carcinoma (IDC) dataset, which contains histology images showing malignant cells, and train the model on it.
Some useful Python libraries that will be helpful for this Data Science project are NumPy, Keras, TensorFlow, OpenCV, Scikit-learn, and Matplotlib.
Gender Detection and Age Prediction
Gender detection and age prediction with OpenCV is an impressive Data Science project idea that can easily grab a recruiter’s attention if it is on your resume. This real-time Machine Learning project is based on computer vision.
Through this project, you will come across the practical application of convolutional neural networks (CNNs). You will also get the opportunity to use models trained by Tal Hassner and Gil Levi on the Adience dataset. This collection contains unfiltered faces, and working with them will help with gender and age classification.
The project may also require the use of files such as .pb, .prototxt, .pbtxt, and .caffemodel. This project is very practical, and the model can estimate age and gender from an image containing a single face.
While gender and age ranges can be classified with this model, factors such as makeup, poor lighting, or unusual facial expressions can reduce its accuracy. For this reason, age is better treated as a classification problem over age ranges than as a regression problem that predicts an exact age.
Tips for a Good Data Science Project
Now, let us discuss some key aspects of a good Data Science project:
- Language: You can use any programming language that you are comfortable and familiar with. Just make sure that it is a popular one so that other people can understand your code, collaborate on it, and help you with it. Two of the most popular languages for Data Science are R and Python. Data Science projects in Python are especially useful, as Python is more widely used than R.
- Datasets: You can get datasets from several sources, but make sure that you are using a large enough dataset that does not contain a lot of errors and incorrect data. In case your dataset has many errors, try removing those errors or use another dataset. To get good datasets, try using Kaggle or UCI Machine Learning Repository.
- Visualizations: Before training your model, try to get a good understanding of the dataset through visualization. You can find useful information, including correlated columns, bias, etc., in your dataset through visualizations. If you find any issue in your dataset, such as skew, bias, or outliers, try to rectify it before proceeding.
- Data cleaning: Make sure that the data you are using is clean and usable. The reason is that the data with a lot of errors will lead to a terrible performance of the model.
- Data transformation: In case you use multiple datasets from different sources, it can be difficult to merge them as they can be quite different from each other. For example, different datasets may end up using different formats for dates, different measurement units based on specific geographical locations, etc.; so, you may have to transform the data to make it standardized to train your model.
- Validation: Validate your model’s accuracy on multiple slices of your dataset, using techniques such as stratified k-fold cross-validation, to get a more reliable estimate of your model’s performance. If you find issues, try digging deeper to rectify them.
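To illustrate what "stratified" means in stratified k-fold cross-validation, the sketch below assigns sample indices to folds so that every fold keeps roughly the same class ratio as the whole dataset. The labels and the function name are made up, and the sketch skips shuffling for clarity; in practice you would shuffle first or use a library implementation.

```python
from collections import defaultdict


def stratified_folds(labels, k):
    """Assign each sample index to one of k folds, keeping class ratios similar."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    # Deal the indices of each class across the folds like cards.
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# Imbalanced illustrative labels: 3 "fraud" and 9 "ok" samples.
LABELS = ["fraud"] * 3 + ["ok"] * 9

for number, fold in enumerate(stratified_folds(LABELS, k=3), start=1):
    frauds = sum(LABELS[i] == "fraud" for i in fold)
    print(f"fold {number}: {len(fold)} samples, {frauds} fraud")
```

Each fold is used once as the test set while the remaining folds are used for training, and the k scores are then averaged.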
In this blog, we have discussed some of the most relevant Data Science projects as well as some tips to help beginners better utilize their skills and tackle real-world problems using various datasets. Hopefully, this blog was helpful and informative to you.
In their final semester of the UW Data Science program, students are required to take DS 785, the capstone course. Below are example capstone projects to give you an idea of the types of opportunities available to our students.
Using Mock Draft Data to Create a Player Availability Dashboard for the NFL Draft

A Practical Data Science Application: Developing Prediction Models for Product Inventory Reduction and Ongoing Monitoring to Create Efficiency

An In-Depth Review Customer Segmentation, Recommendation Systems, and the Benefits of Combined Use
Time-Series Forecasting of Maple Tree Sap Harvesting

Comparative Study on Employee Turnover

The Development of Feed Type Classification Algorithms for a Commercial Testing Laboratory

Daily Driving Route Optimization for Small Businesses Using Metaheuristics

Cost Analysis of a Local Union’s Digital Transformation
Examining and Predicting the University of Wisconsin’s System Library Ebook Usage

Advertisement Campaign Targeting Attributes Recommendation Engine

Qlik Application Creation for Deeper Analysis of Department of Defense Budget

Exploring Rural Road Crash Data with Statistical Models


50 Best Data Science Project Ideas You Must Know in 2023

Have you learned Data Science? If yes, then your next step should be Data Science projects, because without working on Data Science projects, you can’t excel in this field. That’s why, in this article, I am going to share the 50 best Data Science project ideas with you.
I have categorized these Data Science Project Ideas into three sections- Beginners, Intermediate, and Advanced. You can easily pick the project idea based on your knowledge level.
Now, without any further ado, let’s get started-
Best Data Science Project Ideas
For your convenience, I have created a table from where you can easily pick the most suitable Data Science Project Idea for you.
Let’s start with the Beginner Level Best Data Science Project Ideas –
Beginner-Level Data Science Project Ideas
Intermediate-Level Data Science Project Ideas
Advanced-Level Data Science Project Ideas
So these are the 50 Best Data Science Project Ideas. I hope you have found the most suitable project for you in this article. For more project ideas, you can check Kaggle, DataCamp, Coursera, DataFlair, etc.
If you have any questions, feel free to ask me in the comment section. I am here to help you. And if you found this article helpful, share it with others to help them too.
All the Best for your Data Science Journey!
Happy Learning!
8 Awesome Data Science Capstone Projects from Praxis Business School
Introduction
It is not the strongest or the most intelligent who will survive but those who can best manage change.
Evolution is the only way anything can survive in this universe. And when it comes to industry-relevant education in a fast-evolving domain like Machine Learning and Artificial Intelligence, it is necessary to evolve or you will simply perish (over time).
I have experienced this firsthand while building Analytics Vidhya. It still amazes me to see where we started and where we are today. During this period, there have been several ups and downs, several product launches, product re-launches and what not! But one thing has been a constant in our story: constant evolution!
So, when I got an invite to be a judge on the panel judging Capstone projects done by students of PGP in Data Science with ML & AI program at Praxis Business School, the same school where I had reviewed the program almost 4 years back – I was curious. I was curious to see and learn how their evolution had panned out.

My interaction with the students four years ago was quite different from my experience sitting in a panel of judges for Capstone projects. You get to see the final outcome coming from a rigorous program as opposed to just having a classroom interaction. This is like the proof of the pudding!
I was hoping to find out answers to 2 broad questions in the process:
- How has the program evolved over the years?
- What kind of projects are students currently doing and how industry-relevant are they?
With those questions in mind – I boarded an early morning flight to Bengaluru and was in the Praxis campus by 9:00 a.m. Since the evaluations were supposed to start at 10:30 a.m., I had some time on my hand.
I used this time to catch up with the course faculty Gourab Nath, and other judges of our esteemed panel – Suresh Bommu (Advanced Analytics Practice Head at Wipro Limited) and Rudrani Ghosh (Director at American Express Merchant Recommender and Signal Processing team).
I also grabbed some authentic South Indian breakfast in the process. 🙂
Program Details and Capstone Projects
For people who are not aware – Praxis Business School offers a year-long program – PGP in Data Science with ML & AI at both its campuses – Kolkata and Bengaluru. The program is structured in a manner where the first 9 months are spent in the classroom with in-house and industry faculty and the last 3 months are spent as an intern with an industry partner.
The Capstone project happens before the internship actually starts. So, students spent a total of 9 months in the classroom and had been doing these projects for the last 3 months (month 6 – month 9 in the curriculum).
How has the Program Evolved over the Years?
The last time I had visited Praxis was in 2015 and I was dead sure that the program would have evolved. The question was how much? In which direction? What are the key takeaways for the students and how are the students from Praxis doing in the real world?
So, let me share my findings based on the interaction with Gourab and the rest of the panel.
How Much has the Program Evolved? In which Direction?
The first noticeable change was the name of the program itself. Back in 2015, the Program was called PGP in Business Analytics as most of the material in the course was related to Business Analytics and Statistical Modelling.
Over time, the program has evolved a lot – I was surprised to see the number of topics that are covered in the program. Here is a screenshot of topics covered in the curriculum, picked directly from their site:

The program now includes not only Machine Learning and Deep Learning but also Big Data tools and business-focused topics. As far as I can see, it has become a comprehensive course for data scientists.
What are the key takeaways for the students undergoing the program?
I think the best way to judge this is to look at the projects. So – I held this off and the projects were sufficient proof by themselves.
Needless to say, I was pretty excited by these discussions and with the context of this evolution – I was ready for what the rest of the day was supposed to be.
Here are the views of Gourab Nath, part of the judging panel and Assistant Professor of Praxis’ Data Science Program:
Collecting images is a challenging task for projects that involve topics like face recognition. Previously, we were using an approach that was a little time-consuming. So, this time we decided to take a more systematic approach to collecting the images, one that could save our participants a lot of time. The teams working on such projects designed and developed an easy-to-handle application for facial image collection. A participant was asked to sit in front of the computer running the software, and all he/she needed to do was enter his/her name and press a capture button to start the image collection process.
The students at Praxis Business School are highly encouraged not to be hugely dependent on the tools and the packages and focus more on writing algorithms. This approach helps them to code better no matter what programming languages they use.
Capstone Projects by Current Passing out batch at Praxis Business School

A glance at the list of projects confirmed my views until now. I could see projects on Machine Learning, Natural Language Processing (NLP) and Computer Vision (CV).
More importantly – it looked like these projects were not based on some open datasets. The problems mentioned were unique and I was not aware of many open datasets addressing these problems. Now, I was curious and excited to see what students have and how they have done.
Here’s the list of Capstone Projects done by students at Praxis Business School:
- Detection of Spam Reviews
- Opinion Mining on Mobile Phone Features
- Drowsiness Detection using Computer Vision
- Gesture Recognition using Computer Vision
- Team Selection using Computer Vision
- Attendance Tracking System using Computer Vision
- Recommender System for Fashion Apparel
- Nearest Document Search
Just to put things in perspective – most of the students presenting to us did not have any knowledge of predictive modeling and machine learning till July 2018 – when they started with the program.
Details of the Capstone Projects
Let’s look at each capstone project in a bit more detail to understand what it was about plus the tools and techniques used in each project.
Project 1 – Detection of Spam Reviews
Customer reviews have a huge influence on potential buyers of any product. A number of false reviews may drive the influence either in a positive direction or a negative direction. Any of these cases may make the customers take wrong decisions and the trustworthiness of the online opinions could be an issue.
In this project, we investigate opinion spam in reviews.
Note that this problem is different from email spam classification. Email spam usually refers to unsolicited commercial advertisements to attract people towards some products or services and hence they usually contain some prominent features.
Our specific problem is more challenging because untruthful opinion spam is much harder to deal with. These kinds of spamming material can be carefully crafted and made indistinguishable.
Techniques: Shingle Method, n-grams, Feature Extraction
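As a rough illustration of the shingle method listed above, the sketch below breaks each review into overlapping word n-grams (shingles) and compares reviews using Jaccard similarity. Near-duplicate reviews are one common signal of opinion spam. The sample reviews, the shingle size, and the function names are assumptions made for this example, not the students' implementation.

```python
def shingles(text, n=3):
    """Return the set of overlapping word n-grams (shingles) in a text."""
    words = text.lower().split()
    if len(words) < n:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity between two sets: |A & B| / |A | B|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical reviews, made up for this example.
review_a = "great phone with amazing battery life and a superb camera"
review_b = "great phone with amazing battery life and a decent camera"
review_c = "the delivery was late and the box arrived damaged"

sim_ab = jaccard(shingles(review_a), shingles(review_b))  # near-duplicates
sim_ac = jaccard(shingles(review_a), shingles(review_c))  # unrelated reviews
```

A pair of reviews whose similarity is well above that of typical pairs could then be passed on as a feature to a classifier or flagged for manual inspection.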
Project 2 – Opinion Mining on Mobile Phone Features
You open amazon.com and find that lots of customers have given great reviews about a well-branded mobile phone you are interested in. You wonder – are these good reviews due to the camera of the phone? Or, how good is the battery of the phone? And what about the display?
The number of reviews is really large, and it’s almost impractical for readers to go through all of them to evaluate the product, so answers to these kinds of questions can be really helpful in making decisions.
In this project, our focus is to identify various features of a mobile phone that the customers are talking about in their reviews and mine the customers’ opinions on these features.
Further, we focus on identifying the polarity of these opinions and summarizing the reviews. Finally, we develop a user interface that summarizes the opinions about the features of the phone and ranks the customer reviews based on their utility. We also propose an architecture that can perform the same analysis on the reviews of any mobile phone.
Tools: Python [Packages: NLTK, SpaCy, sklearn], Wix.com (for the website creation)
Techniques: Fuzzy Matching, POS tagging, Association Rules Mining, Compactness Pruning, Redundancy Pruning, identifying sentiments based on the word list and weights in AFINN and WordNet
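To give a feel for word-list sentiment scoring of the kind mentioned above, here is a minimal sketch that sums word weights per sentence and attributes the total to any phone feature named in that sentence. The lexicon entries, their weights, and the feature list are illustrative assumptions; they are not the real AFINN values, and the project's actual pipeline (POS tagging, pruning, association rules) is not reproduced here.

```python
import re

# Tiny hand-made word list in the spirit of AFINN (integer score per word).
# These entries and weights are assumptions, not the real AFINN values.
LEXICON = {"great": 3, "good": 2, "amazing": 4, "poor": -2, "bad": -3, "terrible": -3}

# Hypothetical phone features to look for in each sentence.
FEATURES = ("camera", "battery", "display")


def feature_sentiment(review):
    """Sum word-list scores per feature, sentence by sentence."""
    scores = {}
    for sentence in re.split(r"[.!?]+", review.lower()):
        words = re.findall(r"[a-z']+", sentence)
        polarity = sum(LEXICON.get(word, 0) for word in words)
        for feature in FEATURES:
            if feature in words:
                scores[feature] = scores.get(feature, 0) + polarity
    return scores


review = "The camera is amazing. The battery is poor! Great display."
result = feature_sentiment(review)
```

Positive totals suggest a favorable opinion of that feature and negative totals an unfavorable one; a sentence that mentions two features would share one score between them, which is one reason a real system needs finer-grained parsing.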
Check out a demonstration of this project below:
Project 3 – Drowsiness Detection using Computer Vision
How many times has this happened to you: you started a movie on your computer at night and fell asleep in the middle of it? And when you woke up the next day, you had no clue how far you had watched? Happens to the best of us.
In this project, we focus on developing an application that will be able to detect if you are asleep and automatically pause the video for you. The system waits to see if you wake up in the next 30 minutes. In case you don’t, it will save a snapshot of the screen, close all the windows and shut down your computer automatically.
Tools: Python, OpenCV, TensorFlow, Keras
Techniques: Viola-Jones algorithm on Rapid Object Detection using a Boosted Cascade of Simple Features, Inception V3, LSTM
Project 4 – Gesture Recognition using Computer Vision
Picture this – you are watching a video on your computer but are feeling way too lazy to use the mouse or the keyboard to control the video player. Sounds familiar?
We have a solution for you!
In this project, we focus on making the computer recognize some special gestures which will enable one to control a video player by just using those gestures.
For example, showing your palm in front of the system will enable the pause and the un-pause function. You will also be able to control the volume, fast forward a video or rewind it. You will also be able to do a wide range of other things like changing the slides of your PPT, changing pages, scrolling, etc. without grabbing your mouse or keyboard.
Techniques: Green Screen (for background subtraction), Single-Shot Multi-box Detector (SSD)
Project 5 – Team Selection using Computer Vision
Students are asked to create teams for their projects or their assignments, which is of course a very common thing in every school and college. The class representative (CR) creates a Google spreadsheet and shares it with everyone.
Students, after deciding who they want to team up with, populate the spreadsheet with the names of their team members. But the CR must remember the rules given by their Professor – the team size should be three and every team must have one female member at least.
So, the CR checks the restrictions and if everything is fine, he/she shares it with the Professor. This is one way to do it.
Or, you can do it the smart way.
You stand with your teams in front of the computer, the computer checks the restrictions, recognizes you, and fills in the database with your names and photos.
But remember, the computer won’t allow you to register if the constraints are not satisfied or if at least one member of your team is already registered as a member of another team. So, you cannot fool it!
Techniques: VGG-NET 19, HOG Detector
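The registration rules described above (team of three, at least one female member, nobody already on another team) can be sketched as a small validation function. In the actual project the names and genders would come from the face recognition step; here they are passed in directly, and the data structures and names are assumptions made for illustration.

```python
def can_register(team, registered):
    """Check the rules from the text: three members, at least one female
    member, and nobody already registered in another team.

    `team` is a list of (name, gender) pairs and `registered` is a set of
    names. Both structures are assumptions made for this sketch.
    """
    if len(team) != 3:
        return False, "team size must be three"
    if not any(gender == "F" for _, gender in team):
        return False, "team needs at least one female member"
    taken = [name for name, _ in team if name in registered]
    if taken:
        return False, "already registered: " + ", ".join(taken)
    return True, "ok"


# Hypothetical students, made up for this example.
registered = {"Asha"}
ok_team = [("Ravi", "M"), ("Meera", "F"), ("John", "M")]
bad_team = [("Ravi", "M"), ("Asha", "F"), ("John", "M")]
```

Calling `can_register(ok_team, registered)` accepts the team, while `bad_team` is rejected because one of its members is already registered elsewhere.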
Project 6 – Attendance Tracking System using Computer Vision
In this project, we developed a system to record class attendance using computer vision.
After a faculty member enters the system using a password and sets the period, the camera opens up to capture pictures of the class. A number of snapshots of the class are first passed through a face detector and then a face recognizer.
After the system recognizes the students, it updates the attendance spreadsheet and saves the captured image in its respective image directory – labeling it by the date and time of the day. The unidentified students are marked as absent.
Techniques: Haar Cascade Classifier, HOG, Siamese Model (One Shot Learning), kNN
Project 7 – Recommender System for Fashion Apparel
The use of a recommender system in e-commerce companies is a highly targeted approach that can generate a high conversion rate. These systems help customers discover the products which they might be interested in and will likely purchase.
In this project, we have created a recommender system for a small fashion apparel industry that:
- Allows the customers to search by the image of a product
- Gives a personalized recommendation to the heavy buyers
- Displays the most frequently purchased item for the selected item
Tools: Python
Techniques: kNN, Collaborative Filtering, Content-Based Filtering, Autoencoders
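One simple way to read the "most frequently purchased item for the selected item" feature is as a co-purchase count: for each item, tally which other items appear in the same basket most often. The sketch below is only that counting baseline, under the assumption that purchase baskets are available; it is not the collaborative filtering or autoencoder models listed above, and the basket contents are made up.

```python
from collections import Counter
from itertools import permutations

# Hypothetical purchase baskets; item names are made up for this sketch.
baskets = [
    {"jeans", "t-shirt", "belt"},
    {"jeans", "t-shirt"},
    {"jeans", "sneakers"},
    {"dress", "belt"},
]


def co_purchase_counts(baskets):
    """Count how often each ordered pair of items shares a basket."""
    counts = {}
    for basket in baskets:
        for item, other in permutations(sorted(basket), 2):
            counts.setdefault(item, Counter())[other] += 1
    return counts


def most_bought_with(item, counts):
    """Return the item most often bought together with `item`, or None."""
    if item not in counts:
        return None
    return counts[item].most_common(1)[0][0]


counts = co_purchase_counts(baskets)
suggestion = most_bought_with("jeans", counts)
```

A baseline like this is useful mainly as a sanity check against which a learned recommender can be compared.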
Here’s a demo video of this project:
Project 8 – Nearest Document Search
In this project, we have created a nearest document search engine for News reading. The application will not just recommend you related news but also give you the sentiment and highlight important words associated with the news. If the news is big and you do not want to read the full news, fair enough, this app will have a summarized version ready for you.
Techniques: kNN, KDTree, Word Cloud, Lex Rank Summarizer
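The core of a nearest document search can be sketched as follows: turn each document into a TF-IDF vector and return the document with the highest cosine similarity to the query document. This sketch uses a brute-force comparison instead of the KDTree listed above, and the headlines and function names are assumptions for illustration only.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build a sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(docs)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest(index, vectors):
    """Return the index of the document most similar to docs[index]."""
    others = [i for i in range(len(vectors)) if i != index]
    return max(others, key=lambda i: cosine(vectors[index], vectors[i]))


# Hypothetical headlines, made up for this example.
docs = [
    "stock markets rally as inflation cools",
    "inflation cools and stock markets climb",
    "local team wins the cricket final",
]
vectors = tfidf_vectors(docs)
```

Brute force is fine for a handful of documents; a tree or approximate index becomes worthwhile as the collection grows.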
How relevant were these projects for the Industry?
One of the most critical questions I had was – are these projects industry relevant? Bridging the gap between academia and industry has been a significant challenge in data science. It turns out the answer is quite comprehensive.
In the last 4 years, the number of companies hiring has increased 4 times (from 15 in 2015 to 60 in 2018-19) and the average salary has nearly doubled (from 5 LPA in 2015 to 9 LPA in 2018-19).
So, here are the thoughts of my fellow panelists on this topic:
“I am very impressed on the scope, objectives, and contents of the capstone projects executed by Praxis students. The majority of the projects are around the application of deep learning concepts which they have learned as a part of the course work. The entire project execution and development activities were well planned and organized. Starting from defining the problem statement, challenges, real-time application and finally presenting the results.” – Suresh Bommu, Advanced Analytics Practice Head at Wipro Limited
“What really stood out for me was the effort put in by students in attempting to create an end-to-end product with a UI as well as the variety of projects and its extended application.” – Rudrani Ghosh, Director at American Express Merchant Recommender and Signal Processing team
Key Takeaways from the day
I loved the day and would live it again without second thoughts. But there were a few things which stood out for me:
- There was a stark difference in the projects which students were doing currently. In a period of 9 months, they have completed learning the subject and have completed a Capstone project. This would not have been possible without the efforts of students themselves and the faculty members.
- Most of these projects exposed students to the perils of design thinking, creating and collecting the dataset and cleaning it. I just loved this aspect. I am sure the students realised that building a deep learning model is far easier than actually collecting the data for it.
- I also loved the way students presented their projects. They created video teasers and demo sessions to bring out the work they had done.
It was great to see the high level of projects presented by these students. As I mentioned, I was glad to see the students picking up challenging problems on not openly available datasets.
At the end of the day, I had to rush back to the airport. Day trips to Bengaluru are bad! And the fact that I had to rush through projects for a few students only made it worse. I would have loved to spend more than a day – the energy of the class, the faculty and the judges was infectious 🙂 Looking at these projects – I can confidently say that Praxis Business School continues to offer one of the best full-time programs in Machine Learning and Deep Learning in India.

About the Author

Kunal is a postgraduate from IIT Bombay in Aerospace Engineering. He has spent more than 10 years in the field of Data Science. His work experience ranges from mature markets like the UK to a developing market like India. During this period he has led teams of various sizes and has worked with various tools like SAS, SPSS, Qlikview, R, Python and Matlab.
12 Data Science Projects To Try (From Beginner to Advanced)
In this article
- What Is a Data Science Project?
- Data Science Projects To Try
- Datasets for Data Science Project Ideas
- Tips for Creating Interesting Data Science Projects
- Data Science Projects FAQs

From breast cancer detection to user experience design, businesses across the globe are leveraging data science to solve a wide range of problems. Every mobile/web-based product or digital experience today demands the application of data science for personalization, customer experience, and so on. This opens up a world of opportunities for data science professionals.
To land a data science job, however, early career professionals need more than just a strong theoretical foundation. Hiring managers today are looking for data scientists who have the hands-on experience of delivering projects that solve real-world problems. Even before you land your first job, you need to have ‘experience’ demonstrating your ability to deliver them. No sweat. We’ve brought help.
What Is a Data Science Project?
A data science project is a practical application of your skills. A typical project allows you to use skills in data collection, cleaning, analysis, visualization, programming, machine learning, and so on. It helps you apply your skills to solve real-world problems. On successful completion, you can also add the project to your portfolio to show your skills to potential employers.
Whether you’re a complete beginner or one with advanced skills, you can gain hands-on experience by trying out projects on your own or working with peers. To help you get started, we’ve curated a list of the top 15 interesting data science projects to try. See what catches your fancy and get started!
Data Science Projects To Try
Beginner Data Science Projects
“Eat, Rate, Love”: An Exploration of R, Yelp, and the Search for Good Indian Food

When it comes time to eat, many people turn to Yelp to choose the best options for the type of food they’re looking for. They search, eat, rate, and leave reviews for the restaurants they’ve visited. This makes Yelp a great source of data to run data science projects.
Springboard Data Science Bootcamp graduate Robert Chen chose this data to explore whether the best reviews led to the best Indian restaurants. While searching Yelp, Chen discovered that many recommended Indian restaurants had similar scores. Certainly, not all the reviewers had the same knowledge of this cuisine, right? With this in mind, he took the following into consideration:
- The number of restaurant reviews by a single person of a particular cuisine (in this case, Indian food). He was able to justify this parameter by looking at reviewers of other cuisines, such as Chinese food.
- The apparent ethnicity of the reviewer in question. If the reviewer had an Indian name, he could infer that they might be of Indian ethnicity, and therefore more familiar with what constituted good Indian food.
He used the Python and R programming languages for this analysis.
His modification to the data and the variables showed that those with Indian names tended to give good reviews to only one restaurant per city out of the 11 cities he analyzed, thus providing a clear choice per city for restaurant patrons.
Yelp’s data has become popular among newcomers to data science. You can access it here. Find out more about Robert’s project here.
Customer Segmentation with R, PCA, and K-Means Clustering

Marketers perform complex segmentation across demographic, psychographic, behavioral, and preference data for each customer to deliver personalized products and services. To do this at scale, they leverage data science techniques like clustering, a form of unsupervised learning.
Data scientist Rebecca Yiu’s project on market segmentation for a fictional organization, using R, principal component analysis (PCA), and K-means clustering, is an excellent example of this. She uses data science techniques to identify the prospective customer base and applies clustering algorithms to group them. She classifies customers into clusters based on age, gender, region, interests, etc. This data can then be used for targeted advertising, email campaigns, and social media posts.
You can learn more about her data science project here.
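To show what K-means clustering does in a segmentation setting, here is a minimal from-scratch sketch in Python (the project itself used R, and its PCA step is left out here). The customer values, the choice of two clusters, and the function name are assumptions for illustration; with real data you would normally scale the features first so that no single column dominates the distance.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means on 2-D points: assign to nearest centroid, then re-average."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its closest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels, centroids


# Hypothetical customers as (age, monthly spend); values are made up.
customers = [(22, 40), (25, 45), (24, 50), (48, 200), (52, 210), (50, 190)]
labels, centroids = kmeans(customers, k=2)
```

Here the three younger, low-spend customers end up in one segment and the three older, high-spend customers in the other; each segment could then receive its own campaign.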
Road Lane Line Detection

To follow lane discipline, self-driving cars need to detect lane lines. Data science and machine learning can play a crucial role in making this happen. Using computer vision techniques, you can build an application to autonomously identify lane lines from continuous video frames or image inputs. Data scientists typically use the OpenCV library, NumPy, the Hough transform, Spatial Convolutional Neural Networks (CNN), etc., to achieve this.
You can access a sample video for this project from this git repository here.
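For intuition about the Hough transform mentioned above, the sketch below implements the voting idea from scratch: every edge pixel votes for all lines that could pass through it, and collinear pixels pile their votes onto one (theta, rho) cell. In a real project you would more likely run an edge detector on each frame and call OpenCV's built-in Hough functions (such as `cv2.HoughLinesP`); the points and the function name here are assumptions made for illustration.

```python
import math
from collections import Counter


def hough_peak(points, theta_step_deg=1):
    """Vote in (theta, rho) space and return the most-voted line.

    Each point (x, y) votes for every line rho = x*cos(theta) + y*sin(theta)
    that passes through it; collinear points pile their votes on one cell.
    """
    votes = Counter()
    for x, y in points:
        for theta_deg in range(0, 180, theta_step_deg):
            theta = math.radians(theta_deg)
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(theta_deg, rho)] += 1
    (theta_deg, rho), count = votes.most_common(1)[0]
    return theta_deg, rho, count


# Hypothetical edge pixels lying on the line y = x, plus two stray points.
edge_points = [(i, i) for i in range(0, 100, 10)] + [(5, 80), (70, 12)]
theta_deg, rho, count = hough_peak(edge_points)
```

The winning cell has a normal angle of 135 degrees and a distance of 0 from the origin, which describes the line y = x; the two stray points do not contribute to it.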
Intermediate Data Science Projects
NFL Third and Goal Behavior

The intersection of sports and data is full of opportunities for aspiring data scientists. Divya Parmar, a lover of both, decided to focus on the NFL for his capstone project during Springboard’s Introduction to Data Science course. His goal was to determine the efficiency of various offensive plays in different tactical situations.
Parmar collected play-by-play data from Armchair Analysis, and used R and RStudio for analysis. He developed a new data frame and used conventional NFL definitions. Through this project, he learned to:
- Assess the problem
- Manipulate data
- Deliver actionable insights to stakeholders
You can access the dataset here.
Who’s a Good Dog? Identifying Dog Breeds Using Neural Networks

Image classification is one of the most popular and in-demand kinds of data science project. Classifying dogs by breed from their images is a well-loved example. Garrick Chu, a graduate of Springboard’s Data Science Career Track, chose this for his final year submission.
One of Garrick’s goals was to determine whether he could build a model that would be better than humans at identifying a dog’s breed from an image. Because this was a learning task with no benchmark for human accuracy, once Garrick optimized the network to his satisfaction, he went on to conduct original survey research to make a meaningful comparison.
He worked with large data sets and had to process images (rather than traditional data structures) effectively, which involved network design and tuning, avoiding over-fitting, transfer learning (reusing a neural net pretrained on one data set as the starting point for another task), and exploratory data analysis.
To do this, he leveraged neural networks with Keras through Jupyter notebooks. You can explore more of Garrick’s work here and access the data set he used here.
Uber’s Pickup Analysis

“Is Uber Making NYC Rush-Hour Traffic Worse?” was one of the four questions answered by FiveThirtyEight, a data-driven news website now owned by ABC. If you are looking to improve your data analysis and data visualization skills, this is a great data science project.
For this, FiveThirtyEight obtained Uber’s rideshare data and analyzed it to understand ridership patterns, how Uber interacts with public transport, and how it affects taxis. They then wrote detailed news stories supported by this data analysis. You can read their work of data journalism here. You can access the original data on GitHub.
Predicting Restaurant Success

Here is another Yelp-based project, but more complex than the one we discussed earlier. Data scientist Michail Alifierakis used Yelp data to build his “Restaurant Success Model” to evaluate the success/failure rates of restaurants. He uses a linear logistic regression model for its simplicity and interpretability, optimized for the precision of open restaurants using grid search with cross-validation.
This is a great data science use case for lenders and investors, helping them make profitable financial decisions. You can learn more about the project here and take a look at the code on GitHub.
Predictive Policing

Many law enforcement agencies worldwide are moving towards data-driven approaches to forecasting and preventing crimes. They leverage data science technologies to automate the pattern detection process, which helps reduce the burden on crime analysts. Data scientist Orlando Torres launched a data science project on predictive policing, albeit with unexpected results. He used data from the open data initiative and trained the model on 2016 data to predict the crime incidents in a given zip code, day, and time in 2017. He used linear regression, a random forest regressor, K-nearest neighbors, XGBoost, and a deep learning model (a multilayer perceptron).
With this data science project, he learned that it is very easy to lose explainability while building models. He writes, “if we start sending more police to the areas where we predict more crime, the police will find crime. However, if we start sending more police anywhere, they will also find more crime. This is simply a result of having more police in any given area trying to find crime.” Given the number of law enforcement agencies using data science for policing, it almost feels like a self-fulfilling prophecy.
You can read more about his project here.

Building Chatbots

Today, businesses are automating their customer services with chatbots. Creating your own chatbot can be a great data science project too. The two types of chatbots available today are domain-specific chatbots and open-domain chatbots. They both use Natural Language Processing (NLP) and Recurrent Neural Networks (RNN). As an intermediate data scientist, you can perhaps take this up a notch: try creating a sensitive chatbot that can detect user sentiment.
Patrick Meyer runs a data science project of this kind. He discusses several ways to model emotion: a polarity system that labels users as happy, neutral, or unhappy; Paul Ekman’s initial model with six emotions (anger, disgust, fear, joy, sadness, and surprise) or his extended list of sixteen; Robert Plutchik’s wheel of emotions; and the Ortony, Clore, and Collins (OCC) model.
You can learn more about his detection techniques here and access the dataset here.
Advanced Data Science Projects
Amazon vs. eBay Analysis

Finding the lowest price for a product on the Internet makes up a significant part of online shopping. Chase Roberts decided to make that easier. In support of a Chrome extension he was building, Roberts built a shopping cart of 3,520 products and compared their prices on eBay and Amazon. The results showed the potential for substantial savings. Here’s what he found:
- If you chose the wrong platform to buy each of these items (by always shopping at whichever site has a more expensive price), this cart would cost you $193,498.45. (Or you could pay off your mortgage.) This is the worst-case scenario for the shopping cart.
- The best-case scenario for our shopping cart, assuming you found the lowest price between eBay and Amazon on every item, is $149,650.94. This is a $44,000 difference—or 23%!
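The two cart totals above can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the text; note that the 23% figure corresponds to the savings as a share of the worst-case total, which is an interpretation inferred from the numbers rather than something the text states.

```python
worst_case = 193_498.45  # always buying on the more expensive site
best_case = 149_650.94   # always buying on the cheaper site

savings = worst_case - best_case
share_of_worst = savings / worst_case  # savings as a fraction of the worst-case cart

print(f"Savings: ${savings:,.2f} ({share_of_worst:.0%} of the worst-case total)")
```

The difference comes to $43,847.51, which the text rounds to $44,000.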
You can read more about his project, starting with how he gathered the data and documenting the challenges he faced during this process.
Fake News Detection

A recent study revealed that false news spread faster and reached more people than the truth, and around 52% of Americans shared that they regularly encountered fake news online. A four-person team from the University of California at Berkeley built a fake news classifier. For this, the team focused on clickbait and propaganda, the two common forms of fake news. They then developed a classifier that would detect these two forms. Their process involved:
- Taking data from news sources listed on OpenSources
- Using NLP to do the preliminary processing of articles for content-based classification
- Training various machine learning models to classify the news articles
- Developing a web application to act as the front end of their classifier
You can learn more about this and try it out here.
Audio Snowflake

When you think about interesting data science projects, chances are you think about how to solve a particular problem, as seen in the examples above. But what about creating a project for the sheer beauty of the data? For her Hackbright Academy project, Wendy Dherin did just that.
She developed Audio Snowflake to create a splendid visual representation of music as it played, capturing specific components like tempo, key, mood, and duration. Audio Snowflake mapped both quantitative and qualitative characteristics of songs to visual traits like saturation, color, rotation speed, and figures it produces.
Read more on this project here.
Visualizing Climate Change

2020 was recorded as the warmest year to date by NASA, and the last seven years have been the warmest seven years on record. Climate change is one of the most pressing issues humans face today. It is more important than ever to spread awareness and inform people of the magnitude of this problem. Data visualization can play a crucial role in that.
The data scientist Giannis Tolios did a project where he visualized the changes in global mean temperatures and the rise of CO2 levels in the atmosphere using Python. He uses various libraries such as Pandas, Matplotlib, and Seaborn for the data, visualizing it in line graphs and scatterplots. If climate change is a topic you want to work on, you can learn more about the project here.
Democratizing Data Science at Uber

One of the key challenges in data science is that it requires one to be a mathematician or a statistician even to make basic predictions and forecasts. Uber’s data science platform overcomes this challenge by automating forecasting using pre-built algorithms and tools, enabling everyone on the team to get predictions as long as they have data.
Director of Data Science at Uber, Franziska Bell, talks about how they plan to give the capabilities of a data scientist to every Uber employee. This way, Uber uses artificial intelligence, machine learning, and data science to solve real-world problems. Read more about it here.
Credit Card Fraud Detection

With online and digital transactions gaining more popularity today, their chances of being fraudulent are also on the rise. Therefore banks and financial institutions are looking to leverage data science techniques to identify fraudulent transactions and prevent them from being executed. By processing data across customer location, behavior, transaction value, network, payment method, etc., you can train the algorithm to detect anomalies. You can build your classification engine for fraud detection using decision trees, K-nearest neighbor, logistic regression, support vector machine, random forest, and XGBoost.
To get started, you can find datasets here.

Datasets for Data Science Project Ideas
Here are some online data sources that you can access and download for free for your data science projects:
VoxCeleb. A gender-balanced, audio-visual data set containing short clips of human speech from speakers of different ages, professions, accents, etc. They are extracted from interviews uploaded to YouTube. It can be used for various applications like speech separation, speaker identification, emotion recognition, etc.
Boston Housing Data. A fairly small data set based on information collected by the U.S. Census Bureau regarding housing in Boston. This data set can be used for assessment, focusing on the regression problem.
Kaggle. With over 50,000 public datasets on a wide range of topics, you can find all the data and code that you require for your data science projects. They also offer competitive data sets that are clean, detailed, and curated.
National Centers for Environmental Information. The largest storehouse of environmental data in the world, it provides information on oceanic, atmospheric, meteorological, geophysical, and climatic conditions, and more.
Global Health Observatory. If you are interested in doing projects in the health industry, then this is the best place to get the data you need. It also has some of the latest COVID-19 data.
Google Cloud Public Datasets. A place where you can access data sets that are hosted by BigQuery, Cloud Storage, Earth Engine, and other Google Cloud services.
Amazon Web Services Open Data Registry. This has an extensive repository of data sets that you can either download and use or analyze on the Amazon Elastic Compute Cloud (Amazon EC2). You need to first create a free AWS account to get access to the data sets.

Tips for Creating Interesting Data Science Projects
To help you navigate the world of data science projects, we asked Springboard mentors and instructors for their advice. Here’s what they had to say.
Choose the Right Problem
If you’re a data science beginner, it’s best to consider problems that have limited data and variables. Otherwise, your project may get too complex too quickly, potentially deterring you from moving forward. Choose one of the data sets in this post, or look for something in real life that has a limited data set. Data wrangling can be tedious work, so it’s critical, especially when starting out, to make sure the data you’re manipulating and the larger topic are interesting to you. These are challenging projects, but they should be fun!
Breaking Up the Project Into Manageable Pieces
Your next task is to outline the steps you’ll need to take in order to create your data science project. Once you have your outline, you can tackle the problem and develop a model to prove your hypothesis. You can do this in six steps:
- Generate your hypotheses
- Study the data
- Clean the data
- Engineer the features
- Create predictive models
- Communicate your results
Generate Your Hypotheses
After you have your problem, you need to create at least one hypothesis to help solve the problem. The hypothesis is your belief about how the data reacts to certain variables. For example, you might hypothesize that stores in higher-income neighborhoods sell more expensive items.
This is, of course, dependent on you obtaining the general demographics of specific neighborhoods. You will need to create as many hypotheses as you need to solve the problem.
Study the Data
Your hypotheses need to have data that will allow you to prove or disprove them. Look in the data set for variables that affect the problem. If you do not have the data, either dig deeper or change your hypothesis.
Clean the Data
As much as data scientists prefer to have clean, ready-to-go data, the reality is seldom neat or orderly. You may have outlier data that you can’t readily explain, like a sudden large, one-time purchase of an expensive item in a store that is in a lower-income neighborhood. Or maybe one store didn’t report data for a week.
These are all problems with the data that aren’t the norm. In these cases, it’s up to you as a data scientist to remove those outliers and add missing data so that the data is more or less consistent. Without these changes, your results will become skewed, and the outlier data will affect the results, sometimes drastically.
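As a small illustration of the cleaning step described above, the sketch below drops outliers using the common 1.5 × IQR rule and fills unreported days with the median of the remaining values. The sales figures are made up, and both the outlier rule and the median fill are assumptions chosen for this example; the right choice depends on the data set.

```python
import statistics


def clean_sales(values):
    """Drop IQR outliers, then fill missing entries (None) with the median.

    The 1.5 * IQR fence and the median fill are common defaults chosen for
    this sketch, not a universal rule.
    """
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)
    low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    kept = [v for v in values if v is None or low <= v <= high]
    median = statistics.median(v for v in kept if v is not None)
    return [median if v is None else v for v in kept]


# Hypothetical daily sales: one unreported day (None) and one extreme spike.
daily_sales = [120, 130, 125, None, 128, 122, 5000, 127]
cleaned = clean_sales(daily_sales)
```

Here the one-off spike of 5000 is removed and the missing day is filled with 126.0, the median of the remaining values. Before deleting an outlier in a real project, it is worth checking whether it is an error or a genuine event you want the model to know about.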
Engineer the Features
At this stage, you need to start assigning variables to your data. You need to factor in what will affect your data. Does a heatwave during the summer cause sales to drop? Does the holiday season affect sales in all stores and not just middle-to-high-income neighborhoods? Things like seasonal purchases become variables you need to account for.
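Seasonal effects like the ones above usually enter a model as features derived from the sale date. The sketch below shows one possible way to do that; treating November and December as the holiday season and June through August as summer are assumptions (Northern Hemisphere retail), as are the feature names.

```python
from datetime import date


def seasonal_features(day):
    """Derive simple calendar features from a sale date.

    The month ranges used for "summer" and "holiday season" are assumptions
    made for this sketch.
    """
    return {
        "month": day.month,
        "is_weekend": day.weekday() >= 5,  # Monday is 0, so 5 and 6 are Sat/Sun
        "is_summer": day.month in (6, 7, 8),
        "is_holiday_season": day.month in (11, 12),
    }


features = seasonal_features(date(2023, 12, 23))
```

Each sale record can then carry these flags alongside store-level variables such as neighborhood income, so that the model can separate seasonal effects from store effects.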
Create Your Predictive Models
At some point, you’ll have to come up with predictive models to support your hypotheses. For example, you’ll have to write code to predict sales. You may explore whether an after-Christmas sale increases profits and, if so, by how much. You may find that a certain percentage of sales earns more money than other sales, given the volume and overall profit.
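One simple starting point is a straight-line fit. The sketch below assumes a handful of past sales events with a known discount percentage and the units sold during each. It uses ordinary least squares written out by hand, and all numbers are invented for illustration.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return y_bar - slope * x_bar, slope

# Hypothetical history: discount offered (%) and units sold during each sale.
discount_pct = [0, 10, 20, 30]
units_sold = [100, 120, 140, 160]

intercept, slope = fit_line(discount_pct, units_sold)
predicted_units = intercept + slope * 25  # forecast for a 25% discount
```

Real sales data is rarely this linear, so a fit like this is best treated as a baseline to compare richer models against.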
Communicate Your Results
In the real world, all the analysis and technical results you come up with are of little value unless you can explain to your stakeholders what they mean in a comprehensible and compelling way. Data storytelling is a critical and underrated skill that you must develop. To finish your project, you’ll want to create a data visualization or a presentation that explains your results to non-technical folks.
How Do You Measure the Success of Data Science Projects?
As a learner, the most critical measure of success is that you have put your skills and knowledge into practice. Good data science projects not only show that you can solve problems but also show potential employers how you approach problem-solving. As long as you can add your project to your portfolio, consider it successful.
How Can You Find Interesting Data Science Projects To Try?
This blog post should get you started on various projects you could take up. Online courses like the Springboard Data Science Bootcamp include real-world projects that amplify your portfolio. You can contribute to open-source projects. You can also participate in competitions on platforms like Kaggle and Driven Data to improve your model-building skills.
How Can You Showcase Your Data Science Projects?
You can:
- Include them in your resume
- Link them to your LinkedIn profile
- Maintain an active GitHub account
- Create your portfolio website
- Write case studies of your projects and publish them on a blog or Medium
Oct 5, 2018
Data science capstone ideas (and how to get started)
Capstones are standalone projects meant to integrate, synthesize, and demonstrate all your data science knowledge in a multi-faceted way. Capstone projects show your readiness for using data science in real life, and are ideally something you can add to your resume, show to employers, or even use to start a career.
I find data science capstone ideas are like puppies: you want all of them, but can only keep one. Below is a list of some of my ideas and starting points.
Idea #1: Nutritional analysis from Instacart orders
In 2017 Instacart released a dataset of over 3 million grocery orders from over 200,000 users as a Kaggle competition. With a dataset this juicy, a few ideas immediately come to mind:
- Predict what products users will order again (this was the goal of the Kaggle challenge).
- Build a model to stock the store so there are never any product shortages, but no wasted space or money in ordering.
- Predict a user’s healthiness from order content.
- Make a recommender system for healthier order alternatives.
The first and second are doable with the data you already have, which is nice.
The third was my personal choice, using the USDA food composition database to look up products and create a nutritional breakdown (by the way, they have an API). But it also introduced a lot of hurdles:
- Users don’t eat everything they order (e.g. cat food, soap, toilet paper). This would require a lot of cleaning and munging.
- Users don’t order just for themselves (e.g. companies, birthday parties, families).
- Users order on different timelines (e.g. once per week, once every two weeks, once a month).
- Items such as deli food may not have entries in the USDA database.
The fourth would also utilize the USDA database, but would not require any user-specific information or messing about with time-series.
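To make two of the hurdles above concrete, here is a small sketch that drops non-food items and normalizes by the time between orders. The department labels, the `days_since_prior` field, and the calorie figures are all illustrative assumptions; real values would come from the order data and a USDA lookup.

```python
# Departments assumed to hold items nobody eats (labels are illustrative).
NON_FOOD_DEPARTMENTS = {"pets", "household", "personal care"}

def daily_calories(orders):
    """Average food calories per day across a user's orders.

    Each order is a dict with 'days_since_prior' and a list of
    (department, calories) pairs.
    """
    total_calories = 0
    total_days = 0
    for order in orders:
        total_days += order["days_since_prior"]
        total_calories += sum(
            calories
            for department, calories in order["items"]
            if department not in NON_FOOD_DEPARTMENTS
        )
    if total_days == 0:
        raise ValueError("cannot normalize without elapsed days")
    return total_calories / total_days

# Hypothetical user: one weekly order, then one order two weeks later.
orders = [
    {"days_since_prior": 7, "items": [("produce", 9000), ("pets", 4000), ("dairy eggs", 5000)]},
    {"days_since_prior": 14, "items": [("pantry", 28000), ("household", 0)]},
]
rate = daily_calories(orders)
```

This still treats every order as feeding one person, so the household and company-order problem remains open.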
Idea #2: Predicting solar output from satellite imaging/historical weather
One of the big issues with mainstream adoption of solar power is that, unlike with other energy sources (hydroelectric, oil, nuclear), you can’t control how long the sun shines. Overestimating this amount means losses for producers and investors, and downtime for users. Underestimating means a lower chance of adoption in upfront decision-making. Sounds like a job for… machine learning!
Many datasets can be found at NREL; however, they cover different years and different locations, with limits on how much you can download at once. They have an API, which is useful.
SolarAnywhere has an academic license, allowing you to look up any location (but only for the year 2013). They too have an API.
There is also the NREL NSRDB data viewer.
There are three immediate approaches I can think of:
- Using previous solar output to predict current solar output (time-series or RNN).
- Using weather datasets
- Using satellite imaging datasets
There are a lot of academic papers on this last subject (a quick Google Scholar search returns about 30,000 results), but not a lot of publicly available satellite time-series datasets.
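For the first of the three approaches, it helps to have a naive baseline before reaching for a time-series model or an RNN. The sketch below forecasts the next day's output as the average of the last few days and scores that forecast by walking forward through the series. The three-day window and the kWh figures are assumptions for illustration.

```python
def moving_average_forecast(history, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if len(history) < window:
        raise ValueError("need at least `window` observations")
    recent = history[-window:]
    return sum(recent) / window

def walk_forward_mae(series, window=3):
    """Mean absolute error of one-step forecasts made at each point in the series."""
    errors = [
        abs(moving_average_forecast(series[:i], window) - series[i])
        for i in range(window, len(series))
    ]
    return sum(errors) / len(errors)

# Hypothetical daily output of one installation, in kWh.
daily_kwh = [4.0, 5.0, 6.0, 5.0, 4.0, 6.0]
next_day = moving_average_forecast(daily_kwh)
baseline_mae = walk_forward_mae(daily_kwh)
```

Any model built on weather or satellite data should beat this baseline error to justify its extra complexity.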
Idea #3: Fake news detection
This is a hot one. Without going into full rant-mode, fake news is obviously deleterious for democracy and individual mental stability.
So how to accurately identify what’s fake and what’s true? Here are a few leads on this as a data science problem:
1. Fake News Challenge
This is the best-formatted challenge around this topic, with organizers, advisors, and volunteers from the academic, ML, and fact-checking communities. Includes GitHub repos of winning submissions. Check out the competition page on Codalab.
2. Snopes Junk News
A starting point for well-verified fake news stories vs. actual events.
3. Getting Real About Fake News — Kaggle Dataset
A collection of nearly 13,000 items from 244 websites tagged “BS” by the BS Detector Chrome extension. The BS Detector is powered by Open Sources, a project that classifies biased and fake websites.
Where To Get More Ideas
Never stop searching! Here are some ways to get more leads, either in the form of project ideas or datasets to use.
1. Academic papers
2. Kaggle Competitions
3. Kaggle Datasets
4. reddit.com/r/datasets
5. Awesome Public Datasets GitHub Repo
6. Google Datasets
All Capstone Projects (2017-2021)
A Data-Driven Approach to Forecasting the U.S. Beer Industry
Assortment Optimization
Suggested Order Quantities
Natural Language Processing for Customer Experience Evaluation
Business Churn Projection and Prediction
Revenue Integrity: Fraudulent Booking Identification
ARCA COCA-COLA
Portfolio Recommendation System for A Leading Coca-Cola Bottler
Prioritizing Customers Visits
True Sales Potential: Unleashing the untapped opportunity
ASSURANCE IQ
Predicting Approval and Denial Rates for Insurance Shoppers
Fostering Innovative Outreach Methods to Engage with New and Existing Customers
Demand Forecasting for a Luxury Fashion Retailer
Trend Forecasting to Quantify Consumer Sentiment
Customer Retention & Targeted Recommendations
Option Take-Rate Forecasting for the BMW Group
Automotive Noise Mining and Classification
Car Recommender for U.S. Dealerships
Connecting the Dots: Matching Existing Solutions to New Defects
Automating the quality control in car manufacturing using computer vision
Reprice with Confidence: Dynamic Pricing with Robust Time-series Forecasting
Cloud Cost Prediction
COLUMBIA THREADNEEDLE INVESTMENTS/AMERIPRISE FINANCIAL
Quantifying Advisors Marketing Engagement and Predicting Quality Leads for Sales
Optimizing Content Likely Personalization
Chatbot or Call? Optimal Contact Channel Selection for Customer Issue Resolution
CORVUS INSURANCE
Automated Dataset Creation using PDF Text Extraction
Improving SMS Customer Experience through a Transformer-based Chatbot
Transport Acquisition Recommendation
ESTEE LAUDER
Identifying Customer Sentiment’s Business Impact
GENERAL ELECTRIC
Predicting Appliance Failures
GENERAL MOTORS
Zero Crashes Initiative
Tackling Congestion Using Connected Car Data
Crowd Sourcing Fuel Data for Sustainable Routing Algorithms
Understanding US Dealership Visitation through Automated Geofence Creation
Electric Vehicle as an Energy Reservoir: Vehicle-to-Grid
[m]clusters: Audiences First
Project Peggy Olson: Data Driven Creativity
Peggy Olson 2.0: Creative AI
Advertisement Attribution for Smarter Channel Investment Strategy
Dynamic Promotion Optimization over Sparse Demand Regression
HANDLE GLOBAL
The Hidden Cost of Healthcare: Transforming medical equipment management
HARTFORD HEALTHCARE
A Data-Driven Approach to Healthcare
Intent Classification from Unlabelled Dataset
Explainability and Bias Removal in Natural Language
Prediction and Optimization of Medical Billing Operations
LINCOLN LABORATORY
USTRANSCOM Flight Data Analysis
Optimizing Lab Procurement with Sparse Vendor Selection
Predictive Aircraft Maintenance: Detecting Imminent Part Failure with Cox Regression and Advanced Ensemble Learning Methods
Avert Disaster: Safety Modeling for the Military Sealift Command (MSC) Ships
Automating UAV Classification and Detection Through Signal and Image Recognition
Budget Allocation Through Marketing AttributioN, a.k.a. BATMAN
Generating Product Recommendations for Small Businesses at Scale
Email Performance and Personalized Recommendations
MASS GENERAL HOSPITAL (MGH)
Interpretable Machine Learning to Alleviate Bias In Trauma Patient Disposition
Routing Vehicles for MBTA’s The Ride
Reducing Costs at The Ride
Paratransit Operations: Impact of Driver Behavior and Demand Forecast
Ridership forecasting and automated geocoding for paratransit ride services
MCKINSEY & CO
What are Large Organizations Hungry For?
Introducing Ratatouille: a Generalizable, Goal-Oriented Dialog Bot
Machine Learning Methods in Credit Risk
Industrial Agglomeration for Single-Industry Spatial Pattern Recognition and Predictive Growth Modeling
Algorithm for Vector-Based Topic Extraction with NLP
Knowledge video summarization through AI
Segmenting Retail Advisors and Optimizing Coverage Model
From Unstructured Text Data to Interpretable Financial Prescriptions: An Optimization Approach
Optimal Client Interaction
To meet, or not to meet, that is the question: Optimizing Interaction Strategies
NEON PAGAMENTOS SA
Customer Relationship Network for Credit Default Prediction
Local Inventory Deployment Optimization
Forecasting Demand for E-Commerce
Prevenar Factory Schedule Optimization: A Mixed Integer Programming Approach
Sharing is Caring: Investigation Load Balancing
QUEST DIAGNOSTICS
Predicting Disease from Longitudinal Laboratory Data
Disease Risk Evaluations in Life Insurance Underwriting via Laboratory and Prescription-Driven Diagnosis Models
Finding the Needle in the Haystack: Anomaly Detection in the Cybersecurity Industry
Lateral Movement Detection: Leveraging Data in the Cybersecurity Industry
RUE GILT GROUPE
Navigation-Based Personalized Recommender System
SCHLUMBERGER
Deep Reinforcement Learning to Automate Acoustic Data Processing
Reliable Machine Learning in a World of Uncertainty
Price Prediction for the Dubai Residential Real-Estate Market
Brewing a Better Shot: IoT Predictive Maintenance for Mastrena II Espresso Machines
Automated Ticket Trading
Events and Tickets Representation Learning and Personalized Recommendation
Guaranteed Sales
Dynamic Pricing Models
Home Page Event Recommendation Optimization
Project Phoenix: Wildfire Prediction in Canada
Protection Gap Explorer: A Data-Driven Exploration of US Life Underinsurance
Life and Health in a Changing Climate
TAKEDA PHARMACEUTICALS
Understanding what causes suboptimal operational performance in clinical trials
THERMO FISHER SCIENTIFIC
Empowering Sales Management with Potential Detection and Conversion Analysis
TRIP ADVISOR
Optimizing User Experience in Hotels Searches by Accurate Price Forecasts
Demand Forecasting with a Segmented Approach
Digital Marketing Attribution Model
Personalized Marketing: Who, How, and When to Market Any Product at Target
Opioid Detection in US Mail Stream
Creating a Tool to Diagnose Out Of Stock Causes
Improving Inventory Placement for Walmart E-Commerce
Planogram Optimization: Finding Optimal Product Placement on the Shelf
Transportation and Shipping Efficiency
The Value of a Day: Optimizing Delivery Time
Optimizing Targeting Strategy for Services
Characterizing Intent Using Customer Journey: a Sequential and Graphical Model Approach
What Products Should be Displayed? Double Assortment Optimization

Applied Data Science (MS) Student Capstone Projects
Case Analysis Capstone (ADS670) aims to develop both technical and soft skills that are not directly taught in the traditional courses in the program, but are relevant and critical in order to develop, innovate and communicate in modern data science. This is a project-oriented capstone that will harness the skills gained throughout the program.
Below are some examples of original research studies done by students in our master's in Applied Data Science program for their completed capstone projects.
Capstone Projects
M.S. in Data Science students are required to complete a capstone project. Capstone projects challenge students to acquire and analyze data to solve real-world problems. Project teams consist of two to four students and a faculty advisor. Teams select their capstone project at the beginning of the year and work on the project over the course of two semesters.
Most projects are sponsored by an organization—academic, commercial, non-profit, and government—seeking valuable recommendations to address strategic and operational issues. Depending on the needs of the sponsor, teams may develop web-based applications that can support ongoing decision-making. The capstone project concludes with a paper and presentation.
Key takeaways:
- Synthesizing the concepts you have learned throughout the program in various courses (this requires that the question posed by the project be complex enough to require the application of appropriate analytical approaches learned in the program and that the available data be of sufficient size to qualify as ‘big’)
- Experience working with ‘raw’ data exposing you to the data pipeline process you are likely to encounter in the ‘real world’
- Demonstrating oral and written communication skills through a formal paper and presentation of project outcomes
- Acquisition of team building skills on a long-term, complex, data science project
Capstone projects have been sponsored by a variety of organizations and industries, including: Capital One, City of Charlottesville, Deloitte Consulting LLP, Metropolitan Museum of Art, MITRE Corporation, a multinational banking firm, The Public Library of Science, S&P Global Market Intelligence, UVA Brain Institute, UVA Center for Diabetes Technology, UVA Health System, U.S. Army Research Laboratory, Virginia Department of Health, Virginia Department of Motor Vehicles, Virginia Office of the Governor, Wikipedia, and more.
Sponsor a Capstone Project
View previous examples of capstone projects and check out answers to frequently asked questions.
What does the process look like?
- The School of Data Science periodically puts out a Call for Proposals. Prospective project sponsors submit official proposals, vetted by the SDS Associate Director for Research Development.
- Sponsors present their projects to students at “Pitch Day” near the start of the Fall term, where students have the opportunity to ask questions.
- Students individually rank their top project choices. An algorithm sorts students into capstone groups of approximately 3 to 4 students per group.
- Each group is assigned a faculty mentor, who will meet groups each week in a seminar-style format.
What is the seminar approach to mentoring capstones?
We utilize a seminar approach to managing capstones to provide faculty mentorship and streamlined logistics. This approach involves one mentor supervising three to four loosely related projects and meeting with these groups on a regular basis. Project teams often encounter similar roadblocks and issues so meeting together to share information and report on progress toward key milestones is highly beneficial.
Do all capstone projects have sponsors?
Not necessarily. Generally, each group works with a sponsor from outside the School of Data Science. Some sponsors are corporations, some are from nonprofit and governmental organizations, and some are from other departments at UVA.
Why do we have to work in groups?
Because data science is a team sport!
All capstone projects are completed by group work. While this requires additional coordination, this collaborative component of the program reflects the way companies expect their employees to work. Building this skill is one of our core learning objectives for the program.
I didn’t get my first choice of capstone project from the algorithm matching. What can I do?
Remember that the point of the capstone projects isn’t the subject matter; it’s the data science. Professional data scientists may find themselves in positions in which they work on topics assigned to them, but they use methods they enjoy and still learn much through the process. That said, there are many ways to tackle a subject, and we are more than happy to work with you to find an approach to the work that most aligns with your interests.
Can I work on a project for my current employer?
Each spring, we put forward a public call for capstone projects. You are encouraged to share this call widely with your community, including your employer, non-profit organizations, or any entity that might have a big data problem that we can help solve. As a reminder, capstone projects are group projects so the project would require sufficient student interest after ‘pitch day’. In addition, you (the student) cannot serve as the project sponsor (someone else within your employer organization must serve in that capacity).
If my project doesn’t have a corporate sponsor, am I losing out on a career opportunity?
The capstone project will provide you with the opportunity to do relevant, high-quality work which can be included on a resume and discussed during job interviews. The project paper and your code on GitHub will provide more career opportunities than the sponsor of the project. Although it does happen from time to time, it is rare that capstones lead to a direct job offer with the capstone sponsor's company. Capstone projects are just one networking opportunity available to you in the program.
Capstone Project Reflections From Alumni

“Capstone projects are opportunities for you to deliver valuable, quantifiable results that you can use as a testimony of your long-term project success to the company you work for and other companies in future interviews.” — Gabriel Rushin, MSDS 2017, Procter & Gamble, Senior Machine Learning Engineer Manager

“For my capstone project, I worked to develop a clustering model to assess biogeographic ancestry, using DNA profiles. I felt like I was finally doing real-world data science and loved working with such an important organization as the Department of Defense.” — Colleen Callahan, Online MSDS 2021, Associate Research Analyst, CNA (Arlington, Virginia)
Capstone Project Reflections From Sponsors
“For us, the level of expertise, and special expertise, of the capstone students gives us ‘extra legs’ and an extra push to move a project forward. The team was asked to provide a replicable prototype air quality sensor that connected to the Cville Things Network, a free and community supported IoT network in Charlottesville. Their final product was a fantastic example that included clear circuit diagrams for replication by citizen scientists.” — Lucas Ames, Founder, Smart Cville
“Working with students on an exploratory project allowed us to focus on the data part of the problem rather than the business part, while testing with little risk. If our hypothesis falls flat, we gain valuable information; if it is validated or exceeded, we gain valuable information and are a few steps closer to a new product offering than when we started.” — Ellen Loeshelle, Senior Director of Product Management, Clarabridge

Capstone Projects
The Capstone Project Experience
In the final two quarters of the program, students gain real world experience working in small groups on a data science challenge facing a company or not-for-profit. At the conclusion of the capstone project, sponsoring organizations are invited to attend a formal Capstone Event where students showcase their work. Capstone projects typically span a wide range of interests, including energy, agriculture, retail, urban planning, healthcare, marketing, and education.
Examples of Previous Capstone Sponsors
- Biblioteca Italiana Seattle
- Civil & Environmental Engineering, WSU
- Equal Opportunity Schools
- iSchool, UW
- Kids on 45th
- Seattle Children’s Hospital
- Urban Planning, UW
Capstone 2020-22 Archives (Gather.Town)

Due to the pandemic, our Capstone 2021 was held entirely online on the Gather.Town platform, to which we added galleries of our 2020 and 2022 Capstone projects for an archive you can digitally wander and browse.
Gather presents a map-based, interactive platform where you can wander among projects, see media like posters, infographics, and video, and do video/audio chat with others who are logged into the space. You can read some basics about using this platform at the Gather site. One of the other benefits of Gather is that it created a persistent archive of our Capstone 2020-2022 projects, which you can view and digitally wander among here:
https://tinyurl.com/msdsfair
Other Examples of Past Projects

Visualizing Gentrification in Seattle
MSDS students Deepa Agrawal, Angel Wang, and Erin Orbits created an interactive mapping tool to visualize gentrification in Seattle.
Sponsor: Urban Planning, University of Washington

Using Artificial Intelligence to Monitor Inventory in Real Time
Capstone researchers Havan Agrawal, Toan Luong, Vishnu Nandakumar, and Tejas Hosangadi explored new methods for optimizing supply chains and product placements to improve sales.
Sponsor: Clobotics

Predicting Soil Moisture with Machine Learning
MSDS students Samir Patel, Rex Thompson, Michael Grant, and Dane Jordan developed machine learning models to accurately estimate soil moisture using satellite imagery.
Sponsor: Civil & Environmental Engineering, Washington State University
