14 Popular Data Science Project Ideas for Beginners

The best way to get good at Data Science tools and technologies as a beginner is to build projects that solve real-world problems. With that in mind, this blog looks at 14 Data Science project ideas that you can take on to upskill yourself.


As a beginner, it can be daunting to understand the concepts of Data Science and gain hands-on experience with them. One of the best ways to become good at Data Science, or at any creative skill, is to deliberately practice what you have learned until it sticks. Projects are the natural way to do this, but choosing one is hard when you are starting out: some projects are too difficult to implement, while others are too easy to stretch your skills. If this sounds familiar, then this blog is for you.

In this blog, we will discuss the best projects in Data Science for beginners to try out and expand their knowledge and skill set. These Data Science project ideas will also help you get a taste of how to deal with real-world Data Science problems.

This blog will discuss the following topics:

Recommendation System Project
Data Analysis Project
Sentiment Analysis Project
Fraud Detection Project
Image Classification Project
Image Caption Generator Project in Python
Chatbot Project in Python
Brain Tumor Detection with Data Science
Traffic Sign Recognition
Fake News Detection
Forest Fire Prediction
Human Action Recognition
Classifying Breast Cancer
Gender Detection and Age Prediction
Tips for a Good Data Science Project


Data Science Project Ideas

Without delay, let us start exploring the most interesting Data Science projects for beginners.

Recommendation System Project

A recommendation system is one of the most important parts of any content-based application such as a blog, an e-commerce website, or a streaming platform. It suggests new content from the site’s library or database based on what users have already viewed and liked. To do so, it needs data about users and their activity on the site, as well as information about the content itself, so that content can be classified and matched to each user's tastes and preferences. Building a recommendation system is also one of the most popular Data Science project ideas.

These systems can be built by using the following techniques:

This is one of the most interesting projects. There are many other techniques that are quite advanced and complicated, but these two techniques would be enough for you to build your own recommendation engine as a beginner. You can train the engine to be used for recommending movies, blog posts, products, etc.
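As a rough illustration of how a small recommendation engine can work, the sketch below uses item-based collaborative filtering with cosine similarity. This is an assumption about one suitable technique, not necessarily one of the techniques the text refers to. The ratings data, the user names, and the function names are all invented for the example.

```python
from math import sqrt

# Hypothetical user -> {item: rating} data, invented for illustration.
ratings = {
    "ana":  {"Alien": 5, "Blade Runner": 4, "Notting Hill": 1},
    "ben":  {"Alien": 4, "Blade Runner": 5, "Love Actually": 1},
    "cara": {"Notting Hill": 5, "Love Actually": 4, "Alien": 1},
    "dev":  {"Alien": 5, "Notting Hill": 2},
}

def item_vector(item):
    """Ratings given to one item, keyed by user."""
    return {user: r[item] for user, r in ratings.items() if item in r}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def recommend(user, top_n=2):
    """Rank unseen items by their similarity to items the user already rated."""
    seen = ratings[user]
    all_items = {item for r in ratings.values() for item in r}
    scores = {}
    for candidate in all_items - set(seen):
        cand_vec = item_vector(candidate)
        # Weight each similarity by the rating the user gave the seen item.
        scores[candidate] = sum(
            cosine(cand_vec, item_vector(item)) * rating
            for item, rating in seen.items()
        )
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend("dev", top_n=1))
```

With real data you would typically mean-center the ratings and use a library for the matrix operations; the plain-dictionary version is only meant to show the idea.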


Data Analysis Project

Data analysis is one of the core skills that a data scientist needs. In data analysis, you take some data and analyze it to gain insights that support better decisions. One way to simplify the analysis is to generate visualizations that can be interpreted easily. The scope of data analysis is vast, which makes this one of the most useful Data Science projects.

Today, data is considered more important than oil. All companies store data about their users and how they interact with the products. This data allows companies to craft better policies and features that help solve customer problems and attract more user engagement with the platform.


For example, if you are working on the data of an e-commerce company and find that users from a particular country buy only specific kinds of products, then you can use this information to get a better understanding of why it is happening and to generate better product recommendations for more engagement.

Companies, such as Uber, Amazon, Flipkart, etc., use data analysis to create better offers and generate better quotes to meet customer expectations in the best way possible. It is one of the projects in Data Science that many companies implement in their own ways.

For Data Science projects on data analysis, you can use e-commerce datasets or datasets from ride-hailing apps, such as Uber, Lyft, etc.


Sentiment Analysis Project

Sentiment analysis is used to add emotional intelligence to systems. It is one of the Data Science projects that people start with when they wish to learn how to process text. For example, when a user types a comment on a video or blog post, sentiment analysis can be used to determine whether the comment is appreciative, disparaging, critical, etc. It can also be used to classify emails, messages, reviews, queries, etc.

One of the major applications of these kinds of Data Science projects can be seen on public platforms, such as Twitter, Reddit, etc., where users post things that are tagged to indicate the type of content contained in them, i.e., positive or negative, with the help of sentiment analysis. This technique helps companies to understand, process, and tag even unstructured text.

These projects on sentiment analysis can be quite useful for various companies. Sentiment analysis can also be used to analyze and make sense of reviews, complaints, queries, emails, product descriptions, etc. For instance, you can use sentiment analysis to generate tags for such content as being negative, positive, neutral, etc.
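To make the tagging idea concrete, here is a minimal lexicon-based sketch that labels a comment as positive, negative, or neutral. The word list, the scores, and the one-word negation rule are assumptions made up for this example; production systems usually rely on a published lexicon or a trained classifier.

```python
import re

# Tiny hand-made lexicon, invented for illustration only.
LEXICON = {"great": 2, "love": 2, "good": 1, "helpful": 1,
           "bad": -1, "slow": -1, "terrible": -2, "hate": -2}
NEGATIONS = {"not", "never", "no"}

def sentiment(text):
    """Return 'positive', 'negative', or 'neutral' for a short comment."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        value = LEXICON.get(token, 0)
        # Flip the polarity when the previous word is a negation.
        if i > 0 and tokens[i - 1] in NEGATIONS:
            value = -value
        score += value
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love this video, great work"))
print(sentiment("Not good, terrible audio"))
```

A lexicon approach is easy to inspect but misses sarcasm and context, which is why it is a common first baseline rather than a final model.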


Fraud Detection Project

Fraud detection is one of the most important Data Science projects and also one of the most challenging for final-year students. With many forms of online and digital transactions being used widely, the chances of them being fraudulent are increasingly high. Since any form of digital transaction generates data regarding current and previous transactions, as well as customer purchase records, you can use these data and Data Science techniques to identify if the transactions are potentially fraudulent.

Any transaction done digitally is bound to create some data. When a customer uses a digital medium to make a payment, you can use this generated data with the trained model to flag the transaction as potentially fraudulent, which can later be dealt with and reviewed. This is one of the most important projects to practice in case you wish to be able to build Machine Learning models based on data about user activity.

Large amounts of money are being digitally transferred every day; thus, you should be able to classify if these records are fraudulent or not. To do this, you have to create models that are trained on the data collected from previous transactions. These models use and analyze factors such as the amount transferred, the location it is transferred from, the location to which it is transferred, etc. These factors are taken into account when new transactions take place, and then, based on these factors, they are flagged as fraudulent or authentic transactions.
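The sketch below shows the general shape of such a check using a simple statistical baseline rather than a trained Machine Learning model: it builds a profile from a customer's previous transactions and flags a new transaction when the amount is far from the customer's usual range or the location has not been seen before. The field names, the threshold, and the sample data are assumptions for illustration.

```python
from statistics import mean, stdev

def build_profile(history):
    """Summarize past transactions (a list of dicts with at least two entries)."""
    amounts = [t["amount"] for t in history]
    return {
        "mean": mean(amounts),
        "std": stdev(amounts),
        "locations": {t["location"] for t in history},
    }

def is_suspicious(txn, profile, z_threshold=3.0):
    """Flag a transaction whose amount or location deviates from the profile."""
    spread = profile["std"] or 1.0  # avoid dividing by zero for constant histories
    z = abs(txn["amount"] - profile["mean"]) / spread
    return z > z_threshold or txn["location"] not in profile["locations"]

# Hypothetical transaction history for one customer.
history = [{"amount": a, "location": "Pune"} for a in (40, 55, 60, 45, 50)]
profile = build_profile(history)

print(is_suspicious({"amount": 52, "location": "Pune"}, profile))
print(is_suspicious({"amount": 900, "location": "Pune"}, profile))
```

In a full project, the same features (amount, origin, destination) would feed a classifier trained on labeled transactions, and flagged cases would go to manual review.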


Image Classification Project

Image classification is one of the Data Science projects that can be used to classify and tag images based on their content. Image classification is widely used in the fields of science, security, etc. This is also among the most important applications of Data Science as it is very difficult to classify images with traditional application programming. Earlier, a lot of time and research was required to generate complicated rules and image transformations to classify images, and the result was still quite prone to errors. With Data Science, you can create models by training them with a lot of labeled images. These models can then generate Machine Learning classification rules on their own, and you can feed new images to be classified by the classification rules.

In Data Science projects like these, classification can be done with several algorithms, and it is better to try more than one to find the algorithm that performs best on your dataset. Make sure to use a large collection of images with good resolution for training and testing. Image classification also requires a good grasp of fundamental image concepts and manipulation techniques such as image reshaping, resizing, and edge detection.


Image Caption Generator Project

Any social media application that allows storing and sharing images lets users provide captions to those images. The captions are given to provide more context and necessary information about the images. The captions also help in things such as Search Engine Optimization (SEO), content ranking, etc. In blogs, having a caption or good description of what a particular image contains can be very helpful for the readers. Captions also help with accessibility and allow screen reader software to help people with disabilities get a better understanding of the content of the image. Generating these captions can be one of the most challenging Data Science projects.

However, in many cases, generating captions is a long and tedious process, especially when there are a lot of images. To solve this issue, you can generate captions based on what is actually shown in the image. The captions will serve as descriptions of what the images have in them, e.g., a man surfing, a dog smiling, etc.

To do this, you need to understand and use neural networks, especially convolutional neural networks (CNNs) and long short-term memory (LSTM) networks. Large datasets are available for this task, such as the Flickr8k dataset. If training a new model is not possible on your current machine, then you can use available pretrained models instead. An image caption generator is one of the best Data Science projects for understanding how to process images using neural networks.


Chatbot Project

Chatbots are an essential part of most customer-centric apps today. They help with better tracking of customer issues, faster issue resolution, and triggering commands using plain text. For example, many bots on platforms such as Slack and GitHub allow you to perform certain tasks just by sending them your request in the chat box. Chatbots also help customers get their grievances resolved without any human interaction. For example, food delivery apps such as Uber Eats and DoorDash use chatbots to help users resolve common issues including refunds, missing food items, incorrect items, etc.

There are two types of chatbots:

Data Science projects like these make extensive use of Natural Language Processing (NLP). Implementing a chatbot requires a good grasp of NLP concepts and access to a dataset that contains the patterns that you need to find and the responses that you have to return to the user.
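A minimal sketch of the pattern-and-response idea is shown below. It uses regular expressions instead of a trained NLP model, and the patterns, responses, and function name are assumptions invented for this example.

```python
import re

# Hypothetical pattern -> response pairs; a real dataset would be much larger.
INTENTS = [
    (re.compile(r"\b(refund|money back)\b", re.I),
     "I can help with a refund. Please share your order number."),
    (re.compile(r"\b(missing|incorrect|wrong)\b.*\bitem", re.I),
     "Sorry about that. Which item was missing or incorrect?"),
    (re.compile(r"\b(hi|hello|hey)\b", re.I),
     "Hello! How can I help you today?"),
]
FALLBACK = "I did not understand that. Let me connect you to a human agent."

def reply(message):
    """Return the response of the first pattern that matches the message."""
    for pattern, response in INTENTS:
        if pattern.search(message):
            return response
    return FALLBACK

print(reply("There is a missing item in my order"))
```

Replacing the regular expressions with an intent classifier trained on labeled messages is the usual next step once the basic loop works.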

Brain Tumor Detection with Data Science

There are many applications of Data Science in the healthcare field as well. One of these is brain tumor detection. In this project, you will take a large number of labeled MRI scan images and train a model on them. Once the model is well-trained, you will use it to check whether a new MRI image shows signs of a brain tumor.

To implement these kinds of Data Science projects, you need access to MRI scan images of the human brain. Thankfully, there are datasets available on Kaggle. All you have to do is use these images to train your model so that, when fed similar images, it can classify each one as showing a brain tumor or not. Though such models do not remove the need for consultation with a domain expert, they do help doctors get a quick second opinion.

Traffic Sign Recognition Project

Nowadays, one of the most popular applications of Data Science is self-driving cars. Although a self-driving car could be very difficult and expensive to work with, you can implement a specific and important feature needed in a self-driving car, which is traffic sign recognition.

In this project, you will use images of different traffic signs and label them, depicting what the signs are indicating. The more images there are, the more accurate the model will be, though it will take longer to train the model. You will start by using convolutional neural networks (CNNs) to build the model with images that are labeled with what is being indicated by a specific traffic sign. Your model will learn with the help of these images and labels. Next, when a new image is given as the input, the model will be able to classify it.


Fake News Detection Project

A recent study done by MIT claims that fake news spreads six times faster than real news. Fake news is becoming a great source of trouble in all spheres of life. It leads to a lot of problems around the globe, ranging from political polarization, violence, and propagation of misinformation to religious and cultural conflicts. It is also troubling that more and more unverified sources of information, especially social media platforms, are gaining traction; this is doubly concerning as these platforms do not have systems in place to distinguish between fake news and real news.

To tackle a problem like this, especially on a smaller scale, you can use a dataset that contains fake news and real news labeled in the form of textual information. You can use NLP and techniques such as Term Frequency-Inverse Document Frequency (TF-IDF) Vectorizer. This allows you to enter some text from a news article to get a label that tells if it is fake news or real news. It is important to note that these labels may not be 100 percent accurate, but they can give a good approximation to know what is correct or real.
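To show what TF-IDF weighting computes, here is a small pure-Python sketch using the textbook formula tf * log(N / df). Library vectorizers apply smoothing and normalization on top of this, so their numbers will differ. The sample headlines and the function name are invented for the example.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical headlines.
docs = [
    "aliens endorse local mayor",
    "mayor opens new library",
    "library budget approved by mayor",
]
vectors = tf_idf(docs)
print(vectors[0])
```

A word that appears in every document, such as "mayor" here, gets a weight of zero, while a word unique to one document gets the highest weight. These per-document weights are the features a fake-news classifier would then be trained on.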

Forest Fire Prediction

Building a forest fire prediction model can be a great Data Science project. Forest fires, or wildfires, are known to be uncontrollable and capable of causing a large amount of damage. You can apply k-means clustering to group past fire records, which helps to spot the major fire hotspots and their severity and to anticipate how disruptive fires in each area are likely to be.

This model can also be useful in the proper allocation of resources. Meteorological data can be used to search for specific periods and seasons for wildfires to increase the accuracy of the model.
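As a sketch of how k-means could group fire records into hotspots, the following implements the standard assign-and-update loop (Lloyd's algorithm) on two-dimensional points. The coordinates are made up, and treating latitude and longitude as plain Euclidean coordinates is a simplifying assumption.

```python
import random

def k_means(points, k, iterations=20, seed=0):
    """Cluster 2-D points with Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                                        + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, labels

# Hypothetical fire locations as (latitude, longitude) pairs.
fires = [(41.8, -6.9), (41.9, -6.8), (41.7, -7.0),
         (37.1, -8.5), (37.2, -8.4), (37.0, -8.6)]
centroids, labels = k_means(fires, k=2)
print(labels)
```

Each centroid approximates the center of a hotspot, and the number of fires assigned to it gives a first rough measure of severity.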

Human Action Recognition

This model will attempt to execute classification based on human actions. The human action recognition model will analyze short videos of human beings performing specific actions.

This Data Science project will require the use of a complex neural network that is trained on a specific dataset containing short videos. Accelerometer data is associated with the dataset. First, the accelerometer data conversion is performed along with a time-sliced representation. The Keras library is then used to train, validate, and test the network based on these datasets.
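The time-slicing step can be sketched as cutting a stream of accelerometer readings into fixed-length, overlapping windows that a network can take as input. The window size, step, and sample readings below are assumptions, not values taken from any specific dataset.

```python
def time_slices(samples, window, step):
    """Split a sequence of (x, y, z) readings into fixed-length windows."""
    return [
        samples[start:start + window]
        for start in range(0, len(samples) - window + 1, step)
    ]

# Hypothetical readings: ten (x, y, z) samples.
readings = [(i * 0.1, 0.0, 9.8) for i in range(10)]
slices = time_slices(readings, window=4, step=2)
print(len(slices))
```

Overlapping windows (a step smaller than the window) give the model more training examples from the same recording; trailing samples that do not fill a whole window are dropped.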

Classifying Breast Cancer

Breast cancer cases are on the rise, and early detection is the best way to take suitable measures. A breast cancer detection system can be built by using Python. You can use the Invasive Ductal Carcinoma (IDC) dataset, which contains histology images showing malignant cells, to train the model.

Some useful Python libraries that will be helpful for this Data Science project are NumPy, Keras, TensorFlow, OpenCV, Scikit-learn, and Matplotlib.

Gender Detection and Age Prediction

Gender detection and age prediction with OpenCV is an impressive Data Science project idea that can easily grab a recruiter’s attention on your resume. This real-time Machine Learning project is based on computer vision.

Through this project, you will come across the practical application of convolutional neural networks (CNNs). Eventually, you will also get the opportunity to implement models that are trained by Tal Hassner and Gil Levi for Adience dataset collection. This collection contains unfiltered faces and working with them will help with gender and age classification.

The project may also require the use of files such as .pb, .prototxt, .pbtxt, and .caffemodel. The project is very practical: the model estimates age and gender from an image by first detecting a single face.

Gender and age ranges can be classified with this model, but factors such as makeup, poor lighting, or unusual facial expressions can reduce its accuracy. For this reason, a classification model that predicts an age range can be used instead of a regression model that predicts an exact age.

Now, let us discuss some key aspects of a good Data Science project:

In this blog, we have discussed the most relevant real-time Data Science projects as well as some tips for beginners to be able to better utilize their skills and tackle some real-world problems using various datasets. Hopefully, this blog was helpful and informative to you.


In their final semester of the UW Data Science program, students are required to take DS 785, the capstone course. Below are example capstone projects to give you an idea of the types of opportunities available to our students.

Using Mock Draft Data to Create a Player Availability Dashboard for the NFL Draft

A Practical Data Science Application: Developing Prediction Models for Product Inventory Reduction and Ongoing Monitoring to Create Efficiency

An In-Depth Review Customer Segmentation, Recommendation Systems, and the Benefits of Combined Use

Time-Series Forecasting of Maple Tree Sap Harvesting

Comparative Study on Employee Turnover

The Development of Feed Type Classification Algorithms for a Commercial Testing Laboratory

Daily Driving Route Optimization for Small Businesses Using Metaheuristics

Cost Analysis of a Local Union’s Digital Transformation

Examining and Predicting the University of Wisconsin’s System Library Ebook Usage

Advertisement Campaign Targeting Attributes Recommendation Engine

Qlik Application Creation for Deeper Analysis of Department of Defense Budget

Exploring Rural Road Crash Data with Statistical Models


50 Best Data Science Project Ideas You Must Know in 2023


Have you learned Data Science? If yes, then your next step should be Data Science projects, because without working on projects you can’t excel in this field. That’s why, in this article, I am going to share the 50 Best Data Science Project Ideas with you.

I have categorized these Data Science Project Ideas into three sections- Beginners, Intermediate, and Advanced. You can easily pick the project idea based on your knowledge level.

Now, without any further ado, let’s get started-


For your convenience, I have created a table from where you can easily pick the most suitable Data Science Project Idea for you.

Let’s start with the Beginner-Level Data Science Project Ideas –

Beginner-Level Data Science Project Ideas

Intermediate-Level Data Science Project Ideas

Advanced-Level Data Science Project Ideas

So these are the 50 Best Data Science Project Ideas. I hope you have found the most suitable project for you in this article. For more project ideas, you can check Kaggle, DataCamp, Coursera, DataFlair, etc.

If you have any questions, feel free to ask me in the comment section. I am here to help you. And If you found this article helpful, share it with others to help them too.

All the Best for your Data Science Journey!

Happy Learning!


8 Awesome Data Science Capstone Projects from Praxis Business School

Introduction

It is not the strongest or the most intelligent who will survive but those who can best manage change.

Evolution is the only way anything can survive in this universe. And when it comes to industry relevant education in a fast evolving domain like Machine Learning and Artificial Intelligence – it is necessary to evolve or you will simply perish (over time).

I have personally experienced this first hand while building Analytics Vidhya. It still amazes me to see where we started and where we are today. During this period, there have been several ups and downs, several product launches, product re-launches and what not! But one thing has been a constant in our story – constant evolution!

So, when I got an invite to be a judge on the panel judging Capstone projects done by students of PGP in Data Science with ML & AI program at Praxis Business School, the same school where I had reviewed the program almost 4 years back – I was curious. I was curious to see and learn how their evolution had panned out.


My interaction with the students four years ago was quite different from my experience sitting in a panel of judges for Capstone projects. You get to see the final outcome coming from a rigorous program as opposed to just having a classroom interaction. This is like the proof of the pudding!

I was hoping to find out answers to 2 broad questions in the process:

With those questions in mind – I boarded an early morning flight to Bengaluru and was in the Praxis campus by 9:00 a.m. Since the evaluations were supposed to start at 10:30 a.m., I had some time on my hand.

I used this time to catch up with the course faculty, Gourab Nath, and the other judges on our esteemed panel: Suresh Bommu (Advanced Analytics Practice Head at Wipro Limited) and Rudrani Ghosh (Director at American Express Merchant Recommender and Signal Processing team).

I also grabbed some authentic South Indian breakfast in the process. 🙂

Program Details and Capstone Projects

For people who are not aware – Praxis Business School offers a year-long program – PGP in Data Science with ML & AI at both its campuses – Kolkata and Bengaluru. The program is structured in a manner where the first 9 months are spent in the classroom with in-house and industry faculty and the last 3 months are spent as an intern with an industry partner.

The Capstone project happens before the internship actually starts. So, students spent a total of 9 months in the classroom and had been doing these projects for the last 3 months (month 6 – month 9 in the curriculum).

How has the Program Evolved over the Years?

The last time I had visited Praxis was in 2015 and I was dead sure that the program would have evolved. The question was how much? In which direction? What are the key takeaways for the students and how are the students from Praxis doing in the real world?

So, let me share my findings based on the interaction with Gourab and the rest of the panel.

How Much has the Program Evolved? In which Direction?

The first noticeable change was the name of the program itself. Back in 2015, the Program was called PGP in Business Analytics as most of the material in the course was related to Business Analytics and Statistical Modelling.

Over time, the program has evolved a lot – I was surprised to see the number of topics that are covered in the program. Here is a screenshot of topics covered in the curriculum, picked directly from their site:

[Screenshot: topics covered in the curriculum]

The program now includes not only Machine Learning and Deep Learning, but also Big Data tools and business-focused topics. As far as I can see, it has become a comprehensive course for data scientists.

What are the key takeaways for the students undergoing the program?

I think the best way to judge this is to look at the projects. So – I held this off and the projects were sufficient proof by themselves.

Needless to say, I was pretty excited by these discussions and with the context of this evolution – I was ready for what the rest of the day was supposed to be.

Here are the views of Gourab Nath, part of the judging panel and Assistant Professor of Praxis’ Data Science Program:

Collection of images is a challenging task for projects that involve topics like face recognition. Previously we were using an approach which was a little time-consuming. So, this time we decided to take a more systematic approach to collecting the images, one that can save our participants a lot of time. The teams working on such projects designed and developed an easy-to-handle application for facial image collection. A participant was requested to sit in front of the computer where we had the software running, and all he/she needed to do was to enter his/her name and press a capture button to start the image collection process.
The students at Praxis Business School are highly encouraged not to be hugely dependent on the tools and the packages and to focus more on writing algorithms. This approach helps them to code better no matter what programming languages they use.

Capstone Projects by the Current Passing-Out Batch at Praxis Business School


A glance at the list of projects confirmed my views until now. I could see projects on Machine Learning, Natural Language Processing (NLP) and Computer Vision (CV).

More importantly – it looked like these projects were not based on some open datasets. The problems mentioned were unique and I was not aware of many open datasets addressing these problems. Now, I was curious and excited to see what students have and how they have done.

Here’s the list of Capstone Projects done by students at Praxis Business School:

Just to put things in perspective – most of the students presenting to us did not have any knowledge of predictive modeling and machine learning till July 2018 – when they started with the program.

Details of the Capstone Projects

Let’s look at each capstone project in a bit more detail to understand what it was about plus the tools and techniques used in each project.

Project 1 – Detection of Spam Reviews

Customer reviews have a huge influence on potential buyers of any product. False reviews can push that influence in either a positive or a negative direction. In either case, customers may make wrong decisions, and the trustworthiness of online opinions becomes an issue.

In this project, we investigate opinion spam in reviews.

Note that this problem is different from email spam classification. Email spam usually refers to unsolicited commercial advertisements to attract people towards some products or services and hence they usually contain some prominent features.

Our specific problem is more challenging because untruthful opinion spam is much harder to deal with. These kinds of spamming material can be carefully crafted and made indistinguishable.

Techniques: Shingle Method, n-grams, Feature Extraction
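The project summary lists shingles and n-grams without saying how they were applied. One plausible use, assumed here purely for illustration, is near-duplicate detection: spam reviews are often lightly edited copies of one another, so a high Jaccard similarity between the word shingles of two reviews is a warning sign. The reviews and function names below are invented.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (word n-grams) in a review."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical reviews.
review_1 = "best phone ever amazing battery life and great camera"
review_2 = "best phone ever amazing battery life and superb camera"
review_3 = "screen cracked after two days and support never replied"

similar = jaccard(shingles(review_1), shingles(review_2))
different = jaccard(shingles(review_1), shingles(review_3))
print(round(similar, 2), different)
```

The first two reviews share five of nine distinct shingles, while the third shares none. A similarity score like this could serve as one extracted feature among several, not as a spam verdict on its own.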

Project 2 – Opinion Mining on Mobile Phone Features

You open amazon.com and find that lots of customers have given great reviews about a well-branded mobile phone you are interested in. You wonder – are these good reviews due to the camera of the phone? Or, how good is the battery of the phone? And what about the display?

The number of reviews is really large, and it’s almost impractical for readers to go through all of them to evaluate the product, so answers to these kinds of questions can be really helpful in making decisions.

In this project, our focus is to identify various features of a mobile phone that the customers are talking about in their reviews and mine the customers’ opinion on these features.

Further, we focus on identifying the polarity of these opinions and summarizing the reviews. Finally, we develop a user interface that summarizes the opinions about the features of the phone and ranks the customer reviews based on their utility. We also propose an architecture that can do the same for the reviews of any mobile phone.

Tools: Python [Packages: NLTK, SpaCy, sklearn], Wix.com (for the website creation)

Techniques: Fuzzy Matching, POS tagging, Association Rules Mining, Compactness Pruning, Redundancy Pruning, identifying sentiments based on the word list and weights in AFINN and WordNet
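To make the word-list approach concrete, here is a minimal sketch of scoring opinions per phone feature. The lexicon below is a toy one written in the style of AFINN (integer weights); its entries, the feature keywords, and the function name are all assumptions for illustration, not the real AFINN list or the project's code:

```python
import re

# Toy lexicon in the style of AFINN (integer weights).
# These entries are illustrative assumptions, not the real AFINN list.
TOY_LEXICON = {"great": 3, "good": 2, "amazing": 4, "poor": -2, "bad": -3, "terrible": -4}

# Hypothetical feature keywords a phone review might mention.
FEATURES = ("camera", "battery", "display")


def feature_polarity(review: str) -> dict:
    """Sum lexicon weights per sentence and credit them to the features named there."""
    scores = {}
    for sentence in re.split(r"[.!?]+", review.lower()):
        words = re.findall(r"[a-z']+", sentence)
        weight = sum(TOY_LEXICON.get(word, 0) for word in words)
        for feature in FEATURES:
            if feature in words:
                scores[feature] = scores.get(feature, 0) + weight
    return scores


result = feature_polarity("The camera is amazing. Battery life is poor. Good display!")
```

A real system would also need the POS tagging and pruning steps listed above to discover the features automatically rather than reading them from a fixed tuple.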

Project 3 – Drowsiness Detection using Computer Vision

How many times has this happened to you – you started a movie on your computer at night and fell asleep in the middle of it? And when you woke up the next day, you simply had no clue how far you had watched? Happens to the best of us.

In this project, we focus on developing an application that will be able to detect if you are asleep and automatically pause the video for you. The system waits to see if you wake up in the next 30 minutes. In case you don’t, it will save a snapshot of the screen, close all the windows and shut down your computer automatically.

Tools: Python, OpenCV, TensorFlow, Keras

Techniques: Viola-Jones algorithm (“Rapid Object Detection using a Boosted Cascade of Simple Features”), Inception V3, LSTM

Project 4 – Gesture Recognition using Computer Vision

Picture this – you are watching a video on your computer but are feeling way too lazy to use the mouse or the keyboard to control the video player. Sounds familiar?

We have a solution for you!

In this project, we focus on making the computer recognize some special gestures which will enable one to control a video player by just using those gestures.

For example, showing your palm in front of the system will enable the pause and the un-pause function. You will also be able to control the volume, fast forward a video or rewind it. You will also be able to do a wide range of other things like changing the slides of your PPT, changing pages, scrolling, etc. without grabbing your mouse or keyboard.

Techniques: Green Screen (for background subtraction), Single-Shot Multi-box Detector (SSD)

Project 5 – Team Selection using Computer Vision

Students are asked to create teams for their projects or their assignments, which is of course a very common thing in every school and college. The class representative (CR) creates a Google spreadsheet and shares it with everyone.

Students, after deciding who they want to team up with, populate the spreadsheet with the names of their team members. But the CR must remember the rules given by their Professor – the team size should be three and every team must have one female member at least.

So, the CR checks the restrictions and if everything is fine, he/she shares it with the Professor. This is one way to do it.

Or, you can do it the smart way.

You stand with your teams in front of the computer, the computer checks the restrictions, recognizes you, and fills in the database with your names and photos.

But remember, the computer won’t allow you to register if the constraints are not satisfied or if at least one member of your team is already registered as a member of another team. So, you cannot fool it!

Techniques: VGG-NET 19, HOG Detector
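Once the vision models have recognized who is standing in front of the camera, the registration rules described above reduce to a few checks. The sketch below covers only that rule-checking step; the `Student` class, the `can_register` function, and the assumption that a gender label arrives from the recognition step are all hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Student:
    name: str
    gender: str  # "F" or "M" in this sketch; assumed to come from the recognition step


def can_register(team, registered, team_size=3):
    """Return (allowed, reason) for a team, given names already registered elsewhere."""
    if len(team) != team_size:
        return False, f"team must have exactly {team_size} members"
    if not any(student.gender == "F" for student in team):
        return False, "team needs at least one female member"
    taken = [student.name for student in team if student.name in registered]
    if taken:
        return False, "already registered: " + ", ".join(taken)
    return True, "ok"


# Hypothetical students, invented for illustration.
team = [Student("Asha", "F"), Student("Ravi", "M"), Student("Imran", "M")]
first_try = can_register(team, registered=set())
second_try = can_register(team, registered={"Ravi"})
```

The team size of three and the one-female-member rule come from the description above; everything else is an illustrative choice.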

Project 6 – Attendance Tracking System using Computer Vision

In this project, we developed a system to record class attendance using computer vision.

After a faculty member logs in to the system with a password and sets the period, the camera opens up to capture pictures of the class. The snapshots of the class are first passed through a face detector, followed by a face recognizer.

After the system recognizes the students, it updates the attendance spreadsheet and saves the captured image in its respective image directory – labeling it by the date and time of the day. The unidentified students are marked as absent.

Techniques: Haar Cascade Classifier, HOG, Siamese Model (One Shot Learning), kNN

Project 7 – Recommender System for Fashion Apparel

The use of a recommender system in e-commerce companies is a highly targeted approach that can generate a high conversion rate. These systems help customers discover the products which they might be interested in and will likely purchase.

In this project, we have created a recommender system for a small fashion apparel business that (1) allows customers to search by the image of a product, (2) gives personalized recommendations to heavy buyers, and (3) displays the items most frequently purchased together with the selected item.

Tools: Python

Techniques: kNN, Collaborative Filtering, Content-Based Filtering, Autoencoders
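Of the techniques listed, collaborative filtering is the easiest to sketch in a few lines. The example below is an item-based variant using cosine similarity over purchase counts; the data, names, and the choice of the item-based variant are assumptions for illustration and not taken from the project:

```python
from math import sqrt

# Hypothetical purchase counts: customer -> {item: quantity}. Invented for illustration.
purchases = {
    "ana": {"jeans": 2, "tshirt": 1},
    "ben": {"jeans": 1, "tshirt": 1, "jacket": 1},
    "cy": {"scarf": 2, "jacket": 1},
}


def item_vector(item):
    """Represent an item by how much each customer bought of it."""
    return {user: basket[item] for user, basket in purchases.items() if item in basket}


def cosine(u, v):
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def similar_items(item, top_n=2):
    """Items whose buyers overlap most with the buyers of the given item."""
    items = {name for basket in purchases.values() for name in basket}
    target = item_vector(item)
    scored = [(cosine(target, item_vector(other)), other) for other in items if other != item]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored[:top_n] if score > 0]


for_jeans = similar_items("jeans")
for_scarf = similar_items("scarf")
```

Items bought by the same customers score highest, which is one plausible way to back a "frequently purchased with this item" display.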

Project 8 – Nearest Document Search

In this project, we have created a nearest document search engine for news reading. The application will not just recommend related news but also give you the sentiment and highlight important words associated with each story. If an article is long and you do not want to read it in full, fair enough: this app will have a summarized version ready for you.

Techniques: kNN, KDTree, Word Cloud, Lex Rank Summarizer
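A nearest document search can be sketched as TF-IDF vectors plus a nearest-neighbour lookup. Note the swap: the project lists a KDTree, while this sketch uses a brute-force cosine comparison, which is simpler to show and fine for a handful of documents. The headlines, function names, and the smoothed IDF formula are assumptions for illustration:

```python
import math
from collections import Counter

# Hypothetical headlines, invented for illustration.
docs = [
    "stock markets rally as inflation cools",
    "central bank holds rates as inflation cools",
    "local team wins the football final",
]


def tfidf_vectors(texts):
    """Sparse TF-IDF vectors (word -> weight) with a smoothed IDF term."""
    tokenized = [text.lower().split() for text in texts]
    doc_freq = Counter(word for words in tokenized for word in set(words))
    n = len(texts)
    vectors = []
    for words in tokenized:
        tf = Counter(words)
        vectors.append({w: tf[w] * (math.log((1 + n) / (1 + doc_freq[w])) + 1) for w in tf})
    return vectors


def cosine(u, v):
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def nearest(index, vectors, k=1):
    """Indices of the k documents most similar to the document at `index`."""
    scored = [(cosine(vectors[index], vec), j) for j, vec in enumerate(vectors) if j != index]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [j for _, j in scored[:k]]


vectors = tfidf_vectors(docs)
neighbours = nearest(0, vectors, k=1)
```

For a large news archive, a tree or approximate index would replace the linear scan in `nearest`.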

How relevant were these projects for the Industry?

One of the most critical questions I had was – are these projects industry relevant? Bridging the gap between academia and industry has been a significant challenge in data science. It turns out the answer is quite comprehensive.

In the last 4 years, the number of companies hiring has increased 4 times (from 15 in 2015 to 60 in 2018-19) and the average salary has nearly doubled (from 5 LPA in 2015 to 9 LPA in 2018-19).

So, here are the thoughts of my fellow panelists on this topic:

“I am very impressed on the scope, objectives, and contents of the capstone projects executed by Praxis students. The majority of the projects are around the application of deep learning concepts which they have learned as a part of the course work.   The entire project execution and development activities were well planned and organized. Starting from defining the problem statement, challenges, real-time application and finally presenting the results.” – Suresh Bommu, Advanced Analytics Practice Head at Wipro Limited
“What really stood out for me was the effort put in by students in attempting to create an end-to-end product with a UI as well as the variety of projects and its extended application.” – Rudrani Ghosh, Director at American Express Merchant Recommender and Signal Processing team

Key Takeaways from the day

I loved the day and would live it again without second thoughts. But there were a few things which stood out for me:

It was great to see the high level of projects presented by these students. As I mentioned, I was glad to see the students picking up challenging problems on datasets that are not openly available.

At the end of the day, I had to rush back to the airport. Day trips to Bengaluru are bad! And the fact that I had to rush through projects for a few students only made it worse. I would have loved to spend more than a day – the energy of the class, the faculty and the judges was infectious 🙂 Looking at these projects – I can confidently say that Praxis Business School continues to offer one of the best full-time programs in Machine Learning and Deep Learning in India.

About the Author

Kunal Jain

Kunal is a post graduate from IIT Bombay in Aerospace Engineering. He has spent more than 10 years in the field of Data Science. His work experience ranges from mature markets like the UK to a developing market like India. During this period he has led teams of various sizes and has worked on various tools like SAS, SPSS, Qlikview, R, Python and Matlab.

12 Data Science Projects To Try (From Beginner to Advanced)

Sakshi Gupta

In this article

- What Is a Data Science Project?
- Data Science Projects To Try
- Datasets for Data Science Project Ideas
- Tips for Creating Interesting Data Science Projects
- Data Science Projects FAQs

From breast cancer detection to user experience design, businesses across the globe are leveraging data science to solve a wide range of problems. Every mobile/web-based product or digital experience today demands the application of data science for personalization, customer experience, and so on. This opens up a world of opportunities for data science professionals.

To land a data science job, however, early career professionals need more than just a strong theoretical foundation. Hiring managers today are looking for data scientists who have the hands-on experience of delivering projects that solve real-world problems. Even before you land your first job, you need to have ‘experience’ demonstrating your ability to deliver them. No sweat. We’ve brought help.

A data science project is a practical application of your skills. A typical project allows you to use skills in data collection, cleaning, analysis, visualization, programming, machine learning, and so on. It helps you apply your skills to real-world problems. On successful completion, you can also add it to your portfolio to show your skills to potential employers.

Whether you’re a complete beginner or one with advanced skills, you can gain hands-on experience by trying out projects on your own or working with peers. To help you get started, we’ve curated a list of the top 15 interesting data science projects to try. See what catches your fancy and get started!

Beginner Data Science Projects

“Eat, Rate, Love”: An Exploration of R, Yelp, and the Search for Good Indian Food

When it comes time to eat, many people turn to Yelp to choose the best options for the type of food they’re looking for. They search, eat, rate, and leave reviews for the restaurants they’ve visited. This makes Yelp a great source of data to run data science projects. 

Springboard Data Science Bootcamp graduate Robert Chen chose this data to explore whether the best reviews led to the best Indian restaurants. While searching Yelp, Chen discovered that there were many recommended Indian restaurants with similar scores. Certainly, not all the reviewers had the same knowledge of this cuisine, right? With this in mind, he took into consideration the following:

His modification to the data and the variables showed that those with Indian names tended to give good reviews to only one restaurant per city out of the 11 cities he analyzed, thus providing a clear choice per city for restaurant patrons.

Yelp’s data has become popular among newcomers to data science. You can access it here. Find out more about Robert’s project here.

Customer Segmentation with R, PCA, and K-Means Clustering

Marketers perform complex segmentation across demographic, psychographic, behavioral, and preference data for each customer to deliver personalized products and services. To do this at scale, they leverage data science techniques like unsupervised learning.

Data scientist Rebecca Yiu’s project on market segmentation for a fictional organization, using R, principal component analysis (PCA), and K-means clustering, is an excellent example of this. She uses data science techniques to identify the prospective customer base and applies clustering algorithms to group them. She classifies customers into clusters based on age, gender, region, interests, etc. This data can then be used for targeted advertising, email campaigns, and social media posts. 
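To see what the clustering step does, here is a minimal K-means sketch in Python (the project itself used R). The customer data, the two features, and the deterministic starting centroids are assumptions for illustration; a real analysis would scale the features first, and could reduce them with PCA as the project did:

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
        for p in points
    ]


def kmeans(points, k, iterations=20):
    centroids = list(points[:k])  # naive deterministic start; real code would randomise
    for _ in range(iterations):
        labels = assign(points, centroids)
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assign(points, centroids)


# Hypothetical customers as (age, monthly spend); invented for illustration.
customers = [(22, 30), (25, 35), (24, 32), (51, 90), (55, 95), (53, 92)]
centroids, labels = kmeans(customers, k=2)
```

Each label is a segment number, so customers sharing a label could receive the same campaign.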

You can learn more about her data science project here.

Road Lane Line Detection

To follow lane discipline, self-driving cars need to detect lane lines. Data science and machine learning can play a crucial role in making this happen. Using computer vision techniques, you can build an application to autonomously identify lane lines from continuous video frames or image inputs. Data scientists typically use the OpenCV library, NumPy, the Hough Transform, Spatial Convolutional Neural Networks (CNN), etc., to achieve this.
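The Hough Transform is the least obvious item in that list, so here is a minimal pure-Python sketch of the voting idea behind it. In practice you would run an edge detector first and call a library routine such as OpenCV's `cv2.HoughLines`; the function names and the sample pixels below are assumptions for illustration. Every edge pixel votes for all lines (theta, rho) that pass through it, where rho = x·cos(theta) + y·sin(theta), and the most-voted pair is the detected line:

```python
import math
from collections import Counter


def hough_lines(points, theta_step_deg=1):
    """Vote in (theta, rho) space, where rho = x*cos(theta) + y*sin(theta)."""
    votes = Counter()
    for x, y in points:
        for theta_deg in range(0, 180, theta_step_deg):
            theta = math.radians(theta_deg)
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(theta_deg, rho)] += 1
    return votes


def strongest_line(votes):
    """Most-voted (theta, rho); ties go to the smallest theta, then the smallest rho."""
    (theta_deg, rho), count = max(
        votes.items(), key=lambda kv: (kv[1], -kv[0][0], -kv[0][1])
    )
    return theta_deg, rho, count


# Hypothetical edge pixels along the vertical line x = 5, e.g. a lane marking seen head-on.
edge_pixels = [(5, y) for y in range(10)]
theta_deg, rho, count = strongest_line(hough_lines(edge_pixels))
```

Here theta = 0 with rho = 5 describes the vertical line x = 5, and all ten pixels vote for it.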

You can access a sample video for this project from this Git repository here.

Intermediate Data Science Projects

NFL Third and Goal Behavior

The intersection of sports and data is full of opportunities for aspiring data scientists. Divya Parmar, a lover of both, decided to focus on the NFL for his capstone project during Springboard’s Introduction to Data Science course. His goal was to determine the efficiency of various offensive plays in different tactical situations.

Parmar collected play-by-play data from Armchair Analysis, and used R and RStudio for analysis. He developed a new data frame and used conventional NFL definitions. Through this project, he learned to:

You can access the dataset here.

Who’s a Good Dog? Identifying Dog Breeds Using Neural Networks

Image classification is one of the most popular and in-demand kinds of data science project. Classifying dogs by breed from their images is a well-loved example. Garrick Chu, a graduate of Springboard’s Data Science Career Track, chose this for his final year submission.

One of Garrick’s goals was to determine whether he could build a model that would be better than humans at identifying a dog’s breed from an image. Because this was a learning task with no benchmark for human accuracy, once Garrick optimized the network to his satisfaction, he went on to conduct original survey research to make a meaningful comparison.

He worked with large data sets to effectively process images (rather than traditional data structures), which involved network design and tuning, avoiding over-fitting, transfer learning (reusing neural nets trained on other data sets), and performing exploratory data analysis.

To do this, he leveraged neural networks with Keras through Jupyter notebooks. You can explore more of Garrick’s work here and access the data set he used here.

Uber’s Pickup Analysis

“Is Uber Making NYC Rush-Hour Traffic Worse?” was one of the four questions answered by FiveThirtyEight, a data-driven news website now owned by ABC. If you are looking to improve your data analysis and data visualization skills, this is a great data science project.

For this, FiveThirtyEight obtained Uber’s rideshare data and analyzed it to understand ridership patterns, how Uber interacts with public transport, and how it affects taxis. They then wrote detailed news stories supported by this data analysis. You can read their work of data journalism here. You can access the original data on GitHub.

Predicting Restaurant Success

Here is another Yelp-based project, but more complex than the one we discussed earlier. Data scientist Michail Alifierakis used Yelp data to build his “Restaurant Success Model” to evaluate the success/failure rates of restaurants. He uses a linear logistic regression model for its simplicity and interpretability, optimized for the precision of open restaurants using grid search with cross-validation.
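The combination of logistic regression, a precision target, and grid search with cross-validation can be sketched without any libraries. This is not Alifierakis's code: the data is invented, the hyperparameter being searched (L2 regularization strength) is an assumption, and in practice you would more likely reach for scikit-learn's `LogisticRegression` and `GridSearchCV` than write the training loop by hand:

```python
import math


def precision(y_true, y_pred):
    """Of the restaurants predicted open (1), the share that really are open."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if tp + fp else 0.0


def fit_logistic(X, y, l2, epochs=300, lr=0.5):
    """Batch gradient descent on the L2-penalised logistic loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [l2 * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            grad_w = [g + err * xj / n for g, xj in zip(grad_w, xi)]
            grad_b += err / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(model, X):
    w, b = model
    return [1 if b + sum(wj * xj for wj, xj in zip(w, xi)) > 0 else 0 for xi in X]


def cross_val_precision(X, y, l2, folds=3):
    """Mean precision over folds; row i belongs to test fold i % folds."""
    scores = []
    for f in range(folds):
        train = [i for i in range(len(X)) if i % folds != f]
        test = [i for i in range(len(X)) if i % folds == f]
        model = fit_logistic([X[i] for i in train], [y[i] for i in train], l2)
        scores.append(precision([y[i] for i in test], predict(model, [X[i] for i in test])))
    return sum(scores) / folds


# Hypothetical, pre-scaled features: (average rating, review volume); label 1 = still open.
X = [(0.9, 0.8), (0.2, 0.1), (0.8, 0.9), (0.3, 0.2), (0.7, 0.6), (0.1, 0.3),
     (0.85, 0.7), (0.25, 0.15), (0.75, 0.85), (0.35, 0.1), (0.95, 0.9), (0.15, 0.2)]
y = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]

grid = [0.0, 0.01, 0.1, 1.0]  # candidate L2 strengths (an assumed search space)
results = {l2: cross_val_precision(X, y, l2) for l2 in grid}
best_l2 = max(grid, key=lambda l2: results[l2])
```

Optimizing for precision on the "open" class means preferring a model that is rarely wrong when it says a restaurant will survive, which suits the lender and investor use case mentioned below.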

This is a great data science use case for lenders and investors, helping them make profitable financial decisions. You can learn more about the project from here and take a look at the code on GitHub.

Predictive Policing

Many law enforcement agencies worldwide are moving towards data-driven approaches to forecasting and preventing crimes. They leverage data science technologies to automate the pattern detection process, which helps reduce the burden on crime analysts. Data scientist Orlando Torres launched a data science project on predictive policing, albeit with unexpected results. He used data from the open data initiative, training the model on 2016 data to predict the crime incidents in a given zip code, day, and time in 2017. He used linear regression, a random forest regressor, K-nearest neighbors, XGBoost, and a deep learning model (a multilayer perceptron).

With this data science project, he learned that it is very easy to lose explainability while building models. He writes, “if we start sending more police to the areas where we predict more crime, the police will find crime. However, if we start sending more police anywhere, they will also find more crime. This is simply a result of having more police in any given area trying to find crime.” Given the number of law enforcement agencies using data science for policing, it almost feels like a self-fulfilling prophecy.

You can read more about his project here.

Building Chatbots

Today, businesses are automating their customer services with chatbots. Creating your own chatbot can be a great data science project too. The two types of chatbots available today are domain-specific chatbots and open-domain chatbots. Both rely on Natural Language Processing (NLP), often with Recurrent Neural Networks (RNN). As an intermediate data scientist, you can perhaps take this up a notch: try creating a sensitive chatbot with capabilities to detect user sentiment.

Patrick Meyer runs a data science project of this kind. He discusses using the polarity system to identify happy, neutral, and unhappy; Paul Ekman’s initial model with six emotions—anger, disgust, fear, joy, sadness, and surprise or his extended list of sixteen; Robert Plutchik’s wheel of emotions and Ortony, Clore, and Collins (OCC) model. 

You can learn more about his detection techniques here and access the dataset here.

Advanced Data Science Projects

Amazon vs. eBay Analysis

Finding the lowest price for a product on the Internet makes up a significant part of online shopping. Chase Roberts decided to make that easier. In support of a Chrome extension he was building, Roberts built a shopping cart with 3,520 products and compared their prices on eBay vs. Amazon. The results showed the potential for substantial savings. Here’s what he found:

You can read more about his project, starting with how he gathered the data and documenting the challenges he faced during this process.

Fake News Detection

A recent study revealed that false news spread faster and reached more people than the truth, and around 52% of Americans shared that they regularly encountered fake news online. A four-person team from the University of California at Berkeley built a fake news classifier. For this, the team focussed on clickbait and propaganda, two common forms of fake news. They then developed a classifier that would detect these two forms. Their process involved:

You can learn more about this and try it out here.

Audio Snowflake

When you think about interesting data science projects, chances are you think about how to solve a particular problem, as seen in the examples above. But what about creating a project for the sheer beauty of the data? For her Hackbright Academy project, Wendy Dherin did just that. 

She developed Audio Snowflake to create a splendid visual representation of music as it played, capturing specific components like tempo, key, mood, and duration. Audio Snowflake mapped both quantitative and qualitative characteristics of songs to visual traits like saturation, color, rotation speed, and the figures it produces.

Read more on this project here.

Visualizing Climate Change

2020 was recorded as the warmest year to date by NASA, and the last seven years have been the warmest seven years on record. Climate change is one of the most pressing issues humans face today. It is more important than ever to spread awareness and inform people of the magnitude of this problem. Data visualization can play a crucial role in that. 

Data scientist Giannis Tolios did a project in which he visualized the changes in global mean temperatures and the rise of CO2 levels in the atmosphere using Python. He used libraries such as Pandas, Matplotlib, and Seaborn to work with the data, visualizing it in line graphs and scatterplots. If climate change is a topic you want to work on, you can learn more about the project here.

Democratizing Data Science at Uber

One of the key challenges in data science is that it requires one to be a mathematician or a statistician even to make basic predictions and forecasts. Uber’s data science platform overcomes this challenge by automating forecasting using pre-built algorithms and tools, enabling everyone on the team to get predictions as long as they have data. 

Director of Data Science at Uber, Franziska Bell, talks about how they plan to give the capabilities of a data scientist to every Uber employee. This way, Uber uses artificial intelligence, machine learning, and data science to solve real-world problems. Read more about it here.

Credit Card Fraud Detection

With online and digital transactions gaining popularity, the chances of fraud are also on the rise. Banks and financial institutions are therefore looking to leverage data science techniques to identify fraudulent transactions and prevent them from being executed. By processing data across customer location, behavior, transaction value, network, payment method, etc., you can train an algorithm to detect anomalies. You can build your classification engine for fraud detection using decision trees, K-nearest neighbors, logistic regression, support vector machines, random forests, or XGBoost.
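As a taste of what such a classification engine looks like, here is a minimal K-nearest neighbors sketch, one of the algorithms named above. The transactions, the two features, and the function name are invented for illustration; real fraud data is far larger and heavily imbalanced, which this toy example ignores:

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Majority label among the k training points closest to the query."""
    nearest = sorted(train, key=lambda row: dist(row[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical transactions: (scaled amount, scaled distance from home) -> 1 = fraud, 0 = legitimate.
transactions = [
    ((0.10, 0.10), 0), ((0.20, 0.10), 0), ((0.15, 0.20), 0), ((0.10, 0.30), 0),
    ((0.90, 0.80), 1), ((0.80, 0.90), 1), ((0.95, 0.70), 1),
]

suspicious = knn_predict(transactions, (0.85, 0.85))  # large purchase far from home
ordinary = knn_predict(transactions, (0.12, 0.15))    # small purchase close to home
```

Because K-nearest neighbors compares raw distances, the features need to be on comparable scales, which is why the sample values are already scaled to the 0 to 1 range.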

To get started, you can find datasets here.

Datasets for Data Science Project Ideas

Here are some online data sources that you can access and download for free for your data science projects:

VoxCeleb. A gender-balanced, audio-visual data set containing short clips of human speech from speakers of different ages, professions, accents, etc. They are extracted from interviews uploaded to YouTube. It can be used for various applications like speech separation, speaker identification, emotion recognition, etc.

Boston Housing Data. A fairly small data set based on information collected by the U.S. Census Bureau regarding housing in Boston. This data set can be used for benchmarking, focusing on the regression problem.

Kaggle. With over 50,000 public datasets on a wide range of topics, you can find all the data and code that you require for your data science project ideas. They also offer competition data sets that are clean, detailed, and curated.

National Centers for Environmental Information. The largest storehouse of environmental data in the world, this provides information on oceanic, atmospheric, meteorological, geophysical, and climatic conditions, and more.

Global Health Observatory. If you are interested in doing projects in the health industry, then this is the best place to get the data you need. It also has some of the latest COVID-19 data.

Google Cloud Public Datasets. A place where you can access data sets that are hosted by BigQuery, Cloud Storage, Earth Engine, and other Google Cloud services.

Amazon Web Services Open Data Registry. This has an extensive repository of data sets that you can either download and use or analyze on the Amazon Elastic Compute Cloud (Amazon EC2). You need to first create a free AWS account to get access to the data sets.

Tips for Creating Interesting Data Science Projects

To help you navigate the world of data science projects, we asked Springboard mentors and instructors for their advice. Here’s what they had to say. 

Choose the Right Problem

If you’re a data science beginner, it’s best to consider problems that have limited data and variables. Otherwise, your project may get too complex too quickly, potentially deterring you from moving forward. Choose one of the data sets in this post, or look for something in real life that has a limited data set. Data wrangling can be tedious work, so it’s critical, especially when starting out, to make sure the data you’re manipulating and the larger topic is interesting to you. These are challenging projects, but they should be fun!

Breaking Up the Project Into Manageable Pieces

Your next task is to outline the steps you’ll need to take in order to create your data science project. Once you have your outline, you can tackle the problem and develop a model to prove your hypothesis. You can do this in six steps:

Generate Your Hypotheses

After you have your problem, you need to create at least one hypothesis to help solve the problem. The hypothesis is your belief about how the data reacts to certain variables. 

This is, of course, dependent on you obtaining the general demographics of specific neighborhoods. You will need to create as many hypotheses as you need to solve the problem.

Study the Data

Your hypotheses need to have data that will allow you to prove or disprove them. Look in the data set for variables that affect the problem. If you do not have the data, either dig deeper or change your hypothesis.

Clean the Data

As much as data scientists prefer to have clean, ready-to-go data, the reality is seldom neat or orderly. You may have outlier data that you can’t readily explain, like a sudden large, one-time purchase of an expensive item in a store that is in a lower-income neighborhood. Or maybe one store didn’t report data for a week.

These are all problems with the data that aren’t the norm. In these cases, it’s up to you as a data scientist to remove those outliers and add missing data so that the data is more or less consistent. Without these changes, your results will become skewed, and the outlier data will affect the results, sometimes drastically.
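One common way to handle both problems is sketched below: flag outliers with the interquartile range (IQR) rule and fill them, along with missing entries, with the median of the remaining values. The 1.5 × IQR cut-off, the median fill, the function name, and the sample numbers are assumptions for illustration; whether an outlier should be removed at all depends on why it occurred:

```python
import statistics


def clean_series(values):
    """Replace missing entries (None) and IQR outliers with the median of the rest."""
    known = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(known, n=4)
    low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    typical = [v for v in known if low <= v <= high]
    fill = statistics.median(typical)
    return [v if v is not None and low <= v <= high else fill for v in values]


# Hypothetical weekly sales: one week was not reported, one had a one-off bulk purchase.
weekly_sales = [100, 102, 98, None, 101, 99, 500, 103]
cleaned = clean_series(weekly_sales)
```

It is worth keeping a record of which values were changed, so that the cleaning step can be explained and revisited later.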

Engineer the Features

At this stage, you need to start assigning variables to your data. You need to factor in what will affect your data. Does a heatwave during the summer cause sales to drop? Does the holiday season affect sales in all stores and not just middle-to-high-income neighborhoods? Things like seasonal purchases become variables you need to account for.
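In code, engineering features often means deriving new columns from the raw ones. A minimal sketch for the seasonal questions above follows; the function name, the 35 °C heatwave threshold, treating December as the holiday season, and the northern-hemisphere summer months are all assumptions for illustration:

```python
from datetime import date


def engineer_features(day: date, temperature_c: float) -> dict:
    """Derive model inputs from a sale date and that day's temperature."""
    return {
        "month": day.month,
        "is_weekend": day.weekday() >= 5,  # Saturday = 5, Sunday = 6
        "is_holiday_season": day.month == 12,  # assumed definition
        # Assumed threshold; summer months are for the northern hemisphere.
        "is_heatwave": day.month in (6, 7, 8) and temperature_c >= 35.0,
    }


summer_day = engineer_features(date(2023, 7, 18), 38.0)
winter_day = engineer_features(date(2023, 12, 24), 2.0)
```

Each of these flags then becomes a variable the model can use to explain a rise or drop in sales.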

Create Your Predictive Models

At some point, you’ll have to come up with predictive models to support your hypotheses. For example, you’ll have to write code to predict sales. You may explore whether an after-Christmas sale increases profits and, if so, by how much. You may find that a certain discount percentage earns more money than others, given the volume and overall profit.
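The simplest predictive model worth trying first is a straight line fitted by least squares. The sketch below predicts sales from a discount percentage; the numbers are invented (and deliberately perfectly linear), and the function name is hypothetical:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data: discount percentage vs. units sold, invented for illustration.
discount_pct = [0, 10, 20, 30]
units_sold = [100, 120, 140, 160]

intercept, slope = fit_line(discount_pct, units_sold)
predicted_at_40 = intercept + slope * 40
```

Real sales data will not sit on a perfect line, so you would also hold out some data to check how far the predictions are from what actually happened.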

Communicate Your Results

In the real world, all the analysis and technical results you come up with are of little value unless you can explain to your stakeholders what they mean in a comprehensible and compelling way. Data storytelling is a critical and underrated skill that you must develop. To finish your project, you’ll want to create a data visualization or a presentation that explains your results to non-technical folks.

How Do You Measure the Success of Data Science Projects?

As a learner, the most critical measure of success is that you have put your skills and knowledge into practice. Good data science projects not only show that you can solve problems but also show potential employers how you approach problem-solving. As long as you can add your project to your portfolio, consider it successful.

How Can You Find Interesting Data Science Projects To Try?

This blog post should get you started on various projects you could take up. Online courses like the Springboard Data Science Bootcamp include real-world projects that strengthen your portfolio. You can contribute to open-source projects. You can also participate in competitions on platforms like Kaggle and DrivenData to improve your model-building skills.

How Can You Showcase Your Data Science Projects?

You can:

- Include them in your resume
- Link to them from your LinkedIn profile
- Maintain an active GitHub account
- Create your portfolio website
- Write case studies of your projects and publish them on a blog/Medium

samcha

Oct 5, 2018

Data science capstone ideas (and how to get started)

Capstones are standalone projects meant to integrate, synthesize, and demonstrate all your data science knowledge in a multi-faceted way. Capstone projects show your readiness for using data science in real life, and are ideally something you can add to your resume, show to employers, or even use to start a career.

I find data science capstone ideas are like puppies: you want all of them, but can only keep one. Below is a list of some of my ideas and starting points.

Idea #1: Nutritional analysis from Instacart orders

In 2017 Instacart released a dataset of over 3 million grocery orders from over 200,000 users as a Kaggle competition. With a dataset this juicy, a few ideas immediately come to mind:

The first and second are doable with the data you already have, which is nice.

The third was my personal choice, using the USDA food composition database to look up products and create a nutritional breakdown (by the way, they have an API). But it also introduced a lot of hurdles:

- Users don’t eat everything they order (e.g. cat food, soap, toilet paper). This would require a lot of cleaning and munging.

- Users don’t order just for themselves (e.g. companies, birthday parties, families).

- Users order on different timelines (e.g. once per week, once every two weeks, once a month).

- Items such as deli food may not have entries in the USDA database.
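The first hurdle above (non-food items) can be sketched in a few lines. This is only a minimal illustration, assuming each order row has already been joined with a department label; the field names (`user_id`, `product_name`, `department`) and the set of non-food departments are assumptions for the example, so check them against the actual files before relying on them.

```python
from collections import defaultdict

# Hypothetical department labels treated as non-food; adjust to the real data.
NON_FOOD_DEPARTMENTS = {"household", "personal care", "pets"}


def food_items_per_user(order_rows):
    """Group product names by user, skipping rows from non-food departments."""
    basket = defaultdict(list)
    for row in order_rows:
        department = row["department"].strip().lower()
        if department in NON_FOOD_DEPARTMENTS:
            continue  # e.g. cat food, soap, toilet paper
        basket[row["user_id"]].append(row["product_name"])
    return dict(basket)


# Made-up rows for illustration only (not taken from the Instacart dataset).
sample_rows = [
    {"user_id": 1, "product_name": "Bananas", "department": "produce"},
    {"user_id": 1, "product_name": "Cat Food", "department": "pets"},
    {"user_id": 2, "product_name": "Toilet Paper", "department": "household"},
    {"user_id": 2, "product_name": "Whole Milk", "department": "dairy eggs"},
]
food_by_user = food_items_per_user(sample_rows)
```

A department-level filter like this is crude (it will not catch every non-food product), but it is a cheap first pass before any product-by-product lookup in a nutrition database.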

The fourth would also utilize the USDA database, but would not require any user-specific information or messing about with time-series.

Idea #2: Predicting solar output from satellite imaging/historical weather

One of the big issues with mainstream adoption of solar power is that, unlike other energy sources (hydroelectric, oil, nuclear), you can’t control how long the sun shines for. Overestimating this amount means losses for producers and investors, and downtime for users. Underestimating means a lower chance of adoption in upfront decision-making. Sounds like a job for… machine learning!

Many datasets can be found at NREL; however, they cover different years and different locations, with limits on how much you can download at once. They have an API, which is useful.

SolarAnywhere has an academic license, allowing you to look up any location (but only for the year 2013). They too have an API.

There is also the NREL NSRDB data viewer.

There are three immediate approaches I can think of:

- Using previous solar output to predict current solar output (time-series or RNN).

- Using weather datasets

- Using satellite imaging datasets

There are a lot of academic papers on this last subject (a quick Google Scholar search returns about 30,000 results), but not a lot of publicly available satellite time-series datasets.
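Before reaching for a time-series model or an RNN for the first approach, it helps to have a baseline to beat. Here is a minimal sketch of a persistence forecast, which simply predicts that the output will equal the value observed `lag` steps earlier. The function names and the sample numbers are made up for illustration and are not from any of the datasets above.

```python
def persistence_forecast(series, lag=1):
    """Predict each value as the value observed `lag` steps earlier."""
    if lag < 1 or lag >= len(series):
        raise ValueError("lag must be between 1 and len(series) - 1")
    return series[:-lag]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between paired actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up hourly output values (arbitrary units), for illustration only.
hourly_output = [0.0, 1.0, 3.0, 4.0, 3.0, 1.0, 0.0]
lag = 1
predicted = persistence_forecast(hourly_output, lag)
actual = hourly_output[lag:]  # the values the forecast is scored against
baseline_mae = mean_absolute_error(actual, predicted)
```

With hourly data, a lag of 24 (same hour yesterday) is another natural baseline to try. Any model built on weather or satellite data should at least improve on these errors.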

Idea #3: Fake news detection

This is a hot one. Without going into full rant-mode, fake news is obviously deleterious for democracy and individual mental stability.

So how to accurately identify what’s fake and what’s true? Here are a few leads on this as a data science problem:

1. Fake News Challenge

This is the best-formatted challenge around this topic, with organizers, advisors, and volunteers from the academic, ML, and fact-checking communities. Includes GitHub repos of winning submissions. Check out the competition page on Codalab.

2. Snopes Junk News

A starting point for well-verified fake news stories vs. actual events.

3. Getting Real About Fake News — Kaggle Dataset

A collection of nearly 13,000 items from 244 websites tagged “BS” from the BS Detector Chrome extension. The BS Detector is powered by Open Sources, a project that classifies biased and fake websites.
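If you frame the problem the way the Fake News Challenge does, as comparing a headline against an article body, one very simple starting feature is how many words the two share. The sketch below computes a Jaccard-style word overlap. The tokenizer and function names are my own assumptions for illustration, not code from any of the sources above, and a real submission would need far richer features.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def word_overlap(headline, body):
    """Jaccard similarity between headline words and body words (0.0 to 1.0)."""
    headline_words = tokenize(headline)
    body_words = tokenize(body)
    union = headline_words | body_words
    if not union:
        return 0.0  # both texts empty: define the overlap as zero
    return len(headline_words & body_words) / len(union)


# Made-up strings for illustration only.
score = word_overlap("Cats fly", "Cats sleep")
```

A low overlap hints that the body may be unrelated to the headline. It says nothing on its own about whether a story is true, so treat it as one input to a classifier rather than a detector.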

Where To Get More Ideas

Never stop searching! Here are some ways to get more leads, either in the form of project ideas or datasets to use.

1. Academic papers

2. Kaggle Competitions

3. Kaggle Datasets

4. reddit.com/r/datasets

5. Awesome Public Datasets GitHub Repo

6. Google Datasets


All Capstone Projects (2017-2021)

A Data-Driven Approach to Forecasting the U.S. Beer Industry

Assortment Optimization

Suggested Order Quantities

Natural Language Processing for Customer Experience Evaluation

Business Churn Projection and Prediction

Revenue Integrity: Fraudulent Booking Identification

ARCA COCA-COLA

Portfolio Recommendation System for A Leading Coca-Cola Bottler

Prioritizing Customers Visits

True Sales Potential: Unleashing the untapped opportunity

ASSURANCE IQ

Predicting Approval and Denial Rates for Insurance Shoppers

Fostering Innovative Outreach Methods to Engage with New and Existing Customers

Demand Forecasting for a Luxury Fashion Retailer

Trend Forecasting to Quantify Consumer Sentiment

Customer Retention & Targeted Recommendations

Option Take-Rate Forecasting for the BMW Group

Automotive Noise Mining and Classification

Car Recommender for U.S. Dealerships

Connecting the Dots: Matching Existing Solutions to New Defects

Automating the quality control in car manufacturing using computer vision

Reprice with Confidence: Dynamic Pricing with Robust Time-series Forecasting

Cloud Cost Prediction

COLUMBIA THREADNEEDLE INVESTMENTS/AMERIPRISE FINANCIAL

Quantifying Advisors Marketing Engagement and Predicting Quality Leads for Sales

Optimizing Content Likely Personalization

Chatbot or Call? Optimal Contact Channel Selection for Customer Issue Resolution

CORVUS INSURANCE

Automated Dataset Creation using PDF Text Extraction

Improving SMS Customer Experience through a Transformer-based Chatbot

Transport Acquisition Recommendation

ESTEE LAUDER

Identifying Customer Sentiment’s Business Impact

GENERAL ELECTRIC

Predicting Appliance Failures

GENERAL MOTORS

Zero Crashes Initiative

Tackling Congestion Using Connected Car Data

Crowd Sourcing Fuel Data for Sustainable Routing Algorithms

Understanding US Dealership Visitation through Automated Geofence Creation

Electric Vehicle as an Energy Reservoir: Vehicle-to-Grid

[m]clusters: Audiences First

Project Peggy Olson: Data Driven Creativity

Peggy Olson 2.0: Creative AI

Advertisement Attribution for Smarter Channel Investment Strategy

Dynamic Promotion Optimization over Sparse Demand Regression

HANDLE GLOBAL

The Hidden Cost of Healthcare: Transforming medical equipment management

HARTFORD HEALTHCARE

A Data-Driven Approach to Healthcare

Intent Classification from Unlabelled Dataset

Explainability and Bias Removal in Natural Language

Prediction and Optimization of Medical Billing Operations

LINCOLN LABORATORY

USTRANSCOM Flight Data Analysis

Optimizing Lab Procurement with Sparse Vendor Selection

Predictive Aircraft Maintenance: Detecting Imminent Part Failure with Cox Regression and Advanced Ensemble Learning Methods

Avert Disaster: Safety Modeling for the Military Sealift Command (MSC) Ships

Automating UAV Classification and Detection Through Signal and Image Recognition

Budget Allocation Through Marketing AttributioN, a.k.a. BATMAN

Generating Product Recommendations for Small Businesses at Scale

Email Performance and Personalized Recommendations

MASS GENERAL HOSPITAL (MGH)

Interpretable Machine Learning to Alleviate Bias In Trauma Patient Disposition

Routing Vehicles for MBTA’s The Ride

Reducing Costs at The Ride

Paratransit Operations: Impact of Driver Behavior and Demand Forecast

Ridership forecasting and automated geocoding for paratransit ride services

MCKINSEY & CO

What are Large Organizations Hungry For?

Introducing Ratatouille: a Generalizable, Goal-Oriented Dialog Bot

Machine Learning Methods in Credit Risk

Industrial Agglomeration for Single-Industry Spatial Pattern Recognition and Predictive Growth Modeling

Algorithm for Vector-Based Topic Extraction with NLP

Knowledge video summarization through AI

Segmenting Retail Advisors and Optimizing Coverage Model

From Unstructured Text Data to Interpretable Financial Prescriptions: An Optimization Approach

Optimal Client Interaction

To meet, or not to meet, that is the question: Optimizing Interaction Strategies

NEON PAGAMENTOS SA

Customer Relationship Network for Credit Default Prediction

Local Inventory Deployment Optimization

Forecasting Demand for E-Commerce

Prevenar Factory Schedule Optimization: A Mixed Integer Programming Approach

Sharing is Caring: Investigation Load Balancing

QUEST DIAGNOSTICS

Predicting Disease from Longitudinal Laboratory Data

Disease Risk Evaluations in Life Insurance Underwriting via Laboratory and Prescription-Driven Diagnosis Models

Finding the Needle in the Haystack: Anomaly Detection in the Cybersecurity Industry

Lateral Movement Detection: Leveraging Data in the Cybersecurity Industry

RUE GILT GROUPE

Navigation-Based Personalized Recommender System

SCHLUMBERGER

Deep Reinforcement Learning to Automate Acoustic Data Processing

Reliable Machine Learning in a World of Uncertainty

Price Prediction for the Dubai Residential Real-Estate Market

Brewing a Better Shot: IoT Predictive Maintenance for Mastrena II Espresso Machines

Automated Ticket Trading

Events and Tickets Representation Learning and Personalized Recommendation

Guaranteed Sales

Dynamic Pricing Models

Home Page Event Recommendation Optimization

Project Phoenix: Wildfire Prediction in Canada

Protection Gap Explorer: A Data-Driven Exploration of US Life Underinsurance

Life and Health in a Changing Climate

TAKEDA PHARMACEUTICALS

Understanding what causes suboptimal operational performance in clinical trials

THERMO FISHER SCIENTIFIC

Empowering Sales Management with Potential Detection and Conversion Analysis

TRIP ADVISOR

Optimizing User Experience in Hotels Searches by Accurate Price Forecasts

Demand Forecasting with a Segmented Approach

Digital Marketing Attribution Model

Personalized Marketing: Who, How, and When to Market Any Product at Target

Opioid Detection in US Mail Stream

Creating a Tool to Diagnose Out Of Stock Causes

Improving Inventory Placement for Walmart E-Commerce

Planogram Optimization: Finding Optimal Product Placement on the Shelf

Transportation and Shipping Efficiency

The Value of a Day: Optimizing Delivery Time

Optimizing Targeting Strategy for Services

Characterizing Intent Using Customer Journey: a Sequential and Graphical Model Approach

What Products Should be Displayed? Double Assortment Optimization 


Applied Data Science (MS) Student Capstone Projects

Case Analysis Capstone (ADS670) aims to develop both technical and soft skills that are not directly taught in the traditional courses in the program, but are relevant and critical in order to develop, innovate and communicate in modern data science. This is a project-oriented capstone that will harness the skills gained throughout the program.

Below are some examples of original research studies done by students in our master's in Applied Data Science program for their completed capstone projects.

Capstone Projects

M.S. in Data Science students are required to complete a capstone project. Capstone projects challenge students to acquire and analyze data to solve real-world problems. Project teams consist of two to four students and a faculty advisor. Teams select their capstone project at the beginning of the year and work on the project over the course of two semesters. 

Most projects are sponsored by an organization—academic, commercial, non-profit, and government—seeking valuable recommendations to address strategic and operational issues. Depending on the needs of the sponsor, teams may develop web-based applications that can support ongoing decision-making. The capstone project concludes with a paper and presentation.

Key takeaways:

Capstone projects have been sponsored by a variety of organizations and industries, including: Capital One, City of Charlottesville, Deloitte Consulting LLP, Metropolitan Museum of Art, MITRE Corporation, a multinational banking firm, The Public Library of Science, S&P Global Market Intelligence, UVA Brain Institute, UVA Center for Diabetes Technology, UVA Health System, U.S. Army Research Laboratory, Virginia Department of Health, Virginia Department of Motor Vehicles, Virginia Office of the Governor, Wikipedia, and more.

Sponsor a Capstone Project  

View previous examples of capstone projects  and check out answers to frequently asked questions. 

What does the process look like?

What is the seminar approach to mentoring capstones?

We utilize a seminar approach to managing capstones to provide faculty mentorship and streamlined logistics. This approach involves one mentor supervising three to four loosely related projects and meeting with these groups on a regular basis. Project teams often encounter similar roadblocks and issues so meeting together to share information and report on progress toward key milestones is highly beneficial.

Do all capstone projects have sponsors?

Not necessarily. Generally, each group works with a sponsor from outside the School of Data Science. Some sponsors are corporations, some are from nonprofit and governmental organizations, and some are from other departments at UVA.

Why do we have to work in groups?

Because data science is a team sport!

All capstone projects are completed as group work. While this requires additional coordination, this collaborative component of the program reflects the way companies expect their employees to work. Building this skill is one of our core learning objectives for the program.

I didn’t get my first choice of capstone project from the algorithm matching. What can I do?

Remember that the point of the capstone projects isn’t the subject matter; it’s the data science. Professional data scientists may find themselves in positions in which they work on topics assigned to them, but they use methods they enjoy and still learn much through the process. That said, there are many ways to tackle a subject, and we are more than happy to work with you to find an approach to the work that most aligns with your interests.

Can I work on a project for my current employer?

Each spring, we put forward a public call for capstone projects. You are encouraged to share this call widely with your community, including your employer, non-profit organizations, or any entity that might have a big data problem that we can help solve. As a reminder, capstone projects are group projects so the project would require sufficient student interest after ‘pitch day’. In addition, you (the student) cannot serve as the project sponsor (someone else within your employer organization must serve in that capacity).

If my project doesn’t have a corporate sponsor, am I losing out on a career opportunity?

The capstone project will provide you with the opportunity to do relevant, high-quality work which can be included on a resume and discussed during job interviews. The project paper and your code on Github will provide more career opportunities than the sponsor of the project. Although it does happen from time to time, it is rare that capstones lead to a direct job offer with the capstone sponsor's company. Capstone projects are just one networking opportunity available to you in the program.

Capstone Project Reflections From Alumni

Gabriel Rushin, MSDS 2017

“Capstone projects are opportunities for you to deliver valuable, quantifiable results that you can use as a testimony of your long-term project success to the company you work for and other companies in future interviews.” — Gabriel Rushin, MSDS 2017, Procter & Gamble, Senior Machine Learning Engineer Manager

Colleen Callahan

“For my capstone project, I worked to develop a clustering model to assess biogeographic ancestry, using DNA profiles. I felt like I was finally doing real-world data science and loved working with such an important organization as the Department of Defense.” — Colleen Callahan, Online MSDS 2021, Associate Research Analyst, CNA (Arlington, Virginia)

Capstone Project Reflections From Sponsors

“For us, the level of expertise, and special expertise, of the capstone students gives us ‘extra legs’ and an extra push to move a project forward. The team was asked to provide a replicable prototype air quality sensor that connected to the Cville Things Network, a free and community supported IoT network in Charlottesville. Their final product was a fantastic example that included clear circuit diagrams for replication by citizen scientists.” — Lucas Ames, Founder, Smart Cville
“Working with students on an exploratory project allowed us to focus on the data part of the problem rather than the business part, while testing with little risk. If our hypothesis falls flat, we gain valuable information; if it is validated or exceeded, we gain valuable information and are a few steps closer to a new product offering than when we started.” — Ellen Loeshelle, Senior Director of Product Management, Clarabridge


Capstone Projects

The Capstone Project Experience

In the final two quarters of the program, students gain real world experience working in small groups on a data science challenge facing a company or not-for-profit. At the conclusion of the capstone project, sponsoring organizations are invited to attend a formal Capstone Event where students showcase their work. Capstone projects typically span a wide range of interests, including energy, agriculture, retail, urban planning, healthcare, marketing, and education.

Examples of Previous Capstone Sponsors

Capstone 2020-22 Archives (Gather.Town)


Due to the pandemic, our Capstone 2021 was held entirely online in the Gather.Town platform, to which we added galleries of our 2020 and 2022 Capstone projects for an archive you can digitally wander and browse.

Gather presents a map-based, interactive platform where you can wander among projects, see media like posters, infographics, and video, and do video/audio chat with others who are logged into the space. You can read some basics about using this platform at the Gather site. One of the other benefits of Gather is that it created a persistent archive of our Capstone 2020-2022 projects, which you can view and digitally wander among here:

https://tinyurl.com/msdsfair

Other Examples of Past Projects


Visualizing Gentrification in Seattle

MSDS students Deepa Agrawal, Angel Wang, and Erin Orbits created an interactive mapping tool to visualize gentrification in Seattle.

Sponsor: Urban Planning, University of Washington


Using Artificial Intelligence to Monitor Inventory in Real Time

Capstone researchers Havan Agrawal, Toan Luong, Vishnu Nandakumar, and Tejas Hosangadi explored new methods for optimizing supply chains and product placements to improve sales.

Sponsor: Clobotics


Predicting Soil Moisture with Machine Learning

MSDS students Samir Patel, Rex Thompson, Michael Grant, and Dane Jordan developed machine learning models to accurately estimate soil moisture using satellite imagery.

Sponsor: Civil & Environmental Engineering, Washington State University

