ankit1797/Udacity-data-scientist-capstone-project
Udacity Data Scientist Capstone Project: Sparkify Customer Churn Prediction Model
Table of Contents
- Description
- Dependencies
- Executing Program
- Defined Features and Models
- Acknowledgement
1. Description
Like Spotify and fizy, Sparkify is a music streaming application. Udacity provides two datasets of Sparkify user log transactions: a mini dataset (128MB) and the full dataset (12GB).
In this project, I followed the steps below to build a model that predicts which users are likely to churn. With such a model, Sparkify can identify users who are likely to cancel their membership and send them marketing offers or campaigns to prevent churn and revenue leakage. I used the mini dataset in a Jupyter notebook.
- Load and Clean Dataset
- Exploratory Data Analysis and Defining Churn Label
- Feature Engineering
- Test Models and Select the Best Model with Parameter Tuning
A blog post explains these steps in detail.
Blog link: https://dev.to/ankit1797/sparkify-digital-music-service-customer-churn-analysis-prediction-14de
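The first step, loading and cleaning the dataset, can be sketched in plain Python as follows. This is a minimal illustration and not the notebook's PySpark code. It assumes that the log file holds one JSON record per line and that records with an empty `userId` should be dropped; the field names `userId`, `sessionId`, and `page` and the helper names are assumptions made for the example.

```python
import json

def load_events(lines):
    """Parse newline-delimited JSON log records, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]

def has_user_id(event):
    """True when the record carries a non-empty user ID (assumed field: 'userId')."""
    uid = event.get("userId")
    return uid is not None and str(uid).strip() != ""

def clean_events(events):
    """Keep only the records that can be attributed to a user."""
    return [e for e in events if has_user_id(e)]

# Tiny in-memory stand-in for the log file (hypothetical records).
sample_lines = [
    '{"userId": "10", "sessionId": 1, "page": "NextSong"}',
    '{"userId": "", "sessionId": 2, "page": "Home"}',
    '',
    '{"userId": "11", "sessionId": 3, "page": "Thumbs Up"}',
]

events = clean_events(load_events(sample_lines))
print(len(events))
```

With a real file, the same functions could be fed the lines of an opened file handle instead of `sample_lines`.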
2. Dependencies
Python 3.5+
Python libraries: pandas, numpy, matplotlib.pyplot, seaborn, time, datetime
PySpark 2.4 (pyspark.ml, pyspark.sql)
Dataset: "mini_sparkify_event_data.json" is the user transactional log data provided by Udacity at the start of the project.
Models: Logistic Regression, Random Forest, Gradient Boosted Trees
3. Executing Program
The file structure of the project is shown below. The Jupyter notebook can be executed directly.
|- Sparkify.ipynb # Notebook that imports the related packages, cleans and processes the data, and runs the data analysis and ML pipeline (models)
|- mini_sparkify_event_data.json # Sparkify user transactional log data provided by Udacity
|- README.md # README file
4. Defined Features and Models
Churn is defined by Cancellation Confirmation events in the mini_sparkify_event_data.json data: a user with at least one such event is labeled as churned.
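The churn definition above can be sketched in plain Python as follows. This is an illustration under assumptions, not the notebook's implementation: the field names `userId` and `page` and the helper name `label_churn` are hypothetical, and the sample records are made up.

```python
def label_churn(events):
    """Return {user_id: 1 or 0}; 1 means the user has a Cancellation Confirmation event."""
    labels = {}
    for e in events:
        uid = e["userId"]
        churned = int(e["page"] == "Cancellation Confirmation")
        # Once a user is flagged as churned, later events do not reset the flag.
        labels[uid] = max(labels.get(uid, 0), churned)
    return labels

# Hypothetical sample records for two users.
sample_events = [
    {"userId": "10", "page": "NextSong"},
    {"userId": "10", "page": "Cancellation Confirmation"},
    {"userId": "11", "page": "NextSong"},
    {"userId": "11", "page": "Thumbs Up"},
]

churn_labels = label_churn(sample_events)
print(churn_labels)
```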
The following 10 features are defined to build the models:
- Average number of songs listened to per session
- Total number of songs listened to per user
- Number of Add Friend transactions
- Number of Add Playlist transactions
- Number of Thumbs Down transactions
- Number of Thumbs Up transactions
- Registration duration (days): time between the user's registration date and the date of their last event
- Total listening time
- Account level
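The per-user aggregation behind the features listed above can be sketched in plain Python as follows. This is a hedged illustration rather than the notebook's PySpark code. It assumes millisecond timestamps in `ts` and `registration`, a song play recorded as the page `NextSong`, a `length` field holding the song duration, and a `level` of `free` or `paid`; these field names, the page names in `COUNTED_PAGES`, and the helper `ev` are assumptions for the example.

```python
from collections import defaultdict

MS_PER_DAY = 24 * 60 * 60 * 1000  # assumes timestamps are in milliseconds

# Page names assumed for the counted transactions (hypothetical mapping).
COUNTED_PAGES = {
    "add_friend": "Add Friend",
    "add_playlist": "Add to Playlist",
    "thumbs_down": "Thumbs Down",
    "thumbs_up": "Thumbs Up",
}

def user_features(events):
    """Aggregate one feature dictionary per user from raw event records."""
    per_user = defaultdict(list)
    for e in events:
        per_user[e["userId"]].append(e)

    features = {}
    for uid, rows in per_user.items():
        songs = [r for r in rows if r["page"] == "NextSong"]
        sessions = {r["sessionId"] for r in rows}
        last = max(rows, key=lambda r: r["ts"])  # most recent event of the user
        row = {
            "total_songs": len(songs),
            "avg_songs_per_session": len(songs) / len(sessions),
            "register_days": (last["ts"] - last["registration"]) / MS_PER_DAY,
            "listen_time": sum(r.get("length") or 0.0 for r in songs),
            "paid_level": int(last["level"] == "paid"),
        }
        for name, page in COUNTED_PAGES.items():
            row[name] = sum(1 for r in rows if r["page"] == page)
        features[uid] = row
    return features

def ev(page, session, ts, level, length=None):
    """Build one hypothetical event record for user '10', registered at time 0."""
    return {"userId": "10", "sessionId": session, "page": page, "ts": ts,
            "registration": 0, "level": level, "length": length}

sample = [
    ev("NextSong", 1, 1 * MS_PER_DAY, "free", 200.0),
    ev("Thumbs Up", 1, 1 * MS_PER_DAY, "free"),
    ev("NextSong", 2, 2 * MS_PER_DAY, "paid", 100.0),
    ev("NextSong", 2, 2 * MS_PER_DAY, "paid", 100.0),
]

features = user_features(sample)
print(features["10"])
```

Here the account level is taken from the user's most recent event and encoded as 1 for paid and 0 for free; that encoding is a design choice of this sketch.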
The Random Forest model performs best on the provided dataset, with an F1 score of 0.949.
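For reference, an F1 score is the harmonic mean of precision and recall. The sketch below computes it for a single positive class (here, churned users) from confusion-matrix counts. The counts are made-up numbers for illustration and do not reproduce the reported result; the notebook's evaluator may also average the score across classes, which this README does not state.

```python
def f1_score(tp, fp, fn):
    """F1 for the positive class from true positives, false positives, false negatives."""
    if tp == 0:
        return 0.0  # no correct positive predictions, so precision and recall are 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 8 churners caught, 2 false alarms, 2 churners missed.
example_f1 = f1_score(tp=8, fp=2, fn=2)
print(round(example_f1, 3))
```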
Ankit Patel - Udacity Student in Data Science Nanodegree
7. Acknowledgement
Udacity, for providing great online lessons in the Data Science Nanodegree Program.