udacity data science capstone project

ankit1797/Udacity-data-scientist-capstone-project

Udacity Data Scientist Capstone Project: Sparkify Customer Churn Prediction Model

1. Description

Like Spotify and fizy, Sparkify is a music streaming application. Udacity provides datasets of Sparkify user log transactions: a mini dataset (128MB) and the full dataset (12GB).

In this project, I completed the steps (methodology) below to build a model that predicts which users are likely to churn. With these predictions, Sparkify can identify users who are likely to cancel their membership and send them marketing offers or campaigns to prevent churn and revenue leakage. I used the mini dataset in a Jupyter notebook.

A blog post explaining these steps in detail is also available.

Blog post link: https://dev.to/ankit1797/sparkify-digital-music-service-customer-churn-analysis-prediction-14de

2. Dependencies

Python 3.5+

Python libraries: pandas, numpy, matplotlib.pyplot, seaborn, time, datetime

PySpark 2.4: pyspark.ml, pyspark.sql

Dataset: “mini_sparkify_event_data.json” is the user transactional log data provided by Udacity at the start of the project.

Models: Logistic Regression, Random Forest, Gradient Boosted Trees

3. Executing Program

The file structure of the project is shown below. The Jupyter notebook can be executed directly.

|- Sparkify.ipynb # Notebook that imports the related packages, cleans and processes the data, and runs the data analysis and ML pipeline (models)
|- mini_sparkify_event_data.json # Sparkify user transactional log data provided by Udacity
|- README.md # README file

4. Defined Features and Models:

Churn is defined by the Cancellation Confirmation events in the mini_sparkify_event_data.json data: a user with such an event is treated as churned.
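As a rough illustration of this churn definition, the sketch below labels users from a handful of made-up log records. It is written in plain Python so it runs without Spark; the notebook itself uses PySpark. The field names `userId` and `page`, the helper `label_churn`, and the sample records are assumptions for illustration only, not taken from the notebook.

```python
import json

# Event name that marks churn, per the definition above.
CHURN_PAGE = "Cancellation Confirmation"


def label_churn(events):
    """Return {user_id: 1 or 0}; 1 means the user has at least one churn event.

    Assumes each event is a dict with "userId" and "page" keys (hypothetical
    field names used here for illustration).
    """
    labels = {}
    for event in events:
        user = event.get("userId")
        if not user:
            # Skip records with a missing or empty user id.
            continue
        churned = 1 if event.get("page") == CHURN_PAGE else 0
        # Once a user is marked as churned, keep that label.
        labels[user] = max(labels.get(user, 0), churned)
    return labels


# Made-up records in JSON-lines form, one event per line.
sample_lines = [
    '{"userId": "10", "page": "NextSong"}',
    '{"userId": "10", "page": "Cancellation Confirmation"}',
    '{"userId": "20", "page": "NextSong"}',
    '{"userId": "", "page": "Home"}',
]
sample_events = [json.loads(line) for line in sample_lines]
churn_labels = label_churn(sample_events)
print(churn_labels)  # {'10': 1, '20': 0}
```

The label is computed per user rather than per event, since the models predict whether a user will churn.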

Ten features are defined to build the models.

The Random Forest model performs best on the provided dataset, with an F1 score of 0.949.
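For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below computes it from confusion-matrix counts. The helper `f1_from_counts` and the counts are made up for illustration and are not the project's results; the README does not state how the reported score was averaged across classes.

```python
def f1_from_counts(tp, fp, fn):
    """Compute F1 from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        # No correct positive predictions at all.
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only: 8 churners found, 2 false alarms, 2 churners missed.
example_f1 = f1_from_counts(tp=8, fp=2, fn=2)
print(round(example_f1, 3))  # 0.8
```

F1 is a common choice for churn data because churned users are usually a small minority, so plain accuracy can look high even when few churners are found.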

Ankit Patel - Udacity Student in Data Science Nanodegree

7. Acknowledgement

Thanks to Udacity for providing great online lessons in the Data Science Nanodegree Program.
