coursera ibm data analyst capstone project github 2020


This repository contains materials for the IBM Data Analyst Capstone course from Coursera, including code, Jupyter notebooks and project files, organized by module. It's intended for educational purposes only. Contact me if you have any questions or need help with the materials.



IBM Data Analyst Capstone Project

This repository contains all the material for the Coursera course "Data Analyst" from IBM. The course is focused on teaching the skills needed to become a data analyst, including data manipulation, visualization, and analysis.

As I continue the course, I will be uploading additional materials such as code, Jupyter notebooks, and project files.

The materials in this repository are organized by module, with each module containing a set of files and resources specific to that module's content.

If you are a student in the course, please feel free to use this repository as a resource while you work on the course projects. If you are not a student in the course, you are welcome to use the materials in this repository as a learning resource, but please note that they are intended for educational purposes only and may not be used for commercial purposes.

Please feel free to contact me if you have any questions or need help with the materials in this repository.


IBM Data Engineering Professional Certificate

Launch your new career in Data Engineering. Master SQL, RDBMS, ETL, Data Warehousing, NoSQL, Big Data and Spark with hands-on job-ready skills.

Instructors: Rav Ahuja +14 more, including Ramesh Sannareddy

Financial aid available

40,239 already enrolled

Professional Certificate - 13 course series

(2,963 reviews)

Recommended experience

Beginner level

Basic computer skills and a grounding in IT systems. Comfort working in Linux, Windows, or macOS. No prior programming or data skills needed.

What you'll learn

Create, design, and manage relational databases and apply database administration (DBA) concepts to RDBMSes such as MySQL, PostgreSQL, and IBM Db2.

Develop and execute SQL queries using SELECT, INSERT, UPDATE, and DELETE statements, database functions, stored procedures, nested queries, and JOINs.

Demonstrate working knowledge of NoSQL and Big Data using MongoDB, Cassandra, Cloudant, Hadoop, Apache Spark, Spark SQL, Spark ML, and Spark Streaming.

Implement ETL and data pipelines with Bash, Airflow, and Kafka; architect, populate, and deploy data warehouses; and create BI reports and interactive dashboards.


Details to know


Add to your LinkedIn profile

Available in English

Subtitles: English, Arabic, Persian, Korean



Prepare for a career in Data Engineering


Get exclusive access to career resources upon completion

Get free access to IBM’s People and Soft Skills Specialization

Improve your resume and LinkedIn with personalized feedback

Practice your skills with interactive tools and mock interviews

Plan your career move with Coursera’s job search guide


This Professional Certificate is for anyone who wants to develop job-ready skills, tools, and a portfolio for an entry-level data engineer position. Throughout the self-paced online courses, you will immerse yourself in the role of a data engineer and acquire the essential skills you need to work with a range of tools and databases to design, deploy, and manage structured and unstructured data.

By the end of this Professional Certificate, you will be able to explain and perform the key tasks required in a data engineering role. You will use the Python programming language and Linux/UNIX shell scripts to extract, transform, and load (ETL) data. You will work with relational databases (RDBMS) and query data using SQL statements. You will use NoSQL databases and unstructured data. You will be introduced to Big Data and work with Big Data engines like Hadoop and Spark. You will gain experience creating data warehouses and using business intelligence tools to analyze data and extract insights.

This program does not require any prior data engineering or programming experience.

This program is ACE® recommended: when you complete it, you can earn up to 12 college credits.

Applied Learning Project

Throughout this Professional Certificate, you will complete hands-on labs and projects to help you gain practical experience with Python, SQL, relational databases, NoSQL databases, Apache Spark, building data pipelines, managing databases, and working with data warehouses. Projects:

Design a relational database to help a coffee franchise improve operations

Use SQL to query census, crime, and school demographic data sets.

Write a Bash shell script on Linux that backs up changed files.

Set up, test, and optimize a data platform that contains MySQL, PostgreSQL, and IBM Db2 databases.

Analyze road traffic data to perform ETL and create a pipeline using Airflow and Kafka.

Design and implement a data warehouse for a solid-waste management company.

Move, query, and analyze data in MongoDB, Cassandra, and Cloudant NoSQL databases.

Train a machine learning model by creating an Apache Spark application.

Design, deploy, and manage an end-to-end data engineering platform.

Introduction to Data Engineering

List basic skills required for an entry-level data engineering role.

Discuss various stages and concepts in the data engineering lifecycle.

Describe and provide examples of data engineering technologies such as Relational Databases, NoSQL Data Stores, Big Data Engines, and others.

Summarize concepts in data security, governance, and compliance.

Python for Data Science, AI & Development

Describe Python Basics including Data Types, Expressions, Variables, and Data Structures.

Apply Python programming logic using Branching, Loops, Functions, Objects & Classes.

Demonstrate proficiency in using Python libraries such as Pandas, Numpy, and Beautiful Soup.

Access web data using APIs and web scraping from Python in Jupyter Notebooks.

Python Project for Data Engineering

Demonstrate your skills in Python for data engineering tasks 

Implement web scraping and use APIs to collect data in Python

Assume the role of a Data Engineer working on a real project

Extract, Transform and Load (ETL) data using Jupyter notebooks

Introduction to Relational Databases (RDBMS)

Describe data, databases, relational databases, and cloud databases.

Describe information and data models, relational databases, and relational model concepts (including schemas and tables). 

Explain an Entity Relationship Diagram and design a relational database for a specific use case.

Implement different relational model constraints.

Databases and SQL for Data Science with Python

Analyze data within a database using SQL and Python.

Create a relational database on Cloud and work with tables.

Write SQL statements including SELECT, INSERT, UPDATE, and DELETE.

Build more powerful queries with advanced SQL techniques like views, transactions, stored procedures and joins.

Hands-on Introduction to Linux Commands and Shell Scripting

Describe the Linux architecture and common Linux distributions and update and install software on a Linux system.

Perform common informational, file, content, navigational, compression, and networking commands in Bash shell.

Develop shell scripts using Linux commands, environment variables, pipes, and filters.

Schedule cron jobs in Linux with crontab and explain the cron syntax. 

Relational Database Administration (DBA)

Create, query, and configure databases and access and build system objects such as tables.

Perform basic database management including backing up and restoring databases as well as managing user roles and permissions. 

Monitor and optimize important aspects of database performance. 

Troubleshoot database issues such as connectivity, login, and configuration and automate functions such as reports, notifications, and alerts. 

ETL and Data Pipelines with Shell, Airflow and Kafka

Describe and contrast Extract, Transform, Load (ETL) processes and Extract, Load, Transform (ELT) processes.

Explain batch vs concurrent modes of execution.

Implement an ETL pipeline through shell scripting.

Describe data pipeline components, processes, tools, and technologies.
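To make the extract, transform, and load stages concrete, here is a minimal Python sketch using only the standard library. It is an illustration, not course material: the inline CSV data, the column names `region` and `amount`, and the `sales` table are all hypothetical, and the course itself builds its pipelines with shell scripts, Airflow, and Kafka rather than this exact approach.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; a real pipeline would read a file, an API, or a stream.
RAW_CSV = """region,amount
north,120.50
south,80
north,
east,42.25
"""


def extract(text):
    """Extract: parse raw CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing amount, normalize case and types."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append((row["region"].strip().upper(), float(amount)))
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into a SQLite table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales (region, amount) VALUES (?, ?)", records)
    conn.commit()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")  # in-memory database for the demo
    load(transform(extract(RAW_CSV)), connection)
    for region, total in connection.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ):
        print(region, total)
```

Because the cleaning happens before the data reaches the database, this is ETL. In an ELT variant, the raw rows would be loaded first and the cleaning would be done afterwards with SQL inside the target system.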

Getting Started with Data Warehousing and BI Analytics

Explore the architecture, features, and benefits of data warehouses, data marts, and data lakes and identify popular data warehouse system vendors.

Design and populate a data warehouse, and model and query data using CUBE, ROLLUP, and materialized views.

Identify popular data analytics and business intelligence tools and vendors and create data visualizations using IBM Cognos Analytics.

Design and load data into a data warehouse, write aggregation queries, create materialized query tables, and create an analytics dashboard.

Introduction to NoSQL Databases

Differentiate between the four main categories of NoSQL repositories.

Describe the characteristics, features, benefits, limitations, and applications of the more popular Big Data processing tools.

Perform common MongoDB tasks, including create, read, update, and delete (CRUD) operations.

Execute keyspace, table, and CRUD operations in Cassandra.

Introduction to Big Data with Spark and Hadoop

Explain the impact of Big Data, including use cases, tools, and processing methods.

Explain Apache Hadoop architecture, ecosystem, and practices, and use related applications, including HDFS, HBase, Spark, and MapReduce.

Apply Spark programming basics, including parallel programming basics for DataFrames, data sets, and Spark SQL.

Use Spark’s RDDs and data sets, optimize Spark SQL using Catalyst and Tungsten, and use Spark’s development and runtime environment options.

Data Engineering and Machine Learning using Spark

Explain how streaming data and Spark Structured Streaming empower machine learning and AI tasks.

Define graph theory, describe Apache Spark GraphFrames, and identify data suitable for GraphFrames.

Describe how ETL processes work with Apache Spark and machine learning and extend that knowledge to Spark MLlib capabilities and related benefits.

Explain supervised learning, unsupervised learning, and clustering, and explain how to use the k-means clustering algorithm with Spark MLlib.

Data Engineering Capstone Project

Demonstrate proficiency in skills required for an entry-level data engineering role.

Design and implement various concepts and components in the data engineering lifecycle such as data repositories.

Showcase working knowledge with relational databases, NoSQL data stores, big data engines, data warehouses, and data pipelines.

Apply skills in Linux shell scripting, SQL, and Python programming languages to Data Engineering problems.


IBM is the global leader in business transformation through an open hybrid cloud platform and AI, serving clients in more than 170 countries around the world. Today 47 of the Fortune 50 Companies rely on the IBM Cloud to run their business, and IBM Watson enterprise AI is hard at work in more than 30,000 engagements. IBM is also one of the world’s most vital corporate research organizations, with 28 consecutive years of patent leadership. Above all, guided by principles for trust and transparency and support for a more inclusive society, IBM is committed to being a responsible technology innovator and a force for good in the world. For more information about IBM visit:


Earn a career certificate

Add this credential to your LinkedIn profile, resume, or CV

Share it on social media and in your performance review


Get a head start on your degree

When you complete this Professional Certificate, you can earn college credit if you apply and are accepted into one of the following online degree programs.

University of North Texas

Bachelor of Applied Arts and Sciences

15+ hours of study/wk per course


Frequently asked questions

What is the refund policy?

If you subscribed, you get a 7-day free trial during which you can cancel at no penalty. After that, we don’t give refunds, but you can cancel your subscription at any time. See our full refund policy.

Can I just enroll in a single course?

Yes! To get started, click the course card that interests you and enroll. You can enroll and complete the course to earn a shareable certificate, or you can audit it to view the course materials for free. When you subscribe to a course that is part of a Certificate, you’re automatically subscribed to the full Certificate. Visit your learner dashboard to track your progress.

Is this course really 100% online? Do I need to attend any classes in person?

This course is completely online, so there’s no need to show up to a classroom in person. You can access your lectures, readings and assignments anytime and anywhere via the web or your mobile device.

How long does it take to complete the Specialization?

The Professional Certificate requires completion of 13 courses, including 10 full courses, 2 mini-courses, and a Capstone Project. Each full course typically contains 3-6 modules with an average effort of 2 to 4 hours per module. If learning part-time (e.g., 1 module per week), it would take about 1 year to complete the entire certificate. If learning full-time (e.g., 1 module per day), the certificate can be completed in 3 to 4 months.

What background knowledge is necessary?

This Professional Certificate is open to anyone with any job and academic background. It requires basic IT literacy, knowledge of IT infrastructure, and familiarity working with Windows, Linux, or macOS. No prior computer programming experience is necessary, although it is an asset, as is high school math.

Do I need to take the courses in a specific order?

Yes, it is highly recommended to take the courses in the order they are listed, as they progressively build on concepts taught in previous courses.  

What will I be able to do upon completing the Specialization?

You will develop practical skills using hands-on labs and projects throughout the program. By the end you will have acquired skills and knowledge to enable you to become job ready for an entry level career in Data Engineering.

Can I get college credit for taking the IBM Data Engineering Professional Certificate?

Yes. The IBM Data Engineering Professional Certificate recently secured a credit recommendation from the American Council on Education (ACE®), whose credit recommendations are the industry standard for translating workplace learning into college credit. Learners can earn a recommendation of 12 college credits for completing the program. This aims to open up additional pathways for learners who are interested in higher education and to prepare them for entry-level jobs.

How do you share your proof of completion with the educational institutions for transferring credit?

To share proof of completion with schools, certificate graduates will receive an email prompting them to claim their Credly badge, which contains the ACE® credit recommendation. Once claimed, you will receive a competency-based transcript that signifies the credit recommendation, which can be shared directly with a school from the Credly platform. Please note that the decision to accept specific credit recommendations is up to each institution and is not guaranteed.

Where can I find more information on ACE credit recommendations?

Please see Coursera’s ACE Recommendations FAQ.


IBM Data Analyst Capstone Project

You will perform the various tasks that professional data analysts do as part of their jobs, including:

Data collection from multiple sources

Data wrangling and data preparation

Exploratory data analysis

Statistical analysis and data mining

Data visualization with different charts and plots

Interactive dashboard creation
