This repository contains materials for the IBM Data Analyst Capstone course from Coursera, including code, Jupyter notebooks and project files, organized by module. It's intended for educational purposes only. Contact me if you have any questions or need help with the materials.

shavilya/IBM-data-analyst-capstone-project
IBM Data Analyst Capstone Project
This repository contains all the material for the Coursera course "Data Analyst" from IBM. The course is focused on teaching the skills needed to become a data analyst, including data manipulation, visualization, and analysis.
As I continue the course, I will be uploading additional materials such as code, Jupyter notebooks, and project files.
The materials in this repository are organized by module, with each module containing a set of files and resources specific to that module's content.
If you are a student in the course, please feel free to use this repository as a resource while you work on the course projects. If you are not a student in the course, you are welcome to use the materials in this repository as a learning resource, but please note that they are intended for educational purposes only and may not be used for commercial purposes.
Please feel free to contact me if you have any questions or need help with the materials in this repository.
- Jupyter Notebook 100.0%

IBM Data Engineering Professional Certificate
Launch your new career in Data Engineering. Master SQL, RDBMS, ETL, Data Warehousing, NoSQL, Big Data and Spark with hands-on job-ready skills.

Instructors: Rav Ahuja +14 more

Financial aid available
40,239 already enrolled
Professional Certificate - 13 course series
(2,963 reviews)
Recommended experience
Beginner level
Basic computer skills and a grounding in IT systems. Comfort working in either Linux, Windows, or MacOS. No prior programming or data skills needed.
What you'll learn
Create, design, and manage relational databases and apply database administration (DBA) concepts to RDBMSes such as MySQL, PostgreSQL, and IBM Db2.
Develop and execute SQL queries using SELECT, INSERT, UPDATE, DELETE statements, database functions, stored procedures, Nested Queries, and JOINs.
Demonstrate working knowledge of NoSQL & Big Data using MongoDB, Cassandra, Cloudant, Hadoop, Apache Spark, Spark SQL, Spark ML, Spark Streaming.
Implement ETL & Data Pipelines with Bash, Airflow & Kafka; architect, populate, deploy Data Warehouses; create BI reports & interactive dashboards.
Skills you'll gain
- Relational Database Management System (RDBMS)
- ETL & Data Pipelines
- NoSQL and Big Data
- Apache Spark
- SQL
Details to know

Add to your LinkedIn profile
Available in English
Subtitles: English, Arabic, Persian, Korean

Prepare for a career in Data Engineering
- Receive professional-level training from IBM
- Demonstrate your proficiency in portfolio-ready projects
- Earn an employer-recognized certificate from IBM
- Qualify for in-demand job titles: Database Engineer, Data Engineer, Junior Data Engineer

Get exclusive access to career resources upon completion
Get free access to IBM’s People and Soft Skills Specialization
Improve your resume and LinkedIn with personalized feedback
Practice your skills with interactive tools and mock interviews
Plan your career move with Coursera’s job search guide
¹Lightcast™ Job Postings Report, United States, 1/1/22-12/31/22. ²Based on program graduate survey responses, United States 2021.
This Professional Certificate is for anyone who wants to develop job-ready skills, tools, and a portfolio for an entry-level data engineer position. Throughout the self-paced online courses, you will immerse yourself in the role of a data engineer and acquire the essential skills you need to work with a range of tools and databases to design, deploy, and manage structured and unstructured data.
By the end of this Professional Certificate, you will be able to explain and perform the key tasks required in a data engineering role. You will use the Python programming language and Linux/UNIX shell scripts to extract, transform, and load (ETL) data. You will work with relational databases (RDBMS) and query data using SQL statements. You will use NoSQL databases and unstructured data. You will be introduced to Big Data and work with Big Data engines like Hadoop and Spark. You will gain experience creating data warehouses and use Business Intelligence tools to analyze data and extract insights.
This program does not require any prior data engineering or programming experience.
This program is ACE® recommended: when you complete it, you can earn up to 12 college credits.
Applied Learning Project
Throughout this Professional Certificate, you will complete hands-on labs and projects to help you gain practical experience with Python, SQL, relational databases, NoSQL databases, Apache Spark, building data pipelines, managing databases, and working with data warehouses. Projects:
Design a relational database to help a coffee franchise improve operations
Use SQL to query census, crime, and school demographic data sets.
Write a Bash shell script on Linux that backs up changed files.
Set up, test, and optimize a data platform that contains MySQL, PostgreSQL, and IBM Db2 databases.
Analyze road traffic data to perform ETL and create a pipeline using Airflow and Kafka.
Design and implement a data warehouse for a solid-waste management company.
Move, query, and analyze data in MongoDB, Cassandra, and Cloudant NoSQL databases.
Train a machine learning model by creating an Apache Spark application.
Design, deploy, and manage an end-to-end data engineering platform.
Introduction to Data Engineering
List basic skills required for an entry-level data engineering role.
Discuss various stages and concepts in the data engineering lifecycle.
Describe and provide examples of data engineering technologies such as Relational Databases, NoSQL Data Stores, Big Data Engines, and others.
Summarize concepts in data security, governance, and compliance.
Python for Data Science, AI & Development
Describe Python Basics including Data Types, Expressions, Variables, and Data Structures.
Apply Python programming logic using Branching, Loops, Functions, Objects & Classes.
Demonstrate proficiency in using Python libraries such as Pandas, Numpy, and Beautiful Soup.
Access web data using APIs and web scraping from Python in Jupyter Notebooks.
Python Project for Data Engineering
Demonstrate your skills in Python for data engineering tasks
Implement webscraping and use APIs to collect data in Python
Assume the role of a Data Engineer working on a real project
Extract, Transform and Load (ETL) data using Jupyter notebooks
Introduction to Relational Databases (RDBMS)
Describe data, databases, relational databases, and cloud databases.
Describe information and data models, relational databases, and relational model concepts (including schemas and tables).
Explain an Entity Relationship Diagram and design a relational database for a specific use case.
Implement different relational model constraints.
Databases and SQL for Data Science with Python
Analyze data within a database using SQL and Python.
Create a relational database on Cloud and work with tables.
Write SQL statements including SELECT, INSERT, UPDATE, and DELETE.
Build more powerful queries with advanced SQL techniques like views, transactions, stored procedures and joins.
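The four statements named above can be tried out without a cloud database. The following is a minimal sketch, assuming Python's built-in `sqlite3` module and an in-memory database rather than the course's own environment; the `schools` table and its rows are hypothetical and exist only for illustration.

```python
import sqlite3


def run_crud_demo():
    """Run INSERT, UPDATE, DELETE, and SELECT against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Hypothetical table, used only for this illustration.
    cur.execute(
        "CREATE TABLE schools (id INTEGER PRIMARY KEY, name TEXT, students INTEGER)"
    )

    # INSERT: add three rows using parameter placeholders.
    cur.executemany(
        "INSERT INTO schools (name, students) VALUES (?, ?)",
        [("North High", 900), ("South High", 650), ("East Prep", 300)],
    )

    # UPDATE: change one row.
    cur.execute("UPDATE schools SET students = ? WHERE name = ?", (700, "South High"))

    # DELETE: remove rows that match a condition.
    cur.execute("DELETE FROM schools WHERE students < ?", (500,))

    # SELECT: read back what is left.
    cur.execute("SELECT name, students FROM schools ORDER BY name")
    rows = cur.fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run_crud_demo())
```

The `?` placeholders keep values separate from the SQL text, which is the usual way to avoid building queries by string concatenation.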
Hands-on Introduction to Linux Commands and Shell Scripting
Describe the Linux architecture and common Linux distributions and update and install software on a Linux system.
Perform common informational, file, content, navigational, compression, and networking commands in Bash shell.
Develop shell scripts using Linux commands, environment variables, pipes, and filters.
Schedule cron jobs in Linux with crontab and explain the cron syntax.
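As a rough illustration of the cron syntax mentioned above: a crontab entry consists of five schedule fields (minute, hour, day of month, month, day of week) followed by the command to run. The sketch below only splits an entry into those parts; it does not validate or expand the fields, and the function name and script path are hypothetical.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_crontab_line(line):
    """Split one crontab entry into its five schedule fields and the command."""
    # maxsplit=5 keeps the command, including its arguments, in one piece.
    parts = line.strip().split(None, 5)
    if len(parts) < 6:
        raise ValueError("expected five schedule fields followed by a command")
    schedule = dict(zip(CRON_FIELDS, parts[:5]))
    return schedule, parts[5]


if __name__ == "__main__":
    # Hypothetical entry: run a backup script at 02:00 every Sunday.
    schedule, command = split_crontab_line("0 2 * * 0 /home/user/backup.sh --full")
    print(schedule)
    print(command)
```

In the example entry, `*` in the day-of-month and month fields means "every", and `0` in the day-of-week field stands for Sunday.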
Relational Database Administration (DBA)
Create, query, and configure databases and access and build system objects such as tables.
Perform basic database management including backing up and restoring databases as well as managing user roles and permissions.
Monitor and optimize important aspects of database performance.
Troubleshoot database issues such as connectivity, login, and configuration and automate functions such as reports, notifications, and alerts.
ETL and Data Pipelines with Shell, Airflow and Kafka
Describe and contrast Extract, Transform, Load (ETL) processes and Extract, Load, Transform (ELT) processes.
Explain batch vs concurrent modes of execution.
Implement an ETL pipeline through shell scripting.
Describe data pipeline components, processes, tools, and technologies.
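The extract, transform, and load stages described above can be sketched in a few lines. The course implements its pipeline with shell scripts, Airflow, and Kafka; the version below is a simplified stand-in in plain Python, assuming an inline CSV string as the source and an in-memory SQLite database as the target. The column names and the `tolls` table are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; the second row is missing its toll amount.
RAW_CSV = """vehicle_id,vehicle_type,toll_paid
1,car,2.50
2,truck,
3,car,3.00
"""


def extract(text):
    """Extract: read CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop incomplete rows, convert types, normalize case."""
    cleaned = []
    for row in rows:
        if not row["toll_paid"]:
            continue  # skip rows with a missing amount
        cleaned.append(
            (int(row["vehicle_id"]), row["vehicle_type"].upper(), float(row["toll_paid"]))
        )
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tolls "
        "(vehicle_id INTEGER, vehicle_type TEXT, toll_paid REAL)"
    )
    conn.executemany("INSERT INTO tolls VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline():
    """Run extract -> transform -> load and return (row count, total toll)."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(RAW_CSV)), conn)
    summary = conn.execute("SELECT COUNT(*), SUM(toll_paid) FROM tolls").fetchone()
    conn.close()
    return summary


if __name__ == "__main__":
    print(run_pipeline())
```

In an ELT process, by contrast, the raw rows would be loaded first and the cleaning step would run inside the target system.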
Getting Started with Data Warehousing and BI Analytics
Explore the architecture, features, and benefits of data warehouses, data marts, and data lakes and identify popular data warehouse system vendors.
Design and populate a data warehouse, and model and query data using CUBE, ROLLUP, and materialized views.
Identify popular data analytics and business intelligence tools and vendors and create data visualizations using IBM Cognos Analytics.
Design and load data into a data warehouse, write aggregation queries, create materialized query tables, and create an analytics dashboard.
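To make the ROLLUP idea above concrete: a rollup over a list of columns produces totals for every prefix of that list, from the most detailed grouping down to a grand total. The sketch below emulates that in plain Python instead of running SQL in a warehouse; `None` plays the role that NULL plays in a SQL result. The function name, column names, and sample rows are hypothetical.

```python
from collections import defaultdict


def rollup(rows, dims, measure):
    """Sum `measure` for every prefix of `dims`, including the grand total."""
    totals = defaultdict(float)
    for row in rows:
        # depth == len(dims) is the detailed grouping; depth == 0 is the grand total.
        for depth in range(len(dims), -1, -1):
            key = tuple(row[d] for d in dims[:depth]) + (None,) * (len(dims) - depth)
            totals[key] += row[measure]
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical waste-collection facts.
    facts = [
        {"city": "A", "waste_type": "dry", "tons": 4.0},
        {"city": "A", "waste_type": "wet", "tons": 6.0},
        {"city": "B", "waste_type": "dry", "tons": 5.0},
    ]
    for key, total in sorted(rollup(facts, ["city", "waste_type"], "tons").items(), key=str):
        print(key, total)
```

A CUBE differs in that it aggregates over every subset of the columns, not only the prefixes, so it would also include totals per waste type across all cities.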

Introduction to NoSQL Databases
Differentiate between the four main categories of NoSQL repositories.
Describe the characteristics, features, benefits, limitations, and applications of the more popular Big Data processing tools.
Perform common tasks using MongoDB, including create, read, update, and delete (CRUD) operations.
Execute keyspace, table, and CRUD operations in Cassandra.
Introduction to Big Data with Spark and Hadoop
Explain the impact of Big Data including use cases, tools, and processing methods.
Explain Apache Hadoop architecture, ecosystem, and practices, and use related applications including HDFS, HBase, Spark, and MapReduce.
Apply Spark programming basics, including parallel programming basics for DataFrames, data sets, and Spark SQL.
Use Spark’s RDDs and data sets, optimize Spark SQL using Catalyst and Tungsten, and use Spark’s development and runtime environment options.
Data Engineering and Machine Learning using Spark
Explain how streaming data and Spark Structured Streaming empower machine learning and AI tasks.
Define graph theory, describe Apache Spark GraphFrames, and identify data suitable for GraphFrames.
Describe how ETL processes work with Apache Spark and machine learning and extend that knowledge to Spark MLlib capabilities and related benefits.
Explain supervised learning, unsupervised learning, and clustering, and explain how to use the k-means clustering algorithm with Spark MLlib.
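For readers unfamiliar with k-means, the assign-and-update loop at its core can be shown without Spark. The sketch below is a plain-Python version of the standard algorithm (Lloyd's algorithm) on one-dimensional data; it is not Spark MLlib code, and the function name, starting centroids, and sample points are assumptions made for illustration.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Cluster 1-D points by repeatedly assigning them to the nearest centroid."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)

        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments have stopped changing
        centroids = new_centroids
    return centroids, clusters


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
    print(kmeans_1d(data, centroids=[1.0, 12.0]))
```

Because no labels are supplied, this is an example of unsupervised learning: the grouping comes only from distances between points.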
Data Engineering Capstone Project
Demonstrate proficiency in skills required for an entry-level data engineering role.
Design and implement various concepts and components in the data engineering lifecycle such as data repositories.
Showcase working knowledge with relational databases, NoSQL data stores, big data engines, data warehouses, and data pipelines.
Apply skills in Linux shell scripting, SQL, and Python programming languages to Data Engineering problems.

IBM is the global leader in business transformation through an open hybrid cloud platform and AI, serving clients in more than 170 countries around the world. Today 47 of the Fortune 50 Companies rely on the IBM Cloud to run their business, and IBM Watson enterprise AI is hard at work in more than 30,000 engagements. IBM is also one of the world’s most vital corporate research organizations, with 28 consecutive years of patent leadership. Above all, guided by principles for trust and transparency and support for a more inclusive society, IBM is committed to being a responsible technology innovator and a force for good in the world. For more information about IBM visit: www.ibm.com

Earn a career certificate
Add this credential to your LinkedIn profile, resume, or CV
Share it on social media and in your performance review

Get a head start on your degree
When you complete this Professional Certificate, you can earn college credit if you apply and are accepted into one of the following online degree programs.

University of North Texas
Bachelor of Applied Arts and Sciences
15+ hours of study/wk per course
Frequently asked questions
What is the refund policy?
If you subscribed, you get a 7-day free trial during which you can cancel at no penalty. After that, we don’t give refunds, but you can cancel your subscription at any time. See our full refund policy.
Can I just enroll in a single course?
Yes! To get started, click the course card that interests you and enroll. You can enroll and complete the course to earn a shareable certificate, or you can audit it to view the course materials for free. When you subscribe to a course that is part of a Certificate, you’re automatically subscribed to the full Certificate. Visit your learner dashboard to track your progress.
Is this course really 100% online? Do I need to attend any classes in person?
This course is completely online, so there’s no need to show up to a classroom in person. You can access your lectures, readings and assignments anytime and anywhere via the web or your mobile device.
How long does it take to complete the Specialization?
The Professional Certificate requires completion of 13 courses, including 10 full courses, 2 mini-courses, and a Capstone Project. Each full course typically contains 3-6 modules with an average effort of 2 to 4 hours per module. If learning part-time (e.g. 1 module per week), it would take about 1 year to complete the entire certificate. If learning full-time (e.g. 1 module per day), the certificate can be completed in 3 to 4 months.
What background knowledge is necessary?
This Professional Certificate is open to anyone, whatever their job or academic background. It requires basic IT literacy, knowledge of IT infrastructure, and familiarity with working in Windows, Linux, or MacOS. No prior computer programming experience is necessary, but it is an asset, as is high school math.
Do I need to take the courses in a specific order?
Yes, it is highly recommended to take the courses in the order they are listed, as they progressively build on concepts taught in previous courses.
What will I be able to do upon completing the Specialization?
You will develop practical skills using hands-on labs and projects throughout the program. By the end, you will have acquired the skills and knowledge to become job-ready for an entry-level career in Data Engineering.
Can I get college credit for taking the IBM Data Engineering Professional Certificate?
Yes. The IBM Data Engineering Professional Certificate recently secured a credit recommendation from the American Council on Education (ACE), whose credit recommendations are the industry standard for translating workplace learning to college credit. Learners can earn a recommendation of 12 college credits for completing the program. This aims to open up additional pathways for learners who are interested in higher education and to prepare them for entry-level jobs.
How do you share your proof of completion with the educational institutions for transferring credit?
To share proof of completion with schools, certificate graduates will receive an email prompting them to claim their Credly badge, which contains the ACE® credit recommendation. Once claimed, you will receive a competency-based transcript that signifies the credit recommendation, which can be shared directly with a school from the Credly platform. Please note that the decision to accept specific credit recommendations is up to each institution and is not guaranteed.
Where can I find more information on ACE credit recommendations?
Please see Coursera’s ACE Recommendations FAQ.
You will perform the various tasks that professional data analysts do as part of their jobs, including:
- Data collection from multiple sources
- Data wrangling and data preparation
- Exploratory data analysis
- Statistical analysis and data mining
- Data visualization with different charts and plots
- Interactive dashboard creation