Master AWS Data Engineering
Module 1: AWS Data Engineering Introduction
- What is Data Engineering?
- Overview of Data Engineering Technologies
- Introduction to Python in AWS Data Engineering
- Introduction to SQL in AWS Data Engineering
- Introduction to Linux in AWS Data Engineering
Module 2: Basic AWS Services for Data Engineering
- Cloud Computing Introduction
- AWS Fundamentals
- AWS EC2
- AWS IAM
- AWS VPC
Module 3: Linux for AWS Data Engineering
- Linux Introduction
- Linux Filesystem Architecture
- Linux Installation on EC2 Instance
- Connecting to a Linux Machine
- Basic Linux Commands
- Linux File & Directory Permissions
- Linux Filter Commands
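The permission model covered in this module can also be explored from Python; a small sketch using only the standard library (the octal modes map directly to the arguments of the Linux `chmod` command):

```python
import os
import stat
import tempfile

# Create a scratch file to experiment on (a stand-in for a file on the EC2 instance).
fd, path = tempfile.mkstemp()
os.close(fd)

# chmod 640: owner read/write, group read, others nothing --
# the same octal notation used with the Linux `chmod` command.
os.chmod(path, 0o640)

mode = stat.S_IMODE(os.stat(path).st_mode)
print(oct(mode))  # 0o640

# stat.filemode renders the familiar `ls -l` permission string.
print(stat.filemode(os.stat(path).st_mode))  # -rw-r-----

os.remove(path)
```

The same octal values (`644`, `755`, and so on) appear throughout the hands-on Linux exercises.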
Module 4: SQL for AWS Data Engineering
- SQL Introduction
- Provisioning MySQL using RDS
- Create, Alter, Drop Database
- Create and Alter Tables; Insert Data
- Key Constraints
- Select Queries
- Joins
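As a preview of the DDL, key constraints, and joins covered in this module, a minimal sketch using Python's built-in `sqlite3` as a local stand-in (the course itself provisions MySQL on RDS; table and column names are illustrative):

```python
import sqlite3

# sqlite3 is used here only as a local stand-in for MySQL on RDS,
# to illustrate the same DDL/DML and join concepts.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""
    CREATE TABLE departments (
        dept_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE employees (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER REFERENCES departments(dept_id)  -- key constraint
    )""")

conn.executemany("INSERT INTO departments VALUES (?, ?)",
                 [(1, "Engineering"), (2, "Sales")])
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [(10, "Asha", 1), (11, "Ravi", 2)])

# Inner join: each employee with their department name.
rows = conn.execute("""
    SELECT e.name, d.name
    FROM employees e
    JOIN departments d ON e.dept_id = d.dept_id
    ORDER BY e.emp_id
""").fetchall()
print(rows)  # [('Asha', 'Engineering'), ('Ravi', 'Sales')]
```

The SQL itself is portable: the same statements run against the MySQL instance created on RDS earlier in the module.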
Module 5: Python for AWS Data Engineering
- Python Introduction
- Python Programming Introduction: Interactive Mode vs. Script Mode Development
- Python Installation on Windows & First Python Program using the Python Shell
- Anaconda Python Installation & First Python Application using Jupyter Notebook & Spyder IDE
- Python Editors & IDEs: Installation & First Python Application (Notepad++, PyCharm, VS Code, etc.)
- Python Indentation Rules & Examples
- Language Fundamentals with Examples (Identifiers, Keywords, Datatypes & More)
- Flow Control Statements with Examples (if, if-else, for, while, break, continue, etc.)
- Collection Datatypes with Examples
- Modules, Packages & Libraries with Examples
- Procedure-Oriented Concepts: Functions & Lambda Functions with Examples
- Object-Oriented Concepts: Classes & Objects with Examples
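A compact preview tying together several topics from this module in one runnable sketch (the names and data are illustrative only):

```python
# A quick tour of the constructs listed above: collections,
# flow control, a function, a lambda, and a small class.

# Collection datatypes and a comprehension
scores = {"math": 81, "sql": 92, "python": 77}          # dict
passed = [s for s, v in scores.items() if v >= 80]      # list comprehension

# Flow control
total = 0
for v in scores.values():
    if v < 0:
        continue            # skip invalid entries
    total += v

# Function and lambda
def average(values):
    return sum(values) / len(values)

rounded = (lambda x: round(x, 1))(average(scores.values()))

# Class and object
class Student:
    def __init__(self, name):
        self.name = name

    def greet(self):
        return f"Hello, {self.name}!"

print(sorted(passed))            # ['math', 'sql']
print(rounded)                   # 83.3
print(Student("Asha").greet())   # Hello, Asha!
```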
Module 6: PySpark for AWS Data Engineering
- PySpark Foundation
- PySpark Core Programming – RDD Programming, Transformations & Actions
- PySpark SQL – DataFrames, Tables, DSL & Native SQL
- PySpark Streaming Programming
- PySpark Databases Integration
- PySpark S3 Integration
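The key idea behind RDD programming is that transformations are lazy and only actions trigger computation. Since a local Spark installation is not assumed here, the idea can be previewed with plain Python generators; this is a conceptual analogy only, not PySpark itself:

```python
# Conceptual analogy: Spark transformations (map, filter) are lazy --
# nothing computes until an action (collect, count) runs.
# Python generator expressions behave the same way.

data = range(1, 6)                           # stand-in for an RDD

squared = (x * x for x in data)              # "transformation": nothing runs yet
evens = (x for x in squared if x % 2 == 0)   # another lazy transformation

result = list(evens)                         # "action": the pipeline executes now
print(result)  # [4, 16]

# The PySpark equivalent would be:
#   sc.parallelize(range(1, 6)).map(lambda x: x * x) \
#     .filter(lambda x: x % 2 == 0).collect()
```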
Module 7: Storage with AWS S3
- AWS S3 Introduction
- AWS S3 Bucket using AWS Web Console
- Create AWS S3 Bucket
- Set up a Dataset Locally to Upload into AWS S3
- Adding AWS S3 Buckets and Objects using AWS Web Console
- Version Control of AWS S3 Objects or Files
- AWS S3 Cross-Region Replication for fault tolerance
- Overview of AWS S3 Storage Classes or Storage Tiers
- Overview of Glacier in AWS S3
- Managing AWS S3 buckets and objects using AWS CLI
- AWS S3 Integration with PySpark
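Storage-class transitions and Glacier archiving are typically driven by a bucket lifecycle configuration; a sketch of the JSON shape (the rule ID, prefix, and day counts are placeholders):

```json
{
  "Rules": [
    {
      "ID": "archive-old-logs",
      "Filter": { "Prefix": "logs/" },
      "Status": "Enabled",
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 365 }
    }
  ]
}
```

A configuration like this can be applied from the AWS CLI with `aws s3api put-bucket-lifecycle-configuration --bucket <bucket> --lifecycle-configuration file://lifecycle.json`.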
Module 8: Data Processing with AWS Glue
- AWS Glue Components
- Create Crawler and Catalog Table
- Create and Run the Glue Job
- Create and Run Glue Trigger
- Create Glue Workflow
- Run Glue Workflow and Validate
- Setup Spark History Server on AWS
- Build Glue Spark UI Container
- Update IAM Policy Permissions
- Start Glue Spark UI Container
- Steps for Creating Catalog Tables
- Create Glue Catalog Database
- Crawling Multiple Folders
- Managing Glue Catalog
Module 9: Querying with AWS Athena
- Amazon Athena Introduction
- Glue Catalog Databases and Tables
- Access Glue Catalog Databases and Tables using Athena Query Editor
- Create Database and Table using Athena
- Populate Data into Table using Athena
- Using CTAS to create tables using Athena
- Amazon Athena Architecture
- Amazon Athena Resources and relationship with Hive
- Create Partitioned Table using Athena
- Develop Query for Partitioned Column
- Insert into Partitioned Tables using Athena
- Validate Data Partitioning using Athena
- Drop Athena Tables and Delete Data Files
- Drop Partitioned Table using Athena
- Data Partitioning in Athena using CTAS
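Athena, like Hive, maps partition columns onto `key=value` directory names under the table's S3 prefix. A small sketch of that layout (the bucket and table names are illustrative placeholders):

```python
# Athena/Hive-style partitioning encodes partition columns as
# key=value directories under the table's S3 location.
# The bucket and table names below are illustrative placeholders.

BASE = "s3://my-data-lake/sales/"   # hypothetical table location

def partition_path(year: int, month: int) -> str:
    """Build the S3 prefix Athena scans for one partition."""
    return f"{BASE}year={year}/month={month:02d}/"

print(partition_path(2024, 3))
# s3://my-data-lake/sales/year=2024/month=03/

# A query with WHERE year = 2024 AND month = 3 then prunes the scan
# to just this prefix instead of reading the whole table.
```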
Module 10: Real-time Data Processing with AWS Kinesis
- Building Streaming Pipeline using Kinesis
- Rotating Logs
- Setup Kinesis Firehose Agent
- Create Kinesis Firehose Delivery Stream
- Planning the Pipeline
- Create IAM Group and User
- Granting Permissions to IAM User using Policy
- Configure Kinesis Firehose Agent
- Start and Validate Agent
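The Kinesis agent configured above is driven by `/etc/aws-kinesis/agent.json`; a sketch of its shape (the endpoint region, log path, and stream name are placeholders for values set during the lab):

```json
{
  "cloudwatch.emitMetrics": true,
  "firehose.endpoint": "firehose.us-east-1.amazonaws.com",
  "flows": [
    {
      "filePattern": "/var/log/myapp/app.log*",
      "deliveryStream": "my-delivery-stream"
    }
  ]
}
```

Each entry in `flows` tails files matching `filePattern` and ships new lines to the named Firehose delivery stream.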
- Building a Simple Streaming Pipeline by Integrating PySpark
Module 11: Serverless Data Processing with AWS Lambda
- Setup Project for local development
- Deploy Project to AWS Lambda console
- Develop download functionality using requests
- Using 3rd party libraries in AWS Lambda
- Validating S3 access for local development
- Develop upload functionality to S3
- Validating using AWS Lambda Console
- Run using AWS Lambda Console
- Validating files incrementally
- Reading and Writing the Bookmark using S3
- Maintaining the Bookmark using S3
- Review the incremental upload logic
- Deploying the Lambda Function
- Schedule Lambda Function using Amazon EventBridge
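The bookmark pattern in the steps above can be sketched independently of AWS: persist the last processed marker, then on each run upload only files newer than it. Here a plain dict stands in for the S3 bucket, and all names are hypothetical:

```python
# Sketch of the incremental-upload bookmark pattern.
# A dict stands in for the S3 bucket; in the course the bookmark
# itself is a small object read and written with boto3.

bucket = {"bookmark.txt": "file_002.csv"}   # last file already uploaded

available = ["file_001.csv", "file_002.csv", "file_003.csv", "file_004.csv"]

def incremental_upload(bucket, available):
    bookmark = bucket.get("bookmark.txt", "")
    # Files strictly after the bookmark (names sort chronologically here).
    pending = [f for f in sorted(available) if f > bookmark]
    for f in pending:
        bucket[f] = f"<contents of {f}>"      # stand-in for s3.put_object
    if pending:
        bucket["bookmark.txt"] = pending[-1]  # advance the bookmark
    return pending

print(incremental_upload(bucket, available))
# ['file_003.csv', 'file_004.csv']
print(bucket["bookmark.txt"])
# file_004.csv
```

Re-running the function uploads nothing until new files appear, which is exactly the behaviour the scheduled Lambda relies on.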
Module 12: Data Warehousing with Amazon Redshift
- Amazon Redshift – Introduction
- Create Redshift Cluster using Free Trial
- Connecting to Database using Redshift Query Editor
- Get List of Tables by Querying the Information Schema
- Run Queries against Redshift Tables using Query Editor
- Create Redshift Table
- CRUD Operations
- Insert Data into Redshift Tables
- Update Data in Redshift Tables
- Delete data from Redshift tables
- Redshift Saved Queries using Query Editor
- Deleting Redshift Cluster
- Copy Data from S3 to Redshift – Introduction
- Setup Data in S3 for Redshift Copy
- Create Database and Table for Redshift Copy Command
- Run Copy Command to copy data from S3 to Redshift Table
- Copy JSON Data from S3 to Redshift Table using IAM Role
- Redshift Architecture
- Create multi-node Redshift Cluster
- Connect to Redshift Cluster using Query Editor
- Create Redshift Database
- Create Redshift Database User
- Create Redshift Database Schema
- Integrating PySpark with Redshift
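The COPY workflow above centres on one SQL statement; a sketch of its shape, built in Python for readability (the table, bucket, and role ARN are placeholders):

```python
# Sketch of the Redshift COPY statement used to bulk-load JSON data
# from S3. The table, bucket, and IAM role ARN are placeholders.

def build_copy_sql(table: str, s3_path: str, iam_role: str) -> str:
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        f"JSON 'auto';"
    )

sql = build_copy_sql(
    "public.orders",
    "s3://my-data-lake/orders/",
    "arn:aws:iam::123456789012:role/RedshiftCopyRole",
)
print(sql)
# COPY public.orders
# FROM 's3://my-data-lake/orders/'
# IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
# JSON 'auto';
```

`JSON 'auto'` tells Redshift to map JSON keys to column names automatically; CSV and Parquet loads swap in a different format clause.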
Module 13: Data Visualization with Amazon QuickSight
- Representing data visually for maximum impact
- Benefits of data visualization
- Popular uses of data visualization
- Understanding Amazon QuickSight's core concepts
- Standard versus Enterprise edition
- SPICE – the in-memory storage and computation engine for QuickSight
- Ingesting and preparing data from a variety of sources
- Preparing datasets in QuickSight versus performing ETL outside of QuickSight
- Creating and sharing visuals with QuickSight analyses and dashboards
- Visual types in Amazon QuickSight
Module 14: Workflow Orchestration with AWS Step Functions
- Introduction to AWS Step Functions
- Setting Up Step Functions on AWS
- Core Concepts: States, Tasks, and State Machines
- Integrating with AWS Services
- Developing & Deploying Step Functions Workflows
- Monitoring, Logging, and Error Handling
- Best Practices for Workflow Design
- End-to-End Serverless Data Pipeline with Step Functions
- Scaling, Optimization & Cost Management
- Hands-on Project: Orchestrate a Real-Time ETL Pipeline using Step Functions
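State machines are defined in Amazon States Language (JSON); a minimal sketch of an ETL-style machine like the one built in this module (the Glue job name is a placeholder):

```json
{
  "Comment": "Minimal ETL orchestration sketch",
  "StartAt": "RunGlueJob",
  "States": {
    "RunGlueJob": {
      "Type": "Task",
      "Resource": "arn:aws:states:::glue:startJobRun.sync",
      "Parameters": { "JobName": "my-etl-job" },
      "Retry": [
        { "ErrorEquals": ["States.ALL"], "MaxAttempts": 2, "IntervalSeconds": 30 }
      ],
      "Next": "Succeeded"
    },
    "Succeeded": { "Type": "Succeed" }
  }
}
```

The `.sync` suffix on the service integration makes the state wait for the Glue job to finish rather than returning as soon as it starts, and the `Retry` block covers the error-handling topics listed above.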
Module 15: End-to-End Real-Time Project
- End-to-End Project Work
- Resume Guidance
- Interview FAQs
- Mock Interview (On-Request)
