AWS DATA ENGINEERING

Master AWS Data Engineering

Module 1: AWS Data Engineering Introduction

  • What is Data Engineering?
  • Different Data Engineering Technologies
  • Introduction to Python in AWS Data Engineering
  • Introduction to SQL in AWS Data Engineering
  • Introduction to Linux in AWS Data Engineering

Module 2: AWS Basic Services for AWS Data Engineering

  • Cloud Computing Introduction
  • AWS Fundamentals
  • AWS EC2
  • AWS IAM
  • AWS VPC

Module 3: Linux for AWS Data Engineering

  • Linux Introduction
  • Linux Filesystem Architecture
  • Linux Installation on EC2 Instance
  • Connecting to a Linux Machine
  • Basic Linux Commands
  • Linux File & Directory Permissions
  • Linux Filter Commands

Module 4: SQL for AWS Data Engineering

  • SQL Introduction
  • Provisioning MySQL using RDS
  • Create, Alter, Drop Database
  • Create & Alter Tables, Insert Data
  • Key Constraints
  • Select Queries
  • Joins
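The SQL topics above can be tried end to end even without an RDS instance; the sketch below uses Python's built-in sqlite3 as a local stand-in for the course's MySQL-on-RDS setup (table names and data are illustrative, not from the course material):

```python
import sqlite3

# sqlite3 ships with Python, so the same CREATE / INSERT / SELECT / JOIN
# statements can be practiced locally before moving to MySQL on RDS.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("""CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    dept_id INTEGER REFERENCES departments(id))""")

cur.executemany("INSERT INTO departments VALUES (?, ?)",
                [(1, "Engineering"), (2, "Sales")])
cur.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                [(1, "Asha", 1), (2, "Ravi", 1), (3, "Meena", 2)])

# Inner join: each employee paired with their department name
rows = cur.execute("""
    SELECT e.name, d.name
    FROM employees e
    JOIN departments d ON e.dept_id = d.id
    ORDER BY e.name""").fetchall()
print(rows)  # [('Asha', 'Engineering'), ('Meena', 'Sales'), ('Ravi', 'Engineering')]
```

MySQL syntax for these statements is nearly identical, so the exercise transfers directly to the RDS environment.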

Module 5: Python for AWS Data Engineering

  • Python Introduction
  • Python Programming Introduction – Interactive Mode & Script Mode Development
  • Python Installation on Windows & First Python Program using the Python Shell
  • Anaconda Python Installation and First Python Application using Jupyter Notebook & Spyder IDE
  • Python Editors & IDEs – Installation and First Python Application using Notepad++, PyCharm, VS Code, etc.
  • Python Program Indentation Rules & Examples
  • Language Fundamentals with Different Examples (Identifiers, Keywords, Datatypes & More)
  • Flow Control Statements with Different Examples (if, if-else, for, while, break, continue, etc.)
  • Collection Datatypes with Different Examples
  • Modules, Packages & Libraries with Different Examples
  • Procedure Oriented Concepts like Functions & Lambda Functions with Different Examples
  • Object Oriented Concepts like Class & Object with Different Examples
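A compact tour of several of the topics above — flow control, collection datatypes, lambdas, and a class (all names are illustrative):

```python
def classify(n):
    """Flow control: if / elif / else."""
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    return "positive"

# Collection datatypes: list, dict (via comprehensions)
nums = [3, -1, 0, 7]
signs = {n: classify(n) for n in nums}   # dict comprehension
evens = [n for n in nums if n % 2 == 0]  # list comprehension

# Lambda function used as a sort key
by_abs = sorted(nums, key=lambda n: abs(n))

# Object-oriented: a minimal class with state and a method
class Counter:
    def __init__(self):
        self.count = 0
    def bump(self):
        self.count += 1
        return self.count

c = Counter()
c.bump()
c.bump()
print(signs[-1], evens, by_abs, c.count)  # negative [0] [0, -1, 3, 7] 2
```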

Module 6: PySpark for AWS Data Engineering

  • PySpark Foundation
  • PySpark Core Programming – RDD Programming, Transformations & Actions
  • PySpark SQL – DataFrames, Tables, DSL & Native SQL
  • PySpark Streaming Programming
  • PySpark Databases Integration
  • PySpark S3 Integration
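PySpark's core idea is that transformations (map, filter) are lazy, and only actions (collect, count) trigger computation. Since Spark itself may not be available locally, here is a pure-Python sketch of that model using generators — this toy MiniRDD class only illustrates the concept and is not Spark's actual API:

```python
class MiniRDD:
    """Toy stand-in for an RDD: transformations stay lazy (generators),
    actions force evaluation. Conceptual illustration only, not Spark."""
    def __init__(self, data):
        self._data = data

    # Transformations: return a new MiniRDD; nothing is computed yet
    def map(self, f):
        return MiniRDD(f(x) for x in self._data)

    def filter(self, pred):
        return MiniRDD(x for x in self._data if pred(x))

    # Action: actually iterates the whole pipeline
    def collect(self):
        return list(self._data)

rdd = MiniRDD(range(10))
# Nothing runs until collect() is called at the end of the chain
result = rdd.filter(lambda x: x % 2 == 0).map(lambda x: x * x).collect()
print(result)  # [0, 4, 16, 36, 64]
```

In real PySpark the same chain would be `sc.parallelize(range(10)).filter(...).map(...).collect()`, with the cluster scheduling the work when the action fires.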

Module 7: Storage with AWS S3

  • AWS S3 Introduction
  • Create AWS S3 Bucket using AWS Web Console
  • Setup Dataset Locally to Upload into AWS S3
  • Adding AWS S3 Buckets and Objects using AWS Web Console
  • Version Control of AWS S3 Objects or Files
  • AWS S3 Cross-Region Replication for fault tolerance
  • Overview of AWS S3 Storage Classes or Storage Tiers
  • Overview of Glacier in AWS S3
  • Managing AWS S3 buckets and objects using AWS CLI
  • AWS S3 Integration with PySpark
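Datasets uploaded to S3 are commonly laid out under date-partitioned prefixes so that later Glue and Athena steps can prune by date. A minimal sketch of a key-building helper — the prefix convention and names are assumptions, and the actual upload call (e.g. boto3's put_object) is not shown:

```python
from datetime import date

def build_s3_key(dataset: str, run_date: date, filename: str,
                 prefix: str = "raw") -> str:
    """Build a Hive-style partitioned S3 object key, e.g.
    raw/orders/year=2024/month=03/day=09/part-0001.csv.
    The prefix and layout are illustrative conventions,
    not AWS requirements."""
    return (f"{prefix}/{dataset}/"
            f"year={run_date:%Y}/month={run_date:%m}/day={run_date:%d}/"
            f"{filename}")

key = build_s3_key("orders", date(2024, 3, 9), "part-0001.csv")
print(key)  # raw/orders/year=2024/month=03/day=09/part-0001.csv
```

The `year=/month=/day=` form is what Glue crawlers and Athena recognize as partition columns, which is why it pays to adopt it at upload time.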

Module 8: Data Processing with AWS Glue

  • AWS Glue Components
  • Create Crawler and Catalog Table
  • Create and Run the Glue Job
  • Create and Run Glue Trigger
  • Create Glue Workflow
  • Run Glue Workflow and Validate
  • Setup Spark History Server on AWS
  • Build Glue Spark UI Container
  • Update IAM Policy Permissions
  • Start Glue Spark UI Container
  • Steps for Creating Catalog Tables
  • Create Glue Catalog Database
  • Crawling Multiple Folders
  • Managing Glue Catalog
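Catalog tables can be created by a crawler or defined directly through the Glue create_table API. A sketch of the TableInput mapping for a CSV dataset in S3 — the database, bucket, and columns here are hypothetical:

```python
def csv_table_input(name, s3_location, columns, partition_keys=()):
    """Build a Glue TableInput mapping for a CSV dataset in S3.
    The shape follows the Glue create_table API; all names and
    locations below are hypothetical examples."""
    return {
        "Name": name,
        "TableType": "EXTERNAL_TABLE",
        "PartitionKeys": [{"Name": n, "Type": t} for n, t in partition_keys],
        "StorageDescriptor": {
            "Columns": [{"Name": n, "Type": t} for n, t in columns],
            "Location": s3_location,
            "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
            "OutputFormat":
                "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
            "SerdeInfo": {
                "SerializationLibrary":
                    "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
                "Parameters": {"field.delim": ","},
            },
        },
    }

table = csv_table_input(
    "orders",
    "s3://my-example-bucket/raw/orders/",   # hypothetical bucket
    columns=[("order_id", "bigint"), ("amount", "double")],
    partition_keys=[("year", "string")],
)
# A real call would then be:
# glue_client.create_table(DatabaseName="my_db", TableInput=table)
```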

Module 9: Querying with AWS Athena

  • Amazon Athena Introduction
  • Glue Catalog Databases and Tables
  • Access Glue Catalog Databases and Tables using Athena Query Editor
  • Create Database and Table using Athena
  • Populate Data into Table using Athena
  • Using CTAS to create tables using Athena
  • Amazon Athena Architecture
  • Amazon Athena Resources and relationship with Hive
  • Create Partitioned Table using Athena
  • Develop Query for Partitioned Column
  • Insert into Partitioned Tables using Athena
  • Validate Data Partitioning using Athena
  • Drop Athena Tables and Delete Data Files
  • Drop Partitioned Table using Athena
  • Data Partitioning in Athena using CTAS
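A CTAS statement in Athena creates a new table and writes its data (here as Parquet) to S3 in one step. A sketch of building such a query — the table names and S3 location are placeholders; note that Athena requires the partition columns to appear last in the SELECT list:

```python
def ctas_query(new_table, select_sql, s3_location, partition_cols=()):
    """Build an Athena CTAS statement that writes Parquet to S3.
    Table names and the S3 location are illustrative."""
    props = [f"external_location = '{s3_location}'", "format = 'PARQUET'"]
    if partition_cols:
        cols = ", ".join(f"'{c}'" for c in partition_cols)
        props.append(f"partitioned_by = ARRAY[{cols}]")
    return (f"CREATE TABLE {new_table}\n"
            f"WITH ({', '.join(props)})\n"
            f"AS {select_sql}")

q = ctas_query(
    "sales_parquet",
    # Partition column (year) must come last in the SELECT list
    "SELECT order_id, amount, year FROM sales_csv",
    "s3://my-example-bucket/curated/sales/",   # hypothetical location
    partition_cols=["year"],
)
print(q)
```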

Module 10: Real-time Data Processing with AWS Kinesis

  • Building Streaming Pipeline using Kinesis
  • Rotating Logs
  • Setup Kinesis Firehose Agent
  • Create Kinesis Firehose Delivery Stream
  • Planning the Pipeline
  • Create IAM Group and User
  • Granting Permissions to IAM User using Policy
  • Configure Kinesis Firehose Agent
  • Start and Validate Agent
  • Building a Simple Streaming Pipeline by Integrating PySpark

Module 11: Serverless Data Processing with AWS Lambda

  • Setup Project for local development
  • Deploy Project to AWS Lambda console
  • Develop download functionality using requests
  • Using 3rd party libraries in AWS Lambda
  • Validating S3 access for local development
  • Develop upload functionality to S3
  • Validating using AWS Lambda Console
  • Run using AWS Lambda Console
  • Validating files incrementally
  • Reading and Writing Bookmark using S3
  • Maintaining Bookmark using S3
  • Review the incremental upload logic
  • Deploying the Lambda Function
  • Schedule Lambda Function using Amazon EventBridge
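The incremental-upload bullets above hinge on a bookmark kept in S3, so each scheduled run only processes files newer than the last one handled. A pure-Python sketch of that logic, with a dict standing in for the S3 bucket — real code would use boto3 get_object/put_object, and file names are assumed to be date-stamped so they sort chronologically:

```python
import json

def get_files_to_process(all_files, store,
                         bookmark_key="bookmarks/last_file.json"):
    """Return files newer than the stored bookmark, then advance it.
    `store` is a dict standing in for an S3 bucket (get/put by key)."""
    raw = store.get(bookmark_key)
    last = json.loads(raw)["last_file"] if raw else ""
    # Date-stamped names sort chronologically, so string comparison works
    new_files = sorted(f for f in all_files if f > last)
    if new_files:
        store[bookmark_key] = json.dumps({"last_file": new_files[-1]})
    return new_files

store = {}  # stand-in for the S3 bucket holding the bookmark object
batch1 = get_files_to_process(
    ["2024-01-01.log", "2024-01-02.log"], store)
batch2 = get_files_to_process(
    ["2024-01-01.log", "2024-01-02.log", "2024-01-03.log"], store)
print(batch1, batch2)
# ['2024-01-01.log', '2024-01-02.log'] ['2024-01-03.log']
```

Keeping the bookmark in S3 rather than in the Lambda itself is what makes the function safe to redeploy and reschedule without reprocessing old files.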

Module 12: Data Warehousing with Amazon Redshift

  • Amazon Redshift – Introduction
  • Create Redshift Cluster using Free Trial
  • Connecting to Database using Redshift Query Editor
  • Get List of Tables by Querying the Information Schema
  • Run Queries against Redshift Tables using Query Editor
  • Create Redshift Table
  • CRUD Operations
  • Insert Data into Redshift Tables
  • Update Data in Redshift Tables
  • Delete data from Redshift tables
  • Redshift Saved Queries using Query Editor
  • Deleting Redshift Cluster
  • Copy Data from S3 to Redshift – Introduction
  • Setup Data in S3 for Redshift Copy
  • Create Database and Table for Redshift Copy Command
  • Run Copy Command to Copy Data from S3 to Redshift Table
  • Copy JSON Data from S3 to Redshift Table using IAM Role
  • Redshift Architecture
  • Create multi-node Redshift Cluster
  • Connect to Redshift Cluster using Query Editor
  • Create Redshift Database
  • Create Redshift Database User
  • Create Redshift Database Schema
  • Integrating PySpark with Redshift

Module 13: Data Visualization with Amazon QuickSight

  • Representing data visually for maximum impact
  • Benefits of data visualization
  • Popular uses of data visualizations
  • Understanding Amazon QuickSight’s core concepts
  • Standard versus enterprise edition
  • SPICE – the in-memory storage and computation engine for QuickSight
  • Ingesting and preparing data from a variety of sources
  • Preparing datasets in QuickSight versus performing ETL outside of QuickSight
  • Creating and sharing visuals with QuickSight analyses and dashboards
  • Visual types in Amazon QuickSight
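The Redshift COPY-from-S3 steps covered above can be sketched as a small statement builder; the table, S3 path, and IAM role ARN below are placeholders:

```python
def redshift_copy(table, s3_path, iam_role_arn, fmt="CSV"):
    """Build a Redshift COPY statement loading from S3 via an IAM role.
    All identifiers here are placeholders."""
    fmt_clause = "FORMAT AS JSON 'auto'" if fmt == "JSON" else "FORMAT AS CSV"
    return (f"COPY {table}\n"
            f"FROM '{s3_path}'\n"
            f"IAM_ROLE '{iam_role_arn}'\n"
            f"{fmt_clause};")

sql = redshift_copy(
    "public.orders",
    "s3://my-example-bucket/redshift/orders/",          # hypothetical path
    "arn:aws:iam::123456789012:role/RedshiftCopyRole",  # placeholder ARN
    fmt="JSON",
)
print(sql)
```

The `'auto'` option asks Redshift to map JSON keys to column names automatically; a JSONPaths file can be supplied instead when the mapping is not one-to-one.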

Module 14: Workflow Orchestration with AWS Step Functions

  • Introduction to AWS Step Functions
  • Setting Up Step Functions on AWS
  • Core Concepts: States, Tasks, and State Machines
  • Integrating with AWS Services
  • Developing & Deploying Step Functions Workflows
  • Monitoring, Logging, and Error Handling
  • Best Practices for Workflow Design
  • End-to-End Serverless Data Pipeline with Step Functions
  • Scaling, Optimization & Cost Management
  • Hands-on Project: Orchestrate a Real-Time ETL Pipeline using Step Functions
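Step Functions workflows are defined in Amazon States Language (JSON). A minimal sketch chaining two Lambda tasks with a retry on the first — the function ARNs are placeholders:

```python
import json

def two_step_pipeline(crawl_lambda_arn, load_lambda_arn):
    """Minimal Amazon States Language definition chaining two Lambda
    tasks, with a retry policy on the first. ARNs are placeholders."""
    return {
        "Comment": "Sketch of a two-step ETL state machine",
        "StartAt": "CrawlData",
        "States": {
            "CrawlData": {
                "Type": "Task",
                "Resource": crawl_lambda_arn,
                # Error handling: retry transient task failures
                "Retry": [{"ErrorEquals": ["States.TaskFailed"],
                           "IntervalSeconds": 5, "MaxAttempts": 2}],
                "Next": "LoadData",
            },
            "LoadData": {
                "Type": "Task",
                "Resource": load_lambda_arn,
                "End": True,
            },
        },
    }

definition = two_step_pipeline(
    "arn:aws:lambda:us-east-1:123456789012:function:crawl",  # placeholder
    "arn:aws:lambda:us-east-1:123456789012:function:load",   # placeholder
)
print(json.dumps(definition, indent=2))
```

This JSON is what you would pass as the definition when creating the state machine (e.g. in the Step Functions console or via create_state_machine).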

Module 15: End-to-End Real-Time Project

  • End-to-End Project Work
  • Resume Guidance
  • Interview FAQs
  • Mock Interview (On-Request)