Azure Databricks for Data Engineers Online Course
This course takes you through the Azure Databricks platform, starting with the fundamentals of data engineering and how Apache Spark fits into it, before guiding you through setting up an Azure account and a Databricks workspace. Through hands-on practice, you’ll create Spark clusters, work with notebooks, and manage data using DBFS. You’ll also explore Unity Catalog for secure data governance, Delta Lake for reliable storage and processing, and Delta Live Tables for building scalable pipelines. Finally, you’ll learn to automate workflows using Databricks Repos, Workflows, the REST API, and the CLI, equipping you with the skills to streamline and scale data projects.
Who should take this course?
This course is designed for data engineers, data analysts, and software developers who want to build and automate data pipelines on the Azure Databricks platform. It’s also well-suited for anyone aiming to work with Apache Spark, Delta Lake, and Delta Live Tables on Azure and to improve the scalability, reliability, and efficiency of their data projects.
What you will learn
- Create and manage Databricks workspaces and Spark clusters
- Utilize Databricks notebooks and magic commands for efficient data processing
- Implement secure data management with Unity Catalog
- Perform advanced data operations with Delta Lake and Delta Tables
- Automate data workflows using Databricks Repos, Workflows, REST API, and CLI
- Execute a comprehensive data engineering project from start to finish
Course Outline
Before you start
- Course Prerequisites
- About the Course
- How to access Course Material and Resources
- Note for Students - Before You Start
Introduction
- Introduction to Data Engineering
- From Apache Spark to a Data Engineering Platform
- Introduction to Databricks Platform
Getting Started
- What you will learn in this section
- Creating Azure Cloud Account
- Azure Portal Overview
- Creating Databricks Workspace Service
- Introduction to Databricks Workspace
- Azure Databricks Platform Architecture
Working in Databricks Workspace
- What you will learn in this section
- How to Create a Spark Cluster
- Working with Databricks Notebook
- Notebook Magic Commands
- Databricks Utilities Package
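For a flavor of the notebook topics in this section, here is a minimal sketch of the Databricks Utilities (dbutils) package; magic commands such as %md, %sql, and %fs are typed at the top of a notebook cell, while dbutils is called directly from Python. The paths and widget names below are illustrative placeholders.

```python
# A minimal dbutils sketch; dbutils is provided automatically inside a
# Databricks notebook, and the paths/widget names here are placeholders.

# List files under a DBFS folder (similar to the %fs ls magic command)
for f in dbutils.fs.ls("dbfs:/databricks-datasets/")[:5]:
    print(f.path, f.size)

# Create a text widget and read its value - handy for parameterized notebooks
dbutils.widgets.text("environment", "dev")
print("Running against:", dbutils.widgets.get("environment"))

# Explore what else the utilities package offers
dbutils.help()
```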
Working with Databricks File System - DBFS
- What you will learn in this section
- Introduction to DBFS
- Working with DBFS Root
- Mounting ADLS to DBFS
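As a preview of the mounting topic above, the sketch below mounts an ADLS Gen2 container to DBFS using a service principal; the storage account, container, secret scope, and tenant values are placeholders to replace with your own.

```python
# A minimal sketch of mounting ADLS Gen2 to DBFS with a service principal.
# Storage account, container, secret scope names, and tenant id are placeholders.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": dbutils.secrets.get("demo-scope", "sp-client-id"),
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get("demo-scope", "sp-client-secret"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

dbutils.fs.mount(
    source="abfss://raw@mystorageaccount.dfs.core.windows.net/",
    mount_point="/mnt/raw",
    extra_configs=configs,
)

# Verify the mount by listing its contents
display(dbutils.fs.ls("/mnt/raw"))
```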
Working with Unity Catalog
- What you will learn in this section
- Introduction to Unity Catalog
- Set Up Unity Catalog
- Unity Catalog User Provisioning
- Working with Securable Objects
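To illustrate securable objects, the sketch below creates a catalog and a schema and grants privileges to a group; the catalog, schema, and group names are made up, and a Unity Catalog-enabled workspace and cluster are assumed.

```python
# A minimal Unity Catalog sketch; all names are illustrative and the cluster
# must belong to a Unity Catalog-enabled workspace.
spark.sql("CREATE CATALOG IF NOT EXISTS demo_catalog")
spark.sql("CREATE SCHEMA IF NOT EXISTS demo_catalog.bronze")

# Grant a workspace group access to the catalog and read access to the schema
spark.sql("GRANT USE CATALOG ON CATALOG demo_catalog TO `data-engineers`")
spark.sql("GRANT USE SCHEMA ON SCHEMA demo_catalog.bronze TO `data-engineers`")
spark.sql("GRANT SELECT ON SCHEMA demo_catalog.bronze TO `data-engineers`")
```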
Working with Delta Lake and Delta Tables
- What you will learn in this section
- Introduction to Delta Lake
- Creating Delta Table
- Sharing data for External Delta Table
- Reading Delta Table
- Delta Table Operations
- Delta Table Time Travel
- Convert Parquet to Delta
- Delta Table Schema Validation
- Delta Table Schema Evolution
- Look Inside Delta Table
- Delta Table Utilities and Optimization
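Here is a small taste of the Delta Lake operations in this section: create a Delta table, update it, travel back to an earlier version, and inspect its history. The schema, table, and column names are illustrative.

```python
# A minimal Delta Lake sketch; the demo schema and orders table are placeholders.
from pyspark.sql import functions as F

spark.sql("CREATE SCHEMA IF NOT EXISTS demo")

# Create a managed Delta table
df = spark.range(0, 5).withColumn("status", F.lit("new"))
df.write.format("delta").mode("overwrite").saveAsTable("demo.orders")

# Delta table operations: read and update
spark.table("demo.orders").show()
spark.sql("UPDATE demo.orders SET status = 'processed' WHERE id < 3")

# Time travel: read the table as of its first version
spark.read.option("versionAsOf", 0).table("demo.orders").show()

# Look inside the table and optimize it
spark.sql("DESCRIBE HISTORY demo.orders").show(truncate=False)
spark.sql("OPTIMIZE demo.orders")
```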
Working with Databricks Incremental Ingestion Tools
- What you will learn in this section
- Architecture and Need for Incremental Ingestion
- Using Copy Into with Manual Schema Evolution
- Using Copy Into with Automatic Schema Evolution
- Streaming Ingestion with Manual Schema Evolution
- Streaming Ingestion with Automatic Schema Evolution
- Introduction to Databricks Autoloader
- Autoloader with Automatic Schema Evolution
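As a preview of Auto Loader, the sketch below ingests JSON files incrementally with automatic schema evolution; all paths and the target table name are placeholders.

```python
# A minimal Auto Loader sketch; source path, schema/checkpoint locations,
# and the target table are illustrative placeholders.
(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/mnt/raw/_schemas/invoices")
    .option("cloudFiles.schemaEvolutionMode", "addNewColumns")
    .load("/mnt/raw/invoices/")
    .writeStream
    .option("checkpointLocation", "/mnt/raw/_checkpoints/invoices")
    .trigger(availableNow=True)
    .toTable("demo.invoices_bronze"))
```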
Working with Databricks Delta Live Tables (DLT)
- What you will learn in this section
- Introduction to Databricks DLT
- Understand DLT Use Case Scenario
- Set Up DLT Scenario Dataset
- Creating DLT Workload in SQL
- Creating DLT Pipeline for your Workload
- Creating DLT Workload in Python
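To show what a Python DLT workload looks like, here is a minimal pipeline sketch with a raw table, an expectation, and a cleaned table; the source path and table names are invented, and the code runs as part of a Delta Live Tables pipeline rather than interactively.

```python
# A minimal Delta Live Tables sketch in Python; paths and names are placeholders.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Raw invoices ingested from cloud storage")
def invoices_raw():
    return (spark.readStream
            .format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/mnt/raw/invoices/"))

@dlt.table(comment="Cleaned invoices with a load timestamp")
@dlt.expect_or_drop("valid_invoice", "invoice_id IS NOT NULL")
def invoices_clean():
    return (dlt.read_stream("invoices_raw")
            .withColumn("load_ts", F.current_timestamp()))
```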
Databricks Project and Automation Features
- What you will learn in this section
- Working with Databricks Repos
- Working with Databricks Workflows
- Working with Databricks REST API
- Working with Databricks CLI
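As an example of the automation features above, this sketch triggers an existing job through the Jobs 2.1 REST API (the Databricks CLI exposes a comparable jobs run-now command); the workspace host, token, and job ID are placeholders.

```python
# A minimal Databricks REST API sketch; host, token, and job_id are placeholders.
import os
import requests

host = os.environ["DATABRICKS_HOST"]    # e.g. https://adb-1234567890123456.7.azuredatabricks.net
token = os.environ["DATABRICKS_TOKEN"]  # a personal access token

# Trigger an existing job by id using the Jobs 2.1 run-now endpoint
resp = requests.post(
    f"{host}/api/2.1/jobs/run-now",
    headers={"Authorization": f"Bearer {token}"},
    json={"job_id": 123},
)
resp.raise_for_status()
print("Triggered run:", resp.json()["run_id"])
```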
Capstone Project
- Project Scope and Background
- Extracting the operational requirements
- Storage Design
- Implement Data Security
- Implement Resource Policies
- Decouple Data Ingestion
- Design Bronze Layer
- Design Silver and Gold Layer
- Set up your environment
- Create a workspace
- Create the Storage Layer
- Set Up Unity Catalog
- Create Metadata Catalog and External Locations
- Set up your source control
- Start Coding
- Test your code
- Load historical data
- Ingest into bronze layer
- Process the silver layer
- Handling multiple updates
- Implementing Gold Layer
- Creating a run script
- Preparing for Integration testing
- Creating Test Data Producer
- Creating Integration Test for Batch mode
- Creating Integration Test for Stream mode
- Implementing CI/CD Pipeline
- Develop Build Pipeline
- Develop Release Pipeline
- Creating Databricks CLI Script
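To hint at how the bronze and silver steps fit together, here is a sketch of a bronze-to-silver transform that keeps only the latest record per key and merges it into the silver table, one common way of handling multiple updates; the table and column names are illustrative, not the project’s actual schema.

```python
# A minimal bronze-to-silver sketch; demo.invoices_bronze / demo.invoices_silver
# and the invoice_id / load_ts columns are illustrative placeholders.
from delta.tables import DeltaTable
from pyspark.sql import functions as F
from pyspark.sql.window import Window

bronze = spark.table("demo.invoices_bronze")

# Keep only the most recent bronze record per invoice
latest = (bronze
          .withColumn("rn", F.row_number().over(
              Window.partitionBy("invoice_id").orderBy(F.col("load_ts").desc())))
          .filter("rn = 1")
          .drop("rn"))

# Upsert into an existing silver Delta table
silver = DeltaTable.forName(spark, "demo.invoices_silver")
(silver.alias("t")
       .merge(latest.alias("s"), "t.invoice_id = s.invoice_id")
       .whenMatchedUpdateAll()
       .whenNotMatchedInsertAll()
       .execute())
```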
Final Word