Apache Cassandra Online Course
Apache Cassandra is a powerful NoSQL database known for its decentralized architecture, high fault tolerance, scalability, and cost-efficiency, which make it a key technology in modern cloud-based systems. With recent enhancements in security, Cassandra is now well suited for enterprise-level applications.
In this course, you’ll explore how Cassandra addresses the limitations of traditional relational databases in large-scale, high-performance environments. You’ll get acquainted with core Cassandra concepts, key terminology, and the roles of the various components within the system.
You’ll then dive into building a multi-node Cassandra cluster, gaining insights into the responsibilities of each component and understanding the complete data flow during operations where speed, accuracy, and reliability are essential.
Who Should Take This Course?
The Apache Cassandra Online Course is ideal for database administrators, backend developers, data engineers, and system architects who want to learn how to work with highly scalable, distributed NoSQL databases. It’s also suitable for IT professionals handling large volumes of data and looking to build fault-tolerant applications. A basic understanding of databases, data modeling, and SQL is recommended for a smoother learning experience.
Course Table of Contents
Introduction to Cassandra
- The Course Overview
- What Is Apache Cassandra?
- Keyspace, Table Schema, Partition Key, and Clustering Key
- Start a Single Node Cassandra Database
- Introduction to the cqlsh Command-Line Client
- Loading and Reading Data
Cassandra Distributed Architecture
- Node and Ring Structure
- Replication and Consistency Model
- Racks and Datacenters
- CAP Theorem
- Gossip
- Read Repair and Hinted Handoff
Diagnostics
- Understanding Files in the Data Directory
- Use Nodetool to Examine Performance Statistics
- System and Output Logs
- JMX to Monitor Metrics
- Choosing the Appropriate Compaction Strategy
Data Modelling Principles
- Primary Key and Cluster Ordering
- Denormalization and Designing for Read Performance
- Optimizing for Blind Writes
Data Modelling in Cassandra
- Collection Types
- Static Columns
- Indexes and Materialized Views
- Data Aggregation
- Compare and Set
- Counter Type
Optimization of Data
- The Impact of Frequent Updates and Deletes
- Wide Rows and Primary Key Considerations
- Load Testing with CQL Stress
- Logged and Unlogged Batching
Integrating Cassandra Database with Your Application
- A Maven Project Using the Java Driver
- Connection Information for the Driver
- Basic Statements
- Using Prepared Statements
- Understanding Errors
Overview of Apache Spark
- What Is Apache Spark and Spark Architecture
- Get Started with Spark
- Working with Spark’s Data Structures – RDD, DataFrame, and Dataset
- Setting Up the Spark Connector
Connecting Spark with Cassandra
- Writing Data to Cassandra from Spark
- Reading Data from Cassandra Using Spark RDD
- Joining and Aggregating Data Using the Spark DataFrame API and Spark SQL
- Cassandra Aware Partitioning in Spark
Integrate Cassandra with Spark Streaming
- Use Cases for Near-Real-Time Stream Processing Using Spark Streaming
- Advanced Stream Receiver Using Kafka Connectors
- Stateless and Stateful Transformations
- Persisting a Live Stream to Cassandra