Introduction to Keyspaces and Tables in Apache Cassandra
[Figure: diagram of a Cassandra keyspace and its tables]
Apache Cassandra is a distributed NoSQL database designed for high scalability, fault tolerance, and high availability. To organize and manage data, Cassandra uses two main structures: keyspaces and tables.
In this guide, we’ll explore what keyspaces and tables are, how they are structured, and how to use them with a practical “Developer” table example.
A keyspace in Cassandra is the top-level container that defines how data is stored and replicated across the cluster. It’s similar to a database in relational systems (like MySQL or PostgreSQL).
Each keyspace contains one or more tables, and its settings determine how the data inside those tables is distributed and replicated.
Replication Strategy – Defines how data is copied across nodes.
Replication Factor – Number of data copies maintained for redundancy.
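Durable Writes – Controls whether writes to the keyspace go through the commit log. It is enabled by default; turning it off risks losing data that has not yet been flushed to disk if a node fails.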
You can create a keyspace in CQL (Cassandra Query Language) as follows:
CREATE KEYSPACE company_data
WITH REPLICATION = {
  'class': 'SimpleStrategy',
  'replication_factor': 3
};
✅ This command:
Creates a keyspace named company_data
Uses the SimpleStrategy (for single data center)
Replicates each piece of data three times for fault tolerance
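To make "three copies for fault tolerance" concrete, the arithmetic can be sketched in Python. This is a rough illustration that assumes reads and writes use the QUORUM consistency level (a majority of replicas), which the text above does not specify; the function names are made up for the example.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a QUORUM read or write (a majority)."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replica nodes that can be down while a QUORUM request still succeeds."""
    return replication_factor - quorum(replication_factor)


for rf in (1, 2, 3, 5):
    print(f"RF={rf}: quorum={quorum(rf)}, "
          f"tolerates {tolerated_failures(rf)} replica(s) down")
```

With a replication factor of 3, a quorum is 2 replicas, so one replica can be unavailable without failing the request. A replication factor of 2 tolerates none under the same assumption.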
To use the keyspace:
USE company_data;
Cassandra supports two key replication strategies:
SimpleStrategy – Suitable for single data center deployments; in practice it is mostly used for development and testing.
WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 3};
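NetworkTopologyStrategy – Recommended for production and multi-data center clusters. It sets a replication factor per data center, and the names used (DC1 and DC2 here) must match the data center names known to the cluster.
WITH REPLICATION = {
  'class': 'NetworkTopologyStrategy',
  'DC1': 3,
  'DC2': 2
};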
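When keyspaces are created from deployment scripts, the replication map is often built from configuration. The following is a minimal Python sketch of that idea, not part of any Cassandra client library: the helper `create_keyspace_cql` is hypothetical, it only renders the statement text, and it assumes the keyspace and data center names come from trusted configuration rather than user input.

```python
def create_keyspace_cql(name: str, replicas_per_dc: dict) -> str:
    """Render a CREATE KEYSPACE statement that uses NetworkTopologyStrategy."""
    if not replicas_per_dc:
        raise ValueError("at least one data center is required")
    options = ["'class': 'NetworkTopologyStrategy'"]
    # One entry per data center: name -> number of replicas kept there.
    options += [f"'{dc}': {count}" for dc, count in replicas_per_dc.items()]
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {name} "
        f"WITH REPLICATION = {{{', '.join(options)}}};"
    )


replicas = {"DC1": 3, "DC2": 2}
print(create_keyspace_cql("company_data", replicas))
print("total copies of each row:", sum(replicas.values()))
```

With the values from the example above, each row is stored five times in total: three copies in DC1 and two in DC2.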
A table is where Cassandra stores actual data. Unlike relational databases, Cassandra tables are denormalized and optimized for fast reads and writes based on access patterns.
Each table is defined by:
Primary Key (composed of partition and clustering keys)
Columns for storing values
Schema optimized for scalability and distributed storage
Now that the keyspace company_data is created, let’s define a table for storing developer information.
CREATE TABLE developer (
  developer_id UUID PRIMARY KEY,
  name TEXT,
  email TEXT,
  programming_language TEXT,
  experience_years INT,
  city TEXT
);
developer_id – Unique identifier for each developer (Primary Key)
name, email, city – Basic developer information
programming_language – Primary skill or language
experience_years – Years of professional experience
Let’s insert some developer records:
INSERT INTO developer (developer_id, name, email, programming_language, experience_years, city)
VALUES (uuid(), 'Alice Johnson', '[email protected]', 'Python', 5, 'New York');
INSERT INTO developer (developer_id, name, email, programming_language, experience_years, city)
VALUES (uuid(), 'Rahul Sharma', '[email protected]', 'Java', 7, 'Bangalore');
INSERT INTO developer (developer_id, name, email, programming_language, experience_years, city)
VALUES (uuid(), 'Maria Lopez', '[email protected]', 'C++', 4, 'London');
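The uuid() function in these statements generates a random (version 4) UUID on the server. If the application needs to know the key before writing, it can generate one client-side instead. Here is a minimal Python sketch of that approach; the helper `new_developer_row` and the placeholder e-mail address are assumptions for illustration, and the code only builds the values without talking to a cluster.

```python
import uuid


def new_developer_row(name, email, programming_language, experience_years, city):
    """Build one developer row as a dict, generating developer_id client-side."""
    return {
        "developer_id": uuid.uuid4(),  # random version-4 UUID, like CQL's uuid()
        "name": name,
        "email": email,
        "programming_language": programming_language,
        "experience_years": experience_years,
        "city": city,
    }


# Hypothetical sample values; the e-mail address is a placeholder.
row = new_developer_row("Alice Johnson", "alice@example.com", "Python", 5, "New York")
print(row["developer_id"], row["developer_id"].version)
```

The resulting values could then be bound as parameters to an INSERT statement by whichever client driver the application uses.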
You can fetch data from the developer table using:
SELECT * FROM developer;
Sample Output:
developer_id | name | email | programming_language | experience_years | city |
---|---|---|---|---|---|
11e1... | Alice Johnson | [email protected] | Python | 5 | New York |
22f2... | Rahul Sharma | [email protected] | Java | 7 | Bangalore |
33a3... | Maria Lopez | [email protected] | C++ | 4 | London |
In Cassandra, the primary key uniquely identifies a row and determines how data is distributed across nodes and ordered within them.
Partition Key: Hashed to a token that decides which nodes store the data.
Clustering Columns: Sort rows within a partition.
Example:
CREATE TABLE project (
  project_id UUID,
  developer_id UUID,
  project_name TEXT,
  start_date DATE,
  PRIMARY KEY (developer_id, project_id)
);
Here:
developer_id is the partition key
project_id is the clustering column
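This setup stores all of a developer's projects in the same partition, sorted by project_id. Because project_id is a plain UUID, that order is not chronological; use a TIMEUUID, or put start_date ahead of project_id in the clustering columns, if you need time order.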
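The roles of the two key parts can be modelled in a few lines of Python. This is a simplified sketch under stated assumptions, not how Cassandra is implemented: MD5 stands in for the default Murmur3 partitioner, a modulo over three hypothetical node names stands in for the token ring, replication is ignored, and Python's UUID ordering stands in for Cassandra's own comparison rules.

```python
import hashlib
import uuid
from collections import defaultdict

NODES = ["node-a", "node-b", "node-c"]  # hypothetical node names


def owner(partition_key: uuid.UUID) -> str:
    """Map a partition key to one node by hashing it (simplified token ring)."""
    token = int.from_bytes(hashlib.md5(partition_key.bytes).digest()[:8], "big")
    return NODES[token % len(NODES)]


def build_partitions(rows):
    """Group (developer_id, project_id, project_name) rows by partition key,
    keeping each partition sorted by the clustering column project_id."""
    partitions = defaultdict(list)
    for developer_id, project_id, project_name in rows:
        partitions[developer_id].append((project_id, project_name))
    for projects in partitions.values():
        projects.sort(key=lambda project: project[0])
    return dict(partitions)


# Deterministic stand-in IDs so the example is reproducible.
alice, bob = uuid.UUID(int=1), uuid.UUID(int=2)
rows = [
    (alice, uuid.UUID(int=30), "billing-api"),
    (bob, uuid.UUID(int=10), "search-ui"),
    (alice, uuid.UUID(int=20), "data-pipeline"),
]
partitions = build_partitions(rows)
for developer_id, projects in partitions.items():
    print(owner(developer_id), developer_id, [name for _, name in projects])
```

Every row with the same developer_id hashes to the same owner, which is why a query that filters on the partition key can be answered without scanning the whole cluster.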
Use one keyspace per application for clarity.
Choose replication factor ≥ 3 for high availability.
Design tables for queries, not normalization.
Avoid large partitions — distribute data evenly.
Use UUIDs for unique identifiers.
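The advice on partition size can be checked before a schema is deployed by counting how many rows each candidate partition key would group together. Below is a rough Python sketch on synthetic data; the helper `largest_partition` is hypothetical, and integers stand in for UUIDs to keep the example deterministic.

```python
from collections import Counter


def largest_partition(rows, partition_key):
    """Return (key value, row count) for the biggest partition under a given key."""
    counts = Counter(row[partition_key] for row in rows)
    return counts.most_common(1)[0]


cities = ["New York", "Bangalore", "London"]
# 300 synthetic developers spread evenly over three cities.
rows = [{"developer_id": i, "city": cities[i % 3]} for i in range(300)]

_, by_city = largest_partition(rows, "city")
_, by_id = largest_partition(rows, "developer_id")
print(f"largest partition keyed by city: {by_city} rows")
print(f"largest partition keyed by developer_id: {by_id} rows")
```

A low-cardinality key such as city concentrates rows into a few partitions that keep growing as data is added, while a unique identifier spreads them evenly. The right choice still depends on the queries the table has to serve.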
In Cassandra, keyspaces define how data is replicated, and tables define how it’s stored and queried.
The developer table example shows how Cassandra structures data for performance, scalability, and availability.
By designing keyspaces and tables around your replication needs and query patterns, you can build applications that handle large volumes of data efficiently.