Replication Strategies in Cassandra
[Diagram: SimpleStrategy vs NetworkTopologyStrategy overview]
Overview
In a distributed database like Apache Cassandra, data replication ensures high availability, fault tolerance, and durability by keeping copies of data on multiple nodes in the cluster.
Replication strategies define how and where copies of data (replicas) are stored.
Understanding these strategies is crucial for designing a reliable and scalable Cassandra architecture.
This article explains SimpleStrategy and NetworkTopologyStrategy, how they work, and when to use each.
Replication in Cassandra refers to storing multiple copies of data across different nodes in the cluster to ensure no data is lost if a node fails.
The Replication Factor (RF) determines how many copies of the data are maintained.
If RF = 3 → Cassandra keeps 3 copies of each row on 3 different nodes.
Term | Description |
---|---|
Replication Factor (RF) | Number of data copies in the cluster. |
Replica | Each copy of data stored on a node. |
Replication Strategy | The algorithm Cassandra uses to decide where replicas are stored. |
Data Center (DC) | Logical grouping of nodes (often based on physical or cloud region). |
Cassandra supports two main replication strategies:
SimpleStrategy
NetworkTopologyStrategy
SimpleStrategy
SimpleStrategy is intended for single-data-center deployments and development environments.
It places replicas in a clockwise direction around the ring, starting from the node responsible for the data.
```sql
CREATE KEYSPACE my_keyspace
WITH replication = {
  'class': 'SimpleStrategy',
  'replication_factor': 3
};
```
In this case:
The first replica is placed on the node determined by the partition key’s hash.
The next two replicas are stored on the next two nodes in the ring.
Best for single data center setups.
Suitable for testing or local development environments.
Not aware of data center or rack topology.
Should not be used in production across multiple data centers.
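The clockwise placement described above can be sketched in Python. This is a simplified illustration, not Cassandra's actual implementation; the node names and token values are invented for the example:

```python
from bisect import bisect_left

def simple_strategy_replicas(partition_token, ring, rf):
    """ring: sorted list of (token, node) pairs; returns the rf replica nodes.

    The first replica lands on the node owning the partition's token; the
    remaining rf - 1 replicas go to the next distinct nodes clockwise.
    """
    tokens = [t for t, _ in ring]
    # First node whose token is >= the partition token (wrapping around).
    start = bisect_left(tokens, partition_token) % len(ring)
    replicas = []
    i = start
    while len(replicas) < rf:
        node = ring[i][1]
        if node not in replicas:   # a node holds at most one replica
            replicas.append(node)
        i = (i + 1) % len(ring)
    return replicas

# Four-node ring with evenly spaced tokens, RF = 3.
ring = [(0, "n1"), (25, "n2"), (50, "n3"), (75, "n4")]
print(simple_strategy_replicas(30, ring, 3))  # ['n3', 'n4', 'n1']
```

Note that nothing in this walk looks at data centers or racks, which is exactly why SimpleStrategy is unsafe for multi-DC clusters.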
NetworkTopologyStrategy
NetworkTopologyStrategy is the recommended replication strategy for production environments and multi-data-center clusters.
It allows you to define different replication factors per data center, providing fine-grained control over fault tolerance and availability.
```sql
CREATE KEYSPACE my_keyspace
WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'US_EAST': 3,
  'EU_WEST': 2
};
```
Here:
3 replicas are stored in the US_EAST data center.
2 replicas are stored in the EU_WEST data center.
Ideal for multi-region or multi-data-center deployments.
Ensures data locality (queries served from nearest DC).
Maintains fault tolerance across geographical zones.
Each data center uses its own replication factor, and replicas within a data center are spread across different racks (failure domains such as physical racks or availability zones) to minimize the impact of hardware failure.
Cassandra’s snitch component provides the topology information (rack and data center locations).
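As a rough illustration of how per-DC replication factors and rack awareness interact, here is a hedged Python sketch. It is a simplification of the real algorithm, and the DC, rack, and node names are invented:

```python
from bisect import bisect_left

def nts_replicas(partition_token, ring, rf_per_dc, topology):
    """Pick replicas per data center, spreading them across racks.

    ring: sorted (token, node) pairs; topology: node -> (dc, rack);
    rf_per_dc: dc name -> replication factor, as in the keyspace definition.
    """
    tokens = [t for t, _ in ring]
    start = bisect_left(tokens, partition_token) % len(ring)
    chosen = {dc: [] for dc in rf_per_dc}
    racks_used = {dc: set() for dc in rf_per_dc}
    skipped = {dc: [] for dc in rf_per_dc}   # same-rack nodes kept in reserve
    for step in range(len(ring)):            # walk the ring clockwise once
        node = ring[(start + step) % len(ring)][1]
        dc, rack = topology[node]
        if dc not in rf_per_dc or len(chosen[dc]) >= rf_per_dc[dc]:
            continue
        if rack in racks_used[dc]:
            skipped[dc].append(node)         # prefer an unused rack first
        else:
            chosen[dc].append(node)
            racks_used[dc].add(rack)
    # If a DC has fewer racks than its RF, fall back to same-rack nodes.
    for dc, nodes in skipped.items():
        while nodes and len(chosen[dc]) < rf_per_dc[dc]:
            chosen[dc].append(nodes.pop(0))
    return chosen

ring = [(0, "a1"), (20, "b1"), (40, "a2"), (60, "b2"), (80, "a3")]
topology = {"a1": ("US_EAST", "r1"), "a2": ("US_EAST", "r2"),
            "a3": ("US_EAST", "r1"),
            "b1": ("EU_WEST", "r1"), "b2": ("EU_WEST", "r2")}
print(nts_replicas(10, ring, {"US_EAST": 2, "EU_WEST": 2}, topology))
# {'US_EAST': ['a2', 'a3'], 'EU_WEST': ['b1', 'b2']}
```

In the real system, the snitch supplies the `topology` mapping; here it is hard-coded to keep the sketch self-contained.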
Imagine a global application hosted in two data centers — one in India and one in the US.
```sql
CREATE KEYSPACE user_data
WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'India_DC': 3,
  'US_DC': 3
};
```
✅ Advantages:
If India_DC becomes unavailable, requests can be served from US_DC.
Local queries (from India) are faster since data is available nearby.
Feature | SimpleStrategy | NetworkTopologyStrategy |
---|---|---|
Use Case | Single Data Center | Multi Data Center |
Rack Awareness | ❌ No | ✅ Yes |
Performance | Adequate for a single DC | Optimized for local queries
Fault Tolerance | Limited | High (cross-DC) |
Best For | Development, Testing | Production, Global Apps |
Environment | Recommended Strategy | Reason |
---|---|---|
Local / Dev | SimpleStrategy | Easy setup |
Production (1 DC) | NetworkTopologyStrategy | Rack awareness |
Production (Multiple DCs) | NetworkTopologyStrategy | Geo-redundancy |
You can change replication settings for an existing keyspace:
```sql
ALTER KEYSPACE my_keyspace
WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'Asia_DC': 3,
  'Europe_DC': 2
};
```
And then run repair to synchronize the new replicas:

```shell
nodetool repair my_keyspace
```
Always use NetworkTopologyStrategy in production.
Choose replication factor ≥ 3 for fault tolerance.
Distribute replicas across different racks.
Avoid SimpleStrategy in multi-data-center setups.
Re-run nodetool repair after changing replication settings.
Environment | Recommended RF | Notes |
---|---|---|
Dev/Test | 1–2 | Not fault-tolerant |
Production (Single DC) | 3 | Standard for high availability |
Production (Multi-DC) | 3 per DC | Ensures global redundancy |
Replication is the backbone of Cassandra’s fault tolerance and scalability.
Choosing the right strategy ensures data durability, low latency, and high availability.
Use SimpleStrategy for single data centers or testing.
Use NetworkTopologyStrategy for production, multi-DC setups.
Always tune replication factor and snitch settings according to your topology.
By carefully planning replication, you ensure Cassandra continues to deliver the high availability and reliability it’s known for.