Token assignment
In Cassandra terminology, the start of the hash range is called a token, and until version 1.2, each node was assigned a single token in the manner discussed in the previous section. Version 1.2 introduced the option to use virtual nodes, or vnodes, as the feature is officially termed. Vnodes became the default option in the 2.0 release.
Cassandra determines where to place data using the tokens assigned to each node. Nodes learn about these token assignments via gossip. Additional replicas are then placed based on the configured replication strategy and snitch. More details about replica placement can be found in Chapter 3, Replication.
Manually assigned tokens
If you're running a version prior to 1.2 or if you have chosen not to use vnodes, you will have to assign tokens manually. This is accomplished by setting the initial_token in cassandra.yaml.
Manual token assignment introduces a number of potential issues:
- Adding and removing nodes: When the size of the ring changes, all tokens must be recomputed and then assigned to their nodes using nodetool move. This causes a significant amount of administrative overhead for a large cluster.
- Node rebuilds: In case of a node rebuild, only a few nodes can participate in bootstrapping the replacement, leading to significant service degradation. We'll discuss this in detail later in this chapter.
- Hotspots: In some cases, the relatively large range assigned to each node can cause hotspots if data is not evenly distributed.
- Heterogeneous clusters: With every node assigned a single token, the expectation is that all nodes will hold the same amount of data. Attempting to subdivide ranges to deal with nodes of varying sizes is a difficult and error-prone task.
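To make the first point concrete, the following is a minimal sketch of why resizing a manually tokenized ring touches nearly every node. It assumes the Murmur3Partitioner token space (-2**63 to 2**63 - 1) and evenly spaced tokens; the function name even_tokens is hypothetical and not part of Cassandra.

```python
# Assumption: Murmur3Partitioner token space, -2**63 .. 2**63 - 1.
RING_MIN = -2**63
RING_SIZE = 2**64

def even_tokens(node_count):
    """Return one evenly spaced token per node, in ring order."""
    step = RING_SIZE // node_count
    return [RING_MIN + i * step for i in range(node_count)]

before = even_tokens(4)  # balanced four-node ring
after = even_tokens(5)   # balanced ring after adding a fifth node

# Existing nodes keep their position in the ring; count how many of them
# would need a different token (and therefore a nodetool move).
moved = sum(1 for old, new in zip(before, after) if old != new)
print(f"{moved} of {len(before)} existing nodes need a new token")
```

In this model only the first node keeps its token; every other existing node has to be moved to restore balance.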
Because of these issues, the use of vnodes is highly recommended for any new installation. For existing installations, migrating to vnodes will improve the performance and reliability of your cluster and reduce its administrative burden, especially during topology changes and failure scenarios.
If you must continue to manually assign tokens, make sure to set the correct value for initial_token whenever any nodes are added or removed. Failure to do so will almost always result in an unbalanced ring. For information about how to generate tokens, refer to the DataStax documentation at http://www.datastax.com/documentation/cassandra/1.2/cassandra/configuration/configGenTokens_c.html.
You can then use the values you generate as the initial_token settings for your nodes, with each node getting one of the values. It's best to always assign your tokens to the nodes in the same order to avoid unnecessary shuffling of data.
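As an illustration of the kind of calculation involved, here is a small sketch that generates evenly spaced tokens and pairs them with nodes in a stable order. It assumes the Murmur3Partitioner token space (other partitioners use a different range), and the host addresses and the initial_tokens helper are hypothetical.

```python
def initial_tokens(hosts):
    """Map each host to an evenly spaced token, using a stable host order."""
    ordered = sorted(hosts)          # same order every time the ring is resized
    step = 2**64 // len(ordered)     # width of each node's range
    return {host: -2**63 + i * step for i, host in enumerate(ordered)}

# Hypothetical three-node cluster; the input order does not matter.
assignments = initial_tokens(["10.0.0.3", "10.0.0.1", "10.0.0.2"])
for host, token in assignments.items():
    print(f"{host}  initial_token: {token}")
```

Sorting the hosts before assigning tokens is one simple way to keep the node-to-token order consistent between runs.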
vnodes
The concept behind vnodes is straightforward. Instead of a single token assigned to each node, it is now possible to specify the number of tokens using the num_tokens configuration property in cassandra.yaml. The default value is 256, which is sufficient for most use cases.
The following diagram illustrates a cluster without vnodes compared to one with vnodes enabled:
In the preceding diagram, each numbered node is represented as a slice of the ring, where the tokens are represented as letters. Note that tokens are assigned randomly. Remember that the letters represent ranges of data. You'll notice that there are more ranges than nodes after enabling vnodes, and each node now owns multiple ranges.
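The effect of random assignment can be approximated with a short simulation. This is a simplified model rather than Cassandra's actual allocation code: it assumes a 64-bit signed token space, draws tokens uniformly at random, and credits each arc between neighboring tokens to the node holding the token at its end. The node names and helper functions are hypothetical.

```python
import random
from collections import Counter

RING_MIN, RING_SIZE = -2**63, 2**64  # assumed 64-bit signed token space

def assign_random_tokens(node_names, num_tokens, rng):
    """Give every node num_tokens distinct random tokens."""
    ring = {}
    for name in node_names:
        placed = 0
        while placed < num_tokens:
            token = rng.randrange(RING_MIN, RING_MIN + RING_SIZE)
            if token not in ring:
                ring[token] = name
                placed += 1
    return ring

def ownership(ring):
    """Fraction of the ring covered by each node's ranges."""
    tokens = sorted(ring)
    share = Counter()
    for prev, tok in zip([tokens[-1]] + tokens[:-1], tokens):
        width = (tok - prev) % RING_SIZE  # modulo handles the wrap-around arc
        share[ring[tok]] += width / RING_SIZE
    return dict(share)

rng = random.Random(42)
ring = assign_random_tokens(["node1", "node2", "node3", "node4"], 256, rng)
shares = ownership(ring)
for name in sorted(shares):
    print(f"{name}: {shares[name]:.1%} of the ring in 256 ranges")
```

With many small ranges per node, each node's share tends toward an equal fraction of the ring even though no individual token was chosen deliberately.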
While technically the cluster remains available during topology changes and node rebuilds, the level of degraded service has the potential to impact availability if the system remains under significant load. vnodes offer a simple solution to the problems associated with manually assigned tokens. Let's examine the reasons why this is the case.
There are many reasons to change the size of a cluster. Perhaps you're increasing capacity for an anticipated growth in data or transaction volume, or maybe you're adding a data center for increased availability.
Considering that the objective is to handle greater load or provide additional redundancy, any significant performance degradation while adding or bootstrapping a new node is unacceptable as it counteracts these goals. Often in modern high-scale applications, slow is the same as unavailable. Equally important is to ensure that new nodes receive a balanced share of the data.
vnodes improve the bootstrapping process substantially because:
- More nodes can participate in data transfer: Since the token ranges are more dispersed throughout the cluster, adding a new node involves ranges from a greater number of the existing nodes. As a result, machines involved in the transfer end up under less load than without vnodes, thus increasing availability of those ranges.
- Token assignment is automatic: Cassandra handles the allocation of tokens, so there's no need to manually recalculate and reassign a new token for every node in the cluster. As a result, the ring stays balanced without manual intervention.
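The first point can be illustrated with a rough simulation that counts how many existing nodes own a range that a joining node's tokens fall into. It is a sketch under simplifying assumptions: replication is ignored, tokens are drawn uniformly at random from an assumed 64-bit signed token space, and all function names are hypothetical.

```python
import bisect
import random

RING_MIN, RING_SIZE = -2**63, 2**64  # assumed 64-bit signed token space

def random_tokens(count, rng, taken):
    """Draw `count` unused random tokens, recording them in `taken`."""
    drawn = []
    while len(drawn) < count:
        token = rng.randrange(RING_MIN, RING_MIN + RING_SIZE)
        if token not in taken:
            taken.add(token)
            drawn.append(token)
    return drawn

def streaming_sources(ring, new_tokens):
    """Return the existing nodes whose ranges the new tokens split."""
    tokens = sorted(ring)
    sources = set()
    for new_token in new_tokens:
        # Assumption of this sketch: a range is credited to the node holding
        # the first token at or after the key, wrapping around the ring.
        index = bisect.bisect_left(tokens, new_token) % len(tokens)
        sources.add(ring[tokens[index]])
    return sources

rng = random.Random(1)
nodes = range(1, 7)

# Six nodes with one evenly spaced token each; the new node gets one token.
single = {RING_MIN + i * (RING_SIZE // 6): node for i, node in enumerate(nodes)}
single_sources = streaming_sources(single, random_tokens(1, rng, set(single)))

# Six nodes with 256 random tokens each; the new node also gets 256.
taken = set()
vnode_ring = {t: node for node in nodes for t in random_tokens(256, rng, taken)}
vnode_sources = streaming_sources(vnode_ring, random_tokens(256, rng, taken))

print(len(single_sources), "source node(s) without vnodes")
print(len(vnode_sources), "source node(s) with vnodes")
```

A single new token can only split one existing range, whereas 256 new tokens land in ranges owned by nodes all over the ring.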
Rebuilding a node is a relatively common operation in a large cluster, as nodes will fail for a variety of reasons. Cassandra provides a mechanism to automatically rebuild a failed node using replicated data.
When each node owns only a single token, that node's entire data set is replicated to a number of nodes equal to the replication factor minus one. For example, with a replication factor of three, all the data on a given node will be replicated to two other nodes (replication will be covered in detail in Chapter 3, Replication). However, Cassandra will only use one replica in the rebuild operation.
So in this case, a rebuild operation involves three nodes, placing a high load on all three. Imagine that we have a six-node cluster, and node 2 has failed, requiring a rebuild. In the following diagram, note that each node only contains replicas for three tokens, preventing two of the nodes from participating in the rebuild:
In the rebuilding of node 2, only nodes 1, 3, and 4 can participate because they contain the required replicas. We can assume that reads and writes continue during this process. With one node down and three working hard to rebuild it, we now have only two out of six nodes operating at full capacity! Even worse, token ranges A and B reside entirely on nodes that are being taxed by this process, which can result in overburdening the entire cluster due to slow response times for these operations.
vnodes provide significant benefits over manual token management for the rebuild process, as they distribute the load over many more nodes. This concept is the same as the benefit gained during the bootstrapping process. Since each node contains replicas for a larger (and random) variety of the available tokens, Cassandra can use these replicas in the rebuild process. Consider the following diagram of the same rebuild using vnodes:
With vnodes, all nodes can participate in rebuilding node 2 because the tokens are spread more evenly across the cluster. In the preceding diagram, you can see that rebuilding node 2 now involves the entire cluster, thus distributing the workload more evenly. This means each individual node is doing less work than without vnodes, resulting in greater operational stability.
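The six-node scenario can be modeled in a few lines. The sketch below assumes a simple placement rule (each range is replicated on the next distinct nodes clockwise around the ring) and a replication factor of three. Which surviving replica Cassandra actually streams from is not modeled; the sketch arbitrarily picks the last surviving replica in placement order. All names are hypothetical.

```python
import random

def replicas_for(index, owners, rf):
    """Walk clockwise from a range, collecting rf distinct nodes."""
    found = []
    i = index
    while len(found) < rf:
        node = owners[i % len(owners)]
        if node not in found:
            found.append(node)
        i += 1
    return found

def rebuild_sources(owners, failed, rf):
    """Pick one surviving replica for every range the failed node held."""
    sources = set()
    for index in range(len(owners)):
        replicas = replicas_for(index, owners, rf)
        if failed in replicas:
            survivors = [n for n in replicas if n != failed]
            sources.add(survivors[-1])  # arbitrary choice made by this sketch
    return sources

# One token per node: the ring order is simply node 1 through node 6.
single_owners = [1, 2, 3, 4, 5, 6]
single_sources = rebuild_sources(single_owners, failed=2, rf=3)

# 256 random tokens per node: in token order, the owners form a random
# shuffle of 256 entries per node, so the token values can be skipped.
vnode_owners = [node for node in range(1, 7) for _ in range(256)]
random.Random(3).shuffle(vnode_owners)
vnode_sources = rebuild_sources(vnode_owners, failed=2, rf=3)

print("without vnodes:", sorted(single_sources))
print("with vnodes:   ", sorted(vnode_sources))
```

Without vnodes the failed node holds only three ranges, so at most three nodes can supply data; with vnodes it holds hundreds of small ranges whose replicas are scattered across every other node.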
While it might be straightforward to initially build your Cassandra cluster with machines that are all identical, at some point older machines will need to be replaced with newer ones. This can create issues when tokens are assigned manually, since it becomes difficult to choose tokens that produce a balanced result. The problem is compounded when nodes are added or removed, because the tokens must be recomputed to restore a proper balance.
vnodes ease this effort by allowing you to specify a number of tokens, instead of having to determine specific ranges. It is much easier to choose a proportionally larger number for newer, more powerful nodes than it is to determine proper token ranges.
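As a rough illustration of this idea, the simulation below gives one node twice as many tokens as the others and measures the share of the ring each node ends up owning. It is a simplified model with uniformly random tokens in an assumed 64-bit signed token space, not Cassandra's allocation logic, and the node names and token counts are hypothetical.

```python
import random
from collections import Counter

RING_MIN, RING_SIZE = -2**63, 2**64  # assumed 64-bit signed token space

def ownership_by_node(num_tokens_by_node, rng):
    """Assign random tokens per node and return each node's ring share."""
    ring = {}
    for node, num_tokens in num_tokens_by_node.items():
        placed = 0
        while placed < num_tokens:
            token = rng.randrange(RING_MIN, RING_MIN + RING_SIZE)
            if token not in ring:
                ring[token] = node
                placed += 1
    tokens = sorted(ring)
    share = Counter()
    for prev, tok in zip([tokens[-1]] + tokens[:-1], tokens):
        share[ring[tok]] += ((tok - prev) % RING_SIZE) / RING_SIZE
    return dict(share)

# Three older nodes keep the default; the newer node gets twice as many.
cluster = {"old1": 256, "old2": 256, "old3": 256, "new1": 512}
shares = ownership_by_node(cluster, random.Random(7))
for node, fraction in shares.items():
    print(f"{node}: num_tokens={cluster[node]}, owns {fraction:.1%}")
```

Because ownership is roughly proportional to the number of tokens, doubling num_tokens on the larger machine gives it about twice the data of each older node without any range arithmetic.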