Distributed Consensus
Published in Satya Prakash Yadav, Dharmendra Prasad Mahato, Nguyen Thi Dieu Linh, Distributed Artificial Intelligence, 2020
Distributed hash tables use the concept of consistent hashing. The reason is that the leader agent requires data to be uniformly distributed over the network. A key is passed through a randomizing function built from a hash algorithm, the idea being that every node has an equal opportunity to be chosen to store the key/value pair. Since the complete data set is not stored on the leader agent, a node needs a routing layer to reach the node that holds a given key/value pair. How the routing table is constructed, and how nodes join and leave the network, differs from the leader-agent algorithm. The routing layer stores only limited information about the nodes on the network, so the routing information has to be spread across a small number of peers. Because each routing table holds so little information, multiple nodes must be contacted to find a key. The common lookup complexity is O(log n), where n is the total number of nodes in the network; with a million nodes, a lookup therefore touches about 20 of them (log₂ 1,000,000 ≈ 20).
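To make the key-to-node mapping concrete, here is a minimal consistent-hashing sketch in Python. It assumes a client with a complete view of the ring, so it shows only the placement step, not the multi-hop O(log n) routing the text describes; the HashRing class, the node names, and the choice of SHA-1 are illustrative assumptions, not details from the text.

```python
import hashlib
from bisect import bisect_right

def ring_hash(key: str) -> int:
    """Map an arbitrary string onto the ring via SHA-1 (any uniform hash works)."""
    return int(hashlib.sha1(key.encode()).hexdigest(), 16)

class HashRing:
    """Toy consistent-hashing ring: nodes and keys share one hash space."""

    def __init__(self, nodes):
        # Sorted node positions let us binary-search for a key's successor.
        self._ring = sorted((ring_hash(n), n) for n in nodes)
        self._points = [p for p, _ in self._ring]

    def node_for(self, key: str) -> str:
        """Return the first node clockwise from the key's position."""
        i = bisect_right(self._points, ring_hash(key)) % len(self._ring)
        return self._ring[i][1]

# In a real DHT no single peer stores the whole ring; a lookup instead hops
# through roughly log2(n) peers (log2(1_000_000) is about 20).
ring = HashRing([f"node-{i}" for i in range(8)])
print(ring.node_for("user:42"))  # e.g. 'node-3'
```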
Containers and Microservices
Published in Haishi Bai, Zen of Cloud, 2019
Partitions can be either static or dynamic. Static partitioning requires service operators to plan system capacity carefully before services are deployed, because once a service is deployed with a fixed set of partitions, it can't be rescaled to use more or fewer partitions. Dynamic partitioning allows a service to be repartitioned on the fly as the workload changes, which is clearly preferable. However, dynamic partitioning is harder to implement and may have performance implications: when a partition changes, data in different replica sets needs to be shuffled to maintain separate replicas. Figure 3.11 shows data initially sharded into four partitions based on data keys ranging from 1 to 100. When the same data set is repartitioned into five partitions, the composition of every partition is affected, so the entire data set needs to be rearranged. Consistent hashing ensures that only a subset of the keys needs to be remapped, drastically reducing data movement.
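One way to see the difference is to count how many keys move when going from four to five partitions, echoing the Figure 3.11 scenario. The sketch below compares naive modulo placement against a single-point-per-partition hash ring; the hash function and the "partition-N" names are illustrative assumptions, not from the book.

```python
import hashlib
from bisect import bisect_right

def h(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

keys = [f"key-{i}" for i in range(1, 101)]  # data keys 1..100, as in Figure 3.11

# Naive scheme: partition = hash(key) mod N. Changing N remaps most keys.
mod4 = {k: h(k) % 4 for k in keys}
mod5 = {k: h(k) % 5 for k in keys}
moved_mod = sum(mod4[k] != mod5[k] for k in keys)

# Consistent hashing: partitions sit on a ring; each key goes to its
# clockwise successor, so adding a partition only claims one arc of keys.
def ring(parts):
    return sorted((h(f"partition-{p}"), p) for p in parts)

def owner(key, ring_pts):
    pts = [p for p, _ in ring_pts]
    return ring_pts[bisect_right(pts, h(key)) % len(ring_pts)][1]

r4, r5 = ring(range(4)), ring(range(5))
moved_ch = sum(owner(k, r4) != owner(k, r5) for k in keys)

print(f"modulo: {moved_mod}/100 moved; consistent: {moved_ch}/100 moved")
# Roughly 80 of 100 keys move under modulo placement, while only the keys
# falling into the new partition's arc move under consistent hashing.
```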
Distributed Data Structures (DDSs)
Published in Suman Saha, Shailendra Shukla, Advanced Data Structures, 2019
Most DHTs use some variant of consistent hashing or rendezvous hashing to map keys to nodes. The two algorithms appear to have been devised independently and simultaneously to solve the distributed hash table problem. Both consistent hashing and rendezvous hashing have the essential property that removal or addition of one node changes only the set of keys owned by the nodes with adjacent IDs, and leaves all other nodes unaffected. Contrast this with a traditional hash table in which addition or removal of one bucket causes nearly the entire keyspace to be remapped.
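For comparison with the ring-based approach, rendezvous (highest-random-weight) hashing needs no ring at all: each key is assigned to whichever node scores highest for it, and removing a node reassigns only the keys that node owned. A minimal sketch, with an assumed SHA-1-based weight function and made-up node names:

```python
import hashlib

def weight(key: str, node: str) -> int:
    """Score a (key, node) pair; the highest score wins (highest random weight)."""
    return int(hashlib.sha1(f"{node}:{key}".encode()).hexdigest(), 16)

def owner(key: str, nodes: list[str]) -> str:
    """Rendezvous hashing: the key goes to the node with the highest weight."""
    return max(nodes, key=lambda n: weight(key, n))

nodes = ["node-a", "node-b", "node-c", "node-d"]
best = owner("some-key", nodes)
print(best)

# Removing any *other* node leaves this key in place, because its
# highest-weight node is still present; only the removed node's keys move.
survivors = [n for n in nodes if n != best]
print(owner("some-key", survivors))  # a new owner, only because we removed 'best'
```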
Cluster-based distributed dynamic cuckoo filter system for Redis
Published in International Journal of Parallel, Emergent and Distributed Systems, 2020
Peng Li, Baozhou Luo, Wenjun Zhu, He Xu
In addition, Devine et al. proposed the consistent hashing algorithm [20], which is now widely used in distributed systems. The basic idea of consistent hashing is to use the same hash function to map both data and nodes (the servers that store the data) into a circular hash address space, so that when nodes are removed or added, the mapping between existing service requests and the servers that process them changes as little as possible. Consistent hashing is a distributed algorithm most commonly used for data distribution in distributed storage [21,22]. The Memcached [23] client also uses this algorithm to distribute key-value pairs evenly across many Memcached servers.
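Client libraries in the Memcached ecosystem typically smooth out the uneven arcs that a single ring point per server would produce by placing many virtual points per server on the ring (virtual nodes); the text above does not describe this detail, so the sketch below is an illustrative assumption, with made-up server addresses and an arbitrary vnode count.

```python
import hashlib
from bisect import bisect_right
from collections import Counter

def h(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class VNodeRing:
    """Ring with virtual nodes: each server occupies many points, which
    evens out the share of the hash space each server owns."""

    def __init__(self, servers, vnodes=100):
        self._ring = sorted(
            (h(f"{s}#{v}"), s) for s in servers for v in range(vnodes)
        )
        self._points = [p for p, _ in self._ring]

    def server_for(self, key: str) -> str:
        i = bisect_right(self._points, h(key)) % len(self._ring)
        return self._ring[i][1]

# Hypothetical server addresses; not Memcached's actual client code.
ring = VNodeRing(["mc1:11211", "mc2:11211", "mc3:11211"])
load = Counter(ring.server_for(f"key-{i}") for i in range(10_000))
print(load)  # roughly one third of the keys lands on each server
```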