Sharding – Knowledge and References

Big Data Computing

Published in Vivek Kale, Parallel Computing Architectures and APIs, 2019

In cloud technology, sharding is used to refer to the technique of partitioning a table among multiple independent databases by row. However, the partitioning of data by row in relational databases is not new and is referred to as horizontal partitioning in parallel database technology. The distinction between sharding and horizontal partitioning is that horizontal partitioning is done transparently to the application by the database, whereas sharding is explicit partitioning done by the application. However, the two techniques have started converging, since traditional database vendors have started offering support for more sophisticated partitioning strategies. Since sharding is similar to horizontal partitioning, we will first discuss different horizontal partitioning techniques. It can be seen that a good sharding technique depends on both the organization of the data and the type of queries expected.
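The point that a good sharding technique depends on the expected queries can be illustrated with a minimal sketch of application-side shard routing. Everything here is an illustrative assumption rather than something taken from the chapter: the function names `hash_shard`, `range_shard`, and `shards_for_range_query`, and the example boundaries, are hypothetical.

```python
import bisect
import hashlib
from typing import List, Sequence


def hash_shard(key: str, num_shards: int) -> int:
    """Route a row by a stable hash of its key (hypothetical helper).

    Hashing spreads rows evenly, but neighbouring keys land on
    unrelated shards, so a range query has to visit every shard.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def range_shard(key: int, upper_bounds: Sequence[int]) -> int:
    """Route a row by key range (hypothetical helper).

    upper_bounds[i] is the exclusive upper bound of shard i;
    one extra shard at the end takes every key beyond the last bound.
    """
    return bisect.bisect_right(upper_bounds, key)


def shards_for_range_query(lo: int, hi: int, upper_bounds: Sequence[int]) -> List[int]:
    """Shards that a query for lo <= key <= hi must contact under range sharding."""
    first = range_shard(lo, upper_bounds)
    last = range_shard(hi, upper_bounds)
    return list(range(first, last + 1))


if __name__ == "__main__":
    bounds = [1000, 2000, 3000]  # assumed boundaries: four shards in total
    print(range_shard(999, bounds))
    print(shards_for_range_query(1500, 2500, bounds))
    print(hash_shard("user-42", 4))
```

Under these assumptions, range sharding lets a range query contact only the shards whose ranges overlap it, while hash sharding favours evenly spread point lookups. Which one is preferable depends on the workload.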

Framework for Visualization of GeoSpatial Query Processing by Integrating MongoDB with Spark

Published in Qurban A. Memon, Shakeel Ahmed Khoja, Data Science, 2019

S. Vasavi, P. Vamsi Krishna, Anu A. Gokhale

Our previous work [16,18] addressed geospatial querying by integrating Cassandra with Hadoop. Geospatial querying within the Hadoop environment incurred significant data movement because there was no proper scheduling mechanism for job execution. We extended the same framework to integrate MongoDB with Spark and added visualization. The main objective of this chapter is to propose the “GeoMongoSpark Architecture” for efficient processing of spatial queries in a distributed environment. The architecture integrates MongoDB with Spark for geospatial querying, uses geospatial indexing and sharding techniques, and visualizes the geospatial query processing. The specific objectives are as follows:

- Understand the need for integrating MongoDB and Spark, and propose a new architecture for the integration.
- Collect benchmark datasets for testing the proposed architecture.
- Apply various sharding techniques to partition the data.
- Compare the performance of the various sharding techniques.
- Use Tableau for visualization.

Big Data Computing

Published in Vivek Kale, Digital Transformation of Enterprise Architecture, 2019

Vivek Kale

In cloud technology, sharding is used to refer to the technique of partitioning a table among multiple independent databases by row. However, partitioning of data by row in relational databases is not new and is referred to as horizontal partitioning in parallel database technology. The distinction between sharding and horizontal partitioning is that horizontal partitioning is done transparently to the application by the database, whereas sharding is explicit partitioning done by the application. However, the two techniques have started converging, since traditional database vendors have started offering support for more sophisticated partitioning strategies. Since sharding is similar to horizontal partitioning, we first discuss different horizontal partitioning techniques. It can be seen that a good sharding technique depends on both the organization of the data and the type of queries expected.

Storing, preprocessing and analyzing tweets: finding the suitable noSQL system

Published in International Journal of Computers and Applications, 2022

Souad Amghar, Safae Cherdal, Salma Mouline

Another scalability factor is data auto-sharding. Auto-sharding is the ability of a database system to distribute data across nodes automatically. In Redis, the user has to distribute the data [32], which makes Redis's scalability poor [14]. In contrast, Cassandra [33], Couchbase [34] and MongoDB [35] provide automatic data distribution.

Big Data Retrieval Using Locality-Sensitive Hashing with Document-Based NoSQL Database

Published in IETE Journal of Research, 2021

N.R. Gayathiri, A.M. Natarajan

Horizontal Scaling or Sharding: This splits the dataset and distributes the data across servers (shards). Each shard can be treated as an individual database, while together the shards make up a single logical database.
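The two views of a shard described above (an individual database, and one part of a single database) can be sketched with in-memory dictionaries standing in for servers. This is a minimal illustration under stated assumptions, not the article's system: the class `ShardedStore` and its methods are hypothetical, and CRC32 is used only as a convenient stable hash.

```python
import zlib
from typing import Any, Callable, Dict, List, Optional, Tuple


class ShardedStore:
    """One logical key-value database backed by several independent shards.

    Each shard is a plain dict here (an assumed stand-in for a database server).
    """

    def __init__(self, num_shards: int) -> None:
        self.shards: List[Dict[str, Any]] = [{} for _ in range(num_shards)]

    def _shard_for(self, key: str) -> Dict[str, Any]:
        # A stable hash of the key decides which shard owns the row.
        index = zlib.crc32(key.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def put(self, key: str, value: Any) -> None:
        # A write touches exactly one shard.
        self._shard_for(key)[key] = value

    def get(self, key: str) -> Optional[Any]:
        # A point read also touches exactly one shard.
        return self._shard_for(key).get(key)

    def scan(self, predicate: Callable[[str, Any], bool]) -> List[Tuple[str, Any]]:
        # Scatter-gather: ask every shard, then merge the partial results
        # so the caller sees a single database.
        matches = [
            (key, value)
            for shard in self.shards
            for key, value in shard.items()
            if predicate(key, value)
        ]
        return sorted(matches)


if __name__ == "__main__":
    store = ShardedStore(num_shards=3)
    for i in range(10):
        store.put(f"user-{i}", i)
    print(store.get("user-7"))
    print(store.scan(lambda key, value: value >= 8))
```

Point reads and writes are served by one shard on its own, whereas `scan` has to combine results from all shards to present them as one database.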