Explore chapters and articles related to this topic
Big Data Computing
Published in Vivek Kale, Parallel Computing Architectures and APIs, 2019
NoSQL databases have been classified into four subcategories:
Column family stores: An extension of the key–value architecture with columns and column families; the overall goal was to process distributed data over a pool of infrastructure, for example, HBase and Cassandra.
Key–value pairs: This model is implemented using a hash table in which a unique key points to a particular item of data, creating a key–value pair, for example, Voldemort.
Document databases: This class of databases is modeled after Lotus Notes and is similar to key–value stores. The data is stored as a document and is represented in JSON or XML format. The biggest design feature is the flexibility to nest multiple levels of key–value pairs, for example, Riak and CouchDB.
Graph databases: Based on graph theory, this class of database supports scalability across a cluster of machines. The complexity of representation for extremely complex sets of documents is evolving, for example, Neo4J.
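As a rough illustration of how the key–value and document models differ, the sketch below uses plain Python dictionaries in place of a real store. The keys, field names, and the `get_field` helper are made up for illustration and are not taken from Voldemort, Riak, or CouchDB.

```python
import json

# Key-value model: a hash table maps each unique key to one opaque value.
kv_store = {}
kv_store["user:42"] = b"opaque bytes the store does not interpret"

# Document model: the value is a structured document (JSON here) that can
# nest several levels of key-value pairs.
doc_store = {}
doc_store["user:42"] = json.dumps({
    "name": "Ada",
    "address": {"city": "London", "geo": {"lat": 51.5, "lon": -0.12}},
})

def get_field(store, key, path):
    """Fetch a nested field from a stored JSON document (hypothetical helper)."""
    node = json.loads(store[key])
    for part in path:
        node = node[part]
    return node

city = get_field(doc_store, "user:42", ["address", "city"])
lat = get_field(doc_store, "user:42", ["address", "geo", "lat"])
```

In the key–value case the store can only return the whole value for a key, whereas the document case lets a query reach into nested levels of the stored value.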
Big Data Computing
Published in Vivek Kale, Digital Transformation of Enterprise Architecture, 2019
NoSQL databases have been classified into four subcategories:
Column family stores: An extension of the key–value architecture with columns and column families; the overall goal was to process distributed data over a pool of infrastructure, for example, HBase and Cassandra.
Key–value pairs: This model is implemented using a hash table in which a unique key points to a particular item of data, creating a key–value pair, for example, Voldemort.
Document databases: This class of databases is modeled after Lotus Notes and is similar to key–value stores. The data are stored as a document and are represented in JSON or XML format. The biggest design feature is the flexibility to nest multiple levels of key–value pairs, for example, Riak and CouchDB.
Graph databases: Based on graph theory, this class of database supports scalability across a cluster of machines. The complexity of representation for extremely complex sets of documents is evolving, for example, Neo4J.
Storage and databases for big data
Published in Jun Deng, Lei Xing, Big Data in Radiation Oncology, 2019
Tomas Skripcak, Uwe Just, Ida Schönfeld, Esther G.C. Troost, Mechthild Krause
Column family databases extend the idea of key–value stores with a concept in which the value represents one or more column families. Each column family contains a set of columns, where each column is a triplet consisting of a column name, a value, and a timestamp capturing the moment the data were inserted into the database, as depicted in Figure 3.4b. Like key–value stores, column family databases provide an effective way of storing sparse data because columns without values do not occupy any storage space. No general-purpose query language is supported out of the box. Column family data are retrieved by row key, and columns are sorted by column name at the time of their insertion into the column family. It is therefore critical to consider key and column naming during data model design to achieve good query performance. This type of database performs very well on aggregation queries (e.g., sum, count, avg) that need to access column values across the whole row key range. The embedded time dimension allows easy storage of time-series data or data from repeated instrument measurements. Column family databases are normally deployed on a distributed file system and can leverage parallel processing paradigms (such as MapReduce), which are discussed later in this chapter.
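The structure described above can be sketched as a toy in-memory model, assuming nested dictionaries stand in for the storage layer. The class name `ColumnFamilyStore`, its methods, and the sample row keys and column names are hypothetical and do not correspond to any particular product's API. Real stores keep columns sorted as they are written, whereas this sketch simply sorts on read.

```python
import time
from collections import defaultdict

class ColumnFamilyStore:
    """Toy model: row key -> column family -> column name -> (value, timestamp)."""

    def __init__(self):
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, column, value, timestamp=None):
        # Each column is stored as a (name, value, timestamp) triplet.
        ts = time.time() if timestamp is None else timestamp
        self.rows[row_key][family][column] = (value, ts)

    def get_row(self, row_key, family):
        """Return one row's columns as triplets, ordered by column name."""
        cols = self.rows.get(row_key, {}).get(family, {})
        return [(name, *cols[name]) for name in sorted(cols)]

    def aggregate(self, family, column, func):
        """Apply func (e.g. sum, len) to one column across all row keys.

        Rows that lack the column are skipped: absent columns are simply
        not stored, which is what makes sparse data cheap.
        """
        values = [
            families[family][column][0]
            for families in self.rows.values()
            if column in families.get(family, {})
        ]
        return func(values)

store = ColumnFamilyStore()
store.put("patient:001", "dose", "fraction_02", 2.0, timestamp=2)
store.put("patient:001", "dose", "fraction_01", 1.8, timestamp=1)
store.put("patient:002", "dose", "fraction_01", 2.2, timestamp=3)

row = store.get_row("patient:001", "dose")
total = store.aggregate("dose", "fraction_01", sum)
count = store.aggregate("dose", "fraction_02", len)
```

Because lookups go through the row key and columns come back in name order, the choice of key and column names (here a zero-padded `fraction_NN`) determines which range reads are cheap.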
A scalable cloud-based cyberinfrastructure platform for bridge monitoring
Published in Structure and Infrastructure Engineering, 2019
Seongwoon Jeong, Rui Hou, Jerome P. Lynch, Hoon Sohn, Kincho H. Law
The Cassandra database is built upon a column family data model consisting of ‘keyspace’, ‘column family’, ‘row’ and ‘column’, which are analogous to the ‘database’, ‘table’, ‘tuple’ and ‘attribute’ of a relational database, respectively. One important feature of the Cassandra database schema is that it closely follows the BrIM schema for bridge monitoring applications. Figure 7 presents the data mapping between the BrIM schema of the ‘FELine’ object and the corresponding column family schema ‘FELine’. The database schema contains the data entities of ‘FELine’, as well as ‘child’ and ‘parent’ entities that record the hierarchical relation between the objects. As such, bridge information stored in the column-oriented database can be mapped to hierarchical BrIM objects.
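The mapping from flat rows with ‘parent’ and ‘child’ columns back to a hierarchy can be sketched as follows. This is only an illustration under stated assumptions: the row keys, the attribute columns (`material`, `length_m`), the `BrimObject` class, and the `to_hierarchy` helper are invented here and are not the schema shown in Figure 7 or any Cassandra driver API.

```python
from dataclasses import dataclass, field

@dataclass
class BrimObject:
    """Hypothetical stand-in for a hierarchical BrIM object."""
    name: str
    attributes: dict
    children: list = field(default_factory=list)

# One column family ("table"): each row key maps to its columns ("attributes").
# The 'parent' and 'child' columns record the hierarchy between rows.
fe_line_rows = {
    "girder": {"parent": None, "child": ["span_1", "span_2"], "material": "steel"},
    "span_1": {"parent": "girder", "child": [], "length_m": 30.0},
    "span_2": {"parent": "girder", "child": [], "length_m": 28.5},
}

def to_hierarchy(rows, root_key):
    """Rebuild a nested object tree by following each row's 'child' column."""
    row = rows[root_key]
    attrs = {k: v for k, v in row.items() if k not in ("parent", "child")}
    node = BrimObject(root_key, attrs)
    node.children = [to_hierarchy(rows, child_key) for child_key in row["child"]]
    return node

root = to_hierarchy(fe_line_rows, "girder")
child_names = [child.name for child in root.children]
```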
Building an efficient storage model of spatial-temporal information based on HBase
Published in Journal of Spatial Science, 2019
Ke Wang, Guolin Liu, Min Zhai, Zhiwei Wang, Chuanyi Zhou
With the characteristics of sparse data storage, persistence and multidimensional mapping, an HBase table represents a mapping relationship that can be used to locate specific data by row, row + timestamp or row + column (column family: column qualifier) (George 2011). HBase logically organizes the data into nested mappings, and its sparseness allows cells to be left empty when the data are stored. Each row has exactly one RowKey, which determines where its columns are stored. The information in a row comprises the RowKey, the TimeStamp and the column family information, and a column family contains the contents of several columns. A row can therefore be expressed in the form (RowKey, TimeStamp, Column family). The logical view of HBase is shown in Table 1.
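The nested mapping and the three lookup modes (row, row + timestamp, row + column) can be sketched with plain dictionaries. The row keys, column names, and the `get` function below are hypothetical and are not the HBase client API. Returning the newest version at or before the requested timestamp is a choice made for this sketch only.

```python
# row key -> "family:qualifier" -> timestamp -> value
table = {
    "station_001": {
        "position:lat": {1000: 36.00, 2000: 36.01},
        "position:lon": {1000: 120.10},
        # No "meta:*" cells for this row: empty cells are simply absent.
    },
    "station_002": {
        "meta:name": {1500: "pier"},
    },
}

def get(table, row, column=None, timestamp=None):
    """Look up cells by row, by row + column, or by row + timestamp.

    For each matching column, return the newest version whose timestamp
    is at or before `timestamp` (or the newest overall if none is given).
    """
    result = {}
    for col, versions in table.get(row, {}).items():
        if column is not None and col != column:
            continue
        eligible = [ts for ts in versions if timestamp is None or ts <= timestamp]
        if eligible:
            result[col] = versions[max(eligible)]
    return result

by_row = get(table, "station_001")
by_row_and_column = get(table, "station_001", column="position:lat")
by_row_and_time = get(table, "station_001", timestamp=1000)
```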