SQL-on-Hadoop Systems
Published in Kuan-Ching Li, Hai Jiang, Albert Y. Zomaya, Big Data Management and Processing, 2017
Alfredo Cuzzocrea, Rim Moussa, Soror Sahri
There are two main storage layouts for big data management: (i) row-oriented stores and (ii) column-oriented stores. A column-oriented DBMS stores data tables as columns of data. A columnar database differs from a traditional row-oriented database in four main ways. It performs differently. It manages memory more efficiently, because only the attributes a query needs are loaded. It needs less storage, because repeating values are compressed. It also uses different techniques for modifying the schema. Recent years have seen the introduction of a number of column-oriented database systems, including MonetDB [5], C-Store [6], and VectorWise [7]. Column-oriented storage layouts are well-suited for OLAP-like workloads, while row-oriented storage layouts are well-suited for OLTP workloads.
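Two of the claims above can be made concrete. A column store can answer an aggregate by reading a single attribute, and runs of repeated values in a column compress well. The following minimal Python sketch assumes a tiny in-memory table with hypothetical columns `order_id`, `region` and `amount`. It is not how MonetDB, C-Store or VectorWise are implemented. Run-length encoding is used here as one illustrative compression scheme among several that real systems use.

```python
from itertools import groupby

# Hypothetical sample table: (order_id, region, amount)
ROWS = [
    (1, "EU", 120.0),
    (2, "EU", 80.0),
    (3, "EU", 45.5),
    (4, "US", 200.0),
    (5, "US", 15.0),
]
COLUMNS = ("order_id", "region", "amount")


def to_column_store(rows, names):
    """Pivot row tuples into one list per attribute (a columnar layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def sum_amount_row_store(rows):
    # Row layout: every whole tuple is visited even though only 'amount' is needed.
    return sum(row[2] for row in rows)


def sum_amount_column_store(columns):
    # Column layout: only the single attribute used by the query is read.
    return sum(columns["amount"])


def run_length_encode(values):
    """Compress runs of repeated adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_column_store(ROWS, COLUMNS)
print(sum_amount_row_store(ROWS))            # 460.5
print(sum_amount_column_store(columns))      # 460.5
print(run_length_encode(columns["region"]))  # [('EU', 3), ('US', 2)]
```

Both layouts return the same answer. The difference lies in how much data each one touches. Run-length encoding pays off only when equal values are adjacent, so columnar systems often benefit from sorted or clustered columns.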
Databases
Published in Ian Foster, Rayid Ghani, Ron S. Jarmin, Frauke Kreuter, Julia Lane, Big Data and Social Science, 2020
A column-oriented DBMS stores data tables by columns rather than by rows, which is the common practice in relational DBMSs. This approach has advantages in settings where aggregates must frequently be computed over many similar data items, e.g., in clinical data analysis. Google Cloud Bigtable is a cloud-hosted column-oriented NoSQL database. HBase and Cassandra are two open source systems with similar characteristics. (Confusingly, the term column oriented is also often used to refer to SQL database engines that store data in columns instead of rows, e.g., Google BigQuery, Amazon Redshift, HP Vertica, Teradata, and the open source MonetDB. Such systems are not to be confused with column-based NoSQL databases.)
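The distinction in the parenthetical is easier to see in code. The following Python sketch models the wide-column (column-family) data model of Bigtable-style NoSQL stores. It treats the store as a sparse map from (row key, `family:qualifier`, timestamp) to a value. The class `WideColumnTable`, its methods, and the clinical sample data are hypothetical, and this is not the HBase, Cassandra or Bigtable client API. In this sketch, data is addressed and grouped by row key, and each row may carry a different set of columns. A columnar SQL engine instead stores every attribute of a fixed schema as its own column.

```python
from collections import defaultdict


class WideColumnTable:
    """Toy Bigtable-style map: (row key, "family:qualifier", timestamp) -> value."""

    def __init__(self, families):
        # Column families are declared up front; qualifiers inside them are free-form.
        self.families = set(families)
        # row key -> column key -> {timestamp: value}
        self._cells = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, column, value, ts):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self._cells[row_key][column][ts] = value

    def get(self, row_key, column):
        """Return the newest version of a cell, or None if it was never written."""
        versions = self._cells.get(row_key, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def columns(self, row_key):
        # Rows are sparse: a row lists only the columns actually written to it.
        return sorted(self._cells.get(row_key, {}))


t = WideColumnTable(["vitals", "meta"])
t.put("patient#1", "vitals:hr", 72, ts=1)
t.put("patient#1", "vitals:hr", 75, ts=2)  # a newer version of the same cell
t.put("patient#2", "meta:ward", "B", ts=1)
print(t.get("patient#1", "vitals:hr"))     # 75
print(t.columns("patient#2"))              # ['meta:ward']
```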
Storage, System Security and Access Control for Big Data IoT
Published in Naveen Chilamkurti, T. Poongodi, Balamurugan Balusamy, Blockchain, Internet of Things, and Artificial Intelligence, 2021
T. Lucia Agnes Beena, T. Kokilavani, D. I. George Amalarethinam
The column-oriented data store is a storage representation suited to data mining and data analysis applications [11]. It stores data in tabular form organized by columns, and for these analytical workloads it performs better than conventional row-wise database systems. The column-oriented data store reads and writes data efficiently, and it is compatible with the MapReduce model. HBase and Cassandra are examples of column-oriented data stores. Figure 13.3 shows the data representation in a column-oriented data store.
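The fit between MapReduce and columnar storage can be illustrated with an aggregate over one column that has been split into chunks. Each map task emits a partial result, and the reduce step combines the partial results. The following sketch is a single-process illustration under stated assumptions. It assumes the column is already partitioned, and the chunk size, function names and temperature data are hypothetical. It does not use Hadoop's MapReduce API. In a real cluster, each call to `map_chunk` would run as a separate task near the block that holds its chunk.

```python
from functools import reduce


def split_into_chunks(column, chunk_size):
    """Partition one column into fixed-size chunks, standing in for distributed blocks."""
    return [column[i:i + chunk_size] for i in range(0, len(column), chunk_size)]


def map_chunk(chunk):
    # Map phase: emit a partial (sum, count) for one chunk of a single column.
    return (sum(chunk), len(chunk))


def reduce_partials(a, b):
    # Reduce phase: partial aggregates combine associatively, in any order.
    return (a[0] + b[0], a[1] + b[1])


def column_mean(column, chunk_size=3):
    partials = map(map_chunk, split_into_chunks(column, chunk_size))
    total, count = reduce(reduce_partials, partials, (0, 0))
    return total / count if count else None


temperatures = [21, 23, 22, 25, 24, 26, 20]
print(column_mean(temperatures))  # 23.0
```

The mean is computed from (sum, count) pairs rather than from per-chunk means. Averaging unequal-sized chunks directly would give a wrong result.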
Integrating memory-mapping and N-dimensional hash function for fast and efficient grid-based climate data query
Published in Annals of GIS, 2021
Mengchao Xu, Liang Zhao, Ruixin Yang, Jingchao Yang, Dexuan Sha, Chaowei Yang
In 2005, the prototype of MonetDB was introduced as a main-memory database system that uses a column-at-a-time execution model for data warehousing. MonetDB is not a native array database but a full-fledged relational DBMS (Idreos et al. 2012). Even so, it offered useful ideas for processing array models. It is a column-oriented database in which each column, or BAT (Binary Association Table), is implemented as a C array at the storage level (Boncz, Zukowski, and Nes 2005). MonetDB has focused on optimizing the major components of traditional database architecture to make better use of modern hardware in database applications that analyse massive data volumes (Boncz, Kersten, and Manegold 2008). In 2008, Cornacchia et al. presented an example of Information Retrieval (IR) researchers using a matrix framework with Sparse Relational Array Mapping (SRAM). They used MonetDB/X100 as the relational backend, which provided fast responses and good precision. The matrix framework is based on the array abstraction. The researchers mapped arrays onto the relational model and developed array queries, and MonetDB allowed them to optimize performance and build a high-performance IR application.
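The column-at-a-time model can be sketched with Python's `array` module. Each column is a dense, typed array, and a row is identified only by its position (its oid). Each operator consumes a whole column and materializes a whole intermediate result. The operator names `select_eq` and `fetch` and the sample data are hypothetical. This toy code only illustrates the execution style and does not reproduce MonetDB's internal operators or its BAT layout.

```python
from array import array

# Each column is a dense, typed array; a row's position (its oid) is implicit.
year = array("i", [2019, 2020, 2020, 2021, 2020])
price = array("d", [9.5, 12.0, 7.25, 30.0, 4.0])


def select_eq(column, value):
    """Column-at-a-time selection: scan one whole column, return matching positions."""
    return array("q", (oid for oid, v in enumerate(column) if v == value))


def fetch(column, oids):
    """Positional lookup: gather another column's values at the selected positions."""
    return array(column.typecode, (column[oid] for oid in oids))


def total(column):
    return sum(column)


oids = select_eq(year, 2020)      # positions 1, 2 and 4
prices_2020 = fetch(price, oids)  # 12.0, 7.25, 4.0
print(total(prices_2020))         # 23.25
```

Because every step produces a complete intermediate array, each loop runs over one contiguous, homogeneous column. This is the property that lets a C implementation make good use of CPU caches.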
A scalable cloud-based cyberinfrastructure platform for bridge monitoring
Published in Structure and Infrastructure Engineering, 2019
Seongwoon Jeong, Rui Hou, Jerome P. Lynch, Hoon Sohn, Kincho H. Law
The Cassandra database is built upon a column family data model consisting of ‘keyspace,’ ‘column family’, ‘row’ and ‘column’, which are analogous to ‘database’, ‘table’, ‘tuple’ and ‘attribute’ of a relational database, respectively. One important feature of the Cassandra database schema is that it closely follows the BrIM schema for bridge monitoring applications. Figure 7 presents data mapping between the BrIM schema of the ‘FELine’ object and the corresponding column family schema ‘FELine’. The database schema contains the data entities of ‘FELine’, as well as ‘child’ and ‘parent’ entities to record the hierarchical relation between the objects. As such, bridge information stored in the column-oriented database can be mapped to hierarchical BrIM objects.
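The mapping from flat column-family rows back to hierarchical objects can be sketched with nested Python dictionaries. The dictionaries follow the keyspace, column family, row, column nesting described above, and the code rebuilds a tree by following the ‘child’ column. The row keys, the `length_m` attribute and the function `to_tree` are hypothetical. They are not taken from the BrIM schema or the authors' implementation, and no Cassandra driver API is used.

```python
# keyspace -> column family -> row key -> {column: value}
# (analogous to database -> table -> tuple -> attributes)
bridge_keyspace = {
    "FELine": {
        "line_1": {"parent": "span_A", "child": ["line_1a", "line_1b"], "length_m": 12.0},
        "line_1a": {"parent": "line_1", "child": [], "length_m": 6.0},
        "line_1b": {"parent": "line_1", "child": [], "length_m": 6.0},
    }
}


def to_tree(column_family, row_key):
    """Rebuild a nested object from flat rows by following the 'child' column."""
    row = column_family[row_key]
    # Keep ordinary attributes; the relationship columns become nesting instead.
    attributes = {k: v for k, v in row.items() if k not in ("parent", "child")}
    return {
        "id": row_key,
        **attributes,
        "children": [to_tree(column_family, c) for c in row["child"]],
    }


tree = to_tree(bridge_keyspace["FELine"], "line_1")
print(tree["length_m"])                     # 12.0
print([c["id"] for c in tree["children"]])  # ['line_1a', 'line_1b']
```

The sketch stores both ‘parent’ and ‘child’ columns on each row. Under that assumption, the hierarchy can be walked in either direction with single-row lookups and without joins.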
Building an efficient storage model of spatial-temporal information based on HBase
Published in Journal of Spatial Science, 2019
Ke Wang, Guolin Liu, Min Zhai, Zhiwei Wang, Chuanyi Zhou
With the rapid development of geographic information systems (GIS), spatial-temporal data are increasing dramatically (Ranjan et al. 2016). Building a smart city requires the effective organization and management of complex spatial-temporal information, such as coordinate information, text information and radio frequency identification (RFID) data. Consequently, more and more GIS applications are being used to build smart cities (Huang et al. 2016). Traditional spatial data storage systems are built on relational databases that are extended for spatial data. These systems are generally designed to store structured data and lack storage suited to massive, unstructured data. Key aspects in the further development of GIS are therefore to store and process massive spatial-temporal data more clearly and completely (Wang et al. 2014, 2015, Deng et al. 2015, Ma et al. 2015). Emerging Not Only SQL (NoSQL) databases provide technical support for storing massive, unstructured spatial-temporal data. Cloud platforms (Karun and Chitharanjan 2013) offer high availability and real-time read and write capabilities for massive data storage and management, and they play an important role in big data processing. As a cloud storage database, the Hadoop database (HBase) is a distributed, fault-tolerant, highly scalable, column-oriented NoSQL database. It is built on top of the Hadoop Distributed File System (HDFS) (White and Cutting 2009) and creates a massively scalable, high-performance platform that handles heterogeneous data, including non-textual data types (George 2011). It provides a platform and method for users to manage GIS spatial-temporal data.