HBase – Knowledge and References

Hadoop Framework: Big Data Management Platform for Internet of Things

Published in Lavanya Sharma, Pradeep K Garg, From Visual Surveillance to Internet of Things, 2019

Pallavi H. Bhimte, Pallavi Goel, Dileep Kumar Yadav, Dharmendra Kumar

HBase is an open-source, nonrelational database that provides real-time read/write access to large datasets. It is written in Java and belongs to the NoSQL ("not only SQL") family rather than being SQL-based. HBase allows dynamic changes and can be used in standalone applications. HBase tables consist of labeled rows and columns. Each cell, the intersection of a row and a column, is versioned; by default the version is a timestamp that is assigned automatically. Row keys are byte arrays, and column keys are grouped into families. Each family has a specific name and holds related information, such as name, age, date of birth, and so on [22]. Tables are partitioned horizontally, by row, into regions, and within each region writes are buffered in a MemStore (one per column family). Because HBase has no joins, related data are kept in a single table, with column families grouping similar kinds of data. A cell can hold several versions, so an update adds a new version rather than overwriting the old one; by default, three versions are kept. Every version has a unique timestamp. Any cell in a column family can be accessed through its row key. HBase is fault-tolerant, fast, and easy to use. Replication across data centers is supported, so lost data can be recovered.
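The versioning behavior described above can be illustrated with a small in-memory model. This is only a sketch of the data model, not the HBase client API: the class `VersionedTable`, its methods, and the integer counter standing in for timestamps are all hypothetical, and the limit of three versions simply follows the default stated in the text.

```python
from collections import defaultdict
import itertools


class VersionedTable:
    """Toy model of an HBase-style table (hypothetical, not the HBase API)."""

    def __init__(self, families, max_versions=3):
        self.families = set(families)      # column families are fixed up front
        self.max_versions = max_versions   # assumed default, taken from the text
        # row key -> (family, qualifier) -> [(timestamp, value), ...], newest first
        self._rows = defaultdict(lambda: defaultdict(list))
        self._clock = itertools.count(1)   # stand-in for an automatic timestamp

    def put(self, row, family, qualifier, value, timestamp=None):
        """Write a new version of a cell and return the timestamp used."""
        if family not in self.families:
            raise KeyError(f"unknown column family: {family!r}")
        ts = next(self._clock) if timestamp is None else timestamp
        versions = self._rows[row][(family, qualifier)]
        # A write with an existing timestamp replaces that version.
        versions[:] = [tv for tv in versions if tv[0] != ts]
        versions.append((ts, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)
        del versions[self.max_versions:]   # drop versions beyond the limit
        return ts

    def get(self, row, family, qualifier, versions=1):
        """Return up to `versions` (timestamp, value) pairs, newest first."""
        cells = self._rows.get(row, {}).get((family, qualifier), [])
        return cells[:versions]


table = VersionedTable(families=["info"])
for age in (b"30", b"31", b"32", b"33"):
    table.put(b"row1", "info", "age", age)

latest = table.get(b"row1", "info", "age")
history = table.get(b"row1", "info", "age", versions=10)
```

After four writes to the same cell, `latest` holds only the newest value and `history` holds the three most recent versions; the oldest write has been discarded.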

Home Automation in Cloud-Based IoT

Published in Fadi Al-Turjman, The Cloud in IoT-enabled Spaces, 2019

Fadi Al-Turjman, Mohamad Sanwal

HBase is a column-oriented NoSQL database used with Hadoop, in which a client can store very large numbers of rows and columns. HBase supports read and write operations. It additionally supports record-level updates, which are not possible with HDFS alone. HBase provides parallel data storage by means of the underlying distributed file system spread across the cloud servers. It is open-source software that can handle petabytes of data across thousands of nodes. It has the following features: (1) a Java API for client access, (2) Bloom filters and a block cache for real-time queries, (3) linear and modular scalability, (4) strictly consistent reads and writes, (5) an extensible JRuby-based (JIRB) shell, (6) support for exporting metrics via Hadoop, and (7) convenient base classes for backing MapReduce tasks with HBase tables.
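To make feature (2) more concrete, the following is a minimal sketch of the Bloom filter idea: a compact bit array that can answer "definitely not present" or "possibly present" for a key, which lets a store skip files that cannot contain a requested row. It is a generic illustration, not HBase's implementation; the class name, the bit-array size, and the use of SHA-256 as the hash are assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter sketch (hypothetical; not HBase's implementation)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # all bits start at zero

    def _positions(self, key: bytes):
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        # False means the key was never added; True means it may have been.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


row_filter = BloomFilter()
row_filter.add(b"row-0001")
```

A key that was added always tests positive, so there are no false negatives. A key that was never added usually tests negative, but false positives are possible, so a positive answer still requires reading the underlying data.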

Remote Sensing Data Organization and Management in a Cloud Computing Environment

Published in Lizhe Wang, Jining Yan, Yan Ma, Cloud Computing in Remote Sensing, 2019

Lizhe Wang, Jining Yan, Yan Ma

The data stored in HBase are organized as tables. Each table contains a large number of sorted rows, each of which has a unique row key and a number of column families. Each column family can contain a different number of column qualifiers, and the intersections of rows and column qualifiers are the table cells. Table cells store the actual data and often have multiple versions distinguished by timestamp. A table cell can be uniquely identified by the following sequence: (Table, Row Key, Column Family: Column Qualifier, Timestamp). Tables stored in HBase are horizontally split into a number of regions, each of which is assigned to a specific region server determined by the HBase Master. The regions assigned to each region server are further divided vertically, by column family, into many files stored in HDFS.
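The cell coordinate and the horizontal split into regions can be sketched as follows. Because rows are sorted by row key, each region covers a contiguous key range, and the region holding a given row can be found by binary search over the region start keys. The split points, the server names, and the function `locate_region` are hypothetical values chosen for illustration; they do not come from the source or from the HBase API.

```python
import bisect
from typing import NamedTuple


class CellKey(NamedTuple):
    """Coordinate that uniquely identifies one version of one cell."""
    table: str
    row: bytes
    family: str
    qualifier: str
    timestamp: int


# Hypothetical layout: region i holds rows in [START_KEYS[i], START_KEYS[i + 1]),
# and the last region is unbounded above. The empty key sorts before all others.
START_KEYS = [b"", b"g", b"n", b"t"]
REGION_SERVERS = ["rs1", "rs2", "rs1", "rs3"]  # assignment chosen by the master


def locate_region(row: bytes):
    """Return (region index, region server) for a row key."""
    index = bisect.bisect_right(START_KEYS, row) - 1
    return index, REGION_SERVERS[index]


cell = CellKey("users", b"kim", "info", "age", 1700000000000)
region_index, server = locate_region(cell.row)
```

Here the row key `b"kim"` falls in the range starting at `b"g"`, so the lookup returns region 1 on the server labeled `rs2`. One region server may host several regions, as `rs1` does in this made-up layout.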

Big data driven cycle time parallel prediction for production planning in wafer manufacturing

Published in Enterprise Information Systems, 2018

Junliang Wang, Jungang Yang, Jie Zhang, Xiaoxi Wang, Wenjun (Chris) Zhang

The basic platform runs the Hadoop software stack to enable parallel big data computing. The Hadoop components include MapReduce, the Hadoop Distributed File System (HDFS), HBase, Hive, and YARN (Wa et al. 1985). MapReduce is an effective and efficient parallel tool for large-scale data analysis. In this cycle time forecasting system, the parallel training process is developed with the MapReduce model. HDFS is the primary distributed storage used by Hadoop applications, and it stores the raw data files for cycle time forecasting. The HDFS cluster in the case study consists primarily of a NameNode that manages the file system metadata and three DataNodes that store the actual data. HBase is a Bigtable-like structured storage system built on HDFS, which provides real-time read/write access to the structured data. Moreover, Hive is a data warehouse infrastructure that allows SQL-like querying of data (in any format) stored in Hadoop. In the basic platform, YARN is the resource manager that allocates cluster resources and schedules the jobs used for cycle time forecasting. Furthermore, the basic platform is designed to implement the parallel DP-RBFN forecasting algorithm. From the standpoint of algorithm deployment, many parallel platforms are applicable, such as Spark (Yavuz et al. 2016) and Disco (Papadimitriou and Sun 2009), since they are all based on the MapReduce model.

Building an efficient storage model of spatial-temporal information based on HBase

Published in Journal of Spatial Science, 2019

Ke Wang, Guolin Liu, Min Zhai, Zhiwei Wang, Chuanyi Zhou

With the rapid development of geographic information systems (GIS), the volume of spatial-temporal data is increasing dramatically (Ranjan et al. 2016). The construction of a smart city requires the effective organization and management of complex spatial-temporal information, such as coordinate information, text information and radio frequency identification (RFID) data. Consequently, more and more GIS applications have been used to build smart cities (Huang et al. 2016). Traditional spatial data storage systems are built on relational databases that have been extended for spatial data. Generally, these systems are used to store structured data, and they are poorly suited to storing massive, unstructured data. For this reason, a key aspect of the further development of GIS is storing and processing massive spatial-temporal data more clearly and completely (Wang et al. 2014, 2015, Deng et al. 2015, Ma et al. 2015). Emerging Not Only SQL (NoSQL) databases provide technical support for the storage of massive and unstructured spatial-temporal data. The cloud platform (Karun and Chitharanjan 2013), which offers high availability and real-time reading and writing in support of massive data storage and management, plays an important role in big data processing. As a cloud storage database, the Hadoop database (HBase) is a distributed, fault-tolerant, highly scalable, column-oriented NoSQL database. It builds on the Hadoop distributed file system (HDFS) (White and Cutting 2009) and creates a massively scalable, high-performance platform that handles heterogeneous data, including non-textual data types (George 2011). It provides a platform and method for users to manage GIS spatial-temporal data.