Apache Hive – Knowledge and References

SQL-on-Hadoop Systems

Published in Kuan-Ching Li, Hai Jiang, Albert Y. Zomaya, Big Data Management and Processing, 2017

Alfredo Cuzzocrea, Rim Moussa, Soror Sahri

Apache Hive [29] is an open-source data warehousing solution built on top of Hadoop and released by Facebook. Hive supports queries expressed in HiveQL, an SQL-like declarative language; these queries are compiled into MapReduce jobs. Figure 9.4 illustrates HiveQL code for TPC-H business question Q5. As in traditional databases, Hive stores data in tables, where each table consists of a number of rows and each row consists of a specified number of columns. Each column has an associated type, which is either a primitive type (integer, float, etc.) or a complex type (map, list, struct). HiveQL also supports analyses that users express as their own MapReduce programs, written in the programming language of their choice.
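As an illustration of user-written logic in a language of the user's choice, the following is a minimal Python sketch of a row-level transform of the kind that could be plugged into a HiveQL query as a streaming script (for example through HiveQL's TRANSFORM ... USING clause, which by default passes rows to the script as tab-separated text). The two-column input (a customer name and an order total), the threshold of 1000, and the function names are hypothetical and chosen only for this example.

```python
def transform_line(line):
    """Turn one tab-separated input row into one tab-separated output row."""
    # Hypothetical input layout: customer name, then order total.
    name, total = line.rstrip("\n").split("\t")
    # Illustrative logic only: normalize the name and bucket the amount.
    bucket = "high" if float(total) >= 1000.0 else "low"
    return f"{name.strip().upper()}\t{bucket}"


def run(lines):
    """Apply transform_line to every non-empty row of an iterable of lines."""
    for line in lines:
        if line.strip():
            yield transform_line(line)


# In a real streaming script, `lines` would be sys.stdin and each result
# would be written to sys.stdout; a small in-memory sample is used here.
sample_rows = ["alice \t1500.0\n", "bob\t12.5\n"]
for row in run(sample_rows):
    print(row)
```

Keeping the per-row logic in a plain function separates it from the input and output handling, so it can be checked locally before being attached to a query.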

Big Data Analytics in Healthcare Data Processing

Published in Punit Gupta, Dinesh Kumar Saini, Rohit Verma, Healthcare Solutions Using Machine Learning and Informatics, 2023

Tanveer Ahmed, Rishav Singh, Ritika Singh

Apache Hive: Hive is a data warehouse architecture for querying and analyzing large amounts of data stored in Hadoop HDFS. It also serves as a Hadoop ETL (extract, transform, and load) tool. Hive uses a declarative language known as Hive Query Language (HiveQL), which allows SQL programmers to analyze data easily [23].

Programming models and systems for Big Data analysis

Published in International Journal of Parallel, Emergent and Distributed Systems, 2019

Loris Belcastro, Fabrizio Marozzo, Domenico Talia

Apache Hive [16] is a popular data warehouse system built on top of Hadoop, designed mainly to provide a scalable solution for managing and processing very large amounts of data (up to petabytes). It is used and supported by major IT companies such as Facebook, Yahoo, eBay, and Netflix, and it has a large community of developers who collectively ensure rapid development of the system. Hive provides a declarative SQL-like language, called Hive Query Language (HiveQL), which can be used to easily develop scripts for querying data stored on HDFS. The HiveQL syntax is very similar to SQL syntax, since it reuses the main concepts of relational databases (e.g. table, row, column). Each data manipulation query is automatically translated into a MapReduce job, which allows developers to deal with Big Data without having to write long and complex programs directly in MapReduce.
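To make the translation from a declarative query to a MapReduce job more concrete, the sketch below simulates, in plain Python, the three phases that an aggregation such as SELECT customer, SUM(amount) FROM orders GROUP BY customer conceptually goes through. It is not Hive's actual execution plan; the orders table, its columns, and all function names are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical rows of an "orders" table: (customer, amount).
ORDERS = [
    ("alice", 30.0),
    ("bob", 12.5),
    ("alice", 20.0),
    ("carol", 7.5),
    ("bob", 2.5),
]


def map_phase(rows):
    """Emit one (key, value) pair per row, keyed by the GROUP BY column."""
    for customer, amount in rows:
        yield customer, amount


def shuffle_phase(pairs):
    """Group all values that share a key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Apply the aggregate function (here SUM) to each group of values."""
    return {key: sum(values) for key, values in groups.items()}


def run_query(rows):
    """Chain the three phases the way a single MapReduce job would."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


totals = run_query(ORDERS)
print(totals)
```

The one-line query replaces all of this hand-written plumbing, which is the convenience the paragraph above describes.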

Real-time Twitter data analysis using Hadoop ecosystem

Published in Cogent Engineering, 2018

Anisha P. Rodrigues, Niranjan N. Chiplunkar

Apache Hive is data warehousing software that addresses how data is structured and queried. Hive has a declarative SQL-like language, HiveQL (or HQL), for querying data. Traditional SQL queries can easily be implemented using HiveQL. In Hive, queries are implicitly converted into mapper and reducer jobs. The advantages of using Hive lie in the features it provides: it is fast, scalable, and extensible. Even people with little programming experience can use it to analyze data on HDFS (KadharBasha & Balamurugan, 2017).