Explore chapters and articles related to this topic
One Platform Rules All
Published in Kuan-Ching Li, Hai Jiang, Albert Y. Zomaya, Big Data Management and Processing, 2017
Oozie is a workflow scheduler system for managing Apache Hadoop jobs. Oozie supports several types of Hadoop jobs, such as MapReduce jobs, Pig scripts, Hive queries, and Sqoop data import and export jobs. Oozie workflows are triggered by time (frequency) and/or data availability.
Real-time Twitter data analysis using Hadoop ecosystem
Published in Cogent Engineering, 2018
Anisha P. Rodrigues, Niranjan N. Chiplunkar
Dependence on social media is now unavoidable and has produced an abundance of data sets, which makes processing and analyzing that data a challenge. Extensive dependence on social media data such as Twitter data, e-commerce data, etc. has gained much attention in the area of sentiment analysis. Hadoop proves to be an efficient framework for analyzing huge data sets because it operates in a fault-tolerant manner. In addition, Hadoop can be integrated with Apache Pig, Hive, Oozie, ZooKeeper, Sqoop, etc., which promises improved efficiency and performance. The Pig Latin and HiveQL languages reduce the difficulty of writing complex MapReduce programs. In the proposed work, the Hadoop framework is integrated with Apache Flume to fetch data from Twitter, and Apache Pig and Hive are used to analyze the extracted Twitter data. First, recent trends in the extracted tweets were determined, and then sentiment analysis was performed on the retrieved data. The experiment was performed on two Hadoop ecosystem components, i.e., Pig and Hive, and the execution times were recorded. From the experimental results, it can be concluded that Pig is more efficient than Hive, as Pig takes less time to execute than Hive.
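
The two analysis steps named in the abstract (finding recent trends, then scoring sentiment) can be illustrated with a small plain-Python sketch that runs outside Hadoop. This is not the authors' Pig or Hive code. The sample tweets, the tiny word lexicon, and the function names `top_hashtags` and `sentiment` are all assumptions made for illustration, and counting hashtags is only one plausible way to approximate "trends".

```python
import re
from collections import Counter

# Hypothetical sample tweets; in the paper the data is fetched from Twitter via Apache Flume.
TWEETS = [
    "Loving the new #Hadoop release, great work!",
    "#Hadoop cluster is down again, terrible day",
    "Trying #Pig and #Hive on the same data set",
    "#Hive queries feel slow today",
]

# Hypothetical word-polarity lexicon (assumption: +1 for positive words, -1 for negative words).
LEXICON = {"loving": 1, "great": 1, "terrible": -1, "slow": -1}


def top_hashtags(tweets, n=2):
    """Count hashtags case-insensitively and return the n most frequent ones."""
    counts = Counter(
        tag.lower()
        for text in tweets
        for tag in re.findall(r"#(\w+)", text)
    )
    return counts.most_common(n)


def sentiment(text):
    """Sum the lexicon scores of the words in a tweet and map the total to a label."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(LEXICON.get(word, 0) for word in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(top_hashtags(TWEETS))
    for tweet in TWEETS:
        print(sentiment(tweet), "|", tweet)
```

In a Hadoop setting the same grouping and counting would be expressed as a Pig Latin script or a HiveQL query over the stored tweets; the sketch only shows the shape of the computation.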