Big Data, Cloud, Semantic Web, and Social Network Technologies
Published in Bhavani Thuraisingham, Murat Kantarcioglu, Latifur Khan, Secure Data Science, 2022
Apache Pig is a scripting platform for analyzing and processing large datasets. Apache Pig enables Hadoop users to write complex MapReduce transformations using a simple scripting language called Pig Latin. Pig converts a Pig Latin script into MapReduce jobs, which are then executed by Hadoop on the data stored in HDFS. Pig Latin programming is similar to specifying a query execution plan; that is, a Pig Latin script can be regarded as an execution plan. This makes it simpler for programmers to carry out their tasks. More details on Pig can be found in [PIG].
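The transformations described above can be sketched as a short Pig Latin script; the file paths and field schema below are illustrative assumptions, not taken from the source.

```pig
-- Load a tab-separated log file from HDFS (path and schema are assumed)
logs = LOAD '/data/access_logs' AS (user:chararray, url:chararray, bytes:long);

-- Keep only large responses
big = FILTER logs BY bytes > 1024;

-- Group by user and count requests
grouped = GROUP big BY user;
counts  = FOREACH grouped GENERATE group AS user, COUNT(big) AS requests;

-- Write results back to HDFS
STORE counts INTO '/output/user_counts';
```

Each relational statement declares one step of the plan; Pig compiles the whole script into one or more MapReduce jobs that Hadoop then runs.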
High-End Tools and Technologies for Managing Data in the Age of Big Data
Published in Chiranji Lal Chowdhary, Intelligent Systems, 2019
N. Nivedhitha, P. M. Durai Raj Vincent
Apache Pig is a platform that works on top of HDFS to provide an abstraction over MapReduce in order to reduce its complexity (Vaddeman, 2016). Data manipulation operations can be performed with ease using Pig on Hadoop. Pig has a simple language for analyzing data, called Pig Latin, which offers a rich set of operators and data types. Developers write a script in Pig Latin, which is internally converted into a series of MapReduce jobs, making the developer's job easy. Apache Pig handles all kinds of data and automatically optimizes scripts before they are executed.
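The rich data types mentioned above include nested structures such as tuples and bags, which built-in operators can work on directly. A small sketch, with an assumed input file and schema:

```pig
-- Pig Latin's data model includes atoms, tuples, bags, and maps.
-- The file name and schema here are assumed for illustration.
users = LOAD 'users.csv' USING PigStorage(',')
        AS (name:chararray, age:int, skills:bag{t:(skill:chararray)});

-- Built-in operators apply to nested types without custom code
adults    = FILTER users BY age >= 18;
flattened = FOREACH adults GENERATE name, FLATTEN(skills);

DUMP flattened;  -- print the results to the console
```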
Learner to Advanced: Big Data Journey
Published in Vijender Kumar Solanki, Vicente García Díaz, J. Paulo Davim, Handbook of IoT and Big Data, 2019
For processing large datasets, Apache Pig raises the level of abstraction. With MapReduce, as the programmer you specify a map function followed by a reduce function, and you must then work out how to fit your data processing into this model. This usually requires multiple MapReduce stages, which can be a challenge.
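The multi-stage difficulty above is where Pig helps: a join followed by an aggregation, which in hand-written MapReduce typically needs two chained jobs, fits in a few Pig Latin statements. The file names and schemas below are assumptions for illustration.

```pig
-- Join two datasets, then aggregate: several MapReduce stages,
-- expressed as a handful of declarative statements.
orders    = LOAD 'orders'    AS (order_id:int, cust_id:int, total:double);
customers = LOAD 'customers' AS (cust_id:int, name:chararray);

joined   = JOIN orders BY cust_id, customers BY cust_id;
per_cust = GROUP joined BY customers::name;
spend    = FOREACH per_cust GENERATE group AS name,
                                     SUM(joined.orders::total) AS spend;

STORE spend INTO 'spend_by_customer';
```

Pig's planner decides how to map these statements onto MapReduce stages, so the programmer never writes the stage boundaries by hand.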
Real-time Twitter data analysis using Hadoop ecosystem
Published in Cogent Engineering, 2018
Anisha P. Rodrigues, Niranjan N. Chiplunkar
The two key data access components of the Hadoop ecosystem are Apache Pig and Apache Hive. Hadoop's basic programming layer is MapReduce, but these components ease the writing of complex Java MapReduce programs. Apache Pig is an abstraction over MapReduce, and it provides a high-level procedural data flow language for processing data, known as Pig Latin. Programmers write Pig scripts in Pig Latin to analyze data, and these are internally converted into MapReduce jobs. Pig reduces the complexity of writing long programs: beyond the built-in operators, users can develop their own functions.
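User-defined functions (UDFs) mentioned above are typically written in Java, packaged in a jar, and then registered and invoked from a script. The jar name and `UpperCase` class below are hypothetical examples, not a real library.

```pig
-- Register a jar containing custom UDFs and give one a short alias.
-- 'myudfs.jar' and com.example.pig.UpperCase are hypothetical.
REGISTER 'myudfs.jar';
DEFINE TO_UPPER com.example.pig.UpperCase();

tweets  = LOAD 'tweets' AS (user:chararray, text:chararray);
shouted = FOREACH tweets GENERATE user, TO_UPPER(text);
STORE shouted INTO 'tweets_upper';
```

Once registered, a UDF is called like any built-in operator inside FOREACH, FILTER, and similar statements.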
Programming models and systems for Big Data analysis
Published in International Journal of Parallel, Emergent and Distributed Systems, 2019
Loris Belcastro, Fabrizio Marozzo, Domenico Talia
Apache Pig is a high-level Apache open source project for executing data flow applications on top of Hadoop. It was originally developed by Yahoo! to ease the development of Big Data analysis applications and was then moved into the Apache Software Foundation in 2007.