Hadoop Framework: Big Data Management Platform for Internet of Things
Published in Lavanya Sharma, Pradeep K Garg, From Visual Surveillance to Internet of Things, 2019
Pallavi H. Bhimte, Pallavi Goel, Dileep Kumar Yadav, Dharmendra Kumar
Pig is a procedural data-flow platform with a high-level scripting language, Pig Latin, mainly used for programming with Apache Hadoop. It can be used by programmers who are not familiar with Java but know SQL-like scripting. Pig provides a user-defined function (UDF) facility that can invoke code written in several languages, such as JRuby, Jython, and Java. Pig runs on the client side of a cluster and supports the Avro file format. It handles large amounts of data easily and is well suited to ETL data pipelines and to research on raw data, which can be processed repeatedly. Data in Pig Latin can be loaded, stored, streamed, filtered, grouped, joined, combined, split, and sorted. A Pig program can be run in three different ways:
Script: a file containing Pig Latin commands.
Grunt: an interactive command interpreter that executes commands one at a time.
Embedded: a Pig program executed as part of a Java program.
Pig is used at Dataium to sort and prepare data, at LinkedIn for the "People You May Know" feature, and at PayPal to analyze transactional data and help prevent fraud.
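As a minimal sketch of these operations, the following Pig Latin script (runnable in Script mode, e.g. pig myscript.pig) loads, filters, groups, and sorts a data set; the file name, schema, and field names are illustrative assumptions, not taken from the excerpt above.

    -- Load a hypothetical tab-separated log file (path and schema are assumed)
    logs = LOAD 'access_log.tsv' USING PigStorage('\t')
           AS (user:chararray, url:chararray, bytes:long);
    -- Keep only large responses
    big = FILTER logs BY bytes > 1024;
    -- Group by user and count records per user
    grouped = GROUP big BY user;
    counts = FOREACH grouped GENERATE group AS user, COUNT(big) AS n;
    -- Sort descending and store the result
    sorted = ORDER counts BY n DESC;
    STORE sorted INTO 'output/user_counts';

The same statements could also be typed one at a time into the Grunt shell, illustrating the Script and Grunt run modes described above.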
One Platform Rules All
Published in Kuan-Ching Li, Hai Jiang, Albert Y. Zomaya, Big Data Management and Processing, 2017
Pig implements the procedural scripting language Pig Latin. Programs written in Pig Latin are translated into MapReduce jobs and run on the Hadoop platform. Some data-processing work, such as the join operation, is not straightforward to write as MapReduce jobs; Pig Latin provides primitives such as join to facilitate writing complex data-processing programs. Like Hive, Pig is used for offline data analysis. The difference between the two is that Hive uses the declarative language HQL, whereas Pig uses the procedural language Pig Latin.
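To illustrate why such primitives help, here is a hedged sketch of a two-way join in Pig Latin; the relation names, file paths, and schemas are assumptions made for illustration. The single JOIN statement replaces the tagged-record shuffle logic a programmer would otherwise hand-write as MapReduce code.

    -- Hypothetical inputs: a user table and a page-view table
    users = LOAD 'users.csv' USING PigStorage(',') AS (uid:int, name:chararray);
    views = LOAD 'views.csv' USING PigStorage(',') AS (uid:int, url:chararray);
    -- One-line equijoin; Pig compiles this into the underlying MapReduce job(s)
    joined = JOIN users BY uid, views BY uid;
    STORE joined INTO 'output/user_views';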
Big Data Development Platform for Engineering Applications
Published in Satya Bir Singh, Alexander V. Vakhrushev, A. K. Haghi, Nanomechanics and Micromechanics, 2020
Within Hadoop, MapReduce is the preferred model for large-scale big data processing in the cloud environment. It allows large data sets to be processed and stored in parallel across a cluster. Tools layered on top of MapReduce, namely Hive and Pig, make it more feasible to process large data sets easily. Hashem et al. [34] note that cluster computing provides good support for managing data growth within the context of big data.
Real-time Twitter data analysis using Hadoop ecosystem
Published in Cogent Engineering, 2018
Anisha P. Rodrigues, Niranjan N. Chiplunkar
The two key data access components of the Hadoop ecosystem are Apache Pig and Apache Hive. Hadoop's basic programming layer is MapReduce, but these components ease the writing of complex Java MapReduce programs. Apache Pig is an abstraction over MapReduce: it provides a high-level procedural data-flow language, Pig Latin, for processing data. Programmers write Pig scripts in Pig Latin to analyze data, and these scripts are internally converted into MapReduce jobs. Pig reduces the complexity of writing long programs; besides its built-in operators, users can develop their own functions.
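As a sketch of how a user-defined function plugs into a Pig script, the snippet below registers a Java UDF and invokes it like a built-in operator. The jar name, class name, and tweet schema are hypothetical, chosen only to match the Twitter-analysis setting of this article.

    -- Register a hypothetical jar containing a Java UDF
    REGISTER 'myudfs.jar';
    DEFINE CleanText com.example.pig.CleanText();
    tweets = LOAD 'tweets.tsv' USING PigStorage('\t')
             AS (id:long, user:chararray, text:chararray);
    -- Apply the UDF like any built-in operator
    cleaned = FOREACH tweets GENERATE id, user, CleanText(text) AS text;
    STORE cleaned INTO 'output/clean_tweets';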
Programming models and systems for Big Data analysis
Published in International Journal of Parallel, Emergent and Distributed Systems, 2019
Loris Belcastro, Fabrizio Marozzo, Domenico Talia
Apache Pig is a high-level Apache open source project for executing data-flow applications on top of Hadoop. It was originally developed by Yahoo! to ease the development of Big Data analysis applications and moved into the Apache Software Foundation in 2007.