Hadoop Framework: Big Data Management Platform for Internet of Things
Published in Lavanya Sharma, Pradeep K Garg, From Visual Surveillance to Internet of Things, 2019
Pallavi H. Bhimte, Pallavi Goel, Dileep Kumar Yadav, Dharmendra Kumar
Pig is a procedural data-flow platform with a high-level scripting language, Pig Latin, mainly used for programming with Apache Hadoop. It can be used by programmers who are not familiar with Java but know SQL-like scripting. Pig provides a user-defined function (UDF) facility that can invoke code written in several languages, such as JRuby, Jython, and Java. Pig runs on the client side of a cluster and supports the Avro file format. It handles large amounts of data easily and is well suited to ETL data pipelines and to research on raw data, which can be processed repeatedly. Data in Pig Latin can be loaded, stored, streamed, filtered, grouped, joined, combined, split, and sorted. A Pig program can be run in three different ways:
Script: a file containing Pig Latin commands.
Grunt: an interactive command interpreter that executes commands one at a time.
Embedded: a Pig program executed as part of a Java program.
Pig is used at Dataium to sort and prepare data, at LinkedIn for the "People You May Know" feature, and at PayPal to analyze transactional data and help prevent fraud.
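As a minimal sketch of these operations, the following Pig Latin script (runnable in Script mode, e.g. pig myscript.pig) loads, filters, groups, and sorts a data set; the file name, schema, and field names are illustrative assumptions, not taken from the excerpt above.

    -- Load a hypothetical tab-separated log file (path and schema are assumed)
    logs = LOAD 'access_log.tsv' USING PigStorage('\t')
           AS (user:chararray, url:chararray, bytes:long);
    -- Keep only large responses
    big = FILTER logs BY bytes > 1024;
    -- Group by user and count records per user
    grouped = GROUP big BY user;
    counts = FOREACH grouped GENERATE group AS user, COUNT(big) AS n;
    -- Sort descending and store the result
    sorted = ORDER counts BY n DESC;
    STORE sorted INTO 'output/user_counts';

The same statements could also be typed one at a time into the Grunt shell, illustrating the Script and Grunt run modes described above.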
One Platform Rules All
Published in Kuan-Ching Li, Hai Jiang, Albert Y. Zomaya, Big Data Management and Processing, 2017
Pig implements the procedural scripting language Pig Latin. Programs written in Pig Latin are translated into MapReduce jobs and run on the Hadoop platform. Some data-processing work, such as the join operation, is not straightforward to write as MapReduce jobs; Pig Latin provides primitives such as join to facilitate writing complex data-processing programs. Like Hive, Pig is used for offline data analysis. The difference between the two is that Hive uses the declarative language HQL, whereas Pig uses the procedural language Pig Latin.
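To illustrate why such primitives help, here is a hedged sketch of a two-way join in Pig Latin; the relation names, file paths, and schemas are assumptions made for illustration. The single JOIN statement replaces the tagged-record shuffle logic a programmer would otherwise hand-write as MapReduce code.

    -- Hypothetical inputs: a user table and a page-view table
    users = LOAD 'users.csv' USING PigStorage(',') AS (uid:int, name:chararray);
    views = LOAD 'views.csv' USING PigStorage(',') AS (uid:int, url:chararray);
    -- One-line equijoin; Pig compiles this into the underlying MapReduce job(s)
    joined = JOIN users BY uid, views BY uid;
    STORE joined INTO 'output/user_views';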
Big Data Development Platform for Engineering Applications
Published in Satya Bir Singh, Alexander V. Vakhrushev, A. K. Haghi, Nanomechanics and Micromechanics, 2020
Within Hadoop, MapReduce is the preferred model for large-scale big data processing in the cloud environment. It allows large data sets to be processed and stored in parallel across a cluster. Tools layered on top of MapReduce, namely Hive and Pig, make it more feasible to process large data sets easily. Hashem et al. [34] note that cluster computing provides good support for managing data growth within the context of big data.
Real-time Twitter data analysis using Hadoop ecosystem
Published in Cogent Engineering, 2018
Anisha P. Rodrigues, Niranjan N. Chiplunkar
The two key data access components of the Hadoop ecosystem are Apache Pig and Apache Hive. Hadoop's basic programming layer is MapReduce, but these components ease the writing of complex Java MapReduce programs. Apache Pig is an abstraction over MapReduce: it provides a high-level procedural data-flow language, Pig Latin, for processing data. Programmers write Pig scripts in Pig Latin to analyze data, and these scripts are internally converted into MapReduce jobs. Pig reduces the complexity of writing long programs; besides its built-in operators, users can develop their own functions.
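As a sketch of how a user-defined function plugs into a Pig script, the snippet below registers a Java UDF and invokes it like a built-in operator. The jar name, class name, and tweet schema are hypothetical, chosen only to match the Twitter-analysis setting of this article.

    -- Register a hypothetical jar containing a Java UDF
    REGISTER 'myudfs.jar';
    DEFINE CleanText com.example.pig.CleanText();
    tweets = LOAD 'tweets.tsv' USING PigStorage('\t')
             AS (id:long, user:chararray, text:chararray);
    -- Apply the UDF like any built-in operator
    cleaned = FOREACH tweets GENERATE id, user, CleanText(text) AS text;
    STORE cleaned INTO 'output/clean_tweets';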
Programming models and systems for Big Data analysis
Published in International Journal of Parallel, Emergent and Distributed Systems, 2019
Loris Belcastro, Fabrizio Marozzo, Domenico Talia
Apache Pig is a high-level Apache open source project for executing data-flow applications on top of Hadoop. It was originally developed by Yahoo! to ease the development of Big Data analysis applications and moved into the Apache Software Foundation in 2007.