Friday, August 16, 2019

Pig UDF (User Defined Functions)

Pig supports user-defined functions (UDFs) for custom processing, so we can write our own functions when the built-in ones are not enough. Currently, Pig UDFs can be implemented in the following programming languages:
  • Java
  • Python
  • Jython
  • JavaScript
  • Ruby
  • Groovy
Of these, Java has the most extensive support. Python, Jython, JavaScript, Ruby, and Groovy have only limited support.

Example of Pig UDF

In Pig, for an eval function written in Java:
  • The class must extend "org.apache.pig.EvalFunc".
  • The class must implement the "exec" method, which Pig calls once for each input tuple.
Let's see an example of a simple eval function that converts the provided string to uppercase.
TestUpper.java
package com.hadoop;

import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Eval UDF that returns its first argument converted to uppercase.
public class TestUpper extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        // Return null for a missing row, an empty row, or a null field.
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        try {
            String str = (String) input.get(0);
            return str.toUpperCase();
        } catch (Exception e) {
            throw new IOException("Caught exception processing input row ", e);
        }
    }
}
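The guard logic in exec can be tried out without a Pig installation. The following is only a sketch in plain Java: the class name UpperSketch, the method upper, and the use of List<Object> in place of Pig's Tuple are all assumptions made for illustration and are not part of Pig's API. It also treats a null first field as a null result, which is an added safeguard.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Plain-Java stand-in for the UDF's exec logic. List<Object> is a
// hypothetical substitute for org.apache.pig.data.Tuple.
final class UpperSketch {
    private UpperSketch() {}

    // Returns null for a missing row, an empty row, or a null first field;
    // otherwise returns the first field converted to uppercase.
    static String upper(List<Object> row) {
        if (row == null || row.isEmpty() || row.get(0) == null) {
            return null;
        }
        return ((String) row.get(0)).toUpperCase();
    }

    public static void main(String[] args) {
        System.out.println(upper(Arrays.<Object>asList("pig", 1)));  // prints PIG
        System.out.println(upper(Collections.<Object>emptyList()));  // prints null
    }
}
```

Returning null for bad input, rather than throwing, lets the job continue past incomplete rows. Whether that is the right choice depends on the data.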
  • Create the JAR file and export it to a directory of your choice. To do so, right-click the project and choose Export > Java > JAR file > Next.
  • Now, give the JAR file a name and save it in a directory on the local system.
  • Create a text file on your local machine and insert the list of tuples.
$ nano pigsample
  • Upload the text file to the desired directory on HDFS.
$ hdfs dfs -put pigsample /pigexample
  • Create a Pig script file on your local machine and write the script.
$ nano pscript.pig
  • Now, run the script in the terminal to get the output.
$ pig pscript.pig
Here, we got the desired output.
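The contents of pscript.pig are not reproduced in the text. As a rough sketch only, a script that uses the UDF might look like the following. The JAR name udf.jar, the HDFS path /pigexample/pigsample, and the single chararray field called name are all assumptions, so adjust them to match your own JAR, input file, and data.

```pig
-- Hypothetical JAR name: use the path of the JAR exported earlier.
REGISTER udf.jar;

-- Assumed input location and schema: one string per line.
A = LOAD '/pigexample/pigsample' AS (name:chararray);

-- Call the UDF by its fully qualified class name.
B = FOREACH A GENERATE com.hadoop.TestUpper(name);

DUMP B;
```

REGISTER makes the classes in the JAR visible to the script, and the UDF is then invoked like any built-in function inside FOREACH ... GENERATE.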
