Use case: Using Pig, find the most frequently occurring starting letter among the words in a text file.
Solution:
Case 1: Load the data into a bag named "lines". Each entire line is stored in a single field named line, of type chararray (character array).
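A minimal sketch of the load step. The input path is a hypothetical placeholder, and TextLoader is an assumption chosen here because it reads each line as one chararray field:

```pig
-- '/user/sonoo/input.txt' is a hypothetical path; replace it with the real input file.
-- TextLoader places each whole line of text into a single chararray field.
lines = LOAD '/user/sonoo/input.txt' USING TextLoader() AS (line:chararray);
```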
Case 2: The text in the bag lines needs to be tokenized; this produces one word per row.
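One way to tokenize, assuming the bag is called lines and its field is called line as in Case 1. The alias words and the field name word are illustrative names:

```pig
-- TOKENIZE splits each line into a bag of words;
-- FLATTEN un-nests that bag so every word becomes its own row.
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
```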
Case 3: To keep only the first letter of each word, use the SUBSTRING function to take the first character.
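A sketch of this step, assuming a relation words with a chararray field word from the tokenizing step. The alias letters and the field name letter are illustrative:

```pig
-- SUBSTRING(string, startIndex, stopIndex): start is inclusive, stop is exclusive,
-- so (word, 0, 1) yields just the first character.
letters = FOREACH words GENERATE SUBSTRING(word, 0, 1) AS letter;
```

SUBSTRING preserves case, so "A" and "a" would be counted separately. If that is not wanted, wrapping the expression in LOWER(...) is one option.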
Case 4: Group the rows by character. This creates one group per unique character, and each grouped bag contains that same character once for every occurrence.
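The grouping might look like this, assuming a relation letters with a field letter from the previous step. The alias grouped is illustrative:

```pig
-- Each output row has two fields: 'group' (the character) and
-- 'letters' (a bag holding one tuple per occurrence of that character).
grouped = GROUP letters BY letter;
```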
Case 5: Count the number of occurrences in each group.
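A sketch of the count, assuming the relation grouped was produced by GROUP letters BY letter. The alias counts and the field name total are illustrative:

```pig
-- 'group' is the grouping key (the character);
-- COUNT(letters) counts the tuples in that character's bag.
counts = FOREACH grouped GENERATE group AS letter, COUNT(letters) AS total;
```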
Case 6: Sort the output by count in descending order.
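The sort could be written as follows, assuming a relation counts with a numeric field total from the counting step. The alias ordered is illustrative:

```pig
-- DESC puts the character with the highest count first.
ordered = ORDER counts BY total DESC;
```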
Case 7: Limit the output to one row to get the result.
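Assuming the sorted relation is called ordered, keeping only the top row is a one-liner. The alias result is illustrative:

```pig
-- Keep only the first row, i.e. the most frequent starting letter.
result = LIMIT ordered 1;
```

If two letters tie for the highest count, only one of them is kept.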
Case 8: Store the result in HDFS. The result is saved in the output directory under the sonoo folder.
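A sketch of the store step, assuming the final relation is called result. The exact HDFS path is not given in the text, so 'sonoo/output' below is an assumed relative path:

```pig
-- 'sonoo/output' is an assumed path; adjust it to the actual HDFS location.
-- The target directory must not already exist, otherwise the job fails.
STORE result INTO 'sonoo/output';
```

With the default storage function, the output is written as tab-delimited text in part files inside that directory.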