Including external Pig files into Pig Latin scripts

In one of my projects, we had a huge number of Pig scripts that dealt with data from a single source. The schema for this common data source is quite complex and changes every few months. Since the schema was copied into every Pig file, it was a real pain to update all the Pig scripts whenever it changed.

I was looking for a way to move the schema into its own Pig file and then include that file in all the other Pig scripts, much like importing a class in Java, instead of copy-pasting it into every Pig file.

After some quick web searches, I found that this feature is indeed available in Pig itself, in version 0.9 and later. It is part of Pig's macro support. All you need to do is add the following line to your Pig script at the point where you want the other file to be included.

import 'other-file.pig';

You can either give a relative path in the import line, or set a search path that tells Pig where to look for the imported scripts. If you want to set the search path, you can do something like this.

set pig.import.search.path '/usr/local/pig,/grid/pig';
import 'external-file.pig';
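
To make the shared-schema idea concrete, here is a minimal sketch of how the two files could look. Everything named in it is made up for illustration: the file name common-schema.pig, the macro name load_events, the field names and the /data/events path are assumptions, not the ones from my project. The idea is to wrap the LOAD statement and its schema in a macro, so the schema lives in exactly one file.

```pig
-- common-schema.pig (hypothetical file name)
-- The schema is declared once, inside a macro that loads the data.
-- Field names and types below are placeholders, not a real schema.
DEFINE load_events(path) RETURNS events {
    $events = LOAD '$path' USING PigStorage('\t')
              AS (user_id:chararray, event_time:long, action:chararray);
};
```

```pig
-- report.pig (hypothetical script that uses the shared schema)
IMPORT 'common-schema.pig';

-- '/data/events' is an assumed input path
events = load_events('/data/events');
clicks = FILTER events BY action == 'click';
DUMP clicks;
```

With this layout, a schema change should only need an edit to the AS clause in common-schema.pig, and every script that imports it picks up the new definition.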

Now my Pig scripts are organized properly. Hope this helps you as well 🙂