Parsing JSON in Pig UDF written in Python

When I wrote about using Python to write UDF functions for Pig, I mentioned that Pig would internally be using Jython to parse the code, but 99% of time this shouldn’t be an issue. But I hit the other 1% recently 🙂

I had a small piece of Python code that used the built-in json module to parse JSON data. I converted that into a UDF function and when I tried to call it from Pig, I was getting “module not found” exception. After some quick checks, I found that the latest stable version of Jython is 2.5.x and json module was added from 2.6

After some web searches, I came across jyson through a blog post about using JSON in Jython. jyson is an Java implementation of JSON codec for Jython 2.5 which can be used as a drop-in replacement for Python’s built-in json module.

I downloaded jyson jar and then added it to Pig’s Dpig.additional.jars property. In the Python code, I changed the import statement to import com.xhaus.jyson.JysonCodec as json. After that everything started to work again 🙂

Related posts

Tags: , , , ,

1 Comments so far

Follow up comments through RSS Feed | Post a comment

Leave a Reply

Your email address will not be published. Required fields are marked *