Category Archives: Python

Posts about Python

Automatically send unique errors (with count) from Apache error log as email

Some time back I wrote about a simple awk script that finds unique errors in Apache error log files.

After writing that script, I found myself running it every morning to check whether there were any errors on my sites. After a couple of days I wrote another script that automatically parses the error log file and emails me if there are any errors. As usual, I thought of writing about it here so that it would be useful for someone else as well 🙂
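The idea can be sketched in Python as follows. This is only an illustration and not the script described in the post: it assumes the classic Apache error log layout, where each line starts with bracketed fields such as the timestamp, level and client, and the function names and addresses are made up for the example.

```python
import re
from collections import Counter
from email.message import EmailMessage

# Leading "[...]" fields (timestamp, level, client) differ per request,
# so they are stripped before messages are compared. Assumed log layout.
PREFIX = re.compile(r"^(\[[^\]]*\]\s*)+")


def count_unique_errors(lines):
    """Return a Counter mapping each distinct error message to its count."""
    counts = Counter()
    for line in lines:
        message = PREFIX.sub("", line.strip())
        if message:
            counts[message] += 1
    return counts


def build_report(counts, sender, recipient):
    """Build (but do not send) an email listing the errors, most frequent first."""
    msg = EmailMessage()
    msg["Subject"] = "Apache errors: %d unique" % len(counts)
    msg["From"] = sender
    msg["To"] = recipient
    body = "\n".join("%5d  %s" % (n, text) for text, n in counts.most_common())
    msg.set_content(body)
    return msg
```

To actually send the report, the message could be handed to `smtplib.SMTP("localhost").send_message(msg)`, assuming a mail server is listening locally, and the whole thing run from cron every morning.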


Posted in Python, Unix/Server Stuff | 5 Comments

Read data from Google Sheet into a Python Pandas DataFrame

Recently I have done a lot of data analysis in Python (more details about this in another post) and have started to like Pandas a lot. The other day I had to process some data from a Google Sheet and wondered whether I could read it as a Pandas DataFrame. After a quick search I found the gspread package, and within a few lines of code I was able to read data from a Google Sheet into a Pandas DataFrame.
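A rough sketch of what those few lines can look like with current gspread releases. The credentials file, the sheet title and both function names are placeholders chosen for this example, and authentication with a service account is an assumption; the post does not say how it authenticated.

```python
def rows_to_records(rows):
    """Turn [[header...], [row...], ...] into a list of dicts keyed by header."""
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def read_sheet_as_dataframe(credentials_file, sheet_title):
    """Read the first worksheet of a Google Sheet into a Pandas DataFrame."""
    # Imported lazily so the helper above works without either package installed.
    import gspread
    import pandas as pd

    # Assumption: a service-account JSON key that has access to the sheet.
    client = gspread.service_account(filename=credentials_file)
    worksheet = client.open(sheet_title).sheet1
    rows = worksheet.get_all_values()  # list of lists, first row is the header
    return pd.DataFrame(rows_to_records(rows))
```

All cell values arrive as strings this way, so numeric columns would still need converting, for example with `pd.to_numeric`.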


Posted in Python | 4 Comments

Error while starting Jython in Mac

After I figured out how to use Python to create Pig UDFs, I got interested in Jython and wanted to play around with it. So I installed it on my Mac through Homebrew by executing the following command.

brew install jython

Everything installed properly, and I was able to run Jython after adding the following lines to my [bashrc](https://github.com/sudar/dotfiles).

export JYTHON_HOME=$(brew --prefix jython)/libexec
export PATH=$PATH:$JYTHON_HOME/bin

But there was one small annoyance. Every time I started Jython, I got the following strange warning.

expr: syntax error

After a couple of web searches, I stumbled upon an email thread in the Jython-users mailing list. Basically, it was due to a bug in the bash script that is used to start Jython. I opened the /usr/local/Cellar/jython/2.5.3/libexec/bin/jython file and changed the following line, which fixed the error.

if expr "$link" : '/' > /dev/null; then

to

if expr "$link" : '[/]' > /dev/null; then

I am exploring Jython more and will keep you guys updated if I find something interesting.

Posted in Python | 1 Comment

Parsing JSON in Pig UDF written in Python

When I wrote about using Python to write UDFs for Pig, I mentioned that Pig internally uses Jython to run the code, but that 99% of the time this shouldn’t be an issue. Well, I hit the other 1% recently 🙂

I had a small piece of Python code that used the built-in json module to parse JSON data. I converted it into a UDF, and when I tried to call it from Pig, I got a “module not found” exception. After some quick checks, I found that the latest stable version of Jython was 2.5.x, whereas the json module was only added in Python 2.6.

After some web searches, I came across jyson through a blog post about using JSON in Jython. jyson is a Java implementation of a JSON codec for Jython 2.5 which can be used as a drop-in replacement for Python’s built-in json module.

I downloaded the jyson jar and added it to Pig’s pig.additional.jars property (set with -Dpig.additional.jars). In the Python code, I changed the import statement to import com.xhaus.jyson.JysonCodec as json. After that everything started to work again 🙂
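One way to keep the same file working both under Pig and under plain Python is to fall back to jyson only when the built-in module is missing. This is a sketch rather than the code from the post; `get_field` is a made-up helper, and it relies on jyson offering the same `loads` call as the built-in module, as the drop-in description suggests.

```python
try:
    import json  # available in CPython 2.6+ and in newer Jython releases
except ImportError:
    # Jython 2.5: needs the jyson jar on Pig's classpath
    import com.xhaus.jyson.JysonCodec as json


def get_field(raw, field):
    """Parse a JSON object from a string and return one of its fields."""
    data = json.loads(raw)
    return data.get(field)
```

Since the rest of the code only ever refers to the name `json`, nothing else needs to change when switching between the two.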

Posted in Hadoop/Pig, Python | 1 Comment

Writing Pig UDF functions using Python

Recently I was working with Pig (the Apache one, not the animal 😉 ) and needed to implement some complex logic. Instead of struggling to write it in Pig, I decided to write a UDF (User Defined Function). Also, I was too lazy to copy and paste a lot of boilerplate code to write the UDF in Java, so I decided to write it in Python. Long-time readers might know that ever since I learned Python (around 7 years ago), I have been a huge fan.

In the end, I found that it was very easy to write UDFs in Python compared with writing them in Java. I thought of writing about it here so that it can act as a starting point for people who also want to write their own UDFs in Python.

Python vs Jython

Before we start, one thing to keep in mind is that even though we write our code in Python, Pig internally executes it using Jython. 99% of the time there will not be any difference, but it is good to be aware of it.

Python code

On the Python side, all we need to do to expose a Python function as a UDF is add a decorator to it.

Let’s say we have the following Python function that returns the length of the argument that is passed to it.
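A minimal sketch of such a function (the name `get_length` is just an illustrative choice):

```python
def get_length(data):
    """Return the length of whatever value is passed in."""
    return len(data)
```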

All we need to do to expose this function as a UDF is add the @outputSchema decorator, which tells Pig the name and type of the value the function returns. So the code becomes
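A sketch of the decorated version. Pig makes `outputSchema` available on its own when the file is registered through Jython, so the fallback at the top is an addition made here only so that the file also loads under plain Python for local testing. The function name and the schema string `length:int` are illustrative.

```python
try:
    outputSchema  # supplied by Pig when the script is registered with jython
except NameError:
    # Stand-in for local testing outside Pig; it just records the schema string.
    def outputSchema(schema):
        def decorate(func):
            func.outputSchema = schema
            return func
        return decorate


@outputSchema("length:int")
def get_length(data):
    """Return the length of the value passed in from Pig."""
    return len(data)
```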

When data is passed from Pig to Python, it is passed as a bytearray. Most of the time this isn’t a problem, but when it is, we can convert the value into a proper string before we use it. So the final code would look like this
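A sketch of that final version, under the assumption that calling `str()` on the incoming value is enough to get its text. What object actually arrives depends on the Pig and Jython versions in use, so this is worth checking against real data. As before, the `outputSchema` fallback is only there so the file loads outside Pig, and the names are illustrative.

```python
try:
    outputSchema  # supplied by Pig when the script is registered with jython
except NameError:
    # Stand-in for local testing outside Pig; it just records the schema string.
    def outputSchema(schema):
        def decorate(func):
            func.outputSchema = schema
            return func
        return decorate


@outputSchema("length:int")
def get_length(data):
    """Convert the incoming value to a string, then return its length."""
    text = str(data)  # assumption: str() yields the text of the Pig value
    return len(text)
```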

Pig code

On the Pig side, we need to do two things.

  • Register the UDF
  • Call the UDF 😉

Register the UDF

As I said in the beginning, Pig internally uses Jython to run the Python code. So we first need to register our Python file using the REGISTER statement, for example REGISTER 'udf.py' USING jython as pyudf, where pyudf is the alias through which the functions in the file are called.

Call UDF

Once we register the UDF using the REGISTER statement, we can then call the UDF function using the alias that we created.

Here is the complete code on the Pig side.

And believe me, that’s all you need to do to write Pig UDF functions using Python. No more unneeded Java classes, boilerplate code or Jar creation process 🙂

Posted in Hadoop/Pig, Python | 8 Comments

Using ez430 to control PPTs in Mac

After seeing me use my ez430 programmable watch to control PPTs on my Mac, a couple of people asked me to explain how I do it. As usual, instead of sending separate emails, I thought of documenting it here so that it would be useful for others.

The following are the steps you need to follow.

  • Plug the ez430 USB dongle into a USB port of your Mac
  • Pair the USB dongle and your ez430 watch
  • Install Python and the Serial library
  • Download or clone the ez430 tools repo from GitHub
  • Open up your PPT presentation
  • Open the terminal, navigate to the directory where you have the ez430 tools package files, and then type the command python ppt-mac.py
  • And you are done 🙂

Let me know if you face any issues.

Posted in Gadgets, Python | 6 Comments

One month with Python

It’s been a month since I started playing with Python. I was able to tame it, and now it has become a good pet. 😉

Knowing my crush on IDEs and my interest in Python, many guys have asked me what my favorite IDE for Python is.

Well, initially I was using the default IDLE that comes with the Python distribution, and after getting the hang of the syntax I started using PyDEV, a plugin for Eclipse. PyDEV lets you use all the features of Eclipse for developing both Python and Jython programs.

If you are looking for a decent, free IDE for Python then PyDEV will surely suit your needs.

I have solved some exercise problems from the books I used to learn Python and will try to post them here after refining them a bit.

Anyone else playing with Python?

Posted in Python | 5 Comments

My First two weeks with Python

After getting a lot of recommendations I have started with Python, and here are my thoughts on it after fiddling with it for the past two weeks.

Within a couple of hours of starting Python, it became evident to me that I had made the right decision. It’s really powerful, and you write minimal but robust code compared with other languages. Python strikes the right balance between ease of syntax and power. Like Java, Python is designed in a way that discourages bad coding, but without making the syntax bulky.

The following are some of the features which I like in Python, presented in no particular order

  • No statement delimiter – not even braces to mark a block of code, just plain indentation.
  • Very very powerful built-in library – It has almost all the features that you want.
  • Built-in support for complex data types like list and dictionary.
  • Shorthand multiple assignment statements.
  • Powerful introspection
  • List comprehensions

and the list goes on…
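A few of the features above in one small snippet. The names are made up for the illustration.

```python
# Indentation alone marks the block: no braces and no semicolons.
def squares_of_evens(numbers):
    # List comprehension: build a new list in a single expression.
    return [n * n for n in numbers if n % 2 == 0]


# Shorthand multiple assignment, which also gives a one-line swap.
first, second = 1, 2
first, second = second, first

# Built-in list and dictionary types, no imports needed.
languages = ["PHP", "JavaScript", "Python"]
year_learned = {"PHP": 2004, "JavaScript": 2005}

# Introspection: ask the list type which public methods it offers.
public_list_methods = [name for name in dir(list) if not name.startswith("_")]
```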

I have already finished reading Swaroop’s Byte of Python and am currently halfway through Mark’s Dive into Python.

If you are a beginner to Python, then Byte of Python is an excellent book, and once you get the hang of the syntax you can read Dive into Python, which teaches you the advanced concepts.

Posted in Python | 3 Comments

What should I learn next, Python or Ruby?

It’s been quite some time since I learned a new skill and this feeling has started to haunt my mind. I have had this feeling quite a few times before and every time it resulted in me adding a new skill to my resume. In 2004 it was PHP, and in 2005 it made me (re)learn JavaScript for Ajax. In 2006 almost 7 months have already gone 🙁 and so I have decided to get my hands dirty with something new.

The two new things at the top of my list are Python and Ruby, and I have to decide on one of them. So guys, I am looking for your suggestions since I am new to both. If you have used either of them, do share your experience with me.

Right now I am slightly biased towards Python, after learning that even Jeremy is learning it. Moreover, both Dive into Python by Mark and the Yahoo Python Developer Center could assist me. 😉

So will the snake swallow the gem?

Posted in Python | 14 Comments