Tag Archives: command line

Automatically send unique errors (with count) from Apache error log as email

Some time back I wrote about a simple awk script that allowed me to find unique errors in Apache error log files.

After writing that script, I found myself running it every morning to check whether there were any errors on my sites. A couple of days later, I wrote another script to automatically parse the error log file and email me if there were any errors. As usual, I thought of writing about it here so that it would be useful for someone else as well 🙂
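
Here is a minimal sketch of the idea, combining the one-liner from the next post with the mail command. The log path and email address are placeholders, not the values from my actual script.

#!/bin/sh
# Sketch only - the log path and address below are assumptions.
LOG=/var/log/apache2/error_log
ERRORS=$(awk -F'[\[\]]' '$4 == "error" {print $7}' "$LOG" | sort | uniq -c | sort -rn)
if [ -n "$ERRORS" ]; then
    echo "$ERRORS" | mail -s "Apache errors on $(hostname)" you@example.com
fi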


How to print unique errors (with count) from Apache error logs

After I moved to WordPress Multisite recently, I had to closely watch the Apache error log file for a couple of days to make sure that there weren’t any configuration mismatches.

Initially I used the tail command to view the errors, but I quickly realized that when there is one frequent error, it becomes extremely difficult to spot the other, less frequent errors.

Command

After some fiddling with the awk and sort commands, I came up with the following one-liner, which prints the unique errors along with their counts.
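
awk -F'[\[\]]' '$4 == "error" {print $7}' error_log | sort | uniq -c | sort -rn

(Pieced together from the explanation below; error_log stands in for your actual log file name.)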

Explanation

The awk script is executed for each line in the error log file.

Field Separator

The -F option is used to specify the field separator. If you look at a single line in the Apache error log file, you will notice that each field is enclosed between [ and ].

awk allows us to specify multiple field separators by enclosing them between [ and ]. That’s what we are specifying with -F'[\[\]]'.

Filtering the lines that contain an error

After splitting the fields, we need to keep only the lines that contain the term error in the fourth field. That’s what happens in the next part of the command: $4 == "error"{print $7} prints the seventh field, which holds the actual error message, whenever the fourth field is equal to the string “error”.

Sorting the error messages

The next step is to sort all the error messages. This is done by piping the output to the sort command.

Finding unique error messages and counting them

The next step is to find the unique error messages and count them. This is done by piping the output to the uniq command; the -c flag prepends each unique message with its count.

Sorting the messages by frequency

The last step is to sort the messages by frequency and print them. This is done by the final sort command: the -n flag sorts numerically and the -r flag sorts in descending order.


How To Unfreeze Frozen SSH Tabs In Mac Terminal App

I use the Mac Terminal App a lot to ssh into remote servers. It all works great except for a single annoyance: if I leave an ssh tab idle for some time, or put my Mac to sleep, the tab freezes and becomes unresponsive. It takes a couple of minutes for it to log out of the server and become responsive again so that I can reuse the tab.

Whenever this happens, I have to either close the unresponsive tab and open a new one, or wait for 5-10 minutes. I did some research and found that ssh itself provides an escape sequence that can be used to forcefully log out of the remote server. All you need to do is type the following in the unresponsive tab.

~.

(Tilde followed by a dot. ssh only recognizes the escape sequence at the beginning of a line, so you may need to press Enter first.)

I tried it and it works like a charm. It immediately logs you out of the frozen ssh session, and the tab becomes usable right away, without the long wait.

Hope this small trick is useful for others as well 🙂


Sum up all values of a column in a text file

Recently, I had to sum up all the integers in one column of a text file (similar to what you would do in Excel). After some digging, I came up with an awk one-liner to do it.

Following my tradition of documenting one liners, I am going to document this one as well 🙂

Input and output data

Here is a super-simplified version of the input data that I was using.
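
(The exact data isn’t reproduced here; this illustrative stand-in has the same shape, with the integers in the 3rd column.)

foo bar 10 baz
one two 35 three
red blue 100 green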

I wanted to find the sum of all the values present in the 3rd column. So in this case, the output that I was expecting was 145.

Command

Here is the awk one-liner which does this.
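
awk '{ s += $3 } END { print s }' input.txt

(Pieced together from the explanation below; input.txt is a stand-in file name.)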

Explanation

The awk script is executed for each line; the first part of the command accumulates the values of the 3rd column in a variable named s.

When the end of the file is reached, the second part of the command (the END block) is executed, which simply prints the value of the variable.

Field separator

If the columns are separated by a comma or by any other non-whitespace character, then you just have to specify it by adding FS=',' (before the file name) to the above command.
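
For example, with comma-separated values (the file name is again a stand-in):

awk '{ s += $3 } END { print s }' FS=',' input.csv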

The deeper I dig into awk, the more I like it, and it is really fascinating to see how much you can do with this tool.

I learned a lot about awk and hopefully this teaches you something as well 😉


Count the number of empty lines in a file using grep

Recently I had a need to count the number of empty lines in a text file. After some digging into the man pages of grep, I came up with a one-liner that does it.

Following my tradition of documenting one liners, I am going to document this one as well 🙂

Assumption

By empty line, I mean any line which either has no characters or has only whitespace (space, tab) characters.

Command

For the impatient among you, here is the actual command.
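
grep -c -v -P '\S' file.txt

(Pieced together from the explanation below; file.txt is a stand-in file name, and the -P option requires GNU grep.)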

Explanation

  • -P '\S' – Selects all lines that have at least one non-whitespace character (-P enables Perl-compatible regular expressions)
  • -c – Prints the count of matching lines instead of the lines themselves
  • -v – Selects only the non-matching lines

So, we first match all lines that have a non-whitespace character, then use the -v option to invert the match, and finally use the -c option to print the count instead of the actual lines.

If we wanted the count of all non-empty lines instead, we would just have to remove the -v option from the above command.
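
That is, something like:

grep -c -P '\S' file.txt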

Hope this is helpful. Happy Grep’ing 😉


Passing command-line arguments to Pig scripts

In pretty much every Pig script that you write, you will have to specify at least two locations – the input and the output. If you use multiple inputs or have to register multiple jars for UDFs, this number is bound to increase.

I run most of my Pig scripts through a shell script, and I was looking for a way to pass in these locations at runtime instead of hard-coding them in the Pig script. After a bit of research, I found that Pig can accept command-line parameters and that there are in fact multiple ways to pass them. I thought of documenting them here so that I know where to look when I need to 🙂

Parameter Placeholder

First, we need to create a placeholder for the parameter that needs to be replaced inside the Pig script. Let’s say you have the following line in your Pig script, where you are loading an input file.

INPUT = LOAD '/data/input/20130326';

In the above statement, if you want to replace the date part dynamically, then you have to create a placeholder for it.

INPUT = LOAD '/data/input/$date';

Individual Parameters

To pass individual parameters to the Pig script we can use the -param option while invoking the Pig script. So the syntax would be

pig -param date=20130326 -f myfile.pig

If you want to pass two parameters then you can add one more -param option.

pig -param date=20130326 -param date2=20130426 -f myfile.pig

Param File

If there are a lot of parameters that need to be passed, or if we need a more flexible way to do it, then we can place all of them in a single file and pass the file name using the -param_file option.

The param file uses the simple ini file format, where every line contains a param name and its value. We can specify comments using the # character, as the example below shows.

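# dates to be processed by the script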
date=20130326
date2=20130426

We can pass the param file using the following syntax

pig -param_file myfile.ini -f myfile.pig

Default Statement

We can also assign a default value to a parameter inside the Pig script using the %default statement, like below.

%default date '20130326'

Processing Order

One good thing about parameter substitution in Pig is that you can pass values for the same parameter using multiple options simultaneously. Pig will resolve them in the following order.

  • The %default statement takes the lowest precedence.
  • The values passed using -param_file take the next precedence.
    • If multiple entries for the same param are present in a file, the one that comes later takes precedence.
    • If there are multiple param files, the files specified later take precedence.
  • The values passed using the -param option take the highest precedence.
    • If multiple values are specified for the same param, the ones specified later take precedence.

Debugging

Sometimes the precedence can be a little confusing, especially if you have multiple files and multiple params. Pig also provides a -debug option to debug this kind of scenario. If you invoke Pig with this option, it will generate a file with the extension .substituted in the current directory, with the placeholders replaced by the resolved values.

What do I use?

I follow this convention while passing params to Pig, and it has worked nicely for me so far.

I specify a default value using the %default statement and then pass the actual values using the -param_file option. If I am in a hurry and just want to test something locally, I use the -param option, but generally I try to put the values in a separate ini file so that I can check in the options as well.
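
Putting that convention together, here is a minimal sketch (all the file names are made up for illustration):

-- myfile.pig
%default date '20130326'
INPUT = LOAD '/data/input/$date';

# params.ini
date=20130426

# invocation
pig -param_file params.ini -f myfile.pig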


Remove duplicate lines based on a field

Recently, while formatting some data files for further processing, I had to remove duplicate lines from a file based on a particular field. After trying the cut and grep commands, I was finally able to solve it with a very concise awk command/script.

The command is so concise, yet packed with so much information, that it helped me learn more about the awk scripting language. I thought of writing about it here so that it is useful for others, and also so that I know where to search for it when I need it 🙂

Feel free to use it in whatever way you want, if it solves your problem as well.

Input and output data

Let me first explain the input data I had and the output that I was expecting.

Consider a file which has the following lines. Each line has four fields.
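
(The original file isn’t reproduced here; this illustrative stand-in has four fields per line, with duplicates in the second field.)

alpha 100 apple red
beta 200 banana yellow
gamma 100 cherry red
delta 300 date brown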

Now assume that we want to remove duplicate lines by comparing only the second field. We want the output to look like this.
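
With the stand-in data above, the third line is dropped because its second field (100) has already been seen:

alpha 100 apple red
beta 200 banana yellow
delta 300 date brown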

Command

Get ready for the surprise. The actual command is just this.
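
awk '!x[$2]++' input.txt

(Reconstructed from the explanation below; input.txt is a stand-in for your file name.)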

Explanation

awk script execution and printing

The awk script is evaluated for each line; if the result is true, the line is printed, and if the result is false, it is not.

Associative arrays

The awk language supports associative arrays, similar to the ones found in PHP. The expression x[$2]++ fills up an associative array. The key used here is $2, which refers to the second field, and x is the variable name (you can use any name for it).

The array is populated for every line. This is how the array would look after each line.
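
With the stand-in data from above, it would evolve like this:

after line 1: x[100] = 1
after line 2: x[100] = 1, x[200] = 1
after line 3: x[100] = 2, x[200] = 1
after line 4: x[100] = 2, x[200] = 1, x[300] = 1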

Conditional evaluation

The ! operator results in a boolean evaluation, which determines whether a particular line is passed on to the output (printed) or not.

Because ++ is a post-increment, x[$2]++ returns the count for the key before incrementing it. When the field is not yet present in the array, the lookup yields zero, which is false; the ! (not) operator turns that into true, and the line is passed on to the output (printed). When a duplicate is found, the array returns a non-zero count, which is true, but the ! converts it to false and that line is not passed on to the output.

The expanded version of the above command would be
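
awk '{ if (x[$2]++ == 0) { print $0 } }' input.txt

(The same logic spelled out; input.txt is again a stand-in file name.)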

But what is the fun in using the expanded version 😉

Field separator

In the input file that I had, the fields were separated by whitespace, so I didn’t have to specify a field separator. But if you are using a non-whitespace field separator, you can specify it by adding FS="," to the above command.
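
For instance, for a comma-separated file (the file name is a stand-in):

awk '!x[$2]++' FS=',' input.csv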

This one-liner actually taught me that awk supports a full programming language that can be used to create scripts, and it also increased my understanding of the way the awk command works. Hopefully this teaches you something as well 🙂

I know that this is already a concise version, but if you think that this can be improved, then do let me know.


Parse Apache error log and list down all missing images

Recently I had to parse an Apache error log and find out all the images that are missing.

After referring to a couple of man pages, I came up with this one-liner. I am sure I will need it again, so I thought of noting it down here so that I know where to look next time 😉

Feel free to use it in whatever way you want, if it solves your problem as well.

Assumption

This assumes that each line in your Apache error log looks like this.
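
A made-up but representative line:

[Wed Oct 16 12:23:34 2013] [error] [client 192.168.0.1] File does not exist: /var/www/site/images/logo.png

Putting the pieces explained below together, the whole pipeline would look roughly like this (the log and output file names are stand-ins):

sed -n '/File does not exist/p' error_log | awk '{print $NF}' | sed -rn '/(jpg|jpeg|png|gif)$/p' | sort | uniq > missing-images.txt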

Explanation

Filter all 404 lines from the log file

The first step is to filter all lines that contain the “File does not exist” text. This is done using sed (see the snippet after this list).

  • By default, sed prints out all lines. This is prevented by the -n option.
  • The second option is the regular expression followed by the p flag, which prints out every line that matches the text.
  • The third option is the name of the error log file.
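
On its own:

sed -n '/File does not exist/p' error_log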

Extract the last column of matching lines

The next step is to retrieve the file name from the matching lines. This is done using awk (see the snippet after this list).

  • By default, awk uses whitespace as the delimiter and splits each line into columns. If you look at each line, the file name we want is in the last column.
  • NF is a special variable which holds the number of fields, so $NF refers to the last column.
  • print $NF prints the last column.
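
On its own:

awk '{print $NF}'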

Filter only images

The next step is to keep only the images. This is done, again, using sed (see the snippet after this list).

  • I use -n again to prevent sed from printing all lines.
  • The -r option is added so that we can use extended regular expressions (BSD/macOS sed uses -E for this instead).
  • The regular expression (jpg|jpeg|png|gif)$ matches the image extensions, and the p at the end prints only the lines that match.
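
On its own:

sed -rn '/(jpg|jpeg|png|gif)$/p'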

Sort and find uniques

The sort and uniq commands then sort the list and remove the duplicate entries.

Write to a file

The final output is written to a file using the redirection operator >. If you want to append to an existing file, use the >> operator instead.

More to come

It is really amazing how you can combine these simple tools to do powerful things. I am planning to document other one-liners that I end up creating to solve my problems, so stay tuned 🙂

Also if you think this can be improved, then do let me know as well.


Handling FTP usernames with @ in them

During my recent FTP adventures, I also found that some shared hosts give you an FTP username with the ‘@’ symbol in it. That is fine as long as you use a GUI client to connect to FTP, but if you try the command line or the Finder on a Mac, you will run into issues, since the ‘@’ symbol is also used to separate the username from the host.

After some research, I found that the ‘@’ symbol in the username can be replaced with ‘+’ when specifying it on the command line. I tested it with both wput and the Finder on a Mac, and it worked perfectly in both.
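
For example, if the username were user@example.com (all the values here are made up), the FTP URL would become:

ftp://user+example.com:password@ftp.example.com/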

So remember: the next time you connect to an FTP server from the command line and the username has a ‘@’ symbol in it, replace it with the ‘+’ symbol. Happy FTP’ing 😉
