Category Archives: Unix/Server Stuff

All Unix and server related stuff.

Count the number of empty lines in a file using grep

Recently I needed to count the number of empty lines in a text file. After some digging through the grep man page, I came up with a one-liner that does it.

Following my tradition of documenting one-liners, I am going to document this one as well :)

Assumption

By empty line, I mean any line which either has no characters or has only whitespace (space, tab) characters.

Command

For the impatient in you, here is the actual command.
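Putting together the three options explained below, the command looks like this sketch. sample.txt is a hypothetical test file with two lines of text, one empty line, one line of spaces and one line holding a single tab. -P needs a grep built with PCRE support, such as GNU grep on most Linux distributions.

```shell
# Hypothetical sample file: 2 lines with text, 3 empty/whitespace-only lines.
printf 'first line\n\n   \n\t\nlast line\n' > sample.txt

# -P  treat the pattern as a Perl-compatible regex; \S = any non-whitespace char
# -v  keep only the lines that do NOT match, i.e. the empty ones
# -c  print how many lines were selected instead of the lines themselves
grep -cvP '\S' sample.txt    # prints 3
```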

Explanation

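  • -P '\S' – Interpret the pattern as a Perl-compatible regular expression. \S matches any non-whitespace character, so this selects all lines that contain at least one non-whitespace character
  • -c – Print the count of matching lines instead of the lines themselves
  • -v – Invert the match, selecting only the non-matching lines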

So we first match all lines that contain a non-whitespace character, then use the -v option to select the remaining (empty) lines instead, and finally use the -c option to print their count instead of the lines themselves.

If we wanted the count of all non-empty lines, then we just have to remove the -v option from the above command.
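A sketch of the non-empty count, again on a hypothetical sample file. It also shows a variant that avoids -P for greps built without PCRE support: the POSIX character class [^[:space:]] plays the role of \S.

```shell
# Hypothetical sample file: 2 lines with text, 3 empty/whitespace-only lines.
printf 'first line\n\n   \n\t\nlast line\n' > sample.txt

# Without -v: count the lines that contain at least one non-whitespace char.
grep -cP '\S' sample.txt              # prints 2

# Same counts without -P, using a POSIX bracket expression instead of \S.
grep -c '[^[:space:]]' sample.txt     # prints 2 (non-empty lines)
grep -cv '[^[:space:]]' sample.txt    # prints 3 (empty lines)
```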

Hope this is helpful. Happy Grep’ing ;)

Posted in Unix/Server Stuff | 6 Comments

Remove duplicate lines based on a field

Recently while working on formatting some data files for further processing, I had to remove duplicate lines from the file based on a particular field. After trying out cut and grep commands, I was finally able to solve it with a very concise awk command/script.

The command was concise but packed with information, and it helped me learn more about the awk scripting language. I am writing about it here so that it is useful for others and so that I know where to look for it when I need it again :)

Feel free to use it in whatever way you want, if it solves your problem as well.

Input and output data

Let me first explain the input data I had and the output that I was expecting.

Consider a file which has the following lines. Each line has four fields.

Now assume that we want to remove duplicate lines by comparing only the second field. We want the output to look like this.

Command

Get ready for the surprise. The actual command is just this.
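Based on the explanation that follows (an array named x keyed on $2, negated with !), the command is the awk program !x[$2]++. The sketch below runs it on hypothetical sample data with four whitespace-separated fields per line; data.txt is a placeholder file name.

```shell
# Hypothetical sample data: four whitespace-separated fields per line.
cat > data.txt <<'EOF'
1 apple red fruit
2 banana yellow fruit
3 apple green fruit
4 carrot orange vegetable
5 banana green fruit
EOF

# Print a line only the first time its second field is seen.
awk '!x[$2]++' data.txt
# prints:
# 1 apple red fruit
# 2 banana yellow fruit
# 4 carrot orange vegetable
```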

Explanation

awk script execution and printing

An awk program is made of pattern-action pairs. When only a pattern is given, as in this command, awk evaluates it for each line: if the result is true (non-zero), the line is printed; if it is false, the line is not printed.

Associative arrays

The awk language supports associative arrays, similar to the ones found in PHP. The expression x[$2]++ fills up an associative array. The key used here is $2, which refers to the second field, and x is the variable name. You can use any name for it.

The array is populated for every line: each distinct value of the second field becomes a key, and its value counts how many times that value has been seen so far.
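To watch the array fill up, this sketch prints the entry that each line updates. The three input lines are hypothetical.

```shell
printf '%s\n' '1 apple red fruit' '2 banana yellow fruit' '3 apple green fruit' |
  awk '{ x[$2]++; printf "after line %d: x[\"%s\"] = %d\n", NR, $2, x[$2] }'
# prints:
# after line 1: x["apple"] = 1
# after line 2: x["banana"] = 1
# after line 3: x["apple"] = 2
```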

Conditional evaluation

The ! operator results in a boolean evaluation which determines whether a particular line should be passed on to the output (printed) or not.

The expression x[$2]++ is a post-increment: it returns the value stored in the array before adding 1 to it. When the field value has not been seen yet, it is not present in the array, so the expression yields zero, which is false. The ! (not) operator turns that into 1 (true), so the line is passed on to the output (printed). When a duplicate is found, the array returns a non-zero count, which is true, but ! converts it to false and the line is not passed on to the output.
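The post-increment behaviour can be checked in isolation with a BEGIN block, which runs without any input. The key "k" is arbitrary.

```shell
awk 'BEGIN {
    first  = x["k"]++              # key not seen yet: yields 0
    second = x["k"]++              # key seen once: yields 1
    print first, second, x["k"]    # prints: 0 1 2
    print !first, !second          # prints: 1 0 (keep the line, then drop it)
}'
```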

The command can also be written in an expanded form, with an explicit if and print.
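One possible expanded form, as a sketch. It uses awk's ($2 in x) membership test instead of relying on the post-increment value, but it prints the same lines; data.txt holds the same kind of hypothetical sample data.

```shell
# Hypothetical sample data: four whitespace-separated fields per line.
cat > data.txt <<'EOF'
1 apple red fruit
2 banana yellow fruit
3 apple green fruit
4 carrot orange vegetable
5 banana green fruit
EOF

awk '{
    if (!($2 in x)) {   # first time this second-field value appears
        print           # with no argument, print the whole line
    }
    x[$2]++             # record (and count) the value
}' data.txt
```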

But what is the fun in using the expanded version ;)

Field separator

In the input file that I had, the fields were separated by whitespace, so I didn’t have to specify a field separator. But if your fields are separated by something else, such as commas, you can specify the separator with the -F option (for example -F',') or by adding FS="," as an argument before the file name.
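A sketch of both ways of setting the separator on hypothetical comma-separated data. An FS=',' operand placed before the file name is applied before that file is read.

```shell
# Hypothetical comma-separated sample data.
printf '%s\n' '1,apple,red,fruit' '2,banana,yellow,fruit' '3,apple,green,fruit' > data.csv

# Set the field separator with -F ...
awk -F',' '!x[$2]++' data.csv

# ... or with an FS assignment placed before the file name.
awk '!x[$2]++' FS=',' data.csv
# each command prints:
# 1,apple,red,fruit
# 2,banana,yellow,fruit
```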

This one-liner taught me that awk supports a full programming language that can be used to create scripts, and it also improved my understanding of how the awk command works. Hopefully it teaches you something as well :)

I know that this is already a concise version, but if you think that this can be improved, then do let me know.

Posted in Unix/Server Stuff | 3 Comments

Parse Apache error log and list down all missing images

Recently I had to parse the Apache error log and find all the images that are missing.

After referring to a couple of man pages, I came up with this one-liner. I am sure I will need it again, so I thought of noting it here so that I know where to look ;)

Feel free to use it in whatever way you want, if it solves your problem as well.
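Putting together the steps explained below, the one-liner can be reconstructed roughly as in this sketch. error.log and missing-images.txt are placeholder names. The sample log lines are hypothetical and follow the Apache 2.2 style, where a missing file is reported as "File does not exist: /path". Other Apache versions format this message differently, so check against your own log.

```shell
# Hypothetical sample error log (Apache 2.2 style).
cat > error.log <<'EOF'
[Mon Nov 18 10:15:01 2013] [error] [client 192.0.2.10] File does not exist: /var/www/images/logo.png
[Mon Nov 18 10:15:07 2013] [error] [client 192.0.2.11] File does not exist: /var/www/favicon.ico
[Mon Nov 18 10:16:42 2013] [error] [client 192.0.2.12] File does not exist: /var/www/images/logo.png
[Mon Nov 18 10:17:03 2013] [error] [client 192.0.2.13] File does not exist: /var/www/photos/cat.jpg
[Mon Nov 18 10:18:30 2013] [notice] caught SIGTERM, shutting down
EOF

sed -n '/File does not exist/p' error.log |   # keep only the 404 lines
  awk '{print $NF}' |                           # keep the last column (the path)
  sed -nr '/(jpg|jpeg|png|gif)$/p' |            # keep only image paths
  sort | uniq > missing-images.txt              # de-duplicate and save

cat missing-images.txt
# prints:
# /var/www/images/logo.png
# /var/www/photos/cat.jpg
```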

Assumption

This assumes that each relevant line in your Apache error log ends with the path of the missing file, as in Apache’s “File does not exist: /path/to/file” messages.

Explanation

Filter all 404 lines from the log file

The first step is to filter all lines that contain the “File does not exist” text. This is done using sed.

  • By default, sed prints out all lines. This is prevented by the -n option.
  • The second argument is the sed script: the regular expression /File does not exist/ selects the matching lines and the p command prints them.
  • The third option is the name of the error log file.
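The filtering step on its own, run against two hypothetical log lines. With -n, sed -n '/pattern/p' behaves like grep 'pattern', so either tool works here.

```shell
printf '%s\n' \
  '[error] [client 192.0.2.10] File does not exist: /var/www/images/logo.png' \
  '[notice] caught SIGTERM, shutting down' |
  sed -n '/File does not exist/p'
# prints only the first line
```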

Extract the last column of matching lines

The next step is to retrieve the file name from the matching lines. This is done by using awk.

  • By default awk splits each line into columns (fields) on whitespace. If you look at each line, we want the last column.
  • NF is a special variable that holds the number of fields in the current line, so $NF refers to the last column.
  • print $NF prints the last column
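A quick way to see the difference between NF and $NF, using a hypothetical log line that splits into 8 whitespace-separated fields:

```shell
printf '%s\n' '[error] [client 192.0.2.10] File does not exist: /var/www/images/logo.png' |
  awk '{ print NF; print $NF }'
# prints:
# 8
# /var/www/images/logo.png
```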

Filter only images

The next step is to filter out only the images. This is done again by using sed.

  • I use -n again to prevent sed from printing all lines.
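  • The -r option enables extended regular expressions, so that the alternation (|) and the grouping parentheses work without escaping (-r is GNU sed; BSD sed uses -E)
  • The regular expression (jpg|jpeg|png|gif)$ matches lines ending in one of these image extensions, and the p at the end prints only the lines that match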
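A sketch of the image filter on hypothetical paths. The pattern as written is case-sensitive and does not check for the dot, so c.JPG is skipped. The second command is a stricter variant that requires the dot and uses GNU sed's I modifier to ignore case; the I modifier is GNU-specific.

```shell
printf '%s\n' /var/www/a.png /var/www/b.ico /var/www/c.JPG /var/www/d.jpeg > paths.txt

sed -nr '/(jpg|jpeg|png|gif)$/p' paths.txt      # a.png and d.jpeg
sed -nr '/\.(jpg|jpeg|png|gif)$/Ip' paths.txt   # a.png, c.JPG and d.jpeg
```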

Sort and find uniques

The sort and uniq commands sort the list and remove duplicate lines. The list has to be sorted first, because uniq only removes duplicates that are on adjacent lines.
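A small illustration of why the sort comes first, using a hypothetical list where the duplicates are not adjacent. sort -u is a shorter equivalent of sort | uniq.

```shell
printf '%s\n' b.png a.png b.png a.png > list.txt

uniq list.txt          # no adjacent duplicates: all 4 lines come back
sort list.txt | uniq   # prints a.png, then b.png
sort -u list.txt       # same result as sort | uniq
```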

Write to a file

The final output is written to a file using the > redirection operator. If you want to append to an existing file instead, use the >> operator.

More to come

It is amazing how you can combine these tools to do powerful things. I am planning to document other one-liners that I end up creating to solve my problems, so stay tuned :)

Also if you think this can be improved, then do let me know as well.

Posted in Unix/Server Stuff | 4 Comments

Contributing to a project hosted on Github

As most of you know, I host most of the projects that I have released as open source on Github. I am open to collaboration and generally accept most of the pull requests that people send me.

Recently I noticed that not everyone who wants to contribute is proficient with git or Github. So here is a small guide for people who are interested in contributing to projects hosted on Github.

Before we proceed, keep in mind that this is not the only way to do it. But if you are just starting out with git or Github, it is a good starting point.

Fork the project in Github

The first step is to fork the project at Github. Go to the project that you want to contribute to and then click the fork button near the top right corner.

Clone the forked project to your machine

When you fork the project, Github creates a copy of it in your account. Once you have forked the project, you need to clone it to your machine.

You can use the following command to do it.

git clone git@github.com:sudar/wp-irc.git

Replace sudar with your own Github username and wp-irc with the name of the project.

Create a new branch

The next step is to create a new branch. You can use the following command to do that.

git checkout -b branch_name

Commit your changes

After you have made the changes, you have to commit them to the new branch. First stage the changed file

git add file_name

or, to review and stage individual changes (hunks) interactively,

git add -p file_name

Then do

git commit -m "Your commit message"

Push the change to Github

Next you have to push the changes to Github. You can do it by using the following command.

git push -u origin branch_name

Send a pull request

Now go to Github and send a pull request by clicking the pull request button. The owner of the repo will be notified and may choose to accept or reject the request.

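Keeping your repo up to date

Once the pull request is accepted, you can bring the changes back into your fork by adding the original repository as a remote named upstream and rebasing your master branch on it, using the following commands.

git remote add upstream https://github.com/sudar/wp-irc.git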

git checkout master

git pull --rebase upstream master

git push origin master

Posted in Unix/Server Stuff | 2 Comments

Handling FTP usernames with @ in them

During my recent FTP adventures, I also found that some shared hosting providers give you an FTP username with the ‘@’ symbol in it. This is fine as long as you use a GUI client to connect to FTP. But if you try using the command line or the Finder in Mac, you will have issues, since the ‘@’ symbol is also used to separate the username from the host (as in ftp://user@host).

After some research I found that the ‘@’ symbol in the username can be replaced with ‘+’ while specifying it in the command line. I tested it with both wput and the Finder in Mac and it worked perfectly in both.

So remember, the next time you try to connect to FTP server from command line and you have a ‘@’ symbol in the username, then replace it with the ‘+’ symbol. Happy FTP’ing ;)
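A small illustration of the substitution with standard tools. The account name me@example.com and the host ftp.example.com are hypothetical. Whether the + form is accepted depends on the server (it worked in the case described above). The generic way to put an @ into the user part of a URL is percent-encoding it as %40 (RFC 3986), but whether a particular client decodes that is client-specific.

```shell
user='me@example.com'   # hypothetical FTP account name

# The trick described above: swap @ for +
printf '%s\n' "$user" | tr '@' '+'          # prints me+example.com

# Used inside an FTP URL
echo "ftp://$(printf '%s' "$user" | tr '@' '+')@ftp.example.com/"

# Standard URL percent-encoding of @, as an alternative to try
printf '%s\n' "$user" | sed 's/@/%40/g'     # prints me%40example.com
```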

Posted in Unix/Server Stuff | 4 Comments

Excluding .svn folders while transferring entire folder by FTP

Recently I had to transfer an entire folder, with a lot of sub-folders, to an FTP server. I know that there are a lot of FTP GUI tools that can do it, but I wanted to do it from the command line so that I could script it.

I searched for the solution and came across an excellent tool called wput, which does exactly that very easily. It is very similar to wget, but instead of downloading the content, it allows you to upload it.

I installed it using apt-get and was trying to upload the entire directory. It was at this point that I realized I wanted to exclude all the .svn folders.

I again started searching for an answer. I even posted about it on Stack Overflow, but couldn’t find a solution. I then went over the man page of wput, and hidden inside was this gem, which lets you decide which files to include/exclude from the directory.

I thought of posting it here, so that it is useful for others and also I know where to find it when I need it next time.

So all you need is just one line. If you have not installed wput before, then install it using one of the following commands based on your operating system.

Posted in Unix/Server Stuff | 3 Comments

Installing PHP 5.3.x in Ubuntu through apt-get or aptitude

Recently, I wanted to play around with some stuff which is available only in PHP 5.3.x (more about it later in a separate blog post) and so was looking for a way to install it on my Ubuntu server, where this blog is running.

After poking around a bit, I found that Karmic Ubuntu hadn’t upgraded to PHP 5.3.x yet and the only way to get it was to compile it from the PHP source. Even though I am pretty comfortable doing that, I didn’t want to, because it makes upgrading very difficult at a later point in time.

I continued my research and then found that it is in fact possible to install PHP 5.3.x through apt-get or aptitude. I thought of documenting it here, so that it would be useful for others who want to do the same thing.

Adding dotdeb to the source list

First you should add the dotdeb repository to your apt source list. Add the following two lines to your /etc/apt/sources.list file

sudo vim /etc/apt/sources.list

Adding dotdeb keys to keyring

Dotdeb packages are GPG-signed. Issue the following commands to add the keys to your keyring

Install PHP5 packages

Then issue the following commands to retrieve the updated package list and upgrade the installed packages. I am using aptitude here; you can use apt-get as well.

sudo aptitude update

sudo aptitude upgrade

And then you can install PHP5 packages (and modules) using the normal install command.

sudo aptitude install php5 libapache2-mod-php5

Installing php5-dev package

The above method will install all php5-* packages, but php5-dev has some dependency issues with libtool packages. In order to solve that you have to manually install libtool v1.5.26. To do that use the following commands.

Now it’s time to enjoy the new features that are available in PHP 5.3 :)

Posted in Unix/Server Stuff, Web Programming | 17 Comments

Moved to Linode for hosting

Long-time readers of my blog will know that I moved to Slicehost around 15 months back. I was pretty happy with their service, but now I have left them and moved all my sites to Linode.

Both Slicehost and Linode are good, but Slicehost was slightly costlier than Linode. I realized it after reading the comparison done by David. I bought an account at Linode for testing and was quite happy with it, but I was too lazy to move all my sites, since it involved some work.

The recent announcement by Linode to give 33% additional disk space wooed me enough and I gave in. :)

Now I am getting some additional features with Linode but for less cost. ;)

Posted in Unix/Server Stuff | 4 Comments

Changing the default config editor in Ubuntu

If you are following the articles at Slicehost, then you may notice that PickledOnion uses nano as the default editor. I somehow like vi more than nano (not ready for a debate ;-) ) and was looking for a way to make vi the default config editor. After some googling, I found how to do it.

I am writing it down here so that, if I need to do it again in future, all I have to remember is to search my blog.

Okay, the command you have to use is (I am assuming that you are not logged in as root, which is the recommended approach):

sudo update-alternatives --config editor

And then press the number corresponding to the editor which you want to use. Below is the screenshot of how it looked in my slice.

[Screenshot: Changing default config editor in Ubuntu]

Posted in Unix/Server Stuff | 2 Comments

Rotating Apache log files using Cronolog

I must confess that I am a stats freak. If you are a long-time reader of my blog, you probably know that by now ;-) This is why I want to preserve my Apache log files in spite of using a variety of stats services like Google Analytics, WordPress stats, StatCounter and Performancing Metrics (before it was closed).

The default Apache configuration preserves the log files only for the last 10 days, but I wanted to permanently archive these files. After some searching on Google I came across an excellent program called Cronolog. Cronolog is a simple filter program: it reads log entries from standard input and writes each one to a log file whose name is built from a template you specify. You can use a variety of parameters, like the current date and time, to define the filename template.

First install cronolog, either by using aptitude or by downloading it from its download page. Then change the log file paths in the virtual host file (in Ubuntu Gutsy, the virtual host files are in /etc/apache2/sites-enabled). I am using the following configuration for this blog:
# Custom log file locations
LogLevel warn
ErrorLog "|/usr/sbin/cronolog /path/to/logs/%Y/%m/%Y-%m-%d-sudarmuthu.com-error.log"
CustomLog "|/usr/sbin/cronolog /path/to/logs/%Y/%m/%Y-%m-%d-sudarmuthu.com-access.log" combined

which stores my log files in separate folders for each year and month, as in the hierarchy below:
/2007/12/2007-12-01-sudarmuthu.com-access.log
/2007/12/2007-12-02-sudarmuthu.com-access.log
......
/2008/01/2008-01-01-sudarmuthu.com-access.log
/2008/01/2008-01-02-sudarmuthu.com-access.log
......
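The specifiers in the table below mirror those of date(1), so, assuming cronolog expands them the same way, date can preview which file a template would write to right now. /path/to/logs is the placeholder path from the configuration above.

```shell
# Preview the file name the access-log template would produce today.
date +'/path/to/logs/%Y/%m/%Y-%m-%d-sudarmuthu.com-access.log'
```

If cronolog is installed at /usr/sbin/cronolog, piping a line into it, for example echo test | /usr/sbin/cronolog /tmp/cronolog-test/%Y/%m/%Y-%m-%d-test.log, should create the dated file, which is a quick way to check a template before reloading Apache.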

You can use a variety of specifiers in the filename template, and I have documented some of them in the table below. You can get more information from its documentation.

Specifier Description
Time fields
%H hour (00..23)
%I hour (01..12)
%p the locale’s AM or PM indicator
%M minute (00..59)
%S second (00..61, which allows for leap seconds)
%X the locale’s time representation (e.g.: “15:12:47”)
%Z time zone (e.g. GMT), or nothing if the time zone cannot be determined
Date fields
%a the locale’s abbreviated weekday name (e.g.: Sun..Sat)
%A the locale’s full weekday name (e.g.: Sunday .. Saturday)
%b the locale’s abbreviated month name (e.g.: Jan .. Dec)
%B the locale’s full month name, (e.g.: January .. December)
%c the locale’s date and time (e.g.: "Sun Dec 15 14:12:47 GMT 1996")
%d day of month (01 .. 31)
%j day of year (001 .. 366)
%m month (01 .. 12)
%U week of the year with Sunday as first day of week (00..53, where week 1 is the week containing the first Sunday of the year)
%W week of the year with Monday as first day of week (00..53, where week 1 is the week containing the first Monday of the year)
%w day of week (0 .. 6, where 0 corresponds to Sunday)
%x the locale’s date representation (e.g. today in Britain: “15/12/96”)
%y year without the century (00 .. 99)
%Y year with the century (1970 .. 2038)

Posted in Unix/Server Stuff | 1 Comment