Parse the Apache error log and list all missing images

Recently I had to parse an Apache error log and find all the images that were missing.

After referring to a couple of man pages, I came up with this one-liner. I am sure I will need it again, so I am noting it here so that I know where to look 😉

Feel free to use it in whatever way you want, if it solves your problem as well.

Assumption

This assumes that each relevant line in your Apache error log contains the text “File does not exist” and ends with the path of the missing file.

Explanation

Filter all 404 lines from the log file

The first step is to filter all lines that contain the “File does not exist” text. This is done with sed.

  • By default, sed prints out every line it reads. The -n option prevents this.
  • The second argument is the regular expression followed by the p flag, /File does not exist/p, which prints only the lines that match the text.
  • The third argument is the name of the error log file.
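As a minimal sketch of this step, the snippet below runs the sed filter against a small temporary log. The log lines are made up for illustration (modelled on the Apache 2.2-style error format) and the variable names are my own, so treat it as a demonstration rather than real output.

```shell
# Hypothetical sample log; these lines are invented for illustration.
log=$(mktemp)
cat > "$log" <<'EOF'
[Wed Jan 01 10:15:32 2014] [error] [client 192.0.2.10] File does not exist: /var/www/images/logo.png
[Wed Jan 01 10:15:40 2014] [notice] caught SIGTERM, shutting down
[Wed Jan 01 10:16:02 2014] [error] [client 192.0.2.11] File does not exist: /var/www/favicon.ico
EOF

# -n suppresses the default output; /.../p prints only the matching lines.
matches=$(sed -n '/File does not exist/p' "$log")
printf '%s\n' "$matches"

rm -f "$log"
```

Only the two “File does not exist” lines should come through; the notice line is dropped.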

Extract the last column of matching lines

The next step is to retrieve the file name from the matching lines. This is done by using awk.

  • By default awk uses whitespace as the delimiter and splits each line into columns. If you look at each line, we want the last column.
  • NF is a special variable that holds the number of columns in the current line, so $NF refers to the last column.
  • print $NF prints the last column.
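To make the difference between NF and $NF concrete, here is a small sketch on a single made-up log line. The line and the variable names are assumptions for illustration only.

```shell
# Hypothetical log line, invented for illustration.
line='[Wed Jan 01 10:15:32 2014] [error] [client 192.0.2.10] File does not exist: /var/www/images/logo.png'

# NF is the number of whitespace-separated fields on the line...
field_count=$(printf '%s\n' "$line" | awk '{ print NF }')
# ...so $NF is the value of the last field.
last_field=$(printf '%s\n' "$line" | awk '{ print $NF }')

echo "fields: $field_count"
echo "last:   $last_field"
```

Because awk splits on whitespace, a path that itself contains a space would be cut short by this approach.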

Filter only images

The next step is to keep only the images. This is done with sed again.

  • I use -n again to prevent sed from printing all lines.
  • The -r option is added so that we can use extended regular expressions.
  • The regular expression (jpg|jpeg|png|gif)$ matches paths that end in an image extension, and the p at the end prints only the lines that match.
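Here is a short sketch of the image filter on a made-up list of paths. It uses -E, which both GNU and BSD sed accept for extended regular expressions, in place of -r. The paths and variable names are assumptions for illustration.

```shell
# Hypothetical list of missing paths, invented for illustration.
paths='/var/www/images/logo.png
/var/www/favicon.ico
/var/www/photos/team.jpeg
/var/www/robots.txt
/var/www/img/banner.gif'

# Keep only the lines that end in one of the image extensions.
images=$(printf '%s\n' "$paths" | sed -nE '/(jpg|jpeg|png|gif)$/p')
printf '%s\n' "$images"
```

The pattern is case-sensitive and does not require a dot before the extension. If that turns out to be too loose for your logs, a tighter variant such as \.(jpg|jpeg|png|gif)$ may be worth trying.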

Sort and find uniques

The sort command orders the list and uniq then drops the duplicate lines, so each missing image is listed once.
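A quick sketch of why the sort comes first, using a made-up list with repeated paths (the paths and variable names are assumptions):

```shell
# Hypothetical list with duplicates that are not next to each other.
paths='/var/www/img/b.png
/var/www/img/a.png
/var/www/img/b.png
/var/www/img/a.png'

# uniq only collapses adjacent duplicates, so the input is sorted first.
unique=$(printf '%s\n' "$paths" | sort | uniq)
printf '%s\n' "$unique"

# sort -u should give the same result in a single step.
unique_short=$(printf '%s\n' "$paths" | sort -u)
```

Without the sort, uniq would leave all four lines in place here, because no two identical lines are adjacent.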

Write to a file

The final output is written to a file using the > redirection operator. If you want to append to an existing file instead, use the >> operator.
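Putting the steps together, the sketch below wraps the whole pipeline in a small function and shows the difference between > and >>. The sample log, the temporary paths, and the function name missing_images are all assumptions for illustration. On a real server you would point it at your own error log.

```shell
# Hypothetical working directory, log, and output file.
workdir=$(mktemp -d)
log="$workdir/error.log"
out="$workdir/images-missing.txt"

cat > "$log" <<'EOF'
[Wed Jan 01 10:15:32 2014] [error] [client 192.0.2.10] File does not exist: /var/www/images/logo.png
[Wed Jan 01 10:15:40 2014] [notice] caught SIGTERM, shutting down
[Wed Jan 01 10:16:02 2014] [error] [client 192.0.2.11] File does not exist: /var/www/favicon.ico
[Wed Jan 01 10:17:45 2014] [error] [client 192.0.2.12] File does not exist: /var/www/images/logo.png
[Wed Jan 01 10:18:09 2014] [error] [client 192.0.2.13] File does not exist: /var/www/img/banner.gif
EOF

# Filter the 404 lines, take the last column, keep images, de-duplicate.
# -E is used here as the portable spelling of -r.
missing_images() {
  sed -n '/File does not exist/p' "$1" \
    | awk '{ print $NF }' \
    | sed -nE '/(jpg|jpeg|png|gif)$/p' \
    | sort | uniq
}

# > creates the file, or overwrites it if it already exists.
missing_images "$log" > "$out"
first_run=$(cat "$out")

# >> appends, so running it again adds the same lines a second time.
missing_images "$log" >> "$out"
lines_after_append=$(wc -l < "$out")

printf '%s\n' "$first_run"
echo "lines after append: $lines_after_append"

rm -rf "$workdir"
```

Note that appending with >> can reintroduce duplicates across runs, so the combined file may need another pass through sort and uniq.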

More to come

It is really amazing how you can combine these tools to do powerful things. I am planning to document other one-liners that I end up creating to solve my problems. So stay tuned 🙂

Also, if you think this can be improved, do let me know.

3 Comments so far

  • Any particular reason to not use grep here, instead of sed? From the explanation, grep would also do the same thing. In fact, with little modification to the regex, it can be done with a single grep call (or maybe a single sed).

  • I’m trying to use that line to grab results for a specific period of time:
    [code]
    sed -n '/01\/Jan\/2014/,/30\/Jan\/2014/ p'|sed -n '/File does not exist/p' /var/log/apache2/error.log|awk '{ print $NF }'|sed -nr '/(jpg|jpeg|png|gif)$/p' | sort| uniq > images-missing.txt
    [/code]

    It does what is intended, but the script has to be interrupted by CTRL+C, otherwise it goes forever.
