Recently I had to parse an Apache error log and find all the images that were missing.
After referring to a couple of man pages, I came up with this one-liner. I am sure I will need it again, so I thought of noting it here so that I know where to look 😉
Feel free to use it in whatever way you want, if it solves your problem as well.
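The one-liner itself is quoted in a reader comment at the end of this post. Here is a minimal sketch of it, wrapped in a function so that the log path can be passed in. The function name find_missing_images is made up for this sketch, the default path is simply the one used in that comment, and the -r option assumes GNU sed.

```shell
# Hypothetical wrapper around the one-liner; pass the log path as $1.
# The default path is the one used in the reader comment, not a requirement.
find_missing_images() {
    log_file="${1:-/var/log/apache2/error.log}"
    sed -n '/File does not exist/p' "$log_file" |   # keep only the 404 lines
        awk '{ print $NF }' |                       # last column = file path
        sed -nr '/(jpg|jpeg|png|gif)$/p' |          # keep only image paths
        sort | uniq                                 # de-duplicate
}

# Usage: find_missing_images /var/log/apache2/error.log > images-missing.txt
```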
Assumption
This assumes that each relevant line in your Apache error log contains the text “File does not exist” and ends with the path of the missing file.
Explanation
Filter all 404 lines from the log file
The first step is to filter all lines that contain the “File does not exist” text. This is done using sed.
- By default, sed prints out all lines. The -n option prevents this.
- The second argument is the regular expression followed by the p flag, which prints out all lines that match the text.
- The third argument is the name of the error log file.
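To see this step in isolation, here is a small sketch that runs it against a throwaway file. The sample lines are made up for illustration and only imitate the Apache 2.2 error log style, so treat the exact layout as an assumption.

```shell
# Made-up sample lines (an assumption, not real log data).
log_file="$(mktemp)"
cat > "$log_file" <<'EOF'
[Wed Jan 01 10:00:00 2014] [error] [client 192.0.2.1] File does not exist: /var/www/img/logo.png
[Wed Jan 01 10:00:05 2014] [notice] caught SIGTERM, shutting down
[Wed Jan 01 10:00:09 2014] [error] [client 192.0.2.1] File does not exist: /var/www/robots.txt
EOF

# -n suppresses sed's default output; the p flag prints only matching lines.
step1="$(sed -n '/File does not exist/p' "$log_file")"
printf '%s\n' "$step1"
```

Only the first and third sample lines are printed; the notice line is dropped.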
Extract the last column of matching lines
The next step is to retrieve the file name from the matching lines. This is done using awk.
- By default, awk uses whitespace as the delimiter and splits each line into columns. If you look at each line, we want the last column.
- NF is a special variable that holds the number of columns in the current line, so $NF refers to the last column.
- print $NF prints the last column.
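As a quick illustration of $NF, the sketch below feeds awk a single made-up line (again only imitating the Apache 2.2 style) and keeps the last whitespace-separated column.

```shell
# Made-up sample line (an assumption, not real log data).
line='[Wed Jan 01 10:00:00 2014] [error] [client 192.0.2.1] File does not exist: /var/www/img/logo.png'

# awk splits on whitespace; $NF is the last column of the line.
last_field="$(printf '%s\n' "$line" | awk '{ print $NF }')"
printf '%s\n' "$last_field"   # prints /var/www/img/logo.png
```

Because the split is on whitespace, a path that itself contains a space would be cut at the last space.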
Filter only images
The next step is to keep only the images. This is done again using sed.
- I use -n again to prevent sed from printing all lines.
- The -r option is added so that we can use extended regular expressions.
- The regular expression (jpg|jpeg|png|gif)$ matches lines ending in an image extension, and the p at the end prints only the lines that match.
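Here is a small sketch of the image filter on its own, using a few made-up paths. It assumes GNU sed, since that is where the -r option comes from.

```shell
# Made-up paths for illustration.
paths='/var/www/img/logo.png
/var/www/robots.txt
/var/www/img/photo.jpeg
/var/www/png/readme.html'

# The $ anchors the match to the end of the line, so "png" in the middle
# of a path (as in the last sample) does not count.
images="$(printf '%s\n' "$paths" | sed -nr '/(jpg|jpeg|png|gif)$/p')"
printf '%s\n' "$images"
```

This prints the logo.png and photo.jpeg paths. Note that the match is case-sensitive, so a file ending in .PNG would be skipped.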
Sort and find uniques
The sort and uniq commands sort the list and remove duplicate lines.
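The order of the two commands matters, because uniq only collapses duplicates that sit next to each other. A tiny sketch with made-up file names:

```shell
# Without sorting, the two b.png lines are not adjacent, so uniq keeps both.
unsorted="$(printf '%s\n' b.png a.gif b.png | uniq)"

# After sorting, duplicates are adjacent and uniq removes them.
unique="$(printf '%s\n' b.png a.gif b.png | sort | uniq)"

printf '%s\n' "$unique"
```

The second pipeline prints a.gif and b.png, one per line. If you prefer, sort -u gives the same result in a single command.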
Write to a file
The final output is written to a file using the > redirection operator. If you want to append to an existing file instead, use the >> operator.
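The difference between the two operators is easy to check with a throwaway file. The file names below are made up.

```shell
out_file="$(mktemp)"

printf '%s\n' a.gif >  "$out_file"   # creates or overwrites the file
printf '%s\n' b.png >  "$out_file"   # overwrites again: a.gif is gone
printf '%s\n' c.jpg >> "$out_file"   # appends: the file now has two lines

cat "$out_file"
```

The file ends up containing b.png followed by c.jpg.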
More to come
It is really amazing how you can combine these tools to do powerful things. I am planning to document other one-liners that I end up creating to solve my problems. So stay tuned 🙂
Also, if you think this can be improved, do let me know.
Any particular reason to not use grep here, instead of sed? From the explanation, grep would also do the same thing. In fact, with little modification to the regex, it can be done with a single grep call (or maybe a single sed).
The main reason is that I know sed better than grep. Apart from that, there is no particular reason.
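For reference, the single-sed variant hinted at above could look roughly like this. It is only a sketch: the function name is made up, it assumes GNU sed for -r, and it assumes the path comes right after the text “File does not exist: ” at the end of the line.

```shell
# Hypothetical helper; pass the log path as $1.
# The substitution keeps only the captured path and prints lines where it matched.
find_missing_images_single_sed() {
    sed -nr 's/.*File does not exist: (.*\.(jpg|jpeg|png|gif))$/\1/p' "$1" | sort -u
}

# Usage: find_missing_images_single_sed /var/log/apache2/error.log > images-missing.txt
```

Unlike the original pipeline, this version requires a dot before the extension and keeps paths that contain spaces, so the two are not exactly equivalent.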
I’m trying to use that line to grab results for a specific period of time:
[code]
sed -n '/01\/Jan\/2014/,/30\/Jan\/2014/ p'|sed -n '/File does not exist/p' /var/log/apache2/error.log|awk '{ print $NF }'|sed -nr '/(jpg|jpeg|png|gif)$/p' | sort| uniq > images-missing.txt
[/code]
It does what is intended, but the script has to be interrupted by CTRL+C, otherwise it goes forever.
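A likely reason for the hang: the first sed is given no file, so it waits for input from the terminal, while the second sed reads the log file directly and ignores whatever arrives through the pipe. Moving the file name to the first sed should fix both problems. Below is a minimal sketch of that idea. The function name and its parameters are made up, -r assumes GNU sed, and the two patterns have to match whatever timestamp format your log actually uses.

```shell
# Hypothetical helper: missing images between two timestamp patterns.
#   $1 = log file, $2 = start pattern, $3 = end pattern
# Any "/" inside a pattern must be escaped as "\/" because "/" delimits the regex.
missing_images_between() {
    log_file="$1"; start_pat="$2"; end_pat="$3"
    sed -n "/$start_pat/,/$end_pat/p" "$log_file" |   # the file goes to the FIRST command
        sed -n '/File does not exist/p' |             # later commands read from the pipe
        awk '{ print $NF }' |
        sed -nr '/(jpg|jpeg|png|gif)$/p' |
        sort | uniq
}

# Usage (patterns are examples only):
#   missing_images_between /var/log/apache2/error.log 'Jan 01' 'Jan 30' > images-missing.txt
```

Keep in mind that a sed range stops at the first line matching the end pattern, so only the first matching line of the end day is included.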