Recently I had to parse an Apache error log and find all the images that were missing.
After referring to a couple of man pages, I came up with this one-liner. I am sure I will need it again, so I am noting it here so that I know where to look next time.
Feel free to use it in whatever way you want, if it solves your problem as well.
This assumes that each line in your Apache error log looks like this.
Filter all 404 lines from the log file
The first step is to filter all lines that contain the “File does not exist” text. This is done by using sed.
- By default, sed prints out all lines. This is prevented by the -n option.
- The second option is the regular expression followed by the p flag. This prints out only the lines that match the text.
- The third option is the name of the error log file.
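As a rough sketch of this step: the sample log lines below are hypothetical (an Apache 2.2 style layout is assumed), and the temporary file stands in for your real error log.

```shell
# Hypothetical sample log; the exact line layout is an assumption.
log=$(mktemp)
cat > "$log" <<'EOF'
[Mon Jan 03 10:15:01 2011] [error] [client 10.0.0.1] File does not exist: /var/www/images/logo.png
[Mon Jan 03 10:15:02 2011] [error] [client 10.0.0.1] client denied by server configuration: /var/www/private
[Mon Jan 03 10:15:03 2011] [error] [client 10.0.0.2] File does not exist: /var/www/favicon.ico
EOF

# -n suppresses sed's default output; /.../p prints only the matching lines.
matches=$(sed -n '/File does not exist/p' "$log")
printf '%s\n' "$matches"

rm -f "$log"
```

Only the two “File does not exist” lines should come through; the “client denied” line is dropped.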
Extract the last column of matching lines
The next step is to retrieve the file name from the matching lines. This is done by using awk.
- By default, awk uses whitespace as the delimiter and splits each line into columns. If you look at each line, we want the last column.
- NF is a special variable that holds the number of columns in the line, so $NF points to the last column.
- print $NF prints the last column.
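Here is a small sketch of the awk step on its own. The log line is a made-up sample, and it assumes the file path contains no spaces (otherwise the last column would only be part of the path).

```shell
# Hypothetical log line used only for illustration.
line='[Mon Jan 03 10:15:01 2011] [error] [client 10.0.0.1] File does not exist: /var/www/images/logo.png'

# awk splits on whitespace; NF is the number of fields, so $NF is the last one.
last=$(printf '%s\n' "$line" | awk '{ print $NF }')
echo "$last"
```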
Filter only images
The next step is to keep only the images. This is done again by using sed.
- I use -n again to prevent sed from printing all lines.
- -r is added so that we can use extended regular expressions.
- The regular expression (jpg|jpeg|png|gif)$ matches lines that end with an image extension, and the p at the end prints out only the lines that match.
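A minimal sketch of the image filter, run against a hypothetical list of paths. It assumes GNU sed, where -r enables extended regular expressions (-E is an equivalent spelling that BSD sed also accepts).

```shell
# Hypothetical list of missing files, one per line.
paths='/var/www/images/logo.png
/var/www/favicon.ico
/var/www/photos/cat.jpeg
/var/www/robots.txt'

# Keep only the lines that end in one of the image extensions.
images=$(printf '%s\n' "$paths" | sed -n -r '/(jpg|jpeg|png|gif)$/p')
printf '%s\n' "$images"
```

Note that the pattern only checks how the line ends, so a name like notajpg would also match. A stricter variant such as \.(jpg|jpeg|png|gif)$ would require the dot as well.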
Sort and find uniques
The sort and uniq commands sort the list and find the unique lines.
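As a quick illustration with made-up paths: uniq only collapses adjacent duplicates, which is why the list is sorted first.

```shell
# Hypothetical input with one duplicate that is not adjacent.
paths='/var/www/images/logo.png
/var/www/images/banner.gif
/var/www/images/logo.png'

# sort brings identical lines together so that uniq can collapse them.
unique=$(printf '%s\n' "$paths" | sort | uniq)
printf '%s\n' "$unique"
```

If you prefer, sort -u should give the same result in a single command.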
Write to a file
The final output is written to a file by using the redirection > operator. If you want to append to a file instead, use the >> operator.
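Putting the steps together, the whole pipeline might look roughly like the sketch below. This is a reconstruction from the steps described above, not necessarily the exact original command; the sample log and both file names are hypothetical stand-ins created with mktemp.

```shell
# Hypothetical sample log and output file; replace with your own paths.
log=$(mktemp)
out=$(mktemp)
cat > "$log" <<'EOF'
[Mon Jan 03 10:15:01 2011] [error] [client 10.0.0.1] File does not exist: /var/www/images/logo.png
[Mon Jan 03 10:15:02 2011] [error] [client 10.0.0.2] File does not exist: /var/www/favicon.ico
[Mon Jan 03 10:15:03 2011] [error] [client 10.0.0.3] File does not exist: /var/www/images/logo.png
[Mon Jan 03 10:15:04 2011] [error] [client 10.0.0.1] File does not exist: /var/www/images/banner.gif
EOF

# 1. keep the "File does not exist" lines  2. take the last column
# 3. keep only image extensions            4. sort and de-duplicate
# 5. write the result to a file (use >> instead of > to append)
sed -n '/File does not exist/p' "$log" \
  | awk '{ print $NF }' \
  | sed -n -r '/(jpg|jpeg|png|gif)$/p' \
  | sort | uniq > "$out"

cat "$out"
# Remove the temporary files when done: rm -f "$log" "$out"
```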
More to come
It is really amazing how you can combine these tools to do powerful things. I am planning to document other one-liners that I end up creating to solve my problems, so stay tuned.
Also if you think this can be improved, then do let me know as well.