I have been aware of AWK as a "general purpose programming language that is designed for processing text-based data" since college, but until today I had never sat down to learn how to use it.

I had a log file that I needed to pull some information out of and turn into Excel spreadsheets with graphs. Here is what the text file looked like:

127:  "Information","web-8","10/06/08","17:40:35","BHSTAFFING","Template RunScheduledTask2 has completed sucessfuly. 1 Messages(s) Sent - Execution Time: 0.24 (min)"
134:  "Information","web-10","10/06/08","17:45:31","BHSTAFFING","Template RunScheduledTask2 has completed sucessfuly. 5 Messages(s) Sent - Execution Time: 0.18 (min)"
137:  "Information","web-7","10/06/08","17:50:27","BHSTAFFING","Template RunScheduledTask2 has completed sucessfuly. 1 Messages(s) Sent - Execution Time: 0.12 (min)"
143:  "Information","web-10","10/06/08","17:55:29","BHSTAFFING","Template RunScheduledTask2 has completed sucessfuly. 4 Messages(s) Sent - Execution Time: 0.14 (min)"
146:  "Information","web-10","10/06/08","18:00:26","BHSTAFFING","Template RunScheduledTask2 has completed sucessfuly. 1 Messages(s) Sent - Execution Time: 0.09 (min)"
...

What I wanted was a comma-delimited file with the date, time and message count. With some trial and error, and a lot of help from the official AWK manual, I came up with the following command-line script:

gawk "BEGIN { FS=\",\" }; {sub(/^.*\. /, \"\", $6); sub(/ .*/, \"\", $6); print $3 \",\" $4 \",\" $6}" input.txt  > output.txt

Voilà! Nice clean CSV data. The code sets the field separator to a comma, then runs two regex substitutions on column 6: the first strips everything up to and including the last period followed by a space, and the second strips everything from the next space onward, leaving just the message count. Finally it prints columns 3, 4 and 6 with commas between them. This was done with gawk, the open-source GNU implementation of AWK, which has a Windows port.

10/6/2008,16:10:54,5
10/6/2008,16:15:47,7
10/6/2008,16:20:51,7
10/6/2008,16:25:47,15
10/6/2008,16:30:51,5
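
For anyone on a Unix-like shell rather than the Windows command prompt, the same logic can be sketched with single quotes, which avoids all the backslash escaping. This is only a sketch under a few assumptions: it uses plain awk instead of gawk, the function name extract_counts and the sample variable are made up for illustration, and the gsub calls that strip the surrounding double quotes are an addition for tidier output, not part of the original command.

```shell
# Two sample lines copied from the log excerpt above.
sample='127:  "Information","web-8","10/06/08","17:40:35","BHSTAFFING","Template RunScheduledTask2 has completed sucessfuly. 1 Messages(s) Sent - Execution Time: 0.24 (min)"
134:  "Information","web-10","10/06/08","17:45:31","BHSTAFFING","Template RunScheduledTask2 has completed sucessfuly. 5 Messages(s) Sent - Execution Time: 0.18 (min)"'

# Hypothetical helper: reads log lines on stdin, writes date,time,count.
extract_counts() {
  awk '
    BEGIN { FS = "," }        # split each line on commas
    {
      sub(/^.*\. /, "", $6)   # drop everything through the last period + space
      sub(/ .*/, "", $6)      # drop everything from the next space onward
      gsub(/"/, "", $3)       # assumption: strip the quotes around the date
      gsub(/"/, "", $4)       # assumption: strip the quotes around the time
      print $3 "," $4 "," $6
    }
  '
}

printf '%s\n' "$sample" | extract_counts
```

For the two sample lines this prints 10/06/08,17:40:35,1 and 10/06/08,17:45:31,5. The date stays in the log's own form; the script does not reformat it. To process a real file, something like extract_counts < input.txt > output.txt should work.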

Finally, the Excel graph in all its glory: