The AWK Way
Today I was given the task of converting a few hundred files (743 to be exact) into CSV format. Each filename consists of a hostname followed by a fixed suffix, and each file lists that host's local user names, one per line. The task is to write one row per host, with the hostname in the 1st column and the user names in the 2nd column onwards. One more requirement is to exclude a few users from the output.
My initial solution was a plain Unix shell script. Although this is a one-off 'throw-away' solution, it is pretty inefficient because a lot of processes are created inside a for loop. It took 1 min 39.453 sec.
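The original shell script is not shown, so the following is only a guess at what such a loop might look like. The function name to_csv_slow and the choice of basename, grep and paste are assumptions made for illustration. The point is that every file costs several process launches, which adds up over 743 files.

```shell
# Hypothetical reconstruction of a per-file loop; NOT the original script.
# Assumes files named <host>_root.txt with one user name per line.
to_csv_slow() {
    for f in *_root.txt; do
        [ -e "$f" ] || continue                 # no matching files at all
        host=$(basename "$f" _root.txt)         # process 1: strip the suffix
        # processes 2 and 3: drop excluded users, then join lines with commas
        users=$(grep -v -x -E 'userx|usery|userz' "$f" | paste -s -d, -)
        if [ -n "$users" ]; then
            echo "$host,$users"
        else
            echo "$host"
        fi
    done
}
```

Run in the directory holding the files, to_csv_slow prints one CSV row per host, at the price of roughly three new processes plus command substitutions for each file.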
After some thought, I reckoned it should be possible to do it efficiently in AWK alone. With the help of built-in variables such as FILENAME, NR and FNR, we can process all the input files in a single AWK script. The script below, a.awk, works in Cygwin. Its runtime is 2.797 sec, which is about 35 times faster!
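As a quick illustration of those built-in variables: NR counts every line read so far, FNR restarts at 1 for each input file, and FILENAME holds the name of the file currently being read. So FNR==1 marks the first line of each file, and FNR==1 && NR>1 marks the first line of every file except the first. The helper name show_counters is made up for this sketch.

```shell
# Print FILENAME, NR and FNR for every line of the given files.
# show_counters is a hypothetical helper name used only for this demo.
show_counters() {
    awk '{ printf("%s NR=%d FNR=%d\n", FILENAME, NR, FNR) }' "$@"
}
```

Calling show_counters on two small files shows NR continuing to climb across the file boundary while FNR drops back to 1.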
$ ls *txt
host1_root.txt host2_root.txt host3_root.txt host4_root.txt
$ paste *txt
usera   usere   userm   userx
userb   userx   userx   userw
userc   userf   usern   usery
userd   userg   usero   userz
userdx  usery   userp
userdy  userh   userx
        userz   userq
        useri   userqx
        userj   userr
        userk   userz
        userl   users
        userx   usert
        usery
$ cat a.awk
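#! /usr/bin/awk -f
BEGIN {
    suffix = "_root.txt"
    len = length(suffix)
}
#
# start a new row at the first line of every input file except the first
FNR == 1 && NR > 1 {
    printf("\n")
}
#
# print hostname: the file name with the suffix removed
# (substr() positions start at 1; a start of 0 is not portable)
FNR == 1 {
    host = substr(FILENAME, 1, length(FILENAME) - len)
    printf("%s", host)
}
#
# print users, but exclude certain users
$0 !~ /^(userx|usery|userz)$/ {
    printf(",%s", $0)
}
#
# terminate the last row
END {
    if (NR > 0) printf("\n")
}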
$ ./a.awk *.txt
host1,usera,userb,userc,userd,userdx,userdy
host2,usere,userf,userg,userh,useri,userj,userk,userl
host3,userm,usern,usero,userp,userq,userqx,userr,users,usert
host4,userw
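The ^ and $ anchors in the exclusion pattern matter: they make the pattern match only when the whole line is exactly one of the excluded names. The sketch below contrasts the anchored pattern with an unanchored one. The function names and the sample names userx2 and myuserz are made up for illustration and do not come from the real data.

```shell
# Anchored: drops a line only if it is exactly userx, usery or userz.
filter_users() {
    awk '$0 !~ /^(userx|usery|userz)$/'
}

# Unanchored (for comparison): also drops any line that merely
# contains one of the names, such as the hypothetical userx2.
filter_users_loose() {
    awk '$0 !~ /userx|usery|userz/'
}
```

Fed the lines userx, userx2, myuserz and userw, filter_users keeps the last three, while filter_users_loose keeps only userw.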
Labels: awk, Cygwin, shell script


1 Comment:
Sounds like a job for Python!