The AWK Way
Today I was given the task of converting a few hundred files (743 to be exact) into CSV format. Each filename consists of a hostname followed by a fixed suffix, and each file contains all the local user names on that host, one per line. The task is to produce one row per host, with the hostname in the 1st column and the usernames from the 2nd column onwards. One more requirement is to exclude a few users from the output.
My initial solution was very much Unix shell script-based. Although this is a one-off 'throw-away' solution, it is pretty inefficient because a lot of processes are created inside a for loop. It took 1 min 39.453 sec.
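The original shell script is not shown here, so the following is only a hypothetical sketch of the kind of loop that pays this cost. The function name `to_csv_slow` and the choice of `basename`, `grep` and `paste` are assumptions; the point is that every file spawns several short-lived processes.

```shell
# Hypothetical reconstruction, NOT the original script.
# Each iteration starts three external processes (basename, grep, paste),
# so 743 files mean well over two thousand process creations.
to_csv_slow() {
    for f in "$@"; do
        # hostname = filename without directory and without the fixed suffix
        host=$(basename "$f" _root.txt)
        # drop the excluded users, then join the remaining lines with commas
        users=$(grep -v -x -E 'userx|usery|userz' "$f" | paste -s -d, -)
        printf '%s,%s\n' "$host" "$users"
    done
}
```

It would be called as `to_csv_slow *_root.txt`. On Cygwin, process creation is comparatively expensive, which is likely why a loop of this shape is so slow there.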
After some thought, I reckoned it should be possible to do it efficiently in AWK alone. With the help of built-in variables such as FILENAME, NR and FNR, we can process all the input files in a single AWK program. The code below works in Cygwin. The runtime for the AWK code is 2.797 sec, which is about 35 times faster!
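To see how these three variables behave across several input files, here is a small sketch. The demo files and the function name `show_counters` are made up for illustration. NR keeps counting across all files, FNR restarts at 1 for each file, and FILENAME holds the name of the file currently being read.

```shell
# Hypothetical demo files, created in a temporary directory
demo_dir=$(mktemp -d)
printf 'alice\nbob\n' > "$demo_dir/hostA_root.txt"
printf 'carol\n'      > "$demo_dir/hostB_root.txt"

show_counters() {
    # Print FILENAME, NR and FNR next to every input line
    awk '{ printf("%s NR=%d FNR=%d %s\n", FILENAME, NR, FNR, $0) }' "$@"
}

(cd "$demo_dir" && show_counters hostA_root.txt hostB_root.txt)
# hostA_root.txt NR=1 FNR=1 alice
# hostA_root.txt NR=2 FNR=2 bob
# hostB_root.txt NR=3 FNR=1 carol
```

The condition `FNR==1` is therefore true on the first line of every file, while `FNR==1 && NR>1` is true on the first line of every file except the first one.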
$ ls *txt
host1_root.txt  host2_root.txt  host3_root.txt  host4_root.txt

$ paste *txt
usera   usere   userm   userx
userb   userx   userx   userw
userc   userf   usern   usery
userd   userg   usero   userz
userdx  usery   userp
userdy  userh   userx
        userz   userq
        useri   userqx
        userj   userr
        userk   userz
        userl   users
        userx   usert
        usery

$ cat a.awk
#! /usr/bin/awk -f
BEGIN {
    suffix="_root.txt"
    len=length(suffix)
}
#
# print a newline before the first line of each input file, except the first file
FNR==1 && NR>1 {
    printf("\n")
}
#
# print hostname (filename minus the suffix; substr positions start at 1)
FNR==1 {
    host=substr(FILENAME, 1, length(FILENAME)-len)
    printf("%s", host)
}
#
# print users, but exclude certain users
$0 !~ /^(userx|usery|userz)$/ {
    printf(",%s", $0)
}
#
# terminate the last row with a newline
END {
    if (NR>0) printf("\n")
}

$ ./a.awk *.txt
host1,usera,userb,userc,userd,userdx,userdy
host2,usere,userf,userg,userh,useri,userj,userk,userl
host3,userm,usern,usero,userp,userq,userqx,userr,users,usert
host4,userw
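If the list of excluded users changes often, one possible variation is to pass it in from the command line instead of hard-coding it in the pattern. The sketch below is an assumption-laden variant, not the script used for the timing above: the function name `users_to_csv` and the `exclude` and `suffix` variables are hypothetical, and it also strips any directory part from FILENAME so that files outside the current directory work.

```shell
users_to_csv() {
    # $1: comma-separated user names to drop; remaining arguments: input files
    exclude_list=$1
    shift
    awk -v suffix="_root.txt" -v exclude="$exclude_list" '
        BEGIN {
            # build a lookup table of the names to skip
            n = split(exclude, names, ",")
            for (i = 1; i <= n; i++) skip[names[i]] = 1
        }
        FNR == 1 {
            # start a new row; end the previous one unless this is the first file
            if (NR > 1) printf("\n")
            host = FILENAME
            sub(/^.*\//, "", host)      # drop any directory part
            printf("%s", substr(host, 1, length(host) - length(suffix)))
        }
        # print every user that is not in the skip table
        !($0 in skip) { printf(",%s", $0) }
        END { if (NR > 0) printf("\n") }
    ' "$@"
}
```

It would be called as `users_to_csv "userx,usery,userz" *_root.txt`. The array lookup compares whole lines exactly, so `userdx` is kept even though `userx` is excluded, which matches the behaviour of the anchored regular expression.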
Labels: awk, Cygwin, shell script