Friday, June 20, 2008

New Trick to Parse a Field-Separated File

After so many years of using UNIX, I am still learning new shell scripting tricks. Today I am going to show you one that saves a lot of repetitive code when extracting fields from a field-separated file such as /etc/passwd.

I used to do it this way:

$ while read line
do
    user=`echo "$line" | cut -d: -f1`
    uid=`echo "$line" | cut -d: -f3`
    gid=`echo "$line" | cut -d: -f4`
    echo "User=$user ($uid:$gid)"
done < /etc/passwd

User=root (0:0)
User=daemon (1:1)
User=bin (2:2)
User=sys (3:3)
User=adm (4:4)
User=lp (71:8)
User=uucp (5:5)
User=nuucp (9:9)
User=dladm (15:3)
User=smmsp (25:25)
User=listen (37:4)
User=gdm (50:50)
User=webservd (80:80)
User=postgres (90:90)
User=nobody (60001:60001)
User=noaccess (60002:60002)
User=nobody4 (65534:65534)
User=chihung (100:1)

The new way to do it:

$ while IFS=: read user x uid gid dummy
do
    echo "User=$user ($uid:$gid)"
done < /etc/passwd
 
User=root (0:0)
User=daemon (1:1)
User=bin (2:2)
User=sys (3:3)
User=adm (4:4)
User=lp (71:8)
User=uucp (5:5)
User=nuucp (9:9)
User=dladm (15:3)
User=smmsp (25:25)
User=listen (37:4)
User=gdm (50:50)
User=webservd (80:80)
User=postgres (90:90)
User=nobody (60001:60001)
User=noaccess (60002:60002)
User=nobody4 (65534:65534)
User=chihung (100:1)

The trick is to set the shell's IFS (internal field separator) variable to a colon (IFS=:) just for the "read" command. A name=value pair placed before a command is in effect only for that command, so it has no side effect on your existing shell. The session below shows variables being set this way for the "env" command, followed by the part of the sh man page (man sh in Solaris) that describes how "read" uses IFS.

$ /bin/sh
$ MY_a=a MY_b=b MY_c=c env | grep MY
MY_a=a
MY_b=b
MY_c=c

$ env | grep MY
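
The same check can be made for IFS itself. The following is a minimal sketch, assuming a POSIX-style shell such as dash or bash (older Bourne shells may behave differently); the sample line "a:b" and the variable names old_ifs, first and second are made up for illustration.

```shell
#!/bin/sh
# Remember the separator that is in effect before the read.
old_ifs=$IFS

# Hypothetical sample line; the IFS=: assignment applies to this read only.
IFS=: read first second <<EOF
a:b
EOF

echo "first=$first second=$second"

# Compare the current IFS with the saved copy.
if [ "$IFS" = "$old_ifs" ]; then
    echo "IFS unchanged"
else
    echo "IFS changed"
fi
```

If the shell treats the assignment as temporary, the last line printed should be "IFS unchanged".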

$ man sh
 ....
     read name ...

         One line is read from the standard input and, using  the
         internal  field  separator, IFS (normally space or tab),
         to delimit word boundaries, the first word  is  assigned
         to  the  first name, the second word to the second name,
         and so forth, with leftover words assigned to  the  last
         name.  Lines can be continued using \newline. Characters
         other than newline can be quoted by preceding them  with
         a  backslash. These backslashes are removed before words
         are assigned to names, and no interpretation is done  on
         the  character  that  follows  the backslash. The return
         code is 0, unless an EOF is encountered.
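
The "leftover words assigned to the last name" rule is why the loop above ends its variable list with "dummy". Here is a small sketch of that behaviour, assuming a POSIX-style shell; the record is an invented example in /etc/passwd format, not taken from a real system, and "rest" plays the role of "dummy".

```shell
#!/bin/sh
# Hypothetical sample record in /etc/passwd format (invented for illustration).
record='alice:x:1001:100:Alice Example:/home/alice:/bin/sh'

# Feed the record to read through a here-document so that read
# runs in the current shell and its variables stay visible afterwards.
IFS=: read user x uid gid rest <<EOF
$record
EOF

echo "user=$user uid=$uid gid=$gid"
# Everything after the fourth field, separators included, lands in "rest".
echo "rest=$rest"
```

Without a catch-all name at the end, the remaining fields would be appended to gid instead.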

Now I can write cleaner and more efficient scripts to parse any kind of field-separated file.
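
As an illustration with a different separator, here is a rough sketch that reads comma-separated lines. The inventory data and the names name, qty, location and total are hypothetical, and it assumes a POSIX-style shell in which a redirected loop runs in the current shell. It is plain splitting on commas, so quoted fields that contain commas are not handled.

```shell
#!/bin/sh
# Hypothetical inventory data in the form name,quantity,location.
total=0
while IFS=, read name qty location
do
    echo "$name: $qty in $location"
    # Keep a running sum of the second field.
    total=$((total + qty))
done <<EOF
bolt,40,bin A
nut,25,bin B
washer,10,bin C
EOF
echo "total=$total"
```

Because IFS is only a comma here, the space inside "bin A" stays part of the third field.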
