Saturday, October 04, 2008

Regular Expression

My colleague asked me how to list only the active settings in the default /etc/squid/squid.conf. On RedHat this file has 4325 lines: 4030 comment lines, 260 blank lines and only 35 settings. Because the file is so heavily commented, it is extremely hard to locate the settings.
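
If you want to check numbers like these on your own system, here is a rough sketch. It assumes a POSIX-style grep and runs on a tiny made-up sample rather than the real squid.conf. The sample lines and the variable names are my own assumptions, and the patterns are the kind of thing the rest of this post builds up step by step.

```shell
# Hypothetical sample standing in for /etc/squid/squid.conf
conf=$(mktemp)
printf '%s\n' \
  '# TAG: http_port' \
  '' \
  'http_port 3128' \
  '  # indented comment' \
  'acl Safe_ports port 80 # http' \
  > "$conf"

total=$(grep -c '' "$conf")                    # the empty pattern matches every line
comments=$(grep -c '^[[:blank:]]*#' "$conf")   # lines whose first non-blank character is '#'
blanks=$(grep -c '^[[:blank:]]*$' "$conf")     # empty or whitespace-only lines
settings=$((total - comments - blanks))        # whatever is left over

echo "$total lines, $comments comments, $blanks blank, $settings settings"
# prints: 5 lines, 2 comments, 1 blank, 2 settings

rm -f "$conf"
```

To try it on the real file, point conf at /etc/squid/squid.conf and drop the printf and rm lines.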

Regular expressions to the rescue. Here I am going to walk you through how we arrive at the final solution:

  1. grep -v '#' /etc/squid/squid.conf
    This prints the lines that do not (-v) contain a '#'. The '#' must be quoted, otherwise the shell treats it as the start of a comment and grep never sees the pattern or the file name. The drawback is that it also drops settings with a trailing comment, such as "acl Safe_ports port 80 # http".
  2. egrep -v '^#' /etc/squid/squid.conf
    egrep ("Extended Grep", the same as grep -E) uses extended regular expressions. ^ is an anchor that represents the start of the line, so '^#' matches lines that start with #. But what if a comment is indented, i.e. the # is preceded by spaces or tabs?
  3. egrep -v '^[[:blank:]]*#' /etc/squid/squid.conf
    A bracket expression matches any single character listed inside the square brackets. Here we want a space or a tab. grep does not understand the escape sequence "\t" inside brackets (the backslash is taken literally, so [ \t] would match a space, a backslash or the letter t), so we use the POSIX character class [[:blank:]] instead, which stands for space and tab. The * matches the preceding element (here, a blank) zero or more times. Although this gets rid of the comments, we still have a lot of blank lines to deal with.
  4. egrep -v '^[[:blank:]]*#' /etc/squid/squid.conf | egrep -v '^$'
    How about using a pipe to feed the previous step's output into another egrep that gets rid of the blank lines? ^ and $ are anchors for the start and the end of the line, so ^$ matches a line with no characters in it. OK, that's what we want, but can we do it with just a single egrep? Of course we can.
  5. egrep -v '(^[[:blank:]]*#|^$)' /etc/squid/squid.conf
    With grouping "()" and alternation "|", we are telling egrep to match either a comment or a blank line. What if the blank lines are not really blank, but contain spaces or tabs?
  6. egrep -v '(^[[:blank:]]*#|^[[:blank:]]*$)' /etc/squid/squid.conf
    This will do the job!
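
To see the final pattern at work without touching a real configuration, here is a small sketch on a made-up sample file. The sample lines and variable names are assumptions for illustration only. I use grep -E, which is the same thing as egrep, and [[:blank:]] to stand for "space or tab".

```shell
# Hypothetical sample with every kind of line discussed in the steps above
sample=$(mktemp)
printf '%s\n' \
  '# top-level comment' \
  'http_port 3128' \
  '' \
  '   ' \
  '  # space-indented comment' \
  'acl Safe_ports port 80 # http' \
  > "$sample"
printf '\t# tab-indented comment\n' >> "$sample"

# Drop comment lines and blank lines, keep everything else
settings=$(grep -Ev '(^[[:blank:]]*#|^[[:blank:]]*$)' "$sample")
printf '%s\n' "$settings"
# prints:
# http_port 3128
# acl Safe_ports port 80 # http

rm -f "$sample"
```

Note that the setting with the trailing "# http" comment survives, which is exactly what step 1 got wrong.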

If your grep understands the Perl-style shorthand \s (GNU grep does; it is an extension, not part of POSIX), you can write it in a more compact syntax:
egrep -v '(^\s*#|^\s*$)' /etc/squid/squid.conf
\s is equivalent to [[:space:]], the set of whitespace characters: space, tab, carriage return, newline, vertical tab and form feed.
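
Since \s is not available everywhere, a quick way to find out which whitespace notation your grep accepts is to try each one on a tab-indented comment. This is only a sketch: the helper name matches_tab_comment is made up, and the result for \s depends on your grep, so I do not show an expected output for it.

```shell
# Hypothetical helper: succeeds if the pattern matches a tab-indented comment
matches_tab_comment() {
  printf '\t# tab-indented comment\n' | grep -Eq "$1"
}

for pat in '^[[:blank:]]*#' '^[[:space:]]*#' '^\s*#'; do
  if matches_tab_comment "$pat"; then
    echo "$pat: matches"
  else
    echo "$pat: no match with this grep"
  fi
done
```

The two POSIX classes should match with any POSIX-conforming grep; if the \s line reports no match, stick to [[:space:]].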

Regular expressions are definitely a life saver if you need to manipulate data. Did you know that lots of other commands have regular expression support built in? Run this to find out which commands (section 1 of the manual) mention it:

cd /usr/share/man/man1
for i in *.gz
do
    zgrep -li regexp "$i"
done

BTW, sed (stream editor) can do the same job without an inverted match (-v), by deleting (d) the matching lines instead (\s is again a GNU extension):
sed -e '/^\s*#/d;/^\s*$/d' /etc/squid/squid.conf
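
As a sanity check, the two approaches can be compared on the same input. This is a minimal sketch on a made-up sample; the sample lines and variable names are assumptions, and I use [[:space:]] rather than \s so it does not depend on GNU extensions.

```shell
# Hypothetical sample file
sample=$(mktemp)
printf '%s\n' \
  '# comment' \
  '' \
  'http_port 3128' \
  '  # indented comment' \
  'acl Safe_ports port 80 # http' \
  > "$sample"

# grep keeps the lines that do NOT match; sed deletes the lines that DO match
with_grep=$(grep -Ev '(^[[:space:]]*#|^[[:space:]]*$)' "$sample")
with_sed=$(sed -e '/^[[:space:]]*#/d' -e '/^[[:space:]]*$/d' "$sample")

if [ "$with_grep" = "$with_sed" ]; then
  echo "grep and sed agree"
else
  echo "grep and sed differ"
fi
# prints: grep and sed agree

rm -f "$sample"
```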
