Saturday, May 23, 2009

Highlight Those Files With Space, Backslash or Non-Printable Characters

In UNIX, if a filename contains a space, a backslash or non-printable characters, you may run into problems with applications that do not handle such characters properly. Recently I realised that NetBackup fails to back up files whose names end with a space. A space is a printable character, but a trailing one is pretty hard to spot when you simply do an ls listing.
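To illustrate how a trailing space hides in a normal listing, here is a small sketch that marks the end of each line with a $ sign. It relies on the l command of sed, which prints each line in an unambiguous form; the function name show_line_ends is only an illustrative choice, not something from the original post.

```shell
# Hypothetical helper: print each input line in sed's unambiguous "l" form,
# which appends "$" at the end of the line so trailing spaces become visible.
show_line_ends() {
    sed -n l
}

# Usage: pipe a one-entry-per-line listing through it, for example
#   ls -1 | show_line_ends
```

A name that ends with a space then appears with a visible gap before the closing $ sign.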

ls has a -b flag that prints non-printable characters in octal \ddd notation. Combined with -R, it lists subdirectories recursively, and with the -1 (minus one) option it prints one entry per line. The output of ls -1Rb can then be piped to grep with a regular expression to single out the problematic filenames.
ls -1Rb | egrep '\\[0-7][0-7][0-7]|[\\ ]' will grab filenames containing a \ddd octal escape, a backslash or a space.
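As a rough sketch of that pipeline in action, the snippet below creates a scratch directory with one ordinary file and one file whose name ends with a space, then runs the filter. The function name find_odd_names and the use of mktemp -d are my own assumptions, and grep -E stands in for egrep, which newer GNU grep releases flag as obsolescent. Note that how ls -b renders a space differs between implementations (GNU ls prints it with a leading backslash), so the exact text of the matching line may vary.

```shell
# Hypothetical helper: list entries under a directory whose names contain a
# space, a backslash, or an octal escape as printed by ls -b.
find_odd_names() {
    ( cd "${1:-.}" && ls -1Rb | grep -E '\\[0-7][0-7][0-7]|[\\ ]' )
}

# Demo in a scratch directory (assumes mktemp -d is available)
demo=$(mktemp -d)
touch "$demo/plain.txt" "$demo/trailing "
find_odd_names "$demo"      # only the trailing-space entry should be listed
rm -rf "$demo"
```

Running the listing from inside the directory keeps the directory header as ".:", so the header itself cannot trigger a false match.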

You can even give those special characters some colour, which I have blogged about before. Here is a script that highlights these characters. ^[ means "Escape"; to enter it, type Ctrl-V followed by Esc.

#! /bin/sh


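# List recursively, one entry per line, escaping non-printables (-b)
ls -1Rb "${1:-.}" | \
nawk '
# A line ending in ":" is a directory header: remember it, minus the colon
/:$/ {
        sub(/:$/,"")
        d=$0
        next
}
# Any other non-empty line is an entry: prefix it with its directory
$0 != "" {
        printf("%s/%s\n",d,$0)
}' | \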
egrep '\\[0-7][0-7][0-7]|[\\ ]' | \
sed '
# non-printable character in octal \ddd
s/\(\\[0-7][0-7][0-7]\)/^[[31m\1^[[0m/g

# space
s/\([ ]\)/^[[42m\1^[[0m/g

# backslash but not \ddd in octal
s/\(\\\)\([^0-7][^0-7][^0-7]\)/^[[34m\1^[[0m\2/g
'
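If typing a literal Escape with Ctrl-V Esc is inconvenient, or the character gets lost when the script is copied and pasted, one alternative is to let printf generate it. The following is only a sketch of that idea: the variable esc and the function highlight are names I made up, and it covers just the octal and space rules from the script above.

```shell
# Build the ESC character with printf instead of typing Ctrl-V Esc.
esc=$(printf '\033')

# Hypothetical helper: colour \ddd octal escapes red and spaces with a
# green background on each line read from stdin.
highlight() {
    sed -e "s/\\\\[0-7][0-7][0-7]/${esc}[31m&${esc}[0m/g" \
        -e "s/ /${esc}[42m&${esc}[0m/g"
}

# Usage: ls -1Rb | grep -E '\\[0-7][0-7][0-7]|[\\ ]' | highlight
```

Because the sed expressions are in double quotes so that ${esc} expands, each literal backslash in the pattern has to be written as four backslashes.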


1 Comments:

Blogger Raymond Tay said...

I love this tip!

9:29 PM  
