Friday, August 03, 2007

Sort Files Appended with "DD-Mth-YYYY"

If you make backup copies of a file by appending the date to its name, use a date format like yyyy-mm-dd (2007-07-07) rather than dd-Mth-yyyy (7-Jul-2007). Once you have tonnes of these files, you will want to list them in date order, and plain alphabetical sorting only gives you date order with "yyyy-mm-dd", not with "dd-Mth-yyyy".
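
To illustrate the point, here is a minimal sketch of appending the date in "yyyy-mm-dd" form and of how such names behave under a plain sort. The helper name backup_name and the sample file names are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: build a backup name by appending today's date
# in yyyy-mm-dd form, e.g. index.php.2007-07-07
backup_name() {
    printf '%s.%s\n' "$1" "$(date +%Y-%m-%d)"
}

# Typical use would be something like: cp index.php "$(backup_name index.php)"
backup_name index.php

# With this format, plain alphabetical order is also chronological order.
printf '%s\n' index.php.2007-10-01 index.php.2007-04-29 index.php.2007-07-07 | sort
```

The year comes first, then the zero-padded month and day, so comparing the names character by character is the same as comparing the dates.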

OK, but how can I sort existing "dd-Mth-yyyy" files by date rather than in alphabetical order?

Let's create lots of files with this naming scheme. You can do this with brace expansion in the Bash shell:

$ touch index.php.{1,23,17,29,12,4,6,8}-{jul,aug,jun,apr,may,oct}-2007

$ ls
index.php.1-apr-2007   index.php.17-may-2007  index.php.4-jul-2007
index.php.1-aug-2007   index.php.17-oct-2007  index.php.4-jun-2007
index.php.1-jul-2007   index.php.23-apr-2007  index.php.4-may-2007
index.php.1-jun-2007   index.php.23-aug-2007  index.php.4-oct-2007
index.php.1-may-2007   index.php.23-jul-2007  index.php.6-apr-2007
index.php.1-oct-2007   index.php.23-jun-2007  index.php.6-aug-2007
index.php.12-apr-2007  index.php.23-may-2007  index.php.6-jul-2007
index.php.12-aug-2007  index.php.23-oct-2007  index.php.6-jun-2007
index.php.12-jul-2007  index.php.29-apr-2007  index.php.6-may-2007
index.php.12-jun-2007  index.php.29-aug-2007  index.php.6-oct-2007
index.php.12-may-2007  index.php.29-jul-2007  index.php.8-apr-2007
index.php.12-oct-2007  index.php.29-jun-2007  index.php.8-aug-2007
index.php.17-apr-2007  index.php.29-may-2007  index.php.8-jul-2007
index.php.17-aug-2007  index.php.29-oct-2007  index.php.8-jun-2007
index.php.17-jul-2007  index.php.4-apr-2007   index.php.8-may-2007
index.php.17-jun-2007  index.php.4-aug-2007   index.php.8-oct-2007

$ echo $SHELL
/bin/bash

Let's see what the normal sort tool gives us:

$ ls -1 | sort
index.php.1-apr-2007
index.php.1-aug-2007
index.php.1-jul-2007
index.php.1-jun-2007
index.php.1-may-2007
index.php.1-oct-2007
index.php.12-apr-2007
index.php.12-aug-2007
index.php.12-jul-2007
index.php.12-jun-2007
index.php.12-may-2007
index.php.12-oct-2007
index.php.17-apr-2007
index.php.17-aug-2007
index.php.17-jul-2007
index.php.17-jun-2007
index.php.17-may-2007
index.php.17-oct-2007
index.php.23-apr-2007
index.php.23-aug-2007
index.php.23-jul-2007
index.php.23-jun-2007
index.php.23-may-2007
index.php.23-oct-2007
index.php.29-apr-2007
index.php.29-aug-2007
index.php.29-jul-2007
index.php.29-jun-2007
index.php.29-may-2007
index.php.29-oct-2007
index.php.4-apr-2007
index.php.4-aug-2007
index.php.4-jul-2007
index.php.4-jun-2007
index.php.4-may-2007
index.php.4-oct-2007
index.php.6-apr-2007
index.php.6-aug-2007
index.php.6-jul-2007
index.php.6-jun-2007
index.php.6-may-2007
index.php.6-oct-2007
index.php.8-apr-2007
index.php.8-aug-2007
index.php.8-jul-2007
index.php.8-jun-2007
index.php.8-may-2007
index.php.8-oct-2007

Hey, that's not what I want. I know. The script below does the sort properly. It builds an associative array in AWK with "yyyy-mm-dd" as the index and the original file name as the value. In the END block, it prints each index followed by its value. That output is sorted on the index (the first field) and piped to a second AWK, which prints only the second field, the original file name. The result is the list of file names sorted by date.

$ cat sort.sh
#! /bin/sh


if [ $# -ne 1 ]; then
      echo "Usage: $0 <prefix>"
      exit 1
fi
prefix=$1


PATH=/usr/bin:/bin
export PATH


ls -1 | awk '
BEGIN {
      map["jan"]="01"; map["feb"]="02"; map["mar"]="03";
      map["apr"]="04"; map["may"]="05"; map["jun"]="06";
      map["jul"]="07"; map["aug"]="08"; map["sep"]="09";
      map["oct"]="10"; map["nov"]="11"; map["dec"]="12";
}
/^'$prefix'/ {
      filename=$0
      sub("^'$prefix'","",filename)
      split(filename,a,"-")
      mth=tolower(a[2])
      ind=sprintf("%s-%s-%02d",a[3],map[mth],a[1])
      sort[ind]=$0
}
END {
      for (i in sort) {
              print i, sort[i]
      }
}' | sort -k 1,1 | awk '{print $2}'

$ ./sort.sh index.php.
index.php.1-apr-2007
index.php.4-apr-2007
index.php.6-apr-2007
index.php.8-apr-2007
index.php.12-apr-2007
index.php.17-apr-2007
index.php.23-apr-2007
index.php.29-apr-2007
index.php.1-may-2007
index.php.4-may-2007
index.php.6-may-2007
index.php.8-may-2007
index.php.12-may-2007
index.php.17-may-2007
index.php.23-may-2007
index.php.29-may-2007
index.php.1-jun-2007
index.php.4-jun-2007
index.php.6-jun-2007
index.php.8-jun-2007
index.php.12-jun-2007
index.php.17-jun-2007
index.php.23-jun-2007
index.php.29-jun-2007
index.php.1-jul-2007
index.php.4-jul-2007
index.php.6-jul-2007
index.php.8-jul-2007
index.php.12-jul-2007
index.php.17-jul-2007
index.php.23-jul-2007
index.php.29-jul-2007
index.php.1-aug-2007
index.php.4-aug-2007
index.php.6-aug-2007
index.php.8-aug-2007
index.php.12-aug-2007
index.php.17-aug-2007
index.php.23-aug-2007
index.php.29-aug-2007
index.php.1-oct-2007
index.php.4-oct-2007
index.php.6-oct-2007
index.php.8-oct-2007
index.php.12-oct-2007
index.php.17-oct-2007
index.php.23-oct-2007
index.php.29-oct-2007
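
For comparison, here is a sketch of the same decorate, sort, undecorate idea without the associative array, so that two names mapping to the same date cannot overwrite each other. It is only a variation, not the script above: the function name sort_by_date is hypothetical, the prefix is passed in with awk -v instead of being spliced into the program, and the names are read from standard input.

```shell
#!/bin/sh
# Sketch only: sort_by_date is a hypothetical name, not part of the original script.
# Reads file names on stdin and prints those starting with the prefix in date order.
sort_by_date() {
    awk -v prefix="$1" '
    BEGIN {
        # month abbreviation -> two-digit month number
        n = split("jan feb mar apr may jun jul aug sep oct nov dec", m, " ")
        for (i = 1; i <= n; i++) map[m[i]] = sprintf("%02d", i)
    }
    # literal prefix match at position 1, so "." is not treated as a regex wildcard
    index($0, prefix) == 1 {
        rest = substr($0, length(prefix) + 1)   # e.g. 7-Jul-2007
        if (split(rest, a, "-") != 3) next      # skip names without dd-Mth-yyyy
        mth = tolower(a[2])
        if (!(mth in map)) next                 # skip unknown month names
        # decorate: yyyy-mm-dd key, a space, then the original name
        printf "%s-%s-%02d %s\n", a[3], map[mth], a[1], $0
    }' | sort -k 1,1 | cut -d " " -f 2-
}

printf '%s\n' index.php.17-oct-2007 index.php.4-jul-2007 index.php.23-apr-2007 |
    sort_by_date index.php.
```

In a directory of such files it would be called as "ls -1 | sort_by_date index.php.". Like the original, it assumes the prefix itself contains no "-" after the point where the date starts.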

BTW, all of the above was tested on Cygwin. In case you are not familiar with AWK: do not use "index" as a variable name. "index" is a built-in AWK function, and AWK will complain with a syntax error without telling you so. I initially used "index" and have since replaced it with "ind". This is what the error looked like:

$ ./sort.sh index.php.
awk: cmd. line:18:      index=sprintf("%s-%s-%02d",a[3],mth,a[1])
awk: cmd. line:18:           ^ syntax error
awk: cmd. line:19:      sort[index]=$0
awk: cmd. line:19:                ^ syntax error
awk: cmd. line:19: fatal: invalid subscript expression
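
A small sketch of why the name is taken: index(s, t) is the built-in AWK function that returns the position of one string inside another. The sample strings below are made up for the illustration.

```shell
#!/bin/sh
# index(s, t) is a built-in AWK function: it returns the 1-based position
# of string t inside string s, or 0 if t does not occur.
awk 'BEGIN { print index("index.php.7-Jul-2007", "php") }'   # prints 7

# Because the name is reserved, use a different variable name such as "ind".
awk 'BEGIN { ind = sprintf("%s-%s-%02d", "2007", "07", 7); print ind }'   # prints 2007-07-07
```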
