Friday, March 23, 2007

ZFS on 48 Disks without X4500

For the past few months, I have had the opportunity to work with a number of SunFire X4500 (a.k.a. Thumper) servers running the latest Solaris 10 11/06, with raidz2 and spares implemented in ZFS. Once the implementation for the customer was done, I no longer had a chance to 'play' with them. Even if I had the opportunity, it would be unwise to try out all the cool stuff in Solaris 10 on a customer's production server.

So, how can I simulate an environment with 48 disks using an old Sun Netra T1 105? My T1 configuration is:

  • Memory: 256 MB
  • Disk: 2x 18GB (all partitions are mirrored)
  • CPU: 1x 440MHz UltraSPARC-IIi
  • Patch: Recommended and Security Patches, Mar 12 2007, in particular these patches
    • 124204 - zfs memory leak for large file
    • 120068 - vulnerability in telnetd

Make 48 disks (files) with mkfile(1M):

# mkdir /zdisk

# cd /zdisk

# for i in c{0,1,2,3,4,5}t{0,1,2,3,4,5,6,7}d0
do
  mkfile 100m $i
done

# ls /zdisk
c0t0d0  c0t5d0  c1t2d0  c1t7d0  c2t4d0  c3t1d0  c3t6d0  c4t3d0  c5t0d0  c5t5d0
c0t1d0  c0t6d0  c1t3d0  c2t0d0  c2t5d0  c3t2d0  c3t7d0  c4t4d0  c5t1d0  c5t6d0
c0t2d0  c0t7d0  c1t4d0  c2t1d0  c2t6d0  c3t3d0  c4t0d0  c4t5d0  c5t2d0  c5t7d0
c0t3d0  c1t0d0  c1t5d0  c2t2d0  c2t7d0  c3t4d0  c4t1d0  c4t6d0  c5t3d0
c0t4d0  c1t1d0  c1t6d0  c2t3d0  c3t0d0  c3t5d0  c4t2d0  c4t7d0  c5t4d0
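
The loop above relies on brace expansion, which bash, ksh93 and zsh provide but plain POSIX sh does not. As a minimal sketch, assuming a shell without brace expansion, the same 48 names can be produced with two nested loops. The scratch directory from mktemp and the empty placeholder files are assumptions made so the sketch runs anywhere; on Solaris the placeholder line would be the mkfile call from the post.

```shell
#!/bin/sh
# Generate the 48 pseudo-disk names with nested loops (no brace expansion needed).
# workdir is a throwaway directory standing in for /zdisk (an assumption for this sketch).
workdir=$(mktemp -d)
count=0
for c in 0 1 2 3 4 5; do            # six "controllers"
  for t in 0 1 2 3 4 5 6 7; do      # eight "targets" per controller
    name="c${c}t${t}d0"
    : > "$workdir/$name"            # empty placeholder; the post uses: mkfile 100m "$name"
    count=$((count + 1))
  done
done
echo "created $count placeholder files in $workdir"
```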

Create a pool of 7 RAIDZ2 (double-parity) groups of 6 disks each, with the remaining 6 disks as hot spares. Each RAIDZ2 group cuts across all the controllers, thanks to the Joyeur blog.

# zpool create zpool \
raidz2 /zdisk/c{0,1,2,3,4,5}t0d0 \
raidz2 /zdisk/c{0,1,2,3,4,5}t1d0 \
raidz2 /zdisk/c{0,1,2,3,4,5}t2d0 \
raidz2 /zdisk/c{0,1,2,3,4,5}t3d0 \
raidz2 /zdisk/c{0,1,2,3,4,5}t4d0 \
raidz2 /zdisk/c{0,1,2,3,4,5}t5d0 \
raidz2 /zdisk/c{0,1,2,3,4,5}t6d0 \
spare  /zdisk/c{0,1,2,3,4,5}t7d0
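
To make the layout easier to check before touching a real pool, here is a dry-run sketch that only assembles and prints the same argument list. It does not call zpool at all. The variable names (zdisk, cmd, vdevs, devices) are made up for this illustration.

```shell
#!/bin/sh
# Dry run: assemble the zpool create arguments and print them instead of running zpool.
# "zdisk" mirrors the /zdisk directory used in the post.
zdisk=/zdisk
cmd="zpool create zpool"
for t in 0 1 2 3 4 5 6; do                 # targets 0-6 each become one raidz2 group
  cmd="$cmd raidz2"
  for c in 0 1 2 3 4 5; do                 # one member per controller
    cmd="$cmd $zdisk/c${c}t${t}d0"
  done
done
cmd="$cmd spare"
for c in 0 1 2 3 4 5; do                   # target 7 on every controller is a hot spare
  cmd="$cmd $zdisk/c${c}t7d0"
done
echo "$cmd"
# Count the groups and devices in the assembled command line.
vdevs=$(echo "$cmd" | tr ' ' '\n' | grep -c '^raidz2$')
devices=$(echo "$cmd" | tr ' ' '\n' | grep -c "^$zdisk/")
echo "raidz2 groups: $vdevs, devices: $devices"
```

Every raidz2 group lists one device from each of c0 to c5, so losing a whole controller would cost each group only one member.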

# zpool list
NAME                    SIZE    USED   AVAIL    CAP  HEALTH     ALTROOT
zpool                  3.91G    288K   3.91G     0%  ONLINE     -

# zpool status
  pool: zpool
 state: ONLINE
 scrub: none requested
config:

        NAME               STATE     READ WRITE CKSUM
        zpool              ONLINE       0     0     0
          raidz2           ONLINE       0     0     0
            /zdisk/c0t0d0  ONLINE       0     0     0
            /zdisk/c1t0d0  ONLINE       0     0     0
            /zdisk/c2t0d0  ONLINE       0     0     0
            /zdisk/c3t0d0  ONLINE       0     0     0
            /zdisk/c4t0d0  ONLINE       0     0     0
            /zdisk/c5t0d0  ONLINE       0     0     0
          raidz2           ONLINE       0     0     0
            /zdisk/c0t1d0  ONLINE       0     0     0
            /zdisk/c1t1d0  ONLINE       0     0     0
            /zdisk/c2t1d0  ONLINE       0     0     0
            /zdisk/c3t1d0  ONLINE       0     0     0
            /zdisk/c4t1d0  ONLINE       0     0     0
            /zdisk/c5t1d0  ONLINE       0     0     0
          raidz2           ONLINE       0     0     0
            /zdisk/c0t2d0  ONLINE       0     0     0
            /zdisk/c1t2d0  ONLINE       0     0     0
            /zdisk/c2t2d0  ONLINE       0     0     0
            /zdisk/c3t2d0  ONLINE       0     0     0
            /zdisk/c4t2d0  ONLINE       0     0     0
            /zdisk/c5t2d0  ONLINE       0     0     0
          raidz2           ONLINE       0     0     0
            /zdisk/c0t3d0  ONLINE       0     0     0
            /zdisk/c1t3d0  ONLINE       0     0     0
            /zdisk/c2t3d0  ONLINE       0     0     0
            /zdisk/c3t3d0  ONLINE       0     0     0
            /zdisk/c4t3d0  ONLINE       0     0     0
            /zdisk/c5t3d0  ONLINE       0     0     0
          raidz2           ONLINE       0     0     0
            /zdisk/c0t4d0  ONLINE       0     0     0
            /zdisk/c1t4d0  ONLINE       0     0     0
            /zdisk/c2t4d0  ONLINE       0     0     0
            /zdisk/c3t4d0  ONLINE       0     0     0
            /zdisk/c4t4d0  ONLINE       0     0     0
            /zdisk/c5t4d0  ONLINE       0     0     0
          raidz2           ONLINE       0     0     0
            /zdisk/c0t5d0  ONLINE       0     0     0
            /zdisk/c1t5d0  ONLINE       0     0     0
            /zdisk/c2t5d0  ONLINE       0     0     0
            /zdisk/c3t5d0  ONLINE       0     0     0
            /zdisk/c4t5d0  ONLINE       0     0     0
            /zdisk/c5t5d0  ONLINE       0     0     0
          raidz2           ONLINE       0     0     0
            /zdisk/c0t6d0  ONLINE       0     0     0
            /zdisk/c1t6d0  ONLINE       0     0     0
            /zdisk/c2t6d0  ONLINE       0     0     0
            /zdisk/c3t6d0  ONLINE       0     0     0
            /zdisk/c4t6d0  ONLINE       0     0     0
            /zdisk/c5t6d0  ONLINE       0     0     0
        spares
          /zdisk/c0t7d0    AVAIL
          /zdisk/c1t7d0    AVAIL
          /zdisk/c2t7d0    AVAIL
          /zdisk/c3t7d0    AVAIL
          /zdisk/c4t7d0    AVAIL
          /zdisk/c5t7d0    AVAIL

errors: No known data errors
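
A rough capacity check for this layout, as a sketch. As I understand it, zpool list reports raw space including parity for raidz pools, which would explain why 3.91G is close to the raw total of 42 x 100 MB. The small shortfall is presumably per-device label and reserved space, but I have not verified that here. The usable figure below is only the simple (width - parity) estimate.

```shell
#!/bin/sh
# Back-of-the-envelope capacity for the layout above (sizes in MB).
disk_mb=100      # size passed to mkfile
vdevs=7          # raidz2 groups
width=6          # devices per group
parity=2         # raidz2 spends two devices' worth of space per group on parity
raw_mb=$((vdevs * width * disk_mb))
usable_mb=$((vdevs * (width - parity) * disk_mb))
echo "raw: ${raw_mb} MB, approx usable: ${usable_mb} MB"
```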

Let's go for a test drive with ZFS. First, I will create a ZFS file system (zfs1) without compression (the default) and simulate a corrupted disk. We then 'scrub' the pool and 'replace' the corrupted disk with a new one. The MD5 hash of the file created before the corruption stays the same throughout the whole process (before the corruption, after the corruption, and after replacing the faulty disk).

# zfs create zpool/zfs1

# dd if=/dev/urandom of=/zpool/zfs1/somefile.bin bs=1024 count=1000
1000+0 records in
1000+0 records out

# digest -a md5 /zpool/zfs1/somefile.bin
c61163bc590222cfbc0576b933b9ba53

# dd if=/dev/zero of=/zdisk/c5t6d0 bs=1024 count=10
10+0 records in
10+0 records out

# zpool scrub zpool

# zpool status
  pool: zpool
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
 scrub: resilver stopped with 0 errors on Fri Mar 23 08:54:20 2007
config:

        NAME                 STATE     READ WRITE CKSUM
        zpool                DEGRADED     0     0     0
          raidz2             ONLINE       0     0     0
            /zdisk/c0t0d0    ONLINE       0     0     0
            /zdisk/c1t0d0    ONLINE       0     0     0
            /zdisk/c2t0d0    ONLINE       0     0     0
            /zdisk/c3t0d0    ONLINE       0     0     0
            /zdisk/c4t0d0    ONLINE       0     0     0
            /zdisk/c5t0d0    ONLINE       0     0     0
          raidz2             ONLINE       0     0     0
            /zdisk/c0t1d0    ONLINE       0     0     0
            /zdisk/c1t1d0    ONLINE       0     0     0
            /zdisk/c2t1d0    ONLINE       0     0     0
            /zdisk/c3t1d0    ONLINE       0     0     0
            /zdisk/c4t1d0    ONLINE       0     0     0
            /zdisk/c5t1d0    ONLINE       0     0     0
          raidz2             ONLINE       0     0     0
            /zdisk/c0t2d0    ONLINE       0     0     0
            /zdisk/c1t2d0    ONLINE       0     0     0
            /zdisk/c2t2d0    ONLINE       0     0     0
            /zdisk/c3t2d0    ONLINE       0     0     0
            /zdisk/c4t2d0    ONLINE       0     0     0
            /zdisk/c5t2d0    ONLINE       0     0     0
          raidz2             ONLINE       0     0     0
            /zdisk/c0t3d0    ONLINE       0     0     0
            /zdisk/c1t3d0    ONLINE       0     0     0
            /zdisk/c2t3d0    ONLINE       0     0     0
            /zdisk/c3t3d0    ONLINE       0     0     0
            /zdisk/c4t3d0    ONLINE       0     0     0
            /zdisk/c5t3d0    ONLINE       0     0     0
          raidz2             ONLINE       0     0     0
            /zdisk/c0t4d0    ONLINE       0     0     0
            /zdisk/c1t4d0    ONLINE       0     0     0
            /zdisk/c2t4d0    ONLINE       0     0     0
            /zdisk/c3t4d0    ONLINE       0     0     0
            /zdisk/c4t4d0    ONLINE       0     0     0
            /zdisk/c5t4d0    ONLINE       0     0     0
          raidz2             ONLINE       0     0     0
            /zdisk/c0t5d0    ONLINE       0     0     0
            /zdisk/c1t5d0    ONLINE       0     0     0
            /zdisk/c2t5d0    ONLINE       0     0     0
            /zdisk/c3t5d0    ONLINE       0     0     0
            /zdisk/c4t5d0    ONLINE       0     0     0
            /zdisk/c5t5d0    ONLINE       0     0     0
          raidz2             DEGRADED     0     0     0
            /zdisk/c0t6d0    ONLINE       0     0     0
            /zdisk/c1t6d0    ONLINE       0     0     0
            /zdisk/c2t6d0    ONLINE       0     0     0
            /zdisk/c3t6d0    ONLINE       0     0     0
            /zdisk/c4t6d0    ONLINE       0     0     0
            spare            DEGRADED     0     0     0
              /zdisk/c5t6d0  UNAVAIL      0     0     0  corrupted data
              /zdisk/c0t7d0  ONLINE       0     0     0
        spares
          /zdisk/c0t7d0      INUSE     currently in use
          /zdisk/c1t7d0      AVAIL
          /zdisk/c2t7d0      AVAIL
          /zdisk/c3t7d0      AVAIL
          /zdisk/c4t7d0      AVAIL
          /zdisk/c5t7d0      AVAIL

errors: No known data errors

# digest -a md5 /zpool/zfs1/somefile.bin
c61163bc590222cfbc0576b933b9ba53

# mkfile 100m /zdisk/newdisk

# zpool replace zpool /zdisk/c5t6d0  /zdisk/newdisk

# zpool status
  pool: zpool
 state: DEGRADED
 scrub: resilver completed with 0 errors on Fri Mar 23 08:57:26 2007
config:

        NAME                    STATE     READ WRITE CKSUM
        zpool                   DEGRADED     0     0     0
          raidz2                ONLINE       0     0     0
            /zdisk/c0t0d0       ONLINE       0     0     0
            /zdisk/c1t0d0       ONLINE       0     0     0
            /zdisk/c2t0d0       ONLINE       0     0     0
            /zdisk/c3t0d0       ONLINE       0     0     0
            /zdisk/c4t0d0       ONLINE       0     0     0
            /zdisk/c5t0d0       ONLINE       0     0     0
          raidz2                ONLINE       0     0     0
            /zdisk/c0t1d0       ONLINE       0     0     0
            /zdisk/c1t1d0       ONLINE       0     0     0
            /zdisk/c2t1d0       ONLINE       0     0     0
            /zdisk/c3t1d0       ONLINE       0     0     0
            /zdisk/c4t1d0       ONLINE       0     0     0
            /zdisk/c5t1d0       ONLINE       0     0     0
          raidz2                ONLINE       0     0     0
            /zdisk/c0t2d0       ONLINE       0     0     0
            /zdisk/c1t2d0       ONLINE       0     0     0
            /zdisk/c2t2d0       ONLINE       0     0     0
            /zdisk/c3t2d0       ONLINE       0     0     0
            /zdisk/c4t2d0       ONLINE       0     0     0
            /zdisk/c5t2d0       ONLINE       0     0     0
          raidz2                ONLINE       0     0     0
            /zdisk/c0t3d0       ONLINE       0     0     0
            /zdisk/c1t3d0       ONLINE       0     0     0
            /zdisk/c2t3d0       ONLINE       0     0     0
            /zdisk/c3t3d0       ONLINE       0     0     0
            /zdisk/c4t3d0       ONLINE       0     0     0
            /zdisk/c5t3d0       ONLINE       0     0     0
          raidz2                ONLINE       0     0     0
            /zdisk/c0t4d0       ONLINE       0     0     0
            /zdisk/c1t4d0       ONLINE       0     0     0
            /zdisk/c2t4d0       ONLINE       0     0     0
            /zdisk/c3t4d0       ONLINE       0     0     0
            /zdisk/c4t4d0       ONLINE       0     0     0
            /zdisk/c5t4d0       ONLINE       0     0     0
          raidz2                ONLINE       0     0     0
            /zdisk/c0t5d0       ONLINE       0     0     0
            /zdisk/c1t5d0       ONLINE       0     0     0
            /zdisk/c2t5d0       ONLINE       0     0     0
            /zdisk/c3t5d0       ONLINE       0     0     0
            /zdisk/c4t5d0       ONLINE       0     0     0
            /zdisk/c5t5d0       ONLINE       0     0     0
          raidz2                DEGRADED     0     0     0
            /zdisk/c0t6d0       ONLINE       0     0     0
            /zdisk/c1t6d0       ONLINE       0     0     0
            /zdisk/c2t6d0       ONLINE       0     0     0
            /zdisk/c3t6d0       ONLINE       0     0     0
            /zdisk/c4t6d0       ONLINE       0     0     0
            spare               DEGRADED     0     0     0
              replacing         DEGRADED     0     0     0
                /zdisk/c5t6d0   UNAVAIL      0     0     0  corrupted data
                /zdisk/newdisk  ONLINE       0     0     0
              /zdisk/c0t7d0     ONLINE       0     0     0
        spares
          /zdisk/c0t7d0         INUSE     currently in use
          /zdisk/c1t7d0         AVAIL
          /zdisk/c2t7d0         AVAIL
          /zdisk/c3t7d0         AVAIL
          /zdisk/c4t7d0         AVAIL
          /zdisk/c5t7d0         AVAIL

errors: No known data errors

# zpool status
  pool: zpool
 state: ONLINE
 scrub: resilver completed with 0 errors on Fri Mar 23 08:57:26 2007
config:

        NAME                STATE     READ WRITE CKSUM
        zpool               ONLINE       0     0     0
          raidz2            ONLINE       0     0     0
            /zdisk/c0t0d0   ONLINE       0     0     0
            /zdisk/c1t0d0   ONLINE       0     0     0
            /zdisk/c2t0d0   ONLINE       0     0     0
            /zdisk/c3t0d0   ONLINE       0     0     0
            /zdisk/c4t0d0   ONLINE       0     0     0
            /zdisk/c5t0d0   ONLINE       0     0     0
          raidz2            ONLINE       0     0     0
            /zdisk/c0t1d0   ONLINE       0     0     0
            /zdisk/c1t1d0   ONLINE       0     0     0
            /zdisk/c2t1d0   ONLINE       0     0     0
            /zdisk/c3t1d0   ONLINE       0     0     0
            /zdisk/c4t1d0   ONLINE       0     0     0
            /zdisk/c5t1d0   ONLINE       0     0     0
          raidz2            ONLINE       0     0     0
            /zdisk/c0t2d0   ONLINE       0     0     0
            /zdisk/c1t2d0   ONLINE       0     0     0
            /zdisk/c2t2d0   ONLINE       0     0     0
            /zdisk/c3t2d0   ONLINE       0     0     0
            /zdisk/c4t2d0   ONLINE       0     0     0
            /zdisk/c5t2d0   ONLINE       0     0     0
          raidz2            ONLINE       0     0     0
            /zdisk/c0t3d0   ONLINE       0     0     0
            /zdisk/c1t3d0   ONLINE       0     0     0
            /zdisk/c2t3d0   ONLINE       0     0     0
            /zdisk/c3t3d0   ONLINE       0     0     0
            /zdisk/c4t3d0   ONLINE       0     0     0
            /zdisk/c5t3d0   ONLINE       0     0     0
          raidz2            ONLINE       0     0     0
            /zdisk/c0t4d0   ONLINE       0     0     0
            /zdisk/c1t4d0   ONLINE       0     0     0
            /zdisk/c2t4d0   ONLINE       0     0     0
            /zdisk/c3t4d0   ONLINE       0     0     0
            /zdisk/c4t4d0   ONLINE       0     0     0
            /zdisk/c5t4d0   ONLINE       0     0     0
          raidz2            ONLINE       0     0     0
            /zdisk/c0t5d0   ONLINE       0     0     0
            /zdisk/c1t5d0   ONLINE       0     0     0
            /zdisk/c2t5d0   ONLINE       0     0     0
            /zdisk/c3t5d0   ONLINE       0     0     0
            /zdisk/c4t5d0   ONLINE       0     0     0
            /zdisk/c5t5d0   ONLINE       0     0     0
          raidz2            ONLINE       0     0     0
            /zdisk/c0t6d0   ONLINE       0     0     0
            /zdisk/c1t6d0   ONLINE       0     0     0
            /zdisk/c2t6d0   ONLINE       0     0     0
            /zdisk/c3t6d0   ONLINE       0     0     0
            /zdisk/c4t6d0   ONLINE       0     0     0
            /zdisk/newdisk  ONLINE       0     0     0
        spares
          /zdisk/c0t7d0     AVAIL
          /zdisk/c1t7d0     AVAIL
          /zdisk/c2t7d0     AVAIL
          /zdisk/c3t7d0     AVAIL
          /zdisk/c4t7d0     AVAIL
          /zdisk/c5t7d0     AVAIL

errors: No known data errors

# digest -a md5 /zpool/zfs1/somefile.bin
c61163bc590222cfbc0576b933b9ba53
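
One detail about the corruption step: dd truncates an existing output file unless conv=notrunc is given, so the dd command used above most likely shrank the 100 MB pseudo-disk to 10 KB rather than overwriting only its first 10 KB. That would fit the "label is missing or invalid" status. The sketch below shows the difference on two scratch files; the file names come from mktemp and are not part of the original setup.

```shell
#!/bin/sh
# Show how dd treats an existing output file with and without conv=notrunc.
# Both files are scratch files created by mktemp (stand-ins for a pseudo-disk).
a=$(mktemp); b=$(mktemp)
dd if=/dev/zero of="$a" bs=1024 count=1000 2>/dev/null   # ~1 MB "disk"
cp "$a" "$b"

dd if=/dev/zero of="$a" bs=1024 count=10 2>/dev/null                 # same form as in the post
dd if=/dev/zero of="$b" bs=1024 count=10 conv=notrunc 2>/dev/null    # overwrite in place

size_a=$(wc -c < "$a" | tr -d ' ')
size_b=$(wc -c < "$b" | tr -d ' ')
echo "without notrunc: $size_a bytes; with notrunc: $size_b bytes"
rm -f "$a" "$b"
```

To damage only part of a pseudo-disk while keeping its size, conv=notrunc (optionally with seek=) would be the variant to try.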

Now I am going to create another ZFS file system with compression on. The time taken to create a big file (about 100 MB) in the compressed file system is 52.112 seconds vs 49.419 seconds without compression. Also, copying the file from zfs1 into the compressed zfs2 and hashing both copies gives the same MD5 hash.

# time dd if=/dev/urandom of=/zpool/zfs1/bifile.bin bs=1024 count=100000
100000+0 records in
100000+0 records out

real    0m49.419s
user    0m0.701s
sys     0m41.169s

# zfs get compression zpool/zfs1
NAME             PROPERTY       VALUE                      SOURCE
zpool/zfs1       compression    off                        local

# zfs create zpool/zfs2

# zfs set compression=on zpool/zfs2

# time dd if=/dev/urandom of=/zpool/zfs2/bifile.bin bs=1024 count=100000
100000+0 records in
100000+0 records out

real    0m52.112s
user    0m0.697s
sys     0m40.897s
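
A caveat on this timing: data from /dev/urandom is essentially incompressible, so the run mostly measures the cost of attempting compression, not any space saving. The sketch below illustrates the point with gzip as a stand-in compressor. That is an assumption for illustration only, since ZFS uses its own algorithms. On the pool itself, zfs get compressratio zpool/zfs2 should show the achieved ratio.

```shell
#!/bin/sh
# Compare how well random data and all-zero data compress.
# gzip stands in for the ZFS compressor here (an assumption; ZFS uses its own algorithms).
n=102400   # bytes per sample (100 blocks of 1024 bytes)
rand_gz=$(dd if=/dev/urandom bs=1024 count=100 2>/dev/null | gzip -c | wc -c | tr -d ' ')
zero_gz=$(dd if=/dev/zero    bs=1024 count=100 2>/dev/null | gzip -c | wc -c | tr -d ' ')
echo "random: $n -> $rand_gz bytes; zeros: $n -> $zero_gz bytes"
```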

# cp /zpool/zfs1/bifile.bin /zpool/zfs2/bifile.bin-copy-zfs1

# digest -a md5 /zpool/zfs1/bifile.bin
b15a3f71dd6ffb937c9cbf508cb442ff

# digest -a md5 /zpool/zfs2/bifile.bin-copy-zfs1
b15a3f71dd6ffb937c9cbf508cb442ff
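
Note that the two digests above compare the zfs1 file with its copy on zfs2, not the two separately generated random files. As a sketch of the same check on any system, the snippet below verifies a copy both byte-for-byte with cmp and by checksum. cksum is used in place of Solaris digest -a md5 purely for portability, and the file names are scratch files from mktemp.

```shell
#!/bin/sh
# Verify a copy two ways: byte-for-byte with cmp, and by comparing checksums.
# cksum (POSIX CRC) stands in for Solaris "digest -a md5"; file names are scratch files.
src=$(mktemp); dst=$(mktemp)
dd if=/dev/urandom of="$src" bs=1024 count=100 2>/dev/null
cp "$src" "$dst"

if cmp -s "$src" "$dst"; then same=yes; else same=no; fi
sum_src=$(cksum < "$src")
sum_dst=$(cksum < "$dst")
echo "cmp says identical: $same"
echo "checksums: $sum_src / $sum_dst"
rm -f "$src" "$dst"
```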

Solaris 10 rocks, ZFS on Solaris 10 rocks++.

PS. I also explored IP Filter on the T1 so that I can implement a host-based firewall for my customer. The article, Using Solaris IP Filters, is a very good starting point. It is pretty easy to implement, and I tried it out with Samba.


2 Comments:

Blogger Esmond said...

You mentioned in the post that the file had the same digest value before, after, and during pool corruption. But doesn't this just mean that the file is probably not contained on a disk that got corrupted? If the file is on the corrupted disk, you would see a different digest value, right?
bstone@aspirinsoftware.com

2:56 PM  
Blogger chihungchan said...

I corrupted just 1 disk, and because the zpool is based on raidz2 (similar to RAID 6), ZFS is able to figure out which stripe is good and which one is bad. That's why the MD5 digest remains the same before, during and after the corruption.

Also, replacing the 'disk' is just a breeze.

8:07 AM  
