
Friday, July 12, 2013

Moving from (software) RAID 1 to RAID 5 and testing performance

I finally finished migrating my new N54L from a 2-disk software RAID 1 (2x2TB WD20EFRX) to a 3-disk RAID 5 (3x2TB WD20EFRX) using mdadm, following an excellent mini HOWTO. Resyncing and reshaping the RAID took several days (top speed was around 110,000K/s), but left my data untouched. After finally growing the RAID I now have two RAID devices: one with 0.5TB (md0) and one with 3.4TB (md1). Both host an EXT4 file system; the first is encrypted via LUKS, the second is not.
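
Stripped down to the essentials, the migration boils down to a sequence like the one below. This is only a rough sketch of the kind of commands the HOWTO describes, not a copy of it; /dev/sdd1 and the backup-file path are placeholders, and for the LUKS-encrypted array the final resize additionally involves cryptsetup.

$ mdadm --grow /dev/md1 --level=5          # convert the 2-disk RAID 1 into a 2-disk RAID 5
$ mdadm --add /dev/md1 /dev/sdd1           # add the partition on the new disk as a spare
$ mdadm --grow /dev/md1 --raid-devices=3 --backup-file=/root/md1-grow.backup   # reshape onto 3 disks
$ cat /proc/mdstat                         # watch resync/reshape progress
$ resize2fs /dev/md1                       # finally grow the ext4 file system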

RAID device and file system configuration

Below is a summary of the setup and the values I found to work well for my system.

3-disk RAID 5 devices and file systems

device      chunk (K)   stripe_cache_size   read_ahead_kb   type   encryption   stride   stripe-width
/dev/md0    64          4096                32768           ext4   LUKS         16       32
/dev/md1    512         16384               32768           ext4   none         128      256

I've played around with the settings a bit. Changing the chunk size on the fly, however, takes a long time (see the sketch below). /dev/md0 will contain backups, so there will probably be a mixture of small and large files; I therefore chose to test only the values 64K, 128K and 512K (the default) for this device. I left the other device untouched, as it will mainly contain large files.
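
A chunk-size change is itself done as a reshape, roughly like this (a sketch only; the backup file is a placeholder and is only required if mdadm asks for one):

$ mdadm --grow /dev/md0 --chunk=64 --backup-file=/root/md0-chunk.backup
$ cat /proc/mdstat     # the reshape rewrites every stripe and can run for hours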

Performance measurement

Below are the results of using hdparm to measure read performance. First let's take a look at the individual drives ...

$ hdparm -tT /dev/sd[bcd]
/dev/sdb:
 Timing cached reads:   3268 MB in  2.00 seconds = 1634.54 MB/sec
 Timing buffered disk reads: 438 MB in  3.00 seconds = 145.87 MB/sec

/dev/sdc:
 Timing cached reads:   3292 MB in  2.00 seconds = 1646.32 MB/sec
 Timing buffered disk reads: 392 MB in  3.01 seconds = 130.22 MB/sec

/dev/sdd:
 Timing cached reads:   3306 MB in  2.00 seconds = 1653.18 MB/sec
 Timing buffered disk reads: 436 MB in  3.00 seconds = 145.26 MB/sec

$ hdparm --direct -tT /dev/sd[bcd]
/dev/sdb:
 Timing O_DIRECT cached reads:   468 MB in  2.01 seconds = 233.17 MB/sec
 Timing O_DIRECT disk reads: 442 MB in  3.00 seconds = 147.15 MB/sec

/dev/sdc:
 Timing O_DIRECT cached reads:   468 MB in  2.00 seconds = 233.69 MB/sec
 Timing O_DIRECT disk reads: 392 MB in  3.01 seconds = 130.36 MB/sec

/dev/sdd:
 Timing O_DIRECT cached reads:   468 MB in  2.00 seconds = 233.94 MB/sec
 Timing O_DIRECT disk reads: 442 MB in  3.01 seconds = 146.93 MB/sec

... and now at the RAID devices ...

$ hdparm -tT /dev/md?
/dev/md0:
 Timing cached reads:   3320 MB in  2.00 seconds = 1660.37 MB/sec
 Timing buffered disk reads: 770 MB in  3.01 seconds = 256.05 MB/sec

/dev/md1:
 Timing cached reads:   3336 MB in  2.00 seconds = 1668.07 MB/sec
 Timing buffered disk reads: 742 MB in  3.01 seconds = 246.89 MB/sec

$ hdparm --direct -tT /dev/md?
/dev/md0:
 Timing O_DIRECT cached reads:   974 MB in  2.00 seconds = 487.08 MB/sec
 Timing O_DIRECT disk reads: 770 MB in  3.01 seconds = 256.17 MB/sec

/dev/md1:
 Timing O_DIRECT cached reads:   784 MB in  2.00 seconds = 391.18 MB/sec
 Timing O_DIRECT disk reads: 742 MB in  3.01 seconds = 246.42 MB/sec

... and now let's see what actual speed we reach using dd. First let's check the encrypted device:

RAID-5 /dev/md0 (LUKS encrypted EXT4): chunk=64K, stripe_cache_size=4096,
   readahead(blockdev)=65536, stride=16, stripe-width=32 ...

$ dd if=/dev/zero of=/mnt/md0/10g.img bs=1k count=10000000
10000000+0 records in
10000000+0 records out
10240000000 bytes (10 GB) copied, 64.1227 s, 160 MB/s

$ dd if=/mnt/md0/10g.img of=/dev/null bs=1k count=10000000
10000000+0 records in
10000000+0 records out
10240000000 bytes (10 GB) copied, 85.768 s, 119 MB/s

Well, read speed is consistently lower than write speed for the encrypted file system; more on that below. Let's take a look at the non-encrypted device:

RAID-5 /dev/md1 (EXT4): chunk=512K, stripe_cache_size=16384,
   readahead(blockdev)=65536, stride=128, stripe-width=256 ...

$ dd if=/dev/zero of=/mnt/md1/10g.img bs=1k count=10000000
10000000+0 records in
10000000+0 records out
10240000000 bytes (10 GB) copied, 37.0016 s, 277 MB/s

$ dd if=/mnt/md1/10g.img of=/dev/null bs=1k count=10000000
10000000+0 records in
10000000+0 records out
10240000000 bytes (10 GB) copied, 33.5901 s, 305 MB/s

Looks nice to me.
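
The lower read speed on the LUKS device is most likely limited by the N54L's CPU (which has no AES-NI) rather than by the RAID itself. If your cryptsetup is recent enough (1.6+), its built-in benchmark gives a rough idea of the raw de-/encryption throughput the box can sustain; if the aes figures land in the same region as the 119 MB/s measured above, the CPU is the bottleneck:

$ cryptsetup benchmark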

How to set these values

I use a mixture of udev, util-linux and e2fsprogs to set the values.

First I checked which values for stripe_cache_size and read_ahead_kb work best for me. For the LUKS-encrypted EXT4 I got varying results, with stripe_cache_size values of 4096, 8192 and 16384 all showing the best performance at times. I settled on the first value because it came out on top more often than the others.
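
For testing, such values can be changed at runtime via sysfs and the hdparm/dd measurements re-run for each candidate; the udev rules below only make the winners permanent. Note that stripe_cache_size is counted in pages (usually 4K) per device, so larger values cost memory:

$ echo 4096 > /sys/block/md0/md/stripe_cache_size
$ echo 32768 > /sys/block/md0/queue/read_ahead_kb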

$ less /etc/udev/rules.d/90-local-n54l.rules | grep stripe_cache
SUBSYSTEM=="block", KERNEL=="md0", ACTION=="add", TEST=="md/stripe_cache_size", TEST=="queue/read_ahead_kb", ATTR{md/stripe_cache_size}="4096", ATTR{queue/read_ahead_kb}="32768", ATTR{bdi/read_ahead_kb}="32768"
SUBSYSTEM=="block", KERNEL=="md1", ACTION=="add", TEST=="md/stripe_cache_size", TEST=="queue/read_ahead_kb", ATTR{md/stripe_cache_size}="16384", ATTR{queue/read_ahead_kb}="32768", ATTR{bdi/read_ahead_kb}="32768"

The read-ahead value can also be set using blockdev. Note that this command expects a value in 512-byte sectors, whereas read_ahead_kb is a size in KiB; hence the different numbers (65536 sectors × 512 bytes = 32768 KiB):

$ blockdev --setra 65536 /dev/md[01]
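
As a cross-check, reading the value back should show the two representations of the same 32 MiB read-ahead:

$ blockdev --getra /dev/md0
65536
$ cat /sys/block/md0/queue/read_ahead_kb
32768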

Tuning the EXT4 file system performance with calculated values using tune2fs: with a 4K block size, stride = chunk size / block size (64K/4K = 16 for md0, 512K/4K = 128 for md1) and stripe-width = stride × number of data disks (2 data disks in a 3-disk RAID 5):

$ tune2fs -E stride=16,stripe-width=32 -O dir_index /dev/mapper/_dev_md0
$ tune2fs -E stride=128,stripe-width=256 -O dir_index /dev/md1
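
Whether the hints ended up in the superblock can be verified afterwards; tune2fs -l (or dumpe2fs -h) lists them as "RAID stride" and "RAID stripe width":

$ tune2fs -l /dev/md1 | grep -i raid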

Disabling NCQ reduced the speed a lot for me, so I left the queue depths at their defaults and did not fiddle with them any further:

$ cat /sys/block/sd[bcd]/device/queue_depth 
31
31
31
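
For completeness: if you want to repeat the NCQ test yourself, setting the queue depth to 1 disables NCQ for a disk and writing the old value back re-enables it (sdb used as an example here):

$ echo 1 > /sys/block/sdb/device/queue_depth
$ echo 31 > /sys/block/sdb/device/queue_depth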
