blacklion Member
4 posts | First of all: yes, I have read the topics about graid5 performance, and my config seems to be similar to the ones described there. But...
I have no luck getting more than "single HDD" speed for reads or writes.
I use geom_raid5-eff. My hardware is 5x500GB WD5000AAKS.
When the stripe size is set to 128K, read and write speed is exactly "one HDD speed" (microbenchmarks like "dd if=/dev/zero of=/raid/big.file bs=128k" and iozone with similar options give the same results). Multiple clients get even lower overall speed.
When the stripe size is 128/(5-1)=32KB, write speed is about 2.5*HDD and read speed is still exactly 1*HDD.
What am I doing wrong?
Kernel memory is big enough, cache (maxmem) is 80MB, wdt is 5, and maxwql is 50... |
Enlightenment Administrator
 105 posts | Could you give me the output of the following commands?
# graid5 status
# sysctl vm.kmem_size
# sysctl vm.kmem_size_max
# sysctl kern.geom.raid5
What kind of system are you running this on? (CPU, memory, chipset)
To what controller are the disks connected? And if it's not the chipset controller, is the controller connected via PCI or PCI-express? PCI is very bad for I/O performance. And last question: how did you create the filesystem?
Could you show me raw I/O benchmarks too, like:
dd if=/dev/zero of=/dev/raid5/myraid5volume bs=1m count=1000
Warning: you should not attempt this when you have a filesystem on the RAID5 with files you don't want to lose, since the above command will destroy the filesystem. You also have to unmount it before attempting this. Take control of the input and you shall become master of the output. |
blacklion Member
4 posts | # graid5 status
         Name         Status  Components
raid5/storage  COMPLETE CALM  ad6
                              ad8
                              ad10
                              ad12
                              ad14
# sysctl vm.kmem_size
vm.kmem_size: 419430400
# sysctl vm.kmem_size_max
vm.kmem_size_max: 419430400
# sysctl kern.geom.raid5
kern.geom.raid5.wqf: 0
kern.geom.raid5.wqp: 71
kern.geom.raid5.blked2: 523206
kern.geom.raid5.blked1: 21
kern.geom.raid5.dsk_ok: 50
kern.geom.raid5.wreq2_cnt: 1628127
kern.geom.raid5.wreq1_cnt: 513244
kern.geom.raid5.wreq_cnt: 1010352
kern.geom.raid5.rreq_cnt: 1488577
kern.geom.raid5.mhm: 180263
kern.geom.raid5.mhh: 7353860
kern.geom.raid5.coca: 7
kern.geom.raid5.veri_w: 0
kern.geom.raid5.veri: 0
kern.geom.raid5.veri_nice: 100
kern.geom.raid5.veri_fac: 25
kern.geom.raid5.maxmem: 80000000
kern.geom.raid5.maxwql: 50
kern.geom.raid5.wdt: 5
kern.geom.raid5.tooc: 5
kern.geom.raid5.debug: 0
#
CPU: E4600, Mem: 2GB DDR2-800, chipset: Q35+ICH9DO, system on a separate disk.
The disks are connected to the on-board south bridge controller, and it CAN handle parallel requests: running "dd" to/from all the "plain" disks at the same time is no slower than running it on a single disk.
dd if=/dev/zero of=/dev/raid5/storage bs=1m count=1024
For 32K stripe:
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 3.859569 secs (278202515 bytes/sec)
For 128K stripe:
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 3.363622 secs (319221910 bytes/sec)
Hmmm... Why has it become faster since the last time I tried it?!
But I'm more interested in the read speed, which is "slow"...
blob# dd if=/dev/raid5/storage of=/dev/null bs=1m count=1024
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 14.395448 secs (74588983 bytes/sec)
|
Enlightenment Administrator
 105 posts | Posted on 6 June 2008 @ 22:39 | edited 22:39 |
Okay, could you do:
newfs -U -b 32768 /dev/raid5/storage
mount /dev/raid5/storage /mnt
cd /mnt
bonnie -s 2000
(1st benchmark)
then:
sysctl vfs.read_max=64
bonnie -s 2000
(2nd benchmark)
You should see an increase in sequential read performance due to more read-ahead. The default read-ahead is only 8 blocks of the default 16KiB block size, meaning the filesystem reads up to 128KiB ahead. If you are using a stripe size of, say, 128KiB, that read-ahead covers only one disk's stripe, so the other disks cannot be used to process I/O at the same time (no parallelization). A key feature of striping RAID is the ability to process two or more I/O requests at the same time, because they access data on different physical disks. In that case, processing 2 requests takes the same time as processing 1. Because the load is never divided perfectly evenly, this doubling of I/O performance does not scale as well as theoretically possible.
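To make the read-ahead arithmetic above concrete, here is a rough sketch in POSIX shell. The helper names readahead_kib and stripe_units_spanned are made up for illustration, and the sketch assumes the read-ahead window is simply vfs.read_max blocks times the filesystem block size, with one stripe unit per disk.

```sh
#!/bin/sh
# Hypothetical helpers for illustration; not part of graid5 or FreeBSD.

# Read-ahead window in KiB.
# Arguments: read_max (blocks), filesystem block size (KiB).
readahead_kib() {
    echo $(( $1 * $2 ))
}

# Number of stripe units a read-ahead window covers, rounded up.
# Arguments: window (KiB), stripe size (KiB).
stripe_units_spanned() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Defaults: read_max=8, 16 KiB blocks, 128 KiB stripe.
default_window=$(readahead_kib 8 16)
echo "default: ${default_window} KiB, $(stripe_units_spanned "$default_window" 128) stripe unit(s)"

# Tuned: read_max=64 with 32 KiB blocks (newfs -b 32768), 128 KiB stripe.
tuned_window=$(readahead_kib 64 32)
echo "tuned: ${tuned_window} KiB, $(stripe_units_spanned "$tuned_window" 128) stripe unit(s)"
```

With the defaults the window is 128 KiB and fits inside a single 128 KiB stripe unit, so only one disk is busy. With the tuned values the window is 2048 KiB and spans 16 stripe units, which is more than the 4 data units in one full stripe of a 5-disk array.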
I'm interested in what results you get now. |
blacklion Member
4 posts | Yep, block 32768 + read_max=64 helps a lot!
It gives me a stable 270MB/s read from a large file.
The situation with writes is more interesting: it can vary from 70MB/s to 290MB/s from run to run on a 40GB file (simple dd from /dev/zero with 1m blocks). It looks like write combining works one time and doesn't work another... |
Enlightenment Administrator
 105 posts | Yeah, you are exhausting the write buffer. You should use a larger write buffer. Try setting:
sysctl kern.geom.raid5.wdt=1
sysctl kern.geom.raid5.maxwql=200
Using a maxwql value of 200 would need 200 * (stripesize * (nr_of_disks - 1)) = 100MiB of kernel memory (with a 128KiB stripe size and 5 disks). Your pool of kernel memory, called "kmem", is shared among all kernel components, and you should not exceed the maximum kernel memory, or the system will simply panic with a "kmem_map too small" error.
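The formula above can be checked quickly in POSIX shell. This is only a sketch: wq_mem_mib is a made-up helper name, and it assumes the write queue needs exactly maxwql full stripes' worth of data and nothing else.

```sh
#!/bin/sh
# Hypothetical helper for illustration; not part of graid5.

# Approximate write-queue memory in MiB.
# Arguments: maxwql, stripe size (KiB), number of disks.
wq_mem_mib() {
    echo $(( $1 * $2 * ($3 - 1) / 1024 ))
}

# vm.kmem_size from the sysctl output earlier in the thread, in MiB.
kmem_mib=$(( 419430400 / 1048576 ))

echo "maxwql=200, 128 KiB stripe: $(wq_mem_mib 200 128 5) MiB of ${kmem_mib} MiB kmem"
echo "maxwql=200,  32 KiB stripe: $(wq_mem_mib 200 32 5) MiB of ${kmem_mib} MiB kmem"
echo "maxwql=50,  128 KiB stripe: $(wq_mem_mib 50 128 5) MiB of ${kmem_mib} MiB kmem"
```

For the 5-disk array in this thread, maxwql=200 with a 128 KiB stripe comes to 100 MiB out of the 400 MiB reported by vm.kmem_size, while the same setting with a 32 KiB stripe needs only 25 MiB.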
Try tuning the maxwql value until you get stable write speeds. If it stays unstable, try the PP version of graid5 and tune the kern.geom.raid5.wc ("write cache") value. This setting is only available with the PP version of geom_raid5. Here is some other useful info:
kern.geom.raid5.wreq2_cnt: 1628127
kern.geom.raid5.wreq1_cnt: 513244
kern.geom.raid5.wreq_cnt: 1010352
wreq_cnt is the total number of write requests. wreq1_cnt is the number of writes that could be combined into 1-phase "full stripe write" requests, which are very fast. wreq2_cnt counts the slower 2-phase writes, which could not be combined into a full stripe and require reading from the disks even though you are only writing to the filesystem. In the gstat output you will see only writes on the raid5/volume device, but you will also see reads on the member disks. That means that the I/O is not handled efficiently, or the I/O is simply not sequential. The UFS2 filesystem does not write large blocks of data sequentially, so some performance loss compared to the "raw write speed" is expected. |
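As a rough way to read those counters, the sketch below computes what share of the classified writes were 1-phase full-stripe writes. full_stripe_pct is a made-up helper name, and using wreq1_cnt + wreq2_cnt as the denominator is an assumption made here, not something the driver documents.

```sh
#!/bin/sh
# Hypothetical helper for illustration; not part of graid5.

# Integer percentage of 1-phase (full-stripe) writes among
# 1-phase plus 2-phase writes.
# Arguments: wreq1_cnt, wreq2_cnt.
full_stripe_pct() {
    echo $(( $1 * 100 / ($1 + $2) ))
}

# Counter values taken from the sysctl output quoted above.
echo "full-stripe writes: $(full_stripe_pct 513244 1628127)%"

# On the machine itself the live counters could be fed in instead, e.g.:
#   full_stripe_pct "$(sysctl -n kern.geom.raid5.wreq1_cnt)" \
#                   "$(sysctl -n kern.geom.raid5.wreq2_cnt)"
```

With the quoted counters this prints 23%, so by this measure roughly three quarters of the writes took the slower 2-phase path.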