I have been pulling my hair out with a small VI3 implementation running against an IBM DS3300 iSCSI array. Performance, for lack of a better term, sucked. Granted, the DS3300 is not an enterprise-level workhorse of a storage system, but it fit the budget. Read performance from the array was decent, but write performance was terrible, maxing out at 10MBps of throughput with insanely high latencies on long writes when the system was under load. This led to some long P2V operations, poor guest performance, and some questions from the project sponsors on why I couldn't make the environment sing.
The system was configured with a single controller with dual GigE NICs. The controller had 512MB of battery-backed cache (there is also a 1GB cache upgrade option available). I wrote off some of the poor performance to a single controller with a less-than-optimal amount of cache; blamed the SAS-controller-to-SATA-disk command translation overhead; cringed at the 6-disk RAID5 configuration; and engaged in some self-doubt. I convinced the powers that be that we were IO constrained, got some funds to fill out the 3U chassis to a full 12 SATA disks, and reconfigured the array as a RAID10. Performance gains from these changes were almost unnoticeable. I also did some basic troubleshooting of the network environment: verifying multiple paths to the storage, setting Flow Control on the switches to receive-only, and double-checking my iSCSI initiator settings. Note: the DS3300 is only supported with the ESX software initiator.

I found documentation on the DS3300 to be lacking, but did discover that the Dell MD3000i is based on the same LSI Engenio array. Some Googling on the Dell solution led to the 'SMcli' command line interface, which works with both arrays, though the commands are slightly different for the Dell and the IBM. The links to the IBM CLI documentation were broken, so I used the Dell documentation as a starting point and did a bit of trial and error to get the commands right. (Rant: Seriously, IBM? Can you make your documentation any harder to get through – is it a Redbook, an Engineering Whitepaper, a support document, a case study? Why can I only find these with complex Google searches, not on your own product pages? And why can't you name your documents intelligently, instead of with some random string of characters?)
Update: The IBM System Storage DS3000, DS4000, and DS5000 Command Line Interface and Script Commands Programming Guide is here: IBM System Storage DS3000, DS4000, and DS5000 Command Line Interface and Script Commands Programming Guide – DS3k4k5kCLIreference, SMCLI
Moving on… I received an automated alert from the DS3300 about an incomplete battery learn cycle. Using the IBM Storage Manager GUI, I generated a 'Storage Subsystem Profile' from the Support tab to check the battery status. In the profile I discovered that while write cache was enabled, it had a status of "Enabled (Suspended)". Ah ha! Now I had some decent Google material, which led me to this thread: https://communities.vmware.com/thread/195838. Hot damn, I love the VMware Community Forums!
It turns out that in a single-controller configuration the cache mirroring setting remains enabled by default. Because there is no second controller to mirror to, the array suspends write caching. This is probably a safety measure – losing high availability on the controllers puts data in cache at risk should the only controller fail. I weighed my options and decided that fixing the poor performance outweighed the HA concerns, so I disabled cache mirroring (which lets the array resume write caching) using this command:
c:\program files\ibm_ds4000\client>smcli -n <ARRAYNAME> -c "set allLogicalDrives mirrorEnabled=false;"
And then followed with this for good measure:
c:\program files\ibm_ds4000\client>smcli -n <ARRAYNAME> -p <arraypassword> -c "set allLogicalDrives writeCacheEnabled=true;"
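If you would rather script the sanity check than click through the Storage Manager GUI, you can dump the subsystem profile to a text file with SMcli and scan it for the suspended-cache status. Here is a minimal Python sketch; note that the exact profile layout (the `LOGICAL DRIVE NAME:` and `Write cache:` lines) is an assumption based on what my array reported, so check your own profile dump and adjust the patterns:

```python
import re

def find_suspended_write_cache(profile_text):
    """Return logical drive names whose write cache shows 'Enabled (Suspended)'.

    Assumes the profile lists each drive as 'LOGICAL DRIVE NAME: <name>'
    followed eventually by a 'Write cache: <status>' line.
    """
    suspended = []
    current = None
    for line in profile_text.splitlines():
        m = re.match(r"\s*LOGICAL DRIVE NAME:\s*(\S+)", line, re.IGNORECASE)
        if m:
            current = m.group(1)
            continue
        if current and re.search(r"Write cache:\s*Enabled \(Suspended\)", line):
            suspended.append(current)
            current = None  # only flag each drive once
    return suspended

# Fabricated sample profile text standing in for a real dump.
sample = """\
LOGICAL DRIVE NAME: VMFS_LUN0
   Write cache: Enabled (Suspended)
LOGICAL DRIVE NAME: VMFS_LUN1
   Write cache: Enabled
"""

print(find_suspended_write_cache(sample))  # ['VMFS_LUN0']
```

Run this against the profile after changing the mirroring setting and the list should come back empty.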
The results were immediately noticeable:
The screenshot is from Veeam Monitor Free Edition, taken during 4 concurrent V2V operations from Hyper-V to VMware. With the write cache fully functional, disk throughput peaked at 54MBps, latency dropped to about 6ms, and my blood pressure dropped a few notches.
While poking around the CLI I also found that you can dump performance stats from the array (performance data is otherwise hard to come by on the thing) using this command:
C:\Program Files\IBM_DS4000\client>smcli -n <ARRAYNAME> -c "set session performanceMonitorInterval=5 performanceMonitorIterations=120;save storageSubsystem performanceStats file=\"c:\ds3300perfstats.csv\";"
This will give you a 10-minute record of performance from the array (120 samples at 5-second intervals) which you can analyze in Excel. The Dell TechCenter Wiki has a great write-up on how to efficiently analyze the data from this command here: https://www.delltechcenter.com/page/MD3000i+Performance+Monitoring, complete with a YouTube video that walks you through the process:
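If you don't want to hand the CSV to Excel, a few lines of Python will summarize it. A rough sketch follows; the column names here (`Current KB/second`, etc.) and the sample rows are assumptions standing in for a real performanceStats export, so check your own file's header row and adjust:

```python
import csv
import io

def summarize(csv_text, column):
    """Compute min/avg/max for one numeric column of a performanceStats dump."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    values = [float(r[column]) for r in rows if r.get(column)]
    return {
        "min": min(values),
        "avg": sum(values) / len(values),
        "max": max(values),
    }

# Fabricated sample rows standing in for a real performanceStats export.
sample = """\
Storage Subsystem,Total IOs,Current KB/second,Current IO/second
DS3300,1200,10240,350
DS3300,1900,15360,410
DS3300,2500,20480,500
"""

print(summarize(sample, "Current KB/second"))
# {'min': 10240.0, 'avg': 15360.0, 'max': 20480.0}
```

For the real file, replace `io.StringIO(csv_text)` with `open(r"c:\ds3300perfstats.csv")`.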
I am beginning to think that the DS3300 (and MD3000i) may actually be a viable starter solution for SMBs embarking on a virtualization project. But I would recommend the cache upgrade, a second controller, SAS disks instead of SATA (to eliminate the SAS-to-SATA translation overhead), and more, faster disks rather than fewer, slower ones, so you can drive throughput and IOPS higher.
Have any of you deployed the DS3300 or MD3000i (or the generic LSI solution)? Do you have any performance tuning tips for these arrays? If so, share in the comments!