This is the third in a multi-part series on storage basics. I’ve had some good feedback from folks in the SMB space saying that the first couple of posts in this series have been beneficial, so we’ll stick with some basic concepts for another post or two before we dive into some nitty-gritty details and practical applications of these concepts in a VMware environment. In the second post of this series, I introduced the concept of IOPS and explained how the physical characteristics of a hard disk drive determine the theoretical IOPS capability of a disk. I then noted that you can aggregate disks to achieve a greater number of IOPS for a particular storage environment. Today, we will look at just how you combine multiple disks and the performance impact of doing so. Remember that we are keeping this simple; the concepts I present here may not apply to that fancy new SAN you just purchased with your end-of-year money or to the cheap little SATA controller on your desktop’s motherboard (not that there’s anything wrong with it). For now, we’re in the middle ground of direct-attached storage (DAS) while we firm up concepts.
Enterprise servers and storage systems can combine multiple disks into a group using Redundant Array of Independent Disks (RAID) technology. We’ll assume a hardware RAID controller is responsible for configuring the RAID set and driving storage IO to the connected disks. RAID controllers typically have battery-backed cache (we’ll talk about cache in a future post) and an interconnect where the drives plug in, such as SCSI or SAS (we’ll cover these in a future post too), and they hold the configuration of the RAID set, including stripe size and RAID level. The controller also does the basic work of reading from and writing to the RAID set: mirroring, striping, and parity calculations. There are several different RAID levels; rather than rehash their details, read the Wikipedia entry on RAID and then come back here…
Ok, great. So you now know that RAID is implemented to increase performance through the aggregation of multiple disks, and to increase reliability through mirroring and parity. Now let’s consider the performance implications of some basic RAID levels. As with many things in the IT industry, there are trade-offs: security vs. usability, brains vs. brawn, and now performance vs. reliability. As we increase reliability in a RAID array through mirroring and parity, performance can be impacted. This is where the more disks = more IOPS bit starts to fall apart. The exact impact depends on the RAID type. Here are some examples of how the most common RAID levels impact the maximum theoretical IOPS, where:
I = Total IOPS for Array (note that I show Read and Write separately)
i = IOPS per disk in array (based on spindle speed averages from Part II: IOPS)
n = Number of disks in array
r = Fraction of IOPS that are reads, e.g. 0.6 for 60% (calculated from the average Disk Reads/sec divided by the average Disk Transfers/sec in Windows Perfmon)
w = Fraction of IOPS that are writes, e.g. 0.4 for 40% (calculated from the average Disk Writes/sec divided by the average Disk Transfers/sec in Windows Perfmon)
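A minimal sketch of the r and w calculation, assuming you have already averaged the PhysicalDisk Disk Reads/sec, Disk Writes/sec and Disk Transfers/sec counters over your collection window. The function name and the sample averages are hypothetical.

```python
def read_write_fractions(avg_reads, avg_writes, avg_transfers):
    """Return (r, w) as fractions of total transfers.

    All three arguments are averaged Perfmon counter values (per second)
    for the same disk and the same collection window.
    """
    if avg_transfers <= 0:
        raise ValueError("average transfers/sec must be positive")
    r = avg_reads / avg_transfers
    w = avg_writes / avg_transfers
    return r, w


# Hypothetical averages: 600 reads/sec and 400 writes/sec out of 1000 transfers/sec
r, w = read_write_fractions(600, 400, 1000)
print(f"r = {r:.2f}, w = {w:.2f}")  # r = 0.60, w = 0.40
```

Because Disk Transfers/sec counts both reads and writes, r and w should add up to roughly 1.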
RAID0 (striping, no redundancy)
This is basic aggregation with no redundancy. A single drive error or failure could render your data useless, so it is not recommended for production use. It does allow for some simple math:
Read I = Write I = n*i
Because there is no mirroring or parity overhead, theoretical maximum read and write IOPS are the same.
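The RAID0 math as a tiny Python sketch: with no mirror or parity overhead, both read and write IOPS are simply n*i. The 180 IOPS per 15k disk is the spindle average used elsewhere in this post; the six-disk array size is just an illustrative assumption.

```python
def raid0_iops(n, i):
    """Theoretical max (read, write) IOPS for RAID0: no mirror or parity overhead."""
    total = n * i
    return total, total  # reads and writes scale identically


read_iops, write_iops = raid0_iops(6, 180)  # six 15k disks, illustrative
print(read_iops, write_iops)  # 1080 1080
```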
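RAID1 & RAID10 (mirroring technologies)
Because data is mirrored to multiple disks, a read can be served by any copy, so every disk contributes read IOPS. Each write, however, must be committed to both disks of a mirrored pair, which halves the IOPS available for writes: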
Read I = n*i
For example, if we have six 15k disks in a RAID10 config, we should expect a theoretical maximum of 6*180 = 1080 read IOPS for our array.
Write I = (n*i)/2
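Both mirrored formulas applied to the six-disk RAID10 example, as a small Python sketch. The write figure (540) is not spelled out in the text above but follows directly from Write I = (n*i)/2; the even-disk check is an added assumption about valid mirrored sets.

```python
def mirrored_iops(n, i):
    """Theoretical max (read, write) IOPS for RAID1/RAID10.

    Reads can be served by any copy, so every disk contributes.
    Each write lands on both disks of a mirror pair, so writes get half.
    """
    if n < 2 or n % 2:
        raise ValueError("mirrored sets need an even number of disks (at least 2)")
    return n * i, (n * i) / 2


read_iops, write_iops = mirrored_iops(6, 180)  # six 15k disks in RAID10
print(read_iops, write_iops)  # 1080 540.0
```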
RAID5 (striping with distributed parity)
Read I = n*i
Example: Five 15k disks in a RAID5 (4 + 1) will yield a theoretical maximum of 5*180 = 900 read IOPS. Parity is distributed across all five disks rather than held on a dedicated parity disk, and parity is not read when servicing normal reads, so every spindle serves reads.
Write I = (n*i)/4
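Example: Five 15k disks in a RAID5 (4 + 1) will yield a theoretical maximum of (5*180)/4 = 225 write IOPS, because each write costs four disk IOs: read old data, read old parity, write new data, write new parity.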
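A sketch of the RAID5 formulas plus a toy model of where the factor of 4 comes from. It counts all n spindles for reads, following the correction in the comments below (RAID5 parity is distributed, so no disk is reserved for parity). The read-modify-write part models one small write that updates a single data block, with parity computed by XOR; the block values are small integers standing in for disk blocks, and real controllers work on whole stripes and may skip some of these IOs thanks to cache.

```python
from functools import reduce
from operator import xor


def raid5_iops(n, i):
    """Theoretical max (read, write) IOPS for RAID5 with distributed parity."""
    return n * i, (n * i) / 4


def small_write(stripe, parity, index, new_value):
    """Update one data block in a RAID5 stripe using read-modify-write.

    stripe is a list of data blocks (ints) and parity is their XOR.
    Returns (new_parity, disk_ios), where disk_ios lists each backend IO.
    """
    disk_ios = []
    old_value = stripe[index]
    disk_ios.append("read old data")
    old_parity = parity
    disk_ios.append("read old parity")
    # XOR out the old data and XOR in the new data to get the new parity
    new_parity = old_parity ^ old_value ^ new_value
    stripe[index] = new_value
    disk_ios.append("write new data")
    disk_ios.append("write new parity")
    return new_parity, disk_ios


print(raid5_iops(5, 180))  # (900, 225.0)

stripe = [0b1010, 0b0110, 0b1111, 0b0001]  # 4 data blocks in a 4 + 1 set
parity = reduce(xor, stripe)
parity, ios = small_write(stripe, parity, 2, 0b0011)
print(len(ios))                       # 4
print(parity == reduce(xor, stripe))  # True
```

Four backend IOs per host write is exactly why only a quarter of the raw IOPS reach the workload as writes.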
Again, these formulas are very basic and have limited practical value on their own. Furthermore, you will seldom find a system that is doing only reads or only writes. More often, as is the case with typical VMware environments, reads and writes are mixed. An understanding of your workload is key to accurately sizing your storage environment for performance. One workload characteristic you should consider in your sizing (we’ll explore others in the future) is the percentage of read IOPS vs. the percentage of write IOPS. If you want to do the math for a mixed read/write workload on a RAID5 set, a formula like this gets you close:
I = (n*i)/(r + 4*w)
Example: a 60% read/40% write workload with five 15k disks in a RAID5 would provide (5*180)/(0.6 + 4*0.4) = 900/2.2 ≈ 409 IOPS.
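The mixed-workload formula as a Python function. The penalty parameter (defaulting to the RAID5 factor of 4) is a small generalization added here for illustration. With r = 1 the function returns n*i, and with w = 1 it returns (n*i)/4, the RAID5 write formula.

```python
def raid5_mixed_iops(n, i, r, w, penalty=4):
    """Approximate host-visible IOPS for a RAID5 set under a mixed workload.

    Each read costs one backend IO and each write costs `penalty` backend IOs,
    so the n*i raw IOPS are divided by the average cost per host IO.
    """
    if abs(r + w - 1) > 1e-9:
        raise ValueError("r and w should be fractions that sum to 1")
    return (n * i) / (r + penalty * w)


iops = raid5_mixed_iops(5, 180, 0.6, 0.4)
print(round(iops))  # 409
```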
The previous examples have all been from the perspective of the storage system. If we look at this from the server/OS/application side, something interesting shows up. Let’s say you fired up Windows Perfmon, collected the PhysicalDisk Disk Transfers/sec counter every 15 seconds for 24 hours, and analyzed the data in Excel to find the 95th percentile of total IOPS (a pretty standard exercise if you are buying an enterprise storage array or SAN). Let’s say you find that the server in question was asking for 1000 IOPS at the 95th percentile (let’s stick with our 60% read/40% write workload). And finally, let’s say we put this workload on a RAID5 array. That’s a lot of setup, but what does it all mean? Because RAID5 has a write penalty factor of 4 (Duncan Epping has a great article, which I referenced in Part II, that describes this in a slightly different way), we can tweak the previous formula to show the IOs hitting the backend array for a given workload:
I = Target workload IOPS
f = IO penalty
r = % Read (as a fraction, e.g. 0.6)
w = % Write (as a fraction, e.g. 0.4)
IO = (I * r) + (I * w) * f
Our example then looks like this (remember: work inside parentheses first, and then My Dear Aunt Sally):
(1000 * .6) + ((1000 * .4) * 4) = 2200
Simply stated, this means that for every 1000 IOPS our workload requests from our storage system, the backing array must perform 2200 IOs, and it had better do them quickly or you will start to see latency and queuing (we call this performance degradation, boys and girls!). Again, this is a very simplistic approach that neglects factors like cache, workload randomness, stripe size, IO size, and partition alignment, all of which can affect requirements on the backend. I’ll cover some of those later.
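The whole exercise above, from Perfmon samples to backend IOs, can be sketched in Python. This assumes you have exported the Disk Transfers/sec samples to a plain list (one sample every 15 seconds for 24 hours gives 5,760 values); the sample values below are made up. The percentile uses linear interpolation between closest ranks, the same convention as Excel's PERCENTILE.INC. The write penalties of 1 for RAID0 and 2 for RAID10 are read off this post’s Write I formulas; only the RAID5 factor of 4 is stated outright. Function names are hypothetical.

```python
import math

# Write penalty f per RAID level, as implied by the Write I formulas in this post
WRITE_PENALTY = {"RAID0": 1, "RAID10": 2, "RAID5": 4}


def percentile_inclusive(samples, p):
    """p-th percentile (0 <= p <= 1) with linear interpolation between
    closest ranks, matching Excel PERCENTILE.INC."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    data = sorted(samples)
    pos = (len(data) - 1) * p
    lower = math.floor(pos)
    upper = min(lower + 1, len(data) - 1)
    return data[lower] + (data[upper] - data[lower]) * (pos - lower)


def backend_io(target_iops, r, w, penalty):
    """IO = (I * r) + (I * w) * f: backend disk IOs for a host workload."""
    return (target_iops * r) + (target_iops * w) * penalty


# Made-up Disk Transfers/sec samples standing in for a 24-hour Perfmon log
samples = [850, 920, 1010, 640, 1200, 980, 700, 1050, 990, 760]
print(f"95th percentile: {percentile_inclusive(samples, 0.95):.1f} IOPS")

# The worked example from the text: 1000 IOPS at 60% read / 40% write
for level, f in WRITE_PENALTY.items():
    print(level, backend_io(1000, 0.6, 0.4, f))
# RAID0 1000.0
# RAID10 1400.0
# RAID5 2200.0
```

To use it on real data, replace the sample list with your exported counter values and the 0.6/0.4 split with the r and w you measured.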
As you can hopefully see, the laws of physics combined with some simple math can provide some pretty useful numbers. A basic understanding of your array configuration against your workload requirements can go a long way in preventing storage bottlenecks. You may also find, as you weigh the cost per disk against various spindle speeds, capacities, and RAID levels, that you are better off buying smaller, faster, fewer, more, or slower… disks, depending on your requirements. The geekier amongst us could even take these formulas and some costs per disk and hit up Excel Goal Seek to find the optimal level, but that’s more than this little blog can do for you today.
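Instead of Goal Seek, the minimum disk count for a target workload can be solved directly: divide the backend IO demand by the per-disk IOPS and round up. This is a sketch under stated assumptions: the write penalties are the ones implied by this post’s formulas, the minimum disk counts per level (2 for RAID0, 4 for RAID10, 3 for RAID5) are typical requirements assumed here rather than taken from the post, and capacity and cost are ignored.

```python
import math

# Assumed values: penalties implied by this post's formulas, typical minimum set sizes
WRITE_PENALTY = {"RAID0": 1, "RAID10": 2, "RAID5": 4}
MIN_DISKS = {"RAID0": 2, "RAID10": 4, "RAID5": 3}


def min_disks(level, target_iops, r, w, per_disk_iops):
    """Smallest disk count whose raw IOPS cover the backend IO demand."""
    backend = target_iops * r + target_iops * w * WRITE_PENALTY[level]
    n = max(math.ceil(backend / per_disk_iops), MIN_DISKS[level])
    if level == "RAID10" and n % 2:
        n += 1  # mirrored pairs need an even number of disks
    return n


# 1000 IOPS at 60% read / 40% write on 15k disks (180 IOPS each)
for level in WRITE_PENALTY:
    print(level, min_disks(level, 1000, 0.6, 0.4, 180))
# RAID0 6
# RAID10 8
# RAID5 13
```

Multiplying each result by your own cost per disk, and checking that the resulting capacity is enough, turns this into the cost comparison described above.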
Before I wrap up this post, I want to leave you with a few more links that I have bookmarked around the topics of IOPS and RAID over the past several years:
- DB sizing for Microsoft Operations Manager, includes a nice chart with formulas similar to the ones I provided in this article: https://blogs.technet.com/jonathanalmquist/archive/2009/04/06/how-can-i-gauge-operations-manager-database-performance.aspx
- An Experts Exchange post with some good info in the last entry on the page (subscription required) https://www.experts-exchange.com/Storage/Storage_Technology/Q_22669077.html
- A Microsoft TechNet article with storage sizing for Exchange – a bit dated but still applicable: https://technet.microsoft.com/en-us/library/aa997052(EXCHG.65).aspx
- A simple whitepaper from Dell on their MD1000 DAS array – easy language to help the less technical along: https://support.dell.com/support/edocs/systems/md1120/multlang/whitepaper/SAS%20MD1xxx.pdf
- A great post that uses some math to show performance and cost trade-offs of RAID level, disk type, and spindle speed. https://www.yonahruss.com/architecture/raid-10-vs-raid-5-performance-cost-space-and-ha.html
- Another nifty post that looks at cost vs. performance vs. capacity of various disk speeds in an array: https://blogs.zdnet.com/Ou/?p=322
- Storage Basics – Part I: An Introduction
- Storage Basics – Part II: IOPS
- Storage Basics – Part III: RAID
- Storage Basics – Part IV: Interface
- Storage Basics – Part V: Controllers, Cache and Coalescing
- Storage Basics – Part VI: Storage Workload Characterization
- Storage Basics – Part VII: Storage Alignment
- Storage Basics – Part VIII: The Difference in Consumer vs. Enterprise Class Disks and Storage Arrays; or ‘Why is the SAN you are proposing so darn expensive?’
- Storage Basics – Part IX: Alternate IOPS Formula
Excellent one, dude. I was looking for this.
Matthew Shuter says
Raid 0 portion, you have “This is basic aggregation with now redundancy”
now should be no
Joshua Townsend says
Good catch – I have updated the post.
In the IOPS calculation above (RAID5), it says that for 5 disks you count only four, the reason being one disk for parity.
But actually RAID5 doesn’t have a separate disk just for holding parity. It takes a chunk of each disk in the RAID5. So in that case the calculation should be 5*180 = 900 IOPS (ideally) and not (5-1)*180 = 720.
Am I right?
Joshua Townsend says
Karthik, thanks for the comment. RAID5 is indeed distributed parity; no single disk holds all the parity data. Parity bits are not read when servicing normal reads (only when parity is updated on a write, or when the array is degraded or rebuilding), so they do not impose a penalty on read operations. This means that for reads, your calculation of a RAID5 set with five 15k disks (5*180 = 900 IOPS) is correct.
Your formula for writes, however, is not correct. For writes, the actual work of updating parity imposes a penalty: 4 disk IOs (read old data, read old parity, write new data, write new parity) are consumed for each write requested by your server/OS/application/workload. That is the “factor of 4” I wrote about. This penalty takes away from the IOPS available to your real workload (we call this overhead). In reality, your 5-disk RAID5 could perform a total of 900 disk IOPS (5*180 = 900) before queuing. However, your workload can’t take advantage of all 900 IOPS because most of them go to reading, modifying, and writing parity. Put another way, the disks are busy with the parity read-modify-write instead of writing your workload’s “real” data. Your workload only gets the IOPS left over after the parity overhead has been consumed.
Summing it up, your formula for calculating RAID5 read IOPS is (Number of Disks)*(IOPS per disk) = READ IOPS. Your formula for calculating RAID5 write IOPS available to your server/OS/application/workload is (Number of Disks)*(IOPS per disk)/4 = WRITE IOPS.
Keep in mind this is all theoretical – many arrays use cache, coalescing, and other tricks to minimize the RAID5 write penalty.
Hope this helps,
I enjoyed reading this article and I learned a lot. I loved the “oversimplification”, as I am learning hardware after being in the software industry for too long, and I could not find anything else that even comes close. Thank you so much for writing this. Your time, effort and energy are well spent here.
Thank you for the great article series!!!
Not an expert in the field, but the formula for calculating the total IOPS of a RAID5 array, given the read/write workload percentages, looks to me more like
I=(n*i*r) + (n*i*w)/4 = (n*i)(r + w/f). Am I wrong?
Thanks Josh. Excellent post!
So if I need 1000 IOPS for an application, should I plan for 2200 IOPS? (Not considering any other factors, just a DAS array for example.)