I am increasingly finding that both my SMB and Enterprise customers are unfamiliar with the fundamentals of storage sizing and performance. As a result, storage is often missed as a performance bottleneck, even though it is a vital component of any virtualization implementation. Storage will only grow in importance as hosts get bigger, data volumes increase, and more workloads are virtualized. For some reason, most people can grasp the importance of CPU and memory performance constraints, but storage performance is often overlooked and can be hard to explain to business users or executives.
Case in point – I have recently been called into some environments that were not performing well – these environments happened to be running Microsoft SQL, but could just as well have been running any application or collection of virtual machines. Fingers were being pointed in all directions: at applications, at the virtualization layer, at a lack of memory, and DBAs were insisting that there were too few CPUs. The situation was getting political and emotional when I walked into it. A few minutes with Windows Perfmon was all I needed to identify storage performance as the root cause of the firestorm that had been ignited. Using a bit of data, I was able to turn the discussion from an emotional fight into a simple problem of physics and mathematics (and a bit of simple math could have avoided the problem in the first place).
I have seen this play out a few too many times, so I decided to write up this multi-part series on the basics of storage with a focus on storage performance. That said, a little math and physics is where we will start as we look at the basic building block of a storage environment: a hard disk drive. Wikipedia defines a hard disk drive as “a non-volatile storage device that stores digitally encoded data on rapidly rotating platters with magnetic surfaces.” Your computer, server, or VMware cluster uses hard drives to read and write data. Wikipedia also covers the history and atomic structure of a hard drive pretty well. For our purposes, the takeaway is that hard drives are physical objects and, as such, follow the laws of physics (duh) in the following measurable ways:
1.) Capacity, which is measured in bits or bytes and multiples thereof (MB, GB, TB, PB). This is how much data will fit on your disk, from simple text files to virtual disks, and everything in between. For example, if you have a 500GB SQL database, you darn well better have a hard drive that has a capacity of at least 500GB. This is a pretty simple concept, so I’ll leave it there for now.
2.) Performance, which is measured in a couple ways:
- at the disk itself in Input/Output Operations Per Second (IOPS) – a measure of how many read and write commands a disk can complete in a second
- interface throughput, measured in MBps or Gbps – a measure of the peak rate at which data can be read from or written to disk
- latency – the amount of time between when you ask a disk (or storage system, if you want to read ahead) to do something and when it can actually do it. Latency is very closely related to IOPS, as you’ll read in a forthcoming article in this series.
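To make the relationship between these three measurements a bit more concrete, here is a minimal Python sketch. It is a generalized back-of-the-envelope model, not a vendor formula: it assumes every I/O is the same size, that throughput is simply IOPS multiplied by I/O size, and that IOPS can be approximated as the number of outstanding I/Os divided by the average latency. The function names and the sample numbers (150 IOPS, 8 KB I/Os, 5 ms latency) are hypothetical and chosen only for illustration.

```python
def throughput_mb_per_s(iops: float, io_size_kb: float) -> float:
    """Approximate throughput in MB/s, assuming every I/O is io_size_kb.

    Uses decimal units (1 MB = 1000 KB) to keep the arithmetic simple.
    """
    return iops * io_size_kb / 1000.0


def iops_from_latency(latency_ms: float, outstanding_ios: int = 1) -> float:
    """Approximate IOPS from average latency per I/O.

    Assumes a steady state in which outstanding_ios requests are always
    in flight and each one takes latency_ms to complete.
    """
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return outstanding_ios * 1000.0 / latency_ms


if __name__ == "__main__":
    # Hypothetical disk: 150 IOPS at an 8 KB I/O size.
    print(throughput_mb_per_s(150, 8))   # MB/s moved at that I/O rate

    # Hypothetical disk: 5 ms per I/O with one request at a time.
    print(iops_from_latency(5))          # I/Os completed per second
```

The point of the sketch is that a disk can be nowhere near its interface throughput limit and still be out of IOPS, which is why small, random workloads such as databases tend to hit the IOPS ceiling first.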
Each disk, array, and storage system has its own fixed set of measurements given a specific configuration. Knowing the physical capabilities of your storage system, as measured in the above ways, and your systems' storage requirements will go a long way towards a successful design and implementation of your storage environment. The remaining parts of this series will take a more in-depth look at these performance characteristics and explain what happens as you introduce factors like RAID, cache, data reduction techniques such as snapshots and deduplication, and varying workloads.
Please keep in mind that while I have designed and implemented a variety of DAS, NAS, and SAN technologies from a host of vendors including Dell, EMC, IBM, and NetApp, I am by no means a storage expert. The information I will provide is generalized and over-simplified, and it does not consider varying approaches from different storage vendors. Nonetheless, I hope you find this information useful if you are designing a solution, troubleshooting a performance issue, or preparing to make a storage purchase.
- Storage Basics – Part I: An Introduction
- Storage Basics – Part II: IOPS
- Storage Basics – Part III: RAID
- Storage Basics – Part IV: Interface
- Storage Basics – Part V: Controllers, Cache and Coalescing
- Storage Basics – Part VI: Storage Workload Characterization
- Storage Basics – Part VII: Storage Alignment
- Storage Basics – Part VIII: The Difference in Consumer vs. Enterprise Class Disks and Storage Arrays; or ‘Why is the SAN you are proposing so darn expensive?’
- Storage Basics – Part IX: Alternate IOPS Formula