
I’ve had several folks ask me recently about how to support very large NTFS volumes on vSphere virtualized Windows servers. The current limitation for a VMDK in vSphere 5.1 is 2TB minus 512B. FWIW, a Hyper-V virtual disk in Windows Server 2012 can be up to 64TB. Those asking the question want to support NTFS volumes greater than 2TB for a variety of purposes: Exchange databases, SQL databases, and file shares. Windows (depending on the version and edition) can theoretically support NTFS volumes up to 256TB (depending on cluster size and assuming GPT), with files up to 256TB in size (see https://en.wikipedia.org/wiki/NTFS for more).

The way they wanted to work around the limitation was to present multiple 2TB VMDKs to a Windows VM, use Windows Logical Disk Manager (LDM) to convert the disks from the Windows default Basic Disk to dynamic disks, and then concatenate (span) multiple disk partitions into one large NTFS volume. Talk about a monster VM…. The question to me, then, became this: Is using spanned dynamic disks on multiple VMDKs a good idea? Here are some of my thoughts on the question.
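To put numbers on the limits above, here is a minimal Python sketch of the arithmetic. It assumes binary terabytes (2^40 bytes) and the NTFS implementation limit of 2^32 - 1 clusters per volume; the function names are illustrative only and do not come from any VMware or Microsoft tool.

```python
TB = 2**40                      # binary terabyte (assumption: sizes are reported in powers of two)
MAX_VMDK_BYTES = 2 * TB - 512   # vSphere 5.1 VMDK limit quoted above: 2TB minus 512B
NTFS_MAX_CLUSTERS = 2**32 - 1   # NTFS implementation limit on clusters per volume


def ntfs_max_volume_bytes(cluster_bytes: int) -> int:
    """Largest NTFS volume for a given cluster (allocation unit) size."""
    return NTFS_MAX_CLUSTERS * cluster_bytes


def vmdks_needed(volume_bytes: int) -> int:
    """Number of maximum-size VMDKs required to span a volume of this size."""
    # Integer ceiling division avoids floating-point rounding on large values.
    return -(-volume_bytes // MAX_VMDK_BYTES)


if __name__ == "__main__":
    for cluster in (4096, 65536):
        size = ntfs_max_volume_bytes(cluster)
        print(f"{cluster // 1024}KB clusters: max volume ~{size / TB:.2f}TB, "
              f"{vmdks_needed(size)} VMDKs to span it")
    print("10TB volume needs", vmdks_needed(10 * TB), "VMDKs")
```

Note that a 10TB volume needs six VMDKs rather than five, because each VMDK falls 512 bytes short of a full 2TB. Every one of those disks becomes a member that the spanned volume depends on.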

First, there are no right or wrong answers. How you choose to support big data/large disk requirements will be a mix of preference, manageability, performance, recoverability, and fault domain considerations. These considerations apply at several levels: storage array, VMware, guest OS, guest application, and backup systems. A large spanned Windows volume can simplify some management tasks: you might not have to worry as much about running out of space, and junior engineers don’t have to think about where to place data in the guest OS. Still, I tend to avoid using LVM/spanned Windows dynamic disks within VMs when possible, for a variety of reasons. Here are some of my considerations (for a variety of systems: Exchange, SQL, file servers, etc.):

Performance

Some applications, such as Microsoft SQL Server, can benefit from having more, smaller disks with multiple files in a database filegroup. Having different files on different disks, on different vSCSI controllers, can increase SQL Server’s ability to do asynchronous parallel IO. Microsoft’s recommendation for SQL (https://technet.microsoft.com/library/Cc966534) is to have between 0.25 and 1 data files per filegroup per core, with each file on a different drive/LUN. So an 8 vCPU SQL server would have between 2 and 8 .mdf/.ndf files on an equal number of drives. This lends itself to more, smaller VMDKs that are not striped or spanned by Windows. It does require a bit of design work within the database, optimizing your table and index structures to span multiple files in a filegroup.
Smaller, purpose-built files/volumes/LUNs can be placed on the right storage tier with the best caching mechanism (e.g. SQL log volumes placed on RAID 1/0 with more write cache available).
A single volume may have a limited queue depth. You’ll probably increase queuing capabilities as you scale out the number of VMDKs, and Windows will be able to drive more IO as additional disk channels are opened up.
A greater number of virtual disks spread over different VMFS datastores may increase the number of paths used to service the workload. This may allow for increased storage bandwidth, more in-path cache, and more storage processor efficiency.
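The file-count guidance above (0.25 to 1 data files per filegroup per core) and the idea of spreading disks over vSCSI controllers can be sketched in a few lines of Python. This is only an illustration: rounding the lower bound up to a whole file is my assumption, the round-robin placement is one possible layout rather than a VMware or Microsoft recommendation, and the function names are hypothetical. The default of four controllers reflects the vSphere limit of four virtual SCSI controllers per VM.

```python
import math


def data_file_range(cores: int) -> tuple[int, int]:
    """Return (min, max) data files per filegroup for a given core count.

    Applies the 0.25-to-1 files-per-core guideline; the lower bound is
    rounded up to a whole file (an assumption, not part of the guideline).
    """
    low = max(1, math.ceil(cores * 0.25))
    high = max(1, cores)
    return low, high


def assign_controllers(disk_count: int, controllers: int = 4) -> list[tuple[int, int]]:
    """Spread data disks round-robin across vSCSI controllers.

    Returns (disk_index, controller_index) pairs. A real layout might
    reserve controller 0 for the OS disk; this sketch does not.
    """
    return [(disk, disk % controllers) for disk in range(disk_count)]


if __name__ == "__main__":
    low, high = data_file_range(8)
    print(f"8 vCPU: between {low} and {high} data files per filegroup")
    for disk, ctrl in assign_controllers(high):
        print(f"data disk {disk} -> vSCSI controller {ctrl}")
```

For the 8 vCPU case this reproduces the 2-to-8 range from the text, with eight data disks landing two per controller.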

Manageability

By using Dynamic Disk striping, spanning or software RAID within the guest, you are introducing an extra layer of complexity that you will need to keep in mind while performing operations on the VM/VMDK. A storage operation on an array, LUN, VMFS datastore, or VMDK within your guest-striped volume could take the whole volume down.
Having smaller, purpose-built VMDKs allows you to move specific parts of your workload to the physical storage tier that best suits it. Putting everything into one monolithic volume doesn’t allow this level of granularity. For example, I might create a smaller Exchange mailbox database and put executives’ mailboxes in it. I would then place that mailbox database in a VMDK on a VMFS datastore on a LUN on a high-tier or replicated disk (great use case for VASA Profile Driven Storage, BTW). The interns’ mailbox database would be placed on the lowest tier of non-replicated storage. This configuration would also lend itself to more targeted and efficient backup schemes.
This Microsoft TechNet article for Exchange 2013 storage architecture (https://technet.microsoft.com/en-us/library/ee832792.aspx) suggests that using GPT Basic Disks is a best practice, although Dynamic disks are supported. By extension, you could infer that using spanned dynamic disks is not a best practice. The TechNet article also recommends keeping your Exchange mailbox databases (MDB) under 200GB, so there’s no need for a VMDK over 2TB if you’re following best practices.
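The fault domain concern raised earlier in this section (a storage operation on any one member could take the whole spanned volume down) can be made concrete with a little arithmetic. The sketch below assumes member failures are independent and uses a made-up availability of 99.9% per VMDK purely for illustration. Real storage failures are often correlated, so treat the output as a rough intuition rather than a prediction.

```python
import math


def spanned_volume_availability(member_availabilities: list[float]) -> float:
    """Availability of a spanned volume that needs every member online.

    Assumes independent members, so the combined availability is the
    product of the individual availabilities.
    """
    return math.prod(member_availabilities)


if __name__ == "__main__":
    per_vmdk = 0.999  # hypothetical per-VMDK availability, not a measured figure
    for members in (1, 5, 32):
        combined = spanned_volume_availability([per_vmdk] * members)
        print(f"{members:>2} spanned VMDKs -> {combined:.4%} available")
```

The more VMDKs you span, the lower the combined figure gets. Separate, purpose-built volumes keep each failure contained to the data that lives on that disk.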