Posts Tagged ‘NetApp’
My Storage Basics series has been neglected for some time (sick kids, snow storms, VMware upgrades, SAN implementations and some Cisco switch upgrades took all my free time), so let’s jump right into Part V – Cache, Controllers, and Coalescing. Between the alliteration and fancy words, it might seem like I am about to tell a tale of international espionage. Unfortunately, my introductory treatment of these aspects of a storage system will probably not keep you on the edge of your seat – but I’ll try to keep it interesting.
Throughout this series, we’ve been working our way from the basic building block of any storage system – the disks – outwards towards the brains of the operation – the controller. You’ll recall that in Part II I introduced IOPS and the math that goes into calculating the IOPS capacity of a disk array. In Part III we considered a RAID implementation’s impact on performance and availability. And most recently in Part IV we looked at the common interface types when dealing with storage arrays. If we put the previous parts together we still don’t have a functional storage system. The missing piece is the controller. Simply put, the storage controller is the hardware adapter between the disks and the servers that connect to the storage. The controller has a specific ‘interface’ type, is responsible for RAID operations, and handles advanced storage functionality. A controller can be as simple as the Dell PERC or HP Smart Array add-in card on your server, or as complex as the Storage Processor in an enterprise-class Storage Area Network (SAN) such as an EMC CLARiiON or NetApp FAS.
Controllers
As we look at controllers and the advanced features they provide we’ll see that some of the earlier performance equations start to break down. The simplest controllers take disk read/write commands from the operating system and send commands down to the disk(s) attached to be read or written. This gets data onto the disk, but often does not do so in an efficient or reliable manner. RAID-capable controllers take on the added responsibility of configuring disks in the desired RAID level, calculating & writing parity data, and writing the data in disk-spanning stripes or mirrors depending on the RAID level.
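To make the parity responsibility a little more concrete, here is a minimal sketch of the XOR parity idea used by single-parity RAID levels. It is an illustration only: the function names and the tiny four-byte "blocks" are made up for the example, and a real controller does this in hardware across full stripes.

```python
from functools import reduce

def xor_parity(blocks):
    """Return the byte-wise XOR of a list of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild_missing(surviving_blocks, parity):
    """Recover one lost block by XOR-ing the survivors with the parity block."""
    return xor_parity(surviving_blocks + [parity])

# Three hypothetical data blocks in one stripe
stripe = [b"DATA", b"MORE", b"BITS"]
parity = xor_parity(stripe)

# Pretend the disk holding the second block failed
recovered = rebuild_missing([stripe[0], stripe[2]], parity)
print(recovered)
```

Because XOR is its own inverse, the same routine that calculates parity also rebuilds a lost block, which is why a single-parity set can survive the loss of one disk.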
Cache
To increase performance and improve reliability, storage vendors implement a caching system on their controllers. Cache is memory that acts as a buffer for disk I/O, and is usually battery-backed to prevent data loss in the event of a power failure. Because RAM is orders of magnitude faster than spinning magnetic disks, cache can improve performance dramatically. Cache can operate on both reads and writes to disk.
When dealing with writes, the controller cache is typically used in one of two ways: write-through or write-back. In write-through mode, data is written to volatile cache and then to disk, and is only acknowledged as written once the data resides on the non-volatile disk. Write-back mode allows the controller to acknowledge the data as having been written as soon as it is held in cache. This allows the cache to buffer writes quickly and then write them to the slower disk when the disk has cycles to accept I/O. The larger your cache, the more data can be buffered, which generally results in better performance as measured in both IOPS and throughput. This graph from my article on troubleshooting write performance on an IBM DS3300 iSCSI array shows how throughput increased and latency decreased when enabling write cache. The extent to which cache increases performance is highly dependent on the workload characteristics (I/O size, randomness, and ratio of reads:writes).
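The difference between the two modes is easiest to see from the host’s point of view: when does the acknowledgement come back? The sketch below is a toy model, not a description of any particular controller. The two timing constants are assumed round numbers, and the burst is assumed to arrive faster than the disk can drain the cache.

```python
CACHE_ACK_US = 50      # assumed time to land a write in controller cache
DISK_WRITE_US = 5000   # assumed time to commit a write to spinning disk

def ack_latency_us(mode, free_cache_slots):
    """Microseconds until the host sees the write acknowledged."""
    if mode == "write-through":
        # Acknowledged only once the data is on disk
        return CACHE_ACK_US + DISK_WRITE_US
    if mode == "write-back":
        if free_cache_slots > 0:
            # Acknowledged as soon as the data is held in cache
            return CACHE_ACK_US
        # Cache is full: a slot has to be de-staged to disk first
        return CACHE_ACK_US + DISK_WRITE_US
    raise ValueError(f"unknown cache mode: {mode}")

def burst_latencies(mode, writes, cache_slots):
    """Acknowledgement latency for each write in a back-to-back burst."""
    free = cache_slots
    latencies = []
    for _ in range(writes):
        latencies.append(ack_latency_us(mode, free))
        if mode == "write-back" and free > 0:
            free -= 1
    return latencies

print(burst_latencies("write-through", 4, cache_slots=2))
print(burst_latencies("write-back", 4, cache_slots=2))
```

With only two cache slots, the write-back burst is fast for the first two writes and then falls back to disk speed, which is the cache exhaustion behavior in miniature.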
Read cache acts as a buffer for reads in a couple of ways. First, some controllers attempt to ‘read-ahead’, anticipating future read requests from the operating system and buffering what they expect to be the next blocks of desired data. Some entry-level controllers simply buffer the next physical chunk of data and fill cache memory with it, while more advanced controllers may attempt to predict the right block of data based on previous requests (you just asked for 3 blocks in a row, I’m guessing you’ll come asking for the 4th next so I’ll just buffer it in fast cache for you now). Second, read cache holds data that has been previously read, regardless of any pre-fetching the controller may have done. This allows for much faster subsequent access to the same data because it is held in the faster cache, eliminating the need for the controller to go to disk for the data again. Just like with write cache, the extent to which cache increases performance is highly dependent on the workload characteristics.
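Both read cache behaviors can be sketched in a few lines. The class below is a hypothetical model (a least-recently-used cache with the simple "buffer the next physical blocks" style of read-ahead), not how any specific controller is implemented.

```python
from collections import OrderedDict

class ReadCache:
    """Toy LRU read cache with optional sequential read-ahead."""

    def __init__(self, capacity, read_ahead=0):
        self.capacity = capacity
        self.read_ahead = read_ahead
        self.blocks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def _store(self, block):
        self.blocks[block] = True
        self.blocks.move_to_end(block)
        while len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block

    def read(self, block):
        if block in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block)
        else:
            self.misses += 1                 # had to go to disk
            self._store(block)
        # Read-ahead: buffer the next physical blocks as well
        for nxt in range(block + 1, block + 1 + self.read_ahead):
            self._store(nxt)

with_prefetch = ReadCache(capacity=8, read_ahead=2)
no_prefetch = ReadCache(capacity=8)
for block in range(10):                      # a purely sequential read
    with_prefetch.read(block)
    no_prefetch.read(block)

print(with_prefetch.hits, with_prefetch.misses)
print(no_prefetch.hits, no_prefetch.misses)
```

On a sequential read only the first block misses when read-ahead is on. On a random workload the pre-fetched blocks would mostly be wasted, which is the workload dependence mentioned above.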
A given storage array controller only has so much cache to work with. A Dell PERC5/E, for example, has 256MB of cache that can be used for both read and write. While this may be enough for a direct-attached storage array, SANs serving multiple systems demand more cache. An EMC CLARiiON CX4-960, by contrast, has 32GB. Some storage vendors, such as NetApp, are getting creative with cache. NetApp’s Performance Acceleration Module (PAM) is an add-in card that provides up to a whopping 512GB of Layer 2 cache to the storage system.
Caching mechanisms can dramatically influence performance under the right conditions. With healthy cache in place, IOPS calculations become skewed. However, cache can be exhausted or may not hold the data you are interested in. If cache is insufficient to satisfy read requests, or has reached its high-water mark for writes, performance can drop off. When cache is exhausted, the backing disk must be able to satisfy the I/O workload or performance will be unacceptable. This is where the IOPS calculations kick in, and where having the right disk type and configuration really matters.
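A rough way to see how cache skews the IOPS math is to weight cache and disk service times by the hit ratio. The sketch below uses assumed latencies (0.1 ms for a cache hit, 8 ms for a disk read); substitute numbers from your own array.

```python
def effective_read_latency_ms(hit_ratio, cache_ms=0.1, disk_ms=8.0):
    """Average read latency for a given cache hit ratio (0.0 to 1.0)."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * disk_ms

def backend_read_iops(front_end_iops, hit_ratio):
    """Read IOPS the disks still have to deliver after cache hits are removed."""
    return front_end_iops * (1.0 - hit_ratio)

for ratio in (0.9, 0.5, 0.0):
    print(ratio,
          round(effective_read_latency_ms(ratio), 2),
          backend_read_iops(1000, ratio))
```

As the hit ratio falls toward zero, the back-end requirement converges on the full front-end load, and the disk-based IOPS calculations from Part II apply again.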
Queuing & Coalescing
Advanced storage systems introduce additional features to reduce I/O contention and improve cache utilization. I won’t go into all of the features here because they vary by storage vendor. However, I will point out two common techniques – queuing and coalescing.
Queuing refers to the ability of a storage system to queue storage commands for later processing. Queuing can take place at various points in your storage environment, from the HBA to the storage processor/controller. A little queuing may be OK depending on your workload, but too many outstanding I/Os will hurt performance, which shows up as increased latency. Queue depths can be adjusted on many components in your storage and VMware landscape, but check with your vendor’s support group before you make changes to these settings.
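The relationship between queued I/O and latency can be approximated with Little’s Law (average outstanding I/Os = IOPS x average latency). The helper names and the sample numbers below are illustrative assumptions, and the formula only holds for a system in a steady state.

```python
def average_latency_ms(outstanding_ios, iops):
    """Average latency implied by Little's Law for a steady-state system."""
    if iops <= 0:
        raise ValueError("iops must be positive")
    return outstanding_ios * 1000 / iops

def outstanding_ios(iops, latency_ms):
    """Average number of I/Os in flight at a given IOPS rate and latency."""
    return iops * latency_ms / 1000

# An array that tops out at 4,000 IOPS, at two different queue depths
print(average_latency_ms(32, 4000))
print(average_latency_ms(128, 4000))
```

Once the array is delivering all the IOPS it can, queuing more I/O does not add throughput. It only adds waiting time, so quadrupling the outstanding I/Os quadruples the average latency.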
Coalescing is performed by some storage systems to modify the character of the workload. To better understand coalescing, picture a bunch of random write activity. Without cache in place, the disk heads will be bouncing all over the platters trying to get the data on to disk. A little write cache will allow the storage array to acknowledge the write for the OS, but the array still needs to de-stage that data from cache to disk quickly to prevent cache exhaustion. The back-end disks will still be doing the chicken dance, bouncing around trying to write the random workload. Now picture an intelligent system that re-orders the random writes that are held in cache and writes them to the disk in nice sequential stripes. The disk heads will be less prone to jumping around the platter and the behavior will start to look more like a nice waltz than the funky chicken dance. Coalescing is used for writes, not reads, so not all workloads benefit.
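Here is the chicken dance versus the waltz in code form. This is a deliberately simplified model that treats the logical block address as the head position and measures how far the head travels. Real arrays coalesce in vendor-specific ways, so take the function names and numbers as assumptions for illustration.

```python
def head_travel(lbas, start=0):
    """Total distance the head moves when writing blocks in the given order."""
    travel, position = 0, start
    for lba in lbas:
        travel += abs(lba - position)
        position = lba
    return travel

def coalesce(lbas):
    """Sort pending writes and merge adjacent blocks into (first, last) runs."""
    runs = []
    for lba in sorted(set(lbas)):
        if runs and lba == runs[-1][1] + 1:
            runs[-1][1] = lba            # extend the current sequential run
        else:
            runs.append([lba, lba])      # start a new run
    return [tuple(run) for run in runs]

pending = [70, 3, 71, 1, 2, 72, 40]      # random writes sitting in cache

print(head_travel(pending))              # written in arrival order
print(head_travel(sorted(pending)))      # written after re-ordering
print(coalesce(pending))                 # the sequential runs sent to disk
```

Seven scattered writes become three sequential runs, and the head travels a fraction of the distance it would have covered in arrival order.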
Wrap-up
With this article on Controllers, Cache, and Coalescing we’ll end our look at the basic building blocks of a storage array. Before we end the Storage Basics series I am planning a few more articles on Storage Workload Characterization (which has been mentioned, but not directly addressed in this and previous articles), Identifying a Stressed Storage System, and Best Practices for Storage Performance in a VMware Environment.
If you are interested in more reading on Controllers, Cache, and Coalescing, I recommend the following:
Additional Reading:
- Impact of cache on the performance of the HP StorageWorks XP12000 Disk Array white paper
- Performance impact of controller cache: SQL Server read only workloads
- IOps? – Dig into the article’s comments for some great dialog between some people who really know their stuff!
- Storage Performance for SQL Server
- Storage Caching 101 – Chuck Hollis (EMC)
- Improving Performance with Interrupt Coalescing for Virtual Machine Disk IO in VMware ESX Server
I wrote about a method for determining guest free disk space using a PowerShell script a couple of weeks ago. Scott Lowe picked up the post on his blog last week. Since then I have had several other conversations with folks looking for the best way to report on inefficiencies in their environments (it’s the economy, stupid) and mitigate those inefficiencies as budgets get tighter.
When it comes to reporting there are a ton of options available. The solution you choose will depend on your environment and the tools you already have in place. Small and Medium-sized Businesses (SMBs) often do not have full-blown, network-wide monitoring and management solutions, so VMware-specific solutions are often a great fit. There are several examples beyond my simple script, and many are free. The short list includes: Mightycare Solutions MCS StorageView 1.1, Rich Garsthagen’s VCplus, and Rob de Veij’s RVTools.
There are many other mid-tier solutions – both enterprise-wide and VMware specific – constantly emerging as the virtualization ecosystem matures. Offerings from ManageIQ, Embotics, Veeam, V-Kernel, Zenoss, Hyperic, and others are increasingly able to provide fresh and relevant data on what is happening under the covers in your virtual environment.
Larger IT shops most likely have a systems monitoring solution easily capable of reporting this – think offerings from the likes of Microsoft, Altiris, BMC, or CA. The trick in these solutions is narrowing down the information to your virtualized resources and getting the information to the right teams. Customized reports using fields such as the BIOS Vendor string can help show only servers running VMware, for example. As a side note, the Vendor BIOS string can also come in handy when applying Group Policies (GPO), allowing you to filter policies for only virtualized resources (disabling screen savers on Windows guests through GPO is a good example of this).
And don’t forget, we’re not reporting for reporting’s sake. We’re after relevant information that allows us to be more efficient and proactive in the overall goals of our environments. Good reporting identifies areas in need of improvement, and smart system administrators look for creative ways to improve their systems’ efficiency.
NetApp has extended their 50% Virtualization Guarantee to include Citrix Xen and Microsoft Hyper-V. The program, first announced in 2008, initially covered only VMware virtualization solutions. The 50% Guarantee program is a catchy way to get folks thinking through the cost savings that virtualization can offer when combined with shared storage (and in this economy who isn’t thinking about savings!).
NetApp has linked several Technical Reports on the 50% Virtualization Guarantee program site that are worth reading even if you are not preparing for a new storage purchase. Here are links to the TRs:
- Best Practices for Citrix Xen Server
- Best Practices for Microsoft Hyper-V
- Best Practices for VMware Virtual Infrastructure
Are you planning new storage purchases this year? If so, how do vendor resources and marketing tools like the 50% Virtualization Guarantee affect your decisions?
I am often asked about sizing storage vis-à-vis how much free space within a guest VMDK eats into the overall size of the volume. The answer can differ drastically depending on whether we are dealing with thick-provisioned VMDKs on FC or iSCSI LUNs, or thin-provisioned VMDKs on NFS volumes. The amount of free space present in guest VMDKs also comes into play when calculating the impact of dedupe on the volume. Add in some flexible volumes on NetApp storage and the amount of provisioned storage in the design changes significantly.
There are several methods of obtaining the amount of free space in a guest OS, from third party systems management tools (some are vendor or OS specific) to custom scripting (some VB for your Windows hosts, etc.). VirtualCenter also knows how much free space is within each guest VMDK, but the information is not readily displayed.
The first method of getting guest free space is using the VMware VI Toolkit for Windows. A simple statement like what I show below will pull the info for you (see Hal Rottenberg’s post on the VMware Communities Forum):
PS > $hdCapacity = @{ N = "Capacity (bytes)"; E = { $_.Guest.Disks | % { $_.Capacity } } }
PS > $hdFreeSpace = @{ N = "FreeSpace (bytes)"; E = { $_.Guest.Disks | % { $_.FreeSpace } } }
PS > Get-VM | select Name, $hdCapacity, $hdFreeSpace
You can re-write the command into a single line and change the output to show Percentage Free space, for example. The following came from http://communities.vmware.com/message/1046360.
Get-VM | Where { $_.PowerState -eq "PoweredOn" } | Get-VMGuest | Select VmName -ExpandProperty Disks | Select VmName, Path, @{ N="PercFree"; E={ [math]::Round( (100 * ( $_.FreeSpace / $_.Capacity ) ),0 ) } } | Sort PercFree
These examples are well and good, but there are a couple of catches: the guest must be powered on, and VMware Tools must be installed and running inside each VM you want to pull statistics from.
Many people do not know that the guest disk capacity and free space statistics are also captured in the VirtualCenter database and are available for VMs in any power state (On, Off, or Suspended), so long as VMware Tools has been installed and run at least once in the VM. A simple SQL query will return the data to you (this could be simpler; I was pulling some additional statistics for a project I am working on):
SELECT VPX_GUEST_DISK.VM_ID, VPX_GUEST_DISK.PATH, VPX_GUEST_DISK.CAPACITY,
       CONVERT(bigint, VPX_GUEST_DISK.CAPACITY) / 1048576 AS 'Capacity MB',
       VPX_GUEST_DISK.FREE_SPACE,
       CONVERT(bigint, VPX_GUEST_DISK.FREE_SPACE) / 1048576 AS 'Free Space MB',
       VPX_HOST.ID, VPX_HOST.DATACENTER_ID, VPX_HOST.DNS_NAME,
       VPX_VM.ID, VPX_VM.DATACENTER_ID, VPX_VM.FILE_NAME, VPX_VM.LOCAL_FILE_NAME,
       VPX_VM.POWER_STATE, VPX_VM.GUEST_OS, VPX_VM.GUEST_STATE, VPX_VM.MEM_SIZE_MB,
       VPX_VM.NUM_DISK, VPX_VM.DNS_NAME, VPX_VM.IS_TEMPLATE, VPX_VM.HOST_ID
FROM VPX_GUEST_DISK VPX_GUEST_DISK, VPX_HOST VPX_HOST, VPX_VM VPX_VM
WHERE VPX_VM.ID = VPX_GUEST_DISK.VM_ID AND VPX_HOST.ID = VPX_VM.HOST_ID
Of course, you can always combine the power of PowerShell and the raw SQL data to create formatted output. Frank Hagen displays a method of obtaining SQL data through a PowerShell script and dumping the data to Excel for further manipulation on his blog. I modified a few lines of Frank’s code for my purposes:
################################################
# QuerySQL.ps1
# Frank W Hagen - 2008/07/15
# FWHagen.wordpress.com
# fwhagen.blog@gmail.com
#
# Powershell script to query a SQL database
# and write the output to an Excel file
#
# Usage:
# * Create a SQL query file by putting a valid SQL query in a text file in
#   the subdirectory specified in $SQLQueryPath named <$TaskName>.sql
# * Set configuration variables in config section below
# * Run at command line: powershell -nologo .\QuerySQL.ps1
#   OR right-click this script and Open With -> Powershell.EXE
#
# If you get a security warning running the script, see the following post:
# http://fwhagen.wordpress.com/2007/10/29/running-local-powershell-scripts/
################################################

# Function used for binding to Excel
function Invoke([object]$m, [string]$method, $parameters)
{
    $m.PSBase.GetType().InvokeMember($method, [Reflection.BindingFlags]::InvokeMethod, $null, $m, $parameters, [System.Globalization.CultureInfo]"en-US")
}

#######################################################
### Configuration information for specific query #####
$TaskName = "VIGuestDiskFree" # Title and name of query file

#######################################################
### UPDATE THE FOLLOWING LINES FOR YOUR ENVIRONMENT ###
$SqlServer = "";   # SQL Server hosting VirtualCenter Database (include \instance name if not using Default)
$SqlCatalog = "";  # Virtual Center Database name
$SQLUserID = "";   # SQL Server User ID if not using Integrated Security (see the ConnectionString lines below)
$SQLPassword = ""; # SQL Server User Password if not using Integrated Security (see the ConnectionString lines below)
### END OF CUSTOMER UPDATABLE FIELDS ###
#######################################################
$WriteOutXML = $False
$WriteOutCSV = $False
$WriteOutXLS = $True
#
# Environment Configuration
$TaskPath = "C:\Scripts\"      # Root Directory for creating reports
$SQLQueryPath = "SQLQueries\"  # Subdirectory for finding the queryfile
#######################################################

# Timestamp the output folder and files using ISOdate
$OutPath = ($TaskPath + (Get-Date -Format yyyyMMdd) + "-" + $TaskName + "\")
$OutFileName = ( (Get-Date -Format yyyyMMdd) + "-" + $TaskName )

# Create the output folder #TODO: Fix Call to eliminate verbose results from system
if (!$(test-path ($OutPath)))
{
    # -Path added so the folder is created under $TaskPath (where $OutPath points)
    # rather than in whatever the current directory happens to be
    New-Item -itemType directory -Path $TaskPath -Name ((Get-Date -Format yyyyMMdd) + "-" + $TaskName) > $null
    if ($(test-path ($OutPath)))
    {
        Write-Host ($OutPath + " Created") -ForegroundColor "darkgreen"
    }
    else
    {
        Write-Host ($OutPath + " FAILED") -ForegroundColor "red"
    }
}

# Get the T-SQL Query from .SQL file
$SqlQuery = Get-Content ($TaskPath + $SQLQueryPath + $TaskName + ".sql")
Write-Host ("Executing Queryfile: " + ($TaskName + ".sql") + " ") -ForegroundColor "darkgreen"
#Write-Host ($SqlQuery) -ForegroundColor "gray"

# Setup SQL Connection (using Integrated Security (your workstation login). Use standard connection string format for other)
$SqlConnection = New-Object System.Data.SqlClient.SqlConnection
#$SqlConnection.ConnectionString = "Server = $SqlServer; Database = $SqlCatalog; Integrated Security = True"
$SqlConnection.ConnectionString = "Server = $SqlServer; Database = $SqlCatalog; Integrated Security = False; User ID = $SQLUserID; Password = $SQLPassword;"

# Setup SQL Command
$SqlCmd = New-Object System.Data.SqlClient.SqlCommand
$SqlCmd.CommandText = $SqlQuery
$SqlCmd.Connection = $SqlConnection

# Setup .NET SQLAdapter to execute and fill .NET Dataset
$SqlAdapter = New-Object System.Data.SqlClient.SqlDataAdapter
$SqlAdapter.SelectCommand = $SqlCmd
$DataSet = New-Object System.Data.DataSet
$DataTable = New-Object System.Data.DataTable

# Execute and Get Row Count
$nRecs = $SqlAdapter.Fill($DataSet)
Write-Host ($nRecs.ToString() + " Records retrieved.") -ForegroundColor "Blue"
$SqlConnection.Close();

if ($nRecs -gt 0)
{
    # Make copy of successful query in output directory for traceability
    if ($(test-path ($OutPath + $OutFileName + ".sql")))
    {
        del ($OutPath + $OutFileName + ".sql")
    }
    Copy-Item ($TaskPath + $SQLQueryPath + $TaskName + ".sql") -destination ($OutPath + $OutFileName + ".sql")

    # Very simple to export XML
    if($WriteOutXML)
    {
        Write-Host "Creating XML File..." -ForegroundColor "darkgreen"
        if ($(test-path ($OutPath + $OutFileName + ".xml")))
        {
            del ($OutPath + $OutFileName + ".xml")
        }
        $DataSet.Tables[0].WriteXML($OutPath + $OutFileName + ".xml");
    }

    # Very simple to export CSV
    if($WriteOutCSV)
    {
        Write-Host "Creating CSV File..." -ForegroundColor "darkgreen"
        if ($(test-path ($OutPath + $OutFileName + ".csv")))
        {
            del ($OutPath + $OutFileName + ".csv")
        }
        $DataSet.Tables[0] | Export-Csv ($OutPath + $OutFileName + ".csv")
    }

    # Very hard to export XLS - This method writes the data to an object array and pastes the array directly into Excel (Thanks go to a few sources on the Internet for this method)
    if($WriteOutXLS)
    {
        Write-Host "Creating Excel File..." -ForegroundColor "darkgreen"
        if ($(test-path ($OutPath + $OutFileName + ".xls")))
        {
            del ($OutPath + $OutFileName + ".xls")
        }

        $sheetIndex = 0;
        $oExcel = New-Object -COM Excel.Application
        $oExcel.Visible = $false
        $oBooks = $oExcel.Workbooks
        $oCulture = [System.Globalization.CultureInfo]"en-US"
        $oBook = $oBooks.psbase.gettype().InvokeMember("Add",[Reflection.BindingFlags]::InvokeMethod,$null,$oBooks,$null,$oCulture)
        #$oSheet = $oBook.Worksheets.Item(1)

        $DataTable = $DataSet.Tables[0];
        $nDr = $DataTable.Rows.Count + 1
        $nDc = $DataTable.Columns.Count + 1

        # Create the object array
        $rawData = New-Object 'object[,]' $nDr,$nDc

        # Write the field names in the first row
        for ($col = 0; $col -lt $DataTable.Columns.Count; $col++)
        {
            $rawData[0, $col] = $DataTable.Columns[$col].ColumnName;
        }

        # Copy the dataset to the object array
        for ($col = 0; $col -lt $DataTable.Columns.Count; $col++)
        {
            for ($row = 0; $row -lt $DataTable.Rows.Count; $row++)
            {
                $rawData[($row + 1), $col] = $DataTable.Rows[$row][$col];
            }
        }

        # Calculate the final column letter
        $finalColLetter = "";
        $colCharset = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
        $colCharsetLen = $colCharset.Length;
        if ($DataTable.Columns.Count -gt $colCharsetLen)
        {
            # Leading letter for columns past Z (AA, AB, ...). The original divided by
            # ($colCharsetLen - 1), which picks the wrong letter (27 columns gave "BA").
            $finalColLetter = $colCharset.Substring(([int][math]::Floor(($DataTable.Columns.Count - 1) / $colCharsetLen) - 1), 1);
        }
        $finalColLetter += $colCharset.Substring(($DataTable.Columns.Count - 1) % $colCharsetLen, 1);

        ### Export it all to Excel #####
        Write-Host "Writing to Excel..." -ForegroundColor "darkgreen"

        # Create a new Sheet
        $excelSheet = $oBook.Worksheets.Item(1)
        #$excelSheet.name = $DataTable.TableName; #TODO: Be nice to figure out how to make this work (not critical)

        # Create the entire range on the worksheet and dump the data into it
        $excelRange = "A1:" + $finalColLetter + "" + ($DataTable.Rows.Count + 1)
        $excelSheet.Range($excelRange).FormulaLocal = $rawData;

        # Mark the first row as BOLD #TODO: Be nice to figure out how to make this work (not critical)
        #$excelSheet.Rows[1].Font.Bold = $True;
        #$excelSheet.Cells.Item(1,1).Font.Bold = $True;

        # Save the Excel file and we're done
        Invoke $oBook SaveAs ($OutPath + $OutFileName + ".xls") > $null
        Invoke $oBook Close 0 > $null
        $oExcel.Quit()
    }
}

Write-Host ("Complete")
I dropped the PowerShell code into a file named VIGuestDiskFree.ps1 in C:\Scripts on my Vista laptop, and the SQL query from earlier in this post to c:\scripts\sqlqueries\VIGuestDiskFree.sql. Simply run the PowerShell script from a PowerShell command line. The Excel output file will be created in a date-stamped folder in C:\Scripts. Once you have this data you can go about the business of formatting the data as an Excel Table and summing the Free Space column to find out your storage savings with thin provisioning and other storage efficiency technologies. I have attached a .zip (VIGuestFreeScripts) containing the scripts in this post (formatting is a bit off in WordPress).
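If you would rather not do the summing by hand in Excel, the totals are easy to script. The sketch below is in Python and assumes you ran the script with $WriteOutCSV set to $True, so that the 'Capacity MB' and 'Free Space MB' columns from the query are available in a CSV file. The sample rows are made up for the example, and the "#TYPE" line is skipped because older PowerShell versions write one at the top of Export-Csv output.

```python
import csv

def summarize_free_space(csv_text):
    """Return (total capacity MB, total free MB, percent free) for a CSV export."""
    lines = [line for line in csv_text.splitlines()
             if not line.startswith("#TYPE")]
    rows = list(csv.DictReader(lines))
    capacity_mb = sum(int(row["Capacity MB"]) for row in rows)
    free_mb = sum(int(row["Free Space MB"]) for row in rows)
    percent_free = round(100 * free_mb / capacity_mb, 1) if capacity_mb else 0.0
    return capacity_mb, free_mb, percent_free

# Made-up sample data standing in for the real export
sample = """#TYPE System.Data.DataRow
"VM_ID","PATH","Capacity MB","Free Space MB"
"12","C:\\","30720","10240"
"15","C:\\","20480","5120"
"""

capacity_mb, free_mb, percent_free = summarize_free_space(sample)
print(capacity_mb, free_mb, percent_free)
```

To use it against a real export, read the file contents and pass them in, for example summarize_free_space(open(path).read()). The free total is a rough upper bound on what thin provisioning could hand back, since guests still need some headroom.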
How do you report on your storage utilization and storage efficiency efforts? Post a comment to share your methods!




