Posts Tagged ‘virtualization’

In parts I, II, and III of the Storage Basics series we looked at the basic building blocks of modern storage systems: hard disk drives.  Specifically, we looked at the performance characteristics of disks in terms of IOPS and the impact of combining disks into RAID sets to improve performance and resiliency.  Today we will have a quick look at another piece of the puzzle that impacts storage performance: the interface.  The interface, for lack of a better term, can describe several things in a storage conversation.  Let me break it down for you (remember, we’re keeping it simple here).

At the most basic level (assume a direct-attached setup), ‘interface’ can be used to describe the physical connections required to connect a hard drive to a system (motherboard/controller/array).  The ‘interface’ extends beyond the disk itself, and includes the controller, cabling, and disk electronics necessary to facilitate communications between the processing unit and the storage device.  Perhaps a better term for this would be ‘intra-connect’ as this is all relative to the storage bus.  Common interfaces include IDE, SATA, SCSI, SAS, and FC.  Before data reaches the disk platter (where it is bound by IOPS), it must pass through the interface.  The standards bodies that define these interfaces go beyond the simple physical form factor; they also define the speed and capabilities of the interface, and this is where we find another measure of storage performance: throughput.  The speed of the interface is the maximum sustained throughput (transfer speed) of the interface and is often measured in Gbps or MBps.

Here are the interface speeds for the most common storage interfaces:

  • IDE: 100MBps or 133MBps
  • SATA: 1.5Gbps or 3.0Gbps (6.0Gbps is coming)
  • SCSI: 160MBps (Ultra-160) and 320MBps (Ultra-320)
  • SAS: 1.5Gbps or 3.0Gbps (6.0Gbps is coming)
  • FC: 1Gb, 2Gb, 4Gb, or 8Gb (duplex throughput rates are 200MBps, 400MBps, 800MBps, and 1600MBps respectively)

If we take these speeds at face value, we see that a 320MBps SCSI and a 2Gbps FC are not too different.  If you dig a bit deeper you will soon find that simple speed ratings are not the end of the story.  For example, FC throughput can be impacted by the length and type of cable (fiber channel can use twisted pair copper in addition to fiber optic cables).  Also, topologies can limit speeds – serial connected topologies are more efficient than parallel on the SCSI side, and arbitrated loops can incur a penalty on the FC side.  The specifications of each interface type also define capabilities such as the protocol that can be used, the number of devices allowed on a bus, and the command set that can be used in communications on a storage system.  For example, SATA native command queuing (NCQ) can offer a performance increase over parallel ATA’s tagged command queuing with other factors held constant.   Because of this, you  might also see some performance implications of connecting a SATA drive to a SAS backplane, as the SAS backplane translates SAS commands to SATA.
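To compare the Gbps ratings with the MBps ratings on something like equal footing, here is a rough Python sketch.  It assumes 8b/10b encoding (10 bits on the wire per byte of data), which is my assumption for the serial interfaces listed above, and it uses the nominal line rates rather than exact signaling rates.  The function name is made up for illustration.

```python
# Assumption: the serial link uses 8b/10b encoding, so every data byte
# costs 10 bits on the wire. Nominal line rates only, no protocol overhead.
BITS_ON_WIRE_PER_BYTE = 10


def line_rate_to_mbps(gbps: float) -> float:
    """Convert a nominal serial line rate in Gbps to approximate MBps per direction."""
    return gbps * 1000 / BITS_ON_WIRE_PER_BYTE


if __name__ == "__main__":
    for name, gbps in [("SATA 1.5Gbps", 1.5), ("SAS 3.0Gbps", 3.0), ("FC 2Gb", 2.0)]:
        per_direction = line_rate_to_mbps(gbps)
        print(f"{name}: roughly {per_direction:.0f} MBps per direction")
```

Under that assumption a 2Gb FC link works out to roughly 200MBps in each direction, which lines up with the 400MBps duplex figure in the list.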

If we move away from the direct-connect model, and into a shared storage environment that you might use in a VMware-virtualized environment, the ‘interface’ takes on an additional meaning.  You certainly still have the bus ‘interface’ that connects your disks to a backplane.  Modern arrays typically use SAS or FC backplanes.  If you have multiple disk enclosures, you also have an interface that connects each disk shelf to the controller/head/storage processor, or to an adjacent tray of disks.  For example, EMC Clariion arrays use a copper fiber channel cable in a switched fabric to connect disk enclosures to the back-end of the storage processors.

If we move to the front-end of the storage system, ‘interface’ describes the medium and protocol used by initiating systems (servers) when connecting to the target SAN.  Typical front-end interface mediums on a SAN are Fiber Channel (FC) and Ethernet.  Front-end FC interfaces come in the standard 2Gb, 4Gb, or 8Gb speeds, while Ethernet is 1Gbps or 10Gbps.  Many storage arrays support multiple front-end ports which can be aggregated for increased bandwidth, or targeted by connecting systems using multi-pathing software for increased concurrency and failover.

Various protocols can be sent over these mediums.  VMware currently supports Fiber Channel Protocol (FCP) on FC, and iSCSI and NFS on Ethernet.  FC and iSCSI are block-based protocols that utilize encapsulated SCSI commands.  NFS is a NAS protocol.  Fiber Channel over Ethernet (FCoE) is also available on several storage arrays, sending FCP packets across Ethernet.

Determining which interface to use on both the front-end and back-end of your storage environment requires an understanding of your workload and your desired performance levels.  A post on workload characterization is coming in this series, so I won’t get too deep now.  I will, however, provide a few rules of thumb.  First, capture performance statistics: using Windows Perfmon, look at Physical Disk|Disk Read Bytes/sec and Disk Write Bytes/sec, or check out stats in your vSphere Client if you are already virtualized.

  • If you require low latency, use fiber channel.
  • If your throughput is regularly over 60MBps, you should consider fiber channel connected hosts.
  • iSCSI or NFS are often a good fit for general VMware deployments.
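The rules of thumb above can be written down as a tiny Python sketch.  The function name and its inputs are hypothetical, the 60MBps threshold is the only number taken from the list, and real sizing should weigh far more than these two inputs.

```python
def suggest_front_end(sustained_throughput_mbps: float, needs_low_latency: bool) -> str:
    """Apply the rules of thumb from the list above (a rough guide, not a sizing tool)."""
    if needs_low_latency:
        return "fiber channel"
    if sustained_throughput_mbps > 60:
        return "fiber channel"
    # General VMware deployments are often a good fit for IP storage.
    return "iSCSI or NFS"


if __name__ == "__main__":
    print(suggest_front_end(25, needs_low_latency=False))
    print(suggest_front_end(85, needs_low_latency=False))
```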

There is a ton of guidance and performance numbers available when it comes to choosing the right interconnect for a VMware deployment, and a ton of variables that impact performance.  Start with this whitepaper from VMware: http://www.vmware.com/resources/techresources/10034.  For follow up reading, check out Duncan Epping’s post with a link to a NetApp comparison of FC, iSCSI, and NFS: http://www.yellow-bricks.com/2010/01/07/fc-vs-nfs-vs-iscsi/.  If you are going through a SAN purchase process, ask your vendor to assist you in collecting statistics for proper sizing of your environment.  Storage vendors (and their resellers) have a few cool tools for collecting and analyzing statistics – don’t be afraid to ask questions on how they use those tools to recommend a configuration for you.

I’ve kept this series fairly simple.  Next up in this series is a look at cache, controllers and coalescing.  With the next post we’ll start to get a bit more complex and more specific to VMware and Tier 1 workloads, both virtual and physical.  Thanks for reading!

We all know that virtualization allows us to do more with less.  Fewer servers and space-saving storage (talk about an oxymoron) help us put some green in the datacenter and back in the budget.  But with tight budgets demanding greater efficiency, virtualization pushing per-U-space utilization higher, and increasingly rack-dense equipment, proper planning of your physical plant remains an essential part of IT.  I argue that right-sizing your power, cooling, and floor-space is more critical now than it has ever been, and knowing how to do it is a darn good skill for a virtualization engineer to possess.

So along those lines… I was just doing some site-prep work for a new Clariion installation and noticed that the EMC Power Calculator has been updated.  It is now a pretty slick little web app that can be found on the PowerLink site (login required) here: https://powerlink.emc.com/nsepn/webapps/powercalculator/Main.aspx.

While I am at it, here are some links to other power consumption calculators.  Let me know if you have others and I will update this post:

There’s some fun and timely chatter happening right now on Twitter around power consumption and sizing – join in by following me at http://twitter.com/joshuatownsend/!

This is the third in a multi-part series on storage basics.  I’ve had some good feedback from folks in the SMB space saying that the first couple posts in this series have been beneficial, so we’ll be sticking with some basic concepts for another post or two before we dive into some nitty-gritty details and practical applications of these concepts in a VMware environment.  In the second post of this series I introduced the concept of IOPS and explained how the physical characteristics of a hard disk drive determine the theoretical IOPS capability of a disk.  I then noted that you can aggregate disks to achieve a greater number of IOPS for a particular storage environment.  Today, we will look at just how you combine multiple disks and the performance impact of doing so.  Remember that we are keeping this simple; the concepts I present here may not apply to that fancy new SAN you just purchased with your end-of-year money or the cheap little SATA controller on your desktop’s motherboard (not that there’s anything wrong with it) – we’re more in the middle ground of direct attached storage (DAS) as we firm up concepts.

Enterprise servers and storage systems have the ability to combine multiple disks into a group using Redundant Array of Independent Disks (RAID) technology.  We’ll assume a hardware RAID controller is responsible for configuring and driving storage IO to the connected disks.  RAID controllers typically have battery-backed cache (we’ll talk cache in a future post), an interconnect where the drives plug in, such as SCSI or SAS (we’ll talk about these too in a future post), and hold the configuration of the RAID set including stripe size and RAID level.  The controller also does the basic work of reading and writing on the RAID set – mirroring, striping, and parity calculations.  There are several different RAID levels – rather than rehash the details of them, read the Wikipedia entry on RAID and then come back here….

Ok, great.  So you now know that RAID is implemented to increase performance through the aggregation of multiple disks, and to increase reliability through mirroring and parity.  Now let’s consider the performance implications of some basic RAID levels.  As with many things in the IT industry, there are trade-offs: security vs. usability, brains vs. brawn, and now performance vs. reliability.  As we increase reliability in a RAID array through mirroring and parity, performance can be impacted.  This is where the more disks = more IOPS bit starts to fall apart.  The exact impact depends on the RAID type.  Here are some examples of how RAID impacts the maximum theoretical IOPS using the most common RAID levels, where:

I = Total IOPS for Array (note that I show Read and Write separately)

i = IOPS per disk in array (based on spindle speed averages from Part II: IOPS)

n = Number of disks in array

r = Percentage of read IOPS (calculated from the Average Disk Reads/Sec divided by total Average Disk Transfers/Sec in your Windows Perfmon)

w = Percentage of write IOPS (calculated from the Average Disk Writes/Sec divided by total Average Disk Transfers/Sec in your Windows Perfmon)

RAID0 (striping, no redundancy)

This is basic aggregation with no redundancy.  A single drive error/failure could render your data useless and as such it is not recommended for production use.  It does allow for some simple math:

I = n*i

Because there is no mirroring or parity overhead, theoretical maximum Read and Write IOPS are the same.

RAID 1 & RAID10 (mirroring technologies):

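Because data is mirrored to multiple disks, reads can be serviced by every disk in the set, while each write has to be committed twice (once to each side of the mirror):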

Read I = n*i

For example, if we have six 15k disks in a RAID10 config, we should expect a theoretical maximum number of IOPS for our array to be 6*180 = 1080 IOPS

Write I = (n*i)/2

RAID5 (striping with a single parity disk)

Read I = (n-1)*i

Example: Five 15k disks in a RAID 5 (4 + 1) will yield a maximum IOPS of (5-1)*180 = 720 READ IOPS.  We subtract 1 because one of the disks holds parity bits, not data.

Write I = (n*i)/4

Example: Five disks in a RAID 5 (4 + 1) will yield a maximum IOPS of (5*180)/4 = 225 WRITE IOPS
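If you would rather let a script do the arithmetic, here is a minimal Python sketch of the formulas above.  The function names are my own, and the 180 IOPS per 15k disk figure is the same average used in the examples.  These are theoretical maximums only.

```python
def raid0_iops(n: int, i: float) -> float:
    """RAID0: read and write IOPS are both n*i."""
    return n * i


def raid10_read_iops(n: int, i: float) -> float:
    """RAID1/RAID10 reads: every disk can service reads."""
    return n * i


def raid10_write_iops(n: int, i: float) -> float:
    """RAID1/RAID10 writes: each write lands on two disks."""
    return (n * i) / 2


def raid5_read_iops(n: int, i: float) -> float:
    """RAID5 reads: one disk's worth of capacity holds parity."""
    return (n - 1) * i


def raid5_write_iops(n: int, i: float) -> float:
    """RAID5 writes: write penalty of 4."""
    return (n * i) / 4


if __name__ == "__main__":
    iops_15k = 180  # per-disk average used in the examples above
    print("RAID10, 6 disks, read :", raid10_read_iops(6, iops_15k))
    print("RAID5,  5 disks, read :", raid5_read_iops(5, iops_15k))
    print("RAID5,  5 disks, write:", raid5_write_iops(5, iops_15k))
```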

Again, these formulas are very basic and have little practical value.  Furthermore, it is seldom that you will find a system that is doing only reads or only writes.  More often, as is the case with typical VMware environments, reads and writes are mixed.  An understanding of your workload is key to accurately sizing your storage environment for performance.  One of the workload characteristics (we’ll explore some more in the future) that you should consider in your sizing is the percentage of read IOPS vs. the percentage of write IOPS.  A formula like this gets you close if you want to do the math for a mixed read/write environment in a RAID5 set:

I = (n*i)/(r + 4*w)

Example: a 60% read/40% write workload with five 15k disks in a RAID5 would provide (5*180)/(.6+4*.4) = 409 IOPS.
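The mixed-workload formula is easy to play with in a few lines of Python.  This is just a sketch of the formula as written above; the function name is hypothetical, and the write penalty defaults to 4 for RAID5.

```python
def mixed_iops(n: int, i: float, read_fraction: float, write_penalty: int = 4) -> float:
    """Theoretical array IOPS for a mixed workload: (n*i) / (r + penalty*w)."""
    write_fraction = 1 - read_fraction
    return (n * i) / (read_fraction + write_penalty * write_fraction)


if __name__ == "__main__":
    # Five 15k disks (180 IOPS each) in RAID5 with a 60% read / 40% write workload.
    print(f"{mixed_iops(5, 180, 0.6):.0f} IOPS")
```

Try shifting the read percentage up or down to see how quickly the RAID5 write penalty eats into the total.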

The previous examples have all been from the perspective of the storage system.  If we take a look at this from the server/OS/application side, something interesting shows up.  Let’s say you fired up Windows perfmon and collected Physical Disk Transfers/sec counters every 15 seconds for 24 hours and analyzed the data in Excel to find the 95th Percentile for total average IOPS (this is a pretty standard exercise if you are buying an enterprise storage array or SAN).  Let’s say that you find that the server in question was asking for 1000 IOPS at the 95th Percentile (let’s stick with our 60% read/40% write workload).  And finally, let’s say we put this workload on a RAID5 array.  That’s saying a lot of stuff, but what does it all mean?  Because RAID5 has a write penalty factor of 4 (again, Duncan Epping’s posted a great article here which I referenced in Part II that describes this in a slightly different way) we can tweak the previous formula to show the IO’s to the backend array given a specific workload.

I = Target workload IOPS

f = IO penalty

r = % Read

w = % Write

IO = (I * r) + (I * w) * f

Our example then looks like this (remember work inside parenthesis first, and then My Dear Aunt Sally):

(1000 * .6) + ((1000 * .4) * 4) = 2200

Simply stated, this means that for every 1000 IOPS that our workload requests from our storage system, the backing array must perform 2200 IO’s, and it better do it quickly or you will start to see latency and queuing (we call this performance degradation, boys and girls!).  Again, this is a very simplistic approach neglecting factors like cache, randomness of the workload, stripe size, IO size, and partition alignment which can all impact requirements on the backend.  I’ll cover some of those later.
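Here is the same backend calculation as a small Python sketch, in case you want to plug in your own perfmon numbers.  The function name is made up, and the write percentage is derived as whatever is left after reads.

```python
def backend_ios(target_iops: float, read_fraction: float, write_penalty: int) -> float:
    """Backend IOs the array must perform: (I * r) + (I * w) * f."""
    write_fraction = 1 - read_fraction
    reads = target_iops * read_fraction
    writes = target_iops * write_fraction * write_penalty
    return reads + writes


if __name__ == "__main__":
    # 1000 IOPS at the 95th percentile, 60% read / 40% write, RAID5 (penalty of 4).
    print(f"{backend_ios(1000, 0.6, 4):.0f} backend IOs")
```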

As you can hopefully see, the laws of physics combined with some simple math can provide some pretty useful numbers.  A basic understanding of your array configuration against your workload requirements can go a long way in preventing storage bottlenecks.  You may also find that as you consider the cost per disk against various spindle speeds, capacities and RAID levels that you are better off buying smaller, faster, fewer, more, slower…. disks depending on your requirements.  The geekier amongst us could even take these formulas and some costs per disk and hit up Excel Goal Seek to find the optimal level, but that’s more than this little blog can do for you today.
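Short of firing up Goal Seek, a rough Python sketch like the one below can turn a workload into a minimum spindle count per RAID level.  The penalty table comes from the write formulas earlier in this post (1 for RAID0, 2 for RAID10, 4 for RAID5); the function and dictionary names are hypothetical.  It ignores cache, cost, capacity, and the fact that RAID10 wants an even number of disks.

```python
import math

# Write penalties implied by the write formulas earlier in the post.
WRITE_PENALTY = {"RAID0": 1, "RAID10": 2, "RAID5": 4}


def disks_needed(target_iops: float, read_fraction: float,
                 iops_per_disk: float, raid_level: str) -> int:
    """Minimum disks so that raw spindle IOPS cover the backend IO load."""
    penalty = WRITE_PENALTY[raid_level]
    backend = target_iops * read_fraction + target_iops * (1 - read_fraction) * penalty
    return math.ceil(backend / iops_per_disk)


if __name__ == "__main__":
    # 1000 IOPS, 60% read / 40% write, 15k disks at 180 IOPS each.
    for level in WRITE_PENALTY:
        print(level, disks_needed(1000, 0.6, 180, level))
```

Multiply the disk counts by your cost per disk and you have a crude starting point for comparing RAID levels on price.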

Before I wrap up this post, I want to leave you with a few more links that I have bookmarked around the topics of IOPS and RAID over the past several years:

Happy New Year to everyone!  2010 is shaping up to be quite a good year, both personally and professionally.  Between two little boys at home and a bunch of new projects at work I should stay busy.  I will also continue to co-lead the Washington, DC VMware User Group (VMUG).  In my spare time I will continue to write what I hope are technically sound, practical, and timely articles on VMtoday.com.

And speaking of VMtoday.com… Eric Siebert has opened up voting for the top 25 virtualization bloggers on his vSphere-Land.com site.  I am very honored to be included in the ballot list of 55 of our industry’s top bloggers.  Please take a few minutes to vote – a couple lucky voters will win a copy of TrainSignal’s VMware vSphere DVD training course.

Thanks for reading and best wishes for a blessed and productive 2010!

~Josh

Today marks the one year anniversary of my first post on VMtoday.com, and an exciting year it has been in many ways.  First, some stats:

  • VMtoday.com has been visited more than 10,000 times in the past year.  While the number of site visits is far below some of my fellow virtualization bloggers, it is still exciting for me to see that I am making an impact on the community (despite my meager post count).  There are some of you who read my posts through Planet V12n and RSS, which is cool by me.
  • Yesterday was the busiest day for the site.  No coincidence that it comes after adding some new content….  I’ll try to be more faithful in publishing regular, relevant content!
  • My most popular post to date has been: IBM DS3300 iSCSI Write Performance Solved. I’m glad this has been useful for so many, but I hope that you don’t just apply the workarounds I wrote about. I would rather have you build a “bet the business” iSCSI environment by adding that second controller to your MD3000i or DS3300.
  • My least popular post to date has been: VMworld Here I Come. I will try not to be so boring in the future.
  • The first link back to my site was from Scott Lowe’s blog.  Thanks for the link, Scott.  Scott wrote a darn fine book: Mastering VMware vSphere 4.  Buy it.
  • The site theme, like my Twitter page, is very blue.  I am working on a new theme in all of the spare time that a busy professional and father of two boys can muster.

Along with this site, I have made a concerted effort to engage the virtualization community in several ways:

  • Twitter keeps me plugged in to the latest news and discussions, and has been a source of help to me (and I would like to think that I have helped some of you as well).  Follow me at http://twitter.com/joshuatownsend.
  • I have stepped up into a leadership role with the Washington DC VMware User Group (VMUG).  It has been awesome to meet with and learn from my local colleagues.  I will continue to work with the DC VMUG leadership team to deliver exciting and relevant content and activities (and also some more seating space).  I welcome your feedback and ideas for ways to improve the VMUG.
  • VMworld 2009 was a great way to meet many of you while learning some of the hottest new technology on the planet.  My wife came along and enjoyed spending time with some of the other vSpouses (or vWidows?).
  • I took a new job early in the year with a VMware Partner in the DC area.  This new job has been both challenging and rewarding, affording me opportunities to more effectively engage customers and spend more time working on virtualization-specific solutions.

I look forward to contributing to the virtualization community as a blogger, VMUG leader, and practitioner.  If you want to learn a bit more about me, check out the About page on this site.  I welcome your feedback and appreciate your reading my work!

- Josh

I recently posted an article on how specific actions during the upgrade of a VMware Virtual Machine’s hardware from v4 to v7 can cause problems with certain services, including DNS, DHCP, and WINS. In that case, the problem was related to Microsoft Windows leaving non-present devices with networking configurations and  the failure of the VMware Upgrade Helper service to copy WINS settings when updating the NIC.  As my fellow blogger and VMUG leader, Jason Boche, responded on Twitter: “Same gotchas, different version.”  And right he is – anyone with experience in P2V or V2V, or who has been working with VMware long enough to have done a 2.5 to 3.0 upgrade experienced the same gotchas.

There are other issues with VMware virtual hardware upgrades, however, that you may not have experienced.  One such issue that I have experienced is highlighted in VMware Knowledge Base article 1013109: “Upgrading virtual hardware in ESX 4 may cause Windows 2008 disks to go offline”.  The problems described in the article are unique to Windows 2008 Enterprise and Datacenter editions.  This problem, like the ghost NICs I described last week, is more of a Microsoft issue, but it will rear its head when a VMware Administrator least desires it.  In these editions, the Windows Virtual Disk Service (part of the native Storage Management suite) is set to not auto-mount newly discovered disks that reside on a shared bus.  Microsoft has an MSDN article on the VDS SAN policy here.  Upgrading the virtual hardware version causes the disks to be re-discovered and not auto-mounted.  This can potentially impact all non-system disks on a VM.

You may also experience similar issues when you upgrade the vSCSI adapter in a VM from a standard LSI Logic Parallel SCSI adapter to a (new in vSphere 4.0) paravirtualized SCSI (pvSCSI) adapter, when you move virtual disks to new vSCSI adapters to increase the number of concurrent disk IO operations, or when you change the SCSI node ID of a virtual disk.  These may all trigger a re-discovery of the disks by the Windows Virtual Disk Service, leaving data disks offline on Windows 2008 Enterprise and Datacenter Edition guests.

In my opinion, these issues are not reasons to forgo upgrading your virtual hardware version.  However, when your upgrade/migration plans call for upgrading the virtual hardware version of your guests you should be prepared to resolve any issues caused by ‘ghost hardware’, offline disks, and the like.  Both the MSDN and VMware articles I cited above offer workarounds for the offline disk issue.  Here are the links again:

  • http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1013109
  • http://msdn.microsoft.com/en-us/library/bb525577%28VS.85%29.aspx
    I have been pulling my hair out with a small VI3 implementation running against an IBM DS3300 iSCSI array.  Performance, for lack of a better term, sucked.  Granted, the DS3300 is not an enterprise level workhorse of a storage system, but it fit the budget.  Read performance was decent from the array, but write performance was terrible, maxing out at 10Mbps throughput with insanely high latencies on long writes when the system was under load.  This led to some long P2V operations, poor guest performance, and some questions from the project sponsors on why I couldn’t make the environment sing.

    The system was configured with a single controller with dual GigE NIC’s.  The controller had 512MB of battery backed cache (there is also a 1GB cache upgrade option available).  I wrote off some of the poor performance to a single controller with a less-than-optimal amount of cache; blamed the SAS controller to SATA disk command translation overhead; cringed at the 6 disk RAID5 configuration; and engaged in some self doubting.  I convinced the powers that be that we were IO constrained and got some funds to fill out the 3U chassis to a full 12 SATA disks, and reconfigured the array as a RAID10.  Performance gains were almost unnoticeable with these changes.  In addition, I did some basic troubleshooting of the network environment, verifying multiple paths to the storage, setting Flow Control on the switches to receive only, and double-checking my iSCSI initiator settings.  Note: The DS3300 is only supported with the ESX software initiator.  I found documentation on the DS3300 to be lacking, but did discover that the Dell MD3000i is based on the same LSI Engenio array.  Some Googling on the Dell solution led to the ‘SMcli’ command line interface for both arrays.  The commands are slightly different for the Dell and IBM.  The links to the IBM CLI documentation were broken, so I had to do a bit of trial and error to get the commands right.  I used the Dell documentation as a starting point.  (Rant: Seriously, IBM?  Can you make your documentation any harder to get through – is it a Redbook, is it an Engineering Whitepaper, is it a support document, is it a case study – and why can I only find these with complex Google searches, not on your own product pages, and why can’t you name your documents intelligently, not with some random string of characters?)

    Moving on… I received an automated alert from the DS3300 about an incomplete battery learn cycle.  Using the IBM Storage Manager GUI I generated a ‘Storage Subsystem Profile’ from the Support tab to check the battery status.  In the profile I discovered that while write cache was enabled, it had a status of “Enabled (Suspended)”.  Ah ha!  Now I’ve got some decent Google material that led me to this: http://communities.vmware.com/thread/195838.  Hot damn I love the VMware Community Forums!

    It turns out that in a single-controller configuration the setting for cache mirroring remains enabled by default.  Because there is no 2nd controller to mirror to, the array suspends write caching.  This is probably a safety thing – loss of high availability on the controllers puts data in cache at risk should the only controller fail.  I weighed my options and decided that the poor performance I was experiencing outweighed the HA concerns, so I disabled cache mirroring on the array, which lets write caching resume, using this command:

    c:\program files\ibm_ds4000\client>smcli -n <ARRAYNAME> -c "set allLogicalDrives mirrorEnabled=false;"

    And then followed with this for good measure:

    c:\program files\ibm_ds4000\client>smcli -n <ARRAYNAME> -p <arraypassword> -c "set allLogicalDrives writeCacheEnabled=true;"

    The results were immediately noticeable:

    DS3300 Performance Improvement when Write Cache is Enabled

    The screen shot is from Veeam Monitor Free Edition, taken during 4 concurrent V2V operations from Hyper-V to VMware.  With the write cache fully functional, disk usage peaked at 54MBps, latency dropped to about 6ms, and my blood pressure dropped a few notches.

    While poking around the CLI I also found that you can dump performance stats from the array (performance is otherwise hard to find on the thing) using this command:

    C:\Program Files\IBM_DS4000\client>smcli -n <ARRAYNAME> -c "set session performanceMonitorInterval=5 performanceMonitorIterations=120;save storageSubsystem performanceStats file=\"c:\\ds3300perfstats.csv\";"

    This will give you a 10 minute record of performance from the array which you can analyze using Excel.  The Dell Enterprise Center TechCenter Wiki has a great write-up on how to efficiently analyze the data from this command here: http://www.delltechcenter.com/page/MD3000i+Performance+Monitoring, complete with a YouTube video that walks you through the process.
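The 10 minutes comes from the two session settings in the command: a 5 second interval times 120 iterations.  Here is a tiny Python sketch for working out other capture windows.  The helper names are hypothetical, and I am assuming the interval is in seconds, which is what the 10 minute result implies.

```python
import math


def capture_minutes(interval_seconds: int, iterations: int) -> float:
    """Length of a capture for given performanceMonitorInterval/Iterations values."""
    return interval_seconds * iterations / 60


def iterations_for(minutes: float, interval_seconds: int) -> int:
    """Iterations needed to cover a desired capture window at a given interval."""
    return math.ceil(minutes * 60 / interval_seconds)


if __name__ == "__main__":
    print(capture_minutes(5, 120), "minutes with the settings used above")
    print(iterations_for(60, 5), "iterations for a one hour capture at 5 second intervals")
```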

    I am beginning to think that the DS3300 (and MD3000i) may actually be a viable starter solution for SMB’s starting out on a virtualization project.  But I would recommend the cache upgrade, a 2nd controller, SAS disks instead of SATA to eliminate the SAS-to-SATA translation overhead, and more, faster disks instead of fewer, slower disks so you can drive throughput and IOPS to a higher level.

    Have any of you deployed the DS3300 or MD3000i (or the generic LSI solution)?  Do you have any performance tuning tips for these arrays?  If so, share in the comments!

    VMware vExpert and fellow Northern Virginian, Ken Cline, has posted an excellent article on his Ken’s Virtual Reality blog that aims to demystify VMware networking.  The article, the first in a new series by Ken, provides an overview of networking in an ESX/ESXi environment and breaks down the intricacies of the vSwitch and VLANs.  The article comes complete with some nifty diagrams to help make sense of the topic. The timing of this article is great for me as it helps to frame my thoughts as I delve into the design of my latest VMware project on an IBM BladeCenter with IP SAN storage.

    Great article, Ken!  I look forward to reading the rest of the series.

    One more post to wrap up the nonsense with my DL380 G3 ESX servers….

    Vincent Vlieghe noted that you must make a couple changes to your DL380 G3’s for ESX to work correctly.  His post was written back in 2006 when we were still working with ESX 2.x, but the same appears to be true of ESX 3.5 RTM (Updates are not supported on this hardware per the HCL).  The changes you must make to BIOS are:

    For stable operation on these systems, ESX Server requires a BIOS MPS Table Mode setting of Full Table APIC. With the exception of the specific systems referenced below, the following BIOS settings must be applied in order if available:

    1. System Options > OS Selection: Select Windows 2000.
    2. Advanced Options > MPS Table Mode: Select Full Table APIC.
    3. When presented with multiple Windows options (Windows 2000, Windows Server 2003, Windows .NET, and so on) select Windows 2000. If both BIOS settings are available and can be modified, both must be set correctly. You should confirm these settings after any BIOS upgrade operation.

    I have seen other references that say that you should also disable hyperthreading on this platform, but I was able to successfully run with Hyperthreading enabled with no performance degradation or stability issues.  I hope this information is helpful to those of you still running these dinosaurs!

    I wrote some time back about networking problems with a clean install of ESX 3.5 U3 on a HP DL380 G3 server in a lab environment.  A simple downgrade to ESX 3.5 RTM corrected the issue and I didn’t think much about it.  One of the servers in the lab died and I went about the business of rebuilding it.  Having learned my lesson, I started with an ESX 3.5 RTM install and then patched to Update 3 plus other applicable updates.  Much to my chagrin, the server began crapping out on me randomly.  Some reboots, some networking issues, and other assorted not so good things.  Now the DL380 G3 is not the spring chicken it used to be, so I assumed some faulty hardware was probably to blame.  Some diagnostics and log reviews yielded no hardware issues.

    On a whim, I decided to check the VMware HCL to see if the DL380 G3 was still on the list of compatible servers for ESX.  Now, I had checked, or rather ‘remembered’ checking, the HCL before that first problematic install, but a recheck never hurts.  When I arrived at the VMware HCL page I saw the same old trusty PDF link with a slightly newer revision date than my previous visit.  I was pleasantly surprised when I clicked the PDF link to find that I was redirected to a searchable, filterable forms-based version of the HCL.  Nice!  Let’s do this thing….

    I’m a little lazy, so I simply used a keyword search to look up ‘DL380 G3’.  Presto-chango: I’ve got results, and I like what I see:

    Search Results for DL380 G3 on the VMware HCL

    My eyes jump right to ESX 3.5 – Supported, on my platform, no further questions your honor.  Close the old browser window and move on with my life, my life being troubleshooting this darn server.

    A few hours later I am still struggling with the server and turn to Ebay for salvation.  “If you can’t beat em, cheat em,” my grandfather used to say.  I’ll find new hardware for my lab.  I identified some other hunk of junk that just might work and decided to check the HCL for it.  That’s when it jumped out at me: there are Update versions included in the HCL and I had been too quick to see it on my DL380 G3 search.  Back to the HCL.

    This time I just do a search for ‘DL380’, leaving off the Generational notation and get the following:

    Search Results for DL380 from the VMware HCL

    The ProLiant DL380 G5 with Quad-core Intel Xeon processors lists ESX 3.5 U3, ESX 3.5 U2, and ESX 3.5 U1 as supported releases, along with the RTM ESX 3.5.  The Update versions are not listed for the G3 or G4.  After some self-deprecating curses and a reinstall of ESX 3.5 Update-nada, stability returned.

    The lesson learned: double-check the HCL (or if you are a little slow like me, a triple-check doesn’t hurt).  The HCL is major version and Update-revision sensitive.  And not all models are treated equally.  You’ll notice in the picture to the left that the DL380 G5 has different supported releases depending on the CPU Model.

    Also, keep in mind that you need to verify that all components of your VMware infrastructure are on the HCL from Servers and Systems to IO Devices, and Storage/SAN.  The VMware HCL site offers some basic tips for searching here: http://www.vmware.com/resources/compatibility/help.php.

    Here’s the real take-away: The VMware HCL is there for a reason.  Sure, you might be able to get something that is not on the HCL to work, but you may experience instability along the way.  In the event that you are running a non-HCL system you may also find that VMware Support may be limited in what they can do for you.
