Posts Tagged ‘3.5’

I have been meaning to write this up for a while; Scott Drummonds’ ‘Love Your Balloon Driver’ post today at his Virtual Performance blog gave me a nice reminder.  I actually caught a sneak peek at the graphs, with an explanation from Scott, at his instructor-led lab at VMworld 2009.  Scott calls out that the only workload they found that suffers from balloon driver activity is Java.  The reason is that Java itself runs in a virtual machine (the JVM), so the guest OS cannot properly determine which pages should be swapped out when the balloon driver calls for memory.

My experience leads me to agree with Scott and the whitepaper he cites: in a properly designed and equipped environment, the balloon driver is, up to a point, not detrimental to almost any workload.  However, I recently discovered at a client site that the balloon driver can cause significant issues when the environment is poorly designed and under-sized.  Here’s the background:

I was called into an already established environment where the client was running VMware ESX 3.5 on an older blade.  The blade maxed out at 16GB RAM and had two dual-core CPUs, with no hope for an upgrade.  On the blade was a single guest VM running Windows 2003 with SQL 2005, in all its 32-bit glory.  The VM was configured with 4 vCPUs and 16GB of memory.  Some of you can probably already guess where this is going….

The x86 Windows guest had PAE configured, and SQL took advantage of AWE to use the additional memory beyond the 4GB limit of a 32-bit system.  Additionally, the Windows guest had the /3GB switch enabled in boot.ini.  Finally, as per SQL best practices, the ‘Lock Pages in Memory’ permission was granted to the SQL Server service account.  What the guest was left with was 1GB of kernel mode memory and 15GB of user mode/extended addressable memory.

And here’s the problem.  The client was using ESX, not ESXi 3.5, so the Service Console required memory; in this case, approximately 512MB.  Furthermore, VMs require some memory overhead on ESX to run.  The memory overhead consumed by a Windows guest on ESX 3.5 with 4 vCPUs and 16GB of memory is a bit more than 512MB.  On a properly sized ESX server with multiple similar guests/workloads, you could probably gain much of the overhead back through transparent page sharing, but in this case I had a 1:1 P2V ratio: one VM on one physical host.  If you are any good at math, you can see that the environment was running about 1GB short of memory (16GB for the guest plus 512MB for the Service Console plus 512MB of overhead, all on a 16GB host).  A quick check of the balloon driver stat in vCenter showed that the balloon driver was constantly active and demanding about 1GB back from the guest… constantly.
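For anyone who wants to check the same math on their own hosts, here is a rough sketch of that budget in Python.  The constants are the approximate figures from this engagement, and the function name and the optional page-sharing parameter are my own illustration, not anything ESX or vCenter exposes.  Real VM overhead varies with vCPU count, configured memory, and ESX build.

```python
# Rough host memory budget for the single-VM ESX 3.5 blade described above.
# Figures are approximations from this engagement, not values queried from ESX.

HOST_PHYSICAL_MB = 16 * 1024   # blade maxed out at 16GB
SERVICE_CONSOLE_MB = 512       # approximate Service Console allocation
VM_CONFIGURED_MB = 16 * 1024   # guest configured with 16GB
VM_OVERHEAD_MB = 512           # "a bit more than 512MB" for 4 vCPU / 16GB


def memory_shortfall_mb(host_mb, sc_mb, vm_mb, overhead_mb, tps_savings_mb=0):
    """Return how many MB the host is short if the guest uses all of its
    configured memory. tps_savings_mb is a hypothetical knob for memory
    recovered by transparent page sharing; with one VM on the host it is ~0."""
    demand_mb = sc_mb + vm_mb + overhead_mb - tps_savings_mb
    return max(0, demand_mb - host_mb)


short = memory_shortfall_mb(HOST_PHYSICAL_MB, SERVICE_CONSOLE_MB,
                            VM_CONFIGURED_MB, VM_OVERHEAD_MB)
print(f"Host is short by {short} MB")  # Host is short by 1024 MB
```

Passing a nonzero `tps_savings_mb` shows why the same configuration might be tolerable on a host with many similar guests sharing pages.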

Under normal circumstances this might not be an issue, but in this case the Windows guest was being absolutely punished.  The guest CPUs were pegged at 100% with an excessive amount of kernel time, which often indicates I/O issues.  And indeed, disk and network performance on the guest was terrible.  At the root of the problem is this: the Lock Pages in Memory permission allows SQL to get a firm grasp on the user mode memory available to it (15GB) and lock it up.  That left the guest kernel’s 1GB, already starved because of the /3GB switch in boot.ini, as the only memory the balloon driver could really force the guest to swap out.
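To make the mechanism concrete, here is a deliberately simplified Python model of where the balloon’s demand lands.  The names are hypothetical and the model is my own illustration: it assumes the guest can only give up pages SQL has not locked and treats everything else as one pool, while the real Windows memory manager is far more nuanced about what it pages out.

```python
# Simplified model: pages locked by SQL Server (AWE + Lock Pages in Memory)
# cannot be given up, so the balloon target must come out of whatever is left.

GUEST_MB = 16 * 1024          # configured guest memory
SQL_LOCKED_MB = 15 * 1024     # user mode/extended memory SQL holds locked
BALLOON_TARGET_MB = 1024      # roughly what vCenter showed the balloon demanding


def balloon_pressure(guest_mb, locked_mb, target_mb):
    """Return (unlocked_mb, share of the unlocked memory the balloon wants).
    A share of 1.0 or more means the balloon wants everything not locked."""
    unlocked_mb = guest_mb - locked_mb
    if unlocked_mb <= 0:
        return 0, float("inf")
    return unlocked_mb, target_mb / unlocked_mb


unlocked, share = balloon_pressure(GUEST_MB, SQL_LOCKED_MB, BALLOON_TARGET_MB)
print(f"Unlocked: {unlocked} MB; balloon wants {share:.0%} of it")
# Unlocked: 1024 MB; balloon wants 100% of it
```

With half the memory locked instead, the same 1GB target would only be 12.5% of what the guest could give up, which is why ballooning is usually harmless and was not here.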

The client suggested a reservation of 16GB on the VM, knowing that memory reservations prevent balloon driver activity.  I calmly asked them to back away from the keyboard as I explained that if a starved guest was bad, a starved Service Console would be much worse.  In the end the fix was quite easy: I convinced the customer to reduce the memory allocated to the guest by about 1GB, enough to let the 512MB SC and the 512MB of overhead run without contention.  I was able to show them the difference between allocated and active memory in vCenter: the 1GB being surrendered was not really being actively used, SQL just had it locked up.  In fact, surrendering the 1GB of memory back to ESX breathed new life into the guest VM, bringing its performance back in line with expectations.
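The right-sizing arithmetic behind the fix can be sketched the same way.  This Python snippet uses a hypothetical helper name and the approximate figures above, and it treats the per-VM overhead as fixed at about 512MB.  That is a simplification: overhead shrinks slightly as configured memory shrinks, so the result is a conservative ceiling.

```python
def max_guest_memory_mb(host_mb, sc_mb, overhead_mb):
    """Largest guest allocation that fits alongside the Service Console and
    the VM's own overhead without contention (no page sharing assumed)."""
    return host_mb - sc_mb - overhead_mb


HOST_MB = 16 * 1024
limit_mb = max_guest_memory_mb(HOST_MB, sc_mb=512, overhead_mb=512)
print(f"Configure the guest with at most {limit_mb} MB ({limit_mb / 1024:g} GB)")
# Configure the guest with at most 15360 MB (15 GB)
```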

Ideally, I would have brought in a bigger ESX server that could host additional VMs, driving greater efficiency across the environment.  It just wasn’t an option for the client in this case.  In the end, the problem was fixed and I was reminded just how fun it can be to explain some of these backwards-sounding virtualization concepts to customers: fewer vCPUs can lead to better guest performance, less guest memory can fix performance issues, and, up to a point, increasing the number of similar guests on a host can drive better performance because of transparent page sharing.

Stay tuned over the next few weeks as I digest and write up my VMworld experience, from VMUG activities to Paul Maritz’s press conference announcing vCloud Express, with plenty of great sessions in between.  Like many of you, I returned from VMworld with quite a backlog of work, but I’ll do my best to squeeze in some posts and tweets.

I started this blog for a couple of reasons: 1.) to help you, my readers, with your virtualization projects, and 2.) to help myself by a.) raising my online profile as an expert in the community, and b.) giving myself somewhere to keep tidbits of knowledge that I find myself looking for over and over again. This post is a 2b.

I just built up a new laptop and couldn’t remember how to set up pass-through authentication on my VI3 Client. A quick Google search gave me the answer, courtesy of Stu Radnidge’s post on none other than VirtualCenter 2.5 Passthrough Authentication.  This little gem spares you the terribly tedious work of manually entering your login credentials each time you launch the Virtual Infrastructure 3 Client by passing your currently logged-in credentials through to the VC server.  Thanks for the tip, Stu!

One more post to wrap up the nonsense with my DL380 G3 ESX servers….

Vincent Vlieghe noted that you must make a couple of changes to your DL380 G3s for ESX to work correctly.  His post was written back in 2006, when we were still working with ESX 2.x, but the same appears to be true of ESX 3.5 RTM (Updates are not supported on this hardware per the HCL).  The BIOS changes you must make are:

For stable operation on these systems, ESX Server requires a BIOS MPS Table Mode setting of Full Table APIC. With the exception of the specific systems referenced below, the following BIOS settings must be applied in order if available:

  1. System Options > OS Selection: Select Windows 2000.
  2. Advanced Options > MPS Table Mode: Select Full Table APIC.
  3. When presented with multiple Windows options (Windows 2000, Windows Server 2003, Windows .NET, and so on) select Windows 2000. If both BIOS settings are available and can be modified, both must be set correctly. You should confirm these settings after any BIOS upgrade operation.

I have seen other references that say you should also disable hyperthreading on this platform, but I was able to run successfully with hyperthreading enabled, with no performance degradation or stability issues.  I hope this information is helpful to those of you still running these dinosaurs!

I wrote some time back about networking problems with a clean install of ESX 3.5 U3 on an HP DL380 G3 server in a lab environment.  A simple downgrade to ESX 3.5 RTM corrected the issue and I didn’t think much more about it.  Then one of the servers in the lab died and I went about the business of rebuilding it.  Having learned my lesson, I started with an ESX 3.5 RTM install and then patched to Update 3 plus other applicable updates.  Much to my chagrin, the server began crapping out on me randomly: some reboots, some networking issues, and other assorted not-so-good things.  Now, the DL380 G3 is not the spring chicken it used to be, so I assumed faulty hardware was probably to blame.  However, diagnostics and log reviews turned up no hardware issues.

On a whim, I decided to check the VMware HCL to see if the DL380 G3 was still on the list of compatible servers for ESX.  Now, I had checked, or rather ‘remembered’ checking, the HCL before that first problematic install, but a recheck never hurts.  When I arrived at the VMware HCL page I saw the same old trusty PDF link with a slightly newer revision date than my previous visit.  I was pleasantly surprised when I clicked the PDF link to find that I was redirected to a searchable, filterable forms-based version of the HCL.  Nice!  Let’s do this thing….

I’m a little lazy, so I simply used a keyword search to look up ‘DL380 G3’.  Presto-chango: I’ve got results, and I like what I see:

Search Results for DL380 G3 on the VMware HCL

My eyes jump right to ESX 3.5 – Supported, on my platform, no further questions your honor.  Close the old browser window and move on with my life, my life being troubleshooting this darn server.

A few hours later I am still struggling with the server and turn to eBay for salvation.  “If you can’t beat ’em, cheat ’em,” my grandfather used to say.  I’ll find new hardware for my lab.  I identified some other hunk of junk that just might work and decided to check the HCL for it.  That’s when it jumped out at me: there are Update versions included in the HCL, and I had been too quick to notice them on my DL380 G3 search.  Back to the HCL.

This time I just do a search for ‘DL380’, leaving off the generation notation, and get the following:

Search Results for DL380 from the VMware HCL

The ProLiant DL380 G5 with Quad-core Intel Xeon processors lists ESX 3.5 U3, ESX 3.5 U2, and ESX 3.5 U1 as supported releases, along with the RTM ESX 3.5.  The Update versions are not listed for the G3 or G4.  After some self-deprecating curses and a reinstall of ESX 3.5 Update-nada, stability returned.

The lesson learned: double-check the HCL (or, if you are a little slow like me, a triple-check doesn’t hurt).  The HCL is sensitive to both the major version and the Update revision.  And not all models are treated equally.  You’ll notice in the picture to the left that the DL380 G5 has different supported releases depending on the CPU model.
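Here is a small Python sketch of the trap I fell into.  The dictionary holds only the entries described in this post, keyed by model and CPU; it is illustrative, not real HCL data, and the G3 CPU label is a hypothetical placeholder since I did not note its CPU column.  The point is that release matching has to be exact: seeing ‘ESX 3.5’ listed tells you nothing about ‘ESX 3.5 U3’.

```python
# Illustrative subset of the HCL search results described in this post.
# Keys are (model, CPU); values are the ESX releases listed as supported.
# NOT real HCL data; the G3 CPU label is a placeholder. Check the live HCL.
HCL = {
    ("ProLiant DL380 G3", "placeholder CPU"): {"ESX 3.5"},
    ("ProLiant DL380 G5", "Quad-core Intel Xeon"): {
        "ESX 3.5", "ESX 3.5 U1", "ESX 3.5 U2", "ESX 3.5 U3",
    },
}


def is_supported(model, cpu, release):
    """Exact match on model, CPU, and release (Update level included)."""
    return release in HCL.get((model, cpu), set())


def naive_is_supported(model, cpu, release):
    """What my eyes did: see 'ESX 3.5' and assume every Update is covered."""
    return any(release.startswith(listed) for listed in HCL.get((model, cpu), set()))


g3 = ("ProLiant DL380 G3", "placeholder CPU")
print(naive_is_supported(*g3, "ESX 3.5 U3"))  # True  (the trap)
print(is_supported(*g3, "ESX 3.5 U3"))        # False (what the HCL actually lists)
print(is_supported(*g3, "ESX 3.5"))           # True
```

The prefix match is exactly the “close enough” reading that cost me an afternoon; the exact lookup forces you to find your Update level in the list.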

Also, keep in mind that you need to verify that all components of your VMware infrastructure are on the HCL, from Servers and Systems to IO Devices and Storage/SAN.  The VMware HCL site offers some basic tips for searching here: http://www.vmware.com/resources/compatibility/help.php.

Here’s the real take-away: the VMware HCL is there for a reason.  Sure, you might be able to get something that is not on the HCL to work, but you may experience instability along the way.  If you are running a non-HCL system, you may also find that VMware Support is limited in what they can do for you.

About Me

Hello, and thank you for visiting VMtoday. My name is Josh Townsend. I am a technology professional with a strong background in VMware Virtualization, Storage, and Microsoft technologies. I am a Sr. Systems Administrator at Tiber Creek Consulting in Fairfax, VA, and hold several technical certifications, including VMware Certified Professional. I am also a 2010 VMware vExpert.

I am also leader of the Washington DC Metro Area VMware User Group (VMUG).

The opinions expressed on this site are my own and may not reflect the views of my employer, VMware, or any other party unless otherwise stated.

Please feel free to follow me on Twitter
@joshuatownsend