Archive for the ‘VMware’ Category

I’ve heard some grumbling about the long lines popping up at VMworld with folks blaming the new first-come format for admission into the various sessions. A few folks have asked me my take on it – here’s what I’ve got:

- Yes, the lines are long but they seem to move fast. Whether that’s because people are bailing out or because the staff here is efficiently moving people into session rooms as they open up I don’t know. I suspect a bit of both.
- I anticipate lines getting shorter over the next few days as the Solutions Exchange opens up a ton of space and activities for VMworld participants.
- As people get a feel for the flow of the event, things will start moving along better. Rather than jam-packing as many sessions as possible into their schedule, people will start to balance vendor time on the Solutions Exchange floor, Hands-On Labs, and other activities.

What has your experience been so far? Any suggestions for improving the VMworld experience? I’d love to hear your ideas in the comments!

Way back in the day (we’re talking way back in high school here) I worked my summers away at a Six Flags park in Western New York. Every fall the park hosted an International Food Festival. Let me tell you – the pierogies, sausage and gyros slid down like nobody’s business. But the real prize was the Italian bakery’s cannoli. The 5 or so folks that shared an office with me decided we should track our cannoli consumption. We did (like you do a drug or spin up VM’s just ’cause you can) something like 126 cannoli in one weekend. We called it the Cannoli Count – kept a tally on a whiteboard in the office. Sickening, right? So, what does this have to do with virtualization, you ask? Well, not much really, but here is where I am going with it. T-shirts are handed out like nobody’s business at VMworld, and they accumulate like a pack-rat’s pile of newspapers in my dresser drawers (I still have a high-school wrestling t-shirt that dates back to 1994 in the rotation). My VMware Widow wife hates them, so I figure I’ll see how many more I can collect this week. With two already in my hands after VMworld check-in, enter the:

Josh’s VMworld T-Shirt Count



I’ll keep the running tally going throughout VMworld – let’s see where this ends up (besides me cleaning out my dresser to avoid sleeping on the couch when I get home).

I was fortunate enough to be offered a sneak peek at the VMworld 2010 Hands-on Labs setup this morning, and let me tell you – I am impressed. A lot of hard work has gone into planning, architecting and deploying the Labs environment, promising to make it the most user-friendly VMworld Labs setup yet. Here is what you need to know:

Location: The Labs will be held at Moscone West, on the corner of 5th & Howard St. This is a change from last year.

Format: There will be two types of hands-on labs – instructor led (they’re calling these Advanced Lab Tutorials) and self-paced.

  • The instructor labs are more of a tutorial for those who want to be walked through the lab manual by a subject matter expert in an open discussion format.  The Advanced Lab Tutorial sessions support 250 seats.  The Advanced Lab Tutorials will be useful for preparing for the associated self-paced labs.  Take the Advanced Lab Tutorial first, then head downstairs to the Self-Paced lab.
  • The self-paced labs are designed with a ton of flexibility, allowing you to choose what and when you work through the material. For an overview of the Lab topics, check out the VMworld 2010 Program Guide.
  • When you arrive at the Self-Paced Labs area, you will register for the lab you want and head to a nice waiting area if no seats are available. When your number is called, you will be led to your seat and will fire up your lab. You’ll have an hour to work through the lab. If you need more time, ask.

Technical Specs: The VMware Core Team has obviously put an enormous amount of thought and time into improving the lab experience. For those who attended VMworld 2009, the lab experience left folks a bit disappointed due to some technical glitches and scheduling issues. This year’s Labs are built with a ton of redundancy and allow for a much smoother, user-directed schedule. The scale and scope of the labs are astonishing to say the least. Here are some stats I gleaned on the lab setup:

  • There are 30 self-paced lab topics, each demanding their own unique environment.
  • There are 480 seats available for the self-paced labs, in a stadium seating configuration.  This allows a huge number of people to flow through the lab environment efficiently, with minimal wait time.  The lab schedule has some 40 hours of time for you to get in and work over the next several days.  This equates to more than 20,000 lab-seat hours (up from about 5000 hours last year).
  • The labs run from one of three data centers: Miami, FL (Terremark); Ashburn, VA (Verizon); and locally in the Moscone Center.  This provides a great deal of redundancy and positions the labs as a cloud offering to fit the theme of this year’s VMworld.  The Miami and Ashburn sites have been running for a while, and will be reused for VMworld Europe next month.  This is a change from last year where the gear was fork-lifted in for the show (remember all the racks at the bottom of the escalators?).  This has given the team more time to work on the setup and iron out any problems.
  • The self-paced labs are based on VMware’s Cloud Lab infrastructure, purpose built for VMworld Labs.  Cloud Lab provides a slick interface for provisioning labs to participants while doing some really smart things in the background to enhance performance and flexibility.
  • It is estimated that more than 100,000 VM’s will be provisioned in Labs this week – more than 5000 VM’s built and destroyed per lab hour! <- Read that again. Astonishing, no?
  • The gear driving the labs is provided by HP, Dell, EMC, NetApp, Cisco, and Xsigo.  Xangati is used for monitoring performance of PCoIP to the Wyse thin clients at each seat.
  • There are 4 racks of compute power and 2 racks of storage per datacenter.
  • The storage environment is mostly 10GbE.  EMC FastCache and NetApp Dedupe are both in use.  Storage is mostly NFS-based.
  • The memory footprint required to run the labs is some 36TB.
  • Labs are running a few levels deep – ESX nested inside of ESX with VM’s running inside.
  • Host Profiles are heavily leveraged to ensure a consistent environment.
  • Twin DS3’s provide Internet connectivity for the Labs.
  • In true cloud fashion, the Lab Cloud product dynamically pre-populates lab environments based on demand.  As some labs rise in popularity, the Lab Cloud will stage up environments based on that demand.  This will reduce wait time for the lab environment to be readied.  In years past, students would wait 5-7 minutes for their custom lab environments to be readied (building, deploying and booting a unique Active Directory, vCenter, ESXi, nested VM’s and associated products takes some time).  No guarantees that there won’t be some wait time, but this is a huge step in the right direction.
  • There will be some 150 moderators ready to help with Self-Paced labs.  Moderators are subject matter experts.  If you request help through the Lab Cloud interface, a moderator who is a SME in your topic will be dispatched to help you.

A few more things to note:

  • There will be prize drawings for those who do the most labs, as well as those who complete the labs the fastest.  Prizes will include a full pass to VMworld 2011 in Las Vegas.
  • Lab manuals will be made available after the show.
  • Some of the labs look really cool. You can find a list in the VMworld 2010 Program Guide.  I am excited to see the VMware vSphere Sandbox lab – an everything-but-the-kitchen-sink setup of as many products as they could cram in.  This provides a playground for you to see all of the VMware products working together, where you can create, destroy and otherwise play as you wish.
  • I would love to see this environment be made available for other uses after VMworld.  I think VMUG’s could really benefit, as could VMware’s partner community.

Special thanks to Adam Zipman who leads the team putting this together, Dan Anderson (Dan is the lead architect behind this massive operation) and Curtis Pope who led development of the Cloud Lab interface. Also, thanks to John Troyer for setting up this morning’s briefing. I appreciate your time today, guys.

I hope you all are as excited about the labs this year as I am.  I am planning to spend a good chunk of time working through the lab environments.

I ran into an issue with a customer today where a VM was performing terribly.  From within the guest OS (a Windows 2003 application server running .NET in IIS which I will call BigBadServer) things appeared sluggish and CPU time was high.  The amount of time being spent on the kernel was notably high.  The VM in question had 4 vCPU’s and a good helping of memory.

[Image: high kernel time in perfmon]

I don’t have access to the VMware client at this particular site – just some of the guests, so I was flying blind. Gut feeling told me that I was dealing with a resource contention issue. The VMstats provider I had running in the guest (http://vpivot.com/2009/09/17/using-perfmon-for-accurate-esx-performance-counters/) showed me that there was no ballooning or swapping going on, that the vCPU’s were not limited, and that the CPU share value seemed to be at the default.

I strongly suspected that the physical server running VMware ESX was oversubscribed on physical CPU (pCPU) resources.  Essentially, the guest VM’s that are sharing the resources of the physical machine are demanding more resources than the machine can handle.  To verify this theory, I had the client check the ‘CPU Ready’ metric on BigBadServer and bingo!

CPU Ready is a measure of the amount of time that the guest VM is ready to run against the pCPU, but the VMware CPU Scheduler cannot find time to run the VM because other VM’s are competing for the same resources.

From the stats the customer provided on our phone call, the CPU Ready for any one of the 4 vCPU’s on the BigBadServer was on average 3723ms (min: 1269ms, max: 8491ms). (Update 8/25/2010 to clarify summation stat) The summation for the entire VM was around 12,000ms on average and peaked around 35,000ms. The stats came from the real-time performance graph/table in the vSphere Client. The real-time stats in the vSphere Client update every 20 seconds, so the CPU Ready summation value should be divided by 20,000ms (and multiplied by 100) to get the percentage of the 20-second time slice spent in CPU Ready. If I take the worst case scenario of 8491ms per vCPU, this VM spent more than 42% (8491/20,000) of the 20-second time slice waiting for CPU resources.
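The conversion is simple enough to sketch in a few lines of Python. This is only an illustration, assuming the 20-second real-time sample interval; the function name `cpu_ready_percent` is made up for this example and is not part of any VMware tool.

```python
def cpu_ready_percent(ready_ms, interval_s=20, vcpus=1):
    """Convert a CPU Ready summation (ms) into a percentage of the sample interval.

    ready_ms   -- CPU Ready summation reported for one sample
    interval_s -- sample length in seconds (20 for the real-time charts)
    vcpus      -- number of vCPUs the summation covers (1 for a per-vCPU value)
    """
    interval_ms = interval_s * 1000
    return ready_ms / (interval_ms * vcpus) * 100


# Per-vCPU numbers from the customer call (average, min, max)
for label, ms in [("avg", 3723), ("min", 1269), ("max", 8491)]:
    print(f"{label}: {cpu_ready_percent(ms):.1f}%")

# Whole-VM summation, spread across the 4 vCPUs
print(f"VM avg per vCPU: {cpu_ready_percent(12000, vcpus=4):.1f}%")
```

Passing `vcpus=4` for the whole-VM summation gives a per-vCPU average, which is the number that is easiest to compare against a per-vCPU threshold.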

The CPU Ready summation counter (in milliseconds) in the vSphere Client is not always the most accurate or easy to interpret stat – to better quantify the problem it might be best to go to the ESX command line and run ESXTOP. CPU Ready over 5% could be a sign of trouble, over 10% and there is a problem. Running ESXTOP in batch mode and then analyzing the output using Windows Perfmon or Excel might be a good way to go on this to get a view over several hours rather than the real-time stats we were looking at. I wrote a post a while back with more info on ESXTOP batch mode: http://vmtoday.com/2009/09/esxtop-batch-mode-windows-perfmon/
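If Perfmon or Excel isn’t handy, a short script can do the same first pass over a batch-mode capture (collected with something like `esxtop -b -d 20 -n 180 > capture.csv`). The sketch below is a rough illustration, not a finished tool: the function name `ready_summary`, the host name `esx01`, the VM names and every value in the sample are invented, and it simply assumes the CSV has columns whose names end in `% Ready`.

```python
import csv
import io


def ready_summary(csv_text, warn=5.0, crit=10.0):
    """Average every '% Ready' column in esxtop batch output and grade it."""
    reader = csv.DictReader(io.StringIO(csv_text))
    totals, counts = {}, {}
    for row in reader:
        for name, value in row.items():
            if name and name.endswith("% Ready") and value:
                totals[name] = totals.get(name, 0.0) + float(value)
                counts[name] = counts.get(name, 0) + 1
    summary = {}
    for name, total in totals.items():
        avg = total / counts[name]
        # Thresholds follow the 5% / 10% rule of thumb from the text
        status = "problem" if avg > crit else "watch" if avg > warn else "ok"
        summary[name] = (round(avg, 2), status)
    return summary


# Tiny made-up sample in the general shape of esxtop batch output
sample = "\n".join([
    r'"(PDH-CSV 4.0) (UTC)(0)","\\esx01\Group Cpu(42:BigBadServer)\% Ready","\\esx01\Group Cpu(43:SmallVM)\% Ready"',
    '"08/24/2010 10:00:00","41.0","2.0"',
    '"08/24/2010 10:00:20","43.0","4.0"',
])

for column, (avg, status) in ready_summary(sample).items():
    print(column, avg, status)
```

Keep in mind that esxtop’s group-level % Ready is, as far as I know, summed across all of a VM’s worlds, so for a multi-vCPU VM you may want to divide by the vCPU count before applying the per-vCPU thresholds.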

To help quantify the problem a bit more, the BigBadServer is on an ESX 4.0 server with about 10 other VM’s. The physical blade has two dual-core CPU’s (AMD Opteron 2218HE’s which are not hyperthreaded). The other VM’s on the blade have different vCPU and vMemory configurations. 3 VM’s (including BigBadServer) have 4 vCPU’s. A couple have 2 vCPU’s, and the remainder are configured with 1 vCPU. In ESX 4.x, the VMware console OS actually runs as a hidden VM, pegged to the first pCPU.

I generally recommend a pCPU:vCPU ratio of 1:4 for mid-sized VMware deployments of single vCPU VM’s. The blade we are running on is a 1:5 with several multi-vCPU VM’s. The multi-vCPU’s start to skew the ratio recommendation and require some advanced design decisions. VMware’s scheduler requires that all the vCPU’s on a VM run concurrently (even if the Guest OS is trying to execute a single thread). Also, the VMware CPU Scheduler prefers to have all the vCPU’s from a VM run on the cores of the same physical CPU package. As workloads are bounced around between pCPU’s, the benefits of CPU cache are lost. This is one of those ‘more-is-less’ situations that you run into on virtualized environments.
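To make the ratio concrete, here is a rough Python sketch of the arithmetic. The VM inventory is an assumption on my part (three 4-vCPU VM’s, two 2-vCPU VM’s and six 1-vCPU VM’s, which matches “about 10 other” VM’s on the blade), and the helper name `vcpu_per_core` is made up for this example.

```python
def vcpu_per_core(vcpu_counts, sockets, cores_per_socket):
    """Return how many vCPUs are allocated per physical core on a host."""
    cores = sockets * cores_per_socket
    return sum(vcpu_counts) / cores


# Assumed inventory for the blade: 3 x 4 vCPU, 2 x 2 vCPU, 6 x 1 vCPU
blade_vms = [4, 4, 4, 2, 2, 1, 1, 1, 1, 1, 1]
sockets, cores_per_socket = 2, 2  # two dual-core Opterons, no hyperthreading

ratio = vcpu_per_core(blade_vms, sockets, cores_per_socket)
print(f"pCPU:vCPU is roughly 1:{ratio:.1f}")

# The widest VM needs this many cores free at the same moment to be scheduled
widest = max(blade_vms)
print(f"Widest VM wants {widest} of {sockets * cores_per_socket} cores at once")
```

With those assumed numbers the blade lands a bit past the 1:4 guideline, and the widest VM’s need every core on the box at once, which is the bigger problem.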

What this CPU Scheduler nonsense means in this case is that the 4 vCPU’s on BigBadServer have to wait until all logical pCPU’s on the box are idle (including the one that runs ESX itself) before they can run. If ESX can’t accomplish that (we are experiencing resource contention) it starts prioritizing workloads according to what it can best run. It is much easier to schedule the smaller VM’s, so it tends to run those on pCPU more frequently. The larger VM’s tend to suffer a bit more than the smaller ones. We are competing with 2 other VM’s with 4 vCPU’s that use up all of the logical pCPU’s when they need to run, as well as with the smaller VM’s.

I suggested a few ways to fix this issue for the BigBadServer web server:

  1. Using Shares and/or Reservations on the VM.  This probably won’t work in our situation as the physical server is too over-subscribed.  We might see a slight improvement in BigBadServer (or we might not see any change), but possibly at the extreme expense of the other VM’s sharing the blade.
  2. Reduce the number of vCPU’s on BigBadServer AND the other multi-vCPU VM’s on the same physical server.  This would reduce resource contention and open up a whole bunch of scheduling options for the VMware CPU Scheduler.  This is the quickest/cheapest fix, but will not work if the VM’s really do need 4 vCPU’s.  A little workload analysis should determine which can be made smaller (the vCenter server graphs/stats should be enough for this).  For what it’s worth, by our analysis BigBadServer seems to be happier with 4 vCPU assuming we can run with a low CPU Ready on those 4.
  3. Move the BigBadServer VM to a physical ESX server with fewer multi-vCPU VM’s so there is less contention.
  4. Move the BigBadServer VM to a physical ESX server with quad-core pCPU’s (ideally two quad-cores or bigger).  This would give a lot more flexibility to the VMware CPU Scheduler and allow it to run quad-vCPU VM’s on the same pCPU for greater efficiency.
  5. Split BigBadServer into 2 smaller VM’s – The server currently runs a couple of sites. We could split them onto two servers – one for Project1 and one for Project2. This configuration would take some design, testing, and time but could scale out better and give more flexibility and availability in the long run.

I’m not sure which way the customer will go on this one yet, but I feel good having armed them with enough knowledge and options to make an informed decision.

To avoid problems like this in the future, I recommend these rules of thumb:

  • Design your hosts for your guests.  Taking your Guest VM sizes into account when designing your environment and choosing physical hardware is crucial if you need bigger VM’s.
  • Don’t make your VM’s bigger than you have to. It is always easier to add resources than take them away. Hot Add of CPU and Memory in vSphere makes adding incredibly easy.
  • Monitor your environment for CPU Ready, Swapping, and other metrics that can indicate an inefficient design.
  • Call for help when you can’t figure out what is going on (I’m happy to help!).  VMware is super powerful, but some things can be downright backwards when it comes to resource allocation on a fixed set of hardware.

If you are looking for some resources to help explain CPU Scheduling a bit more, I recommend:

(Updated 8/25/2010 to include a few additional reference links and corrected summation divided by time slice to get accurate values)

About Me


Hello, and thank you for visiting VMtoday. My name is Josh Townsend. I am a technology professional with a strong background in VMware Virtualization, Storage, and Microsoft technologies. I am a Sr. Systems Administrator at Tiber Creek Consulting in Fairfax, VA, and hold several technical certifications, including VMware Certified Professional. I am also a 2010 VMware vExpert.


I am also leader of the Washington DC Metro Area VMware User Group (VMUG).


The opinions expressed on this site are my own and may not reflect the views of my employer, VMware, or any other party unless otherwise stated.

Please feel free to follow me on Twitter
@joshuatownsend
