Archive for the ‘VMware How To’ Category
I ran into an issue with a customer today where a VM was performing terribly. From within the guest OS (a Windows 2003 application server running .NET in IIS which I will call BigBadServer) things appeared sluggish and CPU time was high. The amount of time being spent on the kernel was notably high. The VM in question had 4 vCPU’s and a good helping of memory.
I don’t have access to the VMware client at this particular site – just some of the guests, so I was flying blind. Gut feeling told me that I was dealing with a resource contention issue. The VMstats provider I had running in the guest (http://vpivot.com/2009/09/17/using-perfmon-for-accurate-esx-performance-counters/) showed me that there was no ballooning or swapping going on, that the vCPU’s were not limited, and that the CPU share value seemed to be at the default.
I strongly suspected that the physical server running VMware ESX was oversubscribed on physical CPU (pCPU) resources. Essentially, the guest VM’s that are sharing the resources of the physical machine are demanding more resources than the machine can handle. To verify this theory, I had the client check the ‘CPU Ready’ metric on BigBadServer and bingo!
CPU Ready is a measure of the amount of time that the guest VM is ready to run against the pCPU, but the VMware CPU Scheduler cannot find time to run the VM because other VM’s are competing for the same resources.
From the stats the customer provided on our phone call, the CPU Ready for any one of the 4 vCPU’s on the BigBadServer was on average 3723ms (min: 1269ms, max:8491ms). (Update 8/25/2010 to clarify summation stat) The summation for the entire VM was around 12,000ms on average and peaked around 35,000. The stats came from the real-time performance graph/table in the vSphere client. The real-time stats in the vSphere Client update every 20 seconds, so the CPU Ready summation value should be divided by 20,000 to get a percentage of CPU ready for the 20 second time slice. If I take the worst case scenario of 8491ms per vCPU, this VM spent nearly 43% (8491/20,000) of the 20 second time slice waiting for CPU resources.
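The conversion above is easy to script. Here is a minimal sketch in Python, assuming the 20-second real-time sample interval described above; the function name is my own and not part of any VMware tool.

```python
def cpu_ready_percent(ready_ms, interval_s=20.0):
    """Convert a CPU Ready summation (ms) into a percentage of the sample interval."""
    return ready_ms / (interval_s * 1000.0) * 100.0


# Per-vCPU figures quoted above (real-time chart, 20-second samples).
for label, ms in (("min", 1269), ("avg", 3723), ("max", 8491)):
    print(f"{label}: {cpu_ready_percent(ms):.1f}% ready")
```

If you pull the numbers from a chart with a different sample interval, pass that interval in seconds as the second argument rather than reusing 20.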
The CPU Ready summation in milliseconds counter in the vCenter Client is not always the most accurate or easy to interpret stat – to better quantify the problem it might be best to go to the ESX command line and run ESXTOP. CPU Ready over 5% could be a sign of trouble, over 10% and there is a problem. Running ESXTOP in batch mode and then analyzing the output using Windows Perfmon or Excel might be a good way to go on this to get a view over several hours rather than the realtime stats we were looking at. I wrote a post a while back with more info on ESXTOP batch mode: http://vmtoday.com/2009/09/esxtop-batch-mode-windows-perfmon/
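If you would rather script the analysis than open the batch output in Perfmon or Excel, something like the following Python sketch could work. It assumes the batch output is a CSV with one column per counter and that the CPU Ready columns end in "% Ready". The sample data and column names below are made up for illustration, so check the headers in your own capture. The 5% and 10% thresholds are the rules of thumb from the paragraph above, applied here to the average.

```python
import csv
import io

# Hypothetical sample data; a real esxtop batch capture has many more
# columns and its header text may differ from these simplified names.
SAMPLE = (
    '"Time","BigBadServer % Ready","SmallVM % Ready","BigBadServer % Used"\n'
    '"10:00:00","18.0","2.0","55.0"\n'
    '"10:00:20","42.0","4.0","60.0"\n'
)


def summarize_ready(csv_text, warn=5.0, crit=10.0, marker="% Ready"):
    """Return {column: (average, peak, status)} for every CPU Ready column."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    header, data = rows[0], rows[1:]
    summary = {}
    for index, name in enumerate(header):
        if not name.endswith(marker):
            continue  # skip the timestamp and every non-Ready counter
        values = [float(row[index]) for row in data if row[index] != ""]
        if not values:
            continue  # no samples for this column
        average = sum(values) / len(values)
        if average > crit:
            status = "problem"
        elif average > warn:
            status = "watch"
        else:
            status = "ok"
        summary[name] = (average, max(values), status)
    return summary


for column, (average, peak, status) in summarize_ready(SAMPLE).items():
    print(f"{column}: avg {average:.1f}%, peak {peak:.1f}% ({status})")
```

To run it against a real capture, read the file contents into a string and pass that in place of SAMPLE.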
To help quantify the problem a bit more, the BigBadServer is on an ESX 4.0 server with about 10 other servers. The physical blade has two dual-core CPU’s (AMD Opteron 2218HE’s which are not hyperthreaded). The other VM’s on the blade have different vCPU and vMemory configurations. 3 VM’s (including BigBadServer) have 4 vCPU’s. A couple have 2 vCPU’s, and the remainder are configured with 1 vCPU. In ESX 4.x, the VMware console OS actually runs as a hidden VM, pegged to pCPU #1.
I generally recommend a pCPU:vCPU ratio of 1:4 for mid-sized VMware deployments of single vCPU VM’s. The blade we are running on is a 1:5 with several multi-vCPU VM’s. The multi-vCPU’s start to skew the ratio recommendation and require some advanced design decisions. VMware’s scheduler requires that all the vCPU’s on a VM run concurrently (even if the Guest OS is trying to execute a single thread). Also, the VMware CPU Scheduler prefers to have all the vCPU’s from a VM run on the same pCPU. As workloads are bounced around between pCPU’s, the benefits of CPU cache are lost. This is one of those ‘more-is-less’ situations that you run into on virtualized environments.
What this CPU Scheduler nonsense means in this case is that the 4 vCPU’s on BigBadServer have to wait until all logical pCPU’s on the box are idle (including the one that runs ESX itself) before it can run. If ESX can’t accomplish that (we are experiencing resource contention) it starts prioritizing workloads according to what it can best run. It is much easier to schedule the smaller VM’s, so it tends to run those on pCPU more frequently. The larger VM’s tend to suffer a bit more than the smaller ones. We are competing with 2 other VM’s with 4 vCPU’s that use up all of the logical pCPU’s when they need to run, as well as with the smaller VM’s.
I suggested a few ways to fix this issue for the BigBadServer web server:
- Using Shares and/or Reservations on the VM. This probably won’t work in our situation as the physical server is too over-subscribed. We might see a slight improvement in BigBadServer (or we might not see any change), but possibly at the extreme expense of the other VM’s sharing the blade.
- Reduce the number of vCPU’s on BigBadServer AND the other multi-vCPU VM’s on the same physical server. This would reduce resource contention and open up a whole bunch of scheduling options for the VMware CPU Scheduler. This is the quickest/cheapest fix, but will not work if the VM’s really do need 4 vCPU’s. A little workload analysis should determine which can be made smaller (the vCenter server graphs/stats should be enough for this). For what it’s worth, by our analysis BigBadServer seems to be happier with 4 vCPU assuming we can run with a low CPU Ready on those 4.
- Move the BigBadServer VM to a physical ESX server with fewer multi-vCPU VM’s so there is less contention.
- Move the BigBadServer VM to a physical ESX server with quad-core pCPU’s (ideally two quad-cores or bigger). This would give a lot more flexibility to the VMware CPU Scheduler and allow it to run quad-vCPU VM’s on the same pCPU for greater efficiency.
- Split BigBadServer into 2 smaller VM’s – The server currently runs a couple of sites. We could split them onto two servers – one for Project1 and one for Project2. This configuration would take some design, testing, and time but could scale out better and give more flexibility and availability in the long run.
I’m not sure which way the customer will go on this one yet, but I feel good having armed them with enough knowledge and options to make an informed decision.
To avoid problems like this in the future, I recommend these rules of thumb:
- Design your hosts for your guests. Taking your Guest VM sizes into account when designing your environment and choosing physical hardware is crucial if you need bigger VM’s.
- Don’t make your VM’s bigger than you have to. It is always easier to add resources than take them away. Hot Add of CPU and Memory in vSphere make adding incredibly easy.
- Monitor your environment for CPU Ready, Swapping, and other metrics that can indicate an inefficient design.
- Call for help when you can’t figure out what is going on (I’m happy to help!). VMware is super powerful, but some things can be downright backwards when it comes to resource allocation on a fixed set of hardware.
If you are looking for some resources to help explain CPU Scheduling a bit more, I recommend:
- VMware’s Official documentation of CPU Scheduler in vSphere 4.1 – http://www.vmware.com/files/pdf/techpaper/VMW_vSphere41_cpu_schedule_ESX.pdf.
- A nice summary of co-scheduling from VMware’s Performance Blog: http://blogs.vmware.com/performance/2008/06/esx-scheduler-s.html
- Description and stats on Ready Time metrics for VI3: http://www.vmware.com/pdf/esx3_ready_time.pdf
- Understanding Virtual Center Performance Statistics: http://communities.vmware.com/docs/DOC-5230.pdf
(Updated 8/25/2010 to include a few additional reference links and corrected summation divided by time slice to get accurate values)
I posted an article in December on how the SVGA driver included with VMware Tools caused the guest VM to freeze. I referenced VMware’s KB Article 1011709, which directed you to not use the SVGA drivers included with VMware Tools. KB1011709 has since been updated (as of February 25, 2010) to indicate that the VMware Tools package included with ESX 4.0 Update 1 includes a new WDDM driver that is fully supported. If you have updated to Update 1, you should upgrade VMware Tools to take advantage of the new driver.
If you followed KB1011709’s original advice and did a custom install of VMware Tools that left out the SVGA driver, you may have to re-install VMware Tools before the new driver is available. Once you get VMware Tools upgraded, the new driver can be found in the guest VM at C:\Program Files\Common Files\VMware\Drivers\wddm_video. These drivers are not automatically installed, so you’ll have to update your guest’s video adapter driver in Device Manager.
It’s a bummer that the WDDM SVGA drivers are not automatically installed. You could probably copy these drivers to other VM’s and use Windows Device Manager to replace the standard driver with the newer WDDM driver without having to do the uninstall, reboot, reinstall of VMware tools on all of your VM’s.
Just as I was about to publish this, I saw a TweetDeck pop-up from @jasonboche saying that he had published pretty much the same update here: http://www.boche.net/blog/index.php/2010/03/28/windows-2008-r2-and-windows-7-on-vsphere/. Not only does he have pretty pictures to go with his post, but also points out that VMware Tools installs/upgrades executed with VMware Update Manager (VUM) will not install the upgraded SVGA driver. He also recommends updating templates to include the upgraded drivers. Great points, Jason.
I am finishing up an installation of an EMC Clariion CX4 SAN. One of the final steps of the installation is to configure PowerPath/VE on the ESXi hosts. PowerPath/VE is EMC’s multipathing extension module for VMware (and Hyper-V), designed to replace the Native Multipathing Plugin (NMP) for increased I/O performance and failover management. To simplify and automate the installation of PowerPath/VE, I decided to use VMware Update Manager (VUM) to push the extension to the ESXi 4.x hosts in the environment.
The process of setting up an additional VUM patch repository to host PowerPath/VE (and other 3rd party extensions such as the Cisco Nexus 1000v) is pretty straightforward. 3rd party extensions are supported in VUM beginning with vSphere 4.0 Update 1. Chad Sakac has posted a great video guide on YouTube that covers the setup.
I opted to use the tomcat installation on the environment’s vCenter server to host the PowerPath/VE repository. To accomplish this, I simply created a new directory in the tomcat root directory. The default path for the root directory on a vSphere vCenter Server is “C:\Program Files\VMware\Infrastructure\tomcat\webapps” (or C:\Program Files (x86)\VMware\Infrastructure\tomcat\webapps on a 64-bit installation).
I created a directory named ‘depot’ and within that directory created a PowerPathVE folder. Into that folder I extracted the contents of the VUM folder from the PowerPath .zip file that I downloaded from http://powerlink.emc.com.
After creating the directory for the patch repository, I simply added an Extension Repository to VMware Update Manager as Chad shows in his video. I would like to call out one caveat – Because vCenter may not listen on standard HTTP/HTTPS ports, I used
https://vcenter.domain.local:8443/depot/PowerPathVE/index.xml as the path to the source.
Once PowerPath was added to an Extension Baseline in VUM, I simply had to scan my hosts for updates and remediate. Installation of PowerPath/VE requires the host to be in Maintenance Mode and concludes with a reboot. Pretty simple.
Then all you have to do is fight through an overly-complex licensing setup (seriously, a 112 page PDF on how to install licenses???), a bit of configuration, and you are multi-pathing with the best of them. If you are interested in learning more about PowerPath/VE, start with this whitepaper: EMC PowerPath/VE for VMware vSphere Best Practices Planning. For a bit of real-world insight into the performance increase you might see with PowerPath/VE, check out this blog post from Eric Sloof: Massive I/O power increase using EMC PowerPath/VE.
Update – 3/27/09: VMware published a Knowledge Base article on this procedure a few weeks after I wrote this post. You can find it in article 1018740.
In Part I of this series, I discussed the importance of storage performance in a virtual environment (really any environment, virtual or not, where you want acceptable performance), and introduced some of the basic measures of a storage environment. In Part II, we will look more closely at what may be the most important storage design consideration in VMware server-consolidation environments, many SQL environments, and VDI environments, to name a few: IOPS.
If we stick with a single-disk-centric approach as we did in Part I, IOPS is quite simply a measure of how many read and write commands a disk can complete in a second. IOPS is an important measure of performance in a shared storage environment (such as VMware) and in high-transaction-rate workloads like SQL. Because hard drives are forced to abide by the laws of physics, the IOPS capabilities of a disk are consistent and predictable given a specific configuration. The formula for calculating IOPS for a given disk is pretty straightforward (please show your work):
IOPS = 1000/(Seek Latency + Rotational Latency)
Exact latencies vary by disk type, quality, number of platters, etc. You can look up the tech specs for most drives on the market. As an example, I have randomly chosen the technical specifications of the Seagate Cheetah 15k.7 SAS drive. This particular drive has the following performance characteristics:
- Average (rotational) latency: 2.0msec
- Average read seek (latency): 3.4msec
- Average write seek (latency): 3.9msec
Using the read latency number, the math works out like this:
Maximum read IOPS = 1000/(2.0 + 3.4) = 185
The maximum write IOPS will be a bit less (~169IOPS) because of the higher write seek latency. Writing is more ‘expensive’ than reading and therefore slower.
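For anyone who wants to check the arithmetic, here is a small Python sketch of the formula using the Cheetah figures quoted above. The function names are mine. The rotational latency helper relies on the usual definition of average rotational latency as half a revolution, which reproduces the 2.0 msec figure for a 15k RPM drive.

```python
def max_iops(seek_ms, rotational_ms):
    """Theoretical single-disk IOPS: 1000 / (seek latency + rotational latency)."""
    return 1000.0 / (seek_ms + rotational_ms)


def rotational_latency_ms(rpm):
    """Average rotational latency in ms, taken as half of one revolution."""
    return 60000.0 / rpm / 2.0


rotational = rotational_latency_ms(15000)  # matches the 2.0 msec spec above
print(f"read:  {max_iops(3.4, rotational):.0f} IOPS")
print(f"write: {max_iops(3.9, rotational):.0f} IOPS")
```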
Fortunately, there are some widely accepted ‘working’ numbers, so you do not have to use this formula for each and every disk you might consider using. Because rotational latency is based on the rotational speed, we can use the published Rotations Per Minute (RPM) rating of the drive to guess-timate the IOPS capabilities. Typical spindle speeds (measured in RPM) and their equivalent IOPS are in the table below.
RPM        IOPS
7,200      80
10,000     130
15,000     180
SSD        2,500 to 6,000
While not a traditional spinning disk, I have also included Solid State Disks (SSD’s) for reference as SSD’s are starting to see increased market adoption. I have seen a wide range of sizing IOPS for SSD depending on the technology and type (SLC, MLC, etc.). Check out http://en.wikipedia.org/wiki/Solid-state_drive for an introduction, and ask your vendors for more in-depth technical information.
If you are brand-new to this (and you are still reading, congrats!), you can see how many IOPS your Windows computer is asking for by opening Performance Monitor and looking at the ‘Disk Transfers/sec’ counter under Physical Disk. This is a sum of the ‘Disk Reads/sec’ and ‘Disk Writes/sec’ counters.
If you are after some stats for your VMware ESX environment, check out esxtop and look for CMDS/s in the output. I published a couple articles on using esxtop here and here. The numbers from PerfMon and esxtop get you pretty close but can be skewed by a few things we’ll discuss in later posts.
Now that was fun and all, but let’s get real: Single-disk configurations are uncommon in servers. As such, we’ll part ways with our Simple Jack single disk approach to storage and begin to look at more real-world multi-disk enterprise-class storage configurations. A discussion of IOPS in a multi-disk array is a great way to start. From a very elementary perspective, you can combine multiple hard drives together to aggregate their performance capabilities. For example, two 15k RPM disks working together to serve a workload could provide a theoretical 360 IOPS (180 + 180). This also scales out so ten 15k RPM disks could provide 1800 IOPS, and 100 15k RPM disks could provide 18,000 IOPS.
Designing your environment so that your storage can deliver sufficient IOPS to the requesting workload is of utmost importance. If you are working on a storage design, arm yourself with data from perfmon, top, iostat, esxtop, and vscsiStats. I typically gather at least 24 hours of performance data from systems under normal conditions (a few days to a week may be good if you have varying business cycles) and take the 95th percentile as a starting point. So from a very simple approach, if your data and calculations show a 1800 IOPS demand at the 95th percentile, you ought to have at least ten 15k RPM disks (or twenty-three 7.2k RPM SATA disks) to achieve performance goals. It’s amazing how some simple data and a pretty little Excel spreadsheet can help you understand and justify the right hardware for the job.
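The sizing math above fits in a few lines of Python if you prefer a script to a spreadsheet. This is only a sketch: it uses the working IOPS numbers from the table earlier in this post, a simple nearest-rank 95th percentile (your spreadsheet may interpolate differently), and it ignores RAID and cache entirely. The function names are my own.

```python
import math

# Working per-disk IOPS numbers from the table earlier in this post.
WORKING_IOPS = {"7.2k": 80, "10k": 130, "15k": 180}


def percentile_95(samples):
    """Nearest-rank 95th percentile of a list of IOPS samples."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer ceiling of 0.95 * n
    return ordered[rank - 1]


def disks_needed(demand_iops, disk_type):
    """Minimum number of disks whose combined working IOPS covers the demand."""
    return math.ceil(demand_iops / WORKING_IOPS[disk_type])


demand = 1800  # the 95th percentile demand used in the example above
for disk_type in ("15k", "10k", "7.2k"):
    print(f"{disk_type} RPM: {disks_needed(demand, disk_type)} disks")
```

In practice you would feed percentile_95 the samples you gathered with perfmon or esxtop and pass its result to disks_needed as the demand.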
Now before you go and start filling out that PO form for a nice new storage system based on these numbers there are a few more things we ought to discuss. RAID, cache, and advanced storage technologies will skew these numbers and need to be understood. Stay tuned to future articles in this series for more on those topics and more.
Finally, there has been a bunch of activity in the VMware ecosystem of vendors, bloggers, and twittering-type-folks around storage performance. As this here post sat in my drafts folder, Duncan Epping posted this gem of an article that pretty much included all of the content of this article, as well as future ones in my series: http://www.yellow-bricks.com/2009/12/23/iops/. Do yourself a favor and read his post and the comments from his readers – both are filled with a ton of great information, including some vendor-specific implementations.
I was led to Duncan’s article by a post by Chad Sakac on his blog: http://virtualgeek.typepad.com/virtual_geek/2009/12/whats-what-in-vmware-view-and-vdi-land.html. This is also a great read that covers some of the same information with a focus on VMware View/VDI and is also worth a few minutes of your time. Also check out http://vpivot.com/2009/09/18/storage-is-the-problem/ for a rubber-meets-the-road post from Scott Drummonds on the importance of storage performance vis-a-vis IOPS in a VMware-virtualized SQL environment.








