January 28, 2013

I’ve written in the past about how high CPU Ready values can cause performance problems in VMware vSphere environments. For those who don’t know, CPU Ready is a measure of the amount of time that a guest VM is ready to run, but the VMware ESXi CPU Scheduler on the host is not able to immediately allocate cycles to the guest because it is busy doing work for other VMs. CPU Ready values are exposed through ESXTOP and in the vSphere Client.

I’m often called into customer environments to do performance troubleshooting, and CPU Ready is one of the first performance measurements I check in my first few minutes in the environment (I also look at memory balloon driver metrics, disk latency, and the CPU and memory utilization of both hosts and guest VMs). Unfortunately, I’m often called in after the excrement has made physical contact with a hydro-electric powered oscillating air current distribution device, and the customer is demanding a quick fix. Checking a few basic metrics in the vSphere Client is often enough to put me on the trail of the problem.

Note that the summation value is shown on hosts, guest VMs and guest vCPUs in the vSphere Client. The different counters have slightly different meanings. Host CPU Ready might be a bit higher than an individual guest VM’s CPU Ready counter, for example. Host CPU Ready is a good value to look at if all the VMs are suffering performance issues. If just one or a few VMs are suffering performance issues, look at the guest VM CPU Ready value. The guest VM CPU Ready value is a summation of the CPU Ready of each vCPU on the guest.

As a rule of thumb, a Real-Time CPU Ready value of 10% or greater on a vCPU indicates declining performance for server workloads. I usually go with a slightly lower value for VMware View virtual desktops (VDI), as users are much more likely to perceive CPU Ready on desktops they are actively using than on a server they are connected to through a client-server setup. Theoretically, on VMs with multiple vCPUs the guest VM counter can safely go beyond 10% so long as the per-vCPU counter stays under 10%. For a 2 vCPU VM the whole-VM CPU Ready value can hit 20%, for a 4 vCPU VM 40%, and so on, before we hit that 10% rule of thumb. In practice, because the ESXi CPU Scheduler has to co-schedule all vCPUs on a VM, bigger VMs are more prone to CPU Ready on hosts with CPU contention, which probably offsets the theoretical vCPU percentages.
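The per-vCPU arithmetic above can be sketched in a few lines of Python. This is only an illustration, assuming the VM-level percentage is simply the sum of its per-vCPU percentages; the function names and the 10% default are hypothetical and not part of any VMware tool.

```python
def vm_ready_threshold_pct(num_vcpus, per_vcpu_threshold_pct=10.0):
    """VM-level CPU Ready % at which the average vCPU hits the rule-of-thumb limit.

    Assumes the VM counter is the plain sum of its vCPU counters.
    """
    if num_vcpus < 1:
        raise ValueError("a VM needs at least one vCPU")
    return num_vcpus * per_vcpu_threshold_pct


def avg_per_vcpu_ready_pct(vm_ready_pct, num_vcpus):
    """Average CPU Ready % per vCPU, given the whole-VM percentage."""
    if num_vcpus < 1:
        raise ValueError("a VM needs at least one vCPU")
    return vm_ready_pct / num_vcpus


if __name__ == "__main__":
    for vcpus in (1, 2, 4):
        print(vcpus, "vCPU VM threshold:", vm_ready_threshold_pct(vcpus), "%")
```

Because of the co-scheduling effect mentioned above, treat these thresholds as an upper bound rather than a target for larger VMs.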

The problem, however, is that the vSphere Client shows CPU Ready as a Summation of Milliseconds of CPU Ready for the Sampling Period. Summation of milliseconds is not always an easy value to wrap your head around, as the impact of the number changes depending on the VM configuration and on the charting period (View), which determines the sampling interval. In some views a summation value of 2,000 can indicate problems, and in others 1,000,000 may be OK.

In the vSphere Client, the charts are shown with an update interval. The summation values are for the entire interval. For the ‘Realtime’ view, we’re really looking at 20 second time slices. On the Past Day view, the interval is 5 minutes (300 seconds). Past week is 30 minutes, past month is 2 hours, and past year is 1 day.
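For quick reference, the intervals listed above can be captured as a small lookup table. The dictionary keys and the helper name below are my own labels for illustration, not identifiers from the vSphere Client or any VMware API.

```python
# Chart update intervals from the paragraph above, in seconds.
# Keys are illustrative labels, not vSphere identifiers.
CHART_INTERVAL_SECONDS = {
    "realtime": 20,
    "past_day": 5 * 60,         # 5 minutes
    "past_week": 30 * 60,       # 30 minutes
    "past_month": 2 * 60 * 60,  # 2 hours
    "past_year": 24 * 60 * 60,  # 1 day
}


def interval_ms(view):
    """Length of one sample for the given chart view, in milliseconds."""
    return CHART_INTERVAL_SECONDS[view] * 1000


if __name__ == "__main__":
    for view, seconds in CHART_INTERVAL_SECONDS.items():
        print(f"{view}: {seconds} s = {interval_ms(view)} ms per sample")
```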

A little math is needed to convert the summation of milliseconds value to a percentage, which is an easier number to understand and compare. I covered how to convert the summation value to a percent here: High CPU Ready, Poor Performance. VMware one-upped me ( 😉 ) by publishing a KB article a couple of years ago that presented the same formula for converting summation in the vSphere Client to a percentage. The formula goes like this: divide the CPU Ready summation value by the length of the sampling interval in milliseconds (the interval in seconds × 1000), then multiply by 100.
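Here is a minimal Python sketch of the conversion, assuming the commonly cited form of the formula: CPU Ready % = summation in milliseconds / (interval in seconds × 1000) × 100. The function name is hypothetical, and the sample values are the 2,000 and 1,000,000 figures mentioned earlier, not measurements from a real environment.

```python
def cpu_ready_percent(summation_ms, interval_seconds):
    """Convert a CPU Ready summation (ms) to a percentage of the sampling interval.

    Assumed formula: summation_ms / (interval_seconds * 1000) * 100.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    # Multiply first so round values such as 2000 ms over 20 s stay exact.
    return summation_ms * 100 / (interval_seconds * 1000)


if __name__ == "__main__":
    # 2,000 ms in the Realtime view (20 s samples) is at the 10% rule of thumb.
    print(cpu_ready_percent(2000, 20))
    # 1,000,000 ms in the Past Year view (1 day samples) is only a little over 1%.
    print(round(cpu_ready_percent(1_000_000, 24 * 60 * 60), 2))
```

This is why the same raw number can be alarming in one view and harmless in another: the divisor grows from 20,000 ms in Realtime to 86,400,000 ms in the Past Year view.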