<?xml version="1.0" encoding="UTF-8"?> <rss
version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
><channel><title>VMtoday &#187; Issues &amp; Troubleshooting</title> <atom:link href="http://vmtoday.com/category/vmware/issues-troubleshooting/feed/" rel="self" type="application/rss+xml" /><link>http://vmtoday.com</link> <description>VMware News, Views, &#38; How-To&#039;s from vExpert Josh Townsend</description> <lastBuildDate>Thu, 09 Feb 2012 14:43:42 +0000</lastBuildDate> <language>en</language> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <generator>http://wordpress.org/?v=3.3.1</generator> <item><title>vSphere 5 Networking Bug Affects Software iSCSI</title><link>http://vmtoday.com/2012/02/vsphere-5-networking-bug-affects-software-iscsi/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=vsphere-5-networking-bug-affects-software-iscsi</link> <comments>http://vmtoday.com/2012/02/vsphere-5-networking-bug-affects-software-iscsi/#comments</comments> <pubDate>Wed, 08 Feb 2012 20:33:54 +0000</pubDate> <dc:creator>Joshua Townsend</dc:creator> <category><![CDATA[Issues & Troubleshooting]]></category> <category><![CDATA[VMware]]></category> <category><![CDATA[VMware How To]]></category> <category><![CDATA[esxi 5]]></category> <category><![CDATA[iscsi]]></category> <category><![CDATA[network]]></category> <category><![CDATA[networking]]></category> <category><![CDATA[NIC]]></category> <category><![CDATA[troubleshooting]]></category> <category><![CDATA[vDS]]></category> <category><![CDATA[vmknic]]></category> <category><![CDATA[vmnic]]></category> <category><![CDATA[vsphere 5]]></category> <category><![CDATA[vswitch]]></category><guid
isPermaLink="false">http://vmtoday.com/?p=854</guid> <description><![CDATA[I recently stumbled on two vSphere 5 ESXi networking bugs that I thought I would share. The issues look very similar at a cursory level, but they have different symptoms, troubleshooting steps, and implications for your architecture, so I’m going to split them into two separate posts. Because troubleshooting these issues was a real pain, [...]]]></description> <content:encoded><![CDATA[<p>I recently stumbled on two vSphere 5 ESXi networking bugs that I thought I would share. The issues look very similar at a cursory level, but they have different symptoms, troubleshooting steps, and implications for your architecture, so I’m going to split them into two separate posts. Because troubleshooting these issues was a real pain, I’ll provide some details on how to identify them in your environment, and I’ll wrap up with a third post on what I believe are best practices for avoiding these problems and achieving greater redundancy and resiliency in your vSphere environments.</p><p><strong><span
style="text-decoration: underline;">The Problem</span></strong></p><p>Today, we’ll look at an ESXi 5 networking issue that caused massive iSCSI latency, lost iSCSI sessions, and lost network connectivity. I’ve been able to reproduce this issue in several environments, on different hardware configurations. Here’s the background information on how all this started: I upgraded an ESXi 4.1 host to ESXi 5 using vSphere Update Manager (VUM). Note that I did use the host upgrade image that contained the <a
href="http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&amp;cmd=displayKC&amp;externalId=2007108">ESXi500-201109001 iSCSI fixes</a> – if you are upgrading to vSphere 5 and have iSCSI in your environment, use this image. Here’s a quick look at how the networking was configured on this host:</p><p>The iSCSI networking used a very typical setup that follows best practices, as outlined in <a
href="http://pubs.vmware.com/vsphere-50/topic/com.vmware.vsphere.storage.doc_50/GUID-8AE88758-20C1-4873-99C7-181EF9ACFA70.html">VMware’s documentation</a> and by many vendors (see EMC’s Chad Sakac’s ‘<a
href="http://virtualgeek.typepad.com/virtual_geek/2009/09/a-multivendor-post-on-using-iscsi-with-vmware-vsphere.html">A Multivendor Post on using iSCSI with VMware vSphere</a>’): two vmnic uplinks and two vmknics, with each vmknic using one active adapter on the correct layer-2/layer-3 network and the other adapter set to unused.</p><p><a
href="http://cloudfront.vmtoday.com/wp-content/uploads/2012/02/iSCSI1-config1.jpg" rel="lightbox[854]"><img
class="aligncenter size-full wp-image-864" title="vSwitch iSCSI vmknic override failover order with unused NIC" src="http://cloudfront.vmtoday.com/wp-content/uploads/2012/02/iSCSI1-config1.jpg" alt="vSwitch iSCSI vmknic override failover order with unused NIC" width="533" height="602" /></a><a
href="http://cloudfront.vmtoday.com/wp-content/uploads/2012/02/iscsi2-config1.jpg" rel="lightbox[854]"><img
class="aligncenter size-full wp-image-863" title="vSwitch iSCSI vmknic override failover order with unused NIC" src="http://cloudfront.vmtoday.com/wp-content/uploads/2012/02/iscsi2-config1.jpg" alt="vSwitch iSCSI vmknic override failover order with unused NIC" width="533" height="602" /></a></p><p>After the upgrade, the standard vSwitch with two vmnic uplinks (Broadcom NetXtreme II BCM5709 1000Base-T) and two vmknics servicing the software iSCSI adapter failed to pass traffic (vmkping to the iSCSI targets failed), and the host could not mount ANY iSCSI LUNs. VM network, management, and vMotion ports were not affected.</p><p>If I let the host sit long enough, it <em>might</em> find a couple of paths to the storage, but even then performance was degraded, as vmkernel.log showed:</p><pre>WARNING: ScsiDeviceIO: 1218: Device naa.60026b90003dcebb000003ca4af95792 performance has deteriorated. I/O latency increased from average value of 5619 microseconds to 495292 microseconds.</pre><p><strong><span
style="text-decoration: underline;">Troubleshooting</span></strong></p><p>I’m going to dump a whole bunch of my troubleshooting steps on you – hopefully they help not only folks dealing with this particular bug, but also anyone doing general network and configuration troubleshooting in VMware vSphere. During troubleshooting, I removed the bindings for the two vmknics on the iSCSI adapter, removed the software iSCSI adapter itself, removed the vmknics from the vSwitch, and removed the vSwitch itself. I then recreated the vSwitch, set the vSwitch MTU to 9000, recreated two vmk ports, set their MTU to 9000, assigned IP addresses, and set the failover order for multipath iSCSI. I then re-created the software iSCSI adapter and bound the two vmk ports. I was able to pass vmk traffic and mount iSCSI LUNs. Great – problem solved!?!?! Not so much &#8211; I rebooted the host and the problem returned.</p><p>Here are my next troubleshooting steps:</p><ul><li>I repeated the procedure above and regained connectivity, but the problem returned on subsequent reboots. I can reliably recreate the problem.</li><li>I verified end-to-end connectivity for other hosts on the same Layer 1, Layer 2, and Layer 3 network as the iSCSI initiator and iSCSI targets.</li><li>I verified the ESXi host’s networking configuration using the vSphere client, double-checking the vSwitch, vmnic uplinks, and vmknic configurations. Everything looked good, so I clicked Cancel to close the dialog.</li><li>I then reinstalled ESXi from scratch (in case something left over from 4.1 was the culprit), built up the same configuration, and was again able to re-create the problem.</li><li>I pored over the logs (primarily vmkernel.log, syslog.log, and storagerm.log).
I could see an intermittent loss of storage connectivity, failures to log into the storage targets (duh – there is no connectivity, no vmkping), and high storage latency on hosts where I had rebuilt the iSCSI stack and run a few VMs.</li><li>I swapped the Broadcom NIC for an Intel NIC (the Broadcom had hardware iSCSI capabilities, and I wanted to be sure hardware iSCSI was not interfering).</li><li>I verified that TOE (TCP Offload Engine) was enabled.</li></ul><p><strong><span
style="text-decoration: underline;">The ‘Ah-Ha’ Moment</span></strong></p><p>Next, I verified the ESXi host’s networking configuration using the vSphere client one more time: the properties of the vSwitch, the properties of the vmkernel (vmk) ports, the manual NIC teaming overrides, IP addressing, etc. Everything looked correct, and I MADE NO CHANGES, but this time I clicked <strong><span
style="text-decoration: underline;">OK</span></strong> (the last time I had clicked Cancel) to close the vSwitch properties and was greeted with this warning:</p><p><a
href="http://cloudfront.vmtoday.com/wp-content/uploads/2012/02/changing-an-iscsi-initiator-port-group-warning.jpg" rel="lightbox[854]"><img
class="aligncenter size-full wp-image-855" title="changing an iscsi initiator port group warning" src="http://cloudfront.vmtoday.com/wp-content/uploads/2012/02/changing-an-iscsi-initiator-port-group-warning.jpg" alt="changing an iscsi initiator port group warning" width="480" height="214" /></a></p><p>Wait a second… I didn’t change anything, so why am I being warned that I’m ‘Changing an iSCSI Initiator Port Group’? I like to live dangerously and wanted to see what would happen, so I clicked ‘Yes’.</p><p>Much to my surprise, after only viewing and closing the vSwitch and iSCSI vmk port group settings, I was able to complete a vmkping on the iSCSI-bound vmks. Moreover, I completed a rescan of all storage adapters, and my iSCSI LUNs were found, mounted, and ready for use. Problem solved? Nope. The same ugly issue reappeared after a reboot.</p><p>While the problem wasn’t solved, I now had something to work with. My go-to troubleshooting question, “What changed?”, might finally have an answer: even though I didn’t change anything in the vSwitch Properties GUI, something changed. To see what changed in the background, I compared the output of the following ESXi Shell (or vCLI, or PowerCLI) commands before and after triggering ‘the change’ (by viewing the properties of the vSwitch/vmk ports), but found no differences:</p><ul><li>esxcfg-vswitch -l</li><li>esxcfg-vmknic -l</li><li>esxcfg-nics -l</li></ul><p>Next, I made a backup copy of esx.conf:</p><pre>cp /etc/vmware/esx.conf /etc/vmware/esx.conf.bak</pre><p>Then I triggered ‘the change’ and compared checksums using md5sum, but found no differences:</p><pre>md5sum /etc/vmware/esx.conf /etc/vmware/esx.conf.bak</pre><p>A diff of the running esx.conf against the backup also found no differences:</p><pre>diff /etc/vmware/esx.conf /etc/vmware/esx.conf.bak</pre><p><strong><span
style="text-decoration: underline;">Call in Air Support</span></strong><br
/> At this point, I was out of ideas, so I called for help: “Hello, 1-866-4VMWARE, option 4, option 2 – help!”</p><p>After repeating many of the same troubleshooting steps, the support engineer concluded that I had hit a known, not-yet-patched bug. The details of the bug are documented in <a
title="Incorrect NIC failback occurs when an unused uplink is present" href="http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&amp;cmd=displayKC&amp;externalId=2008144" target="_blank">KB 2008144: Incorrect NIC failback occurs when an unused uplink is present</a>. That’s right – my iSCSI traffic, vmkpings, etc. were being sent down the wrong NIC: the <em>UNUSED</em> NIC. Ouch. The bug made the networking stack behave very unpredictably, which rendered my troubleshooting steps, and any more advanced troubleshooting ideas I had (sniffing, logs, etc.), next to useless.</p><p>Once I knew what the issue was, I could see a bit of evidence in the logs:</p><pre>WARNING: VMW_SATP_LSI: satp_lsi_pathIsUsingPreferredController:714:Failed to get volume access control data for path "vmhba33:C0:T0:L4": No connection

NMP: nmp_DeviceUpdatePathStates:547: Activated path "<span style="color: #ff0000;">NULL</span>" for NMP device "naa.60026b90003dcebb0000c7454d5cc946".

WARNING: ScsiPath: 3576: Path vmhba33:C0:T0:L4 is being removed</pre><p>Notice the <span
style="color: #ff0000;">NULL</span> path: the path can’t be resolved correctly when traffic is sent down the wrong (unused) vmnic, which is on a different subnet and VLAN. The gotcha with this issue is that I had followed best practices where applicable and accepted the default settings on the vSwitch and vmknics.</p><p><strong><span
style="text-decoration: underline;">The Quick Fix</span></strong><br
/> <a
title="Incorrect NIC failback occurs when an unused uplink is present" href="http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&amp;cmd=displayKC&amp;externalId=2008144" target="_blank">VMware KB 2008144</a> offers two workarounds for this bug. The quick fix is to change the Failback setting to “<strong>No</strong>” (the default is Yes), either on the vSwitch running the software iSCSI vmknics or, if you have other port groups on that vSwitch (such as a VM Network port group that gives your guest VMs access to the iSCSI network), on the iSCSI vmknic port groups themselves.</p><p><a
href="http://cloudfront.vmtoday.com/wp-content/uploads/2012/02/failback-No.jpg" rel="lightbox[854]"><img
class="aligncenter size-full wp-image-859" title="Change vSwitch or Portgroup Failback" src="http://cloudfront.vmtoday.com/wp-content/uploads/2012/02/failback-No.jpg" alt="Change vSwitch or Portgroup Failback" width="536" height="663" /></a></p><p>Changing Failback to No on the iSCSI vmknics and then rescanning the storage adapters fixed the glitch immediately.</p><p><strong><span
style="text-decoration: underline;">Architecture Changes</span></strong><br
/> The second workaround from VMware is “Do not have any unused NICs present in the team.” This translates to a slightly different architecture than the one described in many documents. To implement this workaround, the configuration has to change to two vSwitches, each with a single vmnic uplink and a single vmk port, with both vmk ports bound to the software iSCSI adapter. This change does not reduce redundancy or availability compared with the single-vSwitch, two-vmk configuration I was running, because each vmk port already had one of the two vmnics set to unused. This workaround does add a bit of complexity, as there are a few more elements to configure, monitor, manage, and document.</p><p>This problem seems to present itself only on vSphere Standard Switches (vSwitch), although I could not get confirmation of this (please post a comment if you know!). Assuming this is true, a vSphere Distributed Switch (vDS) could be used for software iSCSI traffic. Mike Foley has a write-up on how to migrate iSCSI from a vSwitch to a vDS on his blog here: <a
title="Dr. iSCSI or How I learned to stop worrying and love virtual distributed switches on vSphere V5" href="http://www.yelof.com/?p=72" target="_blank">http://www.yelof.com/?p=72</a>.</p><p><strong><span
style="text-decoration: underline;">A Couple More Notes</span></strong><br
/> Viewing the vSwitch settings and clicking OK temporarily resolved the issue because it triggered an up/down event on the vmk’s unused uplink. This caused the network stack to re-evaluate paths and start using the correct (Active) uplink.</p><p>Note that this problem can occur outside of my iSCSI use case: any vSwitch, port group, or vmknic with an unused adapter in its NIC Teaming Failover Order is susceptible to this bug. Watch for it on redundant vMotion networks (vMotion randomly fails), VM networks (sudden loss of guest connectivity), or even your management network (hosts become unmanageable from vCenter and can’t be contacted via SSH, the vSphere client, etc.).<br
/> Leave a comment if you’ve experienced this bug – your notes on the problem may help others find and fix the issue until VMware releases a fix. I understand that a fix for this particular bug is not due out until at least vSphere 5 Update 1.</p><p>I&#8217;ll have another (shorter) writeup on the 2nd networking bug I found in ESXi 5 later in the week &#8211; check back here for a link once it is published.</p> ]]></content:encoded> <wfw:commentRss>http://vmtoday.com/2012/02/vsphere-5-networking-bug-affects-software-iscsi/feed/</wfw:commentRss> <slash:comments>5</slash:comments> </item> <item><title>vCenter Crashes After Applying ESXi Patch ESXi410-201010401-SG</title><link>http://vmtoday.com/2010/12/vcenter-crashes-after-applying-esxi-patch-esxi410-201010401-sg/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=vcenter-crashes-after-applying-esxi-patch-esxi410-201010401-sg</link> <comments>http://vmtoday.com/2010/12/vcenter-crashes-after-applying-esxi-patch-esxi410-201010401-sg/#comments</comments> <pubDate>Fri, 10 Dec 2010 18:19:32 +0000</pubDate> <dc:creator>Joshua Townsend</dc:creator> <category><![CDATA[Issues & Troubleshooting]]></category> <category><![CDATA[VMware]]></category> <category><![CDATA[VMware How To]]></category> <category><![CDATA[bug]]></category> <category><![CDATA[DPM]]></category> <category><![CDATA[DRS]]></category> <category><![CDATA[HA]]></category> <category><![CDATA[Patch]]></category> <category><![CDATA[vcenter]]></category> <category><![CDATA[vsphere]]></category><guid
isPermaLink="false">http://vmtoday.com/?p=639</guid> <description><![CDATA[My last post described a problem I experienced with VMware HA after upgrading to vSphere 4.1.  Here is my experience with a similar issue after applying the ESXi410-201010401-SG patch to one of my test/dev ESXi clusters.  The patch, released on November 15th and weighing in at a hefty 212MB, fixes a number of issues from [...]]]></description> <content:encoded><![CDATA[<p>My <a
title="HA Errors after vSphere 4.1 Upgrade" href="http://vmtoday.com/2010/12/ha-errors-after-vsphere-4-1-upgrade/" target="_blank">last post</a> described a problem I experienced with VMware HA after upgrading to vSphere 4.1.  Here is my experience with a similar issue after applying the <a
title="VMware ESXi 4.1 Patch ESXi410-201010401-SG: Updates Firmware" href="http://kb.vmware.com/kb/1027021" target="_blank">ESXi410-201010401-SG</a> patch to one of my test/dev ESXi clusters.  The patch, released on November 15th and weighing in at a hefty 212MB, fixes a number of issues, ranging from Likewise authentication on ESXi hosts to configurable NOOP timeout and interval values for faster failover of certain iSCSI arrays (<a
title="IBM DS3300 iSCSI Write Performance Solved" href="http://vmtoday.com/2009/06/ibm-ds3300-iscsi-write-performance-solved/" target="_blank">like the DS3300 or MD3000i</a>).</p><p>The environment where this problem occurred has a single vCenter server managing both a production cluster and the test/dev cluster.  After applying this particular update to the ESXi hosts in the test/dev cluster, the vCenter server began to crash every 5 minutes or so.  The crash was logged on the vCenter server with Event ID 7031: The VMware VirtualCenter Server service terminated unexpectedly.  My go-to troubleshooting question (&#8220;What changed?&#8221;) pointed at the ESXi patch, but a VMware KB search and a little <a
title="why vcenter no worky?" href="http://lmgtfy.com/?q=why+vcenter+no+worky%3F" target="_blank">Google</a> action yielded no results directly related to ESXi410-201010401-SG and the vCenter Server service terminating unexpectedly.  <a
title="Troubleshooting the VMware VirtualCenter Server service when it does not start or fails" href="http://kb.vmware.com/kb/1003926" target="_blank">VMware KB article 1003926</a> provides some basic troubleshooting steps for vCenter Server, such as checking for port conflicts, vCenter DB health &amp; availability, and log locations.  The environment was healthy until the patch was applied to a subset of my ESXi hosts, so I could confidently eliminate credentials, port conflicts, and the like as the cause of the problem and jump right to the vCenter log files.  The vpxd-*.log files are found in &#8220;C:\ProgramData\VMware\VMware VirtualCenter\Logs&#8221; on Windows 2008 vCenter servers and in &#8220;%ALLUSERSPROFILE%\Application Data\VMware\VMware VirtualCenter\Logs&#8221; on Windows 2003 servers.  I found a few lines of interest in the log files but decided I had better call VMware Support to analyze the issue further.</p><p>To make a long story short, the logs revealed a bug that was triggered whenever VMware Distributed Resource Scheduler (DRS) ran on the updated test/dev cluster.  Disabling DRS stopped the symptom of the vCenter Server service terminating unexpectedly, but this was obviously not a long-term solution.  A bit more digging by my VMware support rep identified VMware Distributed Power Management (DPM), which was enabled on the cluster, as the root cause of the issue.  Disabling DPM but leaving DRS enabled on the cluster fixed the glitch.  I can live without DPM, but DRS is pretty darn handy.</p><p>At this point, VMware engineering knows about the issue, and a fix is planned for vCenter 4.1 Update 1.  It is interesting that DPM was implicated in this case, as well as in <a
title="HA Errors after vSphere 4.1 Upgrade" href="http://vmtoday.com/2010/12/ha-errors-after-vsphere-4-1-upgrade/" target="_blank">the case I wrote about last week</a>, where HA and DPM apparently do not always play well together.  It seems like DPM is not fully baked, even though it is now officially supported.  This is unfortunate, as I find DPM promising &#8211; I can imagine the technology behind DPM being used for intelligent load shedding during peak electrical cost hours, power outages, or cooling outages in datacenters with some good integration between a DPM API and environmental management and monitoring systems like APC&#8217;s NetBotz.  Is anyone else using DPM without problems?  Any ideas for extending DPM or leveraging it for other purposes in the datacenter?  I&#8217;d love to hear them in the comments.</p> ]]></content:encoded> <wfw:commentRss>http://vmtoday.com/2010/12/vcenter-crashes-after-applying-esxi-patch-esxi410-201010401-sg/feed/</wfw:commentRss> <slash:comments>1</slash:comments> </item> <item><title>HA Errors after vSphere 4.1 Upgrade</title><link>http://vmtoday.com/2010/12/ha-errors-after-vsphere-4-1-upgrade/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=ha-errors-after-vsphere-4-1-upgrade</link> <comments>http://vmtoday.com/2010/12/ha-errors-after-vsphere-4-1-upgrade/#comments</comments> <pubDate>Tue, 07 Dec 2010 18:11:32 +0000</pubDate> <dc:creator>Joshua Townsend</dc:creator> <category><![CDATA[Issues & Troubleshooting]]></category> <category><![CDATA[VMware]]></category> <category><![CDATA[VMware How To]]></category> <category><![CDATA[DPM]]></category> <category><![CDATA[HA]]></category> <category><![CDATA[High Availability]]></category> <category><![CDATA[troubleshooting]]></category> <category><![CDATA[vsphere]]></category><guid
isPermaLink="false">http://vmtoday.com/?p=632</guid> <description><![CDATA[Troubleshooting &#038; fixing VMware High Availability (HA) error 'Error &#60;date&#62; &#60;time&#62; HA agent on &#60;host&#62; in cluster &#60;clustername&#62; in &#60;datacenter&#62; has an error: Error while running health check script' on a vSphere 4.1 cluster.]]></description> <content:encoded><![CDATA[<p>I recently ran into an issue with one of my vSphere clusters after upgrading from vSphere 4.0 to vSphere 4.1 (with ESXi 4.1 and vCenter 4.1).  After the upgrade, I attempted to enable VMware High Availability (HA) on the upgraded cluster.  Each of the ESXi hosts in the cluster appeared to have been properly configured for HA (as observed in the &#8216;Recent Tasks&#8217; pane of the vSphere Client).  Even though HA appeared to be configured correctly, each host in the cluster displayed an error on the Summary tab of the vSphere Client that read &#8216;Error &lt;date&gt; &lt;time&gt; HA agent on &lt;host&gt; in cluster &lt;clustername&gt; in &lt;datacenter&gt; has an error: Error while running health check script&#8217;.</p><p>I&#8217;ve dealt with HA errors in the past, so I quickly jumped into my standard troubleshooting and quick-fix procedure:</p><ol><li>Verify host connectivity.</li><li>Right-click on each host and choose &#8216;Reconfigure for VMware HA&#8217;.</li><li>Disable &amp; re-enable HA on the cluster.</li><li>Disable HA, place hosts into Maintenance Mode &amp; reboot (one at a time).  Re-enable HA.</li><li>Get frustrated that a quick fix is probably not in my future&#8230;</li><li>Verify host name resolution for each host in the cluster from the service console/tech support mode of each host.</li><li>Review log files on vCenter Server and each host for glaring issues.  All Greek to me in this case&#8230;</li><li>Call VMware Support.</li></ol><p>VMware Support reviewed the log files I had attached to my Service Request (SR) when I opened the case and had me try a few different things to fix the issue.  First, we verified the steps I had taken and collected some fresh logs.  Next, the support rep had me verify that Distributed Power Management (DPM) was not enabled on the cluster, as there is apparently a known issue (although no KB is available at this time) with configuring HA when DPM is enabled under certain circumstances.  I did not have DPM enabled on this particular cluster, so I didn&#8217;t spend time chasing down that bug.</p><p>Finally, the following procedure, run on each ESXi server in the cluster, resolved the issue (note &#8211; this procedure is safe to do during normal operations, as it does not affect running VMs):</p><ol><li>Verify SSH or console access to the host.  On ESXi hosts this requires enabling Remote SSH/Tech Support Mode, either from the Configuration tab | Security Profile node of the vSphere Client or by pressing F2 to log in to the ESXi 4.1 console | Troubleshooting Options | enable remote SSH.</li><li>Disable HA on the affected cluster.</li><li>Right-click | Disconnect each host in the cluster from the &#8216;Hosts &amp; Clusters&#8217; view of the vSphere Client.</li><li>SSH to the host and run the following commands:<pre>services.sh stop
/opt/vmware/uninstallers/VMware-vpxa-uninstall.sh
/opt/vmware/uninstallers/VMware-aam-ha-uninstall.sh
services.sh start</pre></li><li>In the vSphere Client, right-click on each host and choose Connect.</li><li>Enable HA on the cluster.</li></ol><p>This procedure cleanly removes the VMware vCenter agent and the VMware HA agent from the ESX or ESXi host.  Reconnecting the host to vCenter pushes the vCenter management agent back to the host and installs it cleanly.  Enabling HA on the cluster re-installs the HA agent.  After completing these steps I had no further issues with HA on the cluster &#8211; case closed.  I hope this is helpful for anyone else who might be experiencing HA errors after upgrading to vSphere 4.1.</p><p>For those wanting to learn HA best practices or go a bit deeper into the inner workings of VMware HA, I highly recommend Duncan Epping&#8217;s <a
title="HA Deepdive" href="http://www.yellow-bricks.com/vmware-high-availability-deepdiv/" target="_blank">VMware HA Deep Dive article</a> and/or <a
title="VMware vSphere 4.1 HA and DRS Technical deepdive (Volume 1)" href="http://www.amazon.com/gp/product/1456301446?ie=UTF8&amp;tag=vm09-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=1456301446 " target="_blank">VMware vSphere 4.1 HA and DRS Technical Deepdive (Volume 1) book</a>.</p> ]]></content:encoded> <wfw:commentRss>http://vmtoday.com/2010/12/ha-errors-after-vsphere-4-1-upgrade/feed/</wfw:commentRss> <slash:comments>2</slash:comments> </item> <item><title>High CPU Ready, Poor Performance</title><link>http://vmtoday.com/2010/08/high-cpu-ready-poor-performance/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=high-cpu-ready-poor-performance</link> <comments>http://vmtoday.com/2010/08/high-cpu-ready-poor-performance/#comments</comments> <pubDate>Wed, 25 Aug 2010 19:52:07 +0000</pubDate> <dc:creator>Joshua Townsend</dc:creator> <category><![CDATA[Issues & Troubleshooting]]></category> <category><![CDATA[VMware]]></category> <category><![CDATA[VMware How To]]></category> <category><![CDATA[best practices]]></category> <category><![CDATA[cpu ready]]></category> <category><![CDATA[esxtop]]></category> <category><![CDATA[performance]]></category> <category><![CDATA[troubleshooting]]></category> <category><![CDATA[vsphere]]></category><guid
isPermaLink="false">http://vmtoday.com/?p=566</guid> <description><![CDATA[I ran into an issue with a customer today where a VM was performing terribly.  From within the guest OS (a Windows 2003 application server running .NET in IIS which I will call BigBadServer) things appeared sluggish and CPU time was high.  The amount of time being spent on the kernel was notably high.  The [...]]]></description> <content:encoded><![CDATA[<p>I ran into an issue with a customer today where a VM was performing terribly.  From within the guest OS (a Windows 2003 application server running .NET in IIS, which I will call BigBadServer), things appeared sluggish and CPU time was high, with a notably high share of time spent in kernel mode.  The VM in question had 4 vCPUs and a good helping of memory.</p><p><a
href="http://cloudfront.vmtoday.com/wp-content/uploads/2010/08/highkerneltime.png" rel="lightbox[566]"><img
class="aligncenter size-medium wp-image-589" title="high kernel time" src="http://cloudfront.vmtoday.com/wp-content/uploads/2010/08/highkerneltime-220x300.png" alt="high kernel time in perfmon" width="220" height="300" /></a></p><p>I don’t have access to the vSphere Client at this particular site, just to some of the guests, so I was flying blind.  My gut told me that I was dealing with a resource contention issue.  The VMstats provider I had running in the guest (<a
href="http://vpivot.com/2009/09/17/using-perfmon-for-accurate-esx-performance-counters/">http://vpivot.com/2009/09/17/using-perfmon-for-accurate-esx-performance-counters/</a>) showed me that there was no ballooning or swapping going on, that the vCPUs were not limited, and that the CPU share value seemed to be at the default.</p><p>I strongly suspected that the physical server running VMware ESX was oversubscribed on physical CPU (pCPU) resources.  Essentially, the guest VMs sharing the resources of the physical machine were demanding more resources than the machine could deliver.  To verify this theory, I had the client check the ‘CPU Ready’ metric on BigBadServer and bingo!</p><p>CPU Ready is a measure of the amount of time that a guest VM is ready to run on a pCPU but the VMware CPU Scheduler cannot find time to run it, because other VMs are competing for the same resources.</p><p>From the stats the customer provided on our phone call, the CPU Ready for any one of the 4 vCPUs on BigBadServer averaged 3723ms (min: 1269ms, max: 8491ms).  (Update 8/25/2010 to clarify summation stat) The summation for the entire VM was around 12,000ms on average and peaked around 35,000ms.  The stats came from the real-time performance graph/table in the vSphere Client.  The real-time stats in the vSphere Client update every 20 seconds, so the CPU Ready summation value should be divided by 20,000ms (and multiplied by 100) to get a percentage of CPU Ready for the 20-second time slice.  If I take the worst-case value of 8491ms per vCPU, that vCPU spent nearly 43% (8491/20,000 = 0.42455) of the 20-second time slice waiting for CPU resources.</p><p>The CPU Ready summation (milliseconds) counter in the vSphere Client is not always the most accurate or easiest stat to interpret; to better quantify the problem, it might be best to go to the ESX command line and run ESXTOP.  CPU Ready over 5% could be a sign of trouble; over 10% indicates a problem.  
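</p><p>To make the arithmetic above repeatable, here is a minimal shell sketch (my own illustration, not a VMware tool) that converts a CPU Ready summation in milliseconds into a percentage of the sample interval and flags it against the 5%/10% rules of thumb. It assumes the 20-second real-time chart interval. The function name <code>cpu_ready_pct</code> and its optional vCPU-count argument, which averages a VM-level summation evenly across the VM’s vCPUs, are hypothetical additions for this example. It needs only a POSIX shell and <code>awk</code>.</p><pre>#!/bin/sh
# Hypothetical helper (not a VMware tool): convert a vSphere CPU Ready
# summation in milliseconds into a percentage of the sample interval.
# Assumes the 20-second (20,000 ms) real-time chart interval.
# Usage: cpu_ready_pct READY_MS [VCPUS]
#   VCPUS defaults to 1 (a per-vCPU value). Passing a vCPU count averages
#   a VM-level summation evenly across the VM's vCPUs (an assumption).
cpu_ready_pct() {
    awk -v ready="$1" -v vcpus="${2:-1}" -v interval=20000 'BEGIN {
        pct = ready / (interval * vcpus) * 100
        # Rules of thumb from the text: over 5% could be trouble,
        # over 10% is a problem.
        status = "ok"
        if (pct > 5)  status = "watch"
        if (pct > 10) status = "problem"
        printf "%.1f%% %s\n", pct, status
    }'
}

cpu_ready_pct 8491       # worst-case single vCPU from the stats above
cpu_ready_pct 12000 4    # VM-level average spread across 4 vCPUs</pre><p>For the worst-case vCPU above, this prints <code>42.5% problem</code>, matching the &#8220;nearly 43%&#8221; figure; the VM-level average of 12,000ms spread across 4 vCPUs works out to <code>15.0% problem</code>.</p><p>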
Running ESXTOP in batch mode and then analyzing the output in Windows Perfmon or Excel would be a good way to get a view over several hours rather than the real-time stats we were looking at.  I wrote a post a while back with more info on ESXTOP batch mode: <a
href="http://vmtoday.com/2009/09/esxtop-batch-mode-windows-perfmon/">http://vmtoday.com/2009/09/esxtop-batch-mode-windows-perfmon/</a></p><p>To help quantify the problem a bit more: BigBadServer runs on an ESX 4.0 host alongside about 10 other VM’s.  The physical blade has two dual-core CPU’s (AMD Opteron 2218HE’s, which are not hyperthreaded).  The other VM’s on the blade have different vCPU and vMemory configurations.  Three VM’s (including BigBadServer) have 4 vCPU’s, a couple have 2 vCPU’s, and the remainder are configured with 1 vCPU.  In ESX 4.x, the VMware console OS actually runs as a hidden VM, pegged to pCPU #1.</p><p>I generally recommend a pCPU:vCPU ratio of 1:4 for mid-sized VMware deployments of single-vCPU VM’s.  The blade we are running on is at roughly 1:5, with several multi-vCPU VM’s.  The multi-vCPU VM’s skew the ratio recommendation and require some more advanced design decisions.  VMware’s scheduler requires that all the vCPU’s of a VM run concurrently (even if the guest OS is only trying to execute a single thread).  Also, the VMware CPU Scheduler prefers to have all the vCPU’s from a VM run on the same pCPU.  As workloads are bounced around between pCPU’s, the benefits of the CPU cache are lost.  This is one of those ‘<a
title="Balloon Driver Problems with SQL" href="http://vmtoday.com/2009/09/balloon-driver-problems-with-sql/">more-is-less</a>’ situations that you run into in virtualized environments.</p><p>What this CPU Scheduler nonsense means in this case is that the 4 vCPU’s on BigBadServer have to wait until all logical pCPU’s on the box are idle (including the one that runs ESX itself) before they can run.  If ESX can’t accomplish that (i.e., we are experiencing resource contention), it starts prioritizing the workloads it can schedule most easily.  It is much easier to schedule the smaller VM’s, so it tends to run those more frequently, and the larger VM’s suffer more than the smaller ones.  BigBadServer is competing with 2 other 4-vCPU VM’s that use up all of the logical pCPU’s when they run, as well as with the smaller VM’s.</p><p>I suggested a few ways to fix this issue for the BigBadServer web server:</p><ol><li>Use Shares and/or Reservations on the VM.  This probably won’t work in our situation, because the physical server is too oversubscribed.  We might see a slight improvement in BigBadServer (or no change at all), but possibly at the extreme expense of the other VM’s sharing the blade.</li><li>Reduce the number of vCPU’s on BigBadServer AND the other multi-vCPU VM’s on the same physical server.  This would reduce resource contention and open up a whole bunch of scheduling options for the VMware CPU Scheduler.  This is the quickest/cheapest fix, but it will not work if the VM’s really do need 4 vCPU’s.  A little workload analysis should determine which VM’s can be made smaller (the vCenter Server performance graphs/stats should be enough for this).  
For what it’s worth, by our analysis BigBadServer seems to be happier with 4 vCPU’s, assuming we can keep CPU Ready low on all four.</li><li>Move the BigBadServer VM to a physical ESX server with fewer multi-vCPU VM’s so there is less contention.</li><li>Move the BigBadServer VM to a physical ESX server with quad-core pCPU’s (ideally two quad-cores or bigger).  This would give the VMware CPU Scheduler a lot more flexibility and allow it to run quad-vCPU VM’s on the same pCPU for greater efficiency.</li><li>Split BigBadServer into 2 smaller VM’s.  The server currently runs a couple of sites, so we could split them onto two servers, one for Project1 and one for Project2.  This configuration would take some design, testing, and time, but could scale out better and give more flexibility and availability in the long run.</li></ol><p>I’m not sure which way the customer will go on this one yet, but I feel good having armed them with enough knowledge and options to make an informed decision.</p><p>To avoid problems like this in the future, I recommend these rules of thumb:</p><ul><li>Design your hosts for your guests.  Taking your guest VM sizes into account when designing your environment and choosing physical hardware is crucial if you need bigger VM’s.</li><li>Don’t make your VM’s bigger than you have to.  It is always easier to add resources than to take them away, and Hot Add of CPU and Memory in vSphere makes adding them incredibly easy.</li><li>Monitor your environment for CPU Ready, Swapping, and other metrics that can indicate an inefficient design.</li><li>Call for help when you can’t figure out what is going on (I’m happy to help!).  VMware is super powerful, but some things can be downright backwards when it comes to resource allocation on a fixed set of hardware.</li></ul><p>If you are looking for some resources to help explain CPU Scheduling a bit more, I recommend:</p><ul><li>VMware’s official documentation of the CPU Scheduler in vSphere 4.1 &#8211; <a
href="http://www.vmware.com/files/pdf/techpaper/VMW_vSphere41_cpu_schedule_ESX.pdf">http://www.vmware.com/files/pdf/techpaper/VMW_vSphere41_cpu_schedule_ESX.pdf</a>.</li><li>A nice summary of co-scheduling from VMware’s Performance Blog: <a
href="http://blogs.vmware.com/performance/2008/06/esx-scheduler-s.html">http://blogs.vmware.com/performance/2008/06/esx-scheduler-s.html</a></li><li>Description and stats on Ready Time metrics for VI3: <a
title="VMware Performance Study on Ready Time Observations" href="http://www.vmware.com/pdf/esx3_ready_time.pdf" target="_blank">http://www.vmware.com/pdf/esx3_ready_time.pdf</a></li><li>Understanding Virtual Center Performance Statistics: <a
title="Understanding Virtual Center Performance Statistics" href="http://communities.vmware.com/docs/DOC-5230.pdf" target="_blank">http://communities.vmware.com/docs/DOC-5230.pdf</a></li></ul><p>(Updated 8/25/2010 to include a few additional reference links and to correct the calculation, dividing the summation by the time slice to get accurate values.)</p> ]]></content:encoded> <wfw:commentRss>http://vmtoday.com/2010/08/high-cpu-ready-poor-performance/feed/</wfw:commentRss> <slash:comments>10</slash:comments> </item> <item><title>EMC Virtual Storage Integrator Update</title><link>http://vmtoday.com/2010/07/emc-virtual-storage-integrator-update/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=emc-virtual-storage-integrator-update</link> <comments>http://vmtoday.com/2010/07/emc-virtual-storage-integrator-update/#comments</comments> <pubDate>Tue, 06 Jul 2010 16:49:32 +0000</pubDate> <dc:creator>Joshua Townsend</dc:creator> <category><![CDATA[EMC]]></category> <category><![CDATA[Issues & Troubleshooting]]></category> <category><![CDATA[Storage]]></category> <category><![CDATA[VMware]]></category> <category><![CDATA[celerra]]></category> <category><![CDATA[clariion]]></category> <category><![CDATA[client]]></category> <category><![CDATA[plugin]]></category> <category><![CDATA[vsphere]]></category><guid
isPermaLink="false">http://vmtoday.com/?p=567</guid> <description><![CDATA[I upgraded my in-house VMware vSphere environment to 4.0 Update 2 last week.  After upgrading my vSphere Client to the Update 2 version, I was greeted with a series of 7 ugly error messages stating: Method not found: &#8216;VMware.CustomControls.LabelEx VpxClient.Common.Util.Helper.AddLabel(System.Windows.Forms.Control,Int32, int32, System.String, System.Drawing.FontStyle, Boolean)&#8217;. I assumed a plug-in had caused the error message.  I started my [...]]]></description> <content:encoded><![CDATA[<p></p><p>I upgraded my in-house VMware vSphere environment to 4.0 Update 2 last week.  After upgrading my vSphere Client to the Update 2 version, I was greeted with a series of 7 ugly error messages stating: Method not found: &#8216;VMware.CustomControls.LabelEx VpxClient.Common.Util.Helper.AddLabel(System.Windows.Forms.Control,Int32, int32, System.String, System.Drawing.FontStyle, Boolean)&#8217;.<a
href="http://cloudfront.vmtoday.com/wp-content/uploads/2010/07/emc-storage-viewer-vsphere-client-error.png" rel="lightbox[567]"><img
class="aligncenter size-medium wp-image-568" title="emc storage viewer vsphere client error" src="http://cloudfront.vmtoday.com/wp-content/uploads/2010/07/emc-storage-viewer-vsphere-client-error-300x107.png" alt="emc storage viewer vsphere client error" width="300" height="107" /></a>I assumed a plug-in had caused the error message.  I started my troubleshooting by disabling the 3rd-party plug-ins in the environment, beginning with the free EMC Storage Viewer.  Disabling the EMC Storage Viewer 2.x plug-in resolved the problem.  I went out to EMC PowerLink to see if an update was available for the plug-in and was surprised to find that I had missed a major update/rebranding of the plug-in.  EMC now calls the plug-in the &#8216;EMC Virtual Storage Integrator&#8217;.  A hotfix (version 3.0.0.32) was released on July 2nd to bring Update 2 support to the plug-in.</p><p>I updated the Solutions Enabler installation (I installed Solutions Enabler on my vCenter server, but it is also available as a SUSE-based virtual appliance), and then updated the plug-in.</p><p><a
href="http://cloudfront.vmtoday.com/wp-content/uploads/2010/07/EMC_VSI_30032.png" rel="lightbox[567]"><img
class="aligncenter size-medium wp-image-569" title="EMC_VSI_30032" src="http://cloudfront.vmtoday.com/wp-content/uploads/2010/07/EMC_VSI_30032-300x228.png" alt="EMC Virtual Storage Integrator" width="300" height="228" /></a>The update appeared to install without any problems.  The vSphere Client launched like a champ after the update &#8211; no errors, but no EMC Storage plug-in either.  Odd.  The problem occurred on both my vCenter Server&#8217;s vSphere Client and my workstation, so it seems to be more than an isolated issue.</p><p>I uninstalled the plug-in using Add/Remove Programs and then reinstalled it.  After the reinstall, the EMC Storage plug-in icon appeared in my vSphere Client as pictured below.</p><p><a
href="http://cloudfront.vmtoday.com/wp-content/uploads/2010/07/emc-storage-viewer-icon.png" rel="lightbox[567]"><img
class="aligncenter size-medium wp-image-570" title="emc storage viewer icon" src="http://cloudfront.vmtoday.com/wp-content/uploads/2010/07/emc-storage-viewer-icon-300x120.png" alt="emc storage viewer icon in vSphere Client" width="300" height="120" /></a></p><p>The EMC Virtual Storage Integrator (VSI) plug-in is free &#8211; installing it is a no-brainer for anyone running Clariion or Celerra storage arrays.  The VSI simplifies the job of mapping vSphere Datastores to LUN&#8217;s and NFS shares on your EMC storage, and helps pinpoint the location of VM&#8217;s and RDM&#8217;s on your array.  This visibility into the storage layer can go a long way toward helping the VMware administrator troubleshoot storage performance issues, and toward simplifying communication between server, storage, and virtualization teams.</p><p>EMC actually offers three different free vSphere plug-ins, including the VSI.  The EMC Unified Block plug-in and the EMC Unified NAS plug-in round out the trio.  EMC&#8217;s Virtual Geek, Chad Sakac, covers all three in his blog post here: <a
title="Update on EMC vSphere plugins…" href="http://virtualgeek.typepad.com/virtual_geek/2010/06/update-on-emc-vsphere-plugins.html" target="_blank">http://virtualgeek.typepad.com/virtual_geek/2010/06/update-on-emc-vsphere-plugins.html</a>.</p> ]]></content:encoded> <wfw:commentRss>http://vmtoday.com/2010/07/emc-virtual-storage-integrator-update/feed/</wfw:commentRss> <slash:comments>1</slash:comments> </item> <item><title>Update: SVGA Drivers on Windows 2008 R2 and Windows 7</title><link>http://vmtoday.com/2010/03/update-svga-drivers-on-windows-2008-r2-and-windows-7/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=update-svga-drivers-on-windows-2008-r2-and-windows-7</link> <comments>http://vmtoday.com/2010/03/update-svga-drivers-on-windows-2008-r2-and-windows-7/#comments</comments> <pubDate>Mon, 29 Mar 2010 00:23:43 +0000</pubDate> <dc:creator>Joshua Townsend</dc:creator> <category><![CDATA[Issues & Troubleshooting]]></category> <category><![CDATA[VMware]]></category> <category><![CDATA[VMware How To]]></category> <category><![CDATA[drivers]]></category> <category><![CDATA[ESX]]></category> <category><![CDATA[performance]]></category> <category><![CDATA[svga]]></category> <category><![CDATA[Update 1]]></category> <category><![CDATA[vmware tools]]></category><guid
isPermaLink="false">http://vmtoday.com/?p=439</guid> <description><![CDATA[I posted an article in December on how the SVGA driver included with VMware Tools caused the guest VM to freeze.  I referenced VMware&#8217;s KB Article 1011709, which directed you to not use the SVGA drivers included with VMware Tools.  KB1011709 has since been updated (as of February 25, 2010) to indicate that the VMware [...]]]></description> <content:encoded><![CDATA[<p></p><p>I posted an <a
title="Windows Server 2008 R2 &amp; Windows 7 Freeze When Using SVGA Drivers" href="http://vmtoday.com/2009/12/windows-2008-r2-svga-drivers/">article </a>in December on how the SVGA driver included with VMware Tools caused the guest VM to freeze.  I referenced VMware&#8217;s <a
title="Disabling SVGA drivers installed with VMware Tools on Windows 7 and Windows 2008 R2 running on ESX 4.0" href="http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&amp;docType=kc&amp;externalId=1011709&amp;sliceId=1&amp;docTypeID=DT_KB_1_1&amp;dialogID=55160139&amp;stateId=1%200%2055162014" target="_blank">KB Article 1011709</a>, which directed you not to use the SVGA drivers included with VMware Tools.  KB1011709 has since been updated (as of February 25, 2010) to indicate that the VMware Tools package included with ESX 4.0 Update 1 includes a new WDDM driver that is fully supported.  If you have updated to Update 1, you should upgrade VMware Tools to take advantage of the new driver.</p><p>If you followed KB1011709&#8217;s original advice and did a custom install of VMware Tools that left out the SVGA driver, you may have to reinstall VMware Tools before the new driver is available.  Once VMware Tools is upgraded, the new driver can be found in the guest VM at C:\Program Files\Common Files\VMware\Drivers\wddm_video.  These drivers are not automatically installed, so you&#8217;ll have to update your guest&#8217;s video adapter driver in Device Manager.</p><p>It&#8217;s a bummer that the WDDM SVGA drivers are not automatically installed.  You could probably copy these drivers to other VM&#8217;s and use Windows Device Manager to replace the standard driver with the newer WDDM driver, without having to uninstall, reboot, and reinstall VMware Tools on all of your VM&#8217;s.</p><p>Just as I was about to publish this, I saw a TweetDeck pop-up from <a
title="Jason Boche on Twitter" href="http://www.twitter.com/jasonboche">@jasonboche</a> saying that he had published pretty much the same update here:<a
title="Windows 2008 R2 and Windows 7 on vSphere" href="http://www.boche.net/blog/index.php/2010/03/28/windows-2008-r2-and-windows-7-on-vsphere/" target="_blank"> http://www.boche.net/blog/index.php/2010/03/28/windows-2008-r2-and-windows-7-on-vsphere/</a>.  Not only does he have pretty pictures to go with his post, but he also points out that VMware Tools installs/upgrades executed with VMware Update Manager (VUM) will not install the upgraded SVGA driver.  He also recommends updating templates to include the upgraded drivers.  Great points, Jason.</p> ]]></content:encoded> <wfw:commentRss>http://vmtoday.com/2010/03/update-svga-drivers-on-windows-2008-r2-and-windows-7/feed/</wfw:commentRss> <slash:comments>2</slash:comments> </item> <item><title>Installing PowerPath/VE using VMware Update Manager</title><link>http://vmtoday.com/2010/02/installing-powerpathve-using-vmware-update-manager/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=installing-powerpathve-using-vmware-update-manager</link> <comments>http://vmtoday.com/2010/02/installing-powerpathve-using-vmware-update-manager/#comments</comments> <pubDate>Fri, 05 Feb 2010 19:17:07 +0000</pubDate> <dc:creator>Joshua Townsend</dc:creator> <category><![CDATA[Documentation]]></category> <category><![CDATA[Issues & Troubleshooting]]></category> <category><![CDATA[Storage]]></category> <category><![CDATA[VMware]]></category> <category><![CDATA[VMware How To]]></category> <category><![CDATA[EMC]]></category> <category><![CDATA[esxi]]></category> <category><![CDATA[I/O]]></category> <category><![CDATA[multipathing]]></category> <category><![CDATA[performance]]></category> <category><![CDATA[powerpath]]></category> <category><![CDATA[vcenter]]></category> <category><![CDATA[vsphere]]></category><guid
isPermaLink="false">http://vmtoday.com/?p=368</guid> <description><![CDATA[I am finishing up an installation of an EMC Clariion CX4 SAN. One of the final steps of the installation is to configure PowerPath/VE on the ESXi hosts. PowerPath/VE is EMC&#8217;s multipathing extension module for VMware (and Hyper-V), designed to replace the Native Multipathing Plugin (NMP) for increased I/O performance and failover management.  To simplify [...]]]></description> <content:encoded><![CDATA[<p></p><p>I am finishing up an installation of an EMC Clariion CX4 SAN.  One of the final steps of the installation is to configure PowerPath/VE on the ESXi hosts. <a
title="PowerPath/VE" href="http://www.emc.com/products/detail/software/powerpath-ve.htm" target="_blank">PowerPath/VE</a> is EMC&#8217;s multipathing extension module for VMware (and Hyper-V), designed to replace the Native Multipathing Plugin (NMP) for increased I/O performance and failover management.  To simplify and automate the installation of PowerPath/VE, I decided to use VMware Update Manager (VUM) to push the extension to the ESXi 4.x hosts in the environment.</p><p>The process of setting up an additional VUM patch repository to host PowerPath/VE (and other 3rd-party extensions such as the Cisco Nexus 1000v) is pretty straightforward.  Third-party extensions are supported in VUM beginning with vSphere 4.0 Update 1.  <a
title="Chad Sakac - Virtual Geek blog" href="http://virtualgeek.typepad.com/virtual_geek/2009/11/vsphere-update-1-and-other-friday-goodies.html" target="_blank">Chad Sakac</a> has posted a great video guide on YouTube that covers the setup:</p><p
style="text-align: center;"><object
classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="344" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param
name="allowFullScreen" value="true" /><param
name="allowScriptAccess" value="always" /><param
name="src" value="http://www.youtube.com/v/V5dtxqSJCyQ&amp;color1=0xb1b1b1&amp;color2=0xcfcfcf&amp;hl=en_US&amp;feature=player_embedded&amp;fs=1" /><param
name="allowfullscreen" value="true" /><embed
type="application/x-shockwave-flash" width="425" height="344" src="http://www.youtube.com/v/V5dtxqSJCyQ&amp;color1=0xb1b1b1&amp;color2=0xcfcfcf&amp;hl=en_US&amp;feature=player_embedded&amp;fs=1" allowfullscreen="true" allowscriptaccess="always"></embed></object></p><p>I opted to use the Tomcat installation on the environment&#8217;s vCenter server to host the PowerPath/VE repository.  To accomplish this, I simply created a new directory in the Tomcat root directory.  The default path for the root directory on a vSphere vCenter Server is &#8220;C:\Program Files\VMware\Infrastructure\tomcat\webapps&#8221; (or &#8220;C:\Program Files (x86)\VMware\Infrastructure\tomcat\webapps&#8221; on a 64-bit installation).</p><p>I created a directory named &#8216;depot&#8217; and within that directory created a PowerPathVE folder.  Into the PowerPathVE folder, I extracted the contents of the VUM folder from the PowerPath .zip file that I downloaded from <a
title="EMC PowerLink" href="http://powerlink.emc.com" target="_blank">http://powerlink.emc.com</a>.  A screenshot of the directory is below:</p><div
id="attachment_371" class="wp-caption aligncenter" style="width: 579px"> <a
href="http://cloudfront.vmtoday.com/wp-content/uploads/2010/02/PPVEDepot.jpg" rel="lightbox[368]"><img
class="size-full wp-image-371 " title="PowerPath/VE Depot Folder" src="http://cloudfront.vmtoday.com/wp-content/uploads/2010/02/PPVEDepot.jpg" alt="PowerPath/VE Depot Directory Tree" width="579" height="455" /></a><p
class="wp-caption-text">PowerPath/VE Depot Directory Tree</p></div><p>After creating the directory for the patch repository, I simply added an Extension Repository to VMware Update Manager as Chad shows in his <a
href="http://www.youtube.com/watch?v=V5dtxqSJCyQ&amp;feature=player_embedded" target="_blank">video</a>.  I would like to call out one caveat: because vCenter&#8217;s Tomcat instance may not listen on the standard HTTP/HTTPS ports, I used https://vcenter.domain.local:8443/depot/PowerPathVE/index.xml as the path to the source.</p><div
id="attachment_373" class="wp-caption aligncenter" style="width: 524px"> <a
href="http://cloudfront.vmtoday.com/wp-content/uploads/2010/02/patchsource.jpg" rel="lightbox[368]"><img
class="size-full wp-image-373 " title="VUM Patch Source" src="http://cloudfront.vmtoday.com/wp-content/uploads/2010/02/patchsource.jpg" alt="VUM Patch Source" width="524" height="201" /></a><p
class="wp-caption-text">VUM Patch Source</p></div><p>Once PowerPath was added to an Extension Baseline in VUM, I simply had to scan my hosts for updates and remediate.  Installation of PowerPath/VE requires the host to be in Maintenance Mode and concludes with a reboot.  Pretty simple.</p><p>Then all you have to do is fight through an overly complex licensing setup (seriously, a 112-page <a
title="PowerPath/VE for VMware vSphere Licensing Guide" href="https://powerlink.emc.com/nsepn/webapps/btg548664833igtcuup4826/km/live1/en_US/Offering_Technical/Technical_Documentation/300-009-188.pdf" target="_blank">PDF</a> on how to install licenses???), a bit of configuration, and you are multipathing with the best of them.  If you are interested in learning more about PowerPath/VE, start with this whitepaper: <a
title="EMC PowerPath/VE for VMware vSphere Best Practices Planning" href="http://www.emc.com/collateral/software/white-papers/h6340-powerpath-ve-for-vmware-vsphere-wp.pdf" target="_blank">EMC PowerPath/VE for VMware vSphere Best Practices Planning</a>.  For a bit of real-world insight into the performance increase you might see with PowerPath/VE, check out this blog post from Eric Sloof: <a
rel="bookmark" href="http://www.ntpro.nl/blog/archives/1294-Massive-IO-power-increase-using-EMC-PowerPathVE.html">Massive I/O power increase using EMC PowerPath/VE</a>.</p><p>Update &#8211; 3/27/10: VMware published a Knowledge Base article on this procedure a few weeks after I wrote this post.  You can find it in article <a
title="Install PowerPath/VE for VMware vSphere by using vCenter Update Manager" href="http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&amp;docType=kc&amp;externalId=1018740&amp;sliceId=1&amp;docTypeID=DT_KB_1_1&amp;dialogID=76207021&amp;stateId=0%200%2076203931" target="_blank">1018740</a>.</p><p>Update &#8211; 4/15/11: When vCenter runs on Windows Server 2008 or 2008 R2, you may have to set the NTFS permissions on the &#8216;depot&#8217; folder to allow &#8216;anonymous&#8217; read access before you can validate and download from the new repository.</p> ]]></content:encoded> <wfw:commentRss>http://vmtoday.com/2010/02/installing-powerpathve-using-vmware-update-manager/feed/</wfw:commentRss> <slash:comments>5</slash:comments> </item> <item><title>Windows Server 2008 R2 &amp; Windows 7 Freeze When Using SVGA Drivers</title><link>http://vmtoday.com/2009/12/windows-2008-r2-svga-drivers/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=windows-2008-r2-svga-drivers</link> <comments>http://vmtoday.com/2009/12/windows-2008-r2-svga-drivers/#comments</comments> <pubDate>Mon, 21 Dec 2009 21:16:27 +0000</pubDate> <dc:creator>Joshua Townsend</dc:creator> <category><![CDATA[Issues & Troubleshooting]]></category> <category><![CDATA[Microsoft]]></category> <category><![CDATA[VMware]]></category> <category><![CDATA[VMware How To]]></category> <category><![CDATA[driver]]></category> <category><![CDATA[tools]]></category> <category><![CDATA[VM]]></category> <category><![CDATA[windows]]></category> <category><![CDATA[windows 2008 R2]]></category> <category><![CDATA[windows 7]]></category><guid
isPermaLink="false">http://vmtoday.com/?p=295</guid> <description><![CDATA[I recently ran into an issue when installing my first Windows Server 2008 R2 virtual machine.  The VM would hang/freeze randomly when used through the VMware vCenter Client&#8217;s console.  It turns out this is a known issue (see this VMware KB Article) with the SVGA driver that is installed as part of the default installation [...]]]></description> <content:encoded><![CDATA[<p></p><p>I recently ran into an issue when installing my first Windows Server 2008 R2 virtual machine.  The VM would hang/freeze randomly when used through the VMware vCenter Client&#8217;s console.  It turns out this is a known issue (see <a
title="Disable SVGA drivers installed with VMware Tools on Windows 7 and Windows 2008 R2" href="http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&amp;docType=kc&amp;externalId=1011709&amp;sliceId=1&amp;docTypeID=DT_KB_1_1&amp;dialogID=55160139&amp;stateId=1%200%2055162014" target="_blank">this VMware KB Article</a>) with the SVGA driver that is installed as part of the default installation of VMware Tools.  While the article does not explain why you should disable the SVGA driver, its advice is correct if you want to avoid problems in your guest VM.  To correct my problem, I removed the SVGA driver from the Windows Device Manager and rebooted.  If the VM hangs before you can remove the SVGA driver, use Remote Desktop to access the guest machine and perform the driver uninstall.  I have not observed hanging/freezing in the VM since removing the SVGA driver from my Windows 2008 R2 guest.  Note that this same issue is present in Windows 7.</p> ]]></content:encoded> <wfw:commentRss>http://vmtoday.com/2009/12/windows-2008-r2-svga-drivers/feed/</wfw:commentRss> <slash:comments>7</slash:comments> </item> <item><title>Upgrading Virtual Hardware in a VMware Virtual Machine May Cause Disks to go Offline</title><link>http://vmtoday.com/2009/11/upgrading-virtual-hardware-in-a-vmware-virtual-machine-may-cause-disks-to-go-offline/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=upgrading-virtual-hardware-in-a-vmware-virtual-machine-may-cause-disks-to-go-offline</link> <comments>http://vmtoday.com/2009/11/upgrading-virtual-hardware-in-a-vmware-virtual-machine-may-cause-disks-to-go-offline/#comments</comments> <pubDate>Mon, 23 Nov 2009 22:51:44 +0000</pubDate> <dc:creator>Joshua Townsend</dc:creator> <category><![CDATA[Issues & Troubleshooting]]></category> <category><![CDATA[VMware]]></category> <category><![CDATA[VMware How To]]></category> <category><![CDATA[upgrade]]></category> <category><![CDATA[virtual hardware]]></category> 
<category><![CDATA[virtualization]]></category> <category><![CDATA[vsphere]]></category><guid
isPermaLink="false">http://vmtoday.com/?p=273</guid> <description><![CDATA[I recently posted an article on how specific actions during the upgrade of a VMware Virtual Machine&#8217;s hardware from v4 to v7 can cause problems with certain services, including DNS, DHCP, and WINS. In that case, the problem was related to Microsoft Windows leaving non-present devices with networking configurations and  the failure of the VMware [...]]]></description> <content:encoded><![CDATA[<p></p><p>I recently posted an <a
title="vSphere Upgrade Breaks Active Directory" href="http://vmtoday.com/2009/11/vsphere-upgrade-breaks-active-directory/" target="_blank">article </a>on how specific actions during the upgrade of a VMware Virtual Machine&#8217;s hardware from v4 to v7 can cause problems with certain services, including DNS, DHCP, and WINS.  In that case, the problem was caused by Microsoft Windows leaving behind non-present devices that still held networking configurations, and by the failure of the VMware Upgrade Helper service to copy WINS settings when updating the NIC.  As my fellow blogger and VMUG leader, <a
href="http://boche.net/blog/" target="_blank">Jason Boche</a>, <a
href="http://twitter.com/jasonboche/" target="_blank">responded on Twitter</a>: &#8220;Same gotchas, different version.&#8221;  And right he is: anyone with experience in P2V or V2V, or who has been working with VMware long enough to have done a 2.5 to 3.0 upgrade, has experienced the same gotchas.</p><p>There are other issues with VMware virtual hardware upgrades, however, that you may not have experienced.  One such issue that I have experienced is highlighted in VMware Knowledge Base article <a
href="http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&amp;cmd=displayKC&amp;externalId=1013109" target="_blank">1013109</a>: <em>&#8220;Upgrading virtual hardware in ESX 4 may cause Windows 2008 disks to go offline&#8221;</em>.  The problems described in the article are limited to Windows 2008 Enterprise and Datacenter editions, and the title of the article describes them pretty well.  As with the ghost NIC&#8217;s I <a
title="vSphere Upgrade Breaks Active Directory" href="http://vmtoday.com/2009/11/vsphere-upgrade-breaks-active-directory/" target="_blank">described </a>last week, this is more of a Microsoft issue, but it will rear its head when a VMware Administrator least desires it.  With this particular problem, the Windows Virtual Disk Service (part of the native Storage Management suite) is set not to auto-mount newly discovered disks that reside on a shared bus.  Microsoft has an MSDN article on the VDS SAN policy <a
title="VDS_SAN_POLICY Enumeration" href="http://msdn.microsoft.com/en-us/library/bb525577%28VS.85%29.aspx" target="_blank">here</a>.  Upgrading the virtual hardware version causes the disks to be re-discovered and not auto-mounted.  This can potentially impact all non-system disks on a VM.</p><p>You may also experience similar issues when upgrading the vSCSI adapter in a VM from a standard LSI Logic Parallel SCSI adapter to a (new in vSphere 4.0) paravirtualized SCSI (<a
title="Configuring disks to use VMware Paravirtual SCSI (PVSCSI) adapters" href="http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&amp;cmd=displayKC&amp;externalId=1010398" target="_blank">pvSCSI</a>) adapter, when moving virtual disks to new vSCSI adapters to increase the number of concurrent disk I/O operations, or when changing the SCSI node ID of a virtual disk.  These may all trigger a re-discovery of the disks by the Windows Virtual Disk Service, leaving data disks offline on Windows 2008 Enterprise and Datacenter Edition guests.</p><p>In my opinion, these issues are not reasons to forgo upgrading your virtual hardware version.  However, when your upgrade/migration plans call for upgrading the virtual hardware version of your guests, you should be prepared to resolve any issues caused by &#8216;ghost hardware&#8217;, offline disks, and the like.  Both the MSDN and VMware articles I cited above offer workarounds for the offline disk issue.  Here are the links again:</p><ul><li><a href="http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&amp;cmd=displayKC&amp;externalId=1013109">http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&amp;cmd=displayKC&amp;externalId=1013109</a></li><li><a href="http://msdn.microsoft.com/en-us/library/bb525577%28VS.85%29.aspx">http://msdn.microsoft.com/en-us/library/bb525577%28VS.85%29.aspx</a></li></ul> ]]></content:encoded> <wfw:commentRss>http://vmtoday.com/2009/11/upgrading-virtual-hardware-in-a-vmware-virtual-machine-may-cause-disks-to-go-offline/feed/</wfw:commentRss> <slash:comments>3</slash:comments> </item> <item><title>vSphere Upgrade Breaks Active Directory</title><link>http://vmtoday.com/2009/11/vsphere-upgrade-breaks-active-directory/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=vsphere-upgrade-breaks-active-directory</link> <comments>http://vmtoday.com/2009/11/vsphere-upgrade-breaks-active-directory/#comments</comments> <pubDate>Wed, 18 Nov 2009 21:40:24 +0000</pubDate> <dc:creator>Joshua Townsend</dc:creator> <category><![CDATA[Issues & Troubleshooting]]></category> <category><![CDATA[Microsoft]]></category> <category><![CDATA[VMware]]></category> <category><![CDATA[active directory]]></category> <category><![CDATA[DHCP]]></category> <category><![CDATA[DNS]]></category> <category><![CDATA[NIC]]></category> <category><![CDATA[upgrade]]></category> <category><![CDATA[upgrade virtual hardware]]></category> <category><![CDATA[vsphere]]></category><guid
isPermaLink="false">http://vmtoday.com/?p=249</guid> <description><![CDATA[I recently completed a VMware VI 3.5 to vSphere upgrade in a small environment (5 hosts, 80 VMs).  Because the environment was small, the upgrade was planned as one big overnight blitz.  Unfortunately, the size of the environment did not afford a test environment for uncovering potential issues before the upgrade.  The upgrade to vSphere itself [...]]]></description> <content:encoded><![CDATA[<p></p><p>I recently completed a VMware VI 3.5 to vSphere upgrade in a small environment (5 hosts, 80 VMs).  Because the environment was small, the upgrade was planned as one big overnight blitz.  Unfortunately, the size of the environment did not afford a test environment for uncovering potential issues before the upgrade.  The upgrade to vSphere itself went swimmingly (the vCenter server had been upgraded a couple of weeks earlier).  However, some things in the environment started to go wonky once the upgrade was complete.  Specifically, name resolution (DNS), DHCP, WINS, Group Policy, and really anything related to Microsoft Active Directory just did not work.</p><p>Let me explain a bit about the environment so you can better understand what the problem was and how it was corrected.  The environment was an all-Microsoft shop, except for VMware of course.  The company follows a virtualize-first policy and is about 90% virtualized, including the Active Directory Domain Controllers.  The DCs run Windows 2008 and serve up DHCP, DNS, and WINS in addition to their Directory Services roles.</p><p>The problems really began after I upgraded the virtual hardware version from v4 to v7 (check out page 97 of the <a
href="http://www.vmware.com/pdf/vsphere4/r40/vsp_40_upgrade_guide.pdf">vSphere Upgrade Guide</a> for the upgrade procedure).  When a Windows server is upgraded from VMware Hardware Version 4 to 7, the VMware Upgrade Helper Service handles the reconfiguration of network adapters on the upgraded virtual machine.  The VMware Upgrade Helper Service is installed with VMware Tools; it is one of the reasons, along with getting drivers installed for the new hardware, to upgrade VMware Tools before upgrading the hardware version.  If you review the Event Viewer Application log on an upgraded machine, you will see several entries from the VMUpgradeHelper source with several different Event IDs (26, 280, 272, 108, &amp; 105).  Examining these events shows that the VMware Upgrade Helper service:</p><ol><li>backs up the network configuration at OS shutdown;</li><li>starts automatically with the OS;</li><li>checks the device ID of the network adapter;</li><li>if the device ID has changed (as a result of a hardware upgrade), restores the backed-up configuration and logs Event ID 269.</li></ol><p>This behavior should be transparent for most configurations, apart from a slightly longer boot time following the upgrade.  However, I did notice a few problems with how the NIC settings were restored under certain conditions.  First, on servers with a statically configured IPv4 stack, IP addresses and DNS server addresses were restored, but WINS server addresses were not.  I suspect this is an oversight in the VMware Upgrade Helper service, but it is probably not a major issue for many servers/environments, as WINS is infrequently used.  However, when a WINS server itself loses its configuration to use itself as a WINS server, <a
href="http://lmgtfy.com/?q=what+happens+when+a+WINS+server+doesn%27t+use+itself+as+a+WINS+server" target="_blank">bad things happen</a>.  There are several ways to correct this &#8211; scripts, DHCP Options, etc.  In the end, this wasn&#8217;t really a showstopper for me in this small environment.</p><p>The second, and bigger, issue for me was that after the virtual hardware was upgraded and the VMware Upgrade Helper Service did its job, my Active Directory and related services were not available.  DNS was not functioning, DHCP was not handing out addresses, and I couldn&#8217;t connect to AD using ADUC, GPMC or LDAP.  It took me a few minutes to figure out what was going on.  This seems to be what happened: the virtual hardware upgrade caused a new virtual network adapter to be installed in the VM and all of the settings, including the MAC address, to be restored.  The HW v4 NIC was removed from the machine, but Windows held onto the device as a &#8216;ghost NIC&#8217; in Device Manager.  The core AD services, including DNS and DHCP, were still attempting to bind to the ghost NIC.  This behavior persisted through service restarts and reboots of the guest.  It wasn&#8217;t until I examined the IP configuration on the new NIC and clicked Apply (instead of canceling out) that I was prompted with a message indicating that more than one network interface was configured with the same IP address, which cued me in to the solution.</p><p>The error message should be familiar to anyone who has performed a Physical-to-Virtual (P2V) migration, and the underlying problem is easily corrected by removing the old device through Windows Device Manager.  The device is hidden, so you first have to expose it before deleting it.  Check <a
href="http://support.microsoft.com/kb/315539" target="_blank">http://support.microsoft.com/kb/315539</a> for details, or simply follow my instructions below.  To expose the non-present NIC, open a command prompt and enter:</p><blockquote><p>set devmgr_show_nonpresent_devices=1</p></blockquote><p>Then open Device Manager by entering <em>devmgmt.msc</em> at that same command prompt.  A variable defined with <em>set</em> applies only to programs launched from that prompt, so Device Manager opened from the Start Menu will not see it.  In Device Manager, click View | Show Hidden Devices.  Expand Network Adapters and find the grayed-out entry for the old NIC, as pictured below.</p><p
style="text-align: center;"><a
href="http://cloudfront.vmtoday.com/wp-content/uploads/2009/11/GhostNIC1.JPG" rel="lightbox[249]"><img
class="size-full wp-image-262 aligncenter" title="GhostNIC" src="http://cloudfront.vmtoday.com/wp-content/uploads/2009/11/GhostNIC1.JPG" alt="GhostNIC" width="357" height="256" /></a></p><p
style="text-align: left;">Select the ghost NIC, then right-click | Uninstall to remove it.</p><p>The final gotcha for me was that the <em>set devmgr_show_nonpresent_devices=1</em> command does not work on Windows 2008 (or Vista, Windows 7, or Windows 2008 R2).  To see and remove ghost NICs on Windows 2008, you must instead define a system environment variable.  To set the variable, open Server Manager from the Windows Start Menu.  Highlight &#8216;Server Manager (%SERVERNAME%)&#8217; in the left-side tree-view pane.  Click &#8216;Change System Properties&#8217; in the right-hand pane.  Switch to the Advanced tab and click &#8216;Environment Variables&#8217;.  Create a new System variable by clicking the New button.  The variable name should be &#8216;devmgr_show_nonpresent_devices&#8217; and the value should be &#8216;1&#8217;, as pictured below.</p><p><a
href="http://cloudfront.vmtoday.com/wp-content/uploads/2009/11/EnvVariable.JPG" rel="lightbox[249]"><img
class="aligncenter size-full wp-image-263" title="EnvVariable" src="http://cloudfront.vmtoday.com/wp-content/uploads/2009/11/EnvVariable.JPG" alt="EnvVariable" width="349" height="139" /></a></p><p>Click OK to close any open windows.  A reboot is not necessary for the variable to take effect, although you may have to close all open Device Manager windows and then reopen devmgmt.msc.  Click View | Show Hidden Devices and remove the ghost NIC as described above.  After I removed the ghost NIC from the domain controllers and did a quick reboot, all Active Directory, DNS, DHCP, and WINS services immediately began operating normally.  In my opinion, this second issue is more of a Microsoft problem, and it has been around for some time.</p><p>Before you start getting all upset and the FUD starts flying (&#8220;is this Microsoft/VMware&#8217;s latest attempt to break VMware/Microsoft?&#8221;), it wasn&#8217;t really vSphere that broke Active Directory; it was me.  A little better planning, and not rushing through the last wee hours of the upgrade window, could have saved some trouble.  If you are planning a similar upgrade, it would be best to upgrade your domain controllers/DNS servers one at a time and remediate the issues I have described before upgrading the next.
This will ensure continued availability of your Active Directory and other critical services during your upgrade.</p> ]]></content:encoded> <wfw:commentRss>http://vmtoday.com/2009/11/vsphere-upgrade-breaks-active-directory/feed/</wfw:commentRss> <slash:comments>10</slash:comments> </item> <item><title>The Skinny on ESXTOP</title><link>http://vmtoday.com/2009/09/the-skinny-on-esxtop/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=the-skinny-on-esxtop</link> <comments>http://vmtoday.com/2009/09/the-skinny-on-esxtop/#comments</comments> <pubDate>Thu, 17 Sep 2009 22:39:01 +0000</pubDate> <dc:creator>Joshua Townsend</dc:creator> <category><![CDATA[Issues & Troubleshooting]]></category> <category><![CDATA[VMware]]></category> <category><![CDATA[VMware How To]]></category> <category><![CDATA[analysis]]></category> <category><![CDATA[analyze]]></category> <category><![CDATA[batch mode]]></category> <category><![CDATA[cpu]]></category> <category><![CDATA[disk]]></category> <category><![CDATA[ESX]]></category> <category><![CDATA[esxi]]></category> <category><![CDATA[esxtop]]></category> <category><![CDATA[memory]]></category> <category><![CDATA[network]]></category> <category><![CDATA[performances]]></category> <category><![CDATA[rCLI]]></category> <category><![CDATA[resxtop]]></category> <category><![CDATA[statistics]]></category> <category><![CDATA[vCLI]]></category> <category><![CDATA[vMA]]></category> <category><![CDATA[vsphere]]></category><guid
isPermaLink="false">http://vmtoday.com/?p=244</guid> <description><![CDATA[A reader named Mark contacted me today and asked if there was a way to reduce the size of the batch output from an ESXTOP run.  And he asks for good reason: depending on the number of VMs on your host, the delay between ESXTOP samples, and the number of samples you collect, using the [...]]]></description> <content:encoded><![CDATA[<p></p><p>A reader named Mark contacted me today and asked if there was a way to reduce the size of the batch output from an ESXTOP run.  And he asks for good reason: depending on the number of VMs on your host, the delay between ESXTOP samples, and the number of samples you collect, using the All Stats option (-a) can yield a massive file in a short period of time.  If the output is written to a partition on your ESX Service Console, you run the risk of filling the partition, and you can forget about actually analyzing the data in PERFMON or Excel.  For example, on an ESX host running ~15 VMs, I produced 100MB of CSV using the -a switch, sampling every 15 seconds for just under 2 hours.  ESXTOP uses a 5-second interval by default; I used <span
style="color: #993300;">-d 15</span> to change the sampling delay.  Had I gone with the default, my output would have been even bigger.</p><p>To reduce the size of your output, you can change your sampling delay to something larger, say 30 seconds.  I suppose you could also capture statistics when the host is not busy so you get fewer characters in the results, but that&#8217;s just being goofy. <img
src='http://cloudfront.vmtoday.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /></p><p>A better way to reduce your ESXTOP output size is to selectively include only the statistics you are interested in, which is really what Mark was asking about.  After all, all statistics from ESXTOP can be too many statistics, and chances are you already know which stats you care about.  Here&#8217;s how you can narrow down the collected stats for easier analysis and smaller output:</p><ol><li>Start ESXTOP in interactive mode on the Service Console by simply typing <span
style="color: #993300;">esxtop</span> at the # prompt.</li><li>Switch to a component you do NOT want to capture statistics for by pressing the corresponding key (<span
style="color: #993300;">c</span>: ESX cpu, <span
style="color: #993300;">m</span>: ESX memory, <span
style="color: #993300;">d</span>: ESX disk adapter, <span
style="color: #993300;">u</span>: ESX disk device, <span
style="color: #993300;">v</span>: ESX disk VM).</li><li>Press <span
style="color: #993300;">f</span> while viewing the component you do not want to capture.  A list of fields will be displayed.  You can toggle each field on and off by pressing the letter corresponding to it; an * indicates that the field is on.  Turn off all of the fields you don&#8217;t want to collect.</li><li>Repeat steps 2 &amp; 3 for the remaining components you don&#8217;t want, leaving only what you want to capture.</li><li>Switch to the component you want to capture in batch mode and repeat step #3, but this time enable the fields you want to capture.</li><li>Press <span
style="color: #993300;">W</span> (capital W &#8211; case sensitive) to write out the ESXTOP configuration file.  You can accept the default file name or enter a new one.  You may want to create a CPU-only config file, a memory-only config file, and so forth.</li><li>Press <span
style="color: #993300;">CTRL+C</span> to stop ESXTOP.</li><li>Now, invoke ESXTOP in batch mode, passing the updated or new configuration file you created in step #6 with the -c switch.  Here&#8217;s an example:<br /># <span style="color: #993300;">esxtop -b -d 30 -n 480 -c .esxtopcpustats &gt; /tmp/esxtop_cpu_stats.csv</span><br />where .esxtopcpustats is an ESXTOP config file with only CPU stats, -d sets your capture interval to 30 seconds, and -n sets the number of samples to 480 (480 samples &#215; 30 seconds = 4 hours).</li></ol><p>Once your capture is complete, you can copy the .csv to a Windows system and use PERFMON or Excel to analyze the stats.  (ESXTOP&#8217;s replay mode, -R, replays performance snapshots collected with vm-support rather than batch-mode CSV output.)  If you use PERFMON or Excel, you will notice that the system summary information displayed at the top of an interactive ESXTOP session is included in the output (console memory, console cpu, etc.).  As far as I know, there is no way to disable this, nor would you want to, as it includes the time stamp necessary to interpret your data.</p><p>It is possible to use the <a
title="vSphere CLI" href="http://communities.vmware.com/community/vmtn/vsphere/automationtools/vsphere_cli" target="_blank">vSphere CLI</a> or the <a
title="vSphere Management Assistant vMA" href="http://www.vmware.com/support/developer/vima/" target="_blank">vSphere Management Assistant (vMA)</a> to run RESXTOP, a version of ESXTOP designed for remote administration of ESXi or ESX.  Note, however, that RESXTOP in the vSphere CLI works only from a Linux client.  Either tool can help you automate ESXTOP statistics collection from multiple hosts using customized configuration files.</p> ]]></content:encoded> <wfw:commentRss>http://vmtoday.com/2009/09/the-skinny-on-esxtop/feed/</wfw:commentRss> <slash:comments>6</slash:comments> </item> <item><title>vCenter Database Stats Rollup Troubleshooting</title><link>http://vmtoday.com/2009/09/vcenter-database-stats-rollup-troubleshooting/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=vcenter-database-stats-rollup-troubleshooting</link> <comments>http://vmtoday.com/2009/09/vcenter-database-stats-rollup-troubleshooting/#comments</comments> <pubDate>Thu, 17 Sep 2009 14:33:40 +0000</pubDate> <dc:creator>Joshua Townsend</dc:creator> <category><![CDATA[Issues & Troubleshooting]]></category> <category><![CDATA[VMware]]></category> <category><![CDATA[VMware How To]]></category> <category><![CDATA[configuration]]></category> <category><![CDATA[database]]></category> <category><![CDATA[design]]></category> <category><![CDATA[performance]]></category> <category><![CDATA[sql]]></category> <category><![CDATA[statistics]]></category> <category><![CDATA[vcenter]]></category> <category><![CDATA[vi client]]></category> <category><![CDATA[viclient]]></category> <category><![CDATA[virtual center]]></category> <category><![CDATA[virtualcenter]]></category> <category><![CDATA[vsphere]]></category><guid
isPermaLink="false">http://vmtoday.com/?p=240</guid> <description><![CDATA[VMware vCenter collects performance statistics, tasks and events for historical performance analysis and auditing.  The collection level and retention of performance statistics can be controlled through the vCenter GUI (see Administration &#124; vCenter Server Settings &#124; Statistics).   The level of statistics collection and retention periods can have a dramatic impact on your vCenter Server&#8217;s performance [...]]]></description> <content:encoded><![CDATA[<p></p><p>VMware vCenter collects performance statistics, tasks and events for historical performance analysis and auditing.  The collection level and retention of performance statistics can be controlled through the vCenter GUI (see Administration | vCenter Server Settings | Statistics).   <a
href="http://cloudfront.vmtoday.com/wp-content/uploads/2009/09/image2.png" rel="lightbox[240]"><img
style="border-bottom: 0px; border-left: 0px; margin: 10px 15px 10px 0px; display: inline; border-top: 0px; border-right: 0px" title="vCenter Statistics Settings" src="http://cloudfront.vmtoday.com/wp-content/uploads/2009/09/image_thumb2.png" border="0" alt="vCenter Statistics Settings" width="289" height="282" align="left" /></a>The statistics collection level and retention periods can have a dramatic impact on your vCenter Server&#8217;s performance if not carefully planned and monitored.  In particular, the vCenter database can grow quite large, and the database server must grow with it to support the additional statistics (more disk IO capacity, CPU, and memory).  Fortunately, VMware provides a vCenter database sizing tool within the vCenter client (see picture).  This is all well and good for initial sizing, and in my experience vCenter&#8217;s sizing estimates are fairly accurate, assuming the environment remains healthy.</p><p>I recently migrated an environment from vCenter 2.5 to 4.0, and in the process switched from a Windows 2003 32-bit vCenter host with a SQL 2005 server (remote to vCenter) to a Windows 2008 64-bit vCenter server with a SQL 2008 server (again, a remote SQL server).  I experienced a few issues during the migration and thought I had worked through them all (I&#8217;ll post on those at a later date).  However, after a while I found that performance statistics for objects in vCenter were missing or not rendering at an acceptable pace.  Upon further investigation, I discovered warnings in the vCenter Service Status node indicating that performance rollups within the vCenter database were not taking place.</p><p><a
href="http://cloudfront.vmtoday.com/wp-content/uploads/2009/09/image3.png" rel="lightbox[240]"><img
style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" src="http://cloudfront.vmtoday.com/wp-content/uploads/2009/09/image_thumb3.png" border="0" alt="image" width="428" height="50" /></a></p><p>In a SQL-backed vCenter, statistics rollups are handled by the SQL Server Agent (note: if you are using SQL Server Express, statistics rollups are handled by vCenter itself as SQL Express does not offer SQL Server Agent jobs).  <a
title="Missing Performance Data in VirtualCenter 2.5" href="http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&amp;cmd=displayKC&amp;externalId=1003570" target="_blank">KB 1003570</a> describes this process (it applies to vCenter 2.5, but the principles in it can be applied to 4.0).  To troubleshoot and resolve the issue I opened SQL Server Management Studio and checked several items:</p><ol><li><span
style="color: #35383d;">Is the SQL Server Agent running?</span></li><li><span
style="color: #35383d;">Are there statistics rollup jobs defined for SQL server agent?</span></li><li><span
style="color: #35383d;">Are those jobs running?</span></li></ol><p>In my case, the SQL Server Agent was running (you are prompted to configure this during the vCenter install).  However, when I checked for the presence of rollup jobs, I discovered that only a Past Day job had migrated with the database to the new SQL server.  Upon reviewing that job&#8217;s history, I discovered that it had not run since the migration (note to self: add these checks to your standard vCenter migration checklist).</p><p>To remediate the problem, I completed the following steps:</p><ol><li><span
style="color: #35383d;">Remove the bad &#8216;Past Day stats rollupVirtualCenter&#8217; job from the list of SQL Server Agent Jobs.</span></li><li><span
style="color: #35383d;">Recreate the three standard stats rollup jobs.  To recreate the jobs, find the SQL scripts on your vCenter server in C:\Program Files (x86)\VMware\Infrastructure\VirtualCenter Server.  The .sql scripts you&#8217;ll need are stats_rollup1_proc_mssql.sql, stats_rollup2_proc_mssql.sql, and stats_rollup3_proc_mssql.sql.  Run these scripts in a SQL Server Management Studio query window against your vCenter database, in order from 1 to 3.  These scripts should create the rollup jobs and their associated stored procedures (this procedure is detailed at <a
title="http://communities.vmware.com/thread/123715?start=0&amp;tstart=0" href="http://communities.vmware.com/thread/123715">http://communities.vmware.com/thread/123715</a>).</span></li><li><span
style="color: #35383d;">After recreating the jobs, I took a backup of the vCenter database.  The Past Day job soon kicked off to begin a stats rollup (this runs every 30 minutes by default).</span></li></ol><p>I checked the server several hours later and discovered that, rather than completing successfully, the Past Day job was still running and the drive holding my vCenter database transaction log was full.  Back to the drawing board&#8230;</p><ol><li><span
style="color: #35383d;">I disabled the Past Week and Past Month rollup jobs to avoid job conflicts.</span></li><li><span
style="color: #35383d;">I backed up the vCenter database and then performed a shrink of the log file to get it back down to size.</span></li><li><span
style="color: #35383d;">The vCenter was running as a VM, so I was able to quickly increase its disk size and use diskpart from within the guest to extend the partition.  The vCenter Database Sizing tool does not account for the space required to process weeks of backlogged performance statistics, as it assumes that the rollup/purge jobs run as designed.</span></li></ol><p>I wanted to see how bad the problem was before kicking off another job, so I ran:</p><blockquote><p>select count(*) from vpx_hist_stat1</p></blockquote><p>against the vCenter database in a SQL Server Management Studio query window.  The query ran for several hours (never a good sign) and eventually reported well over 20 million rows of performance statistics (thanks to <a
title="http://communities.vmware.com/message/1318736" href="http://communities.vmware.com/message/1318736">http://communities.vmware.com/message/1318736</a> for pointing me in this direction).  I investigated options to truncate the tables (see above link), and also looked at a script from VMware KB <a
href="http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&amp;cmd=displayKC&amp;externalId=1000125" target="_blank">1000125</a>: Purging old data from the database used by vCenter Server.  In the end, I decided to try to let the Past Day stats job run.</p><p>I stopped the vCenter Server service to prevent new statistics from being written to the database.  I also disabled the Past Week and Past Month SQL Agent jobs to prevent job conflicts, and then manually started the Past Day job.  I had to stop the job several times because it filled the 100GB transaction log volume.  Backup &amp; shrink operations gave me back the space on the log volume.  I saw about 300GB of transaction logs written over the course of this process, but the Past Day job eventually completed.</p><p>Finally, I re-enabled the Past Week and Past Month jobs and manually ran both of them (Past Week first, then Past Month), followed by a backup and shrink of the vCenter database.  I was impressed with the performance increase I saw in the vCenter client: lists and performance graphs rendered much faster than when stats rollups were not taking place.</p><p>It would be a good idea to add checking the status of the stats rollup jobs and counting the rows in the vpx_hist_stat tables to your regular maintenance tasks.  For other vCenter database best practices, check out breakout session PO2061 from VMworld 2008.  If you did not attend or subscribe to <a
title="VMworld" href="http://www.vmworld.com" target="_blank">VMworld</a>, Scott Lowe <a
title="PO2061: VMware VirtualCenter 2.5 Database Best Practices" href="http://blog.scottlowe.org/2008/09/18/po2061-vmware-virtualcenter-25-database-best-practices/" target="_blank">covered the session in this post</a>.  A VMworld 2009 &#8220;<a
title="Exclusive &quot;Online Only&quot; Sessions for VMworld 2009" href="http://www.vmworld.com/blogs/vmworld/2009/09/01/exclusive-online-only-sessions-for-vmworld-2009" target="_blank">online only</a>&#8221; session entitled <a
title="VM3237: vCenter Databases: Setup, Management and Best Practices" href="http://www.vmworld.com/docs/DOC-3763" target="_blank">VM3237 vCenter Databases: Setup, Management and Best Practices</a> was also offered (subscription required).  I have not viewed this session so I cannot comment on its content.</p> ]]></content:encoded> <wfw:commentRss>http://vmtoday.com/2009/09/vcenter-database-stats-rollup-troubleshooting/feed/</wfw:commentRss> <slash:comments>4</slash:comments> </item> <item><title>RAMDisk Usage in a vSphere Environment</title><link>http://vmtoday.com/2009/09/ramdisk-usage-in-a-vsphere-environment/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=ramdisk-usage-in-a-vsphere-environment</link> <comments>http://vmtoday.com/2009/09/ramdisk-usage-in-a-vsphere-environment/#comments</comments> <pubDate>Fri, 11 Sep 2009 19:00:55 +0000</pubDate> <dc:creator>Joshua Townsend</dc:creator> <category><![CDATA[Issues & Troubleshooting]]></category> <category><![CDATA[VMware]]></category> <category><![CDATA[memory]]></category> <category><![CDATA[performance]]></category> <category><![CDATA[ramdisk]]></category><guid
isPermaLink="false">http://vmtoday.com/?p=202</guid> <description><![CDATA[I had some folks from our .NET development team come to me with a problem today &#8211; their ASP.NET code was taking forever to recompile after updates to the code base. But these guys are cool &#8211; they came with a proposed solution (most people who grace my office door are simply dropping off problems). [...]]]></description> <content:encoded><![CDATA[<p></p><p>I had some folks from our .NET development team come to me with a problem today &#8211; their ASP.NET code was taking forever to recompile after updates to the code base.  But these guys are cool &#8211; they came with a proposed solution (most people who grace my office door are simply dropping off problems).  Their solution?  A RAMDisk mounted in a VMware Windows guest.  I give them credit for a novel approach, but I could see some issues:</p><ul><li>What would happen if the balloon driver kicked in and demanded the memory the RAMDisk was running on?</li><li>A reservation would get around the balloon driver issue, but there is no way to target the 512MB RAMDisk specifically; all memory in the VM must be reserved.</li><li>I&#8217;m a pragmatic Windows systems administrator at heart, with a heap of systems and processes to manage and monitor.  I don&#8217;t want the additional burden of making sure the RAMDisk loads at boot, keeps a consistent image across boots, can be easily updated by new code pushes, and remains compatible with new VM hardware and Tools versions.</li><li>A RAMDisk would take memory from VMs that are already memory-constrained, possibly hurting performance more than it helps.</li><li>If the disk subsystem is slow enough to get you thinking down the path of a RAMDisk, maybe it&#8217;s time for a new SAN&#8230;</li></ul><p>I did some Googling around and couldn&#8217;t find any decent info.  I did find a few hits on people running VMware guests entirely inside a RAMDisk &#8211; a concept that piqued my interest almost enough to think about trying it just to say I did&#8230;  Have any of you experimented with a RAMDisk inside a VMware guest?  If so, what did you take away from the setup?  Was there a performance gain?  Were there gotchas?  Leave a comment if you have experience, guesses, or advice on this idea.</p> ]]></content:encoded> <wfw:commentRss>http://vmtoday.com/2009/09/ramdisk-usage-in-a-vsphere-environment/feed/</wfw:commentRss> <slash:comments>2</slash:comments> </item> <item><title>ESXTOP Batch Mode &amp; Windows Perfmon</title><link>http://vmtoday.com/2009/09/esxtop-batch-mode-windows-perfmon/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=esxtop-batch-mode-windows-perfmon</link> <comments>http://vmtoday.com/2009/09/esxtop-batch-mode-windows-perfmon/#comments</comments> <pubDate>Thu, 10 Sep 2009 14:24:22 +0000</pubDate> <dc:creator>Joshua Townsend</dc:creator> <category><![CDATA[Issues & Troubleshooting]]></category> <category><![CDATA[VMware]]></category> <category><![CDATA[VMware How To]]></category> <category><![CDATA[ESX]]></category> <category><![CDATA[esxtop]]></category> <category><![CDATA[I/O]]></category> <category><![CDATA[perfmon]]></category> <category><![CDATA[performance]]></category> <category><![CDATA[sizing]]></category> <category><![CDATA[statistics]]></category> <category><![CDATA[Storage]]></category><guid
isPermaLink="false">http://vmtoday.com/?p=192</guid> <description><![CDATA[I needed to grab some stats from my ESX hosts for offline analysis, so I fired up my trusty ESXTOP, intent on using batch mode to capture .csv-formatted output.  I started to manually select the counters I was interested in while working in ESXTOP interactive mode (you can save your selected counters to [...]]]></description> <content:encoded><![CDATA[<p></p><p>I needed to grab some stats from my ESX hosts for offline analysis, so I fired up my trusty ESXTOP, intent on using batch mode to capture .csv-formatted output.  I started to manually select the counters I was interested in while working in ESXTOP interactive mode (you can save your selected counters to the esxtop configuration file with the &#8216;W&#8217; command) and thought there must be a better way.  I found that better way in the VMware Performance Community: <a
title="http://communities.vmware.com/docs/DOC-3930" href="http://communities.vmware.com/docs/DOC-3930">http://communities.vmware.com/docs/DOC-3930</a>.  There is now a -a switch that can be used to include ALL performance counters.  I&#8217;m sold.</p><p>I wanted detailed information, so I decided on a 15-second capture interval over a 2-hour window.  Here&#8217;s the command I used:</p><blockquote><p>esxtop -a -b -d 15 -n 480 &gt; /tmp/esxtopout.csv</p></blockquote><p>where -a is for ALL, -b is for batch mode, -d is for delay, and -n is for the number of iterations: (60/15) samples per minute &#215; 60 minutes &#215; 2 hours = 480.  I wrote out the results to a .csv in /tmp.  The resulting CSV weighed in at a whopping 100MB after 2 hours.</p><p>The CSV can be analyzed in Excel (pivot tables work well for this) or in Windows Perfmon.  I opened the log in Perfmon, as I was after basic Min/Average/Max counters and Perfmon makes those easy to see.  When adding the CSV log to Perfmon, you are prompted to select counters.  I added all instances of Commands/sec, Reads/sec, and Writes/sec from Physical Disk (I was gathering some IOPS counts for a new storage proposal).  I got a bit more than I bargained for: a mostly unresponsive Perfmon window and the ugliest darn graph I&#8217;ve ever seen.</p><p><a
href="http://cloudfront.vmtoday.com/wp-content/uploads/2009/09/image.png" rel="lightbox[192]"><img
style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" src="http://cloudfront.vmtoday.com/wp-content/uploads/2009/09/image_thumb.png" border="0" alt="image" width="420" height="313" /></a></p><p>Switching from the graph view to the report view lets you easily review and remove specific counters you are not interested in.  Alternatively, open the Properties of the data set, switch to the Data tab, and bulk-select the counters you want to remove.  I was not interested in vmhba1:x, specific VM&#8217;s, or worlds, so I killed all of those, leaving just the base iSCSI device (vmhba32 in my case).</p><p>After some cleanup the graph looked a bit better and, more importantly, I could easily read my Min/Average/Max stats:</p><p><a
href="http://cloudfront.vmtoday.com/wp-content/uploads/2009/09/image1.png" rel="lightbox[192]"><img
style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" src="http://cloudfront.vmtoday.com/wp-content/uploads/2009/09/image_thumb1.png" border="0" alt="image" width="416" height="327" /></a></p><p>Here are the takeaways:</p><ul><li><span
style="color: #35383d;">ESXTOP is a powerful utility for performance monitoring</span></li><li><span
style="color: #35383d;">Capturing all stats (-a) can produce a huge file, so use it sparingly in batch mode.  Otherwise, use interactive mode to select your counters and write them to a user-defined configuration file, then load that file with the -c option when running in batch mode.</span></li><li><span
style="color: #35383d;">Consider using vscsiStats for more granular reporting.</span></li><li><span
style="color: #35383d;">ESXTOP physical disk stats do not include NFS volumes.</span></li></ul><p>Do you use other tools or methods to collect basic disk IO counters for storage sizing purposes?  If so, leave a comment describing your approach!</p> ]]></content:encoded> <wfw:commentRss>http://vmtoday.com/2009/09/esxtop-batch-mode-windows-perfmon/feed/</wfw:commentRss> <slash:comments>4</slash:comments> </item> <item><title>Balloon Driver Problems with SQL</title><link>http://vmtoday.com/2009/09/balloon-driver-problems-with-sql/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=balloon-driver-problems-with-sql</link> <comments>http://vmtoday.com/2009/09/balloon-driver-problems-with-sql/#comments</comments> <pubDate>Thu, 10 Sep 2009 01:47:47 +0000</pubDate> <dc:creator>Joshua Townsend</dc:creator> <category><![CDATA[Issues & Troubleshooting]]></category> <category><![CDATA[VMware]]></category> <category><![CDATA[VMware How To]]></category> <category><![CDATA[VMworld]]></category> <category><![CDATA[3.5]]></category> <category><![CDATA[AWE]]></category> <category><![CDATA[balloon]]></category> <category><![CDATA[balloon driver]]></category> <category><![CDATA[driver]]></category> <category><![CDATA[ESX]]></category> <category><![CDATA[I/O]]></category> <category><![CDATA[kernel]]></category> <category><![CDATA[memory]]></category> <category><![CDATA[memory contention]]></category> <category><![CDATA[optimization]]></category> <category><![CDATA[PAE]]></category> <category><![CDATA[performance]]></category> <category><![CDATA[sizing]]></category> <category><![CDATA[sql]]></category> <category><![CDATA[troubleshooting]]></category> <category><![CDATA[tuning]]></category> <category><![CDATA[vCPU]]></category> <category><![CDATA[VM]]></category> <category><![CDATA[vmworld]]></category><guid
isPermaLink="false">http://vmtoday.com/?p=175</guid> <description><![CDATA[I have been meaning to write this up for a while; Scott Drummonds&#8217; &#8216;Love Your Balloon Driver&#8217; post today at his Virtual Performance blog gave me a nice reminder.  I actually caught a sneak peek at the graphs with an explanation from Scott at his instructor-led lab at VMworld 2009.  Scott calls out that the [...]]]></description> <content:encoded><![CDATA[<p></p><p>I have been meaning to write this up for a while; Scott Drummonds&#8217; <a
title="Love Your Balloon Driver" href="http://communities.vmware.com/blogs/drummonds/2009/09/09/love-your-balloon-driver">&#8216;Love Your Balloon Driver&#8217; post</a> today at his Virtual Performance blog gave me a nice reminder.  I actually caught a sneak peek at the graphs, with an explanation from Scott, at his instructor-led lab at VMworld 2009.  Scott calls out that the only workload they found to suffer from balloon driver activity is Java.  Java struggles with balloon driver activity because Java itself runs in a virtual machine (the JVM), so the guest OS cannot properly determine which pages should be paged out when the balloon driver calls for memory.</p><p>My experience leads me to agree with Scott and the whitepaper he cites &#8211; in a properly designed and equipped environment, the balloon driver is, up to a point, not detrimental to almost any workload.  However, I recently discovered at a client site that the balloon driver can cause significant issues when the environment is poorly designed and under-sized.  Here&#8217;s the background:</p><p>I was called into an already established environment where the client was running on an older blade with VMware ESX 3.5.  The blade maxed out at 16GB RAM and had dual dual-core CPU&#8217;s with no hope for an upgrade.  On the blade was a single guest VM running Windows 2003 with SQL 2005, in its full 32-bit glory.  The VM was configured with 4 vCPU&#8217;s and 16GB of memory.  Some of you can probably already guess where this is going&#8230;.</p><p>The x86 Windows guest had <a
href="http://technet.microsoft.com/en-us/library/cc784574(WS.10).aspx">PAE </a>configured, and SQL took advantage of <a
href="http://technet.microsoft.com/en-us/library/ms190673.aspx" target="_blank">AWE </a>to use the additional memory beyond the 4GB limit of a 32-bit system.  Additionally, the Windows guest had the /3GB switch enabled in boot.ini.  Finally, as per SQL best practices, the &#8216;<a
href="http://technet.microsoft.com/en-us/library/ms190730.aspx" target="_blank">Lock Pages in Memory</a>&#8217; permission was granted to the SQL Server service account.  What the guest was left with was 1GB of kernel-mode memory and 15GB of user-mode/extended addressable memory.</p><p>And here&#8217;s the problem.  The client was using ESX, not ESXi 3.5, so the Service Console required memory.  In this case, the Service Console had approximately 512MB allocated to it.  Furthermore, VM&#8217;s require some memory overhead on ESX to run.  The memory overhead consumed by a Windows guest on ESX 3.5 with 4 vCPU&#8217;s and 16GB of memory is a bit more than 512MB.  On a properly sized ESX server with multiple similar guests/workloads, you could probably gain much of the overhead back through transparent page sharing; but in this case I had a 1:1 P2V ratio.  If you are any good at math, you can see that the environment was running about 1GB short of memory.  A quick check of the balloon driver stat in vCenter showed that the balloon driver was constantly active and demanding about 1GB back from the guest&#8230; constantly.</p><p>Under normal circumstances this might not be an issue, but in this case the Windows guest was being absolutely punished.  The guest CPU&#8217;s were pegged at 100% with an excessive amount of kernel time, often an indication of IO issues.  And indeed, I saw terrible disk and network performance on the guest.  At the root of the problem is this &#8211; the Lock Pages in Memory permission allows SQL to get a firm grasp on the user-mode memory available to it (15GB) and lock it up.  That left the guest kernel&#8217;s 1GB, already starved because of the /3GB switch in boot.ini, as the only memory the balloon driver could really reclaim.</p><p>The client suggested a reservation of 16GB on the VM, knowing that memory reservations prevent balloon driver activity.  
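</p><p>To make the memory arithmetic concrete, here is a minimal sketch in POSIX shell (values in MB) that tallies the host&#8217;s budget using the approximate figures from this post: 16GB of physical RAM, about 512MB for the Service Console, and about 512MB of VM overhead.  The function name memory_headroom is hypothetical, and the overhead is rounded down from &#8220;a bit more than 512MB&#8221;, so the real shortfall was slightly larger.</p>

```shell
#!/bin/sh
# Rough memory budget for the single-VM ESX 3.5 host described above (all values in MB).
# These are the post's approximations, not measured values.
HOST_RAM=16384        # physical RAM on the blade (16GB)
SERVICE_CONSOLE=512   # approximate Service Console allocation
VM_OVERHEAD=512       # approximate overhead of a 4-vCPU, 16GB VM (actually a bit more)

# Print the MB left over after the Service Console, VM overhead, and guest memory.
# A negative result is a shortfall that ESX has to reclaim, e.g. via the balloon driver.
memory_headroom() {
  guest_mb=$1
  printf '%s\n' "$(( HOST_RAM - SERVICE_CONSOLE - VM_OVERHEAD - guest_mb ))"
}
```

<p>Running memory_headroom 16384 prints -1024, roughly the 1GB the balloon driver kept demanding back; memory_headroom 15360 prints 0.</p><p>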
I calmly asked them to back away from the keyboard as I explained that if a starved guest was bad, a starved Service Console would be much worse.  In the end the fix was quite easy &#8211; I convinced the customer to reduce the memory allocated to the guest by about 1GB, enough to let the 512MB SC and the 512MB of overhead run without contention.  I was able to show them the difference between allocated and active memory in vCenter &#8211; the 1GB being surrendered was not really being actively used; SQL just had it locked up.  In fact, surrendering the 1GB of memory back to ESX breathed new life into the guest VM, bringing its performance back in line with expectations.</p><p>Ideally, I would have brought in a bigger ESX server that could serve additional VM&#8217;s, driving greater levels of efficiency across the environment.  It just wasn&#8217;t an option for the client in this case.  In the end, the problem was fixed and I was reminded just how fun it can be to explain some of these backwards-sounding virtualization concepts to customers &#8211; fewer vCPU&#8217;s can lead to better guest performance, less guest memory can fix performance issues, and increasing the number of similar guests on a host can, up to a point, drive better performance because of transparent page sharing.</p><p>Stay tuned over the next few weeks as I digest and write about my VMworld experience &#8211; from VMUG activities to Paul Maritz&#8217;s press conference announcing vCloud Express, and plenty of great sessions in between.  
Like many of you, I returned from VMworld with quite a backlog of work, but I&#8217;ll do my best to squeeze in some posts and tweets.</p> ]]></content:encoded> <wfw:commentRss>http://vmtoday.com/2009/09/balloon-driver-problems-with-sql/feed/</wfw:commentRss> <slash:comments>1</slash:comments> </item> <item><title>Virtual Infrastructure Client Opens Off Screen</title><link>http://vmtoday.com/2009/07/virtual-infrastructure-client-opens-off-screen/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=virtual-infrastructure-client-opens-off-screen</link> <comments>http://vmtoday.com/2009/07/virtual-infrastructure-client-opens-off-screen/#comments</comments> <pubDate>Thu, 16 Jul 2009 15:39:44 +0000</pubDate> <dc:creator>Joshua Townsend</dc:creator> <category><![CDATA[Issues & Troubleshooting]]></category> <category><![CDATA[VMware]]></category> <category><![CDATA[infrastructure]]></category> <category><![CDATA[Microsoft]]></category> <category><![CDATA[vi client]]></category> <category><![CDATA[VI3]]></category> <category><![CDATA[windows]]></category> <category><![CDATA[windows 7]]></category><guid
isPermaLink="false">http://vmtoday.com/?p=135</guid> <description><![CDATA[A user reported an issue with one of the VM&#8217;s in my environment this morning.  It seems that an automated process had spun up the CPU to 100% in the Windows guest and the system was deadlocked.  I was still at home when I received the message on my BlackBerry, so I fired up the [...]]]></description> <content:encoded><![CDATA[<p></p><p>A user reported an issue with one of the VM&#8217;s in my environment this morning.  It seems that an automated process had spun up the CPU to 100% in the Windows guest and the system was deadlocked.  I was still at home when I received the message on my BlackBerry, so I fired up the VPN on my Windows 7 laptop, opened the VI3 client and&#8230;, um, where is it?  The VI3 client icon was in the taskbar, but the app was nowhere to be found &#8211; it had opened off-screen where my secondary monitor usually lives.  This is nothing new for the VI client &#8211; I have experienced it numerous times in the past.  But this was my first time with the problem on Windows 7.</p><p>Pre-Windows 7, I would have right-clicked the app&#8217;s taskbar button, selected &#8216;Move&#8217;, and then used the keyboard arrow keys to guide the phantom window home.  Windows 7 does not offer the same window-positioning options when you right-click the taskbar, so I had to find another way.  Enter Windows shortcut keys.  Here&#8217;s how I brought the VI3 Client window back into view:</p><ol><li>Make sure that the VI3 Client window is in the foreground by selecting it in the taskbar.  You&#8217;ll know that it is in the foreground when the taskbar icon gets a white glow, as pictured here: <a
href="http://cloudfront.vmtoday.com/wp-content/uploads/2009/07/vi3_client_in_taskbar.png" rel="lightbox[135]"><img
class="alignnone size-full wp-image-136" title="vi3_client_in_taskbar" src="http://cloudfront.vmtoday.com/wp-content/uploads/2009/07/vi3_client_in_taskbar.png" alt="vi3_client_in_taskbar" width="122" height="39" /></a></li><li>Press the hotkey combination: &#8220;<strong>ALT+Space, M</strong>&#8221; for Move.</li><li>Use the keyboard arrow keys to move the window to your active monitor, pressing &#8220;Enter&#8221; once the window is visible to commit the move.</li><li>If the arrow keys fail to move the window and/or you hear the Windows error sound, your VI3 Client window is probably maximized.  The Move option is not available when a window is maximized.  To work around this, use the hotkey combination: &#8220;<strong>ALT+Space, R</strong>&#8221; for Restore.  You should now be able to move the window using steps #2 &amp; #3 above.</li></ol><p>If you are still really struggling, break out the trusty old registry editor and follow along:</p><ol><li>Close any open VMware Infrastructure Client windows.</li><li>Navigate to <strong>HKEY_CURRENT_USER\Software\VMware\VMware Infrastructure Client\Preferences\UI</strong></li><li>Locate the <strong>ApplicationLocation</strong> key.  This key provides the X-Y coordinates for the VI Client window at startup.</li><li>Modify the string value to <strong>0-0</strong>.  
This value will cause the VI3 client to open in the center of your primary display.</li><li>If you run displays of different sizes or resolutions, you may also want to change the <strong>ApplicationMaximized</strong> or <strong>ApplicationSize</strong> keys to fit your needs.</li><li>Launch the VMware Infrastructure Client and get back to work.</li></ol> ]]></content:encoded> <wfw:commentRss>http://vmtoday.com/2009/07/virtual-infrastructure-client-opens-off-screen/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>IBM DS3300 iSCSI Write Performance Solved</title><link>http://vmtoday.com/2009/06/ibm-ds3300-iscsi-write-performance-solved/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=ibm-ds3300-iscsi-write-performance-solved</link> <comments>http://vmtoday.com/2009/06/ibm-ds3300-iscsi-write-performance-solved/#comments</comments> <pubDate>Tue, 16 Jun 2009 22:53:57 +0000</pubDate> <dc:creator>Joshua Townsend</dc:creator> <category><![CDATA[Issues & Troubleshooting]]></category> <category><![CDATA[Storage]]></category> <category><![CDATA[VMware]]></category> <category><![CDATA[VMware How To]]></category> <category><![CDATA[Dell]]></category> <category><![CDATA[ESX]]></category> <category><![CDATA[IBM]]></category> <category><![CDATA[iscsi]]></category> <category><![CDATA[lun]]></category> <category><![CDATA[performance]]></category> <category><![CDATA[SAN]]></category> <category><![CDATA[VI3]]></category> <category><![CDATA[virtualization]]></category> <category><![CDATA[write caching]]></category><guid
isPermaLink="false">http://vmtoday.com/?p=94</guid> <description><![CDATA[I have been pulling my hair out with a small VI3 implementation running against an IBM DS3300 iSCSI array.  Performance, for lack of a better term, sucked.  Granted, the DS3300 is not an enterprise level workhorse of a storage system, but it fit the budget.  Read performance was decent from the array, but write performance [...]]]></description> <content:encoded><![CDATA[<p></p><p>I have been pulling my hair out with a small VI3 implementation running against an IBM DS3300 iSCSI array.  Performance, for lack of a better term, sucked.  Granted, the DS3300 is not an enterprise-level workhorse of a storage system, but it fit the budget.  Read performance from the array was decent, but write performance was terrible, maxing out at 10Mbps throughput with insanely high latencies on long writes when the system was under load.  This led to some long P2V operations, poor guest performance, and some questions from the project sponsors about why I couldn&#8217;t make the environment sing.</p><p>The system was configured with a single controller with dual GigE NIC&#8217;s.  The controller had 512MB of battery-backed cache (there is also a 1GB cache upgrade option available).  I chalked up some of the poor performance to a single controller with a less-than-optimal amount of cache; blamed the SAS-controller-to-SATA-disk command translation overhead; cringed at the 6-disk RAID5 configuration; and engaged in some self-doubt.  I convinced the powers that be that we were IO-constrained, got some funds to fill out the 3U chassis to a full 12 SATA disks, and reconfigured the array as RAID10.  These changes produced almost no noticeable performance gain.  In addition, I did some basic troubleshooting of the network environment: verifying multiple paths to the storage, setting Flow Control on the switches to receive-only, and double-checking my iSCSI initiator settings.  
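</p><p>For readers who want to repeat the path and initiator checks from the ESX 3.x Service Console, here is a rough sketch.  It assumes the standard esxcfg-swiscsi, esxcfg-vmknic, esxcfg-mpath, and vmkping utilities; the function names and the example portal address are hypothetical placeholders, and setting DRY_RUN=1 only prints the commands.  Flow control is configured on the switches, so it is not covered here.</p>

```shell
#!/bin/sh
# Sketch: storage-path sanity checks from the ESX 3.x Service Console.
# ASSUMPTIONS: function names are hypothetical; pass your array's iSCSI portal IP.
# Set DRY_RUN=1 to print each command instead of running it.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

check_iscsi_paths() {
  portal_ip=$1
  run esxcfg-swiscsi -q       # query whether the software iSCSI initiator is enabled
  run esxcfg-vmknic -l        # list VMkernel NICs that carry iSCSI traffic
  run esxcfg-mpath -l         # list LUNs and every path ESX sees to each one
  run vmkping "$portal_ip"    # ping the array portal through the VMkernel stack
}
```

<p>For example, DRY_RUN=1 check_iscsi_paths 192.0.2.10 prints the four commands without touching the host; on a real host, check that each LUN lists more than one path in the esxcfg-mpath output.</p><p>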
Note: The DS3300 is only supported with the ESX software initiator.  I found documentation on the DS3300 to be lacking, but did discover that the Dell MD3000i is based on the same LSI Engenio array.  Some Googling on the Dell solution led me to the &#8216;SMcli&#8217; command line interface for both arrays.  The commands are slightly different for the Dell and IBM.  The links to the IBM CLI documentation were broken, so I had to do a bit of trial and error to get the commands right.  I used the <a
href="http://support.dell.com/support/edocs/systems/md3000i/en/CLI/PDF/CLIMR2g.pdf" target="_blank">Dell documentation</a> as a starting point.  (Rant: Seriously, IBM?  Can you make your documentation any harder to get through &#8211; is it a Redbook, is it an Engineering Whitepaper, is it a support document, is it a case study &#8211; and why can I only find these with complex Google searches, not on your own product pages, and why can&#8217;t you name your documents intelligently instead of with some random string of characters?)</p><p><strong>Update</strong>:<strong> The IBM System Storage DS3000, DS4000, and DS5000 Command Line Interface and Script Commands Programming Guide is here:</strong> <a
href="http://cloudfront.vmtoday.com/wp-content/uploads/2009/06/DS3k4k5kCLIreference.pdf">IBM System Storage DS3000, DS4000, and DS5000 Command Line Interface and Script Commands Programming Guide &#8211; DS3k4k5kCLIreference, SMCLI</a></p><p>Moving on&#8230; I received an automated alert from the DS3300 about an incomplete battery learn cycle.  Using the IBM Storage Manager GUI, I generated a &#8216;Storage Subsystem Profile&#8217; from the Support tab to check the battery status.  In the profile I discovered that while write cache was enabled, it had a status of &#8220;Enabled (Suspended)&#8221;.  Ah ha!  Now I had some decent Google material, which led me to this: http://communities.vmware.com/thread/195838.  Hot damn, I love the VMware Community Forums!</p><p>It turns out that in a single-controller configuration, cache mirroring remains enabled by default.  Because there is no second controller to mirror to, the array suspends write caching.  This is probably a safety measure &#8211; without a second controller, data in cache is at risk should the only controller fail.  I weighed my options and decided that fixing the poor performance outweighed my HA concerns, so I got write caching going again by disabling cache mirroring with this command:</p><p
style="padding-left: 30px;">c:\program files\ibm_ds4000\client&gt;smcli -n &lt;ARRAYNAME&gt; -c "set allLogicalDrives mirrorEnabled=false;"</p><p>I then followed up with this for good measure:</p><p
style="padding-left: 30px;">c:\program files\ibm_ds4000\client&gt;smcli -n &lt;ARRAYNAME&gt; -p &lt;arraypassword&gt; -c "set allLogicalDrives writeCacheEnabled=true;"</p><p>The results were immediately noticeable:</p><div
id="attachment_98" class="wp-caption aligncenter" style="width: 430px"> <a
href="http://cloudfront.vmtoday.com/wp-content/uploads/2009/06/ds3300-performance-with-write-cache1.jpg" target="_blank" rel="lightbox[94]"><img
class="size-large wp-image-98" title="ds3300-performance-with-write-cache1" src="http://cloudfront.vmtoday.com/wp-content/uploads/2009/06/ds3300-performance-with-write-cache1-1023x392.jpg" alt="DS3300 Performance Improvement when Write Cache is Enabled" width="430" height="164" /></a><p
class="wp-caption-text">DS3300 Performance Improvement when Write Cache is Enabled - Click for a Larger View</p></div><p>The screen shot is from <a
href="http://www.veeam.com/esxi-monitoring-free.html" target="_blank">Veeam Monitor Free Edition</a>, taken during 4 concurrent V2V operations from Hyper-V to VMware.  With the write cache fully functional, disk usage peaked at 54MBps, latency dropped to about 6ms, and my blood pressure dropped a few notches.</p><p>While poking around the CLI, I also found that you can dump performance stats from the array (performance data is otherwise hard to find on the thing) using this command:</p><p
style="padding-left: 30px;">C:\Program Files\IBM_DS4000\client&gt;smcli -n &lt;ARRAYNAME&gt; -c "set session performanceMonitorInterval=5 performanceMonitorIterations=120;save storageSubsystem performanceStats file=\"c:\\ds3300perfstats.csv\";"</p><p>This will give you a 10-minute record of performance from the array (5-second intervals &#215; 120 iterations), which you can analyze using Excel.  The Dell Enterprise Center TechCenter Wiki has a great write-up on how to efficiently analyze the data from this command here: <a
href="http://www.delltechcenter.com/page/MD3000i+Performance+Monitoring" target="_blank">http://www.delltechcenter.com/page/MD3000i+Performance+Monitoring</a>, complete with a YouTube video that walks you through the process:</p><p
style="text-align: center;"><object
width="425" height="344" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param
name="allowFullScreen" value="true" /><param
name="allowScriptAccess" value="always" /><param
name="src" value="http://www.youtube.com/v/SoRR1VVuETs&amp;color1=0xb1b1b1&amp;color2=0xcfcfcf&amp;feature=player_embedded&amp;fs=1" /><param
name="allowfullscreen" value="true" /><embed
width="425" height="344" type="application/x-shockwave-flash" src="http://www.youtube.com/v/SoRR1VVuETs&amp;color1=0xb1b1b1&amp;color2=0xcfcfcf&amp;feature=player_embedded&amp;fs=1" allowFullScreen="true" allowScriptAccess="always" allowfullscreen="true" /></object></p><p
style="text-align: left;">I am beginning to think that the DS3300 (and MD3000i) may actually be a viable starter solution for SMBs beginning a virtualization project.  But I would recommend the cache upgrade, a second controller, SAS disks instead of SATA to eliminate the SAS-to-SATA translation overhead, and more, faster disks instead of fewer, slower ones so you can drive throughput and IOPS to a higher level.</p><p
style="text-align: left;">Have any of you deployed the DS3300 or MD3000i (or the generic LSI solution)?  Do you have any performance tuning tips for these arrays?  If so, share in the comments!</p> ]]></content:encoded> <wfw:commentRss>http://vmtoday.com/2009/06/ibm-ds3300-iscsi-write-performance-solved/feed/</wfw:commentRss> <slash:comments>31</slash:comments> </item> <item><title>VMFS Volumes Missing!?!?!</title><link>http://vmtoday.com/2009/06/vmfs-volumes-missing/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=vmfs-volumes-missing</link> <comments>http://vmtoday.com/2009/06/vmfs-volumes-missing/#comments</comments> <pubDate>Wed, 03 Jun 2009 15:52:17 +0000</pubDate> <dc:creator>Joshua Townsend</dc:creator> <category><![CDATA[Issues & Troubleshooting]]></category> <category><![CDATA[VMware]]></category> <category><![CDATA[VMware How To]]></category> <category><![CDATA[vi3 esx vmware storage iscsi vmfs lun fc fiber san]]></category><guid
isPermaLink="false">http://vmtoday.com/?p=86</guid> <description><![CDATA[Here&#8217;s the scenario: After performing maintenance on an ESX server (patches, storage re-scan, reboot), VMFS volumes are no longer visible, even though the hosting LUN can be seen on the Storage Adapters page of the ESX Configuration tab.  Most VMware administrators will see this play out at some point; I saw it in one of [...]]]></description> <content:encoded><![CDATA[<p></p><p>Here&#8217;s the scenario:</p><p>After performing maintenance on an ESX server (patches, storage re-scan, reboot), VMFS volumes are no longer visible, even though the hosting LUN can be seen on the Storage Adapters page of the ESX Configuration tab.  Most VMware administrators will see this play out at some point; I saw it in one of my environments today and figured I should make a note of the steps required to correct the issue.</p><p>Typically, the root cause of the issue is a change on the storage array that causes the h(id) of the LUN(s) in question to change.  That change could be anything from an array firmware update to LUN removal/recreation or RAID/LUN reconfiguration.  When a rescan takes place on the ESX storage adapters (whether triggered manually, by a reboot, etc.), the new h(id) is observed.  Because it does not match the previously observed (on-disk) ID, the LUN is tagged as a snapshot LUN and access to it is disabled.</p><p>Diagnosing this problem is fairly easy.  In addition to the behavior I have described, as observed through the Virtual Center Client, the problem can also be confirmed through the ESX command line.</p><p><span
style="font-size: small;">To diagnose this issue from the console, view the vmkernel log by issuing the following command: tail -f /var/log/vmkernel</span></p><p><span
style="font-size: small;">You will see messages in the log similar to the following:<br
/> </span></p><p>Jun  2 16:01:29 esx04 vmkernel: 0:00:31:14.543 cpu3:1039)ALERT: LVM: 4482: vml.0200020000600a0b80005add7800000a494a1d0be6313732362d33:1 may be snapshot: disabling access. See resignaturing section in SAN config guide.<br
/> Jun  2 16:01:29 esx04 vmkernel: 0:00:31:14.552 cpu3:1039)LVM: 5579: Device vml.0200010000600a0b80005add7800000a474a1d0bc8313732362d33:1 detected to be a snapshot:<br
/> Jun  2 16:01:29 tccesx04 vmkernel: 0:00:31:14.552 cpu3:1039)LVM: 5586:   queried disk ID: &lt;type 2, len 22, lun 1, devType 0, scsi 6, h(id) 5103533129706062046&gt;<br
/> Jun  2 16:01:29 esx04 vmkernel: 0:00:31:14.552 cpu3:1039)LVM: 5593:   on-disk disk ID: &lt;type 2, len 22, lun 1, devType 0, scsi 6, h(id) 2153359415130143165&gt;</p><p>After confirming that this is indeed the problem you are experiencing, stop and take a deep breath.  The fix is easy, but you need to take steps before fixing it to prevent further damage.  If you are lucky, the problem has only manifested itself on one ESX server (and hopefully that ESX server was not hosting any VM&#8217;s because you had put it into maintenance mode).  Prevent your other ESX servers from rescanning storage &#8211; don&#8217;t reboot them, don&#8217;t manually rescan, don&#8217;t update them.</p><p>If the affected ESX server was hosting running VM&#8217;s, HA (if licensed and properly configured) should have kicked in and restarted the VM&#8217;s on another node in the ESX cluster.</p><p>If multiple ESX servers (or all of them) are affected, your VM&#8217;s have likely all been powered off after hard stops, so there is not much you can do but get on with fixing the issue and trust your backups (you do have backups, right?).  This is where array-level snapshots come in handy.  In my experience, most if not all VM&#8217;s recover after a hard stop like this, but don&#8217;t let that keep you from having a robust DR plan.</p><p>To correct the issue, there must be no running VM&#8217;s on the affected VMFS volumes.  Shut down the VM&#8217;s or use Storage VMotion to move running VM&#8217;s to alternate LUN&#8217;s.</p><p>In the VI Client, select the affected ESX host in the Hosts &amp; Clusters view.  Switch to the Configuration tab.  Click &#8216;Advanced Settings&#8217; and then choose the LVM node.  Change the LVM.DisallowSnapshotLun setting from the default of &#8216;1&#8217; to &#8216;0&#8217; and click OK.  Next, rescan your storage from the &#8216;Configuration | Storage Adapters&#8217; pane.  Your missing VMFS volumes should re-appear.  
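</p><p>The same setting change and rescan can also be done from the ESX 3.x Service Console.  The sketch below assumes the esxcfg-advcfg and esxcfg-rescan utilities; the function names and the adapter name vmhba1 are hypothetical placeholders, and setting DRY_RUN=1 only prints the commands.</p>

```shell
#!/bin/sh
# Sketch of the LVM.DisallowSnapshotLun workaround from the ESX 3.x Service Console,
# mirroring the VI Client Advanced Settings steps described above.
# ASSUMPTIONS: function names are hypothetical; pass your own adapter (e.g. vmhba1).
# Set DRY_RUN=1 to print each command instead of running it.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Allow access to LUNs flagged as snapshots, then rescan the given adapter.
allow_snapshot_luns() {
  adapter=$1
  run esxcfg-advcfg -g /LVM/DisallowSnapshotLun   # show the current value (default 1)
  run esxcfg-advcfg -s 0 /LVM/DisallowSnapshotLun
  run esxcfg-rescan "$adapter"
}

# Put the default back once every host sharing the volume can see it again.
restore_snapshot_default() {
  run esxcfg-advcfg -s 1 /LVM/DisallowSnapshotLun
}
```

<p>For example, DRY_RUN=1 allow_snapshot_luns vmhba1 prints the three commands without changing anything on the host.</p><p>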
You&#8217;re doing fine, but not done yet.</p><p>Even if the other hosts that use the affected VMFS volume appear to be fine, they will most likely lose access to the volume once a rescan/reboot takes place.  You need to make the LVM.DisallowSnapshotLun = 0 change on all ESX servers connected to the volume, followed by a rescan of your storage.</p><p>Once all affected ESX servers see the VMFS volumes, change the LVM.DisallowSnapshotLun setting back to the default of 1.  Migrate back and/or power up VM&#8217;s on the volume and see what the damage is.  If you are lucky, everything is good to go.  If not, it&#8217;s a great time to check out those backups.</p><p>If you do not know what caused the storage change, check your ESX logs to determine whether the server was rebooted or storage was rescanned.  This will give you an idea of when the change occurred &#8211; a starting point to work back from to find the root cause.  Use this command to get started: less /var/log/vmksummary</p><p>Here are some suggestions on how to avoid this problem:</p><p>1.) Minimize changes to LUN&#8217;s once they are presented to an ESX host.</p><p>2.) Coordinate storage maintenance with VMware maintenance windows.</p><p>3.) Have standby storage so you can Storage VMotion running VM&#8217;s off of the affected LUN&#8217;s.</p><p>4.) Consider NFS, as NFS volumes are not affected by resignaturing.</p><p>For more information on this problem, or to better understand the advanced setting changes involved, check out the VMware SAN Configuration Guide at <a
title="VMware VI3 SAN Configuration Guide" href="http://www.vmware.com/pdf/vi3_35/esx_3/r35u2/vi3_35_25_u2_3_server_config.pdf" target="_blank">http://www.vmware.com/pdf/vi3_35/esx_3/r35u2/vi3_35_25_u2_3_server_config.pdf</a>, page 114, or the VMware iSCSI SAN Configuration Guide at <a
title="VMware VI3 iSCSI SAN Configuration Guide" href="http://www.vmware.com/pdf/vi3_35/esx_3/r35u2/vi3_35_25_u2_iscsi_san_cfg.pdf" target="_blank">http://www.vmware.com/pdf/vi3_35/esx_3/r35u2/vi3_35_25_u2_iscsi_san_cfg.pdf</a>, page 117.</p> ]]></content:encoded> <wfw:commentRss>http://vmtoday.com/2009/06/vmfs-volumes-missing/feed/</wfw:commentRss> <slash:comments>5</slash:comments> </item> <item><title>DL380 BIOS Configuration for VMware</title><link>http://vmtoday.com/2009/03/dl380-bios-configuration-for-vmware/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=dl380-bios-configuration-for-vmware</link> <comments>http://vmtoday.com/2009/03/dl380-bios-configuration-for-vmware/#comments</comments> <pubDate>Mon, 16 Mar 2009 20:08:15 +0000</pubDate> <dc:creator>Joshua Townsend</dc:creator> <category><![CDATA[Issues & Troubleshooting]]></category> <category><![CDATA[VMware]]></category> <category><![CDATA[3.5]]></category> <category><![CDATA[compatibility]]></category> <category><![CDATA[DL380]]></category> <category><![CDATA[ESX]]></category> <category><![CDATA[G3]]></category> <category><![CDATA[Patch]]></category> <category><![CDATA[Update 3]]></category> <category><![CDATA[virtual]]></category> <category><![CDATA[virtualization]]></category><guid
isPermaLink="false">http://vmtoday.com/?p=71</guid> <description><![CDATA[One more post to wrap up the nonsense with my DL380 G3 ESX servers&#8230;. Vincent Vlieghe noted that you must make a couple of changes to your DL380 G3s for ESX to work correctly.  His post was written back in 2006 when we were still working with ESX 2.x, but the same appears to be true [...]]]></description> <content:encoded><![CDATA[<p></p><p>One more post to wrap up the nonsense with my DL380 G3 ESX servers&#8230;</p><p><a
href="http://virtrix.blogspot.com">Vincent Vlieghe</a> noted that you must make a couple of changes to your DL380 G3s for ESX to work correctly.  His <a
href="http://virtrix.blogspot.com/2006/07/hp-proliant-and-compaq-mps-table-bios.html">post</a> was written back in 2006 when we were still working with ESX 2.x, but the same appears to be true of ESX 3.5 RTM (<a
href="http://vmtoday.com/2009/03/double-check-the-hcl/">Updates are not supported on this hardware per the HCL</a>).  The changes you must make to the BIOS are:</p><blockquote><p>For stable operation on these systems, ESX Server requires a BIOS MPS Table Mode setting of Full Table APIC. With the exception of the specific systems referenced below, the following BIOS settings must be applied in order if available:</p><ol><li>System Options &gt; OS Selection: Select Windows 2000.</li><li>Advanced Options &gt; MPS Table Mode: Select Full Table APIC.</li><li>When presented with multiple Windows options (Windows 2000, Windows Server 2003, Windows .NET, and so on) select Windows 2000. If both BIOS settings are available and can be modified, both must be set correctly. You should confirm these settings after any BIOS upgrade operation.</li></ol></blockquote><p>I have seen other references saying that you should also disable Hyper-Threading on this platform, but I ran successfully with Hyper-Threading enabled and saw no performance degradation or stability issues.  
I hope this information is helpful to those of you still running these dinosaurs!</p> ]]></content:encoded> <wfw:commentRss>http://vmtoday.com/2009/03/dl380-bios-configuration-for-vmware/feed/</wfw:commentRss> <slash:comments>1</slash:comments> </item> <item><title>Double-Check the HCL</title><link>http://vmtoday.com/2009/03/double-check-the-hcl/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=double-check-the-hcl</link> <comments>http://vmtoday.com/2009/03/double-check-the-hcl/#comments</comments> <pubDate>Thu, 12 Mar 2009 15:57:18 +0000</pubDate> <dc:creator>Joshua Townsend</dc:creator> <category><![CDATA[Issues & Troubleshooting]]></category> <category><![CDATA[Uncategorized]]></category> <category><![CDATA[VMware]]></category> <category><![CDATA[3.5]]></category> <category><![CDATA[best practices]]></category> <category><![CDATA[compatibility]]></category> <category><![CDATA[DL380]]></category> <category><![CDATA[ESX]]></category> <category><![CDATA[G3]]></category> <category><![CDATA[HCL]]></category> <category><![CDATA[HP]]></category> <category><![CDATA[Patch]]></category> <category><![CDATA[Update 3]]></category> <category><![CDATA[virtual]]></category> <category><![CDATA[virtualization]]></category><guid
isPermaLink="false">http://vmtoday.com/?p=56</guid> <description><![CDATA[I wrote some time back about networking problems with a clean install of ESX 3.5 U3 on an HP DL380 G3 server in a lab environment.  A simple downgrade to ESX 3.5 RTM corrected the issue and I didn&#8217;t think much about it.  One of the servers in the lab died and I went about [...]]]></description> <content:encoded><![CDATA[<p></p><p>I wrote some time back about <a
href="http://vmtoday.com/2008/11/networking-problems-with-esx-35-update-3-on-the-dl380-g3/">networking problems with a clean install of ESX 3.5 U3 on an HP DL380 G3 server</a> in a lab environment.  A simple downgrade to ESX 3.5 RTM corrected the issue, and I didn&#8217;t think much about it.  One of the servers in the lab died and I went about the business of rebuilding it.  Having learned my lesson, I started with an ESX 3.5 RTM install and then patched to Update 3 plus other applicable updates.  Much to my chagrin, the server began crapping out on me at random: some reboots, some networking issues, and other assorted not-so-good things.  Now, the DL380 G3 is not the spring chicken it used to be, so I assumed faulty hardware was probably to blame.  Diagnostics and log reviews turned up no hardware issues.</p><p>On a whim, I decided to check the VMware HCL to see if the DL380 G3 was still on the list of compatible servers for ESX.  Now, I had checked, or rather &#8216;remembered&#8217; checking, the HCL before that first problematic install, but a recheck never hurts.  When I arrived at the VMware <a
title="VMware HCL" href="http://www.vmware.com/resources/techresources/458" target="_blank">HCL page</a>, I saw the same trusty old PDF link with a slightly newer revision date than on my previous visit.  When I clicked the PDF link, I was pleasantly surprised to find myself redirected to a <a
title="New VMware HCL" href="http://www.vmware.com/resources/compatibility/search.php" target="_blank">searchable, filterable forms-based version of the HCL</a>.  Nice!  Let&#8217;s do this thing&#8230;.</p><p>I&#8217;m a little lazy, so I simply used a keyword search to look up &#8216;DL380 G3&#8217;.  Presto-chango: I&#8217;ve got results, and I like what I see:</p><div
id="attachment_62" class="wp-caption alignleft" style="width: 383px"> <img
class="size-full wp-image-62" title="dl380g3hclsearch" src="http://cloudfront.vmtoday.com/wp-content/uploads/2009/03/dl380g3hclsearch.png" alt="Search Results for DL380 G3 on the VMware HCL" width="383" height="34" /><p
class="wp-caption-text">Search Results for DL380 G3 on the VMware HCL</p></div><p>My eyes jump right to ESX 3.5 &#8211; Supported, on my platform, no further questions, your honor.  Close the old browser window and move on with my life, my life being troubleshooting this darn server.</p><p>A few hours later, I am still struggling with the server and turn to eBay for salvation.  &#8220;If you can&#8217;t beat &#8217;em, cheat &#8217;em,&#8221; my grandfather used to say.  I&#8217;ll find new hardware for my lab.  I identified another hunk of junk that just might work and decided to check the HCL for it.  That&#8217;s when it jumped out at me: the HCL lists Update versions separately, and I had been too hasty to notice that in my DL380 G3 search.  Back to the HCL.</p><p>This time I search for just &#8216;DL380&#8217;, leaving off the generation suffix, and get the following:</p><div
id="attachment_63" class="wp-caption alignleft" style="width: 382px"> <img
class="size-full wp-image-63" title="DL380 HCL Search" src="http://cloudfront.vmtoday.com/wp-content/uploads/2009/03/dl380hclsearch.png" alt="Search Results for DL380 from the VMware HCL" width="382" height="211" /><p
class="wp-caption-text">Search Results for DL380 from the VMware HCL</p></div><p>The ProLiant DL380 G5 with Quad-core Intel Xeon processors lists ESX 3.5 U3, ESX 3.5 U2, and ESX 3.5 U1 as supported releases, along with ESX 3.5 RTM.  The Update versions are not listed for the G3 or G4.  After some self-deprecating curses and a reinstall of ESX 3.5 Update-nada, stability returned.</p><p>The lesson learned: double-check the HCL (or, if you are a little slow like me, a triple-check doesn&#8217;t hurt).  HCL support is specific to both the major version and the Update revision.  And not all models are treated equally: you&#8217;ll notice in the picture to the left that the DL380 G5 has different supported releases depending on the CPU model.</p><p>Also, keep in mind that you need to verify that every component of your VMware infrastructure is on the HCL, from Servers and Systems to IO Devices and Storage/SAN.  The VMware HCL site offers some basic tips for searching here: <a
title="Help on VMware HCL Search Fields" href="http://www.vmware.com/resources/compatibility/help.php">http://www.vmware.com/resources/compatibility/help.php</a>.</p><p>Here&#8217;s the real takeaway: the VMware HCL is there for a reason.  Sure, you might be able to get something that is not on the HCL to work, but you may experience instability along the way.  If you are running a non-HCL system, you may also find that VMware Support is limited in what it can do for you.</p> ]]></content:encoded> <wfw:commentRss>http://vmtoday.com/2009/03/double-check-the-hcl/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> </channel> </rss>
