<?xml version="1.0" encoding="UTF-8"?> <rss
version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
><channel><title>VMtoday &#187; networking</title> <atom:link href="http://vmtoday.com/tag/networking/feed/" rel="self" type="application/rss+xml" /><link>http://vmtoday.com</link> <description>VMware News, Views, &#38; How-To&#039;s from vExpert Josh Townsend</description> <lastBuildDate>Fri, 18 May 2012 19:03:15 +0000</lastBuildDate> <language>en</language> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <generator>http://wordpress.org/?v=3.3.2</generator> <item><title>vSphere 5 Networking Bug #2 Affects Management Network Connectivity</title><link>http://vmtoday.com/2012/02/vsphere-5-networking-bug-2-affects-management-network-connectivity/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=vsphere-5-networking-bug-2-affects-management-network-connectivity</link> <comments>http://vmtoday.com/2012/02/vsphere-5-networking-bug-2-affects-management-network-connectivity/#comments</comments> <pubDate>Fri, 10 Feb 2012 19:06:16 +0000</pubDate> <dc:creator>Joshua Townsend</dc:creator> <category><![CDATA[Issues & Troubleshooting]]></category> <category><![CDATA[VMware]]></category> <category><![CDATA[bug]]></category> <category><![CDATA[connectivity]]></category> <category><![CDATA[esxi 5]]></category> <category><![CDATA[networking]]></category> <category><![CDATA[troubleshooting]]></category> <category><![CDATA[vDS]]></category> <category><![CDATA[vsphere 5]]></category> <category><![CDATA[vswitch]]></category><guid
isPermaLink="false">http://vmtoday.com/?p=873</guid> <description><![CDATA[On Wednesday, I wrote about a VMware vSphere 5 networking bug that caused issues with iSCSI networking.  That bug, described in VMware KB 2008144 caused vmk traffic to be sent over the unused vmnic uplink in a team where there is an unused uplink and an explicit failover policy present.  See the diagram below to [...]]]></description> <content:encoded><![CDATA[<p></p><p>On Wednesday, I wrote about <a
href="../2012/02/vsphere-5-networking-bug-affects-software-iscsi/">a VMware vSphere 5 networking bug that caused issues with iSCSI networking</a>.  That bug, described in <a
href="http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&amp;cmd=displayKC&amp;externalId=2008144">VMware KB 2008144</a>, caused vmk traffic to be sent over the unused <strong><span
style="text-decoration: underline;">vmnic</span></strong> uplink in a team where there is an unused uplink and an explicit failover policy present.  See the diagram below to better understand what was going on there….</p><p><a
href="http://cloudfront.vmtoday.com/wp-content/uploads/2012/02/bug1-diagram.jpg" rel="lightbox[873]"><img
class="aligncenter size-full wp-image-874" title="Incorrect NIC failback occurs when an unused uplink is present" src="http://cloudfront.vmtoday.com/wp-content/uploads/2012/02/bug1-diagram.jpg" alt="Incorrect NIC failback occurs when an unused uplink is present" width="559" height="473" /></a></p><p>The second vSphere 5 networking bug I experienced was similar to the first: traffic was sent out of an unexpected interface after upgrading to ESXi 5.  This particular bug surfaced while I was troubleshooting my iSCSI bug (because why not have two unrelated bugs at the same time).  Many of the troubleshooting steps I used on the first networking bug applied to this one, so I won’t bore you again with the details.  I will, however, give you a quick overview of the network setup in which this issue appeared.</p><p><strong><span
style="text-decoration: underline;">Configuration</span></strong></p><p>Here’s layer 1 connectivity for ESXi host vmnics to the switching stack.</p><p><a
href="http://cloudfront.vmtoday.com/wp-content/uploads/2012/02/layer11.jpg" rel="lightbox[873]"><img
class="aligncenter size-full wp-image-877" title="Layer 1 Connectivity for ESXi" src="http://cloudfront.vmtoday.com/wp-content/uploads/2012/02/layer11.jpg" alt="" width="513" height="245" /></a></p><p>&nbsp;</p><p>Here’s the ESXi network config:</p><p><a
href="http://cloudfront.vmtoday.com/wp-content/uploads/2012/02/network_config_bug2.jpg" rel="lightbox[873]"><img
class="aligncenter size-full wp-image-876" title="ESXi Networking Configuration" src="http://cloudfront.vmtoday.com/wp-content/uploads/2012/02/network_config_bug2.jpg" alt="ESXi Networking Configuration" width="409" height="483" /></a></p><p>The specific portion of the configuration that was impacted by this bug was vSwitch0, which contained vmnic0 &amp; vmnic1, my Management Network vmknic and vMotion vmk port group.  The Management and vMotion port groups had a manually set failover order as pictured below:</p><p><a
href="http://cloudfront.vmtoday.com/wp-content/uploads/2012/02/mgmt_vmotion_config.jpg" rel="lightbox[873]"><img
class="aligncenter size-full wp-image-878" title="Management and vMotion Network Config on Same vSwitch" src="http://cloudfront.vmtoday.com/wp-content/uploads/2012/02/mgmt_vmotion_config.jpg" alt="Management and vMotion Network Config on Same vSwitch" width="581" height="512" /></a></p><p>This is all a pretty standard network configuration for a VMware ESXi host with 6 physical network adapters, and follows best practice for management network redundancy for VMware HA (I highly recommend reading more on HA best practices in Duncan Epping and Frank Denneman’s <a
href="http://www.amazon.com/gp/product/1463658133/ref=as_li_ss_tl?ie=UTF8&amp;tag=vm09-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=1463658133">VMware vSphere 5 Clustering Technical Deepdive</a> book).<br
/> <a
href="http://www.amazon.com/gp/product/1463658133/ref=as_li_ss_il?ie=UTF8&amp;tag=vm09-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=1463658133"><img
src="http://ws.assoc-amazon.com/widgets/q?_encoding=UTF8&amp;Format=_SL110_&amp;ASIN=1463658133&amp;MarketPlace=US&amp;ID=AsinImage&amp;WS=1&amp;tag=vm09-20&amp;ServiceVersion=20070822" alt="" border="0" /></a></p><p>&nbsp;</p><p><strong><span
style="text-decoration: underline;">The Problem</span></strong></p><p>The problems that were manifested as a result of the bug were:</p><ul><li>ESXi hosts would intermittently fall out of manageability by vCenter, the vSphere Client, and SSH (which was enabled from the console of the hosts).  Management connectivity could be restored (most times) by restarting the ESXi Management Network from the console.  I could usually ping the management network IP address even though the host was not manageable.</li><li>ESXi syslogs stopped being sent to the vCenter Syslog collector.</li><li>vMotion between hosts in the cluster intermittently worked.  vMotion success was not always in sync with management network connectivity.  vMotion capabilities could be restored by restarting the ESXi Management Network from the console.</li><li>As an added bonus, VMware High Availability (HA) would sometimes detect host failures and restart VM’s on the surviving HA nodes.</li></ul><p>Notice my use of ‘intermittently, usually, and sometimes’ – this made for tough troubleshooting.  If you’re gonna fail, fail big.  None of this wimpy on-again, off-again nonsense.</p><p><strong><span
style="text-decoration: underline;">The Resolution</span></strong></p><p>Luckily, I had VMware support on the phone as this problem appeared.  The support tech seemed to know just what the problem was:</p><p><em>A known issue on ESXi 5 occurs when two or more vmkernel NIC’s (vmknic) are on the same <strong>standard</strong> vSwitch.  Under this configuration traffic may be sent out the incorrect vmknic.</em></p><p>As far as I am aware, there is no VMware Knowledgebase article for this issue yet (comment if you know of one), so details are based on my own conversation with the support engineers working the case.  From what I was able to infer, this bug:</p><ul><li>Appears more often when ESXi hosts are under stress (my iSCSI networking bug really stressed out the hosts – and me &#8211; when all paths were down).</li><li>Seems to happen more on Broadcom NIC’s than on Intel NIC’s.</li><li>Is triggered and/or fixed by a network up/down event (such as restarting the management network on the host).</li><li>Does NOT happen with a Distributed Virtual Switch.</li><li>Is scheduled to be patched with or after vSphere 5 Update 1.</li></ul><p>Whereas my iSCSI bug involved a vmnic team with an unused uplink, and traffic being sent out <strong>the wrong vmnic</strong>, this second bug occurred with two vmnic’s (one active, one standby), two vmk ports on a standard switch, and traffic being sent out <strong>the wrong vmk port, which happened to have a different active vmnic than the correct vmk port had</strong>.<strong>  Here’s a diagram of the traffic flow gone wrong:</strong></p><p><a
href="http://cloudfront.vmtoday.com/wp-content/uploads/2012/02/Wrong_mgmt_vmotion_config.jpg" rel="lightbox[873]"><img
class="aligncenter size-full wp-image-882" title="Wrong_mgmt_vmotion_config" src="http://cloudfront.vmtoday.com/wp-content/uploads/2012/02/Wrong_mgmt_vmotion_config.jpg" alt="Network Traffic Uses the wrong VMkernel port on ESXi 5" width="581" height="512" /></a></p><p>I still find it a bit odd that ICMP traffic continued to flow to the interface, but that the Management traffic took an alternate route out and landed on my non-routed vMotion VLAN (and different subnet).</p><p><strong><span
style="text-decoration: underline;">Workaround</span></strong></p><p>The workaround for this bug is simple – remove the second NIC and second VMkernel Port (vMotion for me) from the vSwitch and restart the ESXi Management Network.  Once this was done, management traffic flowed normally.</p><p>I then created a new vSwitch, attached the second vmnic to it, and then re-created the VMkernel port for vMotion.</p><p>While the work-around was great for getting my hosts back into manageability, it was not so great for the redundant architecture I had originally implemented.  After splitting the VMkernel ports onto two different vSwitches, I received warnings in vCenter that “<code>Host  currently has no management network redundancy.</code>”  <a
href="http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&amp;cmd=displayKC&amp;externalId=1004700">KB 1004700</a> addresses this message if you are looking for more info on it.  I could disable the warning, but that would be like slapping a fresh coat of paint on a jalopy.</p><p><strong><span
style="text-decoration: underline;">Architecture Changes</span></strong></p><p>The workaround for this bug kills redundancy.  The simplest way to restore it is to add another two physical NIC’s and, in my case, bind one to the Management vSwitch and one to the vMotion vSwitch.  This change would require host downtime to install new hardware if your host only has 6 NIC’s, like this environment did.</p><p>Alternatively, you could <a
href="http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&amp;docType=kc&amp;externalId=1010614&amp;sliceId=1&amp;docTypeID=DT_KB_1_1&amp;dialogID=287196402&amp;stateId=1%200%20287198946">migrate your Management network and vMotion networks to a virtual Distributed Switch</a> (vDS) as this bug does not appear to impact vDS – only standard vSwitches.  Side note: Check <a
href="http://www.yellow-bricks.com/2012/02/08/distributed-vswitches-and-vcenter-outage-whats-the-deal/">Duncan Epping’s post on using a virtual vCenter server connected to a vDS</a> if that’s holding you back from going to a vDS.  Also read the new <a
href="http://www.vmware.com/resources/techresources/10250">vDS Best Practices whitepaper</a> from VMware.</p><p><strong><span
style="text-decoration: underline;">Final Note</span></strong></p><p>This bug could impact more configurations than the one I highlighted.  For example, I could see it causing issues with <a
href="http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&amp;docType=kc&amp;externalId=2007467&amp;sliceId=1&amp;docTypeID=DT_KB_1_1&amp;dialogID=287196402&amp;stateId=1%200%20287198946">Multiple-NIC vMotion in vSphere 5</a>.</p><p>Drop a comment if you have experienced this bug, know of a KB article, or can think of any other ways it might manifest.</p> ]]></content:encoded> <wfw:commentRss>http://vmtoday.com/2012/02/vsphere-5-networking-bug-2-affects-management-network-connectivity/feed/</wfw:commentRss> <slash:comments>9</slash:comments> </item> <item><title>vSphere 5 Networking Bug Affects Software iSCSI</title><link>http://vmtoday.com/2012/02/vsphere-5-networking-bug-affects-software-iscsi/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=vsphere-5-networking-bug-affects-software-iscsi</link> <comments>http://vmtoday.com/2012/02/vsphere-5-networking-bug-affects-software-iscsi/#comments</comments> <pubDate>Wed, 08 Feb 2012 20:33:54 +0000</pubDate> <dc:creator>Joshua Townsend</dc:creator> <category><![CDATA[Issues & Troubleshooting]]></category> <category><![CDATA[VMware]]></category> <category><![CDATA[VMware How To]]></category> <category><![CDATA[esxi 5]]></category> <category><![CDATA[iscsi]]></category> <category><![CDATA[network]]></category> <category><![CDATA[networking]]></category> <category><![CDATA[NIC]]></category> <category><![CDATA[troubleshooting]]></category> <category><![CDATA[vDS]]></category> <category><![CDATA[vmknic]]></category> <category><![CDATA[vmnic]]></category> <category><![CDATA[vsphere 5]]></category> <category><![CDATA[vswitch]]></category><guid
isPermaLink="false">http://vmtoday.com/?p=854</guid> <description><![CDATA[Update: This issue was fixed as of 3/15/2012 in ESXi 5.0 Update 1 per the original knowledge base article: VMware KB 2008144. To download ESXi 5.0/vCenter Server 5.0 Update 1, see the VMware Download Center.  &#160; I recently stumbled on two vSphere 5 ESXi networking bugs that I thought I would share. The issues are very [...]]]></description> <content:encoded><![CDATA[<p></p><p><strong>Update: This issue was fixed as of 3/15/2012 in ESXi 5.0 Update 1 per the original knowledge base article: <a
href="http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&amp;cmd=displayKC&amp;externalId=2008144">VMware KB 2008144</a>. To download ESXi 5.0/vCenter Server 5.0 Update 1, see the <a
href="http://downloads.vmware.com/d/info/datacenter_cloud_infrastructure/vmware_vsphere/5_0" target="_blank">VMware Download Center</a>. </strong></p><p>&nbsp;</p><p>I recently stumbled on two vSphere 5 ESXi networking bugs that I thought I would share. The issues look very similar at a cursory level, but have different symptoms, troubleshooting steps, and implications for your architecture, so I’m going to split them into two separate posts. Because troubleshooting these issues was a real pain, I’ll provide some details on how to identify them in your environments and wrap up with a third post on what I believe to be some best practices to avoid these same problems and achieve greater redundancy and resiliency in your vSphere environments.</p><p><strong><span
style="text-decoration: underline;">The Problem</span></strong></p><p>Today, we’ll look at an ESXi 5 networking issue that caused massive iSCSI latency, lost iSCSI sessions, and lost network connectivity. I’ve been able to reproduce this issue in several environments, on different hardware configurations. Here’s the background information on how all this started: I upgraded an ESXi 4.1 host to ESXi 5 using vSphere Update Manager (VUM). Note that I did use the host upgrade image that contained the <a
href="http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&amp;cmd=displayKC&amp;externalId=2007108">ESXi500-201109001 iSCSI fixes</a> – if you are upgrading to vSphere 5 and have iSCSI in your environment, use this image. Here’s a quick look at how the networking was configured on this host:</p><p>The iSCSI networking was configured in a very typical setup, and per best practices, as outlined in <a
href="http://pubs.vmware.com/vsphere-50/topic/com.vmware.vsphere.storage.doc_50/GUID-8AE88758-20C1-4873-99C7-181EF9ACFA70.html">VMware’s documentation</a>, as well as from many vendors (see EMC’s Chad Sakac’s ‘<a
href="http://virtualgeek.typepad.com/virtual_geek/2009/09/a-multivendor-post-on-using-iscsi-with-vmware-vsphere.html">A Multivendor Post on using iSCSI with VMware vSphere</a>’), with two vmnic uplinks, two vmknics, with one active adapter on the correct layer-2/layer-3 network, and the other unused.</p><p><a
href="http://cloudfront.vmtoday.com/wp-content/uploads/2012/02/iSCSI1-config1.jpg" rel="lightbox[854]"><img
class="aligncenter size-full wp-image-864" title="vSwitch iSCSI vmknic override failover order with unused NIC" src="http://cloudfront.vmtoday.com/wp-content/uploads/2012/02/iSCSI1-config1.jpg" alt="vSwitch iSCSI vmknic override failover order with unused NIC" width="533" height="602" /></a><a
href="http://cloudfront.vmtoday.com/wp-content/uploads/2012/02/iscsi2-config1.jpg" rel="lightbox[854]"><img
class="aligncenter size-full wp-image-863" title="vSwitch iSCSI vmknic override failover order with unused NIC" src="http://cloudfront.vmtoday.com/wp-content/uploads/2012/02/iscsi2-config1.jpg" alt="vSwitch iSCSI vmknic override failover order with unused NIC" width="533" height="602" /></a></p><p>After the upgrade, the standard vSwitch with two vmnics for uplinks (Broadcom NetXtreme II BCM5709 1000Base-T) and two vmknics that serviced the software iSCSI adapter failed to pass traffic (vmkping to the iSCSI targets failed) and could not mount ANY iSCSI LUN&#8217;s. VM network, management, and vMotion ports were not affected.</p><p>If I let the host sit long enough, it *might* find a couple paths to the storage, but even then performance was deteriorated per the vmkernel.log:</p><pre>WARNING: ScsiDeviceIO: 1218: Device naa.60026b90003dcebb000003ca4af95792 performance has deteriorated. I/O latency increased from average value of 5619 microseconds to 495292 microseconds.</pre><p><strong><span
style="text-decoration: underline;">Troubleshooting</span></strong></p><p>I’m going to dump a whole bunch of my troubleshooting steps on you – hopefully they not only help folks dealing with this particular bug, but help with general network and configuration troubleshooting in VMware vSphere. During troubleshooting, I removed the vmk binding for the two vmknics on the iSCSI adapter, removed the software iSCSI Adapter itself, removed the vmknics on the vSwitch, and removed the vSwitch itself. I then recreated the vSwitch, set the vSwitch MTU to 9000, recreated the two vmk ports, set their MTU to 9000, assigned IP addresses, and set the failover order for multipath iSCSI. I then re-created the software iSCSI adapter and bound the two vmk ports. I was able to pass vmk traffic and mount iSCSI LUN&#8217;s. Great – problem solved!?!?! Not so much &#8211; I rebooted the host and the problem returned.</p><p>Here are my next troubleshooting steps:</p><ul><li>I repeated the procedure above and regained connectivity, but the problem returned on subsequent reboots. I could reliably re-create the problem.</li><li>I verified end-to-end connectivity for other hosts on the same Layer 1, Layer 2, and Layer 3 network as the iSCSI initiator and iSCSI targets.</li><li>I verified the ESXi host’s networking configuration using the vSphere client, double-checking the vSwitch, vmnic uplinks, and vmknic configurations. Everything looked good so I canceled out.</li><li>I then reinstalled ESXi from scratch (maybe something was left over from 4.1 that a clean install would weed out), built up the same configuration, and was again able to re-create the problem.</li><li>I pored over logs (vmkernel.log, syslog.log and storagerm.log primarily). 
I could see an intermittent loss of storage connectivity, failure to log into the storage targets (duh – there is no connectivity, no vmkping) and high storage latency on hosts where I had rebuilt the iSCSI stack and run a few VM’s.</li><li>I switched out the Broadcom NIC for an Intel NIC (the Broadcom had hardware iSCSI capabilities – I wanted to be sure the hardware iSCSI was not interfering).</li><li>I verified TOE was enabled.</li></ul><p><strong><span
style="text-decoration: underline;">The ‘Ah-Ha’ Moment</span></strong></p><p>Next, I verified the ESXi host’s networking configuration using the vSphere client one more time – the properties of the vSwitch, the properties of the vmkernel (vmk) ports, the manual NIC teaming overrides, IP addressing, etc. Everything looked correct – I MADE NO CHANGES – but when I clicked <strong><span
style="text-decoration: underline;">OK</span></strong> (last time I canceled) to close the vSwitch properties, I was greeted with this warning:</p><p><a
href="http://cloudfront.vmtoday.com/wp-content/uploads/2012/02/changing-an-iscsi-initiator-port-group-warning.jpg" rel="lightbox[854]"><img
class="aligncenter size-full wp-image-855" title="changing an iscsi initiator port group warning" src="http://cloudfront.vmtoday.com/wp-content/uploads/2012/02/changing-an-iscsi-initiator-port-group-warning.jpg" alt="changing an iscsi initiator port group warning" width="480" height="214" /></a></p><p>Wait a second… I didn’t change anything, so why am I being prompted with a ‘Changing an iSCSI Initiator Port Group’ warning? I like to live dangerously, and wanted to see what would happen, so I said ‘Yes’.</p><p>Much to my surprise, after only viewing and closing the vSwitch and iSCSI vmk port group settings, I was able to complete a vmkping on the iSCSI-bound vmk’s. Moreover, I completed a Rescan of all storage adapters and my iSCSI LUN’s were found, mounted, and ready for use. Problem solved? Nope. The same ugly issue re-appeared after a reboot.</p><p>While the problem wasn’t solved, I now had something to work with. My go-to troubleshooting question “What Changed?” could maybe be answered. Even though I didn’t change anything in the vSwitch Properties GUI, something changed. To see what changed in the background, I compared the output of the following ESXi Shell (or vCLI, or PowerCLI) commands before and after making ‘the change’ happen (by viewing the properties of the vSwitch/vmk ports), but found no changes.</p><ul><li>esxcfg-vswitch -l</li><li>esxcfg-vmknic -l</li><li>esxcfg-nics -l</li></ul><p>Then, I made a backup copy of esx.conf:</p><pre> cp /etc/vmware/esx.conf /etc/vmware/esx.conf.bak</pre><p>Then I caused ‘the change’ and compared checksums using md5sum, but found no differences:</p><pre> md5sum /etc/vmware/esx.conf /etc/vmware/esx.conf.bak</pre><p>I also compared the running .conf and the backup .conf with diff, and again found no differences:</p><pre> diff /etc/vmware/esx.conf /etc/vmware/esx.conf.bak</pre><p><strong><span
style="text-decoration: underline;">Call in Air Support</span></strong><br
/> At this point, I was out of ideas so I called for help: “Hello, 1-866-4VMWARE, option 4, option 2 – help!”</p><p>After repeating many of the same troubleshooting steps, the support engineer decided that I had hit on a known, and not yet patched, bug. The details of the bug are included in <a
title="Incorrect NIC failback occurs when an unused uplink is present" href="http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&amp;cmd=displayKC&amp;externalId=2008144" target="_blank">KB 2008144: Incorrect NIC failback occurs when an unused uplink is present</a>. That’s right – my iSCSI traffic, vmkpings, etc. were being sent down the wrong NIC – the <em>UNUSED</em> NIC. Ouch. The bug caused the networking stack to behave in a very unpredictable way, making my troubleshooting steps, and any other advanced troubleshooting ideas I had (sniffing, logs, etc.), next to useless.</p><p>Once I knew what the issue was, I could see a bit of evidence in the logs:</p><pre>WARNING: VMW_SATP_LSI: satp_lsi_pathIsUsingPreferredController:714:Failed to get volume access control data for path "vmhba33:C0:T0:L4": No connection

NMP: nmp_DeviceUpdatePathStates:547: Activated path "<span style="color: #ff0000;">NULL</span>" for NMP device "naa.60026b90003dcebb0000c7454d5cc946".

WARNING: ScsiPath: 3576: Path vmhba33:C0:T0:L4 is being removed</pre><p>Notice the <span
style="color: #ff0000;">NULL</span> path – the path can’t be interpreted correctly when being sent down the wrong (unused) vmnic that is on a different subnet and VLAN. The gotcha on this issue is that I had followed best practices where applicable, and accepted default settings on the vSwitch and vmknics.</p><p><strong><span
style="text-decoration: underline;">The Quick Fix</span></strong><br
/> <a
title="Incorrect NIC failback occurs when an unused uplink is present" href="http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&amp;cmd=displayKC&amp;externalId=2008144" target="_blank">VMware KB 2008144</a> offers two workarounds for this bug. The quick fix is to change the Failback setting to “<strong>No</strong>” (the default is Yes), either on the vSwitch running the software iSCSI vmknic’s or, if you have other port groups on the vSwitch (such as a VM Network port group to give your guest VM’s access to the iSCSI network), on the vmknic itself.</p><p><a
href="http://cloudfront.vmtoday.com/wp-content/uploads/2012/02/failback-No.jpg" rel="lightbox[854]"><img
class="aligncenter size-full wp-image-859" title="Change vSwitch or Portgroup Failback" src="http://cloudfront.vmtoday.com/wp-content/uploads/2012/02/failback-No.jpg" alt="Change vSwitch or Portgroup Failback" width="536" height="663" /></a></p><p>Setting Failback to No on the iSCSI vmknics and then rescanning the storage adapters fixed the glitch immediately.</p><p><strong><span
style="text-decoration: underline;">Architecture Changes</span></strong><br
/> The second workaround from VMware is “Do not have any unused NICs present in the team.” This translates to a slightly different architecture than that described in many documents. To achieve this workaround, the configuration would have to change to two vSwitches, each with a single vmnic uplink and a single vmk port, bound to the iSCSI adapter. This change does not impact redundancy or availability when compared with the single-vSwitch:two-vmk configuration that I was running, since one of the vmnics was set to unused anyway. This workaround does add a bit more complexity, as there are a few more elements to configure, monitor, manage, and document.</p><p>This problem seems to only present itself on vSphere Standard Switches (vSwitch), although I could not get confirmation of this (please post a comment if you know!). Assuming this is true, a vSphere Distributed Switch (vDS) could be used for Software iSCSI traffic. Mike Foley has a write-up on how to migrate iSCSI from a vSwitch to a vDS on his blog here: <a
title="Dr. iSCSI or How I learned to stop worrying and love virtual distributed switches on vSphere V5" href="http://www.yelof.com/?p=72" target="_blank">http://www.yelof.com/?p=72</a>.</p><p><strong><span
style="text-decoration: underline;">A Couple More Notes</span></strong><br
/> My troubleshooting fix of viewing the vSwitch settings and clicking OK seemed to temporarily resolve the issues because it triggered an up/down event on the vmk of the unused uplink. This caused the network stack to re-evaluate paths and start using the correct, Active, uplink.</p><p>Note that this problem can occur outside of my iSCSI use case – any vSwitch, Port Group, or VMKNIC with an unused adapter set in the NIC Teaming Failover Order is susceptible to this bug, so watch for it on redundant vMotion networks (vMotion randomly fails), VM Network networks (sudden loss of guest connectivity), or even your management network (hosts fall out of manageability from vCenter, and can’t be contacted via SSH, vSphere client, etc.).<br
/> Leave a comment if you’ve experienced this bug – your notes on the problem may help others find and fix the issue until VMware releases a fix. I understand that a fix for this particular bug is not due out until at least vSphere 5 Update 1.</p><p>I&#8217;ll have another (shorter) writeup on the 2nd networking bug I found in ESXi 5 later in the week &#8211; check back here for a link once it is published.</p> ]]></content:encoded> <wfw:commentRss>http://vmtoday.com/2012/02/vsphere-5-networking-bug-affects-software-iscsi/feed/</wfw:commentRss> <slash:comments>26</slash:comments> </item> <item><title>VMware Networking Demystified</title><link>http://vmtoday.com/2009/03/vmware-networking-demysified/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=vmware-networking-demysified</link> <comments>http://vmtoday.com/2009/03/vmware-networking-demysified/#comments</comments> <pubDate>Fri, 20 Mar 2009 15:02:26 +0000</pubDate> <dc:creator>Joshua Townsend</dc:creator> <category><![CDATA[VMware]]></category> <category><![CDATA[VMware How To]]></category> <category><![CDATA[ESX]]></category> <category><![CDATA[esxi]]></category> <category><![CDATA[network]]></category> <category><![CDATA[networking]]></category> <category><![CDATA[pswitch]]></category> <category><![CDATA[switch]]></category> <category><![CDATA[virtual]]></category> <category><![CDATA[Virtual Machine]]></category> <category><![CDATA[virtualization]]></category> <category><![CDATA[vlan]]></category> <category><![CDATA[vswitch]]></category><guid
isPermaLink="false">http://vmtoday.com/?p=76</guid> <description><![CDATA[VMware vExpert and fellow Northern Virginian, Ken Cline, has posted an excellent article on his Ken&#8217;s Virtual Reality blog that aims to demystify VMware networking.  The article, the first in a new series by Ken, provides an overview of networking in an ESX/ESXi environment and breaks down the intricacies of the vSwitch and VLANs.  The [...]]]></description> <content:encoded><![CDATA[<p></p><p>VMware vExpert and fellow Northern Virginian, Ken Cline, has posted an excellent <a
title="The Great vSwitch Debate – Part 1" href="http://kensvirtualreality.wordpress.com/2009/03/17/the-great-vswitch-debate-%E2%80%93-part-1/" target="_blank">article</a> on his <a
title="Ken's Virtual Reality Blog" href="http://kensvirtualreality.wordpress.com" target="_blank">Ken&#8217;s Virtual Reality</a> blog that aims to demystify VMware networking.  The article, the first in a new series by Ken, provides an overview of networking in an ESX/ESXi environment and breaks down the intricacies of the vSwitch and VLANs.  The article comes complete with some nifty diagrams to help make sense of the topic. The timing of this article is great for me as it helps to frame my thoughts as I delve into the design of my latest VMware project on an IBM BladeCenter with IP SAN storage.</p><p>Great article, Ken!  I look forward to reading the rest of the series.</p> ]]></content:encoded> <wfw:commentRss>http://vmtoday.com/2009/03/vmware-networking-demysified/feed/</wfw:commentRss> <slash:comments>1</slash:comments> </item> </channel> </rss>
