network Archives - Page 2 of 2

Update 1: This issue was fixed as of 3/15/2012 in ESXi 5.0 Update 1 per the original knowledge base article: VMware KB 2008144. To download ESXi 5.0/vCenter Server 5.0 Update 1, see the VMware Download Center.

Update 2 (Nov 7 2012): While VMware says the issue was fixed in vSphere 5.0 Update 1, I have not found this to be the case. I’ve had several environments where the bug was still present, and in some cases this nasty bug still appeared when implementing the Failback = NO setting on the vSwitch. Additionally, setting Failback = No may cause some other problems – see my update here for information from Chris Towles.

Update 3 (Nov 8 2012): I received an email from Nick Eggleston about this issue – Nick experienced this problem at a customer site and heard from VMware support that “There is a vswitch failback fix which may be relevant and has not yet shipped in ESXi 5.0 (it postdates 5.0 Update 1 and is currently targeted for fixing in Update 2). However this fix *has* already shipped in the ESXi 5.1 GA release.” Thanks, Nick!

Update 4 (Nov 9 2012): Once I knew what this bug looked like, I’ve found other symptoms that can help you diagnose the issue in your environment. One of the most telling is that the switches (assuming Cisco) carrying your iSCSI traffic may show high Output Queue Drops (OQD) counters when you run a ‘Show Interface Summary’. This is because the packets sent out the wrong vmnic have no business being on the switch they were routed too, so the switch drops the packet. I’ve also noticed that when this bug is causing issues the MAC address table (show mac address table) loses all entries for the iSCSI vmknic’s and in some cases the storage array’s front end ports.

I recently stumbled on two vSphere 5 ESXi networking bugs that I thought I would share. The issues are very similar from a cursory level, but have different symptoms, troubleshooting steps, and implications for your architecture, so I’m going to split the issues into two separate posts. Because troubleshooting these issues was a real pain, I’ll provide some details on how to identify these issues in your environments and wrap up with a third post on what I believe to be some best practices to avoid these same problems and achieve greater redundancy and resiliency in your vSphere environments.

The Problem

Today, we’ll look at an ESXi 5 networking issue that caused massive iSCSI latency, lost iSCSI sessions, and lost network connectivity. I’ve been able to reproduce this issue in several environments, on different hardware configurations. Here’s the background information on how all this started: I upgraded an ESXi 4.1 host to ESXi 5 using vSphere Update Manager (VUM). Note that I did use the host upgrade image that contained the ESXi500-201109001 iSCSI fixes – if you are upgrading to vSphere 5 and have iSCSI in your environment, use this image. Here’s a quick look at how the networking was configured on this host:

The iSCSI networking was configured in a very typical setup, and per best practices, as outline in VMware’s documentation, as well as from many vendors (see EMC’s Chad Sakac’s ‘A Multivendor Post on using iSCSI with VMware vSphere’), with two vmnic uplinks, two vmknics, with one active adapter on the correct layer-2/layer-3 network, and the other unused.

After the upgrade, the standard vSwitch with two vmnics for uplinks (Broadcom NetXtreme II BCM5709 1000Base-T) and two vmknics that serviced the software iSCSI adapter failed to pass traffic (vmkping to the iSCSI targets failed) and could not mount ANY iSCSI LUN’s. VM network, management, and vMotion ports were not affected.

If I let the host sit long enough, it *might* find a couple paths to the storage, but even then performance was deteriorated per the vmkernel.log:

WARNING: ScsiDeviceIO: 1218: Device naa.60026b90003dcebb000003ca4af95792 performance has deteriorated. I/O latency increased from average value of 5619 microseconds to 495292 microseconds.

Troubleshooting

I’m going to dump a whole bunch of my troubleshooting steps on you – hopefully they not only help folks dealing with this particular bug, but help with general network and configuration troubleshooting in VMware vSphere. During troubleshooting, I removed the vmk binding for these two on the iSCSI adapter, removed the software iSCSI Adapter itself, removed the vmknics on the vSwitch, and removed the vSwitch itself. I then recreated the vSwitch, set vSwitch MTU to 9000, recreated two vmk ports, set 9000MTU, assigned IP, and set failover order for multipath iSCSI. I then re-created the software iSCSI adapter and bound the two vmk ports. I was able to pass vmk traffic and mount iSCSI LUN’s. Great – problem solved!?!?! Not so much – I rebooted the host and the problem returned.

Here are my next troubleshooting steps:

I repeated the procedure above and re-gained connectivity, but the problem returns on subsequent reboots. I can verifiably recreate the problem.
I verified end-to-end connectivity for other hosts on the same Layer 1, Layer 2, and Layer 3 network as the iSCSI initiator and iSCSI targets.
I verified the ESXi host’s networking configuration using the vSphere client, double-checking the vSwitch, vmnic uplinks, and vmknic configurations. Everything looked good so I canceled out.
I then reinstalled ESXi from scratch (maybe something was left over from 4.1 that a clean install would weed out), built up the same configuration, and was again able to re-create the problem.
I poured over logs (vmkernel.log, syslog.log and storagerm.log primarily). I could see an intermittent loss of storage connectivity, failure to log into the storage targets (duh – there is no connectivity, no vmkping) and high storage latency on hosts where I had rebuilt the iSCSI stack and run a few VM’s.
I switched out the Broadcom NIC for an Intel NIC (the Broadcom had hardware iSCSI capabilities – I wanted to be sure the hardware iSCSI was not interfering).
I verified TOE was enabled.

The ‘Ah-Ha’ Moment

Next, I verified the ESXi host’s networking configuration using the vSphere client one more time – the properties of the vSwitch, the properties of the vmkernel (vmk) ports, the manual NIC teaming overrides, IP addressing, etc. Everything looked correct – I MADE NO CHANGES – but when I clicked OK (last time I canceled) to close the vSwitch properties and was greeted with this warning:

Wait a second… I didn’t change anything, why am I being prompted with a you’re ‘Changing an iSCSI Initiator Port Group’ warning? I like to live dangerously, and wanted to see what would happen, so I said ‘Yes’.

Much to my surprise, after only viewing and closing the vSwitch and iSCSI vmk port group settings, I was able to complete a vmkping on the iSCSI-bound vmk’s. And moreover, I completed a Rescan of all storage adapters and my iSCSI LUN’s were found, mounted, and ready for use. Problem solved? Nope. The same ugly issue re-appeared after a reboot.

While the problem wasn’t solved, I now had something to work with. My go-to troubleshooting question “What Changed?” could maybe be answered. Even though I didn’t change anything in the vSwitch Properties GUI, something changed. To see what changed in the background, I compared the output of the following ESXi Shell (or vCLI, or PowerCLI) commands before and after making ‘the change’ happen (by viewing the properties of the vSwitch/vmk ports), but found no changes.

esxcfg-vswitch -l
esxcfg-vmknic -l
esxcfg-nics -l

Then, I made backup copy of esx.conf

 cp /etc/vmware/esx.conf /etc/vmware/esx.conf.bak

Then I caused ‘the change’ and then compared checksums using md5sum, but found no differences:

 md5sum /etc/vmware/esx.conf /etc/vmware/esx.conf.bak

I compared the running .conf and the backup .conf, but found no differences:

 diff /etc/vmware/esx.conf /etc/vmware/esx.conf.bak

Call in Air Support
At this point, I was out of ideas so I called for help: “Hello, 1-866-4VMWARE, option 4, option 2 – help!”

After repeating many of the same troubleshooting steps, the support engineer decided that I had hit on a known, and not yet patched, bug. The details of the bug are included in KB 2008144: Incorrect NIC failback occurs when an unused uplink is present. That’s right – my iSCSI traffic, vmkpings, etc were being sent down the wrong NIC – the UNUSED NIC. Ouch. The bug caused the networking stack to behave in a very unpredictable way, making my troubleshooting steps next to useless, and any other advanced troubleshooting ideas I had (sniffing, logs, etc.)

Once I knew what the issue was, I could see a bit of evidence in the logs:

WARNING: VMW_SATP_LSI: satp_lsi_pathIsUsingPreferredController:714:Failed to get volume access control data for path "vmhba33:C0:T0:L4": No connection

NMP: nmp_DeviceUpdatePathStates:547: Activated path "NULL" for NMP device "naa.60026b90003dcebb0000c7454d5cc946".

WARNING: ScsiPath: 3576: Path vmhba33:C0:T0:L4 is being removed

Notice the NULL path – the path can’t be interpreted correctly when being sent down the wrong (unsued) vmnic that is on a different subnet and VLAN. The gotcha on this issue is that I had followed best practices where applicable, and accepted default settings on the vSwitch and vmknics.

The Quick Fix
VMware KB 2008144 offers two workaround for this bug. The quick fix for the problem is to simply change the Failback setting on either the vSwitch running the software iSCSI vmknic’s to “No” (default is yes), or to change the setting on the vmknic itself if you have other port groups on the vSwitch (such as a VM Network port group to give your guest VM’s access to the iSCSI network).

Changing Failback = No on the iSCSI vmknics and then rescanning the storage adapters fix the glitch immediately.

Architecture Changes
The second workaround from VMware is “Do not have any unused NICs present in the team.”. This translates to a slightly different architecture than that described in many documents. To achieve this workaround, the configuration would have to change to two vSwitches, each with a single vmnic uplink and a single vmk port, bound to the iSCSI adapter. This change does not impact redundancy or availability when compared with the single-vSwitch:two-vmk configuration that I was running with as one of the vmnics was set to unused anyway. This workaround does add a bit more complexity, as there are a few more elements to configure, monitor, manage, and document.

This problem seems to only present itself on vSphere Standard Switches (vSwitch), although I could not get confirmation of this (please post a comment if you know!). Assuming this is true, a vDistributed Switch (vDS) could be used for Software iSCSI traffic. Mike Foley has a write-up on how to migrate iSCSI from a vSwitch to a vDS on his blog here: https://www.yelof.com/?p=72.

A Couple More Notes
My troubleshooting fix of viewing the vSwitch settings and clicking ok seemed to temporarily resolve the issues because it triggered an up/down event on the vmk of the unused uplink. This caused the network stack to re-evaluate paths and start using the correct, Active, uplink.

Note that this problem can occur outside of my iSCSI use case – any vSwitch, Port Group, or VMKNIC with an unused adapter set in the NIC Teaming Failover Order are susceptible to this bug, so watch for it on redundant vMotion networks (vMotion randomly fails), VM Network networks (sudden loss of guest connectivity), or even your management network (hosts fall out of manageability from vCenter, and can’t be contacted via SSH, vSphere client, etc.
Leave a comment if you’ve experienced this bug – your notes on the problem may help others find and fix the issue until VMware releases a fix. I understand that a fix for this particular bug is not due out until at least vSphere 5 Update 1.

I’ll have another (shorter) writeup on the 2nd networking bug I found in ESXi 5 later in the week – check back here for a link once it is published.

A reader named Mark contacted me today and asked if there was a way to reduce the size of the batch output from an ESXTOP run. And he asks for good reason: Depending on the number of VM’s on your host, the delay between ESXTOP samplings and the number of samples you collect, using the All Stats option (-a) can yield a massive file in a short period of time. If written to a partition on your ESX Service Console you run the risk of filling the partition, and forget about actually being able to analyze the data in PERFMON or Excel. For example, on an ESX host running ~15 VM’s I produced 100MB worth of CSV using the -a switch, sampling every 15 seconds, for just under 2 hours. ESXTOP uses 10-second intervals by default; I used -d 15 to change the sampling delay. Had I went with the default my output would have been bigger.

To reduce the size of your output, you can change your sampling delay to something larger, say 30-seconds. I suppose you could also capture statistics when the host is not busy so you get fewer characters in the results, but that’s just being goofy. 😉

A better way to reduce your ESXTOP output size is to selectively include only the statistics you are interested in, and is really what Mark was asking. After all, all statistics from ESXTOP can be too many statistics, and chances are you already know what stats you are interested in. Here’s how you can narrow down the collected stats for easier analysis and smaller output:

Enter ESXTOP in interactive mode on the Service Console by simply typing esxtop at the # prompt
Switch to a component you are NOT interested in capturing statistics on by pressing the corresponding menu option (c: ESX cpu, m: ESX memory, d: ESX disk adapter, u: ESX disk device, v: ESX disk VM).
Press f when viewing the component you do not want to capture. A list of fields will be displayed. You can toggle the fields on and off by pressing the letter corresponding to each field. An * indicates that the field is on. You want to turn off all of the fields you don’t want to collect.
Repeat steps 2 & 3 for the remaining components, leaving only what you want to capture.
Switch to the component you want to capture in batch mode and repeat step #3, except you will now enable what you want to capture.
Press W (capital W – case sensitive) to write out the ESXTOP configuration file. You can accept the default or create new configuration files. You may want to create a CPU-only config file, memory-only, and so forth.
Press CTRL+C to stop ESXTOP.
Now, invoke ESXTOP in batch mode, calling your updated or new configuration file you created in step #6 using the -c switch. Here’s an example:# esxtop -b -d 30 -n 480 -c .esxtopcpustats > /tmp/esxtop_cpu_stats.csv where .esxtopcpustats is an ESXTOP config file with only CPU stats. -d sets your capture interval to 30 seconds, and -n sets the number of samples to 480 (or 4 hours with a delay of 30 seconds).

Once your capture is complete you can replay the sampling in ESXTOP using replay mode (-R), or you can copy the .csv to a Windows system and use PERFMON or Excel to analyze the stats. If using PERFMON or Excel you will notice that the system summary information displayed at the top of an interactive ESXTOP session is included in the output (console memory, console cpu, etc.). As far as I know, there is no way to disable this, nor would you want to as it includes the time stamp necessary to interpret your data.

It is possible to use the vSphere CLI or the vSphere Management Assistant (vMA) to run RESXTOP, a version of ESXTOP designed for remote administration of ESXi or ESX. You may note, however, RESXTOP from the vSphere CLI only works from a Linux client. Using either of these tools will help you to automate ESXTOP statistics collection from multiple hosts using customized configuration files.