vSphere Upgrade Breaks Active Directory

Beware of the Windows Ghost NIC

I recently completed a VMware VI 3.5 to vSphere upgrade in a small environment (5 hosts, 80 VMs). Being a small environment, the upgrade was planned for one big overnight blitz. Unfortunately, the environment was too small to have a test environment in which to uncover potential issues before the upgrade. The upgrade to vSphere itself went swimmingly (the vCenter server had been upgraded a couple of weeks earlier). However, some things in the environment started to go wonky once the upgrade was complete. Specifically, name resolution (DNS), DHCP, WINS, Group Policy, and really anything related to Microsoft Active Directory just did not work.

Let me explain a bit about the environment so you can better understand what the problem was and how it was corrected. The environment was an all-Microsoft shop, except for VMware of course. The company follows a virtualize-first policy and is about 90% virtualized, including the Active Directory Domain Controllers. The DCs run Windows 2008 and serve up DHCP, DNS, and WINS in addition to their Directory Services roles.

The problems really began after I upgraded the virtual hardware version from v4 to v7 (check out page 97 of the vSphere Upgrade Guide for the upgrade procedure). When a Windows server is upgraded from VMware Hardware Version 4 to 7, the VMware Upgrade Helper Service handles the reconfiguration of network adapters on the upgraded virtual machine. The VMware Upgrade Helper Service is installed with VMware Tools and is one of the reasons, along with getting drivers installed for the new hardware, for upgrading VMware Tools before upgrading the hardware version. If you review the Event Viewer Application log on an upgraded machine you will see several entries from VMUpgradeHelper (Source) with several different Event IDs (26, 280, 272, 108, and 105). An examination of these events shows that the VMware Upgrade Helper service 1) backs up the network configuration at OS shutdown, 2) starts automatically with the OS, 3) checks the device ID of the network adapter, and 4) if the device ID has changed (as a result of a hardware upgrade), restores the backed-up configuration and logs Event ID 269.

This behavior should be transparent for most configurations, with the exception of a slightly longer boot time following the upgrade. However, I did notice a few problems with the NIC settings being restored under certain conditions. First, on servers with a statically configured IPv4 stack, IP addresses and DNS server addresses were restored, but the WINS server addresses were not. I suspect this is an oversight in the VMware Upgrade Helper service, but it is probably not a major issue for many servers/environments as WINS is infrequently used. However, when a WINS server loses the setting that points it at itself as its own WINS server, bad things happen. There are several ways to correct this: scripts, DHCP Options, etc. In the end, this wasn’t really a show stopper for me in this small environment.
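As one illustration of the script route, here is a rough Python sketch that builds the netsh command lines for putting static WINS servers back on a connection. The connection name and addresses are made-up placeholders, and the positional `netsh interface ip set wins` / `add wins` form is an assumption on my part about what your Windows version accepts, so try it on one server before trusting it.

```python
import subprocess


def wins_restore_commands(connection, wins_servers):
    """Build netsh command lines that set static WINS servers on one connection.

    The first server replaces the current setting ("set wins"); any further
    servers are appended in order ("add wins ... index=N").
    """
    if not wins_servers:
        raise ValueError("at least one WINS server address is required")
    commands = [["netsh", "interface", "ip", "set", "wins",
                 connection, "static", wins_servers[0]]]
    for index, server in enumerate(wins_servers[1:], start=2):
        commands.append(["netsh", "interface", "ip", "add", "wins",
                         connection, server, "index=%d" % index])
    return commands


# Placeholder connection name and addresses, for illustration only.
for command in wins_restore_commands("Local Area Connection",
                                     ["192.168.10.5", "192.168.10.6"]):
    # list2cmdline quotes arguments that contain spaces.
    print(subprocess.list2cmdline(command))
```

The sketch only prints the commands. Running them on the guest (for example with `subprocess.check_call`) is left out on purpose so nothing changes until you have reviewed the output.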

The second, and bigger, issue for me was that after the virtual hardware was upgraded and the VMware Upgrade Helper Service did its job, my Active Directory and related services were not available. DNS was not functioning, DHCP was not handing out addresses, and I couldn’t connect to AD using ADUC, GPMC or LDAP. It took me a few minutes to figure out what was going on. This seems to be what happened: the virtual hardware upgrade caused a new virtual network adapter to be installed in the VM and all of the settings, including the MAC address, to be restored. The HW v4 NIC was removed from the machine, but Windows held onto the device as a ‘ghost NIC’ in Device Manager. The core AD services, including DNS and DHCP, were still attempting to bind to the ghost NIC. This behavior persisted through service restarts and reboots of the guest. It wasn’t until I examined the IP configuration on the new NIC and clicked Apply (instead of canceling out) that I was prompted with a message indicating that there was more than one network interface configured with the same IP address, cluing me in to the solution.
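The duplicate-address condition that Windows warned about is easy to express in code. The Python sketch below takes an inventory of adapters and their static addresses and reports any address held by more than one adapter. The adapter names and addresses are invented for illustration, and how you collect the inventory (including non-present adapters) is left open.

```python
from collections import defaultdict


def find_duplicate_ips(adapters):
    """Return {ip: [adapter names]} for every IP configured on 2+ adapters."""
    owners = defaultdict(list)
    for name, addresses in adapters.items():
        for ip in addresses:
            owners[ip].append(name)
    return {ip: names for ip, names in owners.items() if len(names) > 1}


# Hypothetical inventory: the old (ghost) NIC still carries the static
# address that was restored onto the new NIC.
inventory = {
    "Old NIC (non-present)": ["192.168.10.5"],
    "New NIC": ["192.168.10.5"],
    "Backup NIC": ["10.1.1.5"],
}

for ip, names in find_duplicate_ips(inventory).items():
    print("%s is configured on: %s" % (ip, ", ".join(names)))
```

Any address this reports is a candidate for the ghost NIC cleanup.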

The error message should be familiar to anyone who has performed a Physical-to-Virtual migration (P2V) and is easily corrected by removing the old device through Windows Device Manager.  The device is hidden so you first have to expose it before deleting it.  Check http://support.microsoft.com/kb/315539 for details or simply follow my instructions below.  To expose the non-present NIC, open a command prompt and enter:

set devmgr_show_nonpresent_devices=1

You can then open Device Manager (enter devmgmt.msc at the command prompt to save some time).  In Device Manager, click View | Show Hidden Devices.  Expand Network Adapters and find the grayed-out entry for the old NIC as pictured below.

[Screenshot: the ghost NIC grayed out under Network Adapters in Device Manager]

Select the ghost NIC and right-click | Uninstall to remove it.
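The set command only affects the command prompt session it is typed in, which is why Device Manager has to be started from that same prompt. The Python sketch below shows the same idea: build an environment that contains the variable and hand it to the child process. The function names are my own, and launching Device Manager through `mmc.exe devmgmt.msc` is just one way to start it.

```python
import os
import subprocess

VARIABLE = "devmgr_show_nonpresent_devices"


def environment_with_hidden_devices(base=None):
    """Copy an environment and add the variable Device Manager looks for."""
    env = dict(os.environ if base is None else base)
    env[VARIABLE] = "1"
    return env


def open_device_manager():
    """Windows only: start Device Manager as a child that inherits the variable."""
    return subprocess.Popen(["mmc.exe", "devmgmt.msc"],
                            env=environment_with_hidden_devices())
```

Calling `open_device_manager()` on the guest should bring up Device Manager with the variable already in its environment. You still need View | Show Hidden Devices to see the grayed-out entries.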

The final gotcha for me on this is that the set devmgr_show_nonpresent_devices=1 command does not work on Windows 2008 (or Vista, Windows 7, or Windows 2008 R2). To see and remove ghost NICs from Windows 2008, an environment variable must be defined. To set the variable, open Server Manager from the Windows Start Menu. Highlight ‘Server Manager (%SERVERNAME%)’ in the left-side tree-view pane. Click ‘Change System Properties’ in the right-hand pane. Switch to the Advanced tab and click ‘Environment Variables’. Create a new System variable by clicking the New button. The Variable name should be ‘devmgr_show_nonpresent_devices’ and the value should be ‘1’ as pictured below.

[Screenshot: the devmgr_show_nonpresent_devices system variable set to 1]

Click OK to close out of any open windows. A reboot is not necessary for the variable to take effect, although you may have to close all open Device Manager windows and then reopen devmgmt.msc. Click View | Show Hidden Devices and remove the ghost NIC as described above. After I removed the ghost NIC from the domain controllers and gave them a quick reboot, all Active Directory, DNS, DHCP, and WINS services immediately began operating normally. This second issue is more of a Microsoft problem in my opinion, and has been around for some time.
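If you would rather not click through the dialogs on every server, the same system variable can probably be created from a script. The Python sketch below builds a `setx` command with the `/M` (machine-wide) switch. I am assuming setx is available on the guest and that the script runs from an elevated prompt, and the function name is my own.

```python
import subprocess


def persist_system_variable(name, value):
    """Build the setx command that stores a machine-wide (/M) variable."""
    return ["setx", name, value, "/M"]


command = persist_system_variable("devmgr_show_nonpresent_devices", "1")
print(subprocess.list2cmdline(command))

# On the server itself this could be executed with:
# subprocess.check_call(command)
```

setx only affects processes started afterwards, which fits with having to close and reopen Device Manager before the hidden devices show up.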

Before you start getting all upset and the FUD starts flying (“this is Microsoft/VMware’s latest attempt to break VMware/Microsoft?”), it wasn’t really vSphere that broke Active Directory; it was me. A little better planning and not rushing through the last wee hours of the upgrade window could have saved some trouble. If you are planning a similar upgrade, it would be best to upgrade your domain controllers/DNS servers one at a time and remediate the issues I have described before upgrading the next. This will ensure continued availability of your Active Directory and other critical services during your upgrade.
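The one-at-a-time approach can be sketched as a simple loop that stops at the first domain controller that does not come back healthy. Everything here is hypothetical: the server names are invented, and the `upgrade` and `is_healthy` callables stand in for whatever you use to upgrade the virtual hardware and to verify DNS, DHCP, and AD on that server.

```python
def upgrade_one_at_a_time(domain_controllers, upgrade, is_healthy):
    """Upgrade DCs sequentially, stopping at the first one that fails its check.

    Returns (upgraded, failed): the DCs that passed their check, and the DC
    that needs remediation (None if every DC came back healthy).
    """
    upgraded = []
    for dc in domain_controllers:
        upgrade(dc)
        if not is_healthy(dc):
            return upgraded, dc
        upgraded.append(dc)
    return upgraded, None


# Stand-ins for illustration: record each upgrade, and pretend DC02 comes
# back with a ghost NIC and fails its health check.
touched = []
result = upgrade_one_at_a_time(
    ["DC01", "DC02", "DC03"],
    upgrade=touched.append,
    is_healthy=lambda dc: dc != "DC02",
)
print(result)
```

The point of the design is that DC03 is never touched while DC02 is broken, so at least one working domain controller stays available while you remediate.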

Comments

  1. I had this exact same issue on my vSphere upgrade. I ended up not changing to HW7 quite yet, mostly because the features are not that much needed right now and HW4 works fine.

    I noticed this issue seems to be more prevalent in W2K8 upgrades, mostly because you need to add the new system value (which I wasn’t aware of... thanks!!!).

    I will be bookmarking this for future reference when I change my HW to 7 … thanks!!!

  2. Does this also apply to 2003 DC’s?

    • Ricky,

      I have not specifically tested 2003 DCs, but if my memory serves me correctly this is an issue you will have to plan for. Do one DC at a time, fix it, and then go on to the next.

      Josh

  3. I followed these instructions but DHCP will still not function. I get an error when I start/stop DHCP saying the service is not running when it is. Checked to make sure it was bound to the proper NIC, removed/reinstalled the service after a reboot, still broken. Any suggestions?

  4. Thanks for the post – I upgraded a bunch of dev/test VMs over the weekend and ran into the WINS setting issue.

    Haven’t done any DCs yet, but I do have two that I’ll be upgrading over the coming weeks, so it’s good to see this before it bites me.

    I also ran into another hitch. On Server 2008 and 2008 R2 guests, my second virtual disk (for the E: drive) came up offline on the first bootup after the virtual hardware upgrade. This appears to be a default in 2008 when a new disk is detected with an existing volume, since I’ve had it happen with SAN LUNs before too. I had to manually set the disk online in Disk Management for the volume to be mounted.

  5. So does anyone know how to remove these “phantom” NICs from Windows 2008 Server Core? Can’t run device manager on it, and running it remotely lets you see the phantoms but you can’t remove them because it runs in read-only mode. Until I figure this out I can’t upgrade my domain controllers to hardware v7 because everything breaks 🙁

    • Never mind, I think I figured it out. Here’s the procedure in case you need it:

      1. Copy devcon.exe over to the server core server (I had to extract devcon.exe from SUPPORT\TOOLS\SUPPORT.CAB on a Windows 2003 R2 x64 disc).

      2. Run devcon.exe findall =net (this should list all NICs on the system, including the phantoms). This was my output:

      PCI\VEN_15AD&DEV_0720&SUBSYS_072015AD&REV_10\4&B70F118&0&0088: VMware PCI Ethernet Adapter #2
      PCI\VEN_15AD&DEV_0720&SUBSYS_072015AD&REV_10\3&18D45AA6&0&88: VMware PCI Ethernet Adapter
      PCI\VEN_15AD&DEV_07B0&SUBSYS_07B015AD&REV_01\FF565000EB16A3FE00: vmxnet3 Ethernet Adapter
      3 matching device(s) found.

      vmxnet3 was the active NIC and the others needed to be removed.

      3. devcon -r remove "@PCI\VEN_15AD&DEV_0720&SUBSYS_072015AD&REV_10\3&18D45AA6&0&88" removed the first one. Repeated for the second unwanted NIC, rebooted the server for good measure and it’s serving up DNS again!

      What a pain.
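For anyone with more than a couple of Server Core guests, the devcon procedure above could be scripted. The Python sketch below relies on `devcon find` listing only present devices while `devcon findall` also lists non-present ones, so the difference between the two listings is the set of phantoms. The findall text is the output quoted above (with the backslashes that device instance IDs normally contain), the find text is a hypothetical counterpart, and the function names are my own.

```python
def parse_devcon_listing(text):
    """Return {instance_id: description} from 'devcon find' / 'findall' output."""
    devices = {}
    for line in text.splitlines():
        instance_id, separator, description = line.partition(":")
        # Summary lines such as "3 matching device(s) found." have no colon.
        if separator and instance_id.strip():
            devices[instance_id.strip()] = description.strip()
    return devices


def phantom_devices(findall_text, find_text):
    """Devices that 'findall' reports but 'find' (present devices only) does not."""
    present = parse_devcon_listing(find_text)
    return [dev for dev in parse_devcon_listing(findall_text) if dev not in present]


FINDALL_OUTPUT = r"""
PCI\VEN_15AD&DEV_0720&SUBSYS_072015AD&REV_10\4&B70F118&0&0088: VMware PCI Ethernet Adapter #2
PCI\VEN_15AD&DEV_0720&SUBSYS_072015AD&REV_10\3&18D45AA6&0&88: VMware PCI Ethernet Adapter
PCI\VEN_15AD&DEV_07B0&SUBSYS_07B015AD&REV_01\FF565000EB16A3FE00: vmxnet3 Ethernet Adapter
3 matching device(s) found.
"""

# Hypothetical output of "devcon find =net" on the same guest.
FIND_OUTPUT = r"""
PCI\VEN_15AD&DEV_07B0&SUBSYS_07B015AD&REV_01\FF565000EB16A3FE00: vmxnet3 Ethernet Adapter
1 matching device(s) found.
"""

for device in phantom_devices(FINDALL_OUTPUT, FIND_OUTPUT):
    print('devcon -r remove "@%s"' % device)
```

The sketch only prints the remove commands. Review the list before running anything, since removing the wrong adapter takes the server off the network.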

Trackbacks

  1. […] This post was mentioned on Twitter by VMware Planet V12n, joshuatownsend. joshuatownsend said: New VMtoday.com post: vSphere Upgrade Breaks Active Directory http://cli.gs/PqzZQ #vmware […]

  2. […] recently posted an article on how specific actions during the upgrade of a VMware Virtual Machine’s hardware from v4 to […]
