Archive for June, 2009
We picked up a few new 17″ MacBook Pros at work. We’re a Microsoft shop, so Macs aren’t part of the basic knowledge for our IT staff, myself included. I don’t want to be the Windows guy who says “I don’t do Macs” – part of being a technologist is serving the user base where they are, with the technologies they require to do their jobs (but please, include me in determining your requirements and technological solutions – a Mac might be really cool, but might not fit with the organization’s needs or your IT group’s ability to support your solution). Really, that’s what Web 2.0 is all about – compatible, interchangeable tools that offer customized functionality for the users’ abilities and needs. Come to think of it, that’s what VMware is all about too – the right resources in the right place at the right time, independent of underlying hardware, application/OS agnostic, able to rise above local shortcomings by pushing to the cloud….
To be fair, I was issued a Mac at a previous company, but didn’t care much for it as the programs I had to run for my job were Windows based. I ran VMware Fusion, but it could only take me so far – funny things start to happen when you are in a VM, RDC’ing to a client server, opening the VI client and console’ing to a VM. Shortcut keys behave strangely, and one can only create so many alternate key mappings before going insane. It wasn’t the right tool for me and my job, but Macs do serve some purposes very well – graphic design and iPhone app development in my current case.
I didn’t have a requirement to do much customization on the new Macs, but they did have to allow users to authenticate to the current Microsoft Windows Active Directory domain. I hit a few snags as I went through the process, including making domain users local administrators and allowing domain users to log in to the Mac while offline. Here is what I came up with for a final process in my environment – adjust according to your needs:
1.) Configure OS X to talk to the Active Directory
- Using Spotlight (LeftCommand+Space), open the ‘Directory Utility’
- Switch to the Services tab
- Tick the box next to Active Directory plug-in (Note: You may have to click the lock icon to make configuration changes).
- Highlight the Active Directory plug-in and click the Configure icon (pencil icon).
- Enter an Active Directory Domain, using the FQDN (example: mydomain.local)
- Enter a Computer ID. This ID will be used to create a computer object in the AD.
- Expand Advanced Options:
  - On the User Experience tab:
    - Check the box for ‘Create mobile account at login’.
    - Uncheck the box for ‘Require confirmation before creating a mobile account’.
    - Choose ‘Use UNC path from Active Directory to derive network home location’ if your AD is set to map a user’s home location to a UNC and/or DFS path; if not, you may want to uncheck this option.
  - On the Administrative tab:
    - Check the box for ‘Allow administration by:’ and then add the Active Directory ‘domain admins’ and ‘enterprise admins’ groups.
    - Check the box for ‘Allow authentication from any domain in the forest’ if appropriate for your environment.
- Click the Bind button and enter credentials for an account with permission to join computers to the Active Directory domain. Note: The computer account may appear in the default AD ‘Computers’ container even if the redircmp utility was used on the domain to change the default Organizational Unit (OU) of new computers joining the domain.
- Click OK.
- Verify that the Active Directory domain you configured appears with a green dot on the ‘Directory Servers’ tab of the Directory Utility.
- Close the Directory Utility.
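The same settings can, in principle, be scripted with the `dsconfigad` command-line tool instead of clicking through Directory Utility. The Python sketch below only assembles the argument lists so they can be reviewed before anything is run. The flag names are my recollection of the Leopard-era syntax (later OS X releases changed them), and the computer ID, domain and bind account are placeholders, so treat all of it as an assumption and check `man dsconfigad` on your Mac first.

```python
def on_off(flag):
    """dsconfigad-style toggles are assumed to take 'enable' or 'disable'."""
    return "enable" if flag else "disable"


def build_bind_command(computer_id, domain, bind_user):
    """Argument list for binding the Mac to the domain.

    No password is included; the tool is assumed to prompt for it.
    """
    return ["dsconfigad", "-a", computer_id, "-domain", domain, "-u", bind_user]


def build_options_command(admin_groups, use_unc_path=True, all_domains=True):
    """Argument list mirroring the Advanced Options chosen in the GUI steps."""
    return [
        "dsconfigad",
        "-mobile", "enable",            # create mobile account at login
        "-mobileconfirm", "disable",    # do not ask for confirmation
        "-useuncpath", on_off(use_unc_path),
        "-groups", ",".join(admin_groups),
        "-alldomains", on_off(all_domains),
    ]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(" ".join(build_bind_command("mac01", "mydomain.local", "joinaccount")))
    print(build_options_command(["domain admins", "enterprise admins"]))
```

Printing the commands first, rather than executing them, makes it easy to compare each flag against the man page before running anything with sudo.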
2.) Configure basic login options
- Open the Accounts tool from Apple | System Preferences | Accounts
- Click Login Options (Note: you may have to click the lock icon to allow changes to be made).
- Configure the Login Options settings as follows:
- Automatic Login: Disabled
- Display login window as: Name and Password
- Check the box for ‘Allow network users to log in to this computer’.
- Click the Options button and configure all network users (i.e. – all Domain users) or only select users to have login permissions.
- Configure other options as desired.
- Log out of the local Admin account
3.) While connected to the network, log in with a domain user account that has permission to log in to this Mac (see above), using the AD user.name and password
- The first login may take several minutes to complete as a local account is being created.
- Open the Accounts tool from Apple | System Preferences | Accounts
- Highlight the logged-in user’s account.
- Check the box for ‘Allow user to administer this computer’ as appropriate
- Verify that the ‘Settings’ button for Mobile Account is grayed out – this means that an offline account has been created for the user.
4.) Test the config by removing network connectivity (disable AirPort and/or pull the network cable) and log in as the user you just configured.
5.) Buy VMware Fusion so you can run Windows on your Mac when all the stuff you were used to just ain’t there anymore
While setting up a new 17″ MacBook Pro today I found that an update to VMware Fusion has been released. The update, version 2.0.5, has a release date of June 23. According to the release notes, the update includes:
- Support for Mac OS X Server guest operating systems on Macs with Intel Xeon 5500 and 3500 Series processors (based on Nehalem micro-architecture)
- Provides experimental support for Mac OS X 10.6 Snow Leopard Server as a guest operating system (32-bit only)
- Provides experimental support for Mac OS X 10.6 Snow Leopard as a host operating system (32-bit only)
- Supports Ubuntu 9.04 as a guest operating system, including features such as VMware Tools pre-built modules and Easy Install
- Reduces CPU usage when a virtual machine is idle under VMware Fusion
- Contains fixes for more than 80 bugs
Download a trial version or update your purchased copy here: http://vmware.com/download/fusion/.
The VMworld 2009 Content Catalog was released on Friday night according to a post on the VMworld.com blog. I have only started to browse the many breakout sessions, instructor-led labs, self-paced labs, and panel sessions that are currently planned for VMworld 2009. A quick glance shows a wide variety of content for all technical levels across many tracks, including Business Continuity and Disaster Recovery, Desktop Virtualization, Enterprise Applications, Technology and Architecture, Virtualization 101 and Virtualization Management. It also appears that the catalog includes a nice mix of VI3 and vSphere content, as well as expanded desktop virtualization information.
Be sure to browse the catalog now so you are ready for session registration later in July. From past experience, knowing which sessions and labs you want to attend before registration opens, and then jumping right in once it is available, is the best way to ensure you get a slot in the hottest sessions.

Today is the last day for VMworld early-bird registration, and I got in just under the wire. Here’s to hoping that VMworld 2009 is as informative, fun and valuable as last year’s. I remember sitting in Vegas last year as the first of the big financial institutions began to fall. It will be interesting to see how the current economic condition impacts attendance, vendor quality, tee-shirt availability, and the VMworld party. Here’s to fun, free beer, and virtualization!
I’m also hoping that my wife will be able to join me and ‘geek out’ (her term) this year. If so, I’ll have to get her connected with the unofficial VMworld 2009 Spouse Activities!
I have been pulling my hair out with a small VI3 implementation running against an IBM DS3300 iSCSI array. Performance, for lack of a better term, sucked. Granted, the DS3300 is not an enterprise-level workhorse of a storage system, but it fit the budget. Read performance was decent from the array, but write performance was terrible, maxing out at 10Mbps throughput with insanely high latencies on long writes when the system was under load. This led to some long P2V operations, poor guest performance, and some questions from the project sponsors on why I couldn’t make the environment sing.
The system was configured with a single controller with dual GigE NICs. The controller had 512MB of battery-backed cache (there is also a 1GB cache upgrade option available). I wrote off some of the poor performance to a single controller with a less-than-optimal amount of cache; blamed the SAS controller to SATA disk command translation overhead; cringed at the 6-disk RAID5 configuration; and engaged in some self-doubt. I convinced the powers that be that we were IO constrained and got some funds to fill out the 3U chassis to a full 12 SATA disks, and reconfigured the array as RAID10. Performance gains were almost unnoticeable with these changes. In addition, I did some basic troubleshooting of the network environment: verifying multiple paths to the storage, setting Flow Control on the switches to receive only, and double-checking my iSCSI initiator settings. Note: The DS3300 is only supported with the ESX software initiator.
I found documentation on the DS3300 to be lacking, but did discover that the Dell MD3000i is based on the same LSI Engenio array. Some Googling on the Dell solution led to the ‘SMcli’ command line interface for both arrays. The commands are slightly different for the Dell and IBM. The links to the IBM CLI documentation were broken, so I used the Dell documentation as a starting point and did a bit of trial and error to get the commands right. (Rant: Seriously, IBM? Can you make your documentation any harder to get through – is it a Redbook, is it an Engineering Whitepaper, is it a support document, is it a case study – and why can I only find these with complex Google searches, not on your own product pages, and why can’t you name your documents intelligently, not with some random string of characters?)
Moving on… I received an automated alert from the DS3300 about an incomplete battery learn cycle. Using the IBM Storage Manager GUI I generated a ‘Storage Subsystem Profile’ from the Support tab to check the battery status. In the profile I discovered that while write cache was enabled, it had a status of “Enabled (Suspended)”. Ah ha! Now I’ve got some decent Google material that led me to this: http://communities.vmware.com/thread/195838. Hot damn I love the VMware Community Forums!
It turns out that in a single-controller configuration the setting for cache mirroring remains enabled by default. Because there is no 2nd controller to mirror to, the array suspends write caching. This is probably a safety thing – without a second controller, data in cache is at risk should the only controller fail. I weighed my options and decided that fixing the poor performance I was experiencing outweighed the HA concerns, so I disabled cache mirroring on the array using this command:
c:\program files\ibm_ds4000\client>smcli -n <ARRAYNAME> -c "set allLogicalDrives mirrorEnabled=false;"
And then followed with this for good measure:
c:\program files\ibm_ds4000\client>smcli -n <ARRAYNAME> -p <arraypassword> -c "set allLogicalDrives writeCacheEnabled=true;"
The results were immediately noticeable:
The screen shot is from Veeam Monitor Free Edition, taken during 4 concurrent V2V operations from Hyper-V to VMware. With the write cache fully functional, disk usage peaked at 54MBps, latency dropped to about 6ms, and my blood pressure dropped a few notches.
While poking around the CLI I also found that you can dump performance stats from the array (performance is otherwise hard to find on the thing) using this command:
C:\Program Files\IBM_DS4000\client>smcli -n <ARRAYNAME> -c "set session performanceMonitorInterval=5 performanceMonitorIterations=120;save storageSubsystem performanceStats file=\"c:\\ds3300perfstats.csv\";"
This will give you a 10-minute record of performance from the array (120 samples at 5-second intervals) which you can analyze using Excel. The Dell Enterprise Center TechCenter Wiki has a great write-up on how to efficiently analyze the data from this command here: http://www.delltechcenter.com/page/MD3000i+Performance+Monitoring, complete with a YouTube video that walks you through the process.
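The length of the capture is simply the interval multiplied by the number of iterations, so it is easy to size a longer or shorter run. Here is a small Python sketch of that arithmetic; the function names are mine and purely illustrative, and I am assuming the interval is expressed in seconds, which matches the 10 minutes I saw with the values above.

```python
import math


def capture_minutes(interval_s, iterations):
    """Total capture length in minutes for a given interval and iteration count."""
    return interval_s * iterations / 60


def iterations_for(minutes, interval_s):
    """Iterations needed to cover at least `minutes` at the given interval."""
    return math.ceil(minutes * 60 / interval_s)


if __name__ == "__main__":
    print(capture_minutes(5, 120))   # 10.0 minutes, the values used in the command above
    print(iterations_for(30, 5))     # 360 iterations for a half-hour capture
```

Plug the results into performanceMonitorInterval and performanceMonitorIterations to change the capture window.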
I am beginning to think that the DS3300 (and MD3000i) may actually be a viable starter solution for SMBs starting out on a virtualization project. But I would recommend the cache upgrade, a 2nd controller, SAS disks instead of SATA to eliminate the SAS-to-SATA translation overhead, and more, faster disks instead of fewer, slower disks so you can drive throughput and IOPS to a higher level.
Have any of you deployed the DS3300 or MD3000i (or the generic LSI solution)? Do you have any performance tuning tips for these arrays? If so, share in the comments!
Here’s the scenario:
After performing maintenance on an ESX server (patches, storage re-scan, reboot), VMFS volumes are no longer visible, even though the hosting LUN can be seen on the Storage Adapters page of the ESX Configuration tab. Most VMware administrators will see this play out at some point; I saw it in one of my environments today and figured I should make a note of the steps required to correct the issue.
Typically, the root cause of the issue is a change on the storage array that causes the h(id) of the LUN(s) in question to change. This change could be anything from an array firmware update to LUN removal/recreation or RAID/LUN reconfiguration. When a rescan takes place on the ESX storage adapters (manually, at reboot, etc.), the new h(id) is observed. Because it does not match the ID previously recorded on disk, the LUN is tagged as a snapshot LUN and access to that LUN is disabled.
Diagnosis of this problem is fairly easy. In addition to the behavior I have described, as observed through the Virtual Center Client, the problem can also be confirmed through the ESX command line.
To diagnose this issue from the console, view the vmkernel log by issuing the following command: tail -f /var/log/vmkernel
You will see messages in the log similar to the following:
Jun 2 16:01:29 esx04 vmkernel: 0:00:31:14.543 cpu3:1039)ALERT: LVM: 4482: vml.0200020000600a0b80005add7800000a494a1d0be6313732362d33:1 may be snapshot: disabling access. See resignaturing section in SAN config guide.
Jun 2 16:01:29 esx04 vmkernel: 0:00:31:14.552 cpu3:1039)LVM: 5579: Device vml.0200010000600a0b80005add7800000a474a1d0bc8313732362d33:1 detected to be a snapshot:
Jun 2 16:01:29 tccesx04 vmkernel: 0:00:31:14.552 cpu3:1039)LVM: 5586: queried disk ID: <type 2, len 22, lun 1, devType 0, scsi 6, h(id) 5103533129706062046>
Jun 2 16:01:29 esx04 vmkernel: 0:00:31:14.552 cpu3:1039)LVM: 5593: on-disk disk ID: <type 2, len 22, lun 1, devType 0, scsi 6, h(id) 2153359415130143165>
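If you have a lot of log to wade through, the comparison can be automated. The Python sketch below is a rough illustration rather than a supported tool: it pulls the ‘queried’ and ‘on-disk’ h(id) values out of lines shaped like the ones above and reports any LUN where the two differ. The regular expression is an assumption based only on these sample lines, and keying on the LUN number alone is a simplification that would not distinguish the same LUN number on different targets.

```python
import re

# Matches the "queried disk ID" and "on-disk disk ID" lines shown above.
HID_RE = re.compile(
    r"LVM: \d+: (queried|on-disk) disk ID: <.*?lun (\d+),.*?h\(id\) (\d+)>"
)


def find_hid_mismatches(lines):
    """Return {lun: (queried_hid, on_disk_hid)} for LUNs whose IDs differ."""
    seen = {}
    for line in lines:
        match = HID_RE.search(line)
        if match:
            kind, lun, hid = match.groups()
            seen.setdefault(int(lun), {})[kind] = int(hid)
    return {
        lun: (ids["queried"], ids["on-disk"])
        for lun, ids in seen.items()
        if len(ids) == 2 and ids["queried"] != ids["on-disk"]
    }


if __name__ == "__main__":
    sample = [
        "Jun 2 16:01:29 esx04 vmkernel: 0:00:31:14.552 cpu3:1039)LVM: 5586: "
        "queried disk ID: <type 2, len 22, lun 1, devType 0, scsi 6, "
        "h(id) 5103533129706062046>",
        "Jun 2 16:01:29 esx04 vmkernel: 0:00:31:14.552 cpu3:1039)LVM: 5593: "
        "on-disk disk ID: <type 2, len 22, lun 1, devType 0, scsi 6, "
        "h(id) 2153359415130143165>",
    ]
    print(find_hid_mismatches(sample))
```

To run it against a real log, pass the lines of a copy of /var/log/vmkernel instead of the sample list.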
After confirming that this is indeed the problem you are experiencing, stop and take a deep breath. The fix is easy, but you need to take steps before fixing it to prevent further damage. If you are lucky, the problem has only manifested itself on one ESX server (and hopefully that ESX was not hosting any VM’s because you put it into maintenance mode). Prevent your other ESX servers from rescanning storage – don’t reboot them, don’t manually rescan, don’t update them.
If the affected ESX server was hosting running VM’s, HA (if licensed and properly configured) should have kicked in if applicable and restarted the VM’s on another node in the ESX cluster.
If multiple ESX servers (or all of them) are affected, your VM’s are likely all powered off after hard stops, so there is not much you can do but to get on with fixing the issue and trust your backups (you do have backups, right?). This is where array-level snapshots come in handy. In my experience, most if not all VM’s recover after a hard stop like this, but don’t let that keep you from having a robust DR plan.
To correct the issue, you must not have any running VM’s on the affected VMFS volumes. Shut down the VM’s or use Storage VMotion to move running VM’s to alternate LUN’s.
In the VI Client, select the affected ESX host in the Hosts & Clusters view. Switch to the Configuration tab. Click ‘Advanced Settings’ and then choose the LVM node. Change LVM.DisallowSnapshotLun from the default setting of ‘1’ to ‘0’ and click OK. Next, rescan your storage from the ‘Configuration | Storage Adapters’ pane. Your missing VMFS volumes should re-appear. You’re doing fine, but not done yet.
Even if the other hosts that use the affected VMFS volume appear to be fine, they will most likely lose access to the volume once a rescan/reboot takes place. You need to perform the LVM.DisallowSnapshotLun = 0 setting change on all ESX servers connected to the volume, followed by a re-scan of your storage.
Once all affected ESX servers see the VMFS volumes, change the LVM.DisallowSnapshotLun setting back to the default of 1. Migrate back and/or power up VM’s on the volume and see what the damage is. If you are lucky, everything is good to go. If not, it’s a great time to check out those backups.
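The order matters here: every host gets the setting change and a rescan first, and only after all of them see the volumes does the setting go back to its default. The Python sketch below lays that order out as a list of service console commands without running anything. It assumes `esxcfg-advcfg` and `esxcfg-rescan` as the ESX 3.x console equivalents of the VI Client steps described above, and the host and adapter names are placeholders, so verify each command against your own ESX build before using it.

```python
def recovery_plan(hosts):
    """Return ordered (host, command) pairs for the resignature workaround.

    `hosts` maps a host name to the list of storage adapters to rescan.
    Phase 1 relaxes the setting and rescans on every host; phase 2 restores
    the default only after all hosts have been processed.
    """
    plan = []
    for host, adapters in hosts.items():
        plan.append((host, "esxcfg-advcfg -s 0 /LVM/DisallowSnapshotLun"))
        for adapter in adapters:
            plan.append((host, f"esxcfg-rescan {adapter}"))
    for host in hosts:
        plan.append((host, "esxcfg-advcfg -s 1 /LVM/DisallowSnapshotLun"))
    return plan


if __name__ == "__main__":
    # Hypothetical host and adapter names for illustration only.
    for host, command in recovery_plan({"esx01": ["vmhba32"], "esx02": ["vmhba32"]}):
        print(f"{host}: {command}")
```

Confirm that the VMFS volumes are visible on every host before running the phase 2 commands.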
If you do not know what caused the storage change, check your ESX logs to try to determine if the server was rebooted or if storage was rescanned. This will give you an idea of when the change occurred – a starting point to work back from to find the root cause. Use this command to get started: less /var/log/vmksummary
Here are some suggestions on how to avoid this problem:
1.) Minimize changes to LUN’s once configured on an ESX.
2.) Coordinate Storage Maintenance with VMware maintenance windows.
3.) Have stand-by storage so you can Storage VMotion running VM’s off of the affected LUNS.
4.) Consider NFS, as NFS volumes are not impacted by resignaturing.
For more information on this problem, or to better understand the advanced settings changes involved, check out the VMware SAN Configuration Guide at http://www.vmware.com/pdf/vi3_35/esx_3/r35u2/vi3_35_25_u2_3_server_config.pdf, page 114, or the VMware iSCSI SAN Configuration Guide at http://www.vmware.com/pdf/vi3_35/esx_3/r35u2/vi3_35_25_u2_iscsi_san_cfg.pdf, page 117.


