virtualization

A vGuy in Real Life

August 25, 2013 by Josh Townsend

I’ve been reflecting a bit tonight on my vJourney so far as I sit in my hotel room before VMworld 2013 kicks off tomorrow. I just realized that I’m coming up on five years of blogging on VMtoday. It’s been a very cool experience seeing people around the world take an interest in what I have to say, and even more awesome getting to know my fellow vExperts and readers on a personal level. I’ve strived to keep it real and personal, relating to real VMware users, not tossing out some fluffy cloud B.S. that has no bearing on what the real admin experiences every day in their job. I’ve got a few things in mind to celebrate five years of blogging, but thought I might start with a little autobiographical info. I wrote this up for a vExpert spotlight last year but never got it published, so I thought I would drop it here. My personal story is still unfolding, but I hope that some of what I have learned over the years can help others who want to follow a similar path. So without further ado, here’s my story:

How did I get into IT?

It was September, 2001 and I had just moved to Washington, DC after graduating with a degree in Political Science and International Relations, intent on getting a government job, running for president, and changing the world. And I had a fancy diamond ring ready to put on the finger of my future wife (and all the debt that goes with an engagement ring). The next week the world changed with the 9/11 attacks and the subsequent anthrax attacks and the DC snipers. With these terrible events, the job market in DC for entry level positions in federal offices dried up. To top it all off, my fiancée ended up in the hospital for an extended stay with what would be the beginning of a 12-year medical mystery [Side note: turns out her brain leaks – a still undiagnosed connective tissue disease that causes the membrane around her brain and spinal cord to tear open and not close up. This lets her cerebrospinal fluid (CSF) leak out so her brain doesn’t float. Talk about a headache (and then she has to put up with me on top of it)!].

Without a job to pay for the ring, I was sitting by her hospital bed wondering if I could slip the ring off her finger and take it back while she slept, feeling like the world’s biggest loser. As I was sitting there in her hospital room my cell phone rang – it was my wife’s aunt. She told me that their company, NetCentrics, which had contracts in the Pentagon and commercial clients in the DC area, needed some help with a few small projects. The projects were simple desktop deployments – put the PC on the desk, connect cables, move on. But it paid. So I took on the projects and put everything I had into doing the job with excellence (with Quality, in the metaphysical sense for those who have read Zen and the Art of Motorcycle Maintenance) – neatly wrapping and wire tying all the cables, cleaning monitors before moving to the next station, making sure everything powered up and worked correctly. I guess they were impressed with my hard work, because they offered me a job despite my having no IT experience. I continued to bust my butt to learn more, built my first home lab, earned my MCSE on Windows 2000, and constantly requested bigger and harder projects that challenged my abilities and forced me to grow.

I continued to push myself as I took on more complex engineering tasks, eventually taking on new projects and roles, exploring new technologies, and always striving for technical excellence and lasting quality in my work.
How did I get into working with VMware and end up being honored as a vExpert?

While at NetCentrics, the company developed the official David Allen Getting Things Done Outlook Add-in. I did beta testing on the initial release, in addition to my regular duties as a network engineer. We got a beta copy of VMware Workstation 3.0 – the first version of the product to really take off – to help in staging test environments with different versions of Windows and Office. I saw the potential and was immediately hooked on virtualization.

At the same time we started working with VMware Elastic Sky X 1.0 (ESX) and VMware Ground Storm X (GSX) servers, building out hosts to run test workloads and even some light production workloads. I knew as soon as I saw the technology in action that it would be a big driver of change – little did I know how big.

As I progressed in my career, I took on an IT Manager role at a VC-backed startup in 2005. Server growth was explosive – we needed running instances of every version of SAP software in Unicode and non-Unicode, big-endian and little-endian. We also had some Oracle, PeopleSoft, Hyperion, and JDE thrown in for good measure. My little server room was reaching critical mass and I needed a solution. I turned to VMware and EMC and built out an infrastructure with a virtualization-first policy, reaching 90+ percent virtualization for hundreds of servers over the course of a few months.

With that experience, I dug in deep on VMware technology, earning my VCP and attending VMworld conferences. I started blogging on VMware-related topics on https://vmtoday.com and engaging with the community through VMUG leadership and the emerging social media scene. These activities earned me the vExpert designation in 2010, 2011, 2012 and 2013. I spent three awesome years at Tiber Creek Consulting, the most family-friendly, hard-working company full of good people I can imagine, where I had the privilege of supporting new custom big data and analytics applications for our nation’s warfighters in the National Guard and Army Reserve.

In 2012 I took on the role of Virtualization Practice Manager at Clearpath Solutions Group, allowing me to focus exclusively on VMware solutions, acting as an evangelist for virtualization technologies, an advocate for our virtualization customers, a mentor to our engineers and being responsible for delivering technical excellence in VMware projects. A dream job with an awesome company, focused on awesome technology!

What advice do I have for someone who wants to get a job like mine?

Personal investment required – build a home lab, immerse yourself in the awesome books, blogs, and user groups written and run by my fellow vExperts in the virtualization community. Small breaks may come (like the call from my wife’s aunt), but the onus is on you to use the opportunity to grow.

Take risks – I was criticized for suggesting production workloads on ESX 1.0 and told I was nuts for trying to virtualize hundreds of SAP servers on VMware in 2005. Those risks paid off big for my employers and for me personally.

Take pride in your work – I paid my way through college by putting roofs on houses during the summers. That job taught me tons about the value of hard work, teamwork, overcoming fear (although I’m still afraid of heights) and attention to detail. Most importantly, it taught me how to take pride in my work – it’s awesome to take my kids back to my home town in New York and say “see that roof? I built that 15 years ago, and it is still going strong, protecting that family’s house.” Knowing that you did something well and something that matters is key to job satisfaction (key to pretty much all happiness in life, really). Take that same sense of pride and ownership into your work – put your all into it and build for lasting quality. It will pay huge dividends for you personally and professionally!

Storage Basics – Part IV: Interface

January 26, 2010 by Josh Townsend

In parts I, II, and III of the Storage Basics series we looked at the basic building blocks of modern storage systems: hard disk drives. Specifically, we looked at the performance characteristics of disks in terms of IOPS and the impact of combining disks into RAID sets to improve performance and resiliency. Today we will have a quick look at another piece of the puzzle that impacts storage performance: the interface. The interface, for lack of a better term, can describe several things in a storage conversation, so let me break it down for you (remember, we’re keeping it simple here).

At the most basic level (assume a direct-attached setup), ‘interface’ can be used to describe the physical connections required to connect a hard drive to a system (motherboard/controller/array). The ‘interface’ extends beyond the disk itself, and includes the controller, cabling, and disk electronics necessary to facilitate communications between the processing unit and the storage device. Perhaps a better term for this would be ‘intra-connect’ as this is all relative to the storage bus. Common interfaces include IDE, SATA, SCSI, SAS, and FC. Before data reaches the disk platter (where it is bound by IOPS), it must pass through the interface. The standards bodies that define these interfaces go beyond the simple physical form factor; they also define the speed and capabilities of the interface, and this is where we find another measure of storage performance: throughput. The speed of the interface is the maximum sustained throughput (transfer speed) of the interface and is often measured in Gbps or MBps.

Here are the interface speeds for the most common storage interfaces:

Interface	Speed
IDE	100MBps or 133MBps
SATA	1.5Gbps, 3.0Gbps, 6.0Gbps
SCSI	160MBps (Ultra-160) and 320MBps (Ultra-320)
SAS	1.5Gbps, 3.0Gbps, 6.0Gbps
FC	1Gb, 2Gb, 4Gb, 8Gb or 16Gb (full-duplex throughput rates are 200MBps, 400MBps, 800MBps, 1600MBps and 3200MBps respectively)
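To compare the serial ratings (Gbps) and the parallel ratings (MBps) on one scale, a rough conversion helps. This is a minimal sketch that assumes 8b/10b encoding, where each data byte costs 10 bits on the wire; that holds for SATA, SAS and FC up to 8Gb, while 16Gb FC uses a more efficient 64b/66b scheme and is left out. Framing overhead is ignored, and the function name and the selection of interfaces are just for illustration.

```python
# Assumption: SATA, SAS and FC up to 8Gb use 8b/10b encoding, so each data
# byte occupies 10 bits on the wire. Framing overhead is ignored.
BITS_PER_BYTE_8B10B = 10


def line_rate_to_mbps(gbps: float, bits_per_byte: int = BITS_PER_BYTE_8B10B) -> float:
    """Hypothetical helper: nominal serial line rate (Gbps) to approximate MBps, one direction."""
    return gbps * 1000 / bits_per_byte


# Parallel interfaces (IDE, SCSI) are already rated in MBps.
interfaces_mbps = {
    "IDE (ATA/133)": 133.0,
    "SCSI Ultra-320": 320.0,
    "SATA 3.0Gbps": line_rate_to_mbps(3.0),
    "SAS 6.0Gbps": line_rate_to_mbps(6.0),
    "FC 2Gb": line_rate_to_mbps(2.0),
    "FC 4Gb": line_rate_to_mbps(4.0),
}

if __name__ == "__main__":
    # Print the interfaces from slowest to fastest on a common MBps scale.
    for name, mbps in sorted(interfaces_mbps.items(), key=lambda item: item[1]):
        print(f"{name:<16}{mbps:>6.0f} MBps")
```

On this scale a 2Gb FC link works out to roughly 200MBps in each direction (400MBps full duplex), which puts it in the same neighborhood as Ultra-320 SCSI.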

If we take these speeds at face value, we see that a 320MBps SCSI and a 2Gbps FC are not too different. If you dig a bit deeper you will soon find that simple speed ratings are not the end of the story. For example, FC throughput can be impacted by the length and type of cable (fiber channel can use twisted pair copper in addition to fiber optic cables). Also, topologies can limit speeds – serial connected topologies are more efficient than parallel on the SCSI side, and arbitrated loops can incur a penalty on the FC side. The specifications of each interface type also define capabilities such as the protocol that can be used, the number of devices allowed on a bus, and the command set that can be used in communications on a storage system. For example, SATA native command queuing (NCQ) can offer a performance increase over parallel ATA’s tagged command queuing, with other factors held constant. Command-set differences also explain why you might see some performance implications when connecting a SATA drive to a SAS backplane: SAS commands have to be translated to SATA.

If we move away from the direct-connect model and into a shared storage environment like the ones you might use in a VMware-virtualized datacenter, the ‘interface’ takes on an additional meaning. You certainly still have the bus ‘interface’ that connects your disks to a backplane. Modern arrays typically use SAS or FC backplanes. If you have multiple disk enclosures, you also have an interface that connects each disk shelf to the controller/head/storage processor, or to an adjacent tray of disks. For example, EMC Clariion arrays use copper fiber channel cables in a switched fabric to connect disk enclosures to the back-end of the storage processors.

If we move to the front-end of the storage system, ‘interface’ describes the medium and protocol used by initiating systems (servers) when connecting to the target SAN. Typical front-end interface mediums on a SAN are Fiber Channel (FC) and Ethernet. Front-end FC interfaces come in the standard 2Gb, 4Gb, or 8Gb speeds, while Ethernet is 1Gbps or 10Gbps. Many storage arrays support multiple front-end ports which can be aggregated for increased bandwidth, or targeted by connecting systems using multi-pathing software for increased concurrency and failover.
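To make the multi-pathing idea concrete, here is a conceptual sketch of round-robin path selection with failover. It is not how VMware’s multipathing or any vendor’s software is actually implemented; the class name and the path names (`spA-port0` and so on) are hypothetical.

```python
class RoundRobinPaths:
    """Hypothetical sketch: spread IOs across front-end paths, skipping failed ones."""

    def __init__(self, paths: list[str]) -> None:
        if not paths:
            raise ValueError("at least one path is required")
        self.paths = list(paths)
        self.failed: set[str] = set()
        self._next = 0  # index of the path to try for the next IO

    def mark_failed(self, path: str) -> None:
        self.failed.add(path)

    def mark_restored(self, path: str) -> None:
        self.failed.discard(path)

    def next_path(self) -> str:
        # Try each path at most once per call, advancing the rotation each time.
        for _ in range(len(self.paths)):
            path = self.paths[self._next]
            self._next = (self._next + 1) % len(self.paths)
            if path not in self.failed:
                return path
        raise RuntimeError("all paths to the target are down")


if __name__ == "__main__":
    mp = RoundRobinPaths(["spA-port0", "spA-port1", "spB-port0"])
    print([mp.next_path() for _ in range(4)])  # rotates across all three paths
    mp.mark_failed("spA-port1")
    print([mp.next_path() for _ in range(3)])  # the failed path is skipped
```

Rotating across healthy paths is what gives the added concurrency; skipping a failed path is the failover half of the story.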

Various protocols can be sent over these mediums. VMware currently supports Fiber Channel Protocol (FCP) on FC, and iSCSI and NFS on Ethernet. FC and iSCSI are block-based protocols that utilize encapsulated SCSI commands. NFS is a NAS protocol. Fiber Channel over Ethernet (FCoE) is also available on several storage arrays, sending FCP packets across Ethernet.

Determining which interface to use on both the front-end and back-end of your storage environment requires an understanding of your workload and your desired performance levels. A post on workload characterization is coming in this series, so I won’t get too deep now. I will, however, provide a few rules of thumb. First, capture performance statistics (in Windows Perfmon, look at Physical Disk|Disk Read Bytes/sec and Disk Write Bytes/sec), or check out the stats in your vSphere Client if you are already virtualized.

If you require low latency, use fiber channel.
If your throughput is regularly over 60MBps, you should consider fiber channel connected hosts.
iSCSI or NFS are often a good fit for general VMware deployments.
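The throughput rule of thumb can be applied to collected counters with a few lines of code. This minimal sketch assumes you have exported Disk Read Bytes/sec and Disk Write Bytes/sec samples into two lists, interprets “regularly over 60MBps” as the 95th-percentile sample exceeding 60MBps, and uses decimal megabytes. The function names and sample values are hypothetical, and the latency rule is not covered.

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile (pct between 0 and 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def suggest_interface(read_bps: list[float], write_bps: list[float],
                      fc_threshold_mbps: float = 60.0) -> tuple[float, str]:
    """Hypothetical helper applying the 60MBps rule of thumb to paired samples."""
    # Total throughput per sample, converted from bytes/sec to decimal MBps.
    totals_mbps = [(r + w) / 1_000_000 for r, w in zip(read_bps, write_bps)]
    p95 = percentile(totals_mbps, 95)
    if p95 > fc_threshold_mbps:
        return p95, "consider FC-connected hosts"
    return p95, "iSCSI or NFS is likely a good fit"


if __name__ == "__main__":
    # Made-up samples in bytes/sec; real data would come from Perfmon or vSphere.
    reads = [20e6, 35e6, 50e6, 45e6]
    writes = [10e6, 15e6, 20e6, 25e6]
    print(suggest_interface(reads, writes))
```

Using a percentile rather than the single highest sample keeps one short burst from driving the decision.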

There is a ton of guidance and performance data available when it comes to choosing the right interconnect for a VMware deployment, and a ton of variables that impact performance. Start with this whitepaper from VMware: https://www.vmware.com/resources/techresources/10034. For follow up reading, check out Duncan Epping’s post with a link to a NetApp comparison of FC, iSCSI, and NFS: https://www.yellow-bricks.com/2010/01/07/fc-vs-nfs-vs-iscsi/. If you are going through a SAN purchase process, ask your vendor to assist you in collecting statistics for proper sizing of your environment. Storage vendors (and their resellers) have a few cool tools for collecting and analyzing statistics – don’t be afraid to ask questions on how they use those tools to recommend a configuration for you.

I’ve kept this series fairly simple. Next up in this series is a look at cache, controllers and coalescing. With the next post we’ll start to get a bit more complex and more specific to VMware and Tier 1 workloads, both virtual and physical. Thanks for reading!


Right-sizing Your Power and Cooling

January 21, 2010 by Josh Townsend

We all know that virtualization allows us to do more with less. Fewer servers and space-saving storage (talk about an oxymoron) help us put some green in the datacenter and back in the budget. But with tight budgets demanding greater efficiency, virtualization pushing per-U-space utilization higher, and increasingly rack-dense equipment, proper planning of your physical plant remains an essential part of IT. I argue that right-sizing your power, cooling, and floor space is more critical now than it has ever been, and knowing how to do it is a darn good skill for a virtualization engineer to possess.

So along those lines… I was just doing some site-prep work for a new Clariion installation and noticed that the EMC Power Calculator has been updated. It is now a pretty slick little web app that can be found on the PowerLink site (login required) here: https://powerlink.emc.com/nsepn/webapps/powercalculator/Main.aspx.

While I am at it, here are some links to other power consumption calculators. Let me know if you have others and I will update this post:

Dell: https://www.dell.com/calc
IBM: https://www-03.ibm.com/systems/bladecenter/resources/powerconfig/index.html
NetApp: Storage Efficiency Calculator here – https://www.secalc.com – it doesn’t calculate your consumption, just what you might save over a competitor’s offering.
HP: https://h30099.www3.hp.com/configurator/powercalcs.asp
Sun: https://www.sun.com/solutions/eco_innovation/powercalculators.jsp
Hitachi/HDS: https://www.byhitachi.com/se/go/weight-and-power-calculator/
APC: https://www.apc.com/prod_docs/results.cfm?DocType=Trade-Off%20Tool&Query_Type=10 and https://www.apcc.com/products/runtime_for_extendedruntime.cfm?upsfamily=165
Emerson: Efficiency Calculator: https://www.emerson.com/edc/Calculator/default.aspx
VMware ROI Calculator: https://vmware.com/go/calculator
This site has a bunch of links to other calculators and resources: https://thegreenandvirtualdatacenter.com/calculator.html
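The calculators above do the heavy lifting, but the core arithmetic behind right-sizing is simple enough to sketch. This example assumes the common conversions of 1 watt ≈ 3.412 BTU/hr and 12,000 BTU/hr per ton of cooling, estimates current as watts divided by volts (that is, a power factor of 1), and uses a made-up equipment list; the function name is hypothetical.

```python
WATTS_TO_BTU_PER_HR = 3.412   # 1 W of IT load is roughly 3.412 BTU/hr of heat
BTU_PER_HR_PER_TON = 12_000   # one ton of cooling capacity


def size_rack(loads_watts: dict[str, float], volts: float = 208.0) -> dict[str, float]:
    """Hypothetical helper: rough power, heat and current totals for a set of devices."""
    total_watts = sum(loads_watts.values())
    btu_per_hr = total_watts * WATTS_TO_BTU_PER_HR
    return {
        "watts": total_watts,
        "btu_per_hr": btu_per_hr,
        "cooling_tons": btu_per_hr / BTU_PER_HR_PER_TON,
        # Simplification: amps = watts / volts, assuming a power factor of 1.
        "amps": total_watts / volts,
    }


if __name__ == "__main__":
    # Hypothetical rack: two virtualization hosts, a storage array and switches.
    rack = {"host-1": 450, "host-2": 450, "array": 1200, "switches": 300}
    for key, value in size_rack(rack).items():
        print(f"{key:<13}{value:>10.2f}")
```

Plug in measured or calculator-derived wattages rather than nameplate maximums if you want numbers closer to what the rack will actually draw.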

There’s some fun and timely chatter happening right now on Twitter around power consumption and sizing – join in by following me at https://twitter.com/joshuatownsend/!

Storage Basics – Part III: RAID

January 6, 2010 by Josh Townsend

This is the third in a multi-part series on storage basics. I’ve had some good feedback from folks in the SMB space saying that the first couple posts in this series have been beneficial, so we’ll be sticking with some basic concepts for another post or two before we dive into some nitty-gritty details and practical applications of these concepts in a VMware environment. In the second post of this series I introduced the concept of IOPS and explained how the physical characteristics of a hard disk drive determine the theoretical IOPS capability of a disk. I then noted that you can aggregate disks to achieve a greater number of IOPS for a particular storage environment. Today, we will look at just how you combine multiple disks and the performance impact of doing so. Remember that we are keeping this simple; the concepts I present here may not apply to that fancy new SAN you just purchased with your end-of-year money or the cheap little SATA controller on your desktop’s motherboard (not that there’s anything wrong with it) – we’re more in the middle ground of direct attached storage (DAS) as we firm up concepts.

Enterprise servers and storage systems have the ability to combine multiple disks into a group using Redundant Array of Independent Disks (RAID) technology. We’ll assume a hardware RAID controller is responsible for configuring and driving storage IO to the connected disks. RAID controllers typically have battery-backed cache (we’ll talk cache in a future post), an interconnect where the drives plug in, such as SCSI or SAS (we’ll talk about these too in a future post), and hold the configuration of the RAID set including stripe size and RAID level. The controller also does the basic work of reading and writing on the RAID set: mirroring, striping, and parity calculations. There are several different RAID levels – rather than rehash the details of them, read the Wikipedia entry on RAID and then come back here….

Ok, great. So you now know that RAID is implemented to increase performance through the aggregation of multiple disks, and to increase reliability through mirroring and parity. Now let’s consider the performance implications of some basic RAID levels. As with many things in the IT industry, there are trade-offs: security vs. usability, brains vs. brawn, and now performance vs. reliability. As we increase reliability in a RAID array through mirroring and parity, performance can be impacted. This is where the more disks = more IOPS bit starts to fall apart. The exact impact depends on the RAID type. Here are some examples of how RAID impacts the maximum theoretical IOPS using the most common RAID levels, where:

I = Total IOPS for Array (note that I show Read and Write separately)

i = IOPS per disk in array (based on spindle speed averages from Part II: IOPS)

n = Number of disks in array

r = Percentage of read IOPS (calculated from the Average Disk Reads/Sec divided by total Average Disk Transfers/Sec in your Windows Perfmon)

w = Percentage of write IOPS (calculated from the Average Disk Writes/Sec divided by total Average Disk Transfers/Sec in your Windows Perfmon)

RAID0 (striping, no redundancy)

This is basic aggregation with no redundancy. A single drive error/failure could render your data useless and as such it is not recommended for production use. It does allow for some simple math:

I =n*i

Because there is no mirroring or parity overhead, theoretical maximum Read and Write IOPS are the same.

RAID 1 & RAID10 (mirroring technologies):

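Because data is mirrored to multiple disks, any disk in a mirror can service a read, but every write has to land on both disks of a mirrored pair: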

Read I = n*i

For example, if we have six 15k disks in a RAID10 config, we should expect a theoretical maximum of 6*180 = 1080 read IOPS for our array.

Write I = (n*i)/2

RAID5 (striping with single, distributed parity)

Read I = (n-1)*i

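Example: Five 15k disks in a RAID 5 (4 + 1) will yield a maximum of (5-1)*180 = 720 READ IOPS. We subtract 1 because one disk’s worth of each stripe holds parity bits, not data (RAID5 actually spreads parity across all the disks, so treat this as a conservative estimate).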

Write I = (n*i)/4

Example: Five disks in a RAID 5 (4 + 1) will yield a maximum IOPS of (5*180)/4 = 225 WRITE IOPS
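Where does the 4 come from? A small RAID5 write is typically handled as a read-modify-write: read the old data and old parity, then write the new data and new parity. The sketch below shows the XOR parity math behind that on toy two-byte blocks. Real controllers work on whole stripe units and can sidestep the penalty with full-stripe writes or cache, so treat this as an illustration only; the helper name `xor_blocks` is hypothetical.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Hypothetical helper: byte-wise XOR of equal-length blocks (RAID5-style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# A toy 4 + 1 stripe: four data blocks and their parity block.
data = [b"\x01\x02", b"\x10\x20", b"\x0f\x0f", b"\xaa\x55"]
parity = xor_blocks(*data)

# Rebuild after a disk failure: the lost block is the XOR of the survivors and parity.
lost = 2
survivors = [block for idx, block in enumerate(data) if idx != lost]
assert xor_blocks(*survivors, parity) == data[lost]

# Small write to block 1 via read-modify-write, counting back-end IOs.
backend_ios = 0
old_block, old_parity = data[1], parity  # read old data + old parity
backend_ios += 2
new_block = b"\xff\x00"
new_parity = xor_blocks(old_parity, old_block, new_block)
data[1], parity = new_block, new_parity  # write new data + new parity
backend_ios += 2

assert parity == xor_blocks(*data)  # parity still matches the whole stripe
print("back-end IOs for one front-end write:", backend_ios)
```

Two reads plus two writes per front-end write is why the RAID5 write formula divides by 4.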

Again, these formulas are very basic and have little practical value. Furthermore, it is seldom that you will find a system that is doing only reads or only writes. More often, as is the case with typical VMware environments, reads and writes are mixed. An understanding of your workload is key to accurately sizing your storage environment for performance. One of the workload characteristics (we’ll explore some more in the future) that you should consider in your sizing is the percentage of read IOPS vs. the percentage of write IOPS. A formula like this gets you close if you want to do the math for a mixed read/write environment in a RAID5 set:

I = (n*i)/(r+4 *w)

Example: a 60% read/40% write workload with five 15k disks in a RAID5 would provide (5*180)/(.6+4*.4) = 409 IOPS.
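The formulas above can be wrapped in a small helper for what-if comparisons. This sketch generalizes the mixed read/write formula by replacing the RAID5 factor of 4 with a per-level write penalty (1 for RAID0, 2 for RAID10, 4 for RAID5); that generalization is an assumption, though it reproduces the RAID0 and RAID10 formulas given earlier. It counts all n disks for reads, so a 100% read RAID5 workload returns n*i rather than the more conservative (n-1)*i. The names are hypothetical.

```python
# Assumed write penalties: back-end IOs per front-end write.
WRITE_PENALTY = {"RAID0": 1, "RAID10": 2, "RAID5": 4}


def array_iops(level: str, disks: int, iops_per_disk: float,
               read_pct: float, write_pct: float) -> float:
    """Hypothetical helper: theoretical front-end IOPS, I = (n*i) / (r + f*w)."""
    if abs(read_pct + write_pct - 1.0) > 1e-9:
        raise ValueError("read and write percentages must sum to 1")
    penalty = WRITE_PENALTY[level]
    return (disks * iops_per_disk) / (read_pct + penalty * write_pct)


if __name__ == "__main__":
    # Five 15k disks (180 IOPS each) with a 60% read / 40% write workload.
    for level in WRITE_PENALTY:
        print(level, round(array_iops(level, 5, 180, 0.6, 0.4)))
```

Running the same disks through each level side by side makes the performance vs. reliability trade-off easy to see.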

The previous examples have all been from the perspective of the storage system. If we take a look at this from the server/OS/application side, something interesting shows up. Let’s say you fired up Windows Perfmon, collected Physical Disk Transfers/sec counters every 15 seconds for 24 hours, and analyzed the data in Excel to find the 95th percentile for total average IOPS (this is a pretty standard exercise if you are buying an enterprise storage array or SAN). Let’s say that you find that the server in question was asking for 1000 IOPS at the 95th percentile (let’s stick with our 60% read/40% write workload). And finally, let’s say we put this workload on a RAID5 array. That’s saying a lot of stuff, but what does it all mean? Because RAID5 has a write penalty factor of 4 (again, Duncan Epping posted a great article, which I referenced in Part II, that describes this in a slightly different way), we can tweak the previous formula to show the IOs to the backend array given a specific workload.

I = Target workload IOPS

f = IO penalty

r = % Read

w = % Write

IO = (I * r) + (I * w) * f

Our example then looks like this (remember: work inside parentheses first, and then My Dear Aunt Sally):

(1000 * .6) + ((1000 * .4) * 4) = 2200

Simply stated, this means that for every 1000 IOPS that our workload requests from our storage system, the backing array must perform 2200 IOs, and it had better do it quickly or you will start to see latency and queuing (we call this performance degradation, boys and girls!). Again, this is a very simplistic approach that neglects factors like cache, randomness of the workload, stripe size, IO size, and partition alignment, which can all impact requirements on the backend. I’ll cover some of those later.
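Here is the same back-end calculation as code. The first function implements IO = (I * r) + (I * w) * f directly; the second is a rough extension, ignoring cache and the other factors just mentioned, that divides by per-disk IOPS to estimate a spindle count. Both function names are hypothetical.

```python
import math


def backend_iops(target_iops: float, read_pct: float, write_pct: float,
                 write_penalty: int) -> float:
    """Hypothetical helper: IO = (I * r) + (I * w) * f."""
    return (target_iops * read_pct) + (target_iops * write_pct) * write_penalty


def spindles_needed(backend: float, iops_per_disk: float) -> int:
    """Rough spindle count, ignoring cache, IO size, randomness, and alignment."""
    return math.ceil(backend / iops_per_disk)


if __name__ == "__main__":
    io = backend_iops(1000, 0.6, 0.4, 4)  # the RAID5 example from the text
    print(io, spindles_needed(io, 180))   # assuming 180 IOPS per 15k disk
```

Rounding up matters here: you cannot buy a fraction of a disk, and landing just under the requirement is where queuing starts.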

As you can hopefully see, the laws of physics combined with some simple math can provide some pretty useful numbers. A basic understanding of your array configuration against your workload requirements can go a long way in preventing storage bottlenecks. You may also find that as you consider the cost per disk against various spindle speeds, capacities and RAID levels that you are better off buying smaller, faster, fewer, more, slower…. disks depending on your requirements. The geekier amongst us could even take these formulas and some costs per disk and hit up Excel Goal Seek to find the optimal level, but that’s more than this little blog can do for you today.
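For the geekier readers, a brute-force search can stand in for Excel Goal Seek. Everything in the disk catalog below is hypothetical: the 180 IOPS figure for 15k disks comes from the examples above, but the 10k and 7.2k IOPS figures, the capacities and the prices are placeholders to replace with your own quotes. The sketch also assumes a single RAID group, usable capacity of n/2 disks for RAID10 and n-1 for RAID5, and an even disk count for RAID10.

```python
import math

# Hypothetical catalog: (name, IOPS per disk, capacity in GB, price in USD).
# Only the 180 IOPS figure for 15k disks comes from the text; the rest are placeholders.
DISKS = [
    ("15k 300GB", 180, 300, 400),
    ("10k 600GB", 130, 600, 350),
    ("7.2k 1TB", 80, 1000, 200),
]

# RAID level -> (write penalty, minimum disks, usable data disks for a set of n).
RAID_LEVELS = {
    "RAID10": (2, 2, lambda n: n // 2),
    "RAID5": (4, 3, lambda n: n - 1),
}


def cheapest_config(target_iops: float, read_pct: float, write_pct: float,
                    capacity_gb: float) -> tuple[int, str, str, int]:
    """Hypothetical helper: (cost, RAID level, disk type, disk count) for the cheapest match."""
    best = None
    for disk_name, iops, cap_gb, price in DISKS:
        for level, (penalty, min_disks, usable) in RAID_LEVELS.items():
            # Smallest n with n*i / (r + f*w) >= target front-end IOPS.
            needed = target_iops * (read_pct + penalty * write_pct) / iops
            n = max(min_disks, math.ceil(needed))
            # Add disks until usable capacity is large enough.
            while usable(n) * cap_gb < capacity_gb:
                n += 1
            if level == "RAID10" and n % 2:
                n += 1  # mirrored pairs need an even number of disks
            cost = n * price
            if best is None or cost < best[0]:
                best = (cost, level, disk_name, n)
    return best


if __name__ == "__main__":
    # 1000 IOPS at 60% read / 40% write, with 2000GB usable.
    print(cheapest_config(1000, 0.6, 0.4, 2000))
```

With these placeholder numbers the search happens to favor many slower disks in RAID10; with real prices and capacities the answer may well flip, which is exactly the trade-off described above.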

Before I wrap up this post, I want to leave you with a few more links that I have bookmarked around the topics of IOPS and RAID over the past several years:

DB sizing for Microsoft Operations Manager, includes a nice chart with formulas similar to the ones I provided in this article: https://blogs.technet.com/jonathanalmquist/archive/2009/04/06/how-can-i-gauge-operations-manager-database-performance.aspx
An Experts Exchange post with some good info in the last entry on the page (subscription required) https://www.experts-exchange.com/Storage/Storage_Technology/Q_22669077.html
A Microsoft TechNet article with storage sizing for Exchange – a bit dated but still applicable: https://technet.microsoft.com/en-us/library/aa997052(EXCHG.65).aspx
A simple whitepaper from Dell on their MD1000 DAS array – easy language to help the less technical along: https://support.dell.com/support/edocs/systems/md1120/multlang/whitepaper/SAS%20MD1xxx.pdf
A great post that uses some math to show performance and cost trade-offs of RAID level, disk type, and spindle speed. https://www.yonahruss.com/architecture/raid-10-vs-raid-5-performance-cost-space-and-ha.html
Another nifty post that looks at cost vs. performance vs. capacity of various disk speeds in an array https://blogs.zdnet.com/Ou/?p=322


Vote Now for Top Virtualization Bloggers

January 4, 2010 by Josh Townsend

Happy New Year to everyone! 2010 is shaping up to be quite a good year, both personally and professionally. Between two little boys at home and a bunch of new projects at work I should stay busy. I will also continue to co-lead the Washington, DC VMware User Group (VMUG). In my spare time I will continue to write what I hope are technically sound, practical, and timely articles on VMtoday.com.

And speaking of VMtoday.com… Eric Siebert has opened up voting for the top 25 virtualization bloggers on his vSphere-Land.com site. I am very honored to be included in the ballot list of 55 of our industry’s top bloggers. Please take a few minutes to vote – a couple lucky voters will win a copy of TrainSignal’s VMware vSphere DVD training course.

Thanks for reading and best wishes for a blessed and productive 2010!

~Josh