Storage Virtualization & Scaling Out Your KVM-based Virtual Infrastructure

If you’re using KVM as a major component of your data center infrastructure, at some point you have to consider how you’re going to scale out. How easy is it to grow from 10 virtual machines to 20? How about from 20 to 200? The answers are likely different, but they don’t have to be…

I’m going to make a bold assumption that if you’re using KVM in the data center, you have at least 2 hypervisors (probably many more) and some form of shared storage. In the context of the questions I asked earlier, supporting 10 virtual machines on those 2 hypervisors is likely not an issue. If the physical servers have the CPU, memory, and network bandwidth, they might even handle 20 or 50. Regardless, you will hit the limits of any particular server at some point.

But that’s not really a big deal. Add another server, hook up power and network, clone another instantiation of your default hypervisor image (you are using enterprise storage, right?), and then add the server to the pool of hypervisors. Your cramped, sardine-like virtual machines can then spread out onto the new hypervisor with ease, transparently to the rest of the data center.
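To make that concrete, here’s a minimal sketch of the rebalancing step using the libvirt Python bindings. It assumes shared storage (so only guest memory is copied, not disks), and the host URIs and “migrate everything” selection logic are hypothetical: adapt them to your own environment.

```python
# Minimal sketch: live-migrate running guests onto a newly added hypervisor.
# Assumes shared storage, so only guest memory is copied, not disks.
import libvirt

SOURCE_URI = "qemu+ssh://kvm-host1/system"  # the crowded hypervisor (hypothetical)
DEST_URI   = "qemu+ssh://kvm-host3/system"  # the newly added hypervisor (hypothetical)

src = libvirt.open(SOURCE_URI)
dest = libvirt.open(DEST_URI)

# Live-migrate every active guest; a real script would pick guests based on load.
for dom in src.listAllDomains(libvirt.VIR_CONNECT_LIST_DOMAINS_ACTIVE):
    print("Migrating %s ..." % dom.name())
    dom.migrate(dest, libvirt.VIR_MIGRATE_LIVE, None, None, 0)

dest.close()
src.close()
```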

Growing out the network is just as easy, if not easier. Need more ports? Drop in another switch and extend your VLANs, vPCs, and so on. Again, this is no big deal. Or maybe you’re already using an SDN and you’ve automated all of this…

What about the storage? What happens when you hit the limits of your storage array? It doesn’t matter whether it’s a “dumb” disk shelf, a Linux server exporting NFS, or an enterprise array: they all have limits on capacity and bandwidth. Even the enterprise arrays hit a hard stop at some point on the number of files, volumes, RAID groups, and other important building blocks.

Traditionally, this meant that scaling out storage could be a disruptive process. As new storage was added to the environment, clients had to unmount old storage and mount new storage, and data often had to be moved from controller to controller to balance the load.

How do you overcome these storage limitations? How do you avoid this disruption and “scale for real”? If only you could virtualize your storage the same way that you virtualize your servers and networks… then you could move your LUNs and exports around as needed, or add controllers as easily as you add hypervisors… (cue dramatic music)

BOOM! Data ONTAP Cluster-Mode.

(Before your eyes glaze over at the ‘product pitch’, I promise to tie this back into KVM.)

In essence, Data ONTAP Cluster-Mode allows you to overcome the hard limits imposed by hardware or software by virtualizing the entire storage infrastructure. The available storage pool spans the cluster – for both SAN and NAS. A storage volume is no longer bound to a single controller, and because of this, a volume can be “live migrated” from one part of the cluster to another, transparently to the rest of the data center – as in, no one has to unmount old storage and then mount new storage.
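To see what that transparency looks like from the client side, here’s an illustrative sketch (the NFS mount point and file are hypothetical): a client keeps reading through the same mount while the volume is moved between cluster nodes, and nothing on the client has to change.

```python
# Illustrative only: read through the same NFS mount while the underlying
# volume is moved between cluster nodes. The path below is hypothetical.
import time

TEST_FILE = "/mnt/vmstore/heartbeat.txt"  # lives on the NFS-mounted volume

# The mount point never changes, so reads keep succeeding during the move.
for _ in range(300):  # roughly 5 minutes at 1-second intervals
    with open(TEST_FILE) as f:
        data = f.read()
    print(time.strftime("%H:%M:%S"), "read OK:", len(data), "bytes")
    time.sleep(1)
```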

Need to add more capacity and/or horsepower to the storage infrastructure? No problem: add another controller pair and shelves to the cluster. But what if a controller fails? Am I screwed? Not at all. Previous versions of Data ONTAP had the concept of the “Active-Active” pair configuration, where each controller handled its own workload but would automatically take over its partner’s workload if that partner failed. Data ONTAP Cluster-Mode uses the “Active-Active” pair as the building block of the cluster.

It’s like nested redundancy… a RAID within a RAID… “Inception” for enterprise storage… Ok, I’m going too far with it…

All of the other storage efficiency features are there: dedupe, thin provisioning, and cloning. So are the data protection pieces: SnapMirror, Snapshots, and RAID-DP. Most of the demos that I’ve created for Red Hat Summit and this blog have used Data ONTAP Cluster-Mode on the back end.
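As a quick hypervisor-side illustration of the same thin provisioning idea, here’s a sketch that uses qemu-img (via Python’s subprocess) to create a sparse qcow2 image: the guest sees 20G, but almost no space is consumed until data is actually written. The image path is hypothetical.

```python
# Illustration of thin provisioning from the KVM side: a sparse qcow2 image.
# The path is hypothetical; qemu-img must be installed on the hypervisor.
import os
import subprocess

IMAGE = "/var/lib/libvirt/images/guest01.qcow2"  # hypothetical path

# Create a 20G qcow2 image; only metadata is written up front.
subprocess.check_call(["qemu-img", "create", "-f", "qcow2", IMAGE, "20G"])

st = os.stat(IMAGE)
print("virtual size: 20G")
print("actual space used: %d KB" % (st.st_blocks * 512 // 1024))
```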

Ok fine, but what about performance? Check out these record SPECsfs2008 results: http://www.spec.org/sfs2008/results/res2011q4/sfs2008-20111003-00198.txt (spoiler: NetApp wins). Match that with the SPECvirt_sc2010 results – http://www.spec.org/virt_sc2010/results/ (spoiler: Red Hat and KVM win) – and you have the makings of a very fast and very scalable virtualization platform.

And that is my point – a fast and scalable virtualization platform built around Red Hat, KVM, and NetApp Data ONTAP Cluster-Mode.

Want to know more about Cluster-Mode? My colleague Charlotte Brooks has written an excellent introduction to Cluster-Mode, which can be downloaded from http://www.netapp.com/templates/mediaView?m=tr-3982.pdf&cc=us&wid=131201824&mid=57332234 .

Want to know more about RHEL 6, KVM, and Data ONTAP Cluster-Mode? Stay tuned, I’ve got a new document coming out in the next few weeks.

Hope this helps,

Captain KVM

Agree? Disagree? Something to add to the conversation?