Using pNFS to Scale Out RHEL & KVM

A few weeks ago, I talked about “Storage Virtualization & Scaling Out Your KVM-based Virtual Infrastructure”. I want to expand on that theme, and I’ll probably do that several times over the next few months.  This time, I’ll focus a bit more on one of the specific ways that we can truly scale on demand with RHEL 6, KVM, & NetApp Clustered ONTAP using pNFS (Parallel NFS).

What is pNFS? Well, let me outline the problem first using NFSv3, and then I’ll explain pNFS. In the “Scaling Out” post that I referred to before, I gave a very brief explanation as to what Clustered ONTAP (aka Data ONTAP Cluster-Mode) was and why it’s important. Let’s recap that just a bit.

Recap of Clustered ONTAP & ONTAP 7-Mode

In Data ONTAP “7-Mode”, NetApp had the concept of the Active-Active pair to handle automated failover in order to minimize disruptions caused by hardware failure (controller fails, network disrupted, intern unplugs the wrong cable, etc.). However, scaling out for capacity and load balancing of storage workloads was hardly elegant. Most other storage vendors have similar issues.

In Clustered ONTAP, the Active-Active pair becomes the building block for scalable, non-disruptive storage. Clustered ONTAP can be thought of as a storage hypervisor where “Vservers” are the Virtual Machines. In essence, this is true Storage Virtualization; storage volumes (and storage interfaces) are no longer bound to physical hardware.

Storage volumes are created and managed from within Vservers, yet the cluster can be managed as a single entity. As Active-Active pairs are added to the cluster non-disruptively, workloads can be balanced quite easily.

Herein lies a new problem: we’ve easily scaled out and easily migrated our storage volume, but NFSv3 still holds us back. Let’s illustrate this using NFSv3, starting with a more generic storage implementation.

In this first diagram, we have a storage volume that is exported from some generic storage in the data center. The last server in the diagram has mounted that export using NFSv3. For the sake of argument, let’s say that the first storage controller is getting overworked. In traditional storage, this would mean a long weekend of copying or mirroring volumes from one storage controller to another in order to balance things out.

But this also means that you have to plan time to migrate client-side configuration files to mount the storage at the new location. Your weekend just got longer. Yay you.
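If you’re curious just how manual that client-side piece is, here’s a quick Python sketch of the kind of check you’d run before the move. It scans /etc/fstab for NFS entries that still point at the old controller (the hostname “filer1” is made up for illustration); every one of those lines has to be edited by hand once the volume lands somewhere else.

```python
#!/usr/bin/env python
"""Rough sketch: find NFS entries in /etc/fstab that still point at an old
storage controller and would need hand-editing after a traditional volume
migration. The hostname 'filer1' is purely an example."""

OLD_SERVER = "filer1"          # hypothetical controller being retired
FSTAB = "/etc/fstab"

def stale_nfs_entries(path=FSTAB, old_server=OLD_SERVER):
    """Yield (device, mountpoint, fstype) for NFS mounts served by old_server."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            fields = line.split()
            if len(fields) < 3:
                continue
            device, mountpoint, fstype = fields[0], fields[1], fields[2]
            if fstype.startswith("nfs") and device.startswith(old_server + ":"):
                yield device, mountpoint, fstype

if __name__ == "__main__":
    for device, mountpoint, fstype in stale_nfs_entries():
        print("%-40s %-20s %s  <-- needs editing after the move" %
              (device, mountpoint, fstype))
```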

So let’s fast-forward through your long weekend of migrating data and assume everything is complete. But you’re back in first thing Monday morning wishing there was a better way out there on the technology horizon…

pNFS For the Win!!

pNFS adds intelligence to the client side such that any volume move on the server side results in the pNFS client automatically optimizing the storage path. No wasted weekends copying volumes from one controller to another. No editing client-side NFS mounts. None of it. Migrate your volume, and the pNFS client makes the adjustment in a way that is completely transparent to the user and the application.

So how do you get this new-fangled pNFS? In short, through RHEL (6.2 or newer) and NetApp Clustered ONTAP (8.1 or newer).

Red Hat included pNFS as a “Tech Preview” in RHEL 6.2 earlier this year. Think of “Tech Preview” as a fancy way of saying “We’ll distribute it, we’d like you to play with it, but it’s not fully supported, yet”. Additionally, the RHEL 6.4 beta, which comes out the week of November 12, includes some very significant patches to pNFS, including support for Direct I/O. (Not a trivial backport. Major kudos to Steve D and everyone else at Red Hat.) Direct I/O support is what makes pNFS a good fit for virtualization and database workloads. (Expect more posts on that in the future.)

On the server side, pNFS splits data I/O from metadata and sends them across different paths; in NFSv3, metadata and data share the same path. This streamlines much of the NFS-related activity and provides a direct path for data access. From a standards point of view, pNFS is an extension to NFSv4.1. Just as RHEL 6.2 or newer is required on the client side, Clustered ONTAP 8.1 or newer is required on the NetApp side for pNFS.
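If you want to sanity-check the client side, here’s a small Python sketch (not an official tool, just an illustration) that lists your NFS mounts with the NFS version they negotiated, straight from /proc/mounts, and checks /proc/modules for the pNFS files-layout driver. I’m assuming the RHEL module name nfs_layout_nfsv41_files here; the exact mount options and module details are covered in the Red Hat docs and the TR mentioned at the end of this post.

```python
#!/usr/bin/env python
"""Quick client-side sanity check (a sketch, not an official tool):
- list NFS mounts and the NFS version they negotiated (from /proc/mounts)
- report whether the pNFS files-layout module appears to be loaded.
Module name assumed to be 'nfs_layout_nfsv41_files' as on RHEL 6.x."""

LAYOUT_MODULE = "nfs_layout_nfsv41_files"   # assumed RHEL module name

def nfs_mounts():
    """Return (device, mountpoint, version info) for each NFS mount."""
    mounts = []
    with open("/proc/mounts") as f:
        for line in f:
            device, mountpoint, fstype, options = line.split()[:4]
            if fstype.startswith("nfs") and fstype != "nfsd":
                vers = [o for o in options.split(",")
                        if o.startswith("vers=") or o.startswith("minorversion=")]
                mounts.append((device, mountpoint, ",".join(vers) or fstype))
    return mounts

def layout_module_loaded():
    """True if the pNFS files-layout driver shows up in /proc/modules."""
    with open("/proc/modules") as f:
        return any(line.split()[0] == LAYOUT_MODULE for line in f)

if __name__ == "__main__":
    for device, mountpoint, vers in nfs_mounts():
        print("%s on %s (%s)" % (device, mountpoint, vers))
    print("pNFS files layout module loaded: %s" % layout_module_loaded())
```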

So how does this tie back into KVM? I’m SO glad you asked. I look at it from these angles:

  • A hypervisor that uses pNFS to access a VM data store would be able to take advantage of multiple connections per session, which can (and should) be distributed across multiple interfaces. Think “multipathing for NFS without a separate driver”. (NFSv3 uses a single connection/session.) Put that over 10GbE and I get excited just thinking about it. (See the quick sketch after this list.)
  • Scalability without disruption – that was the whole point of the original blog post (and this one too, actually). The ability to scale out your KVM-based virtualization platform without having to wait for your storage to keep up. This adds significant intelligence to your NFS-based VM storage without additional overhead to the servers or the admins. Not to mention you get (some of) your weekends back.
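To make that first bullet a little more concrete, here’s a rough Python sketch that counts established TCP connections from the client to the NFS port (2049) by reading /proc/net/tcp. Treat it as illustration only; for the real details on sessions and per-interface distribution, tools like nfsstat and /proc/self/mountstats give you far more information.

```python
#!/usr/bin/env python
"""Illustration only: count established TCP connections from this client to
the NFS port (2049) by parsing /proc/net/tcp and /proc/net/tcp6. An NFSv4.1
client spreading traffic across interfaces may show more than the single
connection you'd typically see with NFSv3."""

NFS_PORT_HEX = "%04X" % 2049   # remote port 2049 in the hex format /proc uses
ESTABLISHED = "01"             # TCP state code for ESTABLISHED

def nfs_connection_count(proc_file):
    """Count established connections to port 2049 listed in proc_file."""
    count = 0
    with open(proc_file) as f:
        next(f)  # skip the header line
        for line in f:
            fields = line.split()
            remote = fields[2]          # "HEXIP:HEXPORT"
            state = fields[3]
            if remote.endswith(":" + NFS_PORT_HEX) and state == ESTABLISHED:
                count += 1
    return count

if __name__ == "__main__":
    total = 0
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        try:
            total += nfs_connection_count(path)
        except IOError:
            pass   # e.g. IPv6 not enabled on this host
    print("Established connections to port 2049: %d" % total)
```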

I hope this helps illuminate things a bit for pNFS and how you might be able to take advantage of it in your environment. For additional information, check out NetApp TR-4063 “Parallel Network File System Configuration & Best Practices for Data ONTAP 8.1 Cluster-Mode”. (I know, the title just rolls right off the tongue…)

Hope this helps,

Captain KVM

8 thoughts on “Using pNFS to Scale Out RHEL & KVM”

  1. oVirt is working on GlusterFS integration; it already has some, but more will follow. What are the advantages of pNFS on NetApp over GlusterFS? It looks to me like they are both trying to solve the same problem.
    Kind regards,
    Sander

    1. Hi Sander,

      Thanks for stopping by. Gluster is meant to be an easy and inexpensive scale-out solution for storage. While there is a native Gluster client, it’s really a scalable file system accessible via NFS & CIFS.

      pNFS is a protocol extension that adds a lot of intelligence specifically to NFSv4.1. On the client side, the pNFS client can automatically re-optimize the data paths if they change. On the server side, the pNFS server separates metadata traffic from I/O traffic, in addition to allowing multiple connections per session. The point of all of that is that easy scale-out of NFS can add to a problem that we refer to as “NFS sprawl”; pNFS is one way of handling NFS sprawl.

      hope this helps,

      Captain KVM

  2. Did you see that a group of companies has launched open-pNFS.org to promote the use and open development of parallel NFS?

    As the open source client and server gain industry traction, the code will mature and the use of pNFS for virtualization will become standard.

  3. I read this article with intense interest. Jon, we can actually team up and prove that pNFS can be an excellent foundation for an enterprise RESTful storage cloud! Please take a look at http://tinyurl.com/aarcbj2. The embedded screencast (2m4s) in the Appendix should show how pNFS can address the slow I/O part. SDC 2013 is coming up in September. I think we can create a compelling talk 😉

    1. Hi Chin,

      Thanks for dropping by. And thanks for sharing the screencast – I thought it was very interesting. As for SDC 2013, we’ll have to see. NetApp is a platinum sponsor for that conference, but I’m still trying to iron out my schedule for the second half of the year.

      Captain KVM
