In this post, I want to cover two of the core features of NetApp Clustered Data ONTAP and tie them specifically to how they help KVM (and OpenStack) scale. While this is meant to be a tech discussion, not a sales pitch, I can’t help but point out that this article shows you some things you can do with NetApp that you can’t do with any other storage, be it “enterprise” or “commodity”.
I know, I know, that’s pretty bold talk, but I’m about to back it up. To level set, here are some things that you should know and/or understand:
- Data ONTAP is NetApp’s storage operating system, with the current version being 8.2
- Clustered ONTAP is NOT the old (now referred to as 7-mode) ONTAP in a simple 2 node active/active cluster.
- Clustered ONTAP uses active/active pairs as building blocks to create a global namespace for SAN and NAS.
- Clustered ONTAP has a different architecture and command set as compared to ONTAP 7.x and earlier as well as ONTAP 8.x running in 7-mode.
- All of the things that folks love about the old NetApp (dedupe, thin provisioning, cloning, Snapshots, etc.) are still there.
And for good measure, here is a visual representation of a 4-node cluster:
From top to bottom, we see that the SAN/NAS clients attach to the storage by way of “LIFs”, or logical interfaces. These could be for Ethernet storage or for FC storage. We also see that the 4-node cluster is made up of 2 HA (active/active) pairs, linked by what is referred to as the cluster interconnect: a 10GbE network that carries both cluster traffic and data traffic. This is very important to understand in the context of the global namespace.
Under the storage controllers and under the cluster interconnect, we see 4 stacks of disk shelves overlaid with “SVM 1” and “SVM 2”. A storage virtual machine, or SVM, is the secure container within Clustered ONTAP that owns storage volumes and storage interfaces (LIFs). And as illustrated, an SVM can span the entire cluster, by way of the cluster interconnect and the global namespace. Here is an example.
If the client on the far left is connected to the NetApp controller on the far left by way of LIF4, it can access the data (NFS export, CIFS share, iSCSI LUN, FC LUN) regardless of where the actual storage volume lives. For example, that far-left host using LIF4 will have no issue accessing the yellow volume at the bottom far right. The cluster interconnect simply routes the traffic there once Clustered ONTAP realizes the data is not on the local node. For the initiated, this is one of the primary benefits of a “global namespace”, and NetApp is not the only player here.
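To make the picture concrete, here is what that access looks like from a Linux client. This is just a sketch; the hostname, export name, and mount point are all hypothetical:

```shell
# Mount an NFS export from the SVM through LIF4's address.
# The volume itself may live on any node in the cluster; the
# cluster interconnect routes the I/O there transparently.
mount -t nfs lif4.example.com:/vol_yellow /mnt/vol_yellow

# The client sees the same data no matter which LIF it mounted through.
ls /mnt/vol_yellow
```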
However, we’re about to go into uncharted territory. What if the NetApp controller on the far right is getting hammered? No problem. We look at the volume or volumes that are getting the most requests and make a decision: do we want to move the offending volumes to a different node, or do we want to move the less important volumes instead? The decision will be based on business factors, but the action you take is the same. You live migrate the volume(s) from node 4 to whichever other node in the cluster you need. Again, your SAN/NAS clients can still access that data through their respective LIFs, so there is no disruption to storage access. The length of time the move takes depends on the size of the volume(s) and the amount of traffic on the cluster.
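From the cluster shell, that live volume migration is a single command. A sketch, assuming a hypothetical SVM “svm1”, volume “vol_yellow”, and destination aggregate “aggr_node2”:

```shell
# Live-migrate the hot volume to an aggregate on a less-loaded node.
# Clients keep reading and writing through their LIFs during the move.
volume move start -vserver svm1 -volume vol_yellow -destination-aggregate aggr_node2

# Check progress; cutover happens automatically once the copy catches up.
volume move show -vserver svm1 -volume vol_yellow
```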
Let’s throw in another example. What if you wanted to replace the 2 nodes on the left with newer/faster models? No problem. The first thing to do is live migrate the LIF(s) on the far left to the second controller. Again, the storage clients can still access their data without interruption; LIF migration is near instantaneous. Now that the storage access is moved off of node 1, we shut it down and replace it with the new node. Then we live migrate all of the LIFs on node 2 over to node 1 and replace that node. Again, this is all done non-disruptively. Once node 2 is up and joined to the cluster, we spread the LIFs across the 2 nodes again. We never even touched the storage volumes in this example.
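The LIF moves in that swap are also one-liners on the cluster shell. LIF, node, and port names below are hypothetical:

```shell
# Move a data LIF off node1 to its HA partner before the hardware swap.
network interface migrate -vserver svm1 -lif lif1 -destination-node node2 -destination-port e0c

# Verify the LIF is now being served from node2 (clients stay connected).
network interface show -vserver svm1 -lif lif1

# After the replacement node rejoins, send the LIF back to its home port.
network interface revert -vserver svm1 -lif lif1
```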
OK, one more example, then we tie this back to KVM and OpenStack. Your environment is ready to grow again, but because the new storage will be used for an existing project on the cluster, you don’t want that data spread across disparate controllers in the data center. Not a problem. Add an additional NetApp HA pair to the cluster. You’ll likely want to add some new LIFs to the cluster as well. Once your new cluster nodes are added (non-disruptively), you can both add new storage volumes and load balance the existing storage. A NAS-only cluster can grow to 24 nodes and thousands of petabytes. A SAN-only or mixed cluster can grow to 8 nodes, and still grow into thousands of petabytes. Each HA pair can handle 256 LIFs (you’ll want to spread them equally), and each cluster can handle hundreds of SVMs.
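Once the new HA pair has joined, the expansion steps look roughly like this. All names and addresses here are made up for illustration:

```shell
# Create a data LIF on one of the new nodes so clients can reach it.
network interface create -vserver svm1 -lif lif5 -role data -data-protocol nfs -home-node node5 -home-port e0c -address 192.168.10.15 -netmask 255.255.255.0

# Rebalance an existing volume onto the new nodes' storage, live.
volume move start -vserver svm1 -volume vol_blue -destination-aggregate aggr_node5
```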
Hopefully you’ve already seen the possibilities for your virtualization and cloud environments. But just in case, here are some pointers. The fact that you can migrate volumes and LIFs on the fly means that your storage can move as fast as, or faster than, the applications it supports. Have a tier 1 database that you know is going to go into overdrive in the next few days? Go ahead and plan your volume migrations and be ready. Expecting a bunch of new hypervisors? Start adding LIFs to spread the access across the cluster so that the storage connection itself is not the bottleneck.
Need dual paths for your boot LUNs? Again, not an issue, as your initiators can log into LIFs on multiple nodes in the cluster, allowing the Linux multipath driver to handle path failover. Want to implement pNFS? Not a problem. Mount your pNFS client to any LIF in the cluster that has access to the SVM and volume, and pNFS will automatically maintain the most optimal connection without human intervention. Move the pNFS export? No problem. pNFS will automatically reconnect elsewhere without interruption and without re-configuration.
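On the hypervisor side, the client setup is standard Linux. A sketch, with a hypothetical LIF hostname and export; note that pNFS rides on NFSv4.1, so the mount must request that version:

```shell
# pNFS requires NFSv4.1; mount through any data LIF in the SVM.
mount -t nfs -o vers=4.1 svm1-lif2.example.com:/vol_db /mnt/vol_db

# For SAN boot LUNs, confirm the multipath driver sees paths
# through LIFs on more than one cluster node.
multipath -ll
```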
All of this means that the number of KVM and OpenStack hypervisors can grow without creating bottlenecks at the storage connection. It means that different storage volumes (applications!!) with different SLAs can be placed accordingly on faster or slower controllers. Storage volumes and storage connections can be live migrated non-disruptively to balance and scale with your environment, pain free.
No one else can do that.
Hope this helps,