Use Cases for Both RHEV Snapshots and NetApp SnapShot copies

Hi folks,

In one of my previous posts, I mentioned that I would start reviewing some of my favorite new features in RHEV 3.1. This week I tackle the first topic: Live Snapshots. But I can’t leave well enough alone; I also compare it to NetApp SnapShot copies. Not to show that one is better than another – only to highlight different use cases. Read on!!

So before I dive into the differences in approach as well as use cases, I’d like to provide my angle on levels of snapshot consistency while being momentarily vendor neutral:

  • Crash consistent – This is where you capture the state of a VM (or group of VMs) “in flight”. That is to say that nothing has been quiesced. This is great for stateless or read-heavy applications. It is not necessarily the best option when the application data is stored on NAS or SAN (see the next option below instead). Recovering a VM from a crash consistent snapshot almost always works, and these snapshots are easy to initiate without much planning. (hint, not a good option for that tier 1 database that stores your company’s financials..)
  • Application consistent – This is similar to crash consistent, except that extra steps have been taken to quiesce the applications (hot backup mode, etc). This is especially useful for write-heavy applications, critical applications, and when your data is on SAN or NAS. As soon as the app is quiesced, snapshots can be taken of the VM data store and the app data store, then the app is resumed. Restoring from an app consistent snapshot has an extremely high level of success, but requires a little planning.
  • Fully consistent – This builds on the application consistent snapshot by also quiescing the virtual machine itself. Typically, we only care about the application data itself, because most of the time we can build a new VM (or group of VMs) and attach to storage faster than we can locate a backup and then restore from it. But what if the VM build itself is a pain in the ass, even with golden images? This is where the fully consistent snapshot comes into play. Fully consistent snapshots have the highest level of success and the highest level of planning.
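
To make the ordering concrete, here is a minimal Python sketch of the steps behind each level. The step names (quiesce_app, snapshot_vm_store, and so on) are hypothetical labels made up for illustration, not RHEV or NetApp commands, and the "resume in reverse order" choice for the fully consistent case is my own assumption.

```python
def snapshot_plan(level):
    """Return the ordered steps for a given consistency level.

    Step names are illustrative labels only, not real RHEV or NetApp commands.
    """
    if level == "crash":
        # Nothing is quiesced; the VM is captured "in flight".
        return ["snapshot_vm_store"]
    if level == "application":
        # Quiesce the app (hot backup mode, etc.), snapshot both stores, resume.
        return ["quiesce_app", "snapshot_vm_store", "snapshot_app_store",
                "resume_app"]
    if level == "full":
        # Same as application consistent, but the VM itself is quiesced too.
        # Assumption: resume happens in the reverse order of quiescing.
        return ["quiesce_app", "quiesce_vm", "snapshot_vm_store",
                "snapshot_app_store", "resume_vm", "resume_app"]
    raise ValueError("unknown consistency level: %r" % level)


if __name__ == "__main__":
    for level in ("crash", "application", "full"):
        print(level, "->", ", ".join(snapshot_plan(level)))
```

The more steps in the plan, the more planning it takes, which is the trade-off described in the list above.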

Now back to RHEV and NetApp snapshots…

In earlier versions of RHEV, taking a snapshot of a virtual machine meant having to shut it down, take the snapshot, then start it back up again. This meant either a service outage for users/customers or a sleep outage for admins (or learning PowerShell. Yikes.). RHEV 3.1 introduces the “Live Snapshot”, and it replaces the old way. And it couldn’t be easier. Well, maybe a little*. Select your VM, select the Snapshot tab, press “Create”, enter a description, and hit “OK”. Boom. A few seconds later, (depends on the size of the VM) the snapshot completes.

* Easier would be to make “snapshot” a menu option when you ‘right click’ a VM. I know, I’m just a jerk who can’t be happy with what’s been provided….
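
If you would rather script it than click, the same request can probably be made through the RHEV-M REST API. The sketch below only builds the HTTP request and does not send it. It assumes snapshots are created by POSTing a <snapshot> element to /api/vms/<vm id>/snapshots; the host name, VM id, and credentials are placeholders, so check the API guide for your version before relying on any of it.

```python
import base64
import urllib.request
from xml.sax.saxutils import escape


def build_snapshot_request(base_url, vm_id, description, user, password):
    """Build (but do not send) a POST asking RHEV-M to snapshot one VM.

    Assumption: snapshots are created by POSTing a <snapshot> element to
    <base_url>/api/vms/<vm_id>/snapshots. Verify against your API guide.
    """
    url = "%s/api/vms/%s/snapshots" % (base_url.rstrip("/"), vm_id)
    # Escape the description so characters like & and < stay valid XML.
    body = "<snapshot><description>%s</description></snapshot>" % escape(description)
    credentials = ("%s:%s" % (user, password)).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/xml",
            "Accept": "application/xml",
            "Authorization": "Basic " + token,
        },
    )


if __name__ == "__main__":
    # Placeholder values; nothing is sent over the network here.
    req = build_snapshot_request(
        "https://rhevm.example.com", "VM-ID-GOES-HERE",
        "before patching", "admin@internal", "secret")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it, pass req to urllib.request.urlopen().
```

Building the request separately from sending it makes it easy to eyeball the URL and XML body before pointing it at a real manager.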

The other thing you may have picked up on is that, like the old way, a RHEV snapshot pertains to one VM at a time. Because of this, a RHEV snapshot may be better suited for testing a new configuration or patch set on a single virtual machine. This may be very appealing for a “dev/test” environment where such changes don’t require strict change management. This allows the virtualization administrator to quickly roll back the virtual machine without affecting the other virtual machines in the volume.

It also means that the admin, who may not have direct access to the storage, doesn’t have to involve the storage admin as he moves VM versions back and forth.

The other primary use case is to create a new virtual machine from a RHEV snapshot. This is very useful when updates to an existing virtual machine constitute the new standard image. This is a time saver as compared to creating a new virtual machine, then adding the new updates and/or configuration.

The other factor to consider is backup and archive. A snapshot of any kind (RHEV, NetApp, etc) cannot be considered a backup until it is stored on a separate storage array from the source. Otherwise, any calamity involving the source storage array affects not only the virtual machine, but also its snapshots. That said, a RHEV snapshot does make it very easy to take a “crash consistent” snapshot of a single machine.

A NetApp SnapShot copy is preferred when a near instantaneous copy of an entire group of virtual machines is required. This makes it trivial to restore the entire volume, but still allows the restoration of a single virtual machine. NetApp SnapShot copies are also equally effective at all three snapshot consistency levels discussed above.

The other area that NetApp SnapShot copies are well-suited for is backup. When used in conjunction with NetApp SnapMirror, NetApp volumes and their SnapShot copies are asynchronously mirrored to other NetApp storage arrays. These arrays may be in a different area of the data center or in a different location altogether. Those mirrored volumes are easily mirrored back for restoration.

So that’s this week’s entry. It’s not so much “single VM snapshot” vs “volume snapshot”, but where they each fit nicely in the same data center. And honestly, I like them both, just not for the same use cases. And if you’re wondering about combining them, I thought about it and couldn’t come up with a use case that wasn’t just adding in unnecessary steps. But feel free to comment and suggest your own combo use case.

Hope this helps,

Captain KVM

7 thoughts on “Use Cases for Both RHEV Snapshots and NetApp SnapShot copies”

  1. Hi Captain,

    nice article..

    But I have a question:

    We are working with a Red Hat 2 node cluster (CMAN -> CLVMD).
    Every VM has a 1:1 mapping (LVM inside the VM) to an LVM volume outside the VM, on a NetApp LUN over FC. Why? Performance!

    So now the request has arrived for me to do backups 🙂

    And now the problems begin.

    – LVM / CLVMD is not supported by snapcreator, so it won’t work.
    – We have 2 nodes, but all VMs are in one volume, so a minimum of 2 snapshots has to be scheduled.
    – A VM can be located on one node or the other.

    So my solution is:

    Script: – Get the VMs running on the node
    – Identify the LVM volume
    – vm suspend
    – snapdrive take snapshot of volume with lvm parameter
    – vm resume

    Do the same on the second machine.

    But my question is: do you have an article or anything else that describes a better solution?

    rg
    marcel

    1. Hi Marcel,

      Thanks for dropping by and leaving a question. If I understand your setup correctly, you are using Red Hat Cluster Suite (or part of it) to manage a shared LUN. You’re not using a filesystem on the LUN, you’re only using CLVMD. So each VM is backed by a logical volume on the shared LUN. It also sounds like each VM has direct access to an additional LUN for data. So when you need to do a backup, you have to capture the volume that contains the VM images as well as the volume that contains the application data. Is this correct? What applications are you running in the VMs? Are the applications “transactional” like a database, or are they stateless like a web server?

      If this is correct, then my suggestion is to alter your procedure just a little bit. What you really care about is the application data. You might consider only doing a “crash consistent” snapshot of the VM volume, then an “application consistent” snapshot of the application volume. A “crash consistent” snapshot means that you take the snapshot while the VMs are running. If you have to restore a VM from a crash consistent snapshot, then it will likely be ok. It might have to replay the filesystem logs, but that’s ok because that is what they are for. The “application consistent” snapshot is when you pause the application before you trigger the snapshot, and then resume it. For example, most databases have a “hot backup” mode that would work very well with the “application consistent” snapshot.
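
      As a rough illustration of that altered procedure, here is a minimal Python sketch. The four callables passed in (snapshot_vm_volume, snapshot_app_volume, pause_app, resume_app) are hypothetical hooks that you would wire up to your own snapshot and application commands; nothing here calls a real NetApp or database tool.

```python
from contextlib import contextmanager


@contextmanager
def quiesced(pause_app, resume_app):
    """Pause the application and always resume it, even if the snapshot fails."""
    pause_app()
    try:
        yield
    finally:
        resume_app()


def backup(snapshot_vm_volume, snapshot_app_volume, pause_app, resume_app):
    # Crash consistent: the VMs keep running while the VM volume is captured.
    snapshot_vm_volume()
    # Application consistent: the app is paused only around its own snapshot.
    with quiesced(pause_app, resume_app):
        snapshot_app_volume()


if __name__ == "__main__":
    log = []
    backup(
        snapshot_vm_volume=lambda: log.append("snapshot vm volume"),
        snapshot_app_volume=lambda: log.append("snapshot app volume"),
        pause_app=lambda: log.append("pause app"),
        resume_app=lambda: log.append("resume app"),
    )
    print(log)
```

      The try/finally is the important part: if the snapshot step throws an error, the application still gets resumed instead of sitting in hot backup mode.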

      Hope this helps,

      Captain KVM

      1. Hi captain,

        thanks for replying to my question.

        Yes, you are right.

        You mean it is good to do a snapshot from outside, on the KVM hosts, without suspending / resuming the VM for the OS.

        And do a suspend/resume for the data partition.

        But the data is stateless, like a webserver.

        So “crash consistent” should be good for me.

        rg
        marcel

  2. Hi Captain,

    We have a 2 node cluster on RHEV and we are using NetApp storage. We want to take backups of all VMs in such a way that it helps us in case of DR.

    Could you please suggest how we can achieve this and which files need to be backed up? Presently we are taking a backup of the .snapshot directory which is generated under the SAN mount point, and of postgres via pg_dump.

    Will these backups be sufficient to support us during DR? That is, will we be able to copy this data onto a bare metal server and have it work?

    Please suggest.

    Regards
    Sam

    1. Hi Sam,

      Thanks for reaching out. If you have more than one NetApp controller, then I would strongly suggest that you purchase and use SnapMirror. This is a backup and recovery tool that works with your SnapShots. You can mirror data asynchronously to your secondary controller. If something happens to the original controller, you can use your secondary controller instead. If the data on the first controller becomes damaged or corrupted due to human error, the data can be recovered from the secondary controller.

      I bring this up because, yes, you can restore your VMs from SnapShot. But if you don’t have that data copied/mirrored/located in a secondary site, it’s not actually a backup. For example, if you have a month of snapshots, and then someone deletes some VMs, you can restore them from the snapshots. But what happens if something happens to the controller itself? If a fire tears through the data center and destroys the controller disks, and you don’t have the data mirrored elsewhere, it doesn’t matter how many snapshots you have.

      If you don’t have the budget for another controller to be placed in another data center or even somewhere else in the existing data center, you should still find a way to back up a thick copy of your data and your snapshots.

      Captain KVM
