Providing High Availability for RHEV-M

Hi folks, in today’s post I’d like to cover something fairly elementary – providing HA for the RHEV-M application. To be honest, I can’t believe I haven’t written a post on it sooner. If you have power users, VDI users, and/or other services (via API/SDK) that depend on RHEV-M, it’s definitely critical. Besides, providing HA for RHEV-M is just downright convenient, even if it’s just virtualization administrators that need access.

Before I dive into how to provide HA to RHEV-M I’d like to cover what you will and will not need in order to make this happen. Here are our requirements:

  • An application that detects failure and automatically restarts the RHEV-M server
  • Shared storage to house the RHEV-M data
  • (Nice to have) An automated “takeback” feature when everything is back online

Here is what you don’t need:

  • A cluster application (Veritas Cluster Suite, Red Hat Cluster Suite, corosync, etc)
  • A cluster file system (GFS, GFS2, OCFS2, GPFS, etc)

NOTE: if you’re using more than 2 hypervisors or protecting many VMs, I do recommend that you look into a more traditional cluster suite.

UPDATE MAY 2016: Please note that RHEV has had the “self-hosted engine” option for several minor releases now. This means that you may deploy RHEV-H, then RHEV-M as a VM on RHEV-H for the purposes of HA. Obviously you would need a second RHEV-H node, but the process is well documented and straightforward.

If you’re confused, hold on a minute, I promise to illuminate. If you know what I’m about to say, feel free to move on. I’m a big believer in making sure that things are no more complicated than they need to be. In this case, deploying a cluster suite and a cluster file system adds two more things that you have to manage in your environment. It’s an added layer of complexity that doesn’t have to be there.

If you’ve read any of my Technical Reports on RHEV & NetApp, you know that I’m a big fan of virtualizing RHEV-M, Kickstart, NetApp OnCommand System Manager, and all of the other “infrastructure” applications that support the “production” hypervisors and virtual machines. Virtualizing those applications on a pair of “thick” hypervisors (RHEL 6.4 + KVM in this case) makes good sense. Your infrastructure applications gain the mobility afforded by virtualization, and using NFS (NetApp in this case) for your shared file system also makes really good sense.

In other words, you already know that KVM is “easy” in terms of management, and NFS is super easy to maintain. No added complexity.
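
For reference, here’s roughly what that looks like from the hypervisor side. Treat it as a sketch only – the NFS server name (“netapp01”) and export path are made up, so substitute your own:

# Mount the NFS export where libvirt keeps its images (run on both thick hypervisors)
mkdir -p /var/lib/libvirt/images
mount -t nfs netapp01:/vol/kvm_images /var/lib/libvirt/images

# And make it persistent via /etc/fstab:
# netapp01:/vol/kvm_images  /var/lib/libvirt/images  nfs  defaults  0 0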

Ok, so you’re onboard with virtualizing the infrastructure servers for mobility, and you’re fine with using NFS as the shared file system. What about the “application that detects failure and automatically restarts the RHEV-M server”?

Easy, we’ll script it in Bash. I’ve come up with a really simple way of detecting failure and automating failover for the VM that needs HA, RHEV-M in this case. You’re free to use the following script or come up with your own. The point is that it doesn’t take a whole lot of effort to put something together.

Let me walk you through the script. There are 3 main sections: Assumptions and Global Vars, Primary Host, and Secondary Host. You copy the contents of the script to both of your thick hypervisors. I happened to save it in “/root” and named it “kvmhrtbt.sh”. Decide which host will be primary, and which will be secondary. For the primary host, un-comment the section that says “Uncomment the underlying section if this is the Primary KVM Host” (just remove the single “#” from each line and you’ll be fine.)

For the secondary host, un-comment the section that says “Uncomment the underlying section if this is the Secondary KVM Host”. (just remove the single “#” from each line and you’ll be fine.)

On both hosts, you need to fill in the variables in the “Global Vars” section. Also note that SSH keys need to be configured for “password-less” login between hosts, and NTP needs to be configured and running.
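
If you haven’t done that prep before, it boils down to something like this (a rough sketch, using the same “infra01”/“infra02” hostnames as the Global Vars; run the mirror image from the other host):

# On infra01, as root:
ssh-keygen -t rsa               # accept the defaults
ssh-copy-id root@infra02        # copy the public key to the other host
ssh root@infra02 hostname       # confirm password-less login works

# Confirm NTP is running and the clocks are in sync:
service ntpd status
ntpq -p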

Lastly, create a “cron job” on both hosts so that the scripts will be run periodically. If your RHEV-M server is super important, you might schedule it to run every 60 seconds. If it’s more of a convenience, let it run once every 5 minutes or more. It’s up to you. My crontab looks like this:

* * * * *          /root/kvmhrtbt.sh

If you wanted to get creative, or more granular than 60 seconds, you could also create a more traditional “service” script that employs a “while loop”; a rough sketch of that idea follows. A bit further down is the full heartbeat script, which has been configured as the primary, as the “primary” section has been un-commented. Just remember that the secondary host gets its section un-commented.
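
Treat this as a sketch only – a hypothetical “kvmhrtbt-loop.sh” that simply calls the heartbeat script every 15 seconds. You’d launch it from /etc/rc.local or a proper init script rather than cron:

#!/bin/bash
# kvmhrtbt-loop.sh - hypothetical wrapper for sub-minute polling
while true; do
    /root/kvmhrtbt.sh
    sleep 15
done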

(Note that I did all of the testing in the video with a “test” VM rather than my RHEV-M VM. Outside of the video, I ran it against the RHEV-M VM several times with zero issues.)

#!/bin/bash
### Assumptions
### * There are only 2 KVM Hosts, a Primary and a Secondary
### * SSH keys have been created and exchanged between 
###   the Primary and Secondary hosts
### * NTP is running on both hosts and is configured properly
### * Either a cronjob (* * * * * /root/kvmhrtbt.sh) or more
###   traditional start/stop script needs to be created to
###   run this script on the hosts periodically

### Global Vars 

## Primary & Secondary Hosts
HOST1="infra01"
HOST2="infra02"

## The VM that requires HA
## Provide the KVM domain name, not the hostname
GUEST1="rhevm"     

## URI's for the KVM Hosts (don't edit)
URI1="qemu+ssh://$HOST1/system"
URI2="qemu+ssh://$HOST2/system"

## this is the "state" file, needs to be located on the shared storage
STATE="/var/lib/libvirt/images/state.text"

### States - it is critical that the State file only has 1 single
### character as defined below
## 1 - The VM runs on Primary
## 2 - The VM runs on Secondary
## 3 - The VM isn't running

#####################################################################
#### Uncomment the underlying section if this is the Primary KVM Host
#####################################################################

## PRI VARS
# Pull (migrate) the VM from the secondary if state is "2"
if [ "`cat $STATE`" == "2" ]; then
   logger "Virtual Machine $GUEST1 Takeback from $HOST2 Initiated"
   virsh --connect $URI2 migrate --live $GUEST1 qemu:///system
   echo "1" > $STATE
fi
## End Primary

#####################################################################
## Uncomment the underlying section if this is the Secondary KVM Host
#####################################################################

# ## SEC VARS
# PING=`ping -c 3 $HOST1 | grep "100% packet loss" | wc -c`
# VLIST=`virsh list --all | grep $GUEST1 | awk '{print $4}'`

# ## If I can't reach the primary and I'm not running the
# ## guest, change the state to "3"
# if [ "$PING" != "0" ] && [ "$VLIST" == "off" ]; then
#    echo "3" > $STATE
#  fi

# ## If the state is "3", the primary has failed and the guest
# ## isn't running anywhere, so restart it locally
# if [ "`cat $STATE`" == "3" ]; then
#    logger "Virtual Machine $GUEST1 Takeover Initiated, $HOST1 Failed"
#    # send a destroy, just in case
#    virsh --connect $URI1 destroy $GUEST1 > /dev/null 2>&1
#    # start the guest
#    virsh start $GUEST1
#    # change the state
#    echo "2" > $STATE
#  fi
# ## End Secondary

You might have noticed that the “primary” section even includes the automated “takeback”, and both sections include lines to log activities to “syslog”.
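
If you want to confirm the logging after a test failover, something along these lines will show the entries (the path is the default syslog file on RHEL 6):

grep -iE "takeback|takeover" /var/log/messages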

Here’s a quick recording showing the script in action:

 

As usual, feel free to leave comments or questions in the comment section.

hope this helps,

Captain KVM

33 thoughts on “Providing High Availability for RHEV-M”

  1. Excellent script. I am a huge fan of this type of implementation for the simplicity and ease of modification.

    For one application, I have a medium-term requirement to build HA/FT with the fastest possible changeover time, and I really don’t want to use VMware for their HA/FT implementation. With Kemari effectively dead, do you think that we’ll ever see supported FT in KVM?

    1. Hi Yorik,

      Thanks for taking the time to post a comment/question. Eventually I think we will see FT for KVM, but I don’t see it this year. (This is something that I’d be happy to be wrong about..)

      Captain KVM

  2. Hi,

    Thanks for sharing this. I really like the simplicity of the script! Right on.

    Hey, there’s a small typo in the comments section right after the STATE variable declaration (in your video it’s fine). The flags should be 1, 2 & 3 instead of 0, 1 & 2 🙂

    Thanks for these wonderful tutorials!
    Jorge

    P.S. I just heard your interview on the NetApp Communities Podcast. It was a very interesting one. I didn’t know about that podcast…have plenty to catch up on now!

  3. Simple and effective. Love it!

    Have you ever tried to extrapolate this to multiple VMs and more than two nodes? It would be interesting to see how that would scale.

    Thanks for sharing this with us!

    1. Hi Obelixz,

      Thanks for dropping by. I have not tried expanding this beyond what I’ve shared. Adjusting for multiple VMs would likely be easy, but multiple hypervisors might be a little trickier. At that point, it might be easier to go with a more mainstream clustering application.

      Captain KVM

  4. Hi Captain,

    I am new to Red Hat virtualization. Will this work for RHEV-M (Red Hat EL 6.4 with the Manager RPM installed on two machines as the management server)? Please suggest.

    1. Hi Vijay,

      Thanks for stopping by. If I understand your question correctly, you are asking if you can install RHEV-M on 2 different hosts. If this is your question, then the answer is no. There is an embedded database that contains all of the data for VMs, networks, storage domains, etc. There is currently no way to have 2 instances of RHEV-M connect to the same embedded database.

      If this is not what you are asking, please reply back.

      Captain KVM

  5. Hi captain,

    We have a requirement to make RHEV-M highly available without clustering. For this, we tried installing RHEV-M on one node with the config files on a shared iSCSI LUN and making that the active node. If some issue happens on the primary node, we thought we could mount the same iSCSI LUN from the storage on the other node and start the RHEV-M server there. Is this possible?

    Thanks & Regards,
    Vijay

    1. Hi Vijay,

      I understand your question now. If all of the RHEV-M related data is on the LUN(s), I don’t see an issue. It would not be “highly available”, but it would be very easy to fail over manually. However, I would test it out a few times first – both failover and failback.

      Captain KVM

  6. Thanks for the update, Captain. Yes, while testing this requirement I am facing an issue.

    The iSCSI LUN has been shared to both nodes. We detected the LUN and created logical volumes for the RHEV-M related data. But after restarting the node, we can see the LUN and check the logical volumes (pvs, vgs & lvs commands), but we cannot mount the filesystem.

    It throws the error “mount: you must specify the filesystem type”. There are also no entries for it under /dev/disk/by-uuid. Please suggest a solution.

    For your information: we installed one node as a standard installation (using only the local disks). Then we installed the second one with the shared iSCSI as mentioned above, and we are syncing the configuration files from node1 to node2. The iSCSI LUN is shared to both nodes.

    Hope this helps..!!

    Thanks & Regards,
    Vijay

    1. Hi Vijay,

      Like the article that you are responding to, I prefer to deploy RHEV-M in a virtual machine. I was looking at my RHEV-M deployments and there are at least 3 different directories that are critical to RHEV-M – /var/lib/ovirt-engine, /etc/ovirt-engine, and /usr/bin. You could create 2 identical RHEL servers and mount 2 LUNs to the 2 ovirt-engine directories and copy over the required commands in /usr/bin, but Red Hat may not want to support that configuration.

      I highly recommend virtualizing RHEV-M like I explain in my article. It’s much more straightforward. You don’t even have to use the script that I wrote. If the RHEL 6 server that is hosting your RHEV-M virtual machine fails, simply start it up on your other RHEL 6 server.

      Hope this helps,

      Captain KVM

    1. Hi Vijay,

      If you’re using LVM, you’ll want to look at the man pages for LVM.. specifically around things like `vgchange -ay`. If you’re not using LVM, you may have a corrupted file system or superblock.
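
      As a very rough example (the volume group name “rhevm_vg”, the logical volume, and the mount point are all placeholders):

      vgscan                       # rescan for volume groups on the shared LUN
      vgchange -ay rhevm_vg        # activate the logical volumes
      mount /dev/rhevm_vg/lv_data /your/mount/point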

      Captain KVM

  7. Sir,

    I posted a question here yesterday, but could not find it today. I need assistance with a small issue. I have existing RHEV-M hosts and other KVM hypervisors. I want the other KVM hypervisors and VMs under the umbrella of the existing RHEV-M, which would enable easy management of the hosts. Please let me know if it is possible, and if so, provide a reference page for the configs.

    Thanks
    Sarma

  8. Thanks for the documentation Captain,

    I will read up and perform the migration, and will post about it afterwards. I may need your help with any roadblocks.

  9. This looks pretty cool Captain KVM… I use a similar(ish) setup for a POS system I built. A frontend app connects to a MySQL database, with DRBD replicating the /var/lib/mysql dir. If host1 fails, the app connects to host2 instead (all the data is already there), and host1 reboots.

    Works very effectively.

    But, at work, we currently have no failover. We have a beast of a KVM host, that recently had a HDD error, that caused all VMs to fail.

    I am very interested in your solution, but I would need to flesh out the detection script in more detail, as ping alone just isn’t enough detection for what we need. If HTTPD / MySQLD dies but ping is still there, we will still need to fail over to the other VM host.

    I see us having a separate cron entry for each VM, with the VM name given as an argument.

    There is just one thing I am not too sure about with your setup here. I know that the state file is on shared storage, but are the VM .img files themselves on shared storage?

    I’ve read that virsh live migration already has a ‘backed up’ copy that is generally never more than 5 minutes old, but this wasn’t from the virsh man page.

    Do you have the .img files replicated, instantly, so that the virsh migrate command just has to transfer the xml config files?

    Thanks,
    Glenn

    1. Hi Glenn,

      Thanks for stopping by and leaving a comment/question. The shared storage is used specifically for the .img files. So if you have say, 3 hypervisors all attached to the same shared storage, it doesn’t matter where the VM is running. If you do a live migration from host1 to host2, the same .img file is used, not a copy. Part of the migration process is to compare the age (or existence) of the .xml file and copy/update it as needed.

      I like the sound of your home app setup – nice hack.

      As for HA/failover in production, my little script was really meant to be used for a small setup with just a few VMs (like your home setup). I actually would not use it for production. If you need production HA, I would look at something like pacemaker. If you run CentOS, there is a built-in HA suite that uses corosync and rgmanager – the same tools that are included with Red Hat Cluster Suite. If your work environment happens to be a RHEL shop, then I would recommend RHCS. And if you’re an Ubuntu shop, I’m fairly certain there are pacemaker and/or corosync packages available there too.

      Let’s talk about your HDD failure and KVM server(s). If you’re only using 1 KVM host, then get another one. You mention the one you have is a monster – your new KVM host doesn’t need to be the same size, just big enough to handle the most important VMs. The CPUs have to match as well – you can’t have one be AMD and the other Intel. The whole reason for shared storage is live migration, whether that is for maintenance, failover, or load balancing. The easiest shared storage to set up and maintain is an NFS server. Your VMs will still be “.img” files. If you go with block storage, then you can use a filesystem like ext4 or XFS, or skip the filesystem and just use clustered LVM. If you use the filesystem method, your VMs will still use the raw disk format of the “.img” files. If you use clustered LVM, your VMs will be backed by logical volumes.

      Again, NFS is the easiest to set up and maintain. Your NFS server should have several disks set up in a RAID configuration such as RAID 5 (do NOT use RAID 0) so that it can handle a disk failure. Setting up RAID in Linux is fairly straightforward and you can find instructions via Google, just like for the corosync or pacemaker HA setup.
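
      Just to give you an idea of how little is involved on the NFS side, it boils down to something like this (the export path, subnet, and server name are placeholders):

      # /etc/exports on the NFS server
      /exports/kvm_images  192.168.1.0/24(rw,no_root_squash,sync)

      # apply the export on the server...
      exportfs -ra

      # ...then mount it on each KVM host
      mount -t nfs nfsserver:/exports/kvm_images /var/lib/libvirt/images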

      hope this helps,

      Captain KVM

      1. Thanks for the fast response. The home build is just something simple so there is always an online copy of the database for the POS system at my mum and dad’s pub. (Yes, I wrote point-of-sale software in PHP / JS / MySQL! Oh, and Bash… and Python – for the receipt printer / cash drawer servo.)

        On our production server at work, we do run CentOS. Yes, I refuse to use Ubuntu (I just don’t like it), and I’m fairly new to this job, so I had no influence on the system they currently run.

        I did help the company upgrade to the new VM server not long into my job here though, and whoa, is it a beast… 12 cores, 96GB RAM, 1.7TB of RAID’ed storage!

        We have just under 20 VMs right now, and I can’t see us having too many more, so I think your script may well suffice (with some adaptations for failure detection) – let’s face it, there’s only a small chance the host will die and ALL VMs will need to be running on the backup.

        I’m glad that the migrate sequence uses the same image files, and is mainly there to update the XML files.

        I think if we build a whole cluster system (just for two nodes) it will be a little overcomplicated and very hard to implement… We must keep all VMs running somewhere while we are doing the upgrade – which will need to happen no matter what our new setup is – but your solution will be easier and quicker to implement. The only major change we will need to make to our existing host is setting up DRBD.

        My plan is to have a DRBD partition mounted to ‘/home/virtual_machines/’ , and permanently mirror the image files (and the state file), then use an adaptation of your script to keep a live copy of each VM at all times.

        I will pop back in and say hi when we do eventually get this in production. (It may take a while as we have to acquire Host x2)

        Thanks again,
        Glenn

        1. Glenn,

          I love that you’re running your parents’ pub POS!! That is awesome. And yes, keep me posted on your progress at work.

          Captain KVM

  10. Thanks for the script, which I will definitely be implementing.

    Just an FYI but I’m testing this on Linux Mint 16 and CentOS 6.5. In both cases

    VLIST=`virsh list --all | grep $GUEST1 | awk {'print $4'}`

    needs to be

    VLIST=`virsh list --all | grep $GUEST1 | awk {'print $3'}`

    1. Just be aware that the self-hosted engine system is still quite buggy. Check out the numerous problem reports on the Ovirt Users mailing list (there is no RHEV Users mailing list) and bear in mind that most problems users experience never even get reported.

      I’ve tried it myself, in a test environment, and found it quite unreliable. It suffers from both lockups (engine fails to respond to anything) and lockouts (engine fails to start because each host thinks another host should be running it). Other users have reported additional problems.

      Some of these issues have been at least partially fixed but there are still plenty more to be fixed. In my opinion it’s nowhere near production ready and I certainly won’t use it on a production system.

  11. Hi there!
    This is awesome!!
    One question: do I need to create the VM XML on each node? I understand that the .img file is saved on the shared storage, but when the VM moves to node2, how does node2 know about the CPU, memory, etc.?

    1. Hi Thiago,

      Thanks for stopping by. You can in fact copy over the XML file manually if you like. Or you can do one better – do a test Live Migration of the VM that is running RHEV-M. If you can successfully Live Migrate the VM, then you’re testing the network, the storage, the authentication, and the XML file gets copied automatically.
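
      For example, from the host that is currently running the VM (using the hostnames from the article – swap in your own domain name and host):

      virsh migrate --live rhevm qemu+ssh://infra02/system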

      Hope this helps,

      Captain KVM

        1. Hi there!

          I did some tests and I got a bug!

          My scenario is:

          virtual01 -> 192.168.0.55
          virtual02 -> 192.168.0.1
          Storage_NFS -> 192.168.0.10
          web01 -> 192.168.100.2 (VM in Storage_NFS)

          The state file is written in /mnt/state.text (Storage_NFS)

          I put the ha.sh in both KVM servers and in the crontab.

          When I disconnect the virtual01 cable, after 1min the VM web01 is migrated to virtual02. OK so far.

          When I reconnect the cable on virtual01, the VM web01 is brought back to virtual01 and the state.text file is updated with the value 1, but the VM is still running on virtual02 as well.

          The VM web01 ends up running on both KVM servers at the same time!!!

          I think you could turn this into a little product for doing HA on GitHub and implement some functions like sending alert e-mails, etc.

          Maybe I can help you! This script is very useful!!!

          Thank you!

          1. Hi Thiago,

            It’s been a while since I did anything with that script. You’re welcome to do whatever you like with it. If I remember correctly, you let the script move the app back and forth as well as update the status – you don’t update the status or move the app yourself. That was the basic functionality of the script; I wouldn’t have posted it if it didn’t work in my lab at the time. Have fun with it!

            Captain KVM
