Maximizing your 10GB Ethernet in KVM

Hi folks, it’s been a few weeks since I posted anything. It’s been busy for me as I’m in the middle of the biggest project that I’ve ever led. It’s got huge implications, and while I’ve dropped some not-so-vague hints, I want to show you one tiny aspect of it today:

Effective use of 10GbE using VLANs and Channel Bonding

Imagine a brand new interstate highway with a speed limit of 120mph and no painted lanes. This is what a 10GbE pipe is like without VLANs to segregate different traffic types like management, storage, internet access, private application, or VDI. It would be a huge mess, difficult to manage, and most likely insecure.

Now imagine that same new interstate with strictly enforced lanes for high speed commuters, Sunday drivers, interstate commerce, and other assorted road hogs. You get the idea. Makes for a better use of a freeway when tractor trailers and slow drivers aren’t taking up the fast lane and speed freaks aren’t weaving in and out on their import motorcycle.

Channel Bonding Primer

In Linux, channel bonding is a means of abstracting (virtualizing!) network configurations from the physical NICs for the purposes of fault tolerance, improved throughput, or both. And while there are half a dozen or so different channel bonding modes, we’re going to focus on 2 modes only: mode 1, active-passive, and mode 4, aggregate (802.3ad/LACP).

Simply put, mode 1 takes 2 or more physical NICs and “bonds” them into a single logical interface. In mode 1, only 1 link is “active” and the other link(s) are there to take over in case of failure. It’s easy because the network switches don’t need any special configuration.
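For reference, the bond mode lives in the bonding driver options. Below is a minimal sketch of what a mode 1 (active-backup) bond definition might look like; the actual mode 4 version used in this post appears in step 2 of the configuration section:

# ifcfg-bond0, mode 1 variant (sketch only)
DEVICE=bond0
ONBOOT=yes
BOOTPROTO=none
BONDING_OPTS="mode=1 miimon=100"   # active-backup; check link state every 100ms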

Like mode 1, mode 4 takes 2 or more physical NICs and “bonds” them into a single logical interface. However, it does so using 802.3ad or Link Aggregation Control Protocol (LACP), which combines the links into a fatter pipe with all links active. So two 1GbE interfaces become a single 2GbE link, and two 10GbE links become a single 20GbE link. It just takes a little coordination with the networking folks, as the network switch ports need to have LACP enabled on them.
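Once a mode 4 bond is up (and the switch side is configured), you can sanity-check from the Linux side that LACP actually negotiated; the bonding driver reports its state under /proc:

# Shows the bonding mode, 802.3ad aggregator details, and per-slave link status
cat /proc/net/bonding/bond0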

VLAN Primer (With Great Power Comes Great Responsibility)

Holy crap, a 20GbE link??? That’s where the VLANs and ‘great responsibility’ come into play. In its simplest terms, a VLAN is a means of carving a single layer 2 broadcast domain into multiple distinct (isolated!!) broadcast domains. In other words, our 20GbE LACP channel bond can be carved into multiple VLANs, keeping our different traffic types separate and thereby making the best use of our resources. And adding a little bit of security to boot, because of the isolation.
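To make the carving concrete: on the Linux side, a VLAN is just an 802.1Q tagged sub-interface hanging off the bond. Here’s a throwaway iproute2 example (VLAN 3080 is the one used later in this post; 3081 is an assumed second ID, and the persistent ifcfg version comes in step 3 below):

# Create tagged sub-interfaces of bond0 for two traffic types
ip link add link bond0 name bond0.3080 type vlan id 3080   # e.g. management
ip link add link bond0 name bond0.3081 type vlan id 3081   # e.g. NFS/storage (ID assumed)
ip link set bond0.3080 up
ip link set bond0.3081 up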

Virtual Bridge Primer

A virtual bridge is simply a means of presenting a network device to a virtual machine. That device could be a physical interface, like “eth0”, or it could be a logical device like “bond0” (a channel bond) or “bond0.3080” (a VLAN).
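As a quick illustration (the persistent ifcfg version comes in step 4 below), bridge-utils lets you build and inspect a bridge by hand:

# Create a bridge and attach the VLAN sub-interface to it
brctl addbr br3080
brctl addif br3080 bond0.3080
brctl show   # lists bridges and their member interfaces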

Visualizing the Concepts

If you’re unfamiliar or relatively inexperienced with Channel Bonding, it’s sometimes easier to visualize what it is that you’re trying to accomplish first. This includes choosing the type of channel bond as well as how many VLANs you will need to start things off with. Here’s a diagram to help out:

[Diagram: RHEVH FlexPod Network v3]

“Link 1” and “Link 2” above are the fibre cables that plug into the dual port 10GbE interface represented by “NIC 1” and “NIC 2”. They are then configured as a single channel bond, “bond0”. That bond essentially represents a 20GbE link, so we carve it into 3 VLANs that we immediately turn into virtual bridges.

Now let’s go configure this!

Configure Channel Bonding and 10GbE in RHEL+KVM

1. First we prep our NICs, eth0 and eth1:

[root@infra-host-1 network-scripts]# cat ifcfg-eth0
DEVICE="eth0"
BOOTPROTO="none"
HWADDR="00:00:00:AA:0A:0F"
NM_CONTROLLED="no"
ONBOOT="yes"
TYPE="Ethernet"
UUID="3f466c1b-a9cc-4bad-8729-4582e597dcc9"
SLAVE=yes
MASTER=bond0
MTU=9000

Notice that we added “MASTER” and “SLAVE” variables to the configuration files, and also configured Jumbo Frames. We do the same configuration change to “eth1”.
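For completeness, here is a sketch of the matching ifcfg-eth1; the HWADDR shown is a placeholder (use eth1's real MAC), and the UUID line will likewise be unique to your system:

DEVICE="eth1"
BOOTPROTO="none"
HWADDR="00:00:00:AA:0A:10"
NM_CONTROLLED="no"
ONBOOT="yes"
TYPE="Ethernet"
SLAVE=yes
MASTER=bond0
MTU=9000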

2. Next, we create our channel bond, bond0:

[root@infra-host-1 network-scripts]# cat ifcfg-bond0
DEVICE=bond0
ONBOOT=yes
BOOTPROTO=none
BONDING_OPTS="mode=4 miimon=100"
MTU=9000

This is also where we dictate the channel bond type.

3. Next, we create the VLAN (remember, the VLAN ID has to match the actual network), which also serves as the basis for our virtual bridge:

[root@infra-host-1 network-scripts]# cat ifcfg-bond0.3080
DEVICE=bond0.3080
VLAN=yes
BOOTPROTO=static
ONBOOT=yes
BRIDGE=br3080
MTU=1500

The “.3080” extension and “VLAN=yes” are what make this a VLAN.

Feel free to create additional VLANs the same way for NFS traffic, iSCSI traffic, management traffic, or VM traffic.
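For example, a second VLAN dedicated to NFS traffic might look like the following. The VLAN ID 3081 and bridge name br3081 are assumptions for illustration; note that a storage VLAN is a good candidate for keeping the bond's Jumbo Frames:

# ifcfg-bond0.3081 (hypothetical NFS/storage VLAN)
DEVICE=bond0.3081
VLAN=yes
BOOTPROTO=static
ONBOOT=yes
BRIDGE=br3081
MTU=9000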

4. Finally, we give our virtual bridge an IP:

[root@infra-host-1 network-scripts]# cat ifcfg-br3080
DEVICE=br3080
TYPE=Bridge
BOOTPROTO=static
ONBOOT=yes
IPADDR=172.20.80.45
NETMASK=255.255.255.0
DELAY=0
MTU=1500

Notice that the type is “Bridge”, not “Ethernet”, and that we forced a normal MTU size here. (Having the big pipe use Jumbo Frames leaves us the choice at the individual device level.)
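To illustrate that per-device choice, the hypothetical storage bridge from the earlier example could keep Jumbo Frames end to end (bridge name, VLAN, and IP are all illustrative assumptions):

# ifcfg-br3081 (hypothetical NFS/storage bridge, Jumbo Frames kept)
DEVICE=br3081
TYPE=Bridge
BOOTPROTO=static
ONBOOT=yes
IPADDR=172.20.81.45
NETMASK=255.255.255.0
DELAY=0
MTU=9000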

5. Restart the network service.

service network restart
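Once the network comes back up, a few quick checks confirm that everything landed where it should:

cat /proc/net/vlan/config   # confirms bond0.3080 is tagged with VLAN ID 3080 on top of bond0
brctl show                  # br3080 should list bond0.3080 as a member interface
ip addr show br3080         # the bridge should be UP with its 172.20.80.45 address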

When you create a VM with RHEL6+KVM, this is what is presented to the guest operating system:

[Image: the virtual bridge as presented to the guest]

But the VM just sees it as “eth0”….
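Under the hood, that “eth0” is just a libvirt interface backed by the bridge. A fragment like the following (guest name and MAC address are placeholders) shows up in the guest's domain XML:

virsh dumpxml my-guest
...
    <interface type='bridge'>
      <mac address='52:54:00:xx:xx:xx'/>
      <source bridge='br3080'/>
      <model type='virtio'/>
    </interface>
...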

So that’s my post for the week; keep your eyes out for the next post, as I’ll show you how to do the same thing in RHEV!

Hope this helps,

Captain KVM

38 thoughts on “Maximizing your 10GB Ethernet in KVM”

    1. Hi Dim,

      Thanks for adding to the conversation. I do remember that mode 6 won’t work under a bridge, I forgot about mode 0. I’ll change that in the post, pronto!

      Captain KVM

  1. I’ve been doing this as well, but with the vlans and bonds the other way around.
    For example:
    NIC1 -> NIC1.3080 -> bond0 -> bridge
    Do you see any reason to favor one approach over the other?

    1. Hi,

      Thanks for stopping by! If you’re only ever going to have 1 bond and 1 VLAN, I would think there isn’t an issue with the way you’re doing it. However, if you only have 2 NICs and you want to add additional VLANs, you’ll have to do it the way I described.

      Captain KVM

      1. I’m not sure I follow. What I do for multiple vlans/bonds is for example:
        NIC1 -> NIC1.3080 -> bond0 -> br3080
        NIC2 -> NIC2.3080 -> bond0 -> br3080
        NIC1 -> NIC1.2288 -> bond1 -> br2288
        NIC2 -> NIC2.2288 -> bond1 -> br2288

        By the way, I personally use ARP monitoring instead of miimon. I tend to find that the majority of real-life network issues won’t be noticed by miimon, but are by ARP monitoring.
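        A minimal sketch of what ARP monitoring looks like in the bonding options, shown for an active-backup bond, with the interval and target gateway address as illustrative assumptions:

        BONDING_OPTS="mode=1 arp_interval=1000 arp_ip_target=172.20.80.1"   # probe the target every second instead of relying on link state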

        1. Hi Tentacle,

          It comes down to personal preference, number of steps, and pieces to manage:

          Your method to add additional VLANs (assuming initial config is done):
          1. Create 2 more VLANs
          2. Create a new bond
          3. Create a new bridge

          That’s 4 new files to create (including a new bond) each time you need a new VLAN/Bridge

          My method to add additional VLANs (assuming initial config is done):
          1. Create new VLAN
          2. Create new bridge

          That’s 2 new files and 1 less step

          If your way works for you, stick with it, again it comes down to personal preference.

          Captain KVM

            1. Unfortunately, I realized that with switches that don’t let you have an untagged VLAN on a port alongside tagged VLANs (like anything Cisco), you can’t use ARP monitoring.

            So sadly I’ll have to stick to my less streamlined approach.

  2. Thanks for putting this together Jon. The diagram was awesome! It helped me come to grips with the whole process.

    I’m just curious: why did you assign an IP to the virtual bridge? Is it that you want your host to have a presence on that network? If so, any particular reason for that? Do you also assign IP addresses to each one of the other vBridges (for NFS, iSCSI, etc.)?

    I guess my real question is: in what situations do you assign an IP to the vBridge? The only reason I can think of is for the VLAN corresponding to the management network ..so I can manage my system (ssh to it, etc). But then… I could just assign an IP to the bonded VLAN logical interface (without having to create the final vBridge). This is the confusing part for me 🙁

    Again, thanks for this excellent tutorial. Looking forward to the RHEV version!

    Best regards,
    Jorge

    1. Hi Jorge,

      Thanks for taking the time to read the article and post a question. Assigning an IP address to the bridge is really based on Red Hat’s best practices. It comes down to either giving the bridge an IP or giving the physical Ethernet device (the one being used for the bridge) an IP – and assigning the physical device an IP will result in spurious/unexpected behaviors.

      hope this helps,

      Captain KVM

  3. Hello,

    first of all: I’m a complete newbie with KVM. I stumbled upon it with Proxmox VE (www.proxmox.com) a few days ago.

    As far as I understood, your tutorial is about connecting VMs to physical 10Gb Ethernet connections.

    I’m looking for a way to connect two VMs within the same server with a virtual 10gb (or just CPU-limited) ethernet connection. I do not have physical 10gb ethernet connections.

    Do you think, that is possible? If so, can you give me a clue, please?

    Best regards
    minasmorgul

    1. Hi,

      When you install a KVM host, there is a private network (192.168.0.0/24) enabled by default. Your VMs will be able to communicate with each other there. If you need to extend functionality to get 2 way traffic for your VMs (access to internet AND access from internet) then you will need to set up 1 or more virtual bridges. There is plenty of documentation in the Fedora, RHEL, and CentOS sites as well as technical reports that I’ve written for NetApp (available publicly on the NetApp tech library site).

      Captain KVM

  4. Hello,

    I wasn’t aware of the default private network (192.168.0.0/24), but I created an additional virtual bridge, not attached to any physical Ethernet connection.

    The VMs are able to communicate with each other via that virtual bridge, but won’t go faster than gigabit adapters.

    Do I have to configure something special to get faster than that?

    Best regards
    minasmorgul

  5. Hello,

    pardon me, but you didn’t quite get my point.

    I don’t want to accelerate physical 1Gb adapters,

    I want to create a virtual 10Gb network: just two VMs, within one server, talking with each other at the highest possible speed.

    Best regards
    minasmorgul

    1. Minasmorgul,

      No, you cannot create a “virtual 10GbE network”. Two VMs communicating on the same host will go as fast as the server hardware allows. It depends on CPU, memory, disk speeds, and bus speeds.

      /ck

  6. Hello,

    saying it in other words:

    I’d like to create a ‘host only – infiniband – vmbr’.

    Is that Possible?

    Best regards
    minasmorgul

    1. Minasmorgul,

      I am not an Infiniband expert, but my research shows that virtualizing IB is still theoretical at best. I don’t see virtualized IB supported on any hypervisor. What I have found are IB drivers for hypervisors that allow for pass through, allowing a VM direct access to a hardware-based IB network.

      /ck

  7. ck,

    1) have a look here:

    http://pve.proxmox.com/wiki/Infiniband

    I suppose that is what you call IB drivers for hypervisors, right?

    2) You wrote above: “Two VMs communicating on the same host will go as fast as the server hardware allows. It depends on CPU, memory, disk speeds, and bus speeds.”

    Does that imply that if I create a simple vmbr and attach two VMs (with virtio drivers, of course), the communication will exceed 1Gb by maxing out the hardware?

    Unfortunately that hasn’t happened so far, and the server does have enough vapour (16 cores).

    From my perspective there shouldn’t be a 1Gb limit then, because there is no physical device involved to act as a bottleneck.

    Best regards
    minasmorgul

    1. Minasmorgul,

      I took a look at the link you provided (thank you), and even that article stated the need for an external IB switch (under ‘Subnet Manager’). And as far as going faster than 1GB internally, I wasn’t suggesting that it would be faster, unless it was a very high end server.

      /ck

  8. I know this forum is a bit old, however I don’t understand how your bond is communicating with your VLANs with nothing defined in bond0 for a BRIDGE=“X” or something of that nature. I’ve been having a heck of a time with a setup similar to this and I cannot get it working.

    My setup was as follows…

    Nic1 \                            / bond0.2000 -> br_mgmt
           -> Bond0 -> br_private ->
    Nic2 /                            \ bond0.2000 -> br_storage

    My question is: how do I cut out br_private and just make bond0 see the two VLANs that are forked off it.

    Thanks

    1. Hi Bob,

      I apologize for the delayed response; I’ve been away for a bit. In my example, I do actually have a “BRIDGE=xxx” defined, but not until after the VLAN. The config is: NICs into Bond, Bond carved into VLANs, each VLAN becomes a bridge. The “BRIDGE” variable shows up in the “bond.vlan” file. It’s not the only way to do things, I just think it’s the easiest for when you need to deploy more VLANs as you don’t have to take any interfaces down.

      Looking at your config, it looks like you have 2 NICs bonded, but only 1 VLAN to cover 2 or 3 networks. If you’re ok with having storage and mgmt on the same VLAN (it’s fine, they should both be private, and mgmt traffic won’t take much away from storage bandwidth), then your “BRIDGE=” variable should show up in the “bond0.2000” config file, pointing at “br_MGandSTOR” or some other name. If you want 2 VLANs, then go for 2000 and 2001, or similar. Don’t have a br_mgmt and a br_storage both off of bond0.2000. Do bond0.2000 -> br_mgmt and bond0.2001 -> br_storage, as sketched below.
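      A minimal sketch of that layout, following the same file pattern as steps 3 and 4 in the post (VLAN IDs from your diagram, bridge names illustrative):

      # ifcfg-bond0.2000 -> management bridge
      DEVICE=bond0.2000
      VLAN=yes
      ONBOOT=yes
      BRIDGE=br_mgmt

      # ifcfg-bond0.2001 -> storage bridge
      DEVICE=bond0.2001
      VLAN=yes
      ONBOOT=yes
      BRIDGE=br_storage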

      hope this helps,

      Captain KVM

  9. Hi,

    what I want to do is a KVM hypervisor with a NAS holding multiple gigabit links to store the virtual machines. My concern is to have one VM able to use multiple links at the same time. If I understand your article well, using mode 4 for bonding, even with layer3+4 load balancing, a VM will be stuck to one gigabit link from/to the NAS.

    The problem is that our VMs are very I/O intensive and one gigabit link is not enough to supply the bandwidth. Any hint on this? (we’ve no budget to buy 10Gbit equipment)

    1. Hi there,

      Thanks for stopping by. If you use multiple NICs on your VM along with mode 4 bonding, you’re going to get the benefit of aggregate bandwidth. How much? Only testing will tell for sure as it depends on your application(s) and environment. You can certainly do a KVM hypervisor with multiple 1GbE links to NAS (NFS hopefully). Your testing will be very important as you may find that virtualizing your application is ok, or you may find that you can’t virtualize it until you have the budget to update the network.

      Captain KVM

      1. Captain,

        You say “…NFS hopefully”. Is NFS preferred over iSCSI for KVM?

        I’ve been trying to figure out the best setup that covers backup/migration for business continuity while maintaining performance for users. I’m unsure if I should create my guests locally or on the NAS (NFS or iSCSI), and how I should configure my available ports.

        My Hardware…
        NAS-QNAP TS-1079-Pro with 4 GbE. Currently being used for file sharing (CIFS). Supports VLAN, Jumbo Frames, Port Trunking, NFS, iSCSI, and Service Binding.

        SERVER-DIY SuperMicro X9SCM-IFF, XEON-E3v2, 32GB ECC, with 2-1GbE plus 1 dedicated GbE for IPMI

        TEST/DR SPARE- HPZ210, XEON-E3, 32GB ECC, with 3-GbE

        I see all the possible configurations, but which are preferred/best practices?

        Any suggestions would be a great help.

        1. Hi Patrick,

          NFS is typically the easiest to deploy and manage for KVM-based virtualization. And if you’ve got ‘enterprise’ grade NFS, like NetApp, there are typically other features that make it even more attractive. In your case your “NAS” seems limited to “CIFS”, so I would stick to iSCSI. There is nothing wrong with iSCSI. As far as best practices, I would use VLANs to segregate your storage, management, and VM traffic. I would also go with link aggregation on the storage connections and test with Jumbo Frames.

          Captain KVM

          1. Captain,

            The QNAP supports NFS, iSCSI, CIFS, and AFP.

            OK, so NFS for ease of deployment. Are there use cases where iSCSI would be better?

            As for link aggregation on the NAS, are you recommending aggregating all 4 GbE links into one? What about service binding; any benefit to isolating NFS or iSCSI to 2 GbE links and having the other 2 for everything else (versus 1 large pipe)?

          2. Patrick,

            If you linked all 4, then carved them into individual VLANs, you would be in good shape. iSCSI vs NFS really comes down to what you are used to, although NFS takes nothing to learn.

            Captain KVM

  10. It would be great to also see some recommendations for specific settings, especially per transport. For example sysctl tweaks, iscsi.conf tweaks, multipath tweaks (you guessed it, I use iSCSI extensively).
    I realise these are probably very vendor and environment specific, but a list of params to tweak and recommended ways of finding the optimal settings would be more than welcome.

    I remember back in the day, when I was playing with Samba, I found an article describing how to use tcpdump to determine the avg MTU and MSS on a network and calculate the optimal send and receive buffers, which boosted my Samba file transfer speeds to 7x the speed of a Windows server. Something along the same lines for 10G iSCSI/NFS would be extremely useful.

    1. Thanks for stopping by! Yeah, I agree, recommendations would be great. But you are also correct in that it is very vendor and environment specific.

      Captain KVM

  11. Hello
    I have been reading your blog!
    I have, I believe, a basic question for Pro’s.

    … 4. Finally, we give our virtual bridge an IP…..
    Will this IP you defined be the IP address of a virtual server, or will it be the gateway address of a virtual server?

    Thanks for your feedback.
    have a nice day
    vinc

    1. Hi Vinc,

      Thanks for stopping by. In step 4, the IP defined is the IP for that interface on the hypervisor. It is not a default gateway; it provides network access to the virtual guests.

      Captain KVM

  12. @Captain
    thanks for your feedback, I think this is the point where I am getting lost.
    the virtual guest would be a virtual server, I believe?!
    or do you have a network diagram to help me visualize this?
    have a nice day
    vinc

    1. Hi Vinc,

      Yes, virtual guest = virtual server = guest os. So in the diagram in that article, step number 4 applies an IP address to the “vbridge” in the diagram. The “vbridge” or virtual bridge is what allows 2 way access to KVM guests. By default, the virtual network that exists only allows traffic to go out from the guests, as it is one-way Network Address Translation (NAT). But the virtual bridge allows for 2 way traffic.

      Captain KVM

Agree? Disagree? Something to add to the conversation?