If this is a new blog, how is Captain KVM “revisiting” the topic?
In January of this year, I posted an entry for File System Alignment for Linux VMs on my old blog; I’d like to revisit the topic. So, why would anyone want to worry about something like “file system alignment” in the first place? Simple.
Performance. (You do like performance, right?)
Allow me to illuminate… Running native Linux on local disk presents no issue, but running a virtualized instance of Linux starts to present challenges. As multiple abstraction layers are added between physical disks and virtual disks, there is plenty of room to have partition boundaries and sector boundaries get out of alignment. Look at the layers below:
Storage File System (NFS)
Storage Physical disk
We could actually expand that out even further, but this will suffice. What needs to happen is that the partition boundaries at the top layer need to line up with the blocks on the bottom layer. It’s actually quite complicated, but part of the reason for adding abstraction is to hide the complexities.
NOTE: Newer operating systems such as RHEL 6, Windows Server 2008 R2, and Windows 7 align properly by default.
So let’s look at this issue using a real-life example: a RHEL 5.x VM (KVM, of course!) on a RHEL 5.x host, backed by NetApp storage. Left to the default configuration of ‘fdisk’, ‘sfdisk’, or ‘parted’, the /boot partition would start at sector 63 and each subsequent partition would start at the next available sector, ensuring that all of the performance you were seeking with your 10GbE & Jumbo Frames would be wasted.
Why is this such a problem? Imagine that you have a 4k piece of data that gets written to disk. Normally, this is a perfect fit for NetApp blocks (also 4k in size), but a misaligned file system would cause that piece of data to be written across 2 data blocks. Do you see the problem yet? Essentially, this will result in 2 reads or 2 writes for every 1 I/O request. Multiply that by your favorite 7-digit number and imagine all of the extra reads and writes that are occurring because of misalignment. Performance penalties of 40% are not unheard of.
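If you want to see the arithmetic behind that, here is a minimal sketch in shell. It assumes 512-byte sectors and 4096-byte storage blocks, and the function name `blocks_touched` is just something I made up for illustration; it is not part of any NetApp or Red Hat tool.

```shell
#!/bin/sh
# Assumptions for this sketch: 512-byte sectors, 4096-byte storage blocks.
SECTOR_BYTES=512
BLOCK_BYTES=4096

# blocks_touched START_SECTOR IO_BYTES
# Prints how many storage blocks a single I/O of IO_BYTES touches when it
# begins at the first byte of START_SECTOR.
blocks_touched() {
    start_byte=$(( $1 * SECTOR_BYTES ))
    end_byte=$(( start_byte + $2 - 1 ))
    first_block=$(( start_byte / BLOCK_BYTES ))
    last_block=$(( end_byte / BLOCK_BYTES ))
    echo $(( last_block - first_block + 1 ))
}

blocks_touched 63 4096   # partition starts at sector 63: prints 2
blocks_touched 64 4096   # partition starts at sector 64: prints 1
```

Sector 63 begins at byte 32256, which is 512 bytes short of a 4k boundary, so every 4k write spills into a second block.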
Now let’s prevent this issue from even rearing its ugly head. If you are using Linux and KVM, you likely have some kind of provisioning process such as Kickstart. This is arguably the easiest way to prevent any issues that stem from misaligned file systems. The standard Kickstart file has a section for options, packages, an optional %post, and an optional %pre. In the exercise below, we will concern ourselves with the “disk layout” part of the options section and we will implement the optional %pre section. NOTE: The %pre section must come after the options and packages sections, so it sits at the end of the Kickstart file alongside any %post.
Let’s look at the %pre section and provide some explanation:
%pre
parted /dev/sda mklabel msdos
parted /dev/sda mkpart primary ext3 64s 208718s
parted /dev/sda mkpart primary 208720s 100%
parted /dev/sda set 2 lvm on
We’ve defined the section by starting with the “%pre” directive, then proceeded with 4 “parted” commands: create a disk label, create a partition starting on sector 64 that is roughly 100 MB in size, create another partition that takes up the rest of the disk, and prep the second partition for use with LVM.
So why sector 64, and why sector 208720? The short answer is that they are “alignment friendly” sectors. That is to say, “64” is cleanly divisible by 8, and 8 sectors of 512 bytes make up exactly one 4k block, so 128 and 2048 would work as well. Don’t pay any attention to the unused sectors in between the partitions. Reclaiming them is not worth the performance penalty.
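The “divisible by 8” rule is easy to script if you are picking your own partition boundaries. This is only a sketch, assuming 512-byte sectors; `is_aligned` and `next_aligned` are helper names I invented for the example.

```shell
#!/bin/sh
# Assumption: 512-byte sectors, so 8 sectors make up one 4k block.
SECTORS_PER_BLOCK=8

# is_aligned SECTOR: succeeds (exit 0) when SECTOR falls on a 4k boundary.
is_aligned() {
    [ $(( $1 % SECTORS_PER_BLOCK )) -eq 0 ]
}

# next_aligned SECTOR: prints the first aligned sector at or after SECTOR.
next_aligned() {
    echo $(( ( ($1 + SECTORS_PER_BLOCK - 1) / SECTORS_PER_BLOCK ) * SECTORS_PER_BLOCK ))
}

for s in 63 64 208719 208720; do
    if is_aligned "$s"; then
        echo "$s aligned"
    else
        echo "$s misaligned, next aligned sector is $(next_aligned "$s")"
    fi
done
```

That is where 208720 comes from: the first partition ends at sector 208718, the next free sector is 208719, and the first alignment friendly sector after that is 208720.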
Remember though, that is only the first of 2 pieces to configure in the Kickstart file. The second section is the “disk layout”. Let’s look at a layout example:
zerombr yes
##clearpart --linux --drives=sda ## comment out or remove
part /boot --fstype ext3 --onpart sda1
part pv.2 --onpart sda2
volgroup VolGroup00 --pesize=32768 pv.2
logvol swap --fstype swap --name=LogVol01 --vgname=VolGroup00 --size=1008 --grow --maxsize=2016
logvol / --fstype ext3 --name=LogVol00 --vgname=VolGroup00 --size=1024 --grow
The first thing to note is that the “clearpart” directive is commented out. (You could actually remove it altogether.) Leaving it in would undo the magic in the “%pre” section. The next thing to note is that we are using the “--onpart” option to the “part” directive. (The “sda” can be changed to “vda”, here and in the “%pre” section, if using the VirtIO driver.) This is just telling Kickstart to use the properly aligned partitions. Everything else is the same.
The magic is illustrated and alignment is optimal!
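If you would rather verify than trust, the start sector of every partition is exposed under /sys/block once the VM is installed. Here is a minimal sketch, again assuming 512-byte sectors; `check_disk_alignment` is a hypothetical helper of mine, and its optional second argument exists only so it can be pointed at a test directory instead of the real sysfs.

```shell
#!/bin/sh
# check_disk_alignment DISK [SYSFS_ROOT]
# Reads each partition's start sector from sysfs and reports whether it
# falls on a 4k boundary (start sector divisible by 8).
# SYSFS_ROOT defaults to /sys/block.
check_disk_alignment() {
    disk=$1
    root=${2:-/sys/block}
    for start_file in "$root/$disk/$disk"*/start; do
        # Skip the unexpanded glob when the disk has no partitions.
        [ -r "$start_file" ] || continue
        part=$(basename "$(dirname "$start_file")")
        start=$(cat "$start_file")
        if [ $(( start % 8 )) -eq 0 ]; then
            echo "$part start=$start aligned"
        else
            echo "$part start=$start MISALIGNED"
        fi
    done
}

# Usage on the installed VM (use vda instead of sda with VirtIO):
# check_disk_alignment sda
```

With the layout above, sda1 should report a start of 64 and sda2 a start of 208720.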
But what about the extra layers of abstraction, you say?
It’s all good. Provided the partition boundaries are set to an “alignment friendly” sector, Linux LVM is itself alignment friendly. And the layers introduced by NetApp are also alignment friendly by default. There is one caveat in dealing with LUNs, though. It’s not an exception, mind you, just something to be aware of. Let’s say you have a NetApp LUN presented to a RHEL 5.x server and you want to layer it with LVM. Run your `pvcreate` command against the entire LUN without running `fdisk` or `parted` first. Again, LVM itself is alignment friendly. And unless the LUN is to be carved up into multiple partitions, there is no point in using `parted` or `fdisk`.
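To check where LVM actually starts placing data on a physical volume, `pvs` can report the `pe_start` field. The sketch below assumes your LVM build supports that field along with `--units s` and `--nosuffix`, and the device path in the usage line is only an example. `pe_start_aligned` is a made-up helper that reads a sector count on stdin.

```shell
#!/bin/sh
# pe_start_aligned: reads one pe_start value, in 512-byte sectors, from
# stdin and succeeds (exit 0) when it falls on a 4k boundary.
pe_start_aligned() {
    read -r sectors
    [ $(( sectors % 8 )) -eq 0 ]
}

# Usage (hypothetical device path, substitute your own LUN):
# pvs --noheadings --nosuffix --units s -o pe_start /dev/mapper/mpath0 \
#     | pe_start_aligned && echo "LVM data area is aligned"
```

Because the physical volume sits directly on the LUN, the pe_start offset is the only offset between your logical volumes and the storage blocks.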
But what about the reason for revisiting the topic, you say?
- I can never overstate the importance of proper alignment
- I have a new scenario
What if I’m actually SAN booting my Linux box with multipathing enabled? The server itself is not virtualized, but the storage is. How do I take that into consideration?
Let’s say you’re using RHEL 5.x, CentOS 5.x, or Fedora (of the same time frame). The “%pre” section remains the same. The disk layout has a slightly different look:
zerombr yes
###clearpart --all --drives=sda
part /boot --fstype ext3 --size=100 --onpart=mapper/mpath0p1
part pv.2 --size=0 --grow --onpart=mapper/mpath0p2
volgroup VolGroup00 --pesize=32768 pv.2
logvol swap --fstype swap --name=LogVol01 --vgname=VolGroup00 --size=1008 --grow --maxsize=2016
logvol / --fstype ext3 --name=LogVol00 --vgname=VolGroup00 --size=1024 --grow
The “--onpart” options to the “part” directives now point to “mapper/mpath0p1” and “mapper/mpath0p2”. Why didn’t we change the “%pre” section? Easy. Multipathing isn’t configured that early in the process, so “%pre” stays the same. All we need to account for are the mpath partitions in the disk layout.
The magic is illustrated and alignment is optimal!
thanks for reading,
For more information on File System Alignment, please review NetApp TR-3747 http://www.netapp.com/us/library/technical-reports/tr-3747.html
For more information on Kickstart please review the “deployment guides” as hosted on http://redhat.com or http://centos.com.