pNFS – This is where the rubber meets the road..

Hi folks,

In October of 2012, I wrote a blog post on using pNFS as a means of scaling out your KVM-based virtualization environment. I kept it mostly high level, as I wanted it to be more academic than hands-on. Today, I’d like to switch that around and make this post hands-on, showing you how to attach your pNFS client to a pNFS server.

If you remember from the original post, NetApp includes a pNFS server as part of Clustered ONTAP (NetApp’s next-gen storage operating system), and Red Hat includes a pNFS client as part of RHEL 6.2 and 6.3 (Tech Preview) and RHEL 6.4 (fully supported). And while the pNFS standard defines file, block, and object layouts, I’m only referring to the file layout in the context of NetApp.

Let’s tackle the server side and the client side, in that order…

Server Side (NetApp Clustered Data ONTAP)

We’ll kick things off by showing you the initial view from the NetApp side. I’ve already created a FlexVol named “VeeFourOne” on Virtual Storage Server “rhev_vss” and exported it:

rhev_vss::> volume show -volume VeeFourOne -fields volume,size,type,policy,junction-path
vserver  volume     size type policy junction-path
-------- ---------- ---- ---- ------ -------------
rhev_vss VeeFourOne 50GB RW   pNFS   /VeeFourOne

Let me explain the fields above:

  • “vserver” – this is the logical name of the Virtual Storage Server that is exporting the pNFS share.
  • “volume” – this is the name of the FlexVol that is exported to our pNFS client.
  • “size” – I hope I don’t have to explain this one..
  • “type” – in this case “read/write”
  • “policy” – (In hindsight, I shouldn’t have named it ‘pNFS’, as it may confuse folks. It’s just a name, not a declaration.) An export policy is what we use to define who has access to the export, which protocol is allowed (NFS, CIFS, or both), and which NFS version (3, 4, or 4.1).
  • “junction-path” – the path at which the FlexVol is junctioned into the Virtual Storage Server’s namespace; this is the path the client will mount.
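
For reference, a volume like the one above could have been created, exported, and junctioned in a single command along these lines (a sketch only; “aggr1” is a placeholder aggregate name, and exact options can vary between Data ONTAP releases):

rhev_vss::> volume create -vserver rhev_vss -volume VeeFourOne -aggregate aggr1 -size 50GB -policy pNFS -junction-path /VeeFourOne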

This next command shows that our export policy restricts access to the 172.20.45.0/24 subnet and only allows NFSv4 (which includes v4.1).

rhev_vss::> export-policy rule show -policyname pNFS  (vserver export-policy rule show)
Vserver      Policy Name     Rule    Access   Client Match          RO Rule 
------------ --------------- ------  -------- --------------------- ---------
rhev_vss     pNFS            1       nfs4     172.20.45.0/24        sys
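
The policy itself and its rule would have been put in place beforehand with commands roughly like these (again, a sketch; flag names can vary slightly between Data ONTAP releases):

rhev_vss::> export-policy create -vserver rhev_vss -policyname pNFS
rhev_vss::> export-policy rule create -vserver rhev_vss -policyname pNFS -ruleindex 1 -protocol nfs4 -clientmatch 172.20.45.0/24 -rorule sys -rwrule sys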

And finally, this last command shows which NFS versions and options are enabled/disabled for the Virtual Storage Server “rhev_vss” overall:

rhev_vss::> nfs show -vserver rhev_vss
                         Vserver: rhev_vss
              General NFS Access: true
                          NFS v3: enabled
                        NFS v4.0: enabled
                    UDP Protocol: enabled
                    TCP Protocol: enabled
             Spin Authentication: disabled
            Default Windows User: -
             NFSv4.0 ACL Support: disabled
 NFSv4.0 Read Delegation Support: disabled
NFSv4.0 Write Delegation Support: disabled
         NFSv4 ID Mapping Domain: defaultv4iddomain.com
   NFSv4.1 Minor Version Support: enabled
                   Rquota Enable: disabled
    NFSv4.1 Parallel NFS Support: enabled
             NFSv4.1 ACL Support: disabled
            NFS vStorage Support: disabled
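
For completeness, NFSv4.1 and pNFS are enabled per Virtual Storage Server; the settings above would have been turned on with something along these lines (a sketch from memory; the option names may differ slightly by release):

rhev_vss::> nfs modify -vserver rhev_vss -v4.0 enabled -v4.1 enabled -v4.1-pnfs enabled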

Client Side (RHEL 6.4)

Let’s move on and configure our RHEL 6.4 client next. Use the ‘mount’ command with the following options to mount the export:

# mount -o v4.1 172.20.45.50:/VeeFourOne /data

NFS v4.1 is an extension to v4 that includes pNFS. (We could also have specified “-t nfs4 -o minorversion=1”, but “v4.1” fits better on an /etc/fstab entry line, and the longer form is what you would use on RHEL 6.2 and 6.3.) The “/VeeFourOne” junction path that I showed earlier is the volume that we are mounting client-side. Don’t forget to make an entry in /etc/fstab if you want the mount to be permanent.
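
For example, a matching /etc/fstab entry might look like the following (a sketch using the same “v4.1” option; adjust the server address and mount point for your environment):

172.20.45.50:/VeeFourOne   /data   nfs   v4.1   0 0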

Verification

So, how do you know for sure that you’re using pNFS, and not just plain v4.1? Great question, as the venerable “nfsstat -m” command won’t give you the full picture in this case. Let’s look at a few different ways:

1. Start by making sure that the proper options came through (remember, this alone does not guarantee that you’re using pNFS, only that you’ve mounted using v4.1):

# nfsstat -m
/data from 172.20.45.50:/VeeFourOne/
 Flags:    rw,relatime,vers=4,rsize=65536,wsize=65536,namlen=255,
hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=172.20.45.23,
minorversion=1,local_lock=none,addr=172.20.45.50

2. Next, check whether the pNFS file layout driver (the “nfs_layout_nfsv41_files” kernel module) is loaded:

# lsmod | grep nfs_layout_nfsv41_files
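
If the grep prints a line for the module, the file layout driver is present; if it prints nothing, pNFS isn’t in play on this client. A quick one-liner to make the check explicit (a minimal sketch) could be:

# lsmod | grep -q nfs_layout_nfsv41_files && echo "pNFS file layout driver loaded" || echo "layout driver NOT loaded"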

3. For the true indicator, watch the “layoutget” counter before and after triggering some I/O. If the client is really using pNFS, it has to request a layout from the metadata server, so the counter will increase:

# nfsstat -c -4
Client rpc stats:
calls      retrans    authrefrsh
60379      8          60379   

Client nfs v4:
null         read         write        commit       open         open_conf    
0         0% 10        1% 10        1% 0         0% 15        1% 0         0% 
open_noat    open_dgrd    close        setattr      fsinfo       renew        
0         0% 0         0% 15        1% 8         0% 9         1% 0         0% 
setclntid    confirm      lock         lockt        locku        access       
0         0% 0         0% 0         0% 0         0% 0         0% 19        2% 
getattr      lookup       lookup_root  remove       rename       link         
21        2% 6         0% 3         0% 5         0% 0         0% 0         0% 
symlink      create       pathconf     statfs       readlink     readdir      
0         0% 0         0% 6         0% 0         0% 0         0% 3         0% 
server_caps  delegreturn  getacl       setacl       fs_locations rel_lkowner  
15        1% 0         0% 0         0% 0         0% 0         0% 0         0% 
exchange_id  create_ses   destroy_ses  sequence     get_lease_t  reclaim_comp 
3         0% 7         0% 2         0% 714      80% 3         0% 3         0% 
layoutget    layoutcommit layoutreturn getdevlist   getdevinfo   ds_write     
0         0% 1         0% 0         0% 0         0% 0         0% 0         0% 
ds_commit    
0         0%

Generate some I/O, then re-run:

# dmesg > /data/dmesg.file
# nfsstat -c -4
Client rpc stats:
calls      retrans    authrefrsh
60379      8          60379   

Client nfs v4:
null         read         write        commit       open         open_conf    
0         0% 10        1% 10        1% 0         0% 15        1% 0         0% 
open_noat    open_dgrd    close        setattr      fsinfo       renew        
0         0% 0         0% 15        1% 8         0% 9         1% 0         0% 
setclntid    confirm      lock         lockt        locku        access       
0         0% 0         0% 0         0% 0         0% 0         0% 19        2% 
getattr      lookup       lookup_root  remove       rename       link         
21        2% 6         0% 3         0% 5         0% 0         0% 0         0% 
symlink      create       pathconf     statfs       readlink     readdir      
0         0% 0         0% 6         0% 0         0% 0         0% 3         0% 
server_caps  delegreturn  getacl       setacl       fs_locations rel_lkowner  
15        1% 0         0% 0         0% 0         0% 0         0% 0         0% 
exchange_id  create_ses   destroy_ses  sequence     get_lease_t  reclaim_comp 
3         0% 7         0% 2         0% 714      80% 3         0% 3         0% 
layoutget    layoutcommit layoutreturn getdevlist   getdevinfo   ds_write     
0         0% 1         0% 0         0% 0         0% 0         0% 0         0% 
ds_commit    
0         0%
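
If you only care about the layout-related counters, you can trim the output down to just those lines (a minimal sketch using grep):

# nfsstat -c -4 | grep -A 1 layoutget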

4. Finally, you can also use Wireshark (or tshark) to “sniff” the connection. Look for metadata operations like ‘open’, ‘close’, and ‘getattr’, as well as ‘read’ and ‘write’ activity. Additionally, as seen below, you may also see the occasional ‘layoutget’ (or ‘getdeviceinfo’) as a result of the volume being migrated elsewhere in the cluster.

(The NFS operations of interest appear at the end of each line of output.)

# tshark -i eth0.3055 -w shark.out
# tshark -r shark.out
[output heavily truncated for brevity]
835 2013-03-14 15:19:08.736835 172.20.45.23 -> 172.20.45.50 NFS V4 COMP Call LOOKUP GETFH GETATTR
836 2013-03-14 15:19:08.737059 172.20.45.50 -> 172.20.45.23 NFS V4 COMP Reply (Call In 835) LOOKUP GETFH GETATTR
1088 2013-03-14 15:19:30.413628 172.20.45.23 -> 172.20.45.50 NFS V4 COMP Call SAVEFH OPEN Unknown
1095 2013-03-14 15:19:30.517494 172.20.45.23 -> 172.20.45.50 NFS V4 COMP Call CLOSE GETATTR
1096 2013-03-14 15:19:30.517772 172.20.45.50 -> 172.20.45.23 NFS V4 COMP Reply (Call In 1095) CLOSE GETATTR
1218 2013-03-14 15:19:38.091827 172.20.45.23 -> 172.20.45.50 NFS V4 COMP Call READDIR
1219 2013-03-14 15:19:38.092290 172.20.45.50 -> 172.20.45.23 NFS V4 COMP Reply (Call In 1218) READDIR
1237 2013-03-14 15:19:39.409572 172.20.45.23 -> 172.20.45.50 NFS V4 COMP Call LAYOUTGET
1238 2013-03-14 15:19:39.409718 172.20.45.50 -> 172.20.45.23 NFS V4 COMP Reply (Call In 1237) LAYOUTGET
1258 2013-03-14 15:19:39.410714 172.20.45.23 -> 172.20.45.50 NFS V4 COMP Call WRITE
1263 2013-03-14 15:19:39.411261 172.20.45.50 -> 172.20.45.23 NFS V4 COMP Reply (Call In 1258) WRITE
2114 2013-03-14 15:20:54.947897 172.20.45.23 -> 172.20.45.50 NFS V4 COMP Call WRITE GETATTR
2115 2013-03-14 15:20:54.948520 172.20.45.50 -> 172.20.45.23 NFS V4 COMP Reply (Call In 2114) WRITE GETATTR
2116 2013-03-14 15:20:54.948676 172.20.45.23 -> 172.20.45.50 NFS V4 COMP Call CLOSE GETATTR
2117 2013-03-14 15:20:54.949110 172.20.45.50 -> 172.20.45.23 NFS V4 COMP Reply (Call In 2116) CLOSE GETATTR
2344 2013-03-14 15:21:06.674644 172.20.45.23 -> 172.20.45.50 NFS V4 COMP Call REMOVE GETATTR
2346 2013-03-14 15:21:06.675512 172.20.45.50 -> 172.20.45.23 NFS V4 COMP Reply (Call In 2344) REMOVE
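
If you just want the pNFS-specific operations pulled out of the capture, a simple filter on the decoded output (a minimal sketch) will do:

# tshark -r shark.out | grep -E "LAYOUTGET|LAYOUTCOMMIT|GETDEVICEINFO"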

So that’s really all there is to attaching your pNFS client to your pNFS server. And as I mentioned in the original post, pNFS (combined with Direct I/O, as in RHEL 6.4) is a great fit for virtualization platforms like KVM. It provides predictable performance and makes scaling out and load balancing at the storage level completely transparent to the hypervisors, virtual machines, and end users.

Hope this helps (and special thanks to Dros Adamson & Ricardo Labiaga)

Captain KVM


3 thoughts on “pNFS – This is where the rubber meets the road..”

  1. Hi, Jon…thank you for the update regarding pNFS. Have you tried using pNFS with RHEV 3.1? The latest RHEV Hypervisor builds indicate that they are based on RHEL 6.4, so shouldn’t the pNFS code be present? If not, I guess you could always go “thick” on the hypervisor if you need to add it. I don’t currently have a spare hypervisor to create a new NFS domain to try it out; all my domains are iSCSI at the moment. Also, discussion here: http://www.mail-archive.com/users@ovirt.org/msg05963.html
    …indicates that you can pass custom options (v 4.1?) to the nfs mount options used by vdsm. Maybe that would do it? Just curious…

    1. Hey Mike,

      Thanks for dropping by. I’ve not yet tried it out, but if the latest “h” builds are in fact based on RHEL 6.4, then passing the extra options ~should~ work (for thick and thin). Remember, the kernel is identical between thick and thin.. and even if the thick is managed via RHEV-M, you would still have to successfully pass the pNFS options. As a side note, if you enable NFSv4 on the server side, the hypervisor will attempt to connect using v4, regardless of the hypervisor version. Even an old rhev-h from RHEV 2.2 will attempt (and fail) to connect via NFS v4.

      hope this helps,

      Captain KVM
