
Highly available NFS server


Set up an Active/Passive topology with common storage for shared data.

By Kostas Koutsogiannopoulos

Environment

Our topology consists of 2 CentOS 7 servers (Active-Passive).

Both nodes are using one common (shared) disk for data:

+------+-------------------------+---LAN+
       ^                         ^
       |                         |
 +-----+-----+  heartbeat  +-----+-----+
 |nfs-server1|<----------->|nfs-server2|
 +-----+-----+             +-----+-----+
       |          COMMON         |
       |         /------\        |
       |        |  XFS   |       |
       |        |  LVM   |       |
       +--------+  VG    +-------+
                |  PV    |
                 \------/
                   DISK

The active node needs to serve the data to clients on the network as an NFS share.

We will begin by setting up the NFS server on a single node, "nfs-server1":

$ cat /etc/redhat-release
CentOS Linux release 7.5.1804 (Core)

Install packages

Starting from a minimal CentOS installation, we will need the following packages for the NFS and cluster setup:

$ sudo yum update

$ sudo yum install nfs-utils

$ sudo yum groupinstall 'High Availability'
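
To verify that everything landed, you can query the package database (the names below assume the usual contents of the 'High Availability' group):

$ rpm -q nfs-utils corosync pacemaker pcs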

Create shared directory

We will create the /nfsdata directory, which for now lives on the root filesystem for testing. In the cluster setup later, this will be the mount point for the filesystem mounted on the active node.

$ sudo mkdir /nfsdata

$ sudo chmod -R 755 /nfsdata

$ sudo chown nfsnobody:nfsnobody /nfsdata

Manage firewall

These are the firewall exceptions needed for inter-cluster communication and for the NFS service to work correctly. Note that we may need extra rules for fencing functionality later:

$ sudo firewall-cmd --permanent --zone=public --add-service=nfs
success
$ sudo firewall-cmd --permanent --zone=public --add-service=mountd
success
$ sudo firewall-cmd --permanent --zone=public --add-service=rpc-bind
success
$ sudo firewall-cmd --permanent --add-service=high-availability
success
$ sudo firewall-cmd --reload
success
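
You can confirm the active exceptions at any time with:

$ sudo firewall-cmd --zone=public --list-services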

Test the NFS service on one server alone

On your NFS server, start the NFS service (do not enable it to start at boot; we will let the cluster manage it later):

$ sudo systemctl start nfs-server

Add the following line in /etc/exports file (192.168.16.14 is the IP of a client):

/nfsdata            192.168.16.14(rw,sync,no_root_squash,no_all_squash)


Restart nfs-server service:

$ sudo systemctl restart nfs-server
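
Verify that the export is now active:

$ sudo exportfs -v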

Now go to the client (in our case 192.168.16.14) and try to mount the remote NFS share:

$ sudo mount -t nfs nfs-server1:/nfsdata /mnt/nfsdata
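
If the mount point did not exist on the client, create it first with "sudo mkdir -p /mnt/nfsdata". After a successful mount, a quick write test confirms the share works (the file name below is just an example):

$ sudo touch /mnt/nfsdata/testfile
$ ls -l /mnt/nfsdata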

After a successful mount, unmount the share on the client, remove everything from the /etc/exports file, and stop the NFS service:

$ sudo systemctl stop nfs-server

We always try to manage clustered resources via cluster resource agents configured globally (as cluster configuration). If we needed to configure something on each node separately, we would also need a mechanism to replicate that configuration to every other node. That makes things more complex and is beyond this article's scope.

Set password for the user hacluster

The user "hacluster" is created with "High Availability" packages installation and will be used for internal communication between nodes. Lets set a password for that user.

$ sudo passwd hacluster
Changing password for user hacluster.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.

Cloning the node

Now it is a good time to clone nfs-server1 creating nfs-server2.

After cloning and before boot, we need to:

  • Attach a new disk to both nodes (the common disk). This disk will host the data that will be shared over NFS.
  • (optional) Attach a new network card to both nodes, dedicated to inter-cluster communication (heartbeat).

Then we can boot nfs-server2 and change its hostname by running:

$ sudo hostnamectl set-hostname nfs-server2

...and, if you have a static network setup, configure the network interfaces.
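
Both nodes must be able to resolve each other's hostname. If you do not rely on DNS, /etc/hosts entries on both nodes will do (the addresses below are placeholders; use your own):

192.168.16.11   nfs-server1
192.168.16.12   nfs-server2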

Let's manage our shared disk now, while only nfs-server2 is up.

We will create an XFS filesystem over LVM:

# pvcreate /dev/vdb
  Physical volume "/dev/vdb" successfully created.

# vgcreate NFS_SHARED /dev/vdb
  Volume group "NFS_SHARED" successfully created.

# lvcreate -l 100%FREE -n NFS_LVM NFS_SHARED
  Logical volume "NFS_LVM" created.

# mkfs.xfs /dev/mapper/NFS_SHARED-NFS_LVM
meta-data=/dev/mapper/NFS_SHARED-NFS_LVM isize=512    agcount=4, agsize=196352 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0, sparse=0
data     =                       bsize=4096   blocks=785408, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

Finally, we can boot nfs-server1 and run some checks (a quick sanity mount is sketched after this list):

  • Make sure the hostnames are resolvable on each node.
  • Make sure the nodes can communicate with each other (via the heartbeat network interfaces).
  • Make sure every node can display the logical volume: # lvdisplay NFS_SHARED
  • Make sure every node can display the filesystem: # blkid /dev/mapper/NFS_SHARED-NFS_LVM
  • Don't try to mount the shared filesystem on more than one node at the same time; you may end up with a corrupted filesystem.
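
As that quick sanity check, you can mount the filesystem temporarily on a single node (one node only, and unmount it before proceeding):

# mount /dev/mapper/NFS_SHARED-NFS_LVM /nfsdata
# touch /nfsdata/testfile && rm /nfsdata/testfile
# umount /nfsdata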

Now we start and enable the pcsd service on both nodes:

[root@nfs-server1 ~]# systemctl start pcsd.service

[root@nfs-server1 ~]# systemctl enable pcsd.service

[root@nfs-server2 ~]# systemctl start pcsd.service

[root@nfs-server2 ~]# systemctl enable pcsd.service

Creating cluster

We now have both of our nodes up and running, with a shared disk attached to both, and they are able to communicate with each other.

From now on we can apply the cluster configuration globally. This means we can run the commands below on any of the cluster nodes; the cluster takes care of configuring every member.

So we can monitor the configuration on any node with a command like:

# watch pcs status

... and execute everything on another node:

Authorize the cluster nodes using the "hacluster" user and the password we set before cloning:

# pcs cluster auth nfs-server1 nfs-server2
Username: hacluster
Password:
nfs-server1: Authorized
nfs-server2: Authorized


Cluster creation:

# pcs cluster setup --start --name nfs-server-cluster nfs-server1 nfs-server2
Destroying cluster on nodes: nfs-server1, nfs-server2...
nfs-server1: Stopping Cluster (pacemaker)...
nfs-server2: Stopping Cluster (pacemaker)...
nfs-server1: Successfully destroyed cluster
nfs-server2: Successfully destroyed cluster

Sending 'pacemaker_remote authkey' to 'nfs-server1', 'nfs-server2'
nfs-server1: successful distribution of the file 'pacemaker_remote authkey'
nfs-server2: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
nfs-server1: Succeeded
nfs-server2: Succeeded

Starting cluster on nodes: nfs-server1, nfs-server2...
nfs-server1: Starting Cluster...
nfs-server2: Starting Cluster...

Synchronizing pcsd certificates on nodes nfs-server1, nfs-server2...
nfs-server1: Success
nfs-server2: Success
Restarting pcsd on the nodes in order to reload the certificates...
nfs-server1: Success
nfs-server2: Success

Enable cluster services to auto-run on boot:

# pcs cluster enable --all
nfs-server1: Cluster Enabled
nfs-server2: Cluster Enabled
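
At this point the cluster should be up, with both nodes online and no resources configured yet; you can confirm with:

# pcs status nodes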

Creating clustered resources

For a complete NFS server, we need at least the following resources:

  • Filesystem on common disk
  • NFS service
  • NFS export configuration
  • Virtual (cluster) IP address for the active node to listen

Let's configure them one by one:

Filesystem resource:

# pcs resource create SharedFS Filesystem device=/dev/mapper/NFS_SHARED-NFS_LVM  directory=/nfsdata fstype=xfs --group nfsresourcegroup
Assumed agent name 'ocf:heartbeat:Filesystem' (deduced from 'Filesystem')

NFS service resource:

# pcs resource create NFSService nfsserver nfs_shared_infodir=/nfsdata/nfsinfo --group nfsresourcegroup
Assumed agent name 'ocf:heartbeat:nfsserver' (deduced from 'nfsserver')

NFS export resource:

# pcs resource create NFSExport exportfs clientspec="192.168.16.0/24" options=rw,sync,no_root_squash,no_all_squash directory=/nfsdata fsid=0 --group nfsresourcegroup
Assumed agent name 'ocf:heartbeat:exportfs' (deduced from 'exportfs')

Virtual IP resource (192.168.16.100):

# pcs resource create VIP IPaddr2 ip=192.168.16.100 cidr_netmask=24 --group nfsresourcegroup
Assumed agent name 'ocf:heartbeat:IPaddr2' (deduced from 'IPaddr2')

To be thorough, we will configure one extra resource. It sends NFSv3 reboot notifications to every NFS client when resources move, right after the NFS service is initialized on the new active node.

NFS notify resource:

# pcs resource create NFSNotify nfsnotify source_host=192.168.16.100 --group nfsresourcegroup
Assumed agent name 'ocf:heartbeat:nfsnotify' (deduced from 'nfsnotify')

Ordering Constraints

Last but not least, we need some ordering constraints to ensure our resources start in a specific order (and stop in reverse):

# pcs constraint order SharedFS then NFSService

Adding SharedFS NFSService (kind: Mandatory) (Options: first-action=start then-action=start)

# pcs constraint order NFSService then NFSExport

Adding NFSService NFSExport (kind: Mandatory) (Options: first-action=start then-action=start)

# pcs constraint order NFSExport then VIP

Adding NFSExport VIP (kind: Mandatory) (Options: first-action=start then-action=start)

# pcs constraint order set NFSNotify require-all=true
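
You can review the resulting constraints at any time:

# pcs constraint order show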

After all this, we have the following situation:

# pcs status
Cluster name: nfs-server-cluster
WARNING: no stonith devices and stonith-enabled is not false
Stack: corosync
Current DC: nfs-server2 (version 1.1.18-11.el7_5.2-2b07d5c5a9) - partition with quorum
Last updated: Sat Jun  2 12:54:13 2018
Last change: Sat Jun  2 12:51:51 2018 by root via cibadmin on nfs-server1

2 nodes configured
5 resources configured

Online: [ nfs-server1 nfs-server2 ]

Full list of resources:

 Resource Group: nfsresourcegroup
     SharedFS    (ocf::heartbeat:Filesystem):    Stopped
     NFSService    (ocf::heartbeat:nfsserver):    Stopped
     NFSExport    (ocf::heartbeat:exportfs):    Stopped
     VIP    (ocf::heartbeat:IPaddr2):    Stopped
     NFSNotify    (ocf::heartbeat:nfsnotify):    Stopped

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

Notice that the cluster keeps all the configured resources stopped. This happens because we have not configured any fencing resource yet. There is a clear warning about it in the output above:

WARNING: no stonith devices and stonith-enabled is not false

Using the command:

# pcs property set stonith-enabled=false

... you can disable fencing (for now), and every resource will immediately start up:

# pcs status
Cluster name: nfs-server-cluster
Stack: corosync
Current DC: nfs-server2 (version 1.1.18-11.el7_5.2-2b07d5c5a9) - partition with quorum
Last updated: Sat Jun  2 13:06:15 2018
Last change: Sat Jun  2 13:04:32 2018 by root via cibadmin on nfs-server1

2 nodes configured
5 resources configured

Online: [ nfs-server1 nfs-server2 ]

Full list of resources:

 Resource Group: nfsresourcegroup
     SharedFS    (ocf::heartbeat:Filesystem):    Started nfs-server1
     NFSService    (ocf::heartbeat:nfsserver):    Started nfs-server1
     NFSExport    (ocf::heartbeat:exportfs):    Started nfs-server1
     VIP    (ocf::heartbeat:IPaddr2):    Started nfs-server1
     NFSNotify    (ocf::heartbeat:nfsnotify):    Started nfs-server1

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
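
From now on, clients should mount the share through the virtual IP instead of a node hostname, so the mount survives a failover (reusing the mount point from our earlier test):

$ sudo mount -t nfs 192.168.16.100:/nfsdata /mnt/nfsdata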

Some testing

Now that we have a highly available NFS service, we can run some maintenance on our machines with almost no downtime.

For example, here is a possible scenario, starting with the passive (inactive) node (a variant driven from a single node is sketched after the list):

  1. [root@nfs-server2 ~]# pcs node standby
  2. ... maintenance actions on nfs-server2...
  3. [root@nfs-server2 ~]# pcs node unstandby
  4. [root@nfs-server1 ~]# pcs node standby
  5. ... wait for resources to move and run maintenance actions on nfs-server1...
  6. [root@nfs-server1 ~]# pcs node unstandby
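
Note that pcs node standby/unstandby also accept a node name, so the whole sequence can be driven from either node, for example:

[root@nfs-server1 ~]# pcs node standby nfs-server2
[root@nfs-server1 ~]# pcs node unstandby nfs-server2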

If you have a client connected to the NFS service on the virtual IP during maintenance (more specifically, while resources are moving), you can do some testing, for example:

[user@TestPC ~]# for i in {1..10}; do time touch /mnt/nfsdata/test; echo $?; sleep 1; done

real    0m0.033s
user    0m0.000s
sys    0m0.002s
0

real    0m0.016s
user    0m0.001s
sys    0m0.000s
0

real    0m0.006s
user    0m0.000s
sys    0m0.001s
0

real    0m0.008s
user    0m0.000s
sys    0m0.001s
0

real    0m0.031s
user    0m0.000s
sys    0m0.001s
0

real    0m7.724s
user    0m0.000s
sys    0m0.001s
0

real    0m0.007s
user    0m0.000s
sys    0m0.001s
0

real    0m0.008s
user    0m0.001s
sys    0m0.000s
0

real    0m0.007s
user    0m0.000s
sys    0m0.001s
0

real    0m0.032s
user    0m0.000s
sys    0m0.001s
0

As you can see, the sixth attempt to touch the /mnt/nfsdata/test file took longer than usual because of a brief resource unavailability, but it eventually completed successfully (return code 0). Of course you can try more I/O-intensive operations (like watching your favorite movie while patching your NAS servers). You will only see delays, as long as the NFS client and server can handle the pending transactions after initialization on the new active node.


We built and tested our cluster by running some controlled actions. We marked the active node as standby and watched every resource gracefully stop on that node and start on the other. But what happens when a failure occurs? Is automatic failover always possible?

Fencing

Fencing is the most important mechanism of a cluster during a failure. It guarantees that the node taking over will start clean and that the malfunctioning node is truly isolated.

Fencing in Linux clusters is implemented by "stonith" (Shoot The Other Node In The Head) resources. The exact implementation depends entirely on your infrastructure. There are multiple "fence agents" that do their job in specific environments (physical/virtual machines, software/hardware fencing devices, etc.).
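
You can list the fence agents available on your system with:

# pcs stonith list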

In another article we will add fencing functionality to our cluster, running the servers as virtual machines on KVM hypervisors.

