Highly available NFS server
Set up an Active/Passive topology with common storage for shared data.
Environment
Our topology consists of 2 CentOS 7 servers (Active-Passive).
Both nodes use one common (shared) disk for data:
+--------+-----------------------------+---- LAN
         ^                             ^
         |                             |
   +-----+-----+    heartbeat    +-----+-----+
   |nfs-server1|<--------------->|nfs-server2|
   +-----+-----+                 +-----+-----+
         |            COMMON           |
         |          /------\           |
         |          | XFS  |           |
         |          | LVM  |           |
         +----------+  VG  +-----------+
                    |  PV  |
                    \------/
                      DISK
The active node must be able to serve the data to clients on the network as an NFS share.
We will begin by setting up the NFS service on a single node, "nfs-server1".
$ cat /etc/redhat-release
CentOS Linux release 7.5.1804 (Core)
Install packages
Starting from a minimal CentOS installation, we will need the following packages for the NFS and cluster setup:
$ sudo yum update
$ sudo yum install nfs-utils
$ sudo yum groupinstall 'High Availability'
Create shared directory
We will create the /nfsdata directory on the root filesystem for some initial testing. In the cluster setup later, this will be the mount point for the shared filesystem on the active node.
$ sudo mkdir /nfsdata
$ sudo chmod -R 755 /nfsdata
$ sudo chown nfsnobody:nfsnobody /nfsdata
Manage firewall
These are the firewall exceptions needed for inter-cluster communication and for the NFS service to work correctly. Note that we may need extra rules for the fencing functionality later:
$ sudo firewall-cmd --permanent --zone=public --add-service=nfs
success
$ sudo firewall-cmd --permanent --zone=public --add-service=mountd
success
$ sudo firewall-cmd --permanent --zone=public --add-service=rpc-bind
success
$ sudo firewall-cmd --permanent --add-service=high-availability
success
$ sudo firewall-cmd --reload
success
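You can verify the applied rules with:
$ sudo firewall-cmd --zone=public --list-services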
Test the NFS service on one server alone
On your NFS server, start the nfs-server service (do not enable it to start at boot; the cluster will manage it later):
$ sudo systemctl start nfs-server
Add a line for the /nfsdata directory to the /etc/exports file (192.168.16.14 is the IP of a client; the options below are just an example):
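/nfsdata 192.168.16.14(rw,sync)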
Restart nfs-server service:
$ sudo systemctl restart nfs-server
Now go to the client (in our case 192.168.16.14) and try to mount the remote nfs share:
$ sudo mount -t nfs nfs-server1:/nfsdata /mnt/nfsdata
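A quick write test on the client could look like this (the file name is just an example):
$ sudo touch /mnt/nfsdata/testfile
$ ls -l /mnt/nfsdata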
After a successful mount, you can remove everything from the /etc/exports file and stop the nfs service:
$ sudo systemctl stop nfs-server
We always try to manage clustered resources via cluster resource agents that are configured globally (as cluster configuration). If we need to configure something on any node separately, then we also need a mechanism to replicate that configuration to every other node, which makes things more complex and is beyond this article's scope.
Set password for the user hacluster
The user "hacluster" is created with "High Availability" packages installation and will be used for internal communication between nodes. Lets set a password for that user.
$ sudo passwd hacluster
Changing password for user hacluster.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
Cloning the node
Now is a good time to clone nfs-server1 in order to create nfs-server2.
After cloning and before boot, we need to:
- Attach a new disk on both nodes (common disk). This disk will host our data that will be nfs-shared over the network.
- (optional) Attach a new network card on both nodes dedicated to inter-cluster communication (heartbeat).
Then we can boot nfs-server2 and change its hostname by running:
$ sudo hostnamectl set-hostname nfs-server2
...and, if you have a static network setup, configure the network interfaces.
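For example, with NetworkManager you could set a static address like this (the connection name and addresses here are just placeholders for your own setup):
$ sudo nmcli connection modify eth0 ipv4.method manual ipv4.addresses 192.168.16.13/24 ipv4.gateway 192.168.16.1
$ sudo nmcli connection up eth0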
Let's manage our shared disk now, while only nfs-server2 is up.
We will create an XFS filesystem over LVM:
# pvcreate /dev/vdb
Physical volume "/dev/vdb" successfully created.
# vgcreate NFS_SHARED /dev/vdb
Volume group "NFS_SHARED" successfully created.
# lvcreate -l 100%FREE -n NFS_LVM NFS_SHARED
Logical volume "NFS_LVM" created.
# mkfs.xfs /dev/mapper/NFS_SHARED-NFS_LVM
meta-data=/dev/mapper/NFS_SHARED-NFS_LVM isize=512 agcount=4, agsize=196352 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=0, sparse=0
data = bsize=4096 blocks=785408, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=1
log =internal log bsize=4096 blocks=2560, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
Finally, we can boot nfs-server1 and run some tests:
- Make sure that the hostnames are resolvable on both nodes (see the example after this list).
- Make sure that the nodes can communicate with each other via the heartbeat network interfaces.
- Make sure that every node can display the "NFS_SHARED" Logical volume:
# lvdisplay NFS_SHARED
- Make sure that every node can display the filesystem:
# blkid /dev/mapper/NFS_SHARED-NFS_LVM
- Don't try to mount the shared filesystem on more than one node at the same time, because you may end up with a corrupted filesystem.
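For name resolution, for example, you can add entries like these to /etc/hosts on both nodes (the addresses are placeholders for whatever you assigned to the heartbeat/LAN interfaces) and test connectivity with a simple ping:
192.168.16.12   nfs-server1
192.168.16.13   nfs-server2
$ ping -c 3 nfs-server2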
Now we start and enable the pcsd cluster service on both nodes:
[root@nfs-server1 ~]# systemctl start pcsd.service
[root@nfs-server1 ~]# systemctl enable pcsd.service
[root@nfs-server2 ~]# systemctl start pcsd.service
[root@nfs-server2 ~]# systemctl enable pcsd.service
Creating cluster
We have both of our nodes up and running, with a shared disk attached to both of them, and they are able to communicate with each other.
From now on we can apply the cluster configuration globally. This means that we can run the commands below on any of the cluster nodes; the cluster will take care of configuring every member.
So we can monitor the configuration on one node with a command like:
# watch pcs status
... and execute everything on the other node.
Cluster authorization, using the "hacluster" user and the password we set before cloning:
# pcs cluster auth nfs-server1 nfs-server2
Username: hacluster
Password:
nfs-server1: Authorized
nfs-server2: Authorized
Cluster creation:
# pcs cluster setup --start --name nfs-server-cluster nfs-server1 nfs-server2
Destroying cluster on nodes: nfs-server1, nfs-server2...
nfs-server1: Stopping Cluster (pacemaker)...
nfs-server2: Stopping Cluster (pacemaker)...
nfs-server1: Successfully destroyed cluster
nfs-server2: Successfully destroyed cluster
Sending 'pacemaker_remote authkey' to 'nfs-server1', 'nfs-server2'
nfs-server1: successful distribution of the file 'pacemaker_remote authkey'
nfs-server2: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
nfs-server1: Succeeded
nfs-server2: Succeeded
Starting cluster on nodes: nfs-server1, nfs-server2...
nfs-server1: Starting Cluster...
nfs-server2: Starting Cluster...
Synchronizing pcsd certificates on nodes nfs-server1, nfs-server2...
nfs-server1: Success
nfs-server2: Success
Restarting pcsd on the nodes in order to reload the certificates...
nfs-server1: Success
nfs-server2: Success
Enable cluster services to auto-run on boot:
# pcs cluster enable --all
nfs-server1: Cluster Enabled
nfs-server2: Cluster Enabled
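At this point you can check the overall cluster state with:
# pcs cluster status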
Creating clustered resources
For a complete NFS server, we need at least the following resources:
- Filesystem on common disk
- NFS service
- NFS export configuration
- Virtual (cluster) IP address for the active node to listen on
Let's configure them one by one:
Filesystem resource:
# pcs resource create SharedFS Filesystem device=/dev/mapper/NFS_SHARED-NFS_LVM directory=/nfsdata fstype=xfs --group nfsresourcegroup
Assumed agent name 'ocf:heartbeat:Filesystem' (deduced from 'Filesystem')
NFS service resource:
# pcs resource create NFSService nfsserver nfs_shared_infodir=/nfsdata/nfsinfo --group nfsresourcegroup
Assumed agent name 'ocf:heartbeat:nfsserver' (deduced from 'nfsserver')
NFS export resource:
# pcs resource create NFSExport exportfs clientspec="192.168.16.0/24" options=rw,sync,no_root_squash,no_all_squash directory=/nfsdata fsid=0 --group nfsresourcegroup
Assumed agent name 'ocf:heartbeat:exportfs' (deduced from 'exportfs')
Virtual IP resource(192.168.16.100):
# pcs resource create VIP IPaddr2 ip=192.168.16.100 cidr_netmask=24 --group nfsresourcegroup
Assumed agent name 'ocf:heartbeat:IPaddr2' (deduced from 'IPaddr2')
To be thorough, we will configure one extra resource. This will send NFSv3 reboot notifications to every NFS client when resources move, right after the NFS service is initialized on the new active node.
NFS notify resource:
# pcs resource create NFSNotify nfsnotify source_host=192.168.16.100 --group nfsresourcegroup
Assumed agent name 'ocf:heartbeat:nfsnotify' (deduced from 'nfsnotify')
Ordering Constraints
Last but not least, we need some ordering constraints that force our resources to start in a specific order (and stop in reverse order):
# pcs constraint order SharedFS then NFSService
Adding SharedFS NFSService (kind: Mandatory) (Options: first-action=start then-action=start)
# pcs constraint order NFSService then NFSExport
Adding NFSService NFSExport (kind: Mandatory) (Options: first-action=start then-action=start)
# pcs constraint order NFSExport then VIP
Adding NFSExport VIP (kind: Mandatory) (Options: first-action=start then-action=start)
# pcs constraint order set NFSNotify require-all=true
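You can review the configured constraints with:
# pcs constraint list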
After all this, we have the following situation:
# pcs status
Cluster name: nfs-server-cluster
WARNING: no stonith devices and stonith-enabled is not false
Stack: corosync
Current DC: nfs-server2 (version 1.1.18-11.el7_5.2-2b07d5c5a9) - partition with quorum
Last updated: Sat Jun 2 12:54:13 2018
Last change: Sat Jun 2 12:51:51 2018 by root via cibadmin on nfs-server1
2 nodes configured
5 resources configured
Online: [ nfs-server1 nfs-server2 ]
Full list of resources:
Resource Group: nfsresourcegroup
SharedFS (ocf::heartbeat:Filesystem): Stopped
NFSService (ocf::heartbeat:nfsserver): Stopped
NFSExport (ocf::heartbeat:exportfs): Stopped
VIP (ocf::heartbeat:IPaddr2): Stopped
NFSNotify (ocf::heartbeat:nfsnotify): Stopped
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
You may have noticed that the cluster keeps all the configured resources stopped. This happens because we have not configured any fencing resource yet. You can see a clear warning about it in the output above:
WARNING: no stonith devices and stonith-enabled is not false
Using the command:
# pcs property set stonith-enabled=false
... you can disable fencing -for now- and every resource will immediately start up:
# pcs status
Cluster name: nfs-server-cluster
Stack: corosync
Current DC: nfs-server2 (version 1.1.18-11.el7_5.2-2b07d5c5a9) - partition with quorum
Last updated: Sat Jun 2 13:06:15 2018
Last change: Sat Jun 2 13:04:32 2018 by root via cibadmin on nfs-server1
2 nodes configured
5 resources configured
Online: [ nfs-server1 nfs-server2 ]
Full list of resources:
Resource Group: nfsresourcegroup
SharedFS (ocf::heartbeat:Filesystem): Started nfs-server1
NFSService (ocf::heartbeat:nfsserver): Started nfs-server1
NFSExport (ocf::heartbeat:exportfs): Started nfs-server1
VIP (ocf::heartbeat:IPaddr2): Started nfs-server1
NFSNotify (ocf::heartbeat:nfsnotify): Started nfs-server1
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
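With every resource started, a client can mount the share through the virtual IP. Note that since the export uses fsid=0, an NFSv4 client mounts the pseudo-root path "/", while an NFSv3 client would use /nfsdata (the client-side mount point below is just an example):
$ sudo mount -t nfs 192.168.16.100:/ /mnt/nfsdata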
Some testing
Now that we have a highly available NFS service, we can run some maintenance on our machines with -almost- no downtime.
For example, here is a possible scenario, starting with the passive (inactive) node:
[root@nfs-server2 ~]# pcs node standby
- ... maintenance actions on nfs-server2...
[root@nfs-server2 ~]# pcs node unstandby
[root@nfs-server1 ~]# pcs node standby
- ... wait for resources to move and run maintenance actions on nfs-server1...
[root@nfs-server1 ~]# pcs node unstandby
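At any point you can check which node is currently running the resource group with:
# pcs status resources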
If you have a client connected to the NFS service on the virtual IP during maintenance (during resource movement, to be more specific), you can do some testing, for example:
[user@TestPC ~]# for i in {1..10}; do time touch /mnt/nfsdata/test; sleep 1; echo $?; done
real 0m0.033s
user 0m0.000s
sys 0m0.002s
0
real 0m0.016s
user 0m0.001s
sys 0m0.000s
0
real 0m0.006s
user 0m0.000s
sys 0m0.001s
0
real 0m0.008s
user 0m0.000s
sys 0m0.001s
0
real 0m0.031s
user 0m0.000s
sys 0m0.001s
0
real 0m7.724s
user 0m0.000s
sys 0m0.001s
0
real 0m0.007s
user 0m0.000s
sys 0m0.001s
0
real 0m0.008s
user 0m0.001s
sys 0m0.000s
0
real 0m0.007s
user 0m0.000s
sys 0m0.001s
0
real 0m0.032s
user 0m0.000s
sys 0m0.001s
0
As you can see, the sixth attempt to touch the /mnt/nfsdata/test file took more time than usual because of a short resource unavailability, but it eventually completed successfully (return code 0). Of course you can try more I/O-intensive operations (like watching your favorite movie while patching your NAS servers). You will only see delays, as long as the NFS client and server are able to handle the pending transactions after the initialization on the new active node.
We built and tested our cluster by running some controlled actions: we marked the active node as standby and watched every resource gracefully stop on this node and start on the other. But what happens when a failure occurs? Is automatic failover always possible?
Fencing
Fencing is the most important mechanism of a cluster during a failure. It guarantees that the node that is going to take over starts clean, and that the malfunctioning node is really isolated.
Fencing, in Linux clusters, is implemented by "stonith" (Shoot The Other Node In The Head) resources. The exact implementation totally depends on your infrastructure. There are multiple "fence agents" that do their job in specific environments (physical/virtual machines, software/hardware fencing devices etc.).
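You can list the fence agents available on your system with:
# pcs stonith list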
In another article we will add fencing functionality to our cluster, running the servers as virtual machines on KVM hypervisors.
- Posted by Kostas Koutsogiannopoulos · June 4, 2018