Cluster
Clustering establishes connectivity among two or more servers so that they work as one. It is a popular technique among systems engineers, who can cluster servers as a failover system, a load-balancing system, or a parallel processing unit.
A failover cluster is a set of servers that work together to maintain high availability of applications and services.
Advantages of Clustering Servers
Clustering servers is a fully scalable solution; you can add resources to the cluster later.
If a server in the cluster needs maintenance, you can stop it and hand its load over to the other servers.
Among high-availability options, clustering takes a special place since it is reliable and easy to configure. If a server can no longer provide its services, the other servers in the cluster can take over the load.
Disadvantages of Clustering Servers
Cost is high. Since a cluster needs good hardware and careful design, it is costly compared to a non-clustered setup; poor cost-effectiveness is the main disadvantage of this particular design.
Since clustering needs more servers and hardware to establish, monitoring and maintenance are harder, and the infrastructure footprint grows.
Now let’s see what packages we need to configure this setup successfully. The following packages/RPMs can be downloaded from rpmfind.net.
Ricci (ricci-0.16.2-75.el6.x86_64.rpm)
Luci (luci-0.26.0-63.el6.centos.x86_64.rpm)
Mod_cluster (modcluster-0.16.2-29.el6.x86_64.rpm)
CCS (ccs-0.16.2-75.el6_6.2.x86_64.rpm)
CMAN(cman-3.0.12.1-68.el6.x86_64.rpm)
Clusterlib (clusterlib-3.0.12.1-68.el6.x86_64.rpm)
Let’s see what each installation does for us and their meanings.
Ricci is a daemon used for cluster management and configuration. It dispatches incoming messages to the configured nodes.
Luci is a server that runs on the cluster management machine and communicates with the other nodes. It provides a web interface to make things easier.
Mod_cluster is a load-balancer utility based on httpd services; here it is used to route incoming requests to the underlying nodes.
CCS is used to create and modify the cluster configuration on remote nodes through ricci. It is also used to start and stop the cluster services.
CMAN is one of the primary utilities, other than ricci and luci, for this particular setup, since it acts as the cluster manager. In fact, cman stands for cluster manager. It is a high-availability add-on for Red Hat that is distributed among the nodes in the cluster.
Install the packages below on all three servers.
Step 1: Installing Clustering in Linux
# yum install ricci
After the ricci installation is done, we can see that it has installed mod_cluster and clusterlib as its dependencies.
Next, install luci using the yum install luci command.
# yum install luci
After the installation of luci, you can see it has installed the dependencies it needed.
Now, let’s install the ccs package on the servers. For that I entered yum install ccs.x86_64, which is shown in the list when I issued yum list | grep ccs; alternatively, you can simply issue yum install ccs.
# yum install ccs
Let’s install cman as the last requirement for this particular setup. The command is yum install cman, or yum install cman.x86_64 as shown in the yum list mentioned earlier.
# yum install cman
We need to confirm the installations are in place. Issue the command below to see whether the packages we need are installed properly on all three servers.
# rpm -qa | egrep "ricci|luci|modc|cluster|ccs|cman"
Perfect, all the packages are installed; all we need to do now is configure the setup.
Step 2: Configure Cluster in Linux
1. As the first step for setting up the cluster, you need to start the ricci service on all three servers.
# service ricci start
OR
# /etc/init.d/ricci start
2. Now that ricci is started on all servers, it’s time to create the cluster. This is where the ccs package comes to our help.
If you don’t want to use ccs commands, you will have to edit the cluster.conf file to add the nodes and do other configuration. The easiest way is to use the following commands. Let’s have a look.
Since I haven’t created the cluster yet, there’s no cluster.conf file in the /etc/cluster location, as shown below.
# cd /etc/cluster
# pwd
# ls
In my case, I do this on 172.16.1.250, which is dedicated to cluster management. From now on, every time we try to use the ricci server it will ask for ricci’s password, so you will have to set the password for the ricci user on all servers.
Enter passwords for ricci user.
# passwd ricci
Now enter the command as shown below.
# ccs -h 172.16.1.250 --createcluster linuxtech_cluster
You can see that after entering the above command, a cluster.conf file is created in the /etc/cluster directory.
3. Now let’s add the two nodes to the cluster. Here too we use ccs commands to make the configuration. I’m not going to edit the cluster.conf file manually, but will use the following syntax.
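If you open the newly created file, its initial contents typically resemble the skeleton below (the config_version and cluster name should match what you passed to --createcluster; treat this as an illustrative sketch rather than an exact dump):

```xml
<?xml version="1.0"?>
<cluster config_version="1" name="linuxtech_cluster">
  <fence_daemon/>
  <clusternodes/>
  <cman/>
  <fencedevices/>
  <rm>
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>
```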
# ccs -h 172.16.1.250 --addnode 172.16.1.222
Add the other node too.
# ccs -h 172.16.1.250 --addnode 172.16.1.223
You can also enter the command below to verify node details.
# ccs -h 172.16.1.250 --lsnodes
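After the two --addnode commands, the clusternodes section of cluster.conf should contain entries along these lines (node IDs are assigned in the order the nodes were added; again, an illustrative sketch):

```xml
<clusternodes>
  <clusternode name="172.16.1.222" nodeid="1"/>
  <clusternode name="172.16.1.223" nodeid="2"/>
</clusternodes>
```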
================================
Fencing and Adding a Failover to Clustering
================================
What is Fencing?
If we think of a setup with more than one node, it is possible that one or more nodes will fail at some point. In that case, fencing isolates the malfunctioning server from the cluster in order to protect and secure the synced resources. Therefore, we can add a fence to protect the resources shared within the cluster.
What is Failover?
Imagine a scenario where a server holds important data for an organization, and the stakeholders need that server up and running without any downtime. In this case we can duplicate the data to another server (so there are two servers with identical data and specs), which we can use as the failover.
If one of the servers goes down, the other server, which we have configured as the failover, takes over the load and provides the services previously given by the first server. In this way, users do not experience the downtime caused on the primary server.
Step 1: How to Add Fencing to Cluster Server
1. First we have to enable fencing on the cluster server; for this I will use the two commands below.
# ccs -h 172.16.1.250 --setfencedaemon post_fail_delay=0
# ccs -h 172.16.1.250 --setfencedaemon post_join_delay=10
As you can see, we use the ccs command to add the configurations to the cluster. The following are definitions of the options I have used in the command.
-h: Cluster host IP address.
--setfencedaemon: Applies the changes to the fencing daemon.
post_fail_delay: Time in seconds that the daemon waits before fencing a victim server after a node has failed.
post_join_delay: Time in seconds that the daemon waits before fencing a victim server after a node has joined the cluster.
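In cluster.conf, these two settings end up as attributes of the fence_daemon tag, roughly like this (an illustrative sketch):

```xml
<fence_daemon post_fail_delay="0" post_join_delay="10"/>
```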
2. Now let’s add a fence device to our cluster. Execute the first command below to add the device; the second lists the available fence options.
# ccs -h 172.16.1.250 --addfencedev linuxtechfence agent=fence_virt
# ccs -h 172.16.1.250 --lsfenceopts
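The new device then appears in the fencedevices section of cluster.conf along these lines (an illustrative sketch):

```xml
<fencedevices>
  <fencedevice agent="fence_virt" name="linuxtechfence"/>
</fencedevices>
```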
Step 2: Add Two Nodes to Fence Device
3. Now I’m going to add a method to the created fence device and add the hosts into it.
# ccs -h 172.16.1.250 --addmethod Method01 172.16.1.222
# ccs -h 172.16.1.250 --addmethod Method01 172.16.1.223
You have to add the method you created a while ago for both of the nodes in your setup.
4. As the next step, you will have to add the fence methods you created for both nodes to the fence device we created, namely “linuxtechfence”.
# ccs -h 172.16.1.250 --addfenceinst linuxtechfence 172.16.1.222 Method01
# ccs -h 172.16.1.250 --addfenceinst linuxtechfence 172.16.1.223 Method01
I have successfully associated my methods with the fence device, and this is how my cluster.conf looks now.
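For each node, the method and fence instance show up inside the corresponding clusternode entry, roughly as follows (shown for one node only; an illustrative sketch, not an exact dump):

```xml
<clusternode name="172.16.1.222" nodeid="1">
  <fence>
    <method name="Method01">
      <device name="linuxtechfence"/>
    </method>
  </fence>
</clusternode>
```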
Step 3: Add Failover to Cluster Server
5. I use the command below to create the failover domain for the cluster setup.
# ccs -h 172.16.1.250 --addfailoverdomain linuxtechfod ordered
6. Now that you have created the failover domain, you can add the two nodes to it.
# ccs -h 172.16.1.250 --addfailoverdomainnode linuxtechfod 172.16.1.222 1
# ccs -h 172.16.1.250 --addfailoverdomainnode linuxtechfod 172.16.1.223 2
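With the domain and its two member nodes added, the failoverdomains section of cluster.conf should look something like this (the trailing numbers in the commands above become the node priorities; an illustrative sketch):

```xml
<rm>
  <failoverdomains>
    <failoverdomain name="linuxtechfod" ordered="1">
      <failoverdomainnode name="172.16.1.222" priority="1"/>
      <failoverdomainnode name="172.16.1.223" priority="2"/>
    </failoverdomain>
  </failoverdomains>
</rm>
```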
=======================================================================
How to Sync Cluster Configuration and Verify Failover Setup in Nodes
========================================================================
We will start by adding resources to the cluster. In this case we can add a file system or a web service, as needed. I have the /dev/sda3 partition mounted at /x01, which I wish to add as a file system resource.
1. I use the command below to add a file system as a resource:
# ccs -h 172.16.1.250 --addresource fs name=my_fs device=/dev/mapper/linuxtechtest_lv_vol01 mountpoint=/x01 fstype=ext3
Additionally, if you want to add a service as well, you can, using the methodology below. Issue the following command.
# ccs -h 172.16.1.250 --addservice my_web domain=testdomain recovery=relocate autostart=1
You can verify it by viewing the cluster.conf file as we did in previous lessons.
2. Now enter the following entry in the cluster.conf file, inside the service definition, to add a reference tag for the file system resource.
<fs ref="my_fs"/>
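Putting the resource, the service, and the reference tag together, the rm section of cluster.conf might then resemble the following (an illustrative sketch using the names configured above):

```xml
<rm>
  <resources>
    <fs device="/dev/mapper/linuxtechtest_lv_vol01" fstype="ext3"
        mountpoint="/x01" name="my_fs"/>
  </resources>
  <service autostart="1" domain="testdomain" name="my_web" recovery="relocate">
    <fs ref="my_fs"/>
  </service>
</rm>
```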
3. All set. Now we will see how we can sync the configurations we made among the two nodes in the cluster. The following command will do the needful.
# ccs -h 172.16.1.250 --sync --activate
Note: Enter the passwords we set for ricci in the early stages, when we were installing the packages.
You can verify your configurations by using the command below.
# ccs -h 172.16.1.250 --checkconf
4. Now it’s time to start things up. You can use one of the commands below, as you prefer.
To start only one node, use the command with the relevant IP.
# ccs -h 172.16.1.222 start
Or, if you want to start all nodes, use the --startall option as follows.
# ccs -h 172.16.1.250 --startall
You can use stop or --stopall if you need to stop the cluster.
Sometimes you may want to start the cluster without enabling the resources (resources are enabled automatically when the cluster starts), for example when you have intentionally disabled the resources on a particular node in order to break fencing loops.
For that purpose you can use the command below, which starts the cluster but does not enable the resources.
# ccs -h 172.16.1.250 --startall --noenable
5. After the cluster has started up, you can view its status by issuing the clustat command.
# clustat
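On a healthy two-node cluster, the clustat output typically resembles the following (member names, IDs, and status will of course reflect your own setup):

```
Cluster Status for linuxtech_cluster
Member Status: Quorate

 Member Name                         ID   Status
 ------ ----                         ---- ------
 172.16.1.222                           1 Online
 172.16.1.223                           2 Online
```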
The options to restart a server using the iLO are found under the Virtual Power tab. You can choose one of four options:
Momentary press – this is like pressing the power button on your laptop/PC or server; with Windows 2000 and above (if I remember correctly), it should cleanly shut down the operating system.
Press and hold – this is like pressing and holding the power button; it will turn the computer off completely and will not cleanly shut down the server.
Cold boot of the system – this effectively powers off the server and restarts it.
Reset system – this is the iLO way of rebooting without cleanly restarting the operating system; if your server has hung and won’t respond, this might be the option you select.