Fencing (or I/O fencing) is the mechanism that disables an errant GFS node's access to a file system, preventing the node from causing data corruption. This chapter explains the necessity of fencing, summarizes how the fencing system works, and describes each form of fencing that can be used in a GFS cluster.
Fencing consists of two main steps:
Removal: cutting an errant node off from contact with the storage.
Recovery: returning the node safely to the cluster.
A cluster manager monitors the heartbeat between GFS nodes to determine which nodes in the cluster are running properly and which are errant. (The cluster manager is part of the LOCK_GULM server.) If a node fails, the cluster manager fences the node, then notifies the lock manager and GFS so that they can perform recovery for the failed node.
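The sequence described above (detect a missed heartbeat, fence the node, then trigger recovery) can be sketched as a small simulation. This is only an illustration of the ordering, not how LOCK_GULM is implemented: the `ClusterManager` class, the `HEARTBEAT_TIMEOUT` value, the node names, and the `fence` and `recover` callbacks are all hypothetical.

```python
# Hypothetical sketch: every name and the timeout value are illustrative
# assumptions, not part of GFS or LOCK_GULM.
HEARTBEAT_TIMEOUT = 15.0  # assumed seconds of silence before a node is errant


class ClusterManager:
    def __init__(self, fence, recover):
        self.last_seen = {}      # node name -> time of last heartbeat
        self.fenced = set()      # nodes that have been fenced
        self._fence = fence      # callable(node_name) -> bool (True on success)
        self._recover = recover  # callable(node_name)

    def heartbeat(self, node, now):
        """Record a heartbeat from a node that has not been fenced."""
        if node not in self.fenced:
            self.last_seen[node] = now

    def check(self, now):
        """Fence every node whose heartbeat is overdue, then start recovery."""
        for node, seen in list(self.last_seen.items()):
            if now - seen > HEARTBEAT_TIMEOUT:
                # Recovery starts only after fencing has succeeded.
                if self._fence(node):
                    self.fenced.add(node)
                    del self.last_seen[node]
                    self._recover(node)


events = []
manager = ClusterManager(
    fence=lambda node: events.append(("fence", node)) or True,
    recover=lambda node: events.append(("recover", node)),
)
manager.heartbeat("n01", now=0.0)
manager.heartbeat("n02", now=0.0)
manager.heartbeat("n01", now=10.0)  # n02 has stopped sending heartbeats
manager.check(now=20.0)
print(events)  # [('fence', 'n02'), ('recover', 'n02')]
```

In this sketch, recovery is never started for a node that could not be fenced, which mirrors the ordering in the text: the node is fenced first, and only then are the lock manager and GFS told to recover.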
If a node falls out of contact with the rest of the cluster (that is, it loses heartbeat), the locks it holds and the corresponding parts of the file system become unavailable to the other nodes in the cluster. Eventually, that condition can bring the entire cluster to a halt as other nodes require access to those parts of the file system.
If a node fails, it cannot be permitted to rejoin the cluster while still claiming the locks it held when it failed. Otherwise, that node could write to a file system where another node, one that has legitimately been issued locks to write to the file system, is writing, thereby corrupting the data. Fencing prevents a failed node from rejoining a cluster with invalid locks by disabling the path between the node and the file system storage.
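The following toy model illustrates why disabling the storage path matters. It is a sketch under simplifying assumptions: the `Storage` class, the `lock_holder` table, and the node and block names are hypothetical, and the storage deliberately does not check locks itself, so only the fence stands between a node with stale locks and the data.

```python
# Hypothetical model: all names here are illustrative, not GFS interfaces.
class Storage:
    """Toy shared storage. It does not check locks; only fencing blocks a node."""

    def __init__(self):
        self.blocks = {}     # block name -> data
        self.fenced = set()  # nodes whose path to storage is disabled

    def fence(self, node):
        self.fenced.add(node)

    def write(self, node, block, data):
        if node in self.fenced:
            raise PermissionError(f"{node} is fenced")
        self.blocks[block] = data


storage = Storage()
lock_holder = {"block7": "n01"}  # n01 holds the write lock, then stops responding

storage.fence("n01")             # removal: cut n01 off from the storage
lock_holder["block7"] = "n02"    # recovery: the lock is reissued to n02
storage.write("n02", "block7", "new data")

# n01 resumes and still believes it holds the lock it had before failing.
try:
    storage.write("n01", "block7", "stale data")
    stale_write_blocked = False
except PermissionError:
    stale_write_blocked = True

print(stale_write_blocked, storage.blocks["block7"])  # True new data
```

Without the `fence` call, the final write from `n01` would silently overwrite the data written by `n02`, which is the corruption scenario described above.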
When the cluster manager fences a node, it directs the fencing system to fence the node by name. The fencing system must then read from CCS the appropriate method for fencing that node. Refer to Chapter 7, Using the Cluster Configuration System, for details on how to specify each fencing method in the CCS configuration files.
Each device or method that can be used to fence nodes is listed in fence.ccs under fence_devices. Each device specification includes the name of a fencing agent, which is a command that interacts with a specific type of device to disable a specific node. To use a device for fencing, an associated fencing agent must exist.
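The lookup described above (node name, to fencing method, to device, to agent) can be sketched as follows. This is a simplified model, not the CCS file syntax or the fencing system's code: the dictionaries stand in for data read from CCS, and the device names, agent names, parameters, and the `resolve_fence_action` function are all hypothetical.

```python
# Hypothetical stand-ins for data the fencing system would read from CCS.
# Device names, agent names, and parameters are illustrative assumptions.
fence_devices = {
    "power1": {"agent": "fence_example_power", "ipaddr": "10.0.0.5"},
    "switch1": {"agent": "fence_example_switch", "ipaddr": "10.0.0.6"},
}

# Per-node fencing method: which device to use, plus node-specific arguments.
node_fencing = {
    "n01": {"device": "power1", "args": {"port": "1"}},
    "n02": {"device": "switch1", "args": {"port": "4"}},
}


def resolve_fence_action(node):
    """Return (agent, parameters) needed to fence the named node."""
    method = node_fencing.get(node)
    if method is None:
        raise LookupError(f"no fencing method configured for {node}")
    device = fence_devices.get(method["device"])
    if device is None:
        raise LookupError(f"unknown fence device {method['device']}")
    agent = device.get("agent")
    if not agent:
        # A device cannot be used for fencing without an associated agent.
        raise LookupError(f"device {method['device']} has no fencing agent")
    # Combine device-wide settings with the node-specific arguments.
    params = {key: value for key, value in device.items() if key != "agent"}
    params.update(method["args"])
    return agent, params


print(resolve_fence_action("n02"))
# ('fence_example_switch', {'ipaddr': '10.0.0.6', 'port': '4'})
```

The sketch stops at resolving the agent and its parameters; how a real agent is invoked and which options it accepts depend on the agent and are not modeled here.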