Chapter 32. High Availability


John H. Terpstra

Samba Team

<jht@samba.org>

Jeremy Allison

Samba Team

<jra@samba.org>

Table of Contents

Features and Benefits

Technical Discussion

The Ultimate Goal
Why Is This So Hard?
A Simple Solution
High-Availability Server Products
MS-DFS: The Poor Man's Cluster
Conclusions

Features and Benefits

Network administrators are often concerned about the availability of file and print services. Network users tend to be intolerant of failures in the services they depend on to perform vital tasks.

A sign in a computer room served to remind staff of their responsibilities. It read:

All humans fail, in both great and small ways we fail continually. Machines fail too. Computers are machines that are managed by humans, the fallout from failure can be spectacular. Your responsibility is to deal with failure, to anticipate it and to eliminate it as far as is humanly and economically wise to achieve. Are your actions part of the problem or part of the solution?

If we are to deal with failure in a planned and productive manner, then first we must understand the problem. That is the purpose of this chapter.

Parenthetically, the following discussion contains seeds of information on how to provision a network infrastructure against failure. Our purpose here is not to provide a lengthy dissertation on the subject of high availability. Additionally, we have made a conscious decision not to provide detailed working examples of high-availability solutions; instead we present an overview of the issues in the hope that someone will rise to the challenge of providing a detailed document that is focused purely on presentation of the current state of knowledge and practice in high availability as it applies to the deployment of Samba and other CIFS/SMB technologies.

Technical Discussion

The following summary was part of a presentation by Jeremy Allison at the SambaXP 2003 conference that was held at Goettingen, Germany, in April 2003. Material has been added from other sources, but it was Jeremy who inspired the structure that follows.

The Ultimate Goal

All clustering technologies aim to achieve one or more of the following:

Obtain the maximum affordable computational power.
Obtain faster program execution.
Deliver unstoppable services.
Avert points of failure.
Extract the most effective utilization of resources.

A clustered file server ideally has the following properties:

All clients can connect transparently to any server.
A server can fail and clients are transparently reconnected to another server.
All servers serve out the same set of files.
All file changes are immediately seen on all servers.
- Requires a distributed file system.
Infinite ability to scale by adding more servers or disks.

Why Is This So Hard?

In short, the problem is one of state.

All TCP/IP connections are dependent on state information.
- The TCP connection involves a packet sequence number. This sequence number would need to be dynamically updated on all machines in the cluster to effect seamless TCP failover.
CIFS/SMB (the Windows networking protocols) uses TCP connections.
- This means that, from a basic design perspective, failover is not seriously considered.
- All current SMB clusters are failover solutions: they rely on the clients to reconnect. They provide server failover, but clients can lose information due to a server failure.
Servers keep state information about client connections.
- CIFS/SMB involves a lot of state.
- Every file open must be compared with other open files to check share modes.
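
To give a feel for the per-open state involved, here is a minimal sketch of a share-mode check. It assumes the usual rule that two opens can coexist only if each one's requested access is permitted by the other's share mode. The flag values and the names `Open`, `compatible`, and `can_open` are hypothetical and chosen for illustration; this is not Samba's share mode database.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical bit flags for this sketch (not Samba's internal constants).
READ, WRITE, DELETE = 1, 2, 4


@dataclass(frozen=True)
class Open:
    access: int  # what this opener wants to do with the file
    share: int   # what this opener permits other openers to do


def compatible(existing: Open, new: Open) -> bool:
    """Two opens can coexist only if each tolerates the other's access."""
    new_allowed = (new.access & ~existing.share) == 0
    existing_allowed = (existing.access & ~new.share) == 0
    return new_allowed and existing_allowed


def can_open(open_table: Iterable[Open], new: Open) -> bool:
    """Compare the new open against every open already in the table."""
    return all(compatible(existing, new) for existing in open_table)


table = [Open(access=READ, share=READ)]
print(can_open(table, Open(access=READ, share=READ | WRITE)))  # True
print(can_open(table, Open(access=WRITE, share=READ)))         # False
```

On a single server the table lives in one place. In a cluster, every server would need a consistent view of it before it could answer an open request, which is where the difficulty begins.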

The Front-End Challenge

To make it possible for a cluster of file servers to appear as a single server that has one name and one IP address, the incoming TCP data streams from clients must be processed by the front-end virtual server. This server must de-multiplex the incoming packets at the SMB protocol layer and then feed the SMB packets to different servers in the cluster.

One could direct all IPC$ connections and RPC calls to one server to handle printing and user lookup requirements. RPC printing handles are shared between different IPC$ sessions, so it is hard to split this work across clustered servers!

Conceptually speaking, all other servers would then provide only file services. This is a simpler problem to concentrate on.

Demultiplexing SMB Requests

De-multiplexing of SMB requests requires knowledge of SMB state information, all of which must be held by the front-end virtual server. This is a perplexing and complicated problem to solve.

Windows XP and later have changed semantics so that state information (vuid, tid, fid) must match for a successful operation. This makes things simpler than before and is a positive step forward.

SMB requests are sent by vuid to their associated server. No code exists today to effect this solution. This problem is conceptually similar to the problem of correctly handling multiple requests from Windows 2000 Terminal Server in Samba.
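
Purely as an illustration of the idea of routing by vuid, the following sketch pins each new session to a backend and sends later requests for that vuid to the same place. The class `VuidRouter`, its methods, and the round-robin placement are all assumptions made for this example. It ignores tid and fid state, authentication, and failure handling, and it is not code that exists in Samba.

```python
class VuidRouter:
    """Toy front-end table mapping a session's vuid to one backend server."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self._backends = list(backends)
        self._by_vuid = {}
        self._next = 0  # round-robin cursor (an arbitrary placement policy)

    def register_session(self, vuid):
        """Pin a newly established session to a backend and remember it."""
        backend = self._backends[self._next % len(self._backends)]
        self._next += 1
        self._by_vuid[vuid] = backend
        return backend

    def route(self, vuid):
        """Return the backend that owns this vuid; KeyError if unknown."""
        return self._by_vuid[vuid]


router = VuidRouter(["smb1", "smb2"])
router.register_session(100)
router.register_session(101)
print(router.route(100))  # smb1
print(router.route(101))  # smb2
```

Even this toy shows why the front end is a problem: the routing table is itself state that must survive a front-end failure.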

One possibility is to start by exposing the server pool to clients directly. This could eliminate the de-multiplexing step.

The Distributed File System Challenge

There exist many distributed file systems for UNIX and Linux.

Many could be adopted to backend our cluster, so long as awareness of SMB semantics is kept in mind (share modes, locking, and oplock issues in particular). Common free distributed file systems include:

NFS
AFS
OpenGFS
Lustre

The server pool (cluster) can use any distributed file system backend if all SMB semantics are performed within this pool.

Restrictive Constraints on Distributed File Systems

Where a clustered server provides purely SMB services, oplock handling may be done within the server pool without imposing a need for this to be passed to the backend file system pool.

On the other hand, where the server pool also provides NFS or other file services, it will be essential that the implementation be oplock-aware so it can interoperate with SMB services. This is a significant challenge today. A failure to provide this interoperability will result in a significant loss of performance that will be sorely noted by users of Microsoft Windows clients.

Last, all state information must be shared across the server pool.

Server Pool Communications

Most backend file systems support POSIX file semantics. This makes it difficult to push SMB semantics back into the file system. POSIX locks have different properties and semantics from SMB locks.

All smbd processes in the server pool must of necessity communicate very quickly. For this, the current tdb file structure that Samba uses is not suitable for use across a network. Clustered smbds must use something else.

Server Pool Communications Demands

High-speed interserver communication in the server pool is a design prerequisite for a fully functional system. Possibilities for this include:

Proprietary shared memory bus (example: Myrinet or SCI [scalable coherent interface]). These are high-cost items.
Gigabit Ethernet (now quite affordable).
Raw Ethernet framing (to bypass TCP and UDP overheads).

We have yet to identify metrics for performance demands to enable this to happen effectively.

Required Modifications to Samba

Samba needs to be significantly modified to work with a high-speed server interconnect system to permit transparent failover clustering.

Particular functions inside Samba that will be affected include:

The locking database, oplock notifications, and the share mode database.
Failure semantics need to be defined. Samba behaves the same way as Windows. When oplock messages fail, a file open request is allowed, but this is potentially dangerous in a clustered environment. So how should interserver pool failure semantics function, and how should such functionality be implemented?
Should this be implemented using a point-to-point lock manager, or can this be done using multicast techniques?
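
The failure-semantics question can be made concrete with a very small sketch. It models only the decision described above: whether a conflicting open proceeds when an oplock break message fails. The names `BreakResult` and `may_open` and the `allow_on_failure` switch are hypothetical; the sketch says nothing about how a real cluster would detect the failure.

```python
from enum import Enum


class BreakResult(Enum):
    ACKNOWLEDGED = "acknowledged"  # the oplock holder released the oplock
    FAILED = "failed"              # the break message could not be delivered


def may_open(break_result, allow_on_failure):
    """Decide whether a conflicting open proceeds after an oplock break."""
    if break_result is BreakResult.ACKNOWLEDGED:
        return True
    # The policy choice: allow_on_failure=True mirrors the single-server
    # behaviour described in the text; False is a stricter cluster policy.
    return allow_on_failure


print(may_open(BreakResult.FAILED, allow_on_failure=True))   # True
print(may_open(BreakResult.FAILED, allow_on_failure=False))  # False
```

With `allow_on_failure=True`, a server that merely lost contact with a peer could grant an open while the peer's client still holds cached data, which is the danger the text points to.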

A Simple Solution

Allowing failover servers to handle different functions within the exported file system removes the problem of requiring a distributed locking protocol.

If only one server is active in a pair, the need for high-speed server interconnect is avoided. This allows the use of existing high-availability solutions, instead of inventing a new one. This simpler solution comes at a price: the need to manage a more complex file name space. Since there is now not a single file system, administrators must remember where all services are located, a complexity not easily dealt with.

The virtual server is still needed to redirect requests to backend servers. Backend file space integrity is the responsibility of the administrator.
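
A rough sketch of this arrangement follows, assuming each part of the name space (here, a share) is owned by one active/standby pair and that the virtual server only needs to know which member of the pair is currently active. The classes `FailoverPair` and `VirtualServer` and the host names are invented for the example.

```python
class FailoverPair:
    """Two servers for one part of the name space; only one is active."""

    def __init__(self, primary, standby):
        self.primary = primary
        self.standby = standby
        self.active = primary

    def fail_over(self):
        """Switch the active role to the other member of the pair."""
        if self.active == self.primary:
            self.active = self.standby
        else:
            self.active = self.primary
        return self.active


class VirtualServer:
    """Redirects a request for a share to the active server of its pair."""

    def __init__(self, share_map):
        self._share_map = dict(share_map)

    def backend_for(self, share):
        return self._share_map[share].active


home = FailoverPair("fs1a", "fs1b")
projects = FailoverPair("fs2a", "fs2b")
vs = VirtualServer({"home": home, "projects": projects})

print(vs.backend_for("home"))  # fs1a
home.fail_over()
print(vs.backend_for("home"))  # fs1b
```

Because only one member of each pair ever serves a given share, no locking state has to be shared between active servers. The share map is the more complex name space that the administrator must now maintain.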

High-Availability Server Products

Failover servers must communicate in order to handle resource failover. This is essential for high-availability services. The use of a dedicated heartbeat is a common technique to introduce some intelligence into the failover process. This is often done over a dedicated link (LAN or serial).
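
The heartbeat idea can be sketched as a timeout-based failure detector: each node reports in periodically, and a node that has been silent for longer than a chosen timeout is treated as failed. The class `HeartbeatMonitor` and its parameters are assumptions for illustration and do not reflect the interface of any particular high-availability product. Timestamps are passed in explicitly so the logic is easy to follow; a real monitor might take them from `time.monotonic()`.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each node."""

    def __init__(self, timeout):
        self.timeout = timeout  # seconds of silence tolerated
        self._last_seen = {}

    def beat(self, node, now):
        """Record a heartbeat from a node at time `now` (in seconds)."""
        self._last_seen[node] = now

    def failed_nodes(self, now):
        """Return the nodes whose last heartbeat is older than the timeout."""
        return sorted(
            node
            for node, seen in self._last_seen.items()
            if now - seen > self.timeout
        )


monitor = HeartbeatMonitor(timeout=3.0)
monitor.beat("node-a", now=0.0)
monitor.beat("node-b", now=0.0)
monitor.beat("node-a", now=2.0)
print(monitor.failed_nodes(now=4.0))  # ['node-b']
```

A silent node is not necessarily a dead node: a broken heartbeat link looks the same, which is one reason a dedicated link is commonly used.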

Many failover solutions (like Red Hat Cluster Manager and Microsoft Wolfpack) can use a shared SCSI or Fibre Channel disk storage array for failover communication. Information regarding Red Hat high-availability solutions for Samba may be obtained from www.redhat.com.

The Linux High Availability project is a resource worthy of consultation if your desire is to build a highly available Samba file server solution. Please consult the home page at www.linux-ha.org/.

Front-end server complexity remains a challenge for high availability because it must deal gracefully with backend failures, while at the same time providing continuity of service to all network clients.

MS-DFS: The Poor Man's Cluster

MS-DFS links can be used to redirect clients to disparate backend servers. This pushes complexity back to the network client, something already included by Microsoft. MS-DFS creates the illusion of a simple, continuous file system name space that works even at the file level.

Above all, at the cost of complexity of management, a distributed system (pseudo-cluster) can be created using existing Samba functionality.
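
To illustrate how a link table can present one name space over several servers, the following sketch resolves a path inside a DFS root to a path on a backend server by matching the longest link prefix. It is a simplified model of the redirection idea only. The function `resolve`, the table layout, and the server and share names are hypothetical, and the sketch does not implement the MS-DFS referral protocol or Samba's configuration of it.

```python
def resolve(links, path):
    """Map a path inside the DFS root to a UNC path on a backend server.

    `links` maps a lower-case tuple of path components (the link location)
    to a (server, share) pair. Returns None if no link covers the path.
    """
    parts = [p for p in path.split("\\") if p]
    # Try the longest candidate prefix first, then shorter ones.
    for n in range(len(parts), 0, -1):
        key = tuple(p.lower() for p in parts[:n])
        target = links.get(key)
        if target is not None:
            server, share = target
            return "\\\\" + "\\".join([server, share] + parts[n:])
    return None


links = {
    ("sales",): ("srv1", "sales"),
    ("eng",): ("srv2", "engineering"),
}
print(resolve(links, r"sales\q1\report.doc"))
print(resolve(links, r"eng\specs"))
```

The two calls print `\\srv1\sales\q1\report.doc` and `\\srv2\engineering\specs`. The client sees one tree, while the administrator maintains the table that stitches it together.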

Conclusions

Transparent SMB clustering is hard to do!
Client failover is the best we can do today.
Much more work is needed before a practical and manageable high-availability transparent cluster solution will be possible.
MS-DFS can be used to create the illusion of a single transparent cluster.
