Scalable HA PHP for Linux

Just to get the ball rolling, a question I’ve been asked a few times recently is: “what’s the best way to set up a resilient web service, specifically for dynamic web sites?” The solution and details will vary depending on the application server you’re looking to run, but I have a working solution here for anyone interested.

I’m not going to go into too much detail as it’s the principle and the fact that it works which are the key points.

Step 1: to keep your systems up, they need to survive not only a disk failure but also a failure of the system on which the disk sits, so effectively you need to be able to serve files off mirrored disks in real time. You immediately have two issues:

  • How to physically mirror the disks
  • How to persuade a filesystem to operate off two mirrored disks on two diverse machines

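DRBD - will effectively mirror two partitions on separate machines via an Ethernet link.
OCFS2 - will be your filesystem; it is a cluster filesystem designed to be mounted by more than one machine at once, so it copes with a mirrored device that both machines write to.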

So you will end up with two machines. Each machine will have a partition which is mirrored with the other machine via DRBD, and each machine will mount its local partition using OCFS2. Each machine can then read and write to its local OCFS2 filesystem, and any changes will immediately appear on the OCFS2 filesystem on the other machine.

It will “look” like two clients mounting the same network filesystem (NFS, Samba etc.), except that taking out one of the servers in its entirety will have no effect on the other server.

Step 2: you still effectively have two diverse systems, so you need a way to pull them together. Set up an NFSv4 server on each machine and export the root of what you would like to be your web server’s file tree (e.g. /var/www).

Then you need some PHP servers. Essentially any box will do: install PHP and NFS-mount the exported tree from one of the two mirrored servers. Ideally mix things up so you have two or more PHP servers mounting different NFS servers.
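As a rough illustration of “mixing things up”, here is a minimal Python sketch that spreads PHP boxes across the NFS servers in rotation. The host names (php1, filestore1 and so on) and the function name are made up for the example and are not part of the real setup; any scheme that avoids every PHP box depending on the same filestore would do.

```python
from itertools import cycle


def assign_mounts(php_nodes, nfs_servers):
    """Pair each PHP node with an NFS server, rotating through the servers."""
    if not nfs_servers:
        raise ValueError("need at least one NFS server")
    # zip() stops when the PHP nodes run out; cycle() repeats the servers.
    return dict(zip(php_nodes, cycle(nfs_servers)))


# Hypothetical host names, purely for illustration.
plan = assign_mounts(["php1", "php2", "php3"], ["filestore1", "filestore2"])
for php, nfs in plan.items():
    print(f"{php} mounts /var/www from {nfs}")
```

With two filestores and three PHP boxes, losing either filestore still leaves at least one PHP box with a working mount.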

Step 3: you need a web server. I’d strongly recommend using Lighttpd rather than Apache. Not only is it quicker, but it also handles running with multiple UIDs, which you need in a multi-user shared environment (i.e. if you’re an ISP) and which is not something Apache does well.

You still have a single point of failure in the web server, and there are all sorts of ways of backing this up with IP sharing, heartbeat / failover etc. To be honest, though, I’ve not seen a web server failure in the last three years, so as long as you offload all your PHP work to other machines, the chances of ever needing a fail-over here don’t generally warrant the work. Also (!) in a VPS environment it takes ~10s to boot a server from scratch to having a live port 80, so just keeping a spare instance knocking around is a fairly efficient backup solution.
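If you do go the “spare instance” route, something has to notice that port 80 has gone away. The following is only a sketch of one way to probe for that in Python, not what is actually running here; the address 127.0.0.1 is a stand-in for the live web server and the function name is invented for the example.

```python
import socket


def port_is_open(host, port=80, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


if __name__ == "__main__":
    # "127.0.0.1" stands in for the live web server's address (an assumption).
    if port_is_open("127.0.0.1"):
        print("web server is answering on port 80")
    else:
        print("no answer on port 80: time to start the spare instance")
```

A successful TCP connect only shows that something is listening; it says nothing about whether pages are being served correctly.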

Examples - you’re using it now! This is SMF running on the above-mentioned configuration. If you want to try WordPress on the same solution, head over to http://trollstomper.com.

What you need

To get this running effectively in a test environment (or live for that matter):

  • Two filestore boxes running DRBD, OCFS2 and NFSv4
  • Two (or more) PHP boxes configured for FastCGI, each with an NFSv4 client
  • A web server box configured with lighttpd

Tips:

  • Make sure you enable NFS client caching, but if you do this you will need to add fs.leases-enable = 0 to sysctl.conf on your NFS servers.
  • Try using an increased block size and TCP (rather than UDP) for NFS mounts
  • DRBD will need to be configured with the allow-two-primaries option (in the net section of its config) in order for both OCFS2 mounts to run at the same time
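To make the NFS tips a bit more concrete, here is a small Python sketch that prints an /etc/fstab entry using TCP and a larger block size, plus the sysctl line for the NFS servers. The host name filestore1, the export path and the 32768-byte block size are assumptions for illustration only; proto, rsize and wsize are the standard NFS mount options, but the right values depend on your network.

```python
def nfs_fstab_line(server, export, mountpoint, block_size=32768):
    """Build an /etc/fstab entry for an NFSv4 mount over TCP."""
    # rsize/wsize set the read/write block size; proto=tcp avoids UDP.
    options = f"proto=tcp,rsize={block_size},wsize={block_size}"
    return f"{server}:{export} {mountpoint} nfs4 {options} 0 0"


# Hypothetical server name and paths, purely for illustration.
line = nfs_fstab_line("filestore1", "/var/www", "/var/www")
print(line)

# Line to add to /etc/sysctl.conf on the NFS servers (from the tip above).
print("fs.leases-enable = 0")
```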

For anyone who’s interested, or trying this and stuck, I can post config snippets … if on the other hand nobody is remotely interested in HA systems, I guess this is just more text for Google to search … :-\

Thanks for that, I was wondering how clustering for server fail-over could be implemented on Linux based servers… Now I know where to start looking… Time to play :wink:

Have you ever played with the ‘condor’ (HTC environment)? and/or ‘pacemaker-heartbeat’?

Condor no, heartbeat yes.

Heartbeat is “messy”; personally I would avoid it like the plague.
(Is pacemaker not the new version of heartbeat?)

Not sure, that’s why I was asking :wink: I was wondering what the differences are, and which solution is ‘better’… looks like it’s more of a heartbeat manager (so cleverly named).

According to the Ubuntu repo:
pacemaker-heartbeat

High-Availability cluster resource manager for Heartbeat.

Pacemaker supports a very sophisticated dependency model for n-node
clusters and can respond to node and resource-level failures.

It supports both the Heartbeat (Linux-HA) and OpenAIS cluster stacks


And ‘condor’ is a workload management system for high throughput computing, so probably not applicable anyway; I just wondered if you had ever used it or knew anything about it.

I think I tried pacemaker and failed to get it to work.

If you look at what it is and what it does, it’s a pretty brute force mechanism and not something to use unless you really need it and don’t have an alternative.

We use heartbeat for server failure detection (over the standard network links, normally ping-based) and ldirectord for service failover and load balancing… useful for 100% uptime clusters :slight_smile:

OK, I’ve tried my hand at visualising the configuration. Does this make sense / help to show how it works? This is a simplified setup involving two physical hosts; in reality we have a number of these, all identical in specification. The data nodes, PHP nodes and lighttpd nodes are all virtual KVM servers and technically can sit on any node, so long as you make sure the physical node has access to its partition via DRBD. (I use scripts to migrate DRBD and KVM between nodes.)
[smg id=402]
Obvious question #1: Why do you re-implement DRBD “inside” the data servers rather than using the node-based DRBD that’s already running?
Not so obvious answer #1: When you migrate a data node / VPS between physical nodes, it’s MUCH easier if the associated data is encapsulated within the VPS, and much less prone to errors / mistakes … and there doesn’t seem to be much in the way of additional overhead.

Question #2 (because I know it’s coming): Did you draw this in Visio?
Answer #2: Absolutely not! That would involve running Windows! See: http://www.gliffy.com