Just to get the ball rolling, a question I’ve been asked a few times recently is: “What’s the best way to set up a resilient web service, specifically for dynamic web sites?” The solution and details will vary depending on the application server you’re looking to run, but I’ve a working solution here for anyone interested.
I’m not going to go into too much detail; the key points are the principle and the fact that it works.
Step 1: to keep your systems up, they need to survive not only a disk failure but also a failure of the machine the disk sits in, so effectively you need to be able to serve files off mirrored disks in real time. That immediately raises two issues:
- How to physically mirror the disks
- How to persuade a filesystem to operate off two mirrored disks on two diverse machines
DRBD - Will effectively mirror two partitions on separate machines via an Ethernet link.
OCFS2 - Will be your filesystem. It is a cluster filesystem, so it copes with the same (mirrored) device being mounted read/write on both machines at once.
So you will end up with two machines, each with a partition that is mirrored to the other machine via DRBD, and each mounting its local partition using OCFS2. Each machine can then read and write its local OCFS2 filesystem, and any changes will immediately appear on the OCFS2 filesystem on the other machine.
It will “look” like two clients mounting the same network filesystem (NFS, Samba, etc.), except that taking out one of the servers in its entirety will have no effect on the other server.
Step 2: you still effectively have two separate systems, so you need a way to pull them together. Set up an NFSv4 server on each machine and export the root of what you would like to be your web server’s file tree (e.g. /var/www).
Then you need some PHP servers: take essentially any box, install PHP and NFS-mount the export from one of the two mirrored servers. Ideally mix things up, so that you have two or more PHP servers mounting different NFS servers.
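To make the "mix things up" idea concrete, here is a rough Python sketch that spreads PHP boxes across the NFS servers round-robin and prints the mount command each one would run. The host names (php1, store1, and so on) are made up for illustration, and the mount point simply reuses the /var/www example from above; treat it as a starting point rather than a finished tool.

```python
from itertools import cycle


def assign_nfs_servers(php_hosts, nfs_servers):
    """Map each PHP host to an NFS server, cycling through the servers in order."""
    if not nfs_servers:
        raise ValueError("need at least one NFS server")
    # zip() stops when php_hosts runs out, so cycle() never loops forever.
    return dict(zip(php_hosts, cycle(nfs_servers)))


# Hypothetical host names, purely for illustration.
plan = assign_nfs_servers(["php1", "php2", "php3"], ["store1", "store2"])
for php_host, nfs_server in plan.items():
    print(f"{php_host}: mount -t nfs4 {nfs_server}:/var/www /var/www")
```

With two filestore boxes and three PHP boxes, the first and third PHP box land on one filestore and the second on the other, so losing a filestore only takes out the PHP boxes that were mounting it.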
Step 3: you need a web server. I’d strongly recommend using Lighttpd rather than Apache; not only is it quicker, it also handles running with multiple UIDs, which you need in a multi-user shared environment (i.e. if you’re an ISP) and which is not something Apache does well.
That still leaves a single point of failure in the web server. There are all sorts of ways of backing this up (IP sharing, heartbeat / failover etc.), but to be honest I’ve not seen a web server failure in the last three years, so as long as you offload all your PHP work to other machines, the chance of ever needing a fail-over here doesn’t generally warrant the work. Also (!), in a VPS environment it takes ~10s to boot a server from scratch to having a live port 80, so just keeping a spare instance knocking around is a fairly efficient backup solution.
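If you do go the spare-instance route, something has to notice that port 80 has gone away. Below is a minimal Python sketch of that check, assuming a plain TCP connect is a good enough test of "live". The function names and the two-second timeout are my own choices, not part of heartbeat or any other failover package, and actually moving the IP or booting the spare is left out.

```python
import socket


def port_is_open(host, port=80, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name lookup failures.
        return False


def pick_web_server(primary, spare, port=80):
    """Return the primary if it answers on the port, else the spare, else None."""
    for host in (primary, spare):
        if port_is_open(host, port):
            return host
    return None
```

Run from cron or a small loop, `pick_web_server("web1", "web2")` (hypothetical host names) tells you which box should currently be taking traffic.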
Examples: you’re using it now! This is SMF running on the above-mentioned configuration. If you want to try WordPress on the same solution, head over to http://trollstomper.com.
What you need
To get this running effectively in a test environment (or live, for that matter):
- Two filestore boxes running DRBD, OCFS2 and NFSv4
- Two (or more) PHP boxes, configured for FastCGI, each with an NFSv4 client
- A web server box configured with lighttpd
Tips:
- Make sure you enable NFS client caching, but if you do this you will need to add fs.leases-enable = 0 to sysctl.conf on your NFS servers.
- Try using an increased block size and TCP (rather than UDP) for NFS mounts
- DRBD will need to be configured with “allow-two-primaries” (in the net section of the resource) in order for both OCFS2 mounts to run at the same time
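As an illustration of the block size / TCP tip, this small Python helper builds the /etc/fstab entry a PHP box might use. `proto=tcp`, `rsize` and `wsize` are standard NFS mount options; the 32768 block size, the host name store1 and the /var/www paths are assumptions for the example, so tune them for your own kit. It does not cover the client caching option, the sysctl change or the DRBD side.

```python
def nfs_fstab_line(server, export="/var/www", mount_point="/var/www", block_size=32768):
    """Return an /etc/fstab entry for an NFSv4 mount over TCP with explicit block sizes."""
    if block_size <= 0 or block_size % 1024:
        raise ValueError("block_size should be a positive multiple of 1024")
    options = ",".join(["proto=tcp", f"rsize={block_size}", f"wsize={block_size}"])
    return f"{server}:{export} {mount_point} nfs4 {options} 0 0"


# Hypothetical filestore host name, purely for illustration.
print(nfs_fstab_line("store1"))
# prints: store1:/var/www /var/www nfs4 proto=tcp,rsize=32768,wsize=32768 0 0
```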
For anyone who’s interested, or trying this and stuck, I can post config snippets … if on the other hand nobody is remotely interested in HA systems, I guess this is just more text for Google to search … :-\