Filesystems TNG

I’ve seen people raving about OpenSolaris and in particular ZFS and thought, “must try that one day”, then seen that there are licensing issues and never really got around to it. Then recently I was attracted by BTRFS, which has lots of nice features, so I thought I’d give it a try. Unfortunately BTRFS is about as stable as a 10ft jelly in a high wind and has more feature holes than the proverbial cheese made in Switzerland.

So, ZFS is sort of like BTRFS, but it’s stable and has all the features BTRFS would like to have … yes?
You’re waiting for me to say “actually, no!” … however, actually “yes” !

It does have a few shortcomings, but essentially it IS the best thing since sliced bread and if you’re not running your server on it, one day you won’t just be kicking yourself, you’ll be kicking yourself around the room, out the door, and down the street! Too strong a recommendation? Try it yourself!

To install on Ubuntu, do this;

sudo add-apt-repository ppa:zfs-native/stable
sudo apt-get update && sudo apt-get install ubuntu-zfs

If you have two available disks, create a mirror (the ZFS equivalent of RAID1) like this;

zpool create vols mirror /dev/sda /dev/sdb   # modify device names as appropriate

It will create the mount points and mount the filesystems for you.
If you have a bunch of disks, create a raidz2 array (think RAID6 without the bugs, i.e. two disks’ worth of parity) like this;

zpool create vols raidz2 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf

I think in general you’ll find these RAIDs perform “better” than the traditional Linux RAID, and they have many other features.
(just for starters, the RAIDs are all striped, so you could be getting 500MB/sec+ off 6 disks!)
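
As a rough sanity check on that figure, here is a back-of-envelope sketch. The 125 MB/s per-disk streaming rate is an assumption picked for illustration, not a measurement, and the function names are made up for this example. The idea is simply that raidz2 spends two disks' worth of space on parity, so a 6-disk array has 4 disks' worth of data to stream from.

```shell
#!/bin/sh
# Back-of-envelope numbers for a raidz array (illustrative only).

# Usage: raidz_data_disks <total disks> <parity level: 1, 2 or 3>
raidz_data_disks() {
    echo $(( $1 - $2 ))
}

# Best-case streaming estimate: data disks x per-disk MB/s.
# Real throughput depends on workload, record size and hardware.
# Usage: raidz_stream_mbs <total disks> <parity level> <MB/s per disk>
raidz_stream_mbs() {
    echo $(( ($1 - $2) * $3 ))
}

raidz_data_disks 6 2        # data disks in a 6-disk raidz2
raidz_stream_mbs 6 2 125    # assuming 125 MB/s per disk
```

Treat the result as an upper bound for big sequential reads; random I/O will look quite different.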

You can create subvolumes (ZFS calls them filesystems or datasets) like this;

zfs create vols/home
zfs create vols/backups

And here comes the really good stuff, you can create snapshots like this;

zfs snapshot vols/home@fullback

You can then send a snapshot to another volume, so for example;

zfs send vols/home@fullback | zfs receive -F vols/backups   # -F (capital) lets the stream overwrite the existing vols/backups

And, wait for it, to make a backup on a remote system, assuming it too runs ZFS;

zfs send vols/home@fullback | ssh root@remote zfs receive -d vols/backups

And (!) the icing on the cake, you can also do incremental backups, so following from the above example;

zfs snapshot vols/home@incremental
zfs send -i vols/home@fullback vols/home@incremental | ssh root@remote zfs receive -d vols/backups

… which sends the differences between the two snapshots to the remote host, then on the remote host creates a new snapshot and merges the changes into its copy of vols/home. i.e. the remote host will have the latest copy of your volume, which it automatically creates and mounts, and it will have a snapshot to match each incremental backup you’ve made. Indeed incrementals are so quick, running one per minute is more than realistic!
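
To make the moving parts of the incremental step explicit, here is a small sketch that builds the same two commands from their pieces. It only prints them, so it is safe to try on a box without ZFS. The function name and argument order are my own invention for illustration; the commands it prints are the ones from the example.

```shell
#!/bin/sh
# Sketch: build the incremental snapshot + send/receive pipeline.
# Prints the commands instead of running them (a dry run).
# Usage: incr_backup_cmd <dataset> <previous snap> <new snap> <ssh target> <remote parent>
incr_backup_cmd() {
    dataset=$1 prev=$2 new=$3 target=$4 parent=$5
    # Step 1: take the new snapshot.
    echo "zfs snapshot ${dataset}@${new}"
    # Step 2: send only the changes between the previous and new snapshots.
    echo "zfs send -i ${dataset}@${prev} ${dataset}@${new} | ssh ${target} zfs receive -d ${parent}"
}

incr_backup_cmd vols/home fullback incremental root@remote vols/backups
```

The point to notice is that the previous snapshot has to exist on both ends, so for the next run "incremental" becomes the previous snapshot, and so on.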

Think about it, a full backup of your system, every minute! And it even works on VM image files! No more “rsync” !!
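
If you do go for one per minute, every snapshot needs its own name. A minimal sketch, assuming you name them after the time they were taken (the "auto-" prefix and the function name are arbitrary choices for this example, nothing ZFS requires):

```shell
#!/bin/sh
# Sketch: one unique snapshot name per minute, e.g. auto-20120423-1217.
# Uses UTC so names keep sorting correctly across clock changes.
minute_snap_name() {
    date -u +auto-%Y%m%d-%H%M
}

# Dry run: print the snapshot command rather than executing it.
echo "zfs snapshot vols/home@$(minute_snap_name)"
```

Names like this sort in date order, which makes it easy to pick the previous snapshot for the next incremental send.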

You can read the specs, and there are some gotchas, but once you’ve worked your way through everything, it’s an incredibly powerful alternative if you’re not too worried about the theoretical licensing restrictions (!)

A typical system might look something like this;

# zfs list
NAME              USED  AVAIL  REFER  MOUNTPOINT
vols              749G  2.79T  61.9K  /vols
vols/backups      616G  2.79T   616G  /vols/backups
vols/disks       8.25G  2.79T  53.9K  /vols/disks
vols/disks/swap  8.25G  2.80T  40.0K  -
vols/home        40.7G  2.79T  40.7G  /vols/home
vols/images      84.0G  2.79T  84.0G  /vols/images

# zpool status
  pool: vols
 state: ONLINE
 scan: scrub repaired 0 in 1h20m with 0 errors on Mon Apr 23 12:17:02 2012
config:

	NAME                                               STATE     READ WRITE CKSUM
	vols                                               ONLINE       0     0     0
	  raidz2-0                                         ONLINE       0     0     0
	    scsi-SATA_SAMSUNG_HD103SJS246J90ZA22486-part3  ONLINE       0     0     0
	    scsi-SATA_SAMSUNG_HD103SJS246J9CB121450-part3  ONLINE       0     0     0
	    scsi-SATA_SAMSUNG_HD103SJS246J9KB415896-part3  ONLINE       0     0     0
	    scsi-SATA_SAMSUNG_HD103SJS246J9KB415897-part3  ONLINE       0     0     0
	    scsi-SATA_SAMSUNG_HD103SJS246J9KB415898-part3  ONLINE       0     0     0
	    scsi-SATA_SAMSUNG_HD103SJS246J9KB427162-part3  ONLINE       0     0     0

errors: No known data errors

Hmm… Interesting. My NAS currently runs on an EXT3 filesystem.

But soon I’ll be building the “Linux Beast” and add 2/3 HDDs in RAID. Is this filesystem really worth putting on it, once I’ve built it?

That might be a bit like, “I’m going to build a rally car and go rallying” … “do I really need that roll-cage?” … “and what about the turbo charger?” … “and do I really need off-road tyres?” … “and surely not a helmet?” … “I’m sure those gloves aren’t needed” … “and the truck and support crew, I’m pretty sure I can manage without them”.

I guess it depends (a) how much effort you want to put in and (b) how important your data is … :slight_smile:

Is it more work … yes … will it give a better solution … yes …

I just love MP’s writing style :slight_smile:

@BkS

I’d guess this is (in general) more aimed at server and system/network admins than home PC builds (though I suppose it could have a place there too) … people who look after data that they CANNOT afford to lose.

Yes, I know the average home user will also say “I can’t afford to lose my data”, but if they were that concerned they’d back up, which most don’t … your average home user generally puts hard drive capacity over and above data security, so doesn’t bother with things like mirroring drives, or even backing up “user” data.

So as MP says … how important is your data ?, are you willing to lose capacity in favour of data security ?, do you back up regularly ?, what do you back up and where to/from ?, how large are these backups likely to become ?, is failover a requirement ?, etc.

It’s impossible for someone else to suggest the best solution for your requirements without knowing exactly what those requirements are, and if they’re likely to evolve.

Question … recommend a backup/failover strategy for my requirements.

See what I mean :wink:

@Mad Penguin

For those that may not consider backups, data security, and failover a priority … you touch on the fact that large arrays of smaller disks can outperform single large disks because of their ability to read/write to multiple places simultaneously … if this (I/O performance) was your ONLY consideration, would you STILL choose ZFS ?

I’d also be interested to hear what these “shortcomings”, “gotchas”, and “licensing issues” are ?

Licensing aside … I suppose I’m asking if there are times/uses where you WOULDN’T consider ZFS the next generation file system of choice … as I’m guessing that was BkS’s real question ?

Ah, let me rephrase my question then.

My data is very important to me, I do back-ups regularly and I CANNOT afford to lose my data. I usually back-up to my NAS. Is this the “best” method, for fast file-sharing, backing up, etc, once the “Linux Beast” has come to life?

I’ll have dedicated hard-drives specifically for backing up. (I know you’re thinking, “wait… how many hard-drives are you buying?”)

I suppose I'm asking if there are times/uses where you WOULDN'T consider ZFS the next generation file system of choice .. as I'm guessing that was BkS's real question ?

This was part of the question, yes.

If IO speed were the only consideration, your choices would be Linux MD, ZFS or a hardware RAID controller. (did I miss one?)

NEVER use a hardware RAID controller unless you’re using it in JBOD mode, i.e. as a dumb disk controller. Not only will you regret it one day, but typically it will be slower and less reliable than the software RAID options available.

Would I choose Linux MD or ZFS?

Ok, I had to think about it, but I’m reminded of a quote from Data in Star Trek “First Contact”;

Lieutenant Commander Data: And for a time, I was tempted by her offer.
Captain Jean-Luc Picard: How long a time?
Lieutenant Commander Data: 0.68 seconds sir. For an android, that is nearly an eternity.

ZFS!

When wouldn’t I use it;

  • Machines with a limited amount of RAM, say 4GB or less.
  • Machines with less than two hard drives
  • Erm …

Shortcomings;

  • Currently ZFS uses its own page cache rather than the system’s free memory (but this ‘can’ be beneficial sometimes)
  • Automatic replacement of hot spares is problematic, currently you need to switch out faulty disks manually
  • It’s sensitive to hardware, so the “order” in which disks appear in the system is significant. This is in contrast to Linux MD, which simply relies on the UUID in each disk header … i.e. if you take all your disks out and put them back in a different order, be sure you picked naming by ID (/dev/disk/by-id, as in the zpool status output above) rather than device name
  • There are many features, so you really need to do your homework
  • It’s a custom PPA rather than being built into Ubuntu or the Kernel

When I say “manually”, I mean;

zpool replace <pool> <faulty device> <hot spare device>
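
On the disk-ordering point, the safest habit is to hand ZFS the stable /dev/disk/by-id names when you create the pool. A small sketch that assembles such a command and just prints it; the function name is made up and the disk IDs are placeholders, so substitute whatever ls /dev/disk/by-id shows on your machine.

```shell
#!/bin/sh
# Sketch: print a zpool create command that uses /dev/disk/by-id paths
# instead of /dev/sdX names, so reordering disks does not matter.
# Usage: zpool_create_by_id <pool> <vdev type> <disk id>...
zpool_create_by_id() {
    pool=$1 vdev=$2
    shift 2
    cmd="zpool create ${pool} ${vdev}"
    for id in "$@"; do
        cmd="${cmd} /dev/disk/by-id/${id}"
    done
    echo "${cmd}"
}

# The disk IDs below are placeholders, not real devices.
zpool_create_by_id vols mirror ata-DISK_A ata-DISK_B
```

Check the printed command before running it for real, since zpool create will happily take over whatever devices you name.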

If ZFS is not built into the kernel, can it be used for a boot drive, or swap ?

Does Grub(2) support ZFS ?

0.68 seconds eh? … Maybe Lieutenant Commander Data runs NTFS and needs a defrag :wink:

I guess ZFS will never be part of the Linux kernel whilst it’s licensed under the Sun CDDL … Maybe Oracle can be talked into releasing it under the GPL as a way of currying favour back with the OSS world.

There goes Miss Piggy in an English Electric Lightning again :wink:

Ahem… Oracle give back to the OSS world? Are you freakin’ kidding me? ???

If ZFS is not built into the kernel, can it be used for a boot drive, or swap ?

Technically yes/yes, although there have been issues reported with both, so I would tend to avoid both, certainly atm.

Does Grub(2) support ZFS ?

I think there is a patch / loader for it somewhere …

I guess ZFS will never be part of the Linux kernel whilst it’s licensed under the Sun CDDL …
Maybe Oracle can be talked into releasing it under the GPL as a way of currying favour back with the OSS world.

afaik ZFS has been forked, and the Linux version has nothing to do with Oracle. As ZFS is a kernel module, you may not get it included on an install CD because of licensing issues, but I don’t see why an install process can’t download it as part of an install. Indeed, in the future I predict (!) that installers will be tiny and installation processes will download from the web as they go … so I’m thinking the licensing stuff is all a play on words and will evaporate as the distinction is superseded by technology.

All an installer has to do is add the PPA and apt-get install ubuntu-zfs … legally I can conceive of no difference between a user doing this themselves and an installer doing it for them at their request …

All an installer has to do is add the PPA and apt-get install ubuntu-zfs .. legally I can conceive of no difference between a user doing this themselves and an installer doing it for them at their request ...

True … Ubuntu kind of do this now with flash/mp3 etc.

But it will still be a loadable module, rather than part of the mainstream kernel … so I’m just wondering how this would affect using it on the boot partition ?

Once it’s in initrd, it would be no different … :slight_smile: