In the lab, yes. Oracle killed my plans of adopting it.
I don't know where you got that. It's actually just the opposite. It checksums everything and includes several facilities to catch early corruption and address it before you ever lose data.
Yes, that is the idea. But it depends, as the article mentions, on ZFS using atomic writes and barriers. So how can you guarantee that? ZFS writes data to the drive and then makes sure the data has actually been written. How can ZFS be sure of that? It asks the drive. The problem is that consumer drives often lie and report that data has been written when it is still only sitting in the cache. And that's a great recipe for data corruption.
ZFS checksums are beyond awesome, but they don't combat this.
Jeff Bonwick explains it better than I could:
As you note, some disks flat-out lie: you issue the synchronize-cache command, they say "got it, boss", yet the data is still not on stable storage. Why do they do this? Because "it performs better". Well, duh -- you can make stuff really fast if it doesn't have to be correct.
Before I explain how ZFS can fix this, I need to get something off my chest: people who knowingly make such disks should be in federal prison. It is fraud to win benchmarks this way.
http://mail.opensolaris.org/pipermail/zfs-discuss/2008-Octob...
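To make the failure mode concrete, here is a toy sketch in Python. Everything in it (the `Disk` class, `commit`, `crash_after_reordering`) is made up for illustration and models neither real drive firmware nor ZFS internals. It only shows why a "write data, flush, write root pointer, flush" sequence stops being safe once the flush acknowledgement is a lie.

```python
class Disk:
    """Toy drive with a volatile write cache (hypothetical, not real firmware)."""

    def __init__(self, honest_flush=True):
        self.honest_flush = honest_flush
        self.cache = {}    # writes acknowledged but not yet durable
        self.platter = {}  # stable storage

    def write(self, block, data):
        # Every write lands in the volatile cache first.
        self.cache[block] = data

    def flush(self):
        # An honest drive empties its cache before acknowledging.
        # A lying drive acknowledges immediately and does nothing.
        if self.honest_flush:
            self.platter.update(self.cache)
            self.cache.clear()
        return True  # "got it, boss"

    def destage(self, block):
        # The drive persists one cached block on its own schedule.
        if block in self.cache:
            self.platter[block] = self.cache.pop(block)

    def power_loss(self):
        # Anything still in the cache is gone.
        self.cache.clear()


def commit(disk, data_block, payload, root_block):
    """Write data, barrier, then write the root pointer that references it."""
    disk.write(data_block, payload)
    disk.flush()                        # barrier: data should be durable now
    disk.write(root_block, data_block)  # root now points at the new data
    disk.flush()


def crash_after_reordering(honest_flush):
    """Return (root pointer, data it points at) as found after a power cut."""
    disk = Disk(honest_flush)
    commit(disk, data_block=7, payload="new data", root_block=0)
    disk.destage(0)    # the drive happens to persist the root pointer first
    disk.power_loss()
    root = disk.platter.get(0)
    return root, disk.platter.get(root)
```

With an honest drive the root pointer and the data both survive the power cut. With a lying drive the root pointer can reach the platter while the block it references never does, which is exactly the situation checksums can detect but not repair on their own.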
I personally have pulled the power plug on 24 to 48 disk arrays, literally hundreds of times while under heavy load and have never once lost data.
I hope you either don't use consumer drives or really do your research.
The same problem has bitten people running ZFS in a virtual machine, such as on VMware ESX: they couldn't trust the drive to behave as it was told, and corruption followed. Which kind of sucks when you don't have any means of repairing the damage.
Also, the replies I've gotten seem to imply that I have something against ZFS. I love ZFS, but I see more potential in btrfs. That said, btrfs is many years away from even being worth considering as a competitor to ZFS, so that is kind of moot. All I want is a decent copy-on-write Linux file system with cheap snapshots (and preferably good encryption support).
Recent versions of ZFS have solved the problem of drives ignoring the synchronize cache command.
Now ZFS always keeps the latest 3 transaction groups around, regardless of whether any data in those transaction groups has already been freed or updated.
So if the last transaction group gets corrupted during an abrupt power down, it can always go back to the latest consistent one.
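Roughly, the idea looks like the sketch below. This is a toy model in Python, not ZFS's on-disk format or its actual recovery code: `TxgRecord`, `TxgHistory` and the use of SHA-256 are all assumptions made up for the example. It keeps the last few transaction groups and, after a crash, picks the newest one whose checksum still verifies.

```python
import hashlib
from collections import deque
from dataclasses import dataclass


def digest(payload: bytes) -> str:
    """Checksum used by this toy model (SHA-256 is an arbitrary choice)."""
    return hashlib.sha256(payload).hexdigest()


@dataclass
class TxgRecord:
    txg: int          # transaction group number
    payload: bytes    # stand-in for the state committed in that group
    checksum: str     # checksum recorded at commit time

    def is_valid(self) -> bool:
        # A torn or lost write shows up as a checksum mismatch.
        return digest(self.payload) == self.checksum


class TxgHistory:
    """Keeps only the most recent `keep` transaction groups."""

    def __init__(self, keep: int = 3):
        self.records = deque(maxlen=keep)

    def commit(self, txg: int, payload: bytes) -> None:
        self.records.append(TxgRecord(txg, payload, digest(payload)))

    def latest_consistent(self):
        # Walk from newest to oldest and return the first record that verifies.
        for record in sorted(self.records, key=lambda r: r.txg, reverse=True):
            if record.is_valid():
                return record
        return None  # nothing retained is usable
```

If you commit groups 1 through 5 with `keep=3` and then corrupt the payload of the newest record, `latest_consistent()` falls back to group 4. It only works because the older groups were not overwritten in the meantime, which is the point of keeping them around.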
By spending effort on, and providing tools for, data recovery and file system repair (since btrfs is neither stable nor mature, these leave much to be desired, but the effort is clear and it is quite a contrast to ZFS).
The ZFS stance on this is that nothing can go wrong. If something does go wrong, which by the way is impossible, you'd better have your backups ready because you are on your own.
A good fact to keep in mind is that ZFS is meant to run on adequate hardware: ECC RAM (ZFS is more prone to corruption from random memory errors than most other filesystems; it assumes your RAM is reliable), drives that don't lie about writes, adequate CPU headroom. It was designed to go BIG, and when you go big, these things are a given. If you skimp on any of these, it doesn't look as good, but small-scale wasn't its target market.
(And I'm not saying nobody should use ZFS on a small scale... but it was designed with some specific assumptions that are perfectly fair given its target market.)
If you are running a system at a scale where ZFS makes sense and not making backups of critical data, your operations process is fundamentally broken anyway. ZFS doesn't change the need for backups one bit (and it brings to the table some novel ways of doing offsite snapshots and whatnot, to boot).