How to fix my ZFS pool mistakes

abies_exarchia@lemm.ee · 2 years ago

How to fix my ZFS pool mistakes

mlaga97 · 2 years ago

I think it’s worth pointing out that this article is 11 years old, so that 1TB rule-of-thumb probably probably needs to be adjusted for modern disks.

If you have 2 full backups (18TB drives being more than sufficient) of the array, especially if one of those is offsite, then I’d say you’re really not at a high enough risk of losing data during a rebuild to justify proactively rebuilding the array until you have at least 2 or more disks to add.

Hopfgeist@feddit.de · 2 years ago

Let’s do the math:

The error-reate of modern hard disks is usually on the order of one undetectable error per 1E15 bits read, see for example the data sheet for the Seagate Exos 7E10. An 8 TB disk contains 6.4E13 (usable) bits, so when reading the whole disk you have roughly a 1 in 16 chance of an unrecoverable read error. Which is ok with zfs if all disks are working. The error-correction will detect and correct it. But during a resilver it can be a big problem.

mlaga97 · 2 years ago

If the actual error rate were anywhere near that high, modern enterprise hard drives wouldn’t be usable as a storage medium at all.

A 65% filled array of 10x20TB drives would average at least 1 bit failure on every single scrub (which is full read of all data present in the array), but that doesn’t actually happen with any real degree of regularity.

Hopfgeist@feddit.de · 2 years ago

Then why do you think manufacturers still list these failure rates (to be sure, it is marked as a limit, not an actual rate)? I’m not being sarcastic or facetious, but genuinely curious. Do you know for certain that it doesn’t happen regularly? During a scrub, these are the kinds of errors that are quietly corrected (althouhg the scrub log would list them), as they are during normal operation (also logged).

My theory is that they are being cautious and/or perhaps don’t have any high-confidence data that is more recent.

mlaga97 · 2 years ago

I’ve read many many discussions about why manufacturers would list such a pessimistic number on their datasheets over the years and haven’t really come any closer to understanding why it would be listed that way, when you can trivially prove how pessimistic it is by repeatedly running badblocks on a dozen of large (20TB+) enterprise drives that will nearly all dutifully accept hundreds of TBs written to and read from with no issues when the URE rate suggests that would result in a dozen UREs on average.

I conjecture, without any specific evidence, that it might be an accurate value with respect to some inherent physical property of the platters themselves that manufactures can and do measure that hasn’t improved considerably, but has long been abstracted away by increaed redundancy and error correction at the sector level that result in much more reliable effective performance, but the raw quantity is still used for some internal historical/comparative reason rather than being replaced by the effective value that matters more directly to users.