Update: see also Backblaze Hard Drive Stats for 2017. I am glad to see that Toshiba hard drives (the brand I'm buying in 14TB size) have a 0.00% failure rate for 4TB and 5TB drives. However, the figures listed are only failures for Q4 2017. The Backblaze Hard Drive Stats for Q3 2018: Less is More still show 0.0% failure for Toshiba. See also my post on a drive failure in 2016 in Even “Enterprise Grade” Drives Fail.
Reader Rich S writes about hard drive failures:
Reallocated bad sectors are the death knell for any drive. As soon as there is one, the drive should be replaced.
My spreadsheet of failed drives shows ~30 drives I tracked that failed after a reallocated bad sector. Most of these were consumer drives used with SoftRAID mirrors. This attribute in the SMART stats should be carefully monitored and immediately acted upon.
Drive life is very variable. In tracking about 60 drives, I’ve seen everything from infant mortality to drives lasting over 40,000 hours. With a modern NAS rated drive in a two drive fail safe like a DROBO or Synology, I feel comfortable running a drive for about 30,000 hours before relegating it to offline backup status. And I try and keep RAIDs with a good mix of usage hours.
When I need a “new” drive for backup, that drive goes in the RAID to replace a high hour drive, which moves to offline backup use. RAIDs should always use NAS rated drives, not consumer drives, for a variety of reasons.
...let me get a little more granular. My reluctance for consumer drives comes after a very bad run where I was constantly replacing Western Digital green drives because they had settings that did not work with 24/7 operation. So many drives failed, and so frequently, that it changed my life habits because of being woken up by staff with messages that the servers were down.
After that, I stuck to NAS rated drives for multi drive enclosures. NAS was for a networked environment where multiple people needed shared access to image files for work and printing. I wouldn’t use a NAS for a single user setup. Synology was very smart in that it could sleep the drives after backup and during off hours. Before that my OSX servers ran 24/7 because they behaved better when running, and because of the nightly backup. Staff were accessing files for over 12 hours a day, then came nightly tape and later HD backups.
Most of those 30,000 hour drives were 6-7 years old [diglloyd: OMG]. Once I switched to the Synology, I think I was logging about 10,000 hours a year per drive in M-F operation, and sleeping the rest of the time. My personal system is much simpler, but still robust.
DIGLLOYD: I would not touch Western Digital hard drives (green series or otherwise) with a 10 foot pole, also because of the poor performance I saw when I tested WD green drives.
The conclusion is a good one (use NAS drives or enterprise drives), but the logic is faulty: WD drives were notoriously poor. Other brands might have been fine. Still, for the described usage, NAS drives or enterprise drives are definitely the way to go as there are other concerns involving spindles and vibration in multi-drive setups.
Agreed that when a bad sector appears it's time to throw the drive away (dumpster): it will almost certainly fail totally within a month. Fortunately SoftRAID detects this and warns, which is how I got the heads up I mentioned in Even “Enterprise Grade” Drives Fail.
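For readers without SoftRAID, here is a minimal sketch of the same "act on the first reallocated sector" rule. It assumes attribute text in the general shape printed by `smartctl -A` from smartmontools, where SMART attribute 5 is the reallocated sector count. The sample values and the function names are made up for illustration, and this is not how SoftRAID itself does its checking.

```python
# Sample text in the general shape of `smartctl -A` attribute output.
# The numbers are invented for illustration, not taken from a real drive.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       8
  9 Power_On_Hours          0x0032   065   065   000    Old_age   Always       -       30512
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""

def parse_raw_values(text):
    """Map SMART attribute ID -> (name, raw value) for rows that look like attributes."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # An attribute row starts with a numeric ID and has at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attrs[int(fields[0])] = (fields[1], int(fields[9]))
    return attrs

def should_replace(attrs):
    """Apply the rule from the text: any reallocated sector means replace the drive."""
    _name, raw = attrs.get(5, ("Reallocated_Sector_Ct", 0))
    return raw > 0

if __name__ == "__main__":
    attrs = parse_raw_values(SAMPLE)
    print("replace drive:", should_replace(attrs))
```

Real drives vary in which attributes they report and how the raw value is formatted, so treat the parsing as a starting point only.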
Rich S's figure of 30,000 hours is 3.4 years of constant spinning. That is a waste of power and money unless there is a solid reason to do so. I cannot see a legitimate reason to have a drive spin half the day when I won’t be using it; one spin-up a day is no big deal. But that figure reflects workgroup usage and (too many) years of usage. For personal use, I can't see ever getting beyond ~2 years of spin time.
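The arithmetic behind those figures is simple enough to sketch. The function names here are mine, chosen for illustration; the inputs are the 30,000 hour and 10,000 hours-per-year numbers from Rich S.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours in a non-leap year

def spin_years(hours):
    """Years of continuous 24/7 spinning that a power-on-hours count represents."""
    return hours / HOURS_PER_YEAR

def calendar_years(hours, hours_per_year):
    """Calendar years needed to accumulate `hours` at a given yearly usage."""
    return hours / hours_per_year

if __name__ == "__main__":
    print(round(spin_years(30_000), 1))              # 24/7 operation
    print(round(calendar_years(30_000, 10_000), 1))  # M-F use with sleep
```

At roughly 10,000 hours a year, the 30,000 hour retirement point arrives after about three calendar years, about the same as the 3.4 years it takes spinning around the clock.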
I would never use a NAS of any kind for my work due to terrible performance: gigabit ethernet runs at half the speed of just one hard drive and is far slower than that for small files. Every NAS I’ve tried is a dog for performance. It makes sense in a shared setting as with Rich S, but it is not a good solution for a workstation. While 10 gigabit on new Macs will help, latency is still very high for small files.
Too many drives
Guessing a little as to how they are used, I’d say that Rich S is using too many hard drives and should migrate to larger ones. Sixty drives all but guarantees a failure or three every year. It also adds complexity, which raises the chance of mistakes. Backing up to geriatric drives is a dubious approach in terms of the odds, but can be OK if the drives are carefully monitored and there is a lot of redundancy. But the very idea of having to spend mental energy on monitoring 60 drives does not appeal to me. I’m speaking generally of course, as I do not know the details of Rich S’ setup.
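To put rough numbers on "a failure or three every year", here is a back-of-envelope sketch. It assumes drives fail independently at a fixed annualized failure rate (AFR). The 2% and 5% rates below are hypothetical placeholders, not measured figures for any brand or for Rich S' drives.

```python
def p_at_least_one_failure(n_drives, afr):
    """Chance that at least one of n independent drives fails within a year."""
    return 1.0 - (1.0 - afr) ** n_drives

def expected_failures(n_drives, afr):
    """Average number of failures per year across the whole pool."""
    return n_drives * afr

if __name__ == "__main__":
    for afr in (0.02, 0.05):  # hypothetical rates, for illustration only
        p = p_at_least_one_failure(60, afr)
        e = expected_failures(60, afr)
        print(f"AFR {afr:.0%}: P(any failure) = {p:.0%}, expected = {e:.1f} per year")
```

Under those assumptions, 60 drives at a 2% rate see at least one failure in roughly 70% of years, and at 5% (more plausible for aging drives) the expectation is about three failures a year. Fewer, larger drives shrink both numbers in proportion to the drive count.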
See my discussion of validating data integrity in Reader Comment: Hard Drive Error Rates. Validating data can kick out latent errors, meaning that validation might fail because the drive cannot even read the data.
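As a generic illustration of what validation means (not a description of any particular tool), the sketch below records a SHA-256 hash per file and later re-reads every file to compare. A read error is reported separately from a mismatch, since a drive that cannot read the data at all is exactly the latent error described above. The function names are mine.

```python
import hashlib
from pathlib import Path

def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in 1 MiB chunks so large files do not fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root):
    """Map each file's path (relative to root) to its hash."""
    root = Path(root)
    return {
        str(p.relative_to(root)): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def verify(root, manifest):
    """Return {relative path: problem} for files that are unreadable or changed."""
    problems = {}
    for rel, expected in manifest.items():
        try:
            actual = file_digest(Path(root) / rel)
        except OSError as exc:  # missing file or a read error from the drive
            problems[rel] = f"unreadable: {exc}"
            continue
        if actual != expected:
            problems[rel] = "hash mismatch"
    return problems
```

Build the manifest when the data is known to be good, store it somewhere other than the drive being checked, and run `verify` against each backup copy periodically.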
Consumer vs enterprise drives
Hard drives are lumped into “consumer” drives vs enterprise drives and NAS drives (NAS drives are supposed to spin reliably for many years). However, some consumer drives are “waterfall” drives: enterprise drives that didn’t meet some spec. Bad sectors are not the issue, since drives are not surface tested anyway (enterprise or otherwise).
While I wouldn't consider most consumer drives for my own use (I prize reliability), a blanket statement is erroneous in my view, because it ignores waterfall drives. I would say that waterfall drives and NAS drives are A-OK, but also that some brands are more reliable than others. Last I checked, Toshiba had a very low failure rate.
While I use mostly enterprise (or waterfall) hard drives, I have not seen even a fraction of the failures that Rich S notes. But I also don’t use drives more than about 5 years old, mainly because the capacity becomes useless. I am also not keen on managing 5 dozen flaky drives for issues; better to use high grade drives of high capacity IMO. Finally, I do not spin drives 24/7, needlessly wasting energy. Nor do I leave backup drives spinning except when backing up (excepting one LaCie 2Big). I sleep the machine before bed and spin it up in the morning; one spin-up per day is no big deal.
Pre-detecting drives with reallocated sectors
New hard drives are not surface tested, so the best thing to do with a new drive is to graph its performance across the capacity, that is, write the entire drive and read it back, showing the performance in a graph. That is what a fill-volume on a new drive using diglloydTools DiskTester does. See Checking drives before putting into “production”.
The graph makes remapped sectors visible. That is, remapped sectors show up as discontinuities in performance, such as the drive being strangely slow in the fast parts of the drive, or blazingly fast at the end. Such discontinuities also wreak havoc on RAID performance.
The graph below shows some spikes, but those are within normal range—most likely some other behavioral issue. Remapped sectors show up as a contiguous block of performance out of kilter with the pattern. See Testing Seagate 12TB Enterprise Capacity 3.5-inch (Helium) Hard Drives.
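The distinction between isolated spikes and a contiguous out-of-pattern block can be sketched in code. This is a rough illustration only, assuming you already have a list of throughput samples (MB/s) taken at even intervals across the drive's capacity. The function name, window size, and thresholds are my own guesses, and this is not how DiskTester analyzes or draws its graphs.

```python
from statistics import median

def suspicious_runs(samples, window=21, tolerance=0.25, min_run=3):
    """Return (start, end) index pairs, end exclusive, of contiguous off-pattern runs.

    A sample is off-pattern when it differs from the median of its neighbors
    by more than `tolerance` (as a fraction). Runs shorter than `min_run`
    are treated as ordinary spikes and ignored.
    """
    half = window // 2
    flagged = []
    for i, value in enumerate(samples):
        lo, hi = max(0, i - half), min(len(samples), i + half + 1)
        baseline = median(samples[lo:hi])
        flagged.append(abs(value - baseline) > tolerance * baseline)

    runs, start = [], None
    for i, bad in enumerate(flagged + [False]):  # sentinel closes a trailing run
        if bad and start is None:
            start = i
        elif not bad and start is not None:
            if i - start >= min_run:
                runs.append((start, i))
            start = None
    return runs

if __name__ == "__main__":
    # Hypothetical drive: speed declines smoothly from outer to inner tracks.
    samples = [250.0 - i for i in range(100)]
    samples[10] = 100.0                # a lone spike, within "normal" behavior
    for i in range(40, 46):
        samples[i] = 90.0              # a contiguous slow block
    print(suspicious_runs(samples))
```

The window needs to be comfortably wider than the anomaly you hope to catch, otherwise the slow block drags the median down with it and hides itself.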