2024-04-26 01:53:26
Delayed For Days: Hard Drive Horrors

Update: using an Apple Mac mini (so I don’t have to touch my production system), I’ve isolated the problem to Seagate 12TB enterprise drives in one particular OWC Thunderbay enclosure. They run fine for a while, but once the failure is triggered, it is permanent in that enclosure unless it sits for many hours. The cause is most likely a firmware issue on the Seagate drives, according to a highly credible source (these drives are over a year old, so I have no idea whether current models are affected). The Toshiba 14TB drives have never failed me, even in the problem enclosure. However, I’ve only been using them a short while. Still, I could not provoke a failure when I swapped them in for the Seagate drives.

...

I’ve gotten little done the past 3 days due to storage issues.

My primary RAID-5 and RAID-0 storage volumes kept going offline due to disk I/O errors. I thought it was a bad drive, since the error was always on the same drive. But after I replaced that drive, the errors simply moved to another drive (and then always to that same drive, just as before). The failures were sporadic at first, but got progressively worse until I could provoke one within a few seconds.

Ultimately my main store was hosed badly enough for macOS to force it to read-only. At first SoftRAID kept rebuilding the RAID-5 successfully, but the faults became so frequent that the rebuild could no longer complete (the whole bus was hosed). SMART status is/was OK for all drives and there are/were no remapped sectors.

At one point, the failures propagated to other devices, including six other (non-RAID) hard drives in the Thunderbay 6, and hosed a brand-new OWC Thunderblade so that it threw I/O errors when I tried to initialize it (I was able to fix it later on a 2016 MBP). Whatever hardware issue is going on is pretty darn scary, hosing the entire Thunderbolt bus. I strongly suspect that the whole problem is due to the firmware of the hard drives. It could also be the enclosure firmware, or a bad interaction between the two. More on that below.

The worst case (and a serious possibility) is Apple Core Rot, i.e., a bug in macOS. But the bug would have to exist in both 10.13 High Sierra and 10.14 Mojave.

Isolating for the cause

A summary of just how much I did to isolate the issue:

  • NOT one bad drive: a replacement drive fails too. And this time the failure is on one of the drives that was already there, not the replacement drive, which tells me the problem is not a single defective unit.
  • NOT computer specific (reproduced on 2017 iMac 5K and 2016 MacBook Pro)
  • NOT cable specific (two 0.5m cables and one 2m cable tried).
  • NOT unit specific (two different OWC Thunderbay 4 units and one OWC Thunderbay 6).
  • NOT bay specific (swapped drive into another bay, error followed the drive).
  • NOT macOS version specific; fails on both macOS 10.13 High Sierra and 10.14 Mojave (two different machines).
  • NOT software specific: can provoke with a Finder copy or an "ic verify" (by sheer good luck, Carbon Copy Cloner did not provoke the issue, so I was able to make up-to-date backups).
  • NOT a daisy chaining issue (direct connect, nothing else on that port).
  • NOT an interaction with other peripherals (sole peripheral on the 2016 MacBook Pro)
  • NOT a bad file system (Disk Utility gives clean bill of health, plus the errors are right off the drive).
  • Could not reproduce the issue with a single drive, a 2-drive RAID-0 or a 3-drive RAID-0, only a 4-drive RAID-0 or RAID-5.
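The list above mentions provoking the fault with a Finder copy or a verify pass. As a rough illustration of that kind of copy-and-verify stress test, here is a minimal Python sketch. It is not the tool used in this post; the function names and file naming are hypothetical, and it assumes the destination directory lives on the volume under test. Note that a re-read immediately after a write may be served from the OS cache rather than the disk, so a real test would want files much larger than RAM.

```python
import hashlib
import os
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src, dst_dir, passes=3):
    """Copy src into dst_dir several times and re-read each copy.

    Returns a list of (pass_number, status) tuples, where status is
    "ok", "mismatch", or "io-error: <details>".
    dst_dir is assumed (hypothetically) to be on the volume being tested.
    """
    expected = sha256_of(src)
    results = []
    for n in range(1, passes + 1):
        dst = os.path.join(dst_dir, f"verify_pass_{n}.bin")
        try:
            shutil.copyfile(src, dst)
            status = "ok" if sha256_of(dst) == expected else "mismatch"
        except OSError as exc:
            # Disk I/O errors surface here as OSError (for example EIO).
            status = f"io-error: {exc}"
        results.append((n, status))
    return results
```

Any pass that reports "mismatch" or "io-error" would be worth cross-checking against the system log for the drive that raised it.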

Cause TBD

I still don’t know for certain what the cause is. It might actually be the firmware on the hard drives, but I won’t name the suspect drives until I have some certainty; I would not want to blame them unfairly. To be clear, they are NOT the Toshiba 14TB drives, which so far have performed flawlessly (and very quietly). Love 'em.

I expect to have a fresh set of Toshiba 14TB MG07ACA hard drives tomorrow, with which I can perform one more test to verify or disprove that theory: try to reproduce the problem with the Toshiba 14TB drives, then with the problem drives. If I cannot reproduce the problem with the Toshiba drives, I will swap the problem drives back in and confirm that they still fail. I’ll do this several times. If the Toshiba drives do not fail and the other ones do, then I will finally have an answer and a solution: get rid of the problem brand. If both fail, then I’ll have to blame the enclosure firmware.
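The decision logic of that swap test can be written down as a small sketch. The function name and the "inconclusive" outcome are assumptions added for illustration; the text above only spells out the two cases in which the suspect drives fail.

```python
def diagnose(toshiba_failed, suspect_failed):
    """Map swap-test outcomes to the conclusions described in the text.

    Each argument is a list of booleans, one per trial
    (True means an I/O failure was observed in that trial).
    """
    toshiba_bad = any(toshiba_failed)
    suspect_bad = any(suspect_failed)
    if suspect_bad and not toshiba_bad:
        return "suspect drives (likely firmware)"
    if suspect_bad and toshiba_bad:
        return "enclosure firmware"
    # Not covered by the text: the suspect drives never failed in this run.
    return "inconclusive"


# Example: three trials each, only the suspect drives fail.
print(diagnose([False, False, False], [True, True, False]))
```

Repeating the trials matters because the fault is intermittent: a single clean run with either set of drives proves little.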


Copyright © 2022 diglloyd Inc, all rights reserved.