I've had a lot of five-year-old 3TB and 4TB SAS drives fail this year on an H810 in RAID6. Maybe that's to be expected.
Some questions:
1.) Who fails the drive? The H810 or OpenManage?
2.) OpenManage doesn't show SMART info, so I decided to look at it myself. 'smartmontools' will read SMART data from SAS drives (I put the drives on a simple HBA to do this). Here's the output from an older Seagate which is clearly bad:
=== START OF READ SMART DATA SECTION ===
SMART Health Status: DATA CHANNEL IMPENDING FAILURE GENERAL HARD DRIVE FAILURE [asc=5d, ascq=30]
Current Drive Temperature: 26 C
Drive Trip Temperature: 50 C
Manufactured in week 24 of year 2013
Specified cycle count over device lifetime: 10000
Accumulated start-stop cycles: 72
Specified load-unload count over device lifetime: 300000
Accumulated load-unload cycles: 10924
Elements in grown defect list: 85
Vendor (Seagate) cache information
Blocks sent to initiator = 2497332109
Blocks received from initiator = 224501625
Blocks read from cache and sent to initiator = 869607790
Number of read and write commands whose size <= segment size = 5289426
Number of read and write commands whose size > segment size = 0
Vendor (Seagate/Hitachi) factory information
number of hours powered up = 27741.20
number of minutes until next internal SMART test = 58
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   1488533785       28         0  1488533813         28      28404.120         0
write:           0        0         1           1          1       2604.920         0
verify:   27761909      515         0    27762424        865     408573.176       349
Non-medium error count: 1
SMART Self-test log
Num  Test              Status     segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                  number   (hours)
# 1  Reserved(7)       Completed      32       2                 - [-   -    -]
# 2  Background short  Completed      48       0                 - [-   -    -]
Long (extended) Self Test duration: 32700 seconds [545.0 minutes]
But here's the output from a Toshiba replacement that Dell sent me under warranty, which was also marked failed:
=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
Current Drive Temperature: 24 C
Drive Trip Temperature: 65 C
Manufactured in week 08 of year 2017
Specified cycle count over device lifetime: 50000
Accumulated start-stop cycles: 27
Specified load-unload count over device lifetime: 600000
Accumulated load-unload cycles: 172
Elements in grown defect list: 0
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:            0        0         0           0          0      30182.858         0
write:           0        0         0           0          0       3030.824         0
verify:          0        0         0           0          0       3000.662         0
Non-medium error count: 5
No self-tests have been logged
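To compare the two error counter logs without eyeballing the columns, here is a rough Python sketch that pulls the read/write/verify rows into named fields. The function and field names are my own (hypothetical, not part of smartmontools), and it assumes each row has the seven numeric columns shown in the output above.

```python
import re

# Field names are made up for this sketch; they follow the column order of
# the "Error counter log" header printed for the two drives above.
FIELDS = (
    "ecc_fast", "ecc_delayed", "rereads_rewrites",
    "total_corrected", "correction_invocations",
    "gigabytes_processed", "total_uncorrected",
)

ROW = re.compile(r"^\s*(read|write|verify):\s+(.*)$")


def parse_error_counters(text):
    """Return {operation: {field: number}} for the read/write/verify rows."""
    counters = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if not match:
            continue
        values = match.group(2).split()
        if len(values) != len(FIELDS):
            continue  # skip rows without the expected seven columns
        row = {}
        for name, value in zip(FIELDS, values):
            # Only the gigabytes column is fractional; the rest are counts.
            row[name] = float(value) if name == "gigabytes_processed" else int(value)
        counters[match.group(1)] = row
    return counters


# Rows copied from the two drives' output above.
seagate_log = """\
read: 1488533785 28 0 1488533813 28 28404.120 0
write: 0 0 1 1 1 2604.920 0
verify: 27761909 515 0 27762424 865 408573.176 349
"""

toshiba_log = """\
read: 0 0 0 0 0 30182.858 0
write: 0 0 0 0 0 3030.824 0
verify: 0 0 0 0 0 3000.662 0
"""

for label, log in (("Seagate", seagate_log), ("Toshiba", toshiba_log)):
    for operation, row in parse_error_counters(log).items():
        print(label, operation, "uncorrected:", row["total_uncorrected"])
```

On these two logs, the only non-zero uncorrected count is the Seagate's verify row; the Toshiba's rows are all zero.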
So my second question is: why was the Toshiba, a new drive that SMART reports as healthy, failed?
Thanks, Art