Channel: DELL-Daniel My's Activities

H810/OpenManage failing drives

I've had a lot of five-year-old 3TB and 4TB SAS drives fail this year on an H810 in RAID6.  Maybe that's to be expected.

Some questions:

1.) Who fails the drive?  The H810 or OpenManage?

2.) OpenManage doesn't show SMART info, so I decided to look at it myself.  'smartmontools' will read SAS drive SMART data (I put the drives on a simple HBA to do this).  Here's the output from an older Seagate, which is clearly bad:

=== START OF READ SMART DATA SECTION ===
SMART Health Status: DATA CHANNEL IMPENDING FAILURE GENERAL HARD DRIVE FAILURE [asc=5d, ascq=30]

Current Drive Temperature:     26 C
Drive Trip Temperature:        50 C

Manufactured in week 24 of year 2013
Specified cycle count over device lifetime:  10000
Accumulated start-stop cycles:  72
Specified load-unload count over device lifetime:  300000
Accumulated load-unload cycles:  10924
Elements in grown defect list: 85

Vendor (Seagate) cache information
  Blocks sent to initiator = 2497332109
  Blocks received from initiator = 224501625
  Blocks read from cache and sent to initiator = 869607790
  Number of read and write commands whose size <= segment size = 5289426
  Number of read and write commands whose size > segment size = 0

Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 27741.20
  number of minutes until next internal SMART test = 58

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   1488533785       28         0  1488533813         28      28404.120           0
write:         0        0         1         1          1       2604.920           0
verify: 27761909      515         0  27762424        865     408573.176         349

Non-medium error count:        1

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Reserved(7)       Completed                  32       2                 - [-   -    -]
# 2  Background short  Completed                  48       0                 - [-   -    -]
Long (extended) Self Test duration: 32700 seconds [545.0 minutes]

But here's the output from a Toshiba replacement that Dell sent me under warranty, which was also failed:

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     24 C
Drive Trip Temperature:        65 C

Manufactured in week 08 of year 2017
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  27
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  172
Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0      30182.858           0
write:         0        0         0         0          0       3030.824           0
verify:        0        0         0         0          0       3000.662           0

Non-medium error count:        5

No self-tests have been logged

So my second question is: why was the Toshiba, a new drive that reports as healthy, failed?

Thanks, Art

