What are the cost trade-offs of reducing server hard drive failures? What can be done to mitigate the consequences of drive failures?

Can we reduce the chance of hard drives failing, and at what cost/benefit?

...

Invest in learning how to better deploy and use monitoring tools. Some tools may cost money. Not all of these may be relevant, but here are some buzzwords Oliver has come across:

  • Nagios
  • NetOps
  • S.M.A.R.T.
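
As a starting point for the S.M.A.R.T. item, here is a minimal sketch of a health check built on smartmontools' `smartctl -H`. It assumes smartmontools is installed and that the script runs with enough privileges to query the drive. The function names and the device path `/dev/sda` are placeholders for illustration, not anything already deployed here.

```python
import subprocess
from typing import Optional


def parse_smart_health(report: str) -> Optional[bool]:
    """Return True if the drive reports healthy, False if not, None if no verdict was found."""
    for line in report.splitlines():
        # ATA drives print "SMART overall-health self-assessment test result: PASSED";
        # SCSI/SAS drives print "SMART Health Status: OK".
        if ("overall-health self-assessment test result" in line
                or "SMART Health Status" in line):
            verdict = line.split(":", 1)[1].strip()
            return verdict in ("PASSED", "OK")
    return None


def check_drive(device: str) -> Optional[bool]:
    """Run `smartctl -H` on one device (hypothetical helper; needs smartmontools)."""
    # smartctl's exit status is a bit mask, so a non-zero exit is not
    # necessarily an error; parse the text instead of using check=True.
    result = subprocess.run(
        ["smartctl", "-H", device], capture_output=True, text=True
    )
    return parse_smart_health(result.stdout)


if __name__ == "__main__":
    # "/dev/sda" is only an example device path.
    print(check_drive("/dev/sda"))
```

A check like this could be run on a schedule or wrapped as a Nagios plugin. Note that the overall health verdict is a coarse signal, so a passing result does not guarantee the drive is fine.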

Data from a company using over 40,000 hard drives

https://www.backblaze.com/hard-drive.html

High-Level Summary

With 40,000 hard drives, Backblaze knows a lot about hard drive reliability and shares its statistics:

  • 78% of drives survive more than 4 years.
  • The median hard drive survives 6 years.
  • Drives have 3 distinct failure modes that follow a bathtub curve:
    • Early “Infant Mortality” Failure
    • Constant (Random) Failure
    • Wear Out Failure
  • As long as the temperature is within spec, reliability is not affected by heat.
  • HGST drives are generally reliable; the reliability of Seagate and Western Digital hard drives varies by model (see the web page for the graphic).
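
To get a rough feel for what the survival figures above mean per year, the sketch below converts them into average annual failure rates. It uses only the two numbers quoted (78% surviving at 4 years, and the median, i.e. 50% surviving, at 6 years) and assumes the failure rate is constant within each interval. That is a simplification: it hides the early "infant mortality" bump. The function name is made up for this example.

```python
def annual_failure_rate(survival_start: float, survival_end: float, years: float) -> float:
    """Average annual failure rate over an interval, assuming a constant rate within it."""
    # Fraction of the drives alive at the start of the interval that survive each year.
    annual_survival = (survival_end / survival_start) ** (1.0 / years)
    return 1.0 - annual_survival


# Years 0-4: 100% of drives at the start, 78% left after 4 years.
early = annual_failure_rate(1.00, 0.78, 4)
# Years 4-6: 78% left at year 4, 50% (the median) left at year 6.
late = annual_failure_rate(0.78, 0.50, 2)

print(f"years 0-4: {early:.1%} of remaining drives fail per year")
print(f"years 4-6: {late:.1%} of remaining drives fail per year")
```

Under these assumptions the rate comes out at roughly 6% per year over the first four years and roughly 20% per year between years 4 and 6, which is consistent with the wear-out end of the bathtub curve. Multiplying a rate by the number of drives in service gives a ballpark for how many replacements to budget for each year.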