Excerpt |
---|
What are the cost trade-offs of reducing server hard drive failures? What can be done to mitigate the consequences of drive failures? |
See also
Can we reduce the chance of hard drives failing, and at what cost/benefit?
...
- 2/100 drives fail per year (it's actually much higher for the drives ChemIT uses!!)
You can expect something like 1/100 of your drives to really fail this year. And you can expect another 1/100 of your drives to fail this year, but not actually be failed. You'll still pay all the operational overhead of not actually having a failed drive: rebuilds, disk replacements, management interventions, scheduled downtime/maintenance time, and the OEM replacement price for that drive (what, $600 or so?).
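To put rough numbers on the estimate above, here is a minimal sketch of the arithmetic. The fleet size of 50 drives is a hypothetical figure chosen for illustration. The two 1/100 rates and the $600 price come from the quote, and the result covers only the replacement price, not rebuilds, labor, or downtime.

```python
def expected_annual_drive_cost(drive_count,
                               real_failure_rate=0.01,
                               false_failure_rate=0.01,
                               replacement_cost=600.0):
    """Return (expected replacement events, expected replacement spend) per year.

    Rates and price default to the figures in the quote above; the real
    values for any particular fleet may be quite different.
    """
    # Real failures and "failed but not actually failed" drives both
    # trigger a replacement, so their rates add up.
    events = drive_count * (real_failure_rate + false_failure_rate)
    return events, events * replacement_cost


if __name__ == "__main__":
    # 50 drives is an assumed fleet size, not a number from this page.
    events, cost = expected_annual_drive_cost(50)
    print(f"Expected replacements per year: {events:.1f}")
    print(f"Expected replacement spend per year: ${cost:,.0f}")
```

If the actual failure rate is higher, as noted above for the drives ChemIT uses, pass a larger `real_failure_rate` to see how the spend scales.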
...
Invest in learning how to better deploy and use monitoring tools. Some tools may cost money. Not all of these may be relevant, but here are some buzzwords Oliver has come across:
- Nagios
- NetOps
- S.M.A.R.T.
Data from a company using over 40,000 hard drives
Excerpt:
High-Level Summary
With 40,000 hard drives, Backblaze knows a lot about the reliability of hard drives and shares the statistics:
- 78% of drives survive more than 4 years.
- The median hard drive survives 6 years.
- Drives have 3 distinct failure modes that follow a bathtub curve:
  - Early “Infant Mortality” Failure
  - Constant (Random) Failure
  - Wear Out Failure
- As long as the temperature is within spec, reliability is not affected by heat.
- HGST drives are generally reliable; the reliability of Seagate and Western Digital hard drives varies by model. (See the web page cited above for the graphic.)
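The survival figures in this summary can be turned into a rough per-year failure rate for comparison with the 2/100 rule of thumb earlier on this page. The sketch below assumes the failure rate is the same every year, which the bathtub curve says is not really true, so treat the results as ballpark averages only. The function name is made up for this example.

```python
def constant_annual_failure_rate(survival_fraction, years):
    """Annual failure rate that would leave `survival_fraction` of drives
    alive after `years`, assuming the rate never changes.

    Solves survival_fraction = (1 - rate) ** years for rate.
    """
    return 1.0 - survival_fraction ** (1.0 / years)


if __name__ == "__main__":
    # 78% of drives survive more than 4 years.
    rate_from_4yr = constant_annual_failure_rate(0.78, 4)
    # The median drive survives 6 years, i.e. 50% are still alive at year 6.
    rate_from_median = constant_annual_failure_rate(0.50, 6)
    print(f"Implied rate from 4-year survival: {rate_from_4yr:.1%}")
    print(f"Implied rate from 6-year median:   {rate_from_median:.1%}")
```

This works out to roughly 6% per year from the 4-year figure and roughly 11% per year from the median, both well above 2/100. The gap between the two is consistent with wear-out failures pushing the rate up in later years.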