Excerpt |
---|
What are the cost trade-offs of reducing server hard drive failures? What can be done to mitigate the consequences of drive failures? |
Can we reduce the chance of hard drives failing, and at what cost/benefit?
...
Invest in learning how to better deploy and use monitoring tools. Some tools may cost money. Not all of these may be relevant, but here are some buzzwords Oliver has come across:
- Nagios
- NetOps
- S.M.A.R.T.
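As a rough idea of what S.M.A.R.T. monitoring could look like in practice, here is a minimal Python sketch that reads the attribute table printed by `smartctl -A` (from the smartmontools package) and flags a few attributes. The watch list, the "any non-zero raw value" rule, and the function names are assumptions made for illustration, not a recommendation from any of the tools above, and the sample rows are made up.

```python
import subprocess

# Attributes often treated as early-warning signs. This watch list and the
# "any non-zero raw value" rule are assumptions for illustration only.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def parse_attributes(text):
    """Map attribute name -> raw value from `smartctl -A` style table text."""
    attrs = {}
    for line in text.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(parts) >= 10 and parts[0].isdigit():
            try:
                attrs[parts[1]] = int(parts[9])
            except ValueError:
                continue  # skip raw values that are not plain integers
    return attrs


def worrying_attributes(text, watched=WATCHED):
    """Return the watched attributes whose raw value is above zero."""
    attrs = parse_attributes(text)
    return {name: attrs[name] for name in watched if attrs.get(name, 0) > 0}


def read_drive(device):
    """Run smartctl on a device (needs smartmontools and usually root)."""
    # smartctl's exit status is a bit mask, so it is not checked here.
    result = subprocess.run(["smartctl", "-A", device], capture_output=True, text=True)
    return result.stdout


# Made-up sample rows in the usual attribute-table layout, not real drive data.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4321
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
"""

print(worrying_attributes(SAMPLE))
```

On a real server, something like `worrying_attributes(read_drive("/dev/sda"))` could be run on a schedule and its result passed to whatever alerting is already in place. The device path is only an example.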
Data from a company using over 40,000 hard drives
https://www.backblaze.com/hard-drive.html
High-Level Summary
With 40,000 hard drives, Backblaze knows a lot about the reliability of hard drives and shares the statistics:
- 78% of drives survive more than 4 years.
- The median hard drive survives 6 years.
- Drives have 3 distinct failure modes that follow a bathtub curve:
  - Early “Infant Mortality” Failure
  - Constant (Random) Failure
  - Wear Out Failure
- As long as the temperature is within spec, reliability is not affected by heat.
- HGST drives are generally reliable; the reliability of Seagate and Western Digital hard drives varies by model. (See the web page for the graphic.)
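To relate the survival figures above to a cost estimate, the following Python sketch converts them into an average yearly failure rate and an expected number of replacements per year. It assumes a constant failure rate over the period, which is a simplification given the bathtub curve, and the fleet size of 100 drives is a hypothetical number, not something from the Backblaze data.

```python
def annual_failure_rate(survival_fraction, years):
    """Constant yearly failure probability that would give this survival."""
    return 1.0 - survival_fraction ** (1.0 / years)


def expected_failures(drive_count, yearly_rate):
    """Average number of drives expected to fail in one year."""
    return drive_count * yearly_rate


# Figures taken from the summary above.
rate_4y = annual_failure_rate(0.78, 4)  # 78% survive more than 4 years
rate_6y = annual_failure_rate(0.50, 6)  # median drive survives 6 years

FLEET_SIZE = 100  # hypothetical fleet, for illustration only

print(f"Average yearly rate over 4 years: {rate_4y:.1%}")
print(f"Average yearly rate over 6 years: {rate_6y:.1%}")
print(f"Expected failures per year in {FLEET_SIZE} drives: "
      f"{expected_failures(FLEET_SIZE, rate_4y):.1f}")
```

Under these assumptions the averages come out at roughly 6% per year over the first four years and roughly 11% per year over six years. Multiplying the expected failures by the cost of a replacement drive plus the labour to swap it gives a first estimate of what drive failures cost per year.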