Can I Get An MTBF Please?
In one of my previous posts I described how unusual it is for companies to maintain causal information (such as aircraft landings or installed base) that could be used to perform causal forecasting. After the hard drive in our iMac failed and I searched for the most reliable model to replace it with, I learned that MTBF (mean time between failures) figures are not available even for items that are among the most commonly purchased by companies and individuals.
Two pieces of data are necessary to perform causal forecasting, which is very important for service parts planning:
- MTBF of the part being planned
- Installed base or other causal value
With just the MTBF, consumers and organizations can make informed purchase decisions. With both values, however, companies can use service parts planning software to drive their forecasting and stocking. (To read more about this, see these posts.)
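To make the relationship between the two inputs concrete, here is a minimal sketch of a causal forecast. It assumes a constant failure rate, so expected failures are simply total operating hours divided by MTBF. The function name and all of the numbers are hypothetical and chosen only for illustration; real service parts planning software is considerably more involved.

```python
def expected_failures(installed_base: int, hours_per_unit: float, mtbf_hours: float) -> float:
    """Expected failures in a period, assuming a constant failure rate.

    installed_base : number of units in service (the causal value)
    hours_per_unit : operating hours each unit accumulates in the period
    mtbf_hours     : mean time between failures, in hours
    """
    if mtbf_hours <= 0:
        raise ValueError("mtbf_hours must be positive")
    # Total operating hours across the installed base, divided by MTBF.
    return installed_base * hours_per_unit / mtbf_hours


# Hypothetical example: 1,000 drives running around the clock for a
# 30-day month, with an assumed MTBF of 500,000 hours.
monthly_demand = expected_failures(1000, 24 * 30, 500_000)
print(f"Expected replacements this month: {monthly_demand:.2f}")
```

The result is the expected demand for replacement parts in the period, which is the number a stocking decision would start from.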
At this point, it is well known that the official MTBF statistics published by vendors are unreliable, if not pure fantasy. Because there is no objective third party that compares drives across vendors and publishes the results, there is no reliable source for failure information (if anyone knows of one, please comment on this post), although companies that purchase and deploy large numbers of disks may keep their own private statistics. When asked about this topic, vendor spokespeople move into a degree of doublespeak that would make Henry Kissinger green with envy.
Where Are The Failure Stats?
According to a white paper by Wiebetech, a drive enclosure maker, manufacturers are loath to give out real-world statistical information.
All of the drive vendors do what they can to obscure any differences between their drives in terms of quality or MTBF. This allows them to compete on the basis of retail box design and marketing, as well as personal business-to-business relationships, which appears to be their preference. A quote from a recent article on this topic in PC World reinforces how much OEMs like to dance around the issue of reliability and failure. Several drive vendors declined to be interviewed.
“The conditions that surround true drive failures are complicated and require a detailed failure analysis to determine what the failure mechanisms were.”
said a spokesperson for Seagate Technology in Scotts Valley, Calif., in an e-mail.
“It is important to not only understand the kind of drive being used, but the system or environment in which it was placed and its workload.”
This is hilarious. Apparently hard drives are the only product for which MTBF statistics cannot be developed. Interestingly, Google, or any company with a large number of servers, has this information: it runs many drives, those drives fail over time, and because they all sit in servers in the same building, their usage is similar and therefore comparable.
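As a rough illustration of how a company with a large fleet could derive its own figure, observed MTBF is usually estimated as total operating hours divided by the number of failures. The sketch below assumes that simple estimator; the function name and the fleet numbers are invented for the example and do not come from any real data set.

```python
from typing import Iterable


def observed_mtbf(drive_hours: Iterable[float], failures: int) -> float:
    """Estimate MTBF as total operating hours divided by observed failures."""
    if failures <= 0:
        raise ValueError("at least one failure is needed to estimate MTBF")
    return sum(drive_hours) / failures


# Hypothetical fleet: 200 drives, each powered on for one year (8,760 hours),
# with 8 failures recorded during that year.
HOURS_PER_YEAR = 8760.0
fleet_hours = [HOURS_PER_YEAR] * 200
mtbf = observed_mtbf(fleet_hours, failures=8)

# The same data expressed as an annualized failure rate (failures per drive-year).
annual_failure_rate = 8 / (sum(fleet_hours) / HOURS_PER_YEAR)

print(f"Observed MTBF: {mtbf:,.0f} hours")
print(f"Annualized failure rate: {annual_failure_rate:.1%}")
```

Because every drive in such a fleet sees a similar environment and workload, figures computed this way can be compared across models, which is exactly the comparison the vendors say is too complicated to make.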
Vendor Studies from Russia
One of the few vendor studies on the failure rate of hard drives was performed by a company in Russia. The results are documented in this article.
This image shows the most reliable drives with Hitachi leading all producers.
The drives I use most often are by Western Digital, and it is interesting that I can expect around 3.5 years of life from them, which squares with my experience after owning many Western Digital drives. The following statement is of great interest, as it cautions against buying very high-capacity drives.
The remaining 41% exceeded 500 GB. Due to their construction and additional platters, these larger models are less durable, exhibiting an average lifespan of only 1.5 years. – Tom’s Hardware
The Costs of Publishing the Truth
It’s easy to publish positive information about vendors, but a huge headache to publish negative information. I know. I tested backup software several years ago and published my results online. My general finding was that PC backup software was very unreliable and difficult to use. I also found that Norton Ghost, and Acronis True Image in particular, never recovered a computer image properly in 10 attempts. After publishing this, I was contacted by a representative from Acronis who told me that I did not know the software and that my findings were wrong. They then offered to send me the newest software, which I took a lot of time to test, and which also failed. Publishing negative information like this is even more difficult if you take advertising. This is one of the reasons so few companies do it. CNET will publish on the different merits of products but won’t touch the issue of reliability, nor will 98% of other publications. Consumer Reports is one of the few that does. While their publication is a trailblazer in the area of reliability studies, they have to have a legal team ready because they are often sued. However, they do not publish at the level of detail of MTBF or other failure statistics. Something more is needed.
Article on how vendors refuse to provide real MTBF values.
It turns out that it is not just the disk that is important, but the configuration as well. If a person is using an external multidrive enclosure, it appears that mirroring is the most reliable configuration. This is of course because of the 100% redundancy. Redundancy does come at a cost, but with the highest-quality drives now costing roughly $75, depending upon the make, redundancy is affordable. This also brings up the question of whether all computers should have two internal drives so that mirroring can be accomplished. Since the iMac and most laptops do not offer mirroring, they can be seen as less reliable designs than computers with mirrored disks. The Mac does offer the ability to boot off of external disks, which makes mirroring possible. This is only one reason why online data tends to be so much better maintained than offline data stored on personal computers and external disks. Almost all servers use a high degree of redundancy, which includes mirroring of disks. To read more about this, see this post on Box.net.
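A quick back-of-the-envelope sketch shows why mirroring helps so much. It assumes that drive failures are independent and that a failed drive is not replaced during the period. Both are simplifications: drives in one enclosure share power, heat, and often a manufacturing batch, while prompt replacement of a failed drive lowers the risk further. The 5% annual failure probability is a made-up figure, not a measured one.

```python
def loss_probability(p_single: float, copies: int = 2) -> float:
    """Probability that every copy fails in the period.

    Assumes independent failures and no replacement of failed drives.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # Data is lost only if all copies fail, so the probabilities multiply.
    return p_single ** copies


# Hypothetical annual failure probability for a single drive.
p = 0.05
single = loss_probability(p, copies=1)
mirrored = loss_probability(p, copies=2)

print(f"Single drive:  {single:.2%} chance of data loss per year")
print(f"Mirrored pair: {mirrored:.2%} chance of data loss per year")
```

Under these assumptions the second drive cuts the chance of losing data by a factor of 1/p, which is why a second drive at roughly $75 is cheap insurance.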