Hard drive reliability

Mercutio

Fatwah on Western Digital
Joined
Jan 17, 2002
Messages
22,275
Location
I am omnipresent
My hatred of WD is based on a lifetime of collected data. The WD20EADS drives I have are supposed to have an order-of-magnitude higher MTBF rating than WD Blue drives, but I think only 17 of the 24 I had still work.

I did use a lot of the 1.5TB -AS drives in one of my file servers for quite a while. I never had a problem with them, but I started buying after their original firmware issues were corrected. They got repurposed as external drives about nine months ago.

I will say that I have yet to see a Hitachi 3 or 4TB drive fail and I have a bunch of them.
 

Tannin

Storage? I am Storage!
Joined
Jan 15, 2002
Messages
4,448
Location
Huon Valley, Tasmania
Website
www.redhill.net.au
I was very gloomy when Samsung folded, and did not look forward at all to returning, after many years, to Seagate. The practical result, however, has surprised me a little: I have not had a single Seagate drive fail on me. I don't sell all that many these days; nevertheless, that's a great result. The worst thing I had happen was a quasi-DOA: a 2TB drive that detected, for some reason, as 500MB. It worked perfectly, but only at that capacity. Weird! So I used another drive for that particular build and had the faulty one swapped out before it went into service. Like Merc, I still avoid Western Digital products so far as possible. I just don't trust them the way I trust Seagate or Hitachi. Not that I trust any current drive the way I used to trust Samsungs.
 

LunarMist

I can't believe I'm a Fixture
Joined
Feb 1, 2003
Messages
17,497
Location
USA
Speaking of reliability, I have a number of drives from a backup set made in 2009. How viable are those old hard drives?
I'm thinking of redoing all of that with newer drives, although it will be time-consuming, not to mention the cost of drives and shipping.
 

ddrueding

Fixture
Joined
Feb 4, 2002
Messages
19,728
Location
Horsens, Denmark
I feel pretty strongly that, to be confident about a backup set, those drives should be running at all times, with a mechanism for verifying their integrity as well as redundancy among the drives. The bare minimum is a two-drive NAS unit in RAID-1 with documented checksums. To protect against as many viruses as possible, that NAS shouldn't be reachable by regular network traffic (a different IP subnet, or unplugged altogether except when accessing it) and should require a manually entered username/password to write anything to it.
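The "documented checksums" part needs nothing exotic. This is only a rough sketch rather than my actual setup, and the /mnt/backup path and manifest file name are placeholders:

```python
# Rough sketch only: a SHA-256 manifest for a backup tree.
# BACKUP_ROOT and MANIFEST are made-up placeholders.
import hashlib
import os

BACKUP_ROOT = "/mnt/backup"
MANIFEST = "manifest.sha256"

def sha256_of(path, chunk=1024 * 1024):
    """Hash a file in chunks so large files never have to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def write_manifest(root):
    """Record a checksum for every file under root (run after each backup)."""
    with open(os.path.join(root, MANIFEST), "w") as out:
        for dirpath, _, files in os.walk(root):
            for name in files:
                if name == MANIFEST:
                    continue
                full = os.path.join(dirpath, name)
                out.write(f"{sha256_of(full)}  {os.path.relpath(full, root)}\n")

def verify_manifest(root):
    """Re-hash every listed file; return the ones that changed or vanished."""
    bad = []
    with open(os.path.join(root, MANIFEST)) as f:
        for line in f:
            digest, rel = line.rstrip("\n").split("  ", 1)
            full = os.path.join(root, rel)
            if not os.path.isfile(full) or sha256_of(full) != digest:
                bad.append(rel)
    return bad

if __name__ == "__main__":
    write_manifest(BACKUP_ROOT)                       # after each backup run
    print(verify_manifest(BACKUP_ROOT) or "all files check out")
```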
 

LunarMist

I can't believe I'm a Fixture
Joined
Feb 1, 2003
Messages
17,497
Location
USA
I feel pretty strongly that, to be confident about a backup set, those drives should be running at all times, with a mechanism for verifying their integrity as well as redundancy among the drives. The bare minimum is a two-drive NAS unit in RAID-1 with documented checksums. To protect against as many viruses as possible, that NAS shouldn't be reachable by regular network traffic (a different IP subnet, or unplugged altogether except when accessing it) and should require a manually entered username/password to write anything to it.

No, they are in offsite storage. It would be quite impractical to run an offsite storage server.
Most of the reliability tests assume that drives are running constantly. I wish there were more tests of drives that sit idle; for example, it would be nice to run them for 100 hours and then test them once every year.
 

ddrueding

Fixture
Joined
Feb 4, 2002
Messages
19,728
Location
Horsens, Denmark
I understand the situation, but then we're back to the tape days of "what are the odds that this data set is still good?" I would consider that an unacceptable position.
 

Tannin

Storage? I am Storage!
Joined
Jan 15, 2002
Messages
4,448
Location
Huon Valley, Tasmania
Website
www.redhill.net.au
Yuk! (Dave's solution.) I mean, seriously, yuk! That's a dreadful way to go about data protection. There are way, way, way too many things that can go wrong with it.

Sure, power-on, power-off cycles are hard on drives. But constant wear isn't good either (even though drives generally are very long-wearing) and the risk of damage is quite high when you add all the possibilities up, from bad power (yeah, yeah, UPS, filtering, yadda yadda, shit still happens), bad management (everyone makes mistakes), physical damage (bomb, asteroid strike, fire, flood), commercial risk (your service company that provides the server room goes broke without warning and you can't get your drives back from the liquidator until the lawyers have made themselves a pension - yes it happens, and more often than you'd think), theft, malicious damage, all manner of ways of hacking into your system ... have you really, really, really thought of everything?

Above all, Dave's solution is complicated and very expensive - and it doesn't matter how carefully Dave specifies his constant-on RAID storage, it doesn't matter how much high-tech wizardry he throws at it, 'coz you will always, repeat always be able to buy a lot more physical drives for the same money Dave is spending (don't forget to add in his time and expertise, 'coz that's money too), and provided you keep those physical drives of yours in different physical locations, your chance of data loss will always be much lower than Dave's.

Keep it simple, Lunar. Make a full backup, keep it at your sister's house. Make another full backup, keep it in your garage. Make another full backup, keep this one in your office. Send another to your cousin in Baltimore or Burbank. Make as many as it takes to make you feel safe. Do a fresh one every now and again (you decide how often, I do this about every 3-6 months and wing it for more recent stuff, but YMMV) and, from time to time, retire your oldest backup. You can re-use those drives for the next version.

Use internal drives with a caddy, or externals if you prefer. If in doubt, buy the second-largest model on the market - the largest ones tend to be expensive per gigabyte and not always quite sorted enough for best reliability. Drives these days withstand being left idle for years on end, but ideally you'd spin them up once a year or so. If you don't get around to it, don't fuss too much; it never seems to bother them. (Not like the old days!)

"What are the odds that this data set is still good?" With this simple, practical method, very good indeed, and with every extra drive set you make, you tack another zero onto the odds in your favour. And no matter what Dave does with and how much he spends on his over-complicated, over-thought solution, your odds will always be lots better than his odds for the same spend, or even half the spend. And you have a simple system you understand completely, which is a benefit in itself.
 

ddrueding

Fixture
Joined
Feb 4, 2002
Messages
19,728
Location
Horsens, Denmark
And no matter what Dave does with and how much he spends on his over-complicated, over-thought solution, your odds will always be lots better than his odds for the same spend, or even half the spend. And you have a simple system you understand completely, which is a benefit in itself.

This is clearly impossible. In the last minute I've verified that I have four completely functional copies, and that two of them are current as of this second, with the other two synced as of midnight last night. I'm able to do this because the drives are online. If a drive goes bad I know immediately (because the system tells me) and I have it replaced within a couple of days. If someone asked me right now whether their data was backed up, I could say yes without having to hedge about drive reliability, storage conditions, or the last time I tried to access it. It also requires incredibly little maintenance: set up the infrastructure, hit the web interface every couple of days to make sure your alerts are still working, and replace a failed drive every year or so.

Using Tannin's method to establish the same level of protection would be impossible. In how many locations is yesterday's data? Even last month's? How many drives do you plan on sending out? How often will you ask for them back to verify and update? And that is before we talk about datasets that exceed the capacity of the largest single drives. Are you now sending out RAID-5 arrays of disks, counting on them rebuilding correctly when you verify?

Sure, if you have a static set of data that is smaller than 6TB just make some copies and scatter them around. But that method becomes immensely labo(u)r intensive when you want to keep them updated more frequently or the dataset grows.
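For what it's worth, the "the system tells me" part doesn't need anything fancy. This isn't my actual monitoring, just a sketch in which each sync job touches a marker file when it finishes and a small check flags any copy that has gone stale; the paths and the 24-hour threshold are invented for the example:

```python
# Sketch only: flag replicas whose last-sync marker is missing or too old.
# REPLICAS paths and MAX_AGE_HOURS are assumptions, not a real setup.
import os
import time

REPLICAS = {
    "office NAS": "/mnt/office_nas/.last_sync",
    "home NAS": "/mnt/home_nas/.last_sync",
}
MAX_AGE_HOURS = 24

def stale_replicas(replicas, max_age_hours):
    """Return the names of replicas whose marker file is missing or too old."""
    cutoff = time.time() - max_age_hours * 3600
    return [name for name, marker in replicas.items()
            if not os.path.exists(marker) or os.path.getmtime(marker) < cutoff]

if __name__ == "__main__":
    problems = stale_replicas(REPLICAS, MAX_AGE_HOURS)
    if problems:
        print("ALERT: stale copies:", ", ".join(problems))  # hook an email alert in here
    else:
        print("all copies current")
```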
 

Tannin

Storage? I am Storage!
Joined
Jan 15, 2002
Messages
4,448
Location
Huon Valley, Tasmania
Website
www.redhill.net.au
Bahh. Utter nonsense. For starters, you are backing up near-live data. You have four near-live datasets: in other words, you have four copies of whatever disaster might be happening to the primary data right now. One undetected virus, one subtle corruption, one attack by a disgruntled employee ... wait 24 hours, and all four copies are borked!

All that aside, we are not talking about near-live storage here, we are talking about Lunar's requirement for archival backup. 24-hour cycles are meaningless and counter-productive in the context of a requirement for archival backup of data that spans years of work - Lunar's 2009 backup dataset is 5 years old already.

You can do 6TB in two drives - one drive if you want to go silly with a single bleeding-edge drive rather than conservative with a pair of 3 or 4TB units. Obviously, at some point, a big enough dataset would require too many drives to be manageable. My own backups take up between two and four drives per set at present, which is perfectly manageable. (The number varies per set both because it is stored on a motley collection of 2TB, 3TB, and 4TB drives and because I prioritise the data: some I don't back up at all - very low-salience stuff that I can't quite bring myself to delete but don't really care much about; some would be a bit inconvenient to lose - that I back up seldom and keep only a couple of copies of; and some is vital enough to have three or four spare copies of.) Plucking a number out of the air, I'd draw a line in the sand at maybe about 8 drives per set - say 40-odd TB at present - and start looking for a different method. For Lunar, however, a multiply redundant collection of standard drives stored in widespread locations and refreshed on a multi-year cycle is by far the best method.
 

LunarMist

I can't believe I'm a Fixture
Joined
Feb 1, 2003
Messages
17,497
Location
USA
Bahh. Utter nonsense. For starters, you are backing up near-live data. You have four near-live datasets: in other words, you have four copies of whatever disaster might be happening to the primary data right now. One undetected virus, one subtle corruption, one attack by a disgruntled employee ... wait 24 hours, and all four copies are borked!

All that aside, we are not talking about near-live storage here, we are talking about Lunar's requirement for archival backup. 24-hour cycles are meaningless and counter-productive in the context of a requirement for archival backup of data that spans years of work - Lunar's 2009 backup dataset is 5 years old already.

You can do 6TB in two drives - one drive if you want to go silly with a single bleeding-edge drive rather than conservative with a pair of 3 or 4TB units. Obviously, at some point, a big enough dataset would require too many drives to be manageable. My own backups take up between two and four drives per set at present, which is perfectly manageable. (The number varies per set both because it is stored on a motley collection of 2TB, 3TB, and 4TB drives and because I prioritise the data: some I don't back up at all - very low-salience stuff that I can't quite bring myself to delete but don't really care much about; some would be a bit inconvenient to lose - that I back up seldom and keep only a couple of copies of; and some is vital enough to have three or four spare copies of.) Plucking a number out of the air, I'd draw a line in the sand at maybe about 8 drives per set - say 40-odd TB at present - and start looking for a different method. For Lunar, however, a multiply redundant collection of standard drives stored in widespread locations and refreshed on a multi-year cycle is by far the best method.

Thanks. I won't trust the old drives; I'll make a fully new set. Mainly I'll use some new 4TB drives, along with some 3TB drives that are being replaced by the Seagate 6TB ones. Are the ST4000DM000 drives still a good bet now?
 

LunarMist

I can't believe I'm a Fixture
Joined
Feb 1, 2003
Messages
17,497
Location
USA
This is clearly impossible. In the last minute I've verified that I have four completely functional copies, and that two of them are current as of this second, with the other two synced as of midnight last night. I'm able to do this because the drives are online. If a drive goes bad I know immediately (because the system tells me) and I have it replaced within a couple of days. If someone asked me right now whether their data was backed up, I could say yes without having to hedge about drive reliability, storage conditions, or the last time I tried to access it. It also requires incredibly little maintenance: set up the infrastructure, hit the web interface every couple of days to make sure your alerts are still working, and replace a failed drive every year or so.

Using Tannin's method to establish the same level of protection would be impossible. In how many locations is yesterday's data? Even last month's? How many drives do you plan on sending out? How often will you ask for them back to verify and update? And that is before we talk about datasets that exceed the capacity of the largest single drives. Are you now sending out RAID-5 arrays of disks, counting on them rebuilding correctly when you verify?

Sure, if you have a static set of data that is smaller than 6TB just make some copies and scatter them around. But that method becomes immensely labo(u)r intensive when you want to keep them updated more frequently or the dataset grows.

I suspect that is only feasible with business-related equipment. It's not like I have a datacenter to send the files to or bandwidth to transmit up to 200 GB per day.
 

ddrueding

Fixture
Joined
Feb 4, 2002
Messages
19,728
Location
Horsens, Denmark
It certainly makes it easier ;)

When I worked from home I talked a friend down the street into running a NAS unit for me as a backup. As compensation, I set up a point-to-point wireless link between our houses and let him share my 100MB internet connection.
 

Bozo

Storage? I am Storage!
Joined
Feb 12, 2002
Messages
4,396
Location
Twilight Zone
At work someone made the point that if the data from a process is corrupted, then copying it somewhere just propagates the corruption. I set up a computer to collect all the data from the process computers. This computer was busy, with data being collected almost non-stop. Inside this computer was a second data hard drive that copied the main collection hard drive every 2 hours.

A second computer was set up with 9 separate hard drives in it: a boot drive, a data collection hard drive, and a hard drive for each day of the week. Every 6 hours the data collection hard drive would be updated from the data collection computer. If it was Tuesday, at 3 PM Tuesday afternoon the Tuesday hard drive would be updated from the data hard drive. The Tuesday hard drive would be updated again at 3 AM Wednesday. Then at 9 AM Wednesday a script would run on the Tuesday hard drive to delete anything older than 7 days. (The data collection computer deleted anything older than 10 days, and the data hard drive in the daily backup computer deleted anything older than 8 days.) This process was repeated every day of the week.

Because there were 7 days of data on each day's hard drive, we could go back 14 days if necessary, and there were 7 copies of each day's data spread across 7 individual hard drives. If a hard drive failed the data was still available: just replace the hard drive and it would be updated when its day of the week came around again.

The daily backup computer was duplicated off-site, but with its hours shifted by 3 hours. Each process computer could hold 3 days of data. And we weren't copying corrupt data over good data.
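In rough form, a single cycle looked something like the sketch below. These aren't the actual scripts we ran; the directory layout is made up, with one folder standing in for each day-of-week drive and another for the data collection drive:

```python
# Sketch of one day-of-week rotation cycle with age-based cleanup.
# STAGING and DAY_DRIVES are made-up stand-ins for the real drives.
import os
import shutil
import time
from datetime import datetime

STAGING = "/backup/staging"     # the "data collection" drive
DAY_DRIVES = "/backup/day"      # /backup/day/Tuesday, /backup/day/Wednesday, ...

def sync_into(src, dst):
    """Copy everything from src into dst, overwriting matching files but
    leaving older files on dst alone, so it accumulates a rolling window."""
    for dirpath, _, files in os.walk(src):
        out_dir = os.path.join(dst, os.path.relpath(dirpath, src))
        os.makedirs(out_dir, exist_ok=True)
        for name in files:
            shutil.copy2(os.path.join(dirpath, name), os.path.join(out_dir, name))

def prune_older_than(root, days):
    """Delete files whose modification time falls outside the retention window."""
    cutoff = time.time() - days * 86400
    for dirpath, _, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            if os.path.getmtime(full) < cutoff:
                os.remove(full)

if __name__ == "__main__":
    day_drive = os.path.join(DAY_DRIVES, datetime.now().strftime("%A"))  # e.g. "Tuesday"
    sync_into(STAGING, day_drive)   # the 3 AM / 3 PM update of today's drive
    prune_older_than(day_drive, 7)  # each day drive keeps a 7-day window
    prune_older_than(STAGING, 8)    # the staging copy keeps 8 days
```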
 

ddrueding

Fixture
Joined
Feb 4, 2002
Messages
19,728
Location
Horsens, Denmark
At work someone made the point that if the data from a process is corrupted, then copying it somewhere just propagates the corruption. I set up a computer to collect all the data from the process computers. This computer was busy, with data being collected almost non-stop. Inside this computer was a second data hard drive that copied the main collection hard drive every 2 hours.

A second computer was set up with 9 separate hard drives in it: a boot drive, a data collection hard drive, and a hard drive for each day of the week. Every 6 hours the data collection hard drive would be updated from the data collection computer. If it was Tuesday, at 3 PM Tuesday afternoon the Tuesday hard drive would be updated from the data hard drive. The Tuesday hard drive would be updated again at 3 AM Wednesday. Then at 9 AM Wednesday a script would run on the Tuesday hard drive to delete anything older than 7 days. (The data collection computer deleted anything older than 10 days, and the data hard drive in the daily backup computer deleted anything older than 8 days.) This process was repeated every day of the week.

Because there were 7 days of data on each day's hard drive, we could go back 14 days if necessary, and there were 7 copies of each day's data spread across 7 individual hard drives. If a hard drive failed the data was still available: just replace the hard drive and it would be updated when its day of the week came around again.

The daily backup computer was duplicated off-site, but with its hours shifted by 3 hours. Each process computer could hold 3 days of data. And we weren't copying corrupt data over good data.

Fortunately most higher-end NAS units support versioning, so even if something unfortunate happens you have the previous copies to step through. In my case I've set it to keep the most recent 8 versions, one from every day this week, one from last week, one from last month, and one from last year.
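The pruning logic behind that kind of retention rule is simple enough to sketch. This isn't the NAS vendor's code, just one reading of the rule (the eight newest snapshots plus the dated ones), with the window sizes picked for illustration:

```python
# Sketch of a snapshot-retention rule as a pure selection function:
# given snapshot timestamps, return the set worth keeping.
from datetime import datetime, timedelta

def snapshots_to_keep(snapshots, now=None):
    """snapshots: iterable of datetimes in any order; returns the set to keep."""
    now = now or datetime.now()
    snaps = sorted(snapshots, reverse=True)   # newest first
    keep = set(snaps[:8])                     # the 8 most recent

    def newest_in(start, end):
        """Newest snapshot falling inside the half-open window [start, end)."""
        for s in snaps:
            if start <= s < end:
                return s
        return None

    # One per day over the last seven days.
    for back in range(7):
        day = datetime.combine((now - timedelta(days=back)).date(), datetime.min.time())
        s = newest_in(day, day + timedelta(days=1))
        if s is not None:
            keep.add(s)

    # One each for the previous week, month, and year (approximate windows).
    for days_back, span in ((14, 7), (60, 30), (730, 365)):
        s = newest_in(now - timedelta(days=days_back),
                      now - timedelta(days=days_back - span))
        if s is not None:
            keep.add(s)
    return keep

if __name__ == "__main__":
    history = [datetime.now() - timedelta(hours=6 * i) for i in range(400)]
    print(f"keeping {len(snapshots_to_keep(history))} of {len(history)} snapshots")
```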
 

Tannin

Storage? I am Storage!
Joined
Jan 15, 2002
Messages
4,448
Location
Huon Valley, Tasmania
Website
www.redhill.net.au
That's a good system, Dave. Bozo's point is a very good one. With data which remains important over any length of time, it becomes crucial. I lost the photographs of an entire day a while back. Somewhere along the line I'd accidentally deleted that day on the primary storage and didn't wake up to it until some time after I had completed a whole backup cycle - in other words, even my oldest backup, somewhere close to two years old at that time, contained only empty folders for that day. So it's lost. Fortunately, it wasn't a particularly important day and had (so far as I know) only one or two really worthwhile pictures, both of which I still have the originals of because I keep various versions of worked-on pictures, including the raw file, in a scratch folder, and archive that folder along with everything else on the system. Nevertheless, a valuable lesson there.
 