Chewy509
Wotty wot wot.
Saw this linked off HardOCP...
http://www.zdnet.com/who-makes-the-best-disk-drives-7000025375/
Take it as you wish.
I feel pretty strongly that, to be confident about a backup set, those drives should always be running and have a mechanism for verifying their integrity, as well as redundancy among the drives. The bare minimum is a 2-drive NAS unit in RAID-1 with checksums documented. To protect against as many viruses as possible, that NAS shouldn't be reachable by regular network traffic (different IP subnet, or unplugged altogether except when accessing it) and should require a manually entered username/password to write anything to it.
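For what "checksums documented" could look like in practice, here's a minimal sketch that writes and verifies a SHA-256 manifest for a backup directory. The mount point and manifest filename are invented for illustration; the real thing would run on whatever schedule and paths your NAS uses.

```python
#!/usr/bin/env python3
"""Minimal sketch: documented checksums for a backup set.

Assumptions (illustrative only): backups live under BACKUP_ROOT and the
manifest is a plain text file of "<sha256>  <relative path>" lines.
"""
import hashlib
import sys
from pathlib import Path

BACKUP_ROOT = Path("/mnt/nas/backups")       # hypothetical mount point
MANIFEST = BACKUP_ROOT / "checksums.sha256"  # hypothetical manifest name

def sha256_of(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def write_manifest() -> None:
    # Record a checksum for every file in the backup set.
    with MANIFEST.open("w") as out:
        for p in sorted(BACKUP_ROOT.rglob("*")):
            if p.is_file() and p != MANIFEST:
                out.write(f"{sha256_of(p)}  {p.relative_to(BACKUP_ROOT)}\n")

def verify_manifest() -> int:
    # Re-hash every file in the manifest and report mismatches or missing files.
    bad = 0
    for line in MANIFEST.read_text().splitlines():
        digest, rel = line.split("  ", 1)
        p = BACKUP_ROOT / rel
        if not p.is_file() or sha256_of(p) != digest:
            print(f"FAILED: {rel}")
            bad += 1
    return bad

if __name__ == "__main__":
    if len(sys.argv) > 1 and sys.argv[1] == "verify":
        sys.exit(1 if verify_manifest() else 0)
    write_manifest()
```

Run it once after a backup to record the manifest, and with `verify` whenever you want to confirm the set is still intact.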
And no matter what Dave does with his over-complicated, over-thought solution, and how much he spends on it, your odds will always be lots better than his for the same spend, or even half the spend. And you have a simple system you understand completely, which is a benefit in itself.
Bahh. Utter nonsense. For starters, you are backing up near-live data. You have four near-live datasets: in other words, you have four different copies of whatever disaster might be happening to the primary data right now. One undetected virus, one subtle corruption, one attack by a disgruntled employee ... wait 24 hours, and all four copies are borked!
All that aside, we are not talking about near-live storage here, we are talking about Lunar's requirement for archival backup. 24-hour cycles are meaningless and counter-productive in the context of a requirement for archival backup of data that spans years of work - Lunar's 2009 backup dataset is 5 years old already.
You can do 6TB in two drives - one drive if you want to go silly with a single bleeding-edge drive rather than conservative with a pair of 3 or 4TB units. Obviously, at some point, a big enough dataset would require too many drives to be manageable. My own backups take up between two and four drives per set at present, which is perfectly manageable. (Variable number per set, both because it is stored on a motley collection of 2TB, 3TB, and 4TB drives and because I prioritise the data: some I don't back up at all - very low-salience stuff that I can't quite bring myself to delete but don't really care much about - some would be a bit inconvenient to lose - back it up seldom and only keep a couple of copies - and some is vital enough to have three or four spare copies of.) Plucking a number out of the air, I'd draw a line in the sand at maybe about 8 drives per set - say 40-odd TB at present - and start looking for a different method. For Lunar, however, a multiply redundant collection of standard drives stored in widespread locations and refreshed on a multi-year cycle is by far the best method.
This is clearly impossible. In the last minute I've verified that I have four completely functional copies, and that two of them are current as of this second with the other two synced as of midnight last night. I have been able to do this because the drives are online. If a drive does go bad I know immediately (because the system tells me) and I have it replaced within a couple of days. If someone asks me right now whether their data is backed up, I can say yes without having to hedge about drive reliability or storage conditions or the last time I tried to access it. It also requires incredibly little maintenance: set up the infrastructure, hit the web interface every couple of days to make sure your alerts are still working, replace a failed drive every year or so.
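That kind of "are all my copies fresh?" check is trivial to automate once the copies are online. Here's a rough sketch, assuming each copy is a mounted path carrying a `.last_sync` marker file whose mtime records the most recent sync; all the mount points and age thresholds are invented for illustration.

```python
#!/usr/bin/env python3
"""Rough sketch: freshness check across several online backup copies.

Assumptions (illustrative only): each copy is a mounted path containing a
".last_sync" file whose mtime records the most recent successful sync.
"""
import time
from pathlib import Path

# Hypothetical mounts: two near-live mirrors, two nightly copies.
COPIES = {
    "/mnt/mirror1": 15 * 60,        # must be no older than 15 minutes
    "/mnt/mirror2": 15 * 60,
    "/mnt/nightly1": 26 * 60 * 60,  # nightly copies get a 26-hour allowance
    "/mnt/nightly2": 26 * 60 * 60,
}

def check_copies() -> bool:
    ok = True
    now = time.time()
    for mount, max_age in COPIES.items():
        marker = Path(mount) / ".last_sync"
        if not marker.exists():
            print(f"MISSING: {mount} (no sync marker)")
            ok = False
            continue
        age = now - marker.stat().st_mtime
        fresh = age <= max_age
        print(f"{'OK' if fresh else 'STALE'}: {mount} last synced {age / 3600:.1f} h ago")
        ok = ok and fresh
    return ok

if __name__ == "__main__":
    raise SystemExit(0 if check_copies() else 1)
```

Wire something like that into whatever alerting you already have and "is it backed up?" becomes a yes/no answer rather than a guess.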
Using Tannin's method to establish the same level of protection would be impossible. In how many locations is yesterday's data? Even last month's? How many drives do you plan on sending out? How often will you ask for them back to verify and update? And that is before we talk about datasets that exceed the capacity of the largest single drives. Are you now sending out RAID-5 arrays of disks, counting on them rebuilding correctly when you verify?
Sure, if you have a static set of data that is smaller than 6TB just make some copies and scatter them around. But that method becomes immensely labo(u)r intensive when you want to keep them updated more frequently or the dataset grows.
At work someone made the point that if the data from a process is corrupted, then copying it somewhere just spreads the corruption to every copy. I set up a computer to collect all the data from the process computers. This computer was busy, with data being collected almost non-stop. Inside it was a second data hard drive that copied the main collection hard drive every 2 hours.

A second computer was set up with 9 separate hard drives: a boot drive, a data collection hard drive, and a hard drive for each day of the week. Every 6 hours the data collection hard drive would be updated from the data collection computer. If it was Tuesday, at 3 PM Tuesday afternoon the Tuesday hard drive would be updated from the data hard drive, and updated again at 3 AM Wednesday. Then at 9 AM Wednesday a script would run on the Tuesday hard drive to delete anything older than 7 days. (The data collection computer deleted anything older than 10 days, and the data hard drive in the daily backup computer deleted anything older than 8 days.) This process was repeated every day of the week.

Because there were 7 days of data on each day's hard drive, we could go back 14 days if necessary, and there were 7 copies of each day's data spread across 7 individual hard drives. If a hard drive failed, the data was still available: just replace the hard drive and it would be updated when its day of the week came around again.
The daily backup computer was duplicated off site, but with its hours shifted by 3 hours. Each process computer could hold 3 days of data. And we weren't copying corrupt data over good data.
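A minimal sketch of that kind of day-of-week rotation with age-based pruning is below. The mount points are invented, the copies are done with rsync via subprocess, and in reality the two steps would be separate cron jobs at the times described above rather than one ad-hoc script.

```python
#!/usr/bin/env python3
"""Minimal sketch: day-of-week rotation with age-based pruning.

Assumptions (illustrative only): the collection drive and the seven day
drives are mounted at the paths below; rsync is available on the system.
"""
import datetime
import subprocess
from pathlib import Path

COLLECTION = Path("/mnt/collection")            # hypothetical collection drive
DAY_DRIVES = {d: Path(f"/mnt/{d.lower()}")      # hypothetical per-day drives
              for d in ("Monday", "Tuesday", "Wednesday", "Thursday",
                        "Friday", "Saturday", "Sunday")}
RETENTION_DAYS = 7                              # per day-drive retention window

def update_day_drive(day: str) -> None:
    # Copy the current collection onto that day's drive (the 3 PM / 3 AM runs).
    dest = DAY_DRIVES[day]
    subprocess.run(["rsync", "-a", f"{COLLECTION}/", f"{dest}/"], check=True)

def prune_day_drive(day: str) -> None:
    # The 9 AM cleanup: delete anything older than RETENTION_DAYS on that drive.
    cutoff = datetime.datetime.now() - datetime.timedelta(days=RETENTION_DAYS)
    for p in DAY_DRIVES[day].rglob("*"):
        if p.is_file() and datetime.datetime.fromtimestamp(p.stat().st_mtime) < cutoff:
            p.unlink()

if __name__ == "__main__":
    today = datetime.date.today().strftime("%A")
    update_day_drive(today)
    prune_day_drive(today)
```

The point of the staggered deletion windows (10 days on the collector, 8 on its data drive, 7 on the day drives) is that corruption has to survive several independent cycles before it can overwrite every good copy.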