question Backing up compressed files

Adcadet

Storage Freak
Joined
Jan 14, 2002
Messages
1,861
Location
44.8, -91.5
Hey guys,
As can be seen in another thread (http://www.storageforum.net/forum/showthread.php?t=8940), I'm thinking about a serious backup strategy these days. Many people seem to back up compressed files to save space. Does that increase the risk of data loss should the destination drive develop some corrupted areas? Is it safe to assume that backing up uncompressed data is the safest? Are there any known advantages/disadvantages to one format versus another (e.g., Windows' native ZIP vs. the 7-Zip format)?


Thanks,
Adcadet
 

Chewy509

Wotty wot wot.
Joined
Nov 8, 2006
Messages
3,357
Location
Gold Coast Hinterland, Australia
Unfortunately, corrupted data is corrupted data. The only time not compressing your data will save you is when dealing with plain ASCII text files, but since the majority of files these days are in some form of binary format, you're hosed either way.

In one respect, using some form of compression will assist in detecting corruption, as most compression formats use some form of checksumming within the archive. Mismatched checksums mean file corruption. In addition, most archive formats (like zip, 7zip, rar) isolate files and compress them individually, at least when "solid" archive mode is off, so that corruption is generally limited to single files rather than the entire archive. (The only filesystems that actually checksum your data are ZFS and Btrfs, neither of which is available on Windows, so using an archive format that does checksumming is a good thing.)
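To make the checksum point concrete, here is a minimal sketch using Python's standard `zipfile` module. The archive, the member names `a.txt` and `b.txt`, and the single flipped byte standing in for a bad sector are all made up for illustration. The members are stored uncompressed only so the damage is easy to place.

```python
import io
import zipfile


def build_demo_archive():
    """Build a small in-memory ZIP with two stored (uncompressed) members."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("a.txt", b"A" * 64)
        zf.writestr("b.txt", b"B" * 64)
    return buf.getvalue()


def first_corrupt_member(archive_bytes):
    """Return the name of the first member whose CRC check fails, or None."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return zf.testzip()


def read_member(archive_bytes, name):
    """Read one member; raises zipfile.BadZipFile if its CRC does not match."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return zf.read(name)


good = build_demo_archive()

# Simulate a bad sector: flip one byte inside the data of a.txt.
damaged = bytearray(good)
damaged[good.find(b"A" * 64) + 10] ^= 0xFF
damaged = bytes(damaged)

print("intact archive, first bad member: ", first_corrupt_member(good))
print("damaged archive, first bad member:", first_corrupt_member(damaged))
print("b.txt still readable:", read_member(damaged, "b.txt") == b"B" * 64)
```

The damaged member is reported by name, while the other member still extracts cleanly. A plain copy of the same files on a non-checksumming filesystem would give you no such warning.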

As to your second question, each compression format works best with different types of data. For example, bzip2 works extremely well for mostly ASCII text files, but rar works better with binary data files. Unfortunately, I don't have an up-to-date comparison of each format and what it is suited for. (Know your data type, and choose the compression algorithm best suited to it. If all else fails, go with the most popular format, ZIP, so that you can get your data back in the future.)
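If you want to check this against your own data rather than trust a stale comparison, a rough sketch like the following will do. It uses the codecs in Python's standard library: `zlib` (the deflate method that ZIP normally uses), `bz2` (bzip2) and `lzma` (the family behind 7z and xz). There is no RAR codec in the standard library, so rar is left out. The sample text is invented; substitute bytes read from a representative file of your own.

```python
import bz2
import lzma
import zlib


def compressed_sizes(data):
    """Compress the same bytes with three codecs and return the output sizes."""
    return {
        "zlib": len(zlib.compress(data, 9)),
        "bz2": len(bz2.compress(data, 9)),
        "lzma": len(lzma.compress(data)),
    }


# Hypothetical sample: a repetitive ASCII log. Replace with your own data,
# e.g. open("some_file", "rb").read().
sample = ("2010-06-01 12:00:00 backup job finished without errors\n" * 2000).encode("ascii")

sizes = compressed_sizes(sample)
for name, size in sizes.items():
    print(f"{name:5s} {size:8d} bytes ({size / len(sample):.2%} of original)")
```

Results vary a lot with the input, so treat the numbers from one sample as a hint, not a ranking.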

Also be aware that some files don't compress well or at all, since they are themselves already compressed, e.g. JPEG, MPEG, DivX, MPEG-4, MP3, etc. (It's hard to recompress already compressed data.) You'll waste CPU time trying to compress them, with little or no gain.
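A quick way to see this effect is sketched below. Random bytes from `os.urandom` are used as a stand-in for the payload of an already compressed file such as a JPEG or MP3. That is an assumption for illustration, not a real media file, but both are high-entropy data with little redundancy left to squeeze out.

```python
import os
import zlib


def compression_ratio(data):
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, 9)) / len(data)


# Highly redundant plain text compresses very well.
text = b"The quick brown fox jumps over the lazy dog.\n" * 5000

# Stand-in for already compressed media: high-entropy random bytes.
media_like = os.urandom(len(text))

print(f"text ratio:       {compression_ratio(text):.3f}")
print(f"media-like ratio: {compression_ratio(media_like):.3f}")
```

The second ratio comes out at about 1.0 or slightly above, meaning the "compressed" copy is no smaller than the original and the CPU time was spent for nothing.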

I used to compress all my backups, but no longer do so. This is because all my backup media (which are external HDDs; I've ditched tape) are formatted as ZFS volumes, which have internal checksumming, file compression and automatic file/block level deduplication. If I weren't using ZFS, I would continue to compress my backup sets using bzip2, as most of my data is ASCII or RAW camera images, which bzip2 does really well on.
 

Bozo

Storage? I am Storage!
Joined
Feb 12, 2002
Messages
4,396
Location
Twilight Zone
I avoid compressed files. If the files are just copied, you can read them from almost any computer. If they are compressed, you need the correct program to read them, which may not be available just when you need the files the most.
The compression itself adds another chance at corruption too.
Besides, hard drive space is cheap.
 

ddrueding

Fixture
Joined
Feb 4, 2002
Messages
19,729
Location
Horsens, Denmark
The other thing to keep in mind is that multimedia files (other than BMP and uncompressed RAW) are already compressed. You're going to burn through a ton of CPU time and obfuscate what files are where without actually saving much space.
 

Mercutio

Fatwah on Western Digital
Joined
Jan 17, 2002
Messages
22,275
Location
I am omnipresent
To be honest, practically every modern file format is compressed to some greater or lesser degree. Even things like Word documents and CAD drawings are inflated and deflated as they're opened.
 