Saturday, 05 January, 2002

Compression Confusion

While we're on the subject of compression, I thought I'd do a little experiment.  I ran a program executable through a filter that turns it into a .HEX file.  That is, every byte of the file is represented as a 2-digit hexadecimal number in the output file.  The resulting file is exactly twice the size of the original executable, but is really just a different representation of the same information.  I then compressed both files into a single archive using WinZip's maximum compression.  Here are the results:

File          Original size    Compressed size
executable    8,588 bytes      4,441 bytes
hex file      17,176 bytes     4,951 bytes

I'll admit my ignorance and say that I'm slightly puzzled by these results.  I didn't expect a general purpose compressor to figure out that the files are essentially the same thing and perform the hex-to-binary translation before compressing the .hex file, but I certainly didn't expect the compressed .hex file to come out 11.5% larger than the compressed executable.  I guess this just shows that there's still room for general purpose compressors to improve.
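
For anyone who wants to repeat the experiment, here is a rough sketch in Python.  It is not what I actually ran: it uses the standard zlib module (Deflate, the method commonly used in ZIP archives) in place of WinZip, and the function names are my own inventions for illustration, so expect the exact byte counts to differ from the table above.

```python
import binascii
import zlib


def to_hex(data: bytes) -> bytes:
    """Represent every input byte as two ASCII hexadecimal digits."""
    return binascii.hexlify(data)


def compressed_size(data: bytes) -> int:
    """Size in bytes after Deflate at zlib's highest level (9).

    Assumption: this stands in for WinZip's maximum compression;
    it is the same family of algorithm, not the same program.
    """
    return len(zlib.compress(data, 9))


def percent_larger(smaller: int, larger: int) -> float:
    """How much bigger `larger` is than `smaller`, as a percentage."""
    return (larger - smaller) / smaller * 100.0


def compare(data: bytes) -> dict:
    """Compress the raw bytes and their hex representation; report sizes."""
    hex_data = to_hex(data)
    return {
        "original": len(data),
        "hex": len(hex_data),
        "original_compressed": compressed_size(data),
        "hex_compressed": compressed_size(hex_data),
    }
```

To try it on a real file, read the file in binary mode and pass the bytes to compare(), for example compare(open("program.exe", "rb").read()), where program.exe is a placeholder name.  Feeding the two compressed sizes from my table to percent_larger(4441, 4951) gives roughly 11.5, which is where the figure above comes from.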