Tuesday, 28 June, 2005

I'm looking for a program . . .

I'm not sure what to call the program I'm looking for.  I thought it would be called an archive manager, but a Google search on that term returns backup programs, file managers, and other file manipulation or viewing tools, none of which is what I'm looking for.  Let me explain.

I have CD backups of my systems going back almost 10 years.  Some of the data on them is over 20 years old.  We'll leave the discussion of how useful most of that stuff is for another time.  The thing is, I have a dozen or more CDs with archives of old email messages, old source code, and all manner of stuff.  I'm afraid to throw any of the old CDs out because I don't know if they contain stuff that isn't on the newer backups.  It's clear, though, that I can't continue to add to my backup CD collection indefinitely.

What I want to do is consolidate the backups into a single directory structure, remove duplicate information, and make sure that the most recent version of any duplicated file is the one that's kept.  I know I can't fully automate the process, as I'll want the ability to manually resolve any changes, but a program should be able to do most of the grunt work for me.  Here's what I envision.

  1. Copy the entire contents of the oldest backup CD to a directory on the hard drive.
  2. Insert the next most recent backup and start a program that will compare the new CD with the existing file structure.
  3. All files that exist on the new CD but not in the directory on the hard drive are copied without question.
  4. Files that have duplicate names, modification dates, sizes, and contents (using a CRC or MD5 hash) are ignored (not copied).
  5. Files that have duplicate names but different dates or contents are copied to the destination and assigned a version number, and are flagged in the user interface for further action.
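
The five steps above could be sketched in Python roughly as follows.  This is a minimal sketch under my own assumptions, not an existing tool: the function names (`merge_backup`, `same_file`, `versioned_name`) and the `.v2`, `.v3` naming scheme are made up for illustration, modification times are compared to the whole second, and "flagged in the user interface" is reduced to returning a list of conflicting pairs.

```python
import hashlib
import os
import shutil


def file_digest(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def same_file(a, b):
    """True if size, modification time (whole seconds), and contents all match."""
    sa, sb = os.stat(a), os.stat(b)
    if sa.st_size != sb.st_size or int(sa.st_mtime) != int(sb.st_mtime):
        return False
    return file_digest(a) == file_digest(b)


def versioned_name(path):
    """Pick an unused name such as report.v2.txt (hypothetical naming scheme)."""
    root, ext = os.path.splitext(path)
    n = 2
    while os.path.exists(f"{root}.v{n}{ext}"):
        n += 1
    return f"{root}.v{n}{ext}"


def merge_backup(source, dest):
    """Merge the tree at source into dest; return a list of flagged conflicts."""
    conflicts = []
    for dirpath, _dirs, files in os.walk(source):
        rel = os.path.relpath(dirpath, source)
        target_dir = os.path.normpath(os.path.join(dest, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            if not os.path.exists(dst):
                shutil.copy2(src, dst)       # step 3: new file, copy it
            elif same_file(src, dst):
                continue                     # step 4: exact duplicate, skip
            else:
                alt = versioned_name(dst)    # step 5: keep both, flag for review
                shutil.copy2(src, alt)
                conflicts.append((dst, alt))
    return conflicts
```

Calling `merge_backup` once per CD, oldest first, with the same destination directory each time would cover steps 1 and 2 as well, since the first call simply copies everything.  `shutil.copy2` is used because it preserves modification times, which the comparison in step 4 depends on.  The sketch only compares an incoming file against the original name, not against versions created by earlier conflicts, so the same changed file on two later CDs would be flagged twice.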

After each CD is processed, I manually resolve the changed files by deleting one version, or by renaming so that the most recent version is the one that's kept.

If I apply that methodology to every one of my backup CDs, by the time I'm done I should have a single top-level directory that contains all of the information from the 10 years of backup CDs.  I can then go through that directory and delete anything I no longer want to save (I really don't need those emails from 1985), burn a single CD, and delete the working directory from my hard drive.  The next time I want to do a backup, I copy the backup CD to a working directory, use the program to compare the data on my hard drive with the backup image and copy the necessary files, and then burn the result to CD.

I have to think there's a similar program available.  But I don't know what it is or even what to call it.  Any ideas?