  Backups: Better safe than sorry!
Sun 4th April 2010   

Do you copy?

When I ask my family, friends and colleagues how they keep their important digital data safe, most of the time I get one of these answers:
- I've nothing worth keeping
- Well, it's on the PC/Mac so it's safe, no?
- I have copies of all my files in another folder on my hard-drive!
- I'm doing backups regularly on CD/DVD/external hard-drive
Admittedly, some people answer differently because they have already been burnt by unsafe backups and have solved the issue by implementing a real strategy.

So let me tell you why you should be worried about the notion of backups.
From floppies to hard-drives, from tapes to DVDs...

Why are people not good at doing backups?

Most people have a very casual approach to backups, learned from dealing with legacy material like printed photos, negatives, notebooks, diaries, letters and postcards, insurance and tax papers, bank statements, book club subscriptions, receipts for things you bought, etc. These things take a lot of room, and they are spread between your office, the drawers in the living room, the attic, the basement, the photo frames, your parents' house, etc.

These things can of course be lost, destroyed or stolen. But in the real world what happens is that you will lose a few photos, some papers will go missing, possibly a fire in your home will destroy some of it [1], but it's generally not going to be a full-scale destruction, and it's rarely going to be stolen [2].

So by their very nature as individual items that tend to get spread around, your old-style data is self-preserving. That's mostly why people still have the diaries of their grandfather or the love letters of their great-grandmother, why people can write biographies, and why we can publish posthumous books (lost manuscripts found).

That's where the main difference between digital data and legacy material lies. Digital data does not take any physical room, and the storage capacity of our equipment tends to grow faster than our ability to generate data, so it seems logical that it all ends up in one single location: your main computer [3]. If something bad happens to this machine, you risk losing it all at once, unless of course you have backup copies of all this data.

Should I back up everything?

The short answer is no.

The long answer is that you should back up only the things that are unique, or that would be very difficult to recover if you lost them.

Typical things you should back up are your mail, the originals of the photos and videos you made, electronic documents sent by your employer or bank, manuscripts and documentation you are writing, the source code of your next killer application, website data, database content, connection information and passwords [4], registered applications downloaded from a website, game saves, browser favourites, ...

Typical things you should not back up are your installed applications, MP3 and DivX collections, your temp folder, the operating system files, ...

The rationale is that the more data you have to back up, the more costly it becomes (in terms of time and storage), which generally results in a less effective backup strategy.

A note about integrity and confidentiality.

When devising a backup strategy, it's important to acknowledge that some of your data is more important than the rest, either because losing it would be a major annoyance, or because having it fall into other people's hands would be a real issue.

Typically you don't want to lose important data from your bank or insurance company, and you don't want to lose the passwords and login information for all these websites. But you certainly don't want this information to be stolen either, because that would expose you to a number of issues such as identity theft, or people posting disturbing messages on Facebook using your login.

Assuming you are not producing gigabytes of amateur porn video, the amount of sensitive data you have should be quite moderate compared to the normal data, so it makes sense to use something simple to keep this data safe [5]. The important thing is to be aware that keeping all your passwords in clear text in a file called my_important_passwords.txt is not a very good idea.

Characteristics of a good backup system.

Instead of giving you a ready-to-use backup method, I will just enumerate the important characteristics of a good backup system. The reason is simply that, as time passes, new options may become available and make the existing backup solutions obsolete (or non-optimal).

It's actually pretty easy to see if a backup system is good or not. You just need to consider a number of points. A good backup is...

...done frequently

This may seem obvious, but when a problem happens you will not be able to recover anything that changed after your last backup. So the more often the backup procedure runs, the less data you are going to lose.

An ideal backup system would be able to recover every single small modification made to your data, but in practice once a day should be good enough.

...automated

If you need to perform a manual operation to start your backup, Murphy's Law states that the problem will happen at the precise moment when you were not able (or forgot) to run it. It will also happen on the day you transferred those 600 photos from your digital camera, erasing them from the memory card in the process.

The ideal backup system should be set up once, and then work automatically without you having to do anything special.

...non intrusive

If your backup procedure interferes with the use of your machine, you will start to disable it, and bad things will happen.

...easy to control

Doing backups is nice, being sure they are actually performed correctly is better: otherwise you will only find out whether your backup strategy was working on the day you need to restore data after a nasty event, and trust me, you don't want to discover at that moment that some data was never backed up.

You should be able to trust your backup system, so from time to time check that everything seems fine, and that all your important files are there.
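
One way to do such a check without opening files by hand is to compare every original with its copy. The following is only a minimal sketch in Python, assuming the backup is a plain folder that mirrors the source folder; the function names `file_digest` and `verify_backup` are made up for this example.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source: Path, backup: Path) -> list:
    """List the files under `source` that are missing or different in `backup`."""
    problems = []
    for src_file in source.rglob("*"):
        if not src_file.is_file():
            continue
        relative = src_file.relative_to(source)
        copy = backup / relative
        if not copy.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif file_digest(src_file) != file_digest(copy):
            problems.append(f"differs: {relative.as_posix()}")
    return problems
```

An empty list means every file in the source folder has an identical copy in the backup. Files that changed since the last backup run will naturally show up as "differs", so the check is most meaningful right after a backup.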

...easy to restore

That one sounds obvious, but many people have found out the hard way that their backups were very hard to use. Typical examples are complex tape systems requiring dedicated hardware and software, incremental backup systems that require you to know exactly which point in time you want to restore, or DVD-based systems where you have to feed the discs back one by one in a perfect order. Other possible problems are systems that require a master password or key... which happens to be stored only in your backup data.

So to be on the safe side, try to restore one of your backups on a new machine or drive, just to see how long it takes, how easy it is, and if everything is present.

...redundant

Another obvious one: a backup is basically a copy of something. The more copies you have, the better the chances that at least one copy survives [6]. If there is only one copy of something, it's not backed up.

...refreshed

Most people think that burnt CDs or DVDs are a reliable form of backup. They are not: if you are lucky, a well-protected optical disc will keep the information for at least 10 years, but if it's not protected it's common to be unable to read it back after two or three years. The reliability depends on the brand, the quality of the discs, the quality of the writer, the writing speed, etc. The same thing applies to magnetic media like floppies or hard-drives: the higher the density of information, the faster the data will become unreadable. And of course don't imagine for one moment that memory cards are any better.

The only practical solution is to back up regularly and replace the media from time to time. We are still waiting for the digital equivalent of hieroglyphs and stone tablets [7].

...geo-dispersed

That one is very important. Having multiple copies of your data is nice, but if one single event can annihilate all the copies at once then it's not much of a safety net. If some of the copies are stored in a physically distant location, you make sure that a single fire or burglary cannot make every copy disappear.

The opposite would be to have a backup on a secondary hard-drive in the same machine as the main data: if the machine is stolen or attacked by a nasty virus, if the disk controller dies, if the machine is the victim of a power surge, if it falls down... the result is the same: you lose both the data and the copies at the same time. That's also the reason why RAID is not a backup method [8].

...self-sufficient

That one is something not everybody agrees on. Basically the idea is that any part of the data you managed to back up should be usable on its own to restore lost data. This means that you should not compress your data into one big archive split over 5 DVDs, or do incremental backups.

The reason is that the backup medium may have suffered corruption, or parts may be missing, and not being able to recover anything because just one part of the backup data is unusable... is just plain wrong.

Any recommendation?

What I would suggest first is to be pragmatic: you are not going to be able to design a 100% perfect backup system [9]. So just try to find a solution that works for you, with a good enough balance between cost and safety.

Ok, let's discuss some of the things you can do to make your backups easier.

Reorganize your data

The simplest thing is the one that people generally forget: organize your data in a way that makes it easy to back it all up. Instead of 50 folders spread over 3 hard-drives, it's a lot easier to have one or two big folders containing only the important data. It will also make it easier not to erase important data by mistake when you want to do some cleaning on your machine.

On my machines I explicitly set the location of the My Documents (or Home) folder to my secondary drive (this allows me to access it from both Windows and Linux, and I can reinstall the operating system without having to restore the data). This folder contains sub-folders with my photos and pictures, administrative papers, bank statements, game designs, travel reservations, my todolist file, etc. I also have a small TrueCrypt archive for the more confidential data.

For historical reasons I also have some backup targets that are outside this folder, but I would probably do it differently if I started from scratch. Since it's working, I'm not touching it now. So I have another folder for the website, one for the source code, and finally one for the registered software.

Start simple

Just get yourself an external hard-drive, and some software to back up your important data. Some products come bundled with their own backup software: some are good, some are bad, some are simple to use, some are super non-intuitive. You may use the bundled software, but you can also use your own. It can be as simple as a script started when you connect the hard-drive to the machine:
  1. Create a text file called autorun.inf on the backup disk.
  2. Open the file, and enter the following text:

    [autorun]
    action=Start the backup
    open=cmd.exe /C backup.bat
    icon=backup.ico
    label=Backup my data


  3. Create a text file called backup.bat on the backup disk.
  4. Open the file, and enter the following text:

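    REM /D copies only new or changed files, /E includes sub-folders.
    REM Each source gets its own folder so same-named files cannot overwrite each other.
    ECHO BACKUP STARTED
    XCOPY /D /E /C /I /R /Y D:\Photos Backup\Photos
    XCOPY /D /E /C /I /R /Y D:\Documents Backup\Documents
    PAUSE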

The Windows XP autorun dialog

Of course you are supposed to modify the script so that it actually points to the location where your data is stored. Oh, and don't forget to add an icon file on the disk too. It would work without, but it looks more professional with one :)

You can find some pretty icons on http://findicons.com, and you can learn more about the scripting commands by launching CMD.EXE and then typing HELP or XCOPY /?

If everything works fine, when you now plug in your external drive the autorun dialog should appear, with the option to start the backup.
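
For machines where a batch file is not an option (Linux, for instance), the same "copy only what is new or changed" idea can be sketched in Python. This is only an illustration, not a replacement for a real backup tool: the function name `backup_folder` and the two-second tolerance are assumptions made for this example.

```python
import shutil
from pathlib import Path

# Assumption: FAT-formatted external drives store modification times with a
# 2-second resolution, so smaller differences are treated as "unchanged".
MTIME_TOLERANCE = 2.0


def backup_folder(source: Path, destination: Path) -> int:
    """Copy files that are missing from `destination` or newer in `source`.

    Returns the number of files copied.
    """
    copied = 0
    for src_file in source.rglob("*"):
        if not src_file.is_file():
            continue
        target = destination / src_file.relative_to(source)
        if target.exists():
            age_gap = src_file.stat().st_mtime - target.stat().st_mtime
            if age_gap <= MTIME_TOLERANCE:
                continue  # the copy is already as recent as the original
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, target)  # copy2 also keeps the modification time
        copied += 1
    return copied
```

With hypothetical paths, `backup_folder(Path("D:/Photos"), Path("E:/Backup/Photos"))` would play the role of one XCOPY line. Like XCOPY /D, it never deletes anything from the backup, so a file removed by mistake from the source is still there on the external drive.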

Many people now have mini-servers at home (NAS, WHS, ... [10]) used to access media files so they can be played on the TV or the audio system. These systems can also be used to host files, so they are perfect candidates for automated backup scripts. Since a server is always on, you can have the backup script start automatically when you start up or shut down your machine, or as a scheduled task. If you have such a machine I really recommend investing in a UPS [11] as well.

We have just solved the issue of redundancy: it's now easy to make a copy of your data. Of course you will still lose everything if something bad happens in your home. Let's improve that now.

Improvements

The most significant improvement you can make is to ensure you have copies available in a distant location. You can achieve that in a number of ways:
  • You can organize data swaps with a person you trust (friend, family member, colleague) who lives in a different place, but close enough that you meet often. Each of you gets a similar hard-drive (with twice the capacity you need), you modify the backup script to take the computer or user name into account, you do your backups regularly as usual, and when you meet you bring the drives and exchange them. If something bad happens, the other person has a not-too-old copy of your data. And you have the other person's data too.
  • If you are an IT worker, you most probably have a computer at work with a storage capacity that largely exceeds your actual needs. Most people have machines with several hundred gigabytes, while in practice all they need is enough to install Windows, the applications, and a few hundred megabytes of assorted documents. If that's the case, check with the operations department or your team leader whether you can be authorized to have a personal folder on the machine. If you get the agreement, you can use your work machine as a backup replica: bring your portable hard-drive to work, and copy its content to the work PC.
  • Do online backups. You can either copy your data to a server hosted somewhere else, or use a dedicated backup service provider. I'm personally using Jungle Disk, but numerous other companies offer similar services. Just make sure to review them carefully: search for negative opinions from previous users, compute how much it's going to cost you based on your usage patterns, check whether the service is Windows only, etc.
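
For the drive-swap option, "taking the computer or user name into account" can be as simple as giving each person their own folder on the shared drive. Here is a small Python sketch of that idea; the `Backup/<computer>/<user>` layout and the function name `backup_root` are assumptions chosen for this example, not something the swap requires.

```python
import getpass
import platform
from pathlib import Path


def backup_root(drive: Path, computer: str = "", user: str = "") -> Path:
    """Return a per-machine, per-user folder on a shared backup drive.

    When `computer` or `user` is left empty, the current machine name
    and the logged-in user name are used instead.
    """
    computer = computer or platform.node() or "unknown-computer"
    user = user or getpass.getuser()
    return drive / "Backup" / computer / user
```

Because each backup lands in its own sub-folder, two people can run their scripts against the same pair of drives without ever overwriting each other's files.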

Conclusion

My objective was not to write the definitive guide to backups, but just to give a list of important things to consider. In the last few years some people in my family (myself included), colleagues and friends have suffered data loss due to inadequate backup procedures.

I know it was quite a long wall of text, but I hope at least some of you will try to remedy your situation, and will not be among those crying out loud that they have lost all the photos of their child.


1. If not by the fire, then by the water used by the fire department.
2. If only because nobody cares about your Super 8 movies showing the first steps of your baby, the pictures of your holidays in Iceland, or the 20-year archive of your insurance statements: the usability/encumbrance ratio is not worth the hassle, better to steal your mobile phone, DVD player, digital frame, flat-screen TV and laptop.
3. Where your mail, the online shopping receipts, holiday photos, scans of important documents, the HD movie of your brother's wedding, the first three chapters of your book, your complete address list, the bookings for the summer holidays, etc. are all on the same machine, spread over various folders.
4. Don't skip the part about confidential data.
5. Possible suggestions are to use password-protected archives or a small TrueCrypt container. The topic would deserve a full article, so I will not go into more detail here.
6. That's why distributed source control systems and peer-to-peer software are so good at keeping stuff alive on the internet.
7. Don't expect to use current technology to make your own digital time capsule. When people open it in 50 or 100 years, most probably everything will be unreadable, and the equipment you judiciously bundled with the data will be unusable (dead capacitors or batteries, corroded connections). Your best bet is to get a laboratory to print out your photos.
8. RAID (Redundant Array of Inexpensive Disks) makes your main storage more reliable, faster, or both. And that's it. Nothing else.
9. Do you really think your backups will still be working if a giant meteorite hits the earth? Do you think it would actually matter?
10. Network Attached Storage, Windows Home Server.
11. Uninterruptible Power Supply; you can read more about it in a previous post.