
I’ve been thinking a lot about backups lately. Yesterday, blogger Stephen Foskett put up a post on his Nirvanix blog, Enterprise Storage Strategies, that got me thinking about the subject some more. In seeking to answer the seemingly simple question, “What is a Backup?”, he gathered input from several industry experts and came up with a brand-new working definition of the term.
Here’s my take. Almost all traditional backups today are based on a model developed for 1990s Unix machines. A backup software agent ran on the Unix machine and passed the files to be backed up to a media server, and the media server packaged up those files to write to a tape drive. Why package up files - which users and applications can understand perfectly well - into proprietary backup software formats? Primarily because, to use tape efficiently, you needed large files that could keep a tape drive streaming at a reasonable rate.
In the process, you ended up with tapes that held “backup data” in a format that only the backup software could read. Great for backup vendors, not so great for users. In the modern world, many people don’t have Unix machines or tape drives, and the open question is whether they really need that backup software and media server at all. All sorts of vendors bend over backwards to build products - like disk arrays with virtual tape library interfaces - so that their new technology can look like old tape technology. Backup vendors still package up files in “saveset” file formats even though they are no longer writing to tape drives that have to be kept streaming.
The scale of data has also changed dramatically. In the ’90s, 1 terabyte was a huge amount of data. Now you can buy that at Fry’s for 200 bucks. At this point, we’re in the petabyte era, and there’s no sign that data growth is slowing. Technology designed for ’90s-scale data is bound to start falling apart.
Here’s the obvious question, then: Why not just move files that are candidates for being backed up to a separate tier of storage, keeping them as files in their native format and organizing them in time-coherent views? Sure, you can restore whole volumes or directories, but with a little search engine capability, end users can find and restore any file, from any point in time, themselves, without backup software or storage admins. This also makes it much more straightforward to integrate intelligent dedupe, compression, and other mechanisms for storing large amounts of backup data and files efficiently.
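To make the idea concrete, here is a minimal sketch in Python of what “native-format files in time-coherent views” could look like. It is an illustration, not a description of any product: the `snapshot` function, the one-directory-per-label layout, and the use of hard links as a crude file-level stand-in for dedupe are all assumptions of mine. It also assumes that snapshot labels sort chronologically (ISO dates, say) and that the backup tier is a single filesystem that supports hard links.

```python
import filecmp
import os
import shutil
from pathlib import Path


def snapshot(source: Path, backup_root: Path, label: str) -> Path:
    """Create backup_root/label as a browsable, native-format copy of `source`.

    Files identical to the copy in the most recent earlier snapshot are
    hard-linked instead of copied, so unchanged data is stored only once.
    Hypothetical helper: assumes labels sort chronologically.
    """
    backup_root.mkdir(parents=True, exist_ok=True)
    earlier = sorted(p for p in backup_root.iterdir() if p.is_dir())
    previous = earlier[-1] if earlier else None

    dest = backup_root / label
    dest.mkdir()  # raises FileExistsError if this label was already used

    for src_file in source.rglob("*"):
        if not src_file.is_file():
            continue  # this sketch skips empty directories and special files
        rel = src_file.relative_to(source)
        target = dest / rel
        target.parent.mkdir(parents=True, exist_ok=True)

        old = previous / rel if previous is not None else None
        if old is not None and old.is_file() and filecmp.cmp(src_file, old, shallow=False):
            os.link(old, target)            # unchanged: share the stored bytes
        else:
            shutil.copy2(src_file, target)  # new or changed: copy with metadata
    return dest
```

Every snapshot directory is then an ordinary tree of ordinary files. Restoring a file from a given point in time is a plain copy out of the directory for that date, and any search tool that can index a filesystem can index the backups.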
By making backups sets of files in their native file format, as opposed to backup software-specific saveset files, it also starts to become possible to blur and merge the distinction between backups and archives. Retention policies and compliance and regulatory rules can be applied to files based on their metadata, their contents, their owners, their business context - because all of that information is there in the backup file set just the way it was in the primary copy of the data. This may all be years away, since people are very conservative about changes in their backup technology and workflows. But the technology to do next-gen backups, designed and built to look and act like modern technology instead of masquerading as tape, is here and ready today.
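As a rough illustration of that last point, the sketch below applies a retention rule directly to the files in a backup set, using nothing but ordinary filesystem metadata. The `RetentionRule` class, the `expired_files` function, and the choice of file extension and modification time as the policy inputs are hypothetical; a real policy would presumably also look at owners, contents, and business context.

```python
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class RetentionRule:
    """Hypothetical rule: keep files with this suffix for `keep_days` days."""
    suffix: str      # file extension the rule applies to, e.g. ".log"
    keep_days: int


def expired_files(snapshot_dir: Path,
                  rules: Iterable[RetentionRule],
                  now: Optional[float] = None) -> List[Path]:
    """Return the files under `snapshot_dir` that are past their retention period.

    Age is taken from the file's modification time. Files that match no
    rule are never reported, i.e. they are kept indefinitely.
    """
    now = time.time() if now is None else now
    keep_days_by_suffix = {rule.suffix: rule.keep_days for rule in rules}

    expired = []
    for path in sorted(snapshot_dir.rglob("*")):
        if not path.is_file():
            continue
        keep_days = keep_days_by_suffix.get(path.suffix)
        if keep_days is None:
            continue  # no rule for this kind of file
        age_days = (now - path.stat().st_mtime) / 86400
        if age_days > keep_days:
            expired.append(path)
    return expired
```

The function only reports candidates; what happens to them (deletion, a move to an archive tier, a legal hold check) is a separate policy decision. None of this is possible when the files are sealed inside a saveset that only the backup software can open.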





