Comment Is a copy of a file a backup? A group of storage bloggers from EMC, Nirvanix, Ocarina and other locations in the storage blogosphere have been debating this topic and have generally agreed that it is, so long as it meets certain criteria.
The background is that a backup file is traditionally a container file, holding encoded target files in a way that maximises streaming to tape and doesn't optimise restorability. But the essence of a backup is neither that it is a container file nor that it resides on tape, according to this group.
A Technical Business Consultant at EMC, Scott Waterhouse, described the essence of a backup file as "an instance of data to restore from." He added three other attributes:
- "It resides on a piece of storage on a different array and/or in a different location than the source data,"
- Its creation, ageing and disposition are managed by an application, and it is in a different format from the source files,
- "At some point in its lifecycle, the backup must move offsite."
I've simplified his points for the purposes of the discussion, which then moved on. It's not enough simply to have a copy of a file as a backup. If the copy is on the same disk or system, it could go down with that system, and if you have hundreds or thousands of copies of individual files, there is no practical way of managing them without a management application. Copies, on their own, are just not effective.
Steve Foskett, the Consulting Director from Nirvanix, expanded this in his blog. To Waterhouse's three criteria above he added three more. Firstly: "The sole purpose of a backup is to allow for restore or recovery of data in whole or part. It is not appropriate to rely on a backup for other purposes."
Secondly: "Recoveries normally seek a coherent point in time representation, even if the backup system copies data more frequently or through incremental or differential techniques," and, thirdly: "The existence of the backup should not affect the performance or usability of the primary data set."
Foskett concluded: "A backup is something special. It exists outside the realm of production, waiting to present a set of data on demand."
Ocarina's George Carter, VP for products, then posed this question: "Why not just move files that are candidates for being backed up to a separate tier of storage, keeping them as files in their native format, and organizing them in time coherent views?"
He said that "you can restore whole volumes or directories," and users can restore files themselves from any point in time using a search engine. No backup software is needed to do this, which in turn makes it simpler to deduplicate and compress the filestore holding the copied files.
You can use this filestore as an archive as well, and apply compliance and regulatory rules to it as policies. It's understood by Carter, I think, that Waterhouse's application to manage the creation, ageing and disposition of the copied files would be implemented in this filestore.
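To make Carter's idea concrete, here is a minimal sketch of copy-based protection in Python. It is an illustration only, not anything the bloggers or their companies published: the function names `snapshot`, `restore` and `views`, and the idea of one labelled directory per point in time, are assumptions made for this example. Files are copied in their native format into a separate tier, each copy run forms a time-coherent view, and a single file can be pulled back from any view without backup software. Deduplication, retention policy and offsite movement are deliberately left out.

```python
import shutil
from pathlib import Path
from typing import List


def snapshot(source: Path, tier: Path, label: str) -> Path:
    """Copy everything under `source` into tier/label, in native file format.

    `label` names the point-in-time view (hypothetical naming scheme).
    Raises FileExistsError if a view with that label already exists.
    """
    view = tier / label
    shutil.copytree(source, view)
    return view


def views(tier: Path) -> List[str]:
    """List the available point-in-time views, sorted by label."""
    return sorted(p.name for p in tier.iterdir() if p.is_dir())


def restore(tier: Path, label: str, relative: str, target: Path) -> Path:
    """Restore one file from the view `label` into `target`."""
    src = tier / label / relative
    dest = target / relative
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)  # copy2 also preserves timestamps where possible
    return dest
```

Because each view is an ordinary directory of ordinary files, a user could browse or search it directly. To meet Waterhouse's criteria, the tier would still have to sit on different storage from the source and be managed by some application that ages and disposes of old views.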
Carter believes that this is technically feasible today, but that inherently cautious and conservative users will stick with their known and trusted, albeit sometimes unloved, backup software and infrastructure for some time yet.
Foskett took this further, agreeing that such file copies could function as backups and also provide other advantages. Because the data is in its original format, restoration is simpler and faster, and indexing is easier. It's also easier for new applications to use the data, for example, for data mining, and replication or other data moving requirements are more simply met.
Foskett says that copy-based backup is now a default paradigm for consumers, citing Apple's Time Machine and EMC's Mozy. He reckons that the various forms of disk-based business data protection, such as virtual tape libraries (VTL) and online backup, are combining to make copy-based backup a best practice. He hopes that the spread of offsite (copy-based) protection will be "the killer app that will drive the final nail into the coffin of traditional backup."
The three main contributors to this blog discussion are all representatives of disk-based (or cloud-based) data protection vendors and so the discussion is self-serving from that point of view. However, with backup software vendors embracing disk-to-disk protection, Symantec with OST for example, and tape hardware vendors also embracing disk-based protection more and more - see Quantum's DXi2500-D announcement - there does seem to be a growing bandwagon that will eventually roll over tape and roll it right out of the door. ®