Deep dive: El Reg has teamed up with the Storage Networking Industry Association (SNIA) for a series of deep dive articles. Each month, the SNIA will deliver a comprehensive introduction to basic storage networking concepts. This month, the SNIA examines archiving in today's data centres.
What is an archive?
An archive is a collection of data objects, perhaps with associated metadata, in a storage system whose primary purpose is the long-term preservation and retention of that data (definition from the 2011 SNIA Europe Dictionary). One of the longest-standing trends in the storage market is constant data growth, and archives help address this challenge by moving unused data off primary and backup storage.
One of the biggest challenges for today’s end users is understanding which data is critical to their business and must be preserved (and for how long), and which can be deleted, usually in line with Service Level Agreements (SLAs) and corporate and/or government regulations.
‘Keeping everything forever’ is not an effective strategy: storing all data is expensive and time-consuming, and it doesn’t necessarily help if you don’t have a suitable discovery and retrieval process in place.
However, retaining the relevant data is key to compliance. So although data retention can be a headache, the archiving market will continue to grow as organisations need to manage their information better and be able to produce it on demand.
What an archive is not
There is significant confusion among end users about the various elements of a data storage strategy and infrastructure, and it is vital that they understand, for example, that a data archive is not the same as a backup, data protection or simple storage tiering.
A backup, for instance, contains both active and inactive data that users typically need to keep only for the short term and that can be periodically overwritten.
A traditional backup can be thought of as a collection of data stored on (usually removable) non-volatile storage media for recovery purposes, should the original data be lost or become inaccessible through accidental deletion, inadvertent modification, hardware failure or procedural error. To be recoverable, a backup must copy the source data while it is in a consistent state.
An archive, on the other hand, is about storing data that users need to retain and access over the long term for analysis, value generation, history or compliance.
Archives are also sometimes confused with Hierarchical Storage Management (HSM). HSM, however, is used to reclaim primary storage, and works by migrating data between different platforms according to policies based on usage or age.
When organisations need to move older data to lower-cost tiers to reclaim storage, the storage is physically partitioned into multiple distinct classes based on price, performance or other attributes; in a tiered storage implementation, data may be moved dynamically among these classes based on access activity or other considerations.
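To make the distinction concrete, here is a minimal Python sketch of the kind of age-based tiering policy described above. The tier names, the 30- and 180-day thresholds and the FileRecord type are all hypothetical. Note what the policy lacks: there is no notion of a retention period or of disposition, which is exactly why tiering on its own is not an archive.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical tiers, ordered fastest to cheapest, with the maximum time since
# last access that each tier will hold. Anything older falls through to tape.
TIER_POLICY = [
    (timedelta(days=30), "performance-disk"),
    (timedelta(days=180), "high-density-disk"),
]
DEFAULT_TIER = "tape"


@dataclass
class FileRecord:
    path: str
    last_access: datetime
    tier: str


def target_tier(last_access: datetime, now: datetime) -> str:
    """Pick a tier purely from how long ago the file was last accessed."""
    idle = now - last_access
    for max_idle, tier in TIER_POLICY:
        if idle <= max_idle:
            return tier
    return DEFAULT_TIER


def plan_migrations(files, now):
    """Return (path, current_tier, target_tier) for every file in the wrong tier."""
    moves = []
    for f in files:
        dest = target_tier(f.last_access, now)
        if dest != f.tier:
            moves.append((f.path, f.tier, dest))
    return moves
```

Running plan_migrations periodically and carrying out the moves is the whole policy; nothing in it says how long data must be kept or when it may be deleted.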
The challenge for an HSM approach is that primary storage is overflowing with aged data: according to IDC, 95 per cent of data is unstructured, and such data is rarely accessed after creation.
This is often why existing data management policies cannot cope with data growth; for example, backing up static data usually requires longer backup windows.
However, users don’t usually address this issue because they lack an understanding of the value of their data. This strongly suggests a widespread need to change the mindset of storage administrators: to gain better visibility into access patterns, you would need to revise and update your internal processes.
In addition, archives need to satisfy specific regulatory requirements related to data retention, disposal and change history. According to the SNIA Data Management Forum’s "100 Year Archive Task Force Requirements Survey", most organisations have long-term retention requirements that exceed 50 years, and the respondents are far from confident that they can meet them.
Not only is there a disconnect between awareness and requirements but, as noted above, IT also lacks the methods (and often the interest) for long-term data preservation.
Graphic from SNIA Tutorial: Archiving for Data Protection for the Modern Data Center by authors Tony Walker, Dell, Inc. and Molly Rector, Spectra Logic © 2010 Storage Networking Industry Association. All Rights Reserved
Why archiving is important
Archiving is as important as the data it stores. There are typically two reasons why users archive: because they want to or because they have to. In either case, they must determine their retention requirements when the archive is created.
For example, if some data is valuable to the business, policies and software must be designed and implemented to migrate it from the production environment into the archive. If, on the other hand, data must be retained for compliance reasons, archive policies will typically require legal approval.
When data is archived, it is moved out of the production environment and into the archive, which helps guarantee compliance. The archive environment spans all storage tiers and provides long-term data retention and disposition; it is not just another storage tier.
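What distinguishes an archive move from a copy is that data leaves production only once it is safely in the archive. Below is a minimal sketch, assuming a plain filesystem stands in for both environments; the archive_move function and the directory layout are hypothetical. It copies the file, verifies the copy with a SHA-256 checksum, and only then deletes the production copy.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_move(src: Path, archive_root: Path) -> Path:
    """Move src into the archive: copy, verify, and only then delete the original."""
    archive_root.mkdir(parents=True, exist_ok=True)
    dest = archive_root / src.name
    if dest.exists():
        raise FileExistsError(f"{dest} is already archived")
    expected = sha256_of(src)
    shutil.copy2(src, dest)  # copies contents and timestamps
    if sha256_of(dest) != expected:
        dest.unlink()  # discard the bad copy; production data is untouched
        raise OSError(f"archive copy of {src} failed verification")
    src.unlink()  # the data now lives only in the archive
    return dest
```

Ordering is the design choice here: a failure at any point before the final unlink leaves the production copy intact.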
When designing an archiving strategy, an organisation should treat it as part of a broader data management view; data should not simply be moved from one place to another (see ‘keeping everything forever’ above). Retention periods and disposition policies for all business-critical data should be established at the outset, not as an afterthought. In other words, information should be classified on creation, and a copy of all the necessary records should be archived appropriately at that point.
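Classifying on creation can be as simple as refusing to accept a record unless its class already has a retention period attached, so that disposition dates are known from day one. The sketch below assumes hypothetical record classes and retention periods; real values would come from business requirements, regulations and legal review.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical record classes and retention periods in years. Real values come
# from business requirements, regulations and legal review.
RETENTION_YEARS = {
    "invoice": 7,
    "email": 3,
    "engineering-drawing": 50,
}


@dataclass(frozen=True)
class ArchivedRecord:
    record_id: str
    record_class: str
    created: date

    def dispose_after(self) -> date:
        """Date from which the record may be disposed of."""
        years = RETENTION_YEARS[self.record_class]
        try:
            return self.created.replace(year=self.created.year + years)
        except ValueError:  # 29 February in a non-leap target year
            return self.created.replace(year=self.created.year + years, day=28)


def classify_on_creation(record_id: str, record_class: str, created: date) -> ArchivedRecord:
    """Refuse any record whose class has no retention policy defined up front."""
    if record_class not in RETENTION_YEARS:
        raise ValueError(f"no retention policy for class {record_class!r}")
    return ArchivedRecord(record_id, record_class, created)


def due_for_disposition(records, today: date):
    """IDs of records whose retention period has expired."""
    return [r.record_id for r in records if today >= r.dispose_after()]
```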
What are the benefits of archiving?
A data archive brings several benefits to an organisation, ultimately resulting in a more cost-effective, higher-performance IT infrastructure. More specifically, an archive allows users to meet compliance requirements by preserving data, and it simplifies data management, backup/recovery and disaster recovery operations.
How should archiving be implemented?
One answer is an active archive. An active archive typically holds unstructured data that still contains production data, such as Microsoft Office files and documents, video/audio files, email PST files and CAD/CAM files. It combines open systems applications with disk and tape hardware to give users online access to ALL of their data, and an effortless means to store and manage it.
Source: SNIA 100 Year Archive Requirements Survey © 2010 Storage Networking Industry Association. All Rights Reserved
Active archive is a method used for storing, preserving and guaranteeing access to data across a combined infrastructure of multiple storage platforms, including performance disk, high-density disk and tape, using advancements in data management software to maintain end user accessibility to that data regardless of the storage device it is residing on.
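The core idea is that users address data by a single logical name while software tracks which platform it sits on. Here is a toy sketch of that catalog idea; the ActiveArchive and Tier classes are hypothetical, and a tape recall is reduced to a counter rather than a real library operation.

```python
from dataclasses import dataclass, field


@dataclass
class Tier:
    name: str
    online: bool  # False models tape that must be mounted before reading
    objects: dict = field(default_factory=dict)


class ActiveArchive:
    """One logical namespace over several storage tiers (toy model)."""

    def __init__(self, tiers):
        self.tiers = {t.name: t for t in tiers}
        self.catalog = {}  # logical path -> name of the tier holding it
        self.recalls = 0   # how many reads had to wait for offline media

    def store(self, path: str, data: bytes, tier_name: str) -> None:
        self.tiers[tier_name].objects[path] = data
        self.catalog[path] = tier_name

    def migrate(self, path: str, tier_name: str) -> None:
        """Move data between tiers; the logical path users see does not change."""
        data = self.tiers[self.catalog[path]].objects.pop(path)
        self.store(path, data, tier_name)

    def read(self, path: str) -> bytes:
        tier = self.tiers[self.catalog[path]]
        if not tier.online:
            self.recalls += 1  # stands in for mounting a tape and staging the data
        return tier.objects[path]
```

A mailbox file can be stored on performance disk, migrated to tape, and still be read back by the same path; only the recall count reveals that it moved.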
There are several basic requirements for active archive storage today. It must integrate easily with archive management applications, making data simple to access regardless of the type of storage it resides on. It must be power-efficient, high-density, reliable and low-cost, offer fast throughput, and be backed by a commitment to a long-term roadmap.
Many companies would cope with this challenge by storing more data on disk, but even less expensive, high-capacity disks tend not to be as cost-effective as tape, either in cost/capacity ratio or in operating costs (space, power and cooling).
Recent developments in tape technology mean we should consider viewing a tape archive as a reliable extension of the archive infrastructure. Even more reliable are intelligent libraries that proactively alert you when media, drive or other hardware issues are developing.
If we look at disk drive utilisation, we see that up to 70 per cent of the capacity of every disk drive installed today is misused:
- 40 per cent of data inert
- 15 per cent allocated but unused
- 10 per cent orphan data
- 5 per cent contraband data.
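The four categories add up to the 70 per cent figure. A trivial sketch that applies the article's percentages to a hypothetical installed capacity:

```python
# Share of installed disk capacity in each category, as quoted in the article.
MISUSE_PERCENT = {
    "inert data": 40,
    "allocated but unused": 15,
    "orphan data": 10,
    "contraband data": 5,
}


def misused_capacity(installed_tb: float) -> dict:
    """Split an installed capacity (in TB) across the article's misuse categories."""
    breakdown = {k: installed_tb * pct / 100 for k, pct in MISUSE_PERCENT.items()}
    breakdown["total misused"] = installed_tb * sum(MISUSE_PERCENT.values()) / 100
    return breakdown
```

For a hypothetical 200 TB estate, misused_capacity(200) attributes up to 140 TB to these categories, 80 TB of it inert data.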
It is generally accepted that disk storage accounts for between 20 and 40 pence of every £1 spent on IT hardware, and this trend is accelerating. Today’s tape technologies are a fundamental part of an active archive approach, as tape reliability has increased by 700 per cent in the past ten years thanks to advances such as:
- Error correction codes
- Advances in the coating of tape film
- Read-after-write data verification
- Drive technology features simplifying tape paths and servo tracking systems.
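Read-after-write verification means the drive reads each block back immediately after writing it, and rewrites it further along the tape if the check fails. The sketch below models that loop in software. SimulatedTape is a hypothetical stand-in for a drive, CRC-32 stands in for the drive's own error-detection codes, and real drives perform this in firmware rather than in application code.

```python
import zlib


class SimulatedTape:
    """Toy append-only medium. Writes at `bad_positions` are stored as zeroes
    to mimic a damaged patch of tape."""

    def __init__(self, bad_positions=()):
        self.blocks = []
        self.bad_positions = set(bad_positions)

    def write(self, data: bytes) -> int:
        pos = len(self.blocks)
        self.blocks.append(bytes(len(data)) if pos in self.bad_positions else data)
        return pos

    def read(self, pos: int) -> bytes:
        return self.blocks[pos]


def write_verified(tape: SimulatedTape, data: bytes, max_attempts: int = 3) -> int:
    """Write a block, read it straight back, and rewrite further along on mismatch."""
    expected = zlib.crc32(data)
    for _ in range(max_attempts):
        pos = tape.write(data)
        if zlib.crc32(tape.read(pos)) == expected:
            return pos  # verified: the block is known to be readable
    raise OSError(f"block not verified after {max_attempts} attempts")
```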
Gone are the days of not knowing whether a tape has been used beyond the manufacturer’s thresholds, whether media has suffered environmental damage, whether failures are drive- or media-related, or whether data is actually on the tape. In today’s datacentre, don’t put all your eggs in one basket: protect your data by archiving.
The SNIA’s work has established a standard definition of data archiving in enterprise environments and of the benefits an optimised approach can achieve. Understanding the stages and subcategories of the data life cycle helps you decide where and how your data should be stored.
Examining the guidelines and policies for an effective archiving system shows how companies can optimise their storage networks and reduce the bandwidth required for replication and disaster recovery, both of which are necessary to satisfy data integrity and regulatory compliance requirements. For more information on this subject, visit the SNIA Europe website, where the technology section has information on SNIA tutorials and technology-specific data.
This article was written by Walter Moriconi, an SNIA Europe Board member who works for Oracle.
For more information on this topic, visit: www.snia.org and www.snia-europe.org. To download the tutorial and see other tutorials on this subject, please visit: http://www.snia.org/education/tutorials/2010/fall
About the SNIA
The Storage Networking Industry Association (SNIA) is a not-for-profit global organisation, made up of some 400 member companies spanning virtually the entire storage industry.
SNIA's mission is to lead the storage industry worldwide in developing and promoting standards, technologies, and educational services to empower organisations in the management of information. To this end, the SNIA is uniquely committed to delivering standards, education, and services that will propel open storage networking solutions into the broader market.
About SNIA Europe
SNIA Europe educates the market on the evolution and application of storage infrastructure solutions for the data centre through education, knowledge exchange and industry thought leadership. As a Regional Affiliate of SNIA Worldwide, we represent storage product and solutions manufacturers and the channel community across EMEA.