Where do you put your archive data, those masses of unstructured and semi-structured files that you can't throw away but must keep online just in case? Should it go on bulk SATA disk tiers in primary drive arrays or on dedicated archive arrays?
Permabit says put it on dedicated, highly intelligent arrays built from commodity SATA drives: a Permabit Enterprise Archive system such as its recently announced 4010. Such centralised bulk archive storage is sharable, cheap at less than $1/GB, can hold petabytes of data, and can be reliably secured and protected.
We talked to Permabit's chief technology officer Jered Floyd about the 4010 and its capabilities. It seems to us that the starting point for a Permabit archive is value: it can cost $40 to $50/GB to store data on primary arrays, and 60 to 80 per cent of that data is static semi-structured and unstructured information.
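To see why those two ranges matter, here is a back-of-the-envelope sketch of the average cost per GB once the static fraction moves to an archive tier. It assumes flat per-GB costs, takes Permabit's "less than $1/GB" as exactly $1/GB, and ignores migration software, networking and retrieval costs; the article doesn't say whether the $40 to $50/GB is purchase price or total cost of ownership. `blended_cost_per_gb` is a hypothetical helper name.

```python
def blended_cost_per_gb(primary_cost: float, archive_cost: float, static_fraction: float) -> float:
    """Average $/GB after the static fraction of data moves to an archive tier."""
    return (1 - static_fraction) * primary_cost + static_fraction * archive_cost

ARCHIVE_COST = 1.0  # assumption: upper bound of the "less than $1/GB" claim

# Both ends of the quoted ranges: ($/GB on primary arrays, fraction of data that is static).
for primary_cost, static_fraction in [(40.0, 0.6), (50.0, 0.8)]:
    blended = blended_cost_per_gb(primary_cost, ARCHIVE_COST, static_fraction)
    saving = 1 - blended / primary_cost
    print(f"${primary_cost:.0f}/GB, {static_fraction:.0%} static: "
          f"blended ${blended:.2f}/GB, {saving:.1%} lower")
```

On these assumptions the average cost falls by roughly 58 to 78 per cent, almost all of it from no longer keeping static files on $40 to $50/GB disk.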
Permabit makes archive silos for unstructured data that are accessible through the standard NFS and CIFS protocols, rather than the specialised APIs that characterise some content management systems. Getting data into the archive requires a data-moving application: Permabit provides the archival storage platform, not archiving software, and partners with vendors such as Atempo, CommVault, Symantec, ZL Technologies and others for specific archiving applications. End users can also send files directly to the archive using file access protocols.
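Because the archive simply appears as an NFS or CIFS share, the most basic "data mover" is a script that copies cold files onto the mounted share and removes them from primary storage. The sketch below assumes the archive is already mounted at a local path (the `/mnt/permabit/...` path in the usage note is hypothetical) and uses last-modified time as the only policy. Commercial archiving products such as those named above add stubbing, indexing and policy engines; this shows only the underlying file move.

```python
import os
import shutil
import time
from typing import List, Optional

def archive_stale_files(src_root: str, archive_root: str, max_age_days: float,
                        now: Optional[float] = None) -> List[str]:
    """Move files not modified for max_age_days from src_root to archive_root.

    archive_root is assumed to be a mounted NFS/CIFS archive share.
    Returns the relative paths of the files that were moved.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    moved = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for fname in filenames:
            src = os.path.join(dirpath, fname)
            if os.path.getmtime(src) >= cutoff:
                continue  # still active: leave it on primary storage
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(archive_root, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)  # copies data plus timestamps
            # Only delete the original once the archived copy is the same size.
            if os.path.getsize(dst) == os.path.getsize(src):
                os.remove(src)
                moved.append(rel)
    return moved
```

A call such as `archive_stale_files("/data/projects", "/mnt/permabit/projects", 180)` would move anything untouched for six months, keeping the directory layout intact on the archive share.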
The store's scalability comes from separating the access functions from the storage functions, so each can be scaled independently. The default 4010 access node is a high-availability pair of quad-core Xeon 5400 servers running specialised software that presents NFS and CIFS personalities and fingerprints incoming data. If higher access performance is needed, additional access nodes can be added. The access nodes also carry out encryption and decryption.
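Permabit doesn't say how its access nodes fingerprint data. A common approach in deduplicating stores, shown here as an illustration only, is to split each file into chunks, hash each chunk, and keep a chunk only if its hash hasn't been seen before; each file then becomes a list of fingerprints. The fixed 4KB chunk size, SHA-256, and the `FingerprintStore` class are all assumptions made for this sketch.

```python
import hashlib

CHUNK_SIZE = 4096  # assumption: fixed-size chunks; real systems may use variable-size chunking

class FingerprintStore:
    """Toy content-addressed store: identical chunks are stored only once."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}      # fingerprint -> chunk bytes
        self.files: dict[str, list[str]] = {}   # file name -> ordered fingerprints

    def put(self, name: str, data: bytes) -> None:
        recipe = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            fp = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(fp, chunk)  # store the chunk only if unseen
            recipe.append(fp)
        self.files[name] = recipe

    def get(self, name: str) -> bytes:
        return b"".join(self.chunks[fp] for fp in self.files[name])

store = FingerprintStore()
report = b"A" * 8192 + b"quarterly figures"
store.put("q1/report.doc", report)
store.put("backup/report-copy.doc", report)  # duplicate file adds no new chunks
```

Here both files share the same two unique chunks (the repeated 4KB run of `A`s and the short tail), which is where deduplication's capacity gains come from.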
Each storage node has a quad-core Xeon 5400 processor and four 1TB SATA hard drives, a ratio of one core per drive. Up to 36 storage nodes can be aggregated into a grid with a raw capacity of 144TB, and up to 32 grids can be combined to provide 4.6PB of raw storage capacity.
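The capacity figures above follow directly from the node configuration. The sketch below reproduces them, assuming decimal units (1PB = 1,000TB), which is how 32 grids of 144TB come out at 4.6PB. It also confirms that at maximum scale the one-core-per-drive ratio gives as many storage-node cores as drives.

```python
TB_PER_DRIVE = 1
DRIVES_PER_NODE = 4
CORES_PER_NODE = 4          # one quad-core Xeon 5400 per storage node
MAX_NODES_PER_GRID = 36
MAX_GRIDS = 32

node_raw_tb = TB_PER_DRIVE * DRIVES_PER_NODE       # 4 TB per node
grid_raw_tb = node_raw_tb * MAX_NODES_PER_GRID     # 144 TB per full grid
system_raw_tb = grid_raw_tb * MAX_GRIDS            # 4,608 TB across 32 grids

total_drives = DRIVES_PER_NODE * MAX_NODES_PER_GRID * MAX_GRIDS
total_cores = CORES_PER_NODE * MAX_NODES_PER_GRID * MAX_GRIDS

print(f"{system_raw_tb / 1000:.1f} PB raw, {total_drives} drives, {total_cores} storage-node cores")
```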
These storage nodes carry out data indexing and routing. They are self-healing and provide virtualised storage management. Having a Xeon 5400 core per drive gives Permabit a seriously high ratio of processing power to data storage, which means it can do a lot to protect drives and nodes and to virtualise the storage pool. "Having one core per drive provides substantial benefits in reducing the impact of rebuild performance," Floyd said.
Permabit can also ride the Intel processor development curve. Asked about this, Floyd said: "The price/power/performance ratios for multi-core processors means it makes sense to include them in the storage nodes to support future software innovations."
What about Nehalem? "We plan to use Nehalem technology in our next platform later this year," Floyd said. Great. If Nehalem delivers the expected doubling of performance, there will be a substantial amount of additional processing headroom, and current Permabit archive operations will run faster.
Filesystem and protection
Permabit has a global file system that can span an entire grid, or customers can have separate file systems within each grid. What about spanning multiple grids? "We currently rely on third-party virtualisation solutions to provide the global view of our system," Floyd said. "We are working on a future release that will span the file system across all grids...on integrating global file system technology into our product."
The idea is to make the use of Permabit's archival storage as transparent as possible to end users and applications. That would give the Nehalem processors something to chew on.
There is a RAID-EC protection scheme. How does it work? "It's actually RAIN (redundant array of independent nodes) rather than RAID. RAIN-EC leverages a patented erasure coding algorithm that allows for up to two simultaneous node failures, including any number of drives in those nodes. This is protection per grid, so multiple grids allow for multiple simultaneous failures in each grid."
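Permabit's RAIN-EC algorithm is patented and not described, so the sketch below swaps in a different, textbook technique to show the general idea of surviving two simultaneous failures: RAID-6-style P+Q parity over GF(2^8). Each "node" holds one data block; two extra parity blocks, P (plain XOR) and Q (XOR weighted by powers of the generator 2), let the stripe be rebuilt after any two data blocks are lost. For brevity it handles only the loss of two data blocks, not parity blocks, and it says nothing about how Permabit actually places data and parity across nodes.

```python
from typing import List, Optional, Tuple

def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) using the reducing polynomial 0x11d."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result

def gf_pow(a: int, n: int) -> int:
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result

def gf_inv(a: int) -> int:
    return gf_pow(a, 254)  # a^254 is the inverse of any non-zero a in GF(2^8)

def encode(blocks: List[bytes]) -> Tuple[bytes, bytes]:
    """Compute P (XOR of all blocks) and Q (XOR of 2^i * block i)."""
    size = len(blocks[0])
    p, q = bytearray(size), bytearray(size)
    for i, block in enumerate(blocks):
        coeff = gf_pow(2, i)
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)

def recover_two(blocks: List[Optional[bytes]], x: int, y: int,
                p: bytes, q: bytes) -> Tuple[bytes, bytes]:
    """Rebuild lost data blocks x and y from the survivors plus P and Q."""
    size = len(p)
    survivors = [bytes(size) if i in (x, y) else b for i, b in enumerate(blocks)]
    pxy, qxy = encode(survivors)  # parity with the lost blocks treated as zero
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    denom_inv = gf_inv(gx ^ gy)   # non-zero because 2 generates GF(2^8)
    dx, dy = bytearray(size), bytearray(size)
    for j in range(size):
        p_delta = p[j] ^ pxy[j]   # equals dx ^ dy
        q_delta = q[j] ^ qxy[j]   # equals gx*dx ^ gy*dy
        dx[j] = gf_mul(q_delta ^ gf_mul(gy, p_delta), denom_inv)
        dy[j] = p_delta ^ dx[j]
    return bytes(dx), bytes(dy)

# Four equal-sized data blocks, one per "node"; lose nodes 1 and 3 together.
data = [b"node0", b"node1", b"node2", b"node3"]
p, q = encode(data)
rebuilt = recover_two([data[0], None, data[2], None], 1, 3, p, q)
```

The two parity equations give two independent linear equations per byte position, which is exactly enough to solve for two unknown blocks; a third simultaneous loss would need a third parity block.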
What's the usable capacity of the 4010? "Each storage node has 4TB of raw capacity," Floyd said. "A grid has a raw capacity of 144TB. Our combined data protection and file system overhead is approximately 35 per cent, but our technology incorporates deduplication. It is very typical for us to see over 4TB usable per storage node and 144TB usable per grid."
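Floyd's figures imply a minimum deduplication ratio. The sketch below treats the 35 per cent overhead as a flat fraction of raw capacity (an assumption; the real overhead may vary with configuration) and works out how much deduplication is needed for usable capacity to match raw capacity. `usable_tb` and `dedup_needed` are hypothetical helper names.

```python
def usable_tb(raw_tb: float, overhead: float = 0.35, dedup_ratio: float = 1.0) -> float:
    """Logical capacity after protection/file system overhead and deduplication."""
    return raw_tb * (1 - overhead) * dedup_ratio

def dedup_needed(raw_tb: float, target_usable_tb: float, overhead: float = 0.35) -> float:
    """Deduplication ratio required to reach a target usable capacity."""
    return target_usable_tb / (raw_tb * (1 - overhead))

node_before_dedup = usable_tb(4)      # 2.6 TB per node before deduplication
grid_before_dedup = usable_tb(144)    # 93.6 TB per grid before deduplication
ratio = dedup_needed(144, 144)        # about 1.54:1 to get 144 TB usable per grid
```

So seeing "over 4TB usable per storage node" implies data that deduplicates at better than roughly 1.54:1; data that doesn't deduplicate would leave about 2.6TB usable per node.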
How about legal hold? "Permabit Enterprise Archive supports the ability to enforce retention of records on a file-by-file basis. Records are maintained in a non-modifiable, non-erasable manner for the time period specified by the administrator or application."
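The article doesn't describe how retention periods are set through NFS or CIFS, so the following is only a hypothetical model of the behaviour Floyd describes: each file is written once with a retention date, can't be overwritten, and can't be deleted until that date has passed. `WormStore` and `RetentionError` are illustrative names, not Permabit interfaces.

```python
from datetime import datetime, timedelta, timezone

class RetentionError(Exception):
    """Raised when an operation would violate a record's retention period."""

class WormStore:
    """Hypothetical write-once, read-many store with per-file retention dates."""

    def __init__(self) -> None:
        self._records: dict[str, tuple[bytes, datetime]] = {}

    def write(self, name: str, data: bytes, retain_until: datetime) -> None:
        if name in self._records:
            raise RetentionError(f"{name} already exists and cannot be modified")
        self._records[name] = (data, retain_until)

    def read(self, name: str) -> bytes:
        return self._records[name][0]

    def delete(self, name: str, now: datetime) -> None:
        _, retain_until = self._records[name]
        if now < retain_until:
            raise RetentionError(f"{name} is retained until {retain_until.isoformat()}")
        del self._records[name]

# Example: an archived mailbox kept for seven years (illustrative values).
store = WormStore()
written = datetime(2009, 3, 1, tzinfo=timezone.utc)
name = "mail/2009-03.pst"
store.write(name, b"archived mailbox", written + timedelta(days=7 * 365))
```

Keeping the retention date alongside each record is what makes enforcement file-by-file rather than per volume.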
Floyd says more than 250 customers use Permabit archive products, some of them cloud storage providers. "We see ourselves as an arms merchant to cloud storage - we can provide a stunningly low price with exceptional reliability upon which cloud service providers can build their services." A Permabit 4010 costs $250,000 for a 144TB configuration.
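Taking the quoted price at face value, the per-GB arithmetic for this one configuration can be sketched as follows. It counts the list price only (no support, power or archiving software), assumes decimal units (1TB = 1,000GB), and reuses Floyd's approximate 35 per cent overhead; deduplication, which he says typically lifts usable capacity to around 144TB per grid, would bring the effective figure back towards the raw one.

```python
PRICE_USD = 250_000   # quoted price of a 144TB Permabit 4010
RAW_TB = 144          # raw capacity of one full grid
OVERHEAD = 0.35       # Floyd's approximate protection + file system overhead
GB_PER_TB = 1_000     # assumption: decimal units, as drive capacities are quoted

def price_per_gb(capacity_tb: float) -> float:
    return PRICE_USD / (capacity_tb * GB_PER_TB)

raw_price = price_per_gb(RAW_TB)                          # about $1.74 per raw GB
before_dedup = price_per_gb(RAW_TB * (1 - OVERHEAD))      # about $2.67 per GB before deduplication
```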
That's the Permabit pitch: put the unstructured and semi-structured data you have to keep into a purpose-built archive that keeps it secure and protected, at a better price per GB than anything else offering comparable facilities. ®