NetApp is going to add a native object interface to its storage arrays, probably before the middle of next year.
Val Bercovici, the company's cloud czar, revealed this in a blog posting entitled NetApp 3.0. So what is an object interface, as compared to a file interface and a block interface?
In block storage a hard disk drive storage array is told to read or write blocks N to N+some number by a requesting application. The application knows what the data is in those blocks. In file storage the file-based storage array is told to read or write a file, a named collection of data that may contain meta data such as date created, owner, last modification date, size in bytes and so forth. The storage array, filer in this case, decides where to put the data, in which blocks to write it or from which blocks to read it.
An object, like a file, is a set of data. With EMC's Centera, a prominent object storage array, the object's contents are hashed, and a request to read the object must use the resulting hash key to identify it, giving rise to the term content-addressed storage.
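As a rough illustration of content addressing, here is a minimal in-memory sketch in Python. It is not Centera's implementation: the `ContentAddressedStore` class is hypothetical, and the choice of SHA-256 as the hash function is an assumption made purely for the example.

```python
import hashlib


class ContentAddressedStore:
    """Toy in-memory store: each object's key is a hash of its own bytes."""

    def __init__(self):
        self._objects = {}  # hash key -> object contents

    def put(self, data: bytes) -> str:
        # The address is derived from the contents, not chosen by the caller.
        # SHA-256 is an assumption for this sketch.
        address = hashlib.sha256(data).hexdigest()
        self._objects[address] = data
        return address

    def get(self, address: str) -> bytes:
        # A read request must present the hash key to identify the object.
        return self._objects[address]


store = ContentAddressedStore()
address = store.put(b"scanned invoice 0001")
same_address = store.put(b"scanned invoice 0001")   # identical contents
other_address = store.put(b"scanned invoice 0002")  # different contents
```

Identical contents always map to the same address, so storing the same data twice yields one object, while different contents get a different key.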
We send a file to or request a file from Centera. That request is turned into an object identifier and Centera fetches it for us. It builds a meta data file, called a c-clip descriptor file (CDF), when an object is first created and this is the link between the application and the object file. The application knows the address of the CDF, not the object itself. The CDF stores meta data such as time and date, the content address, and meta data supplied by the application.
In a cloud or a content-addressed storage (CAS) system like Centera, objects can be replicated and transferred between sites with new locations mapped into the CDF. Accessing applications tell the system that they want an object with such-and-such a CDF identifier. The cloud system looks at the CDF location, gets the actual object address and then delivers the object to the application.
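The indirection through the descriptor can be sketched as follows. This is a simplified model under stated assumptions, not Centera's actual data structures: the `Descriptor` and `ToyObjectStore` names, the `cdf-N` identifier format, the site names and the use of SHA-256 are all invented for illustration.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Descriptor:
    """Stand-in for a CDF: meta data plus a pointer to the object."""
    content_address: str
    created: str
    app_metadata: dict
    locations: list = field(default_factory=list)  # sites holding a copy


class ToyObjectStore:
    def __init__(self):
        self.sites = {}        # site name -> {content address: bytes}
        self.descriptors = {}  # descriptor id -> Descriptor

    def write(self, site: str, data: bytes, app_metadata: dict) -> str:
        content_address = hashlib.sha256(data).hexdigest()
        self.sites.setdefault(site, {})[content_address] = data
        created = datetime.now(timezone.utc).isoformat()
        descriptor_id = f"cdf-{len(self.descriptors) + 1}"
        self.descriptors[descriptor_id] = Descriptor(
            content_address, created, dict(app_metadata), [site]
        )
        # The application keeps only the descriptor id, not the object address.
        return descriptor_id

    def replicate(self, descriptor_id: str, target_site: str) -> None:
        d = self.descriptors[descriptor_id]
        data = self.sites[d.locations[0]][d.content_address]
        self.sites.setdefault(target_site, {})[d.content_address] = data
        if target_site not in d.locations:
            d.locations.append(target_site)  # new location mapped into the descriptor

    def read(self, descriptor_id: str) -> bytes:
        d = self.descriptors[descriptor_id]
        for site in d.locations:
            blob = self.sites.get(site, {}).get(d.content_address)
            if blob is not None:
                return blob
        raise KeyError(descriptor_id)


cloud = ToyObjectStore()
cdf_id = cloud.write("london", b"x-ray image", {"patient": "12345"})
cloud.replicate(cdf_id, "dublin")
del cloud.sites["london"]      # simulate losing the original site
recovered = cloud.read(cdf_id)  # the application still uses the same id
```

The application's handle never changes when the object is copied or moved; only the descriptor's location list does.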
It's said that an object storage system can scale better than either block storage or filer storage to cope with the billions if not trillions of items that will be stored in a cloud storage system. The hash key, since it is built from the object's contents, can be used to verify that its contents are unchanged, by regenerating a hash key from the current contents and comparing it to the old one. This property is useful for the long-term storage of fixed-content data, the Centera use case.
Unlike a file, which is tied to the accessing system's operating system and file system, an object is independent of both. Of course, the accessing application in a computer has to know how to deal with the object but it does not have to understand and use a file system to identify, locate and access it.
A file name uniquely identifies a single file. A block number identifies a single block, but an object can consist of several entities if the CDF or equivalent structure is set up to do so.
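The verification step amounts to a re-hash and a comparison. A minimal sketch in Python, again assuming SHA-256 as the hash function and using hypothetical helper names:

```python
import hashlib


def content_address(data: bytes) -> str:
    """Build the hash key from the object's contents (SHA-256 assumed)."""
    return hashlib.sha256(data).hexdigest()


def is_unchanged(current_data: bytes, stored_address: str) -> bool:
    """Regenerate the key from the current contents and compare it to the old one."""
    return content_address(current_data) == stored_address


original = b"signed contract, final version"
stored_address = content_address(original)

intact = is_unchanged(original, stored_address)
tampered = is_unchanged(b"signed contract, final version!", stored_address)
```

Here `intact` is `True` and `tampered` is `False`: any change to the contents, however small, produces a different key.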
We note that NetApp already uses a content hashing algorithm in its ASIS deduplication.
The implication of Bercovici's remark (and of the fact that Bercovici himself made it) is that NetApp will offer an object-based storage facility for use by cloud storage providers. Given NetApp's traditions it will likely be integrated into Data ONTAP instead of being supplied as a separate storage product, as NetApp's existing virtual tape library (VTL) is.
Bercovici says we should "look for native object interfaces from NetApp in the not too distant future". We can estimate that this means before the end of 2010, which would imply it will probably be a feature delivered during the life of the next major release of ONTAP.
NetApp is shaping up to use its single, unified storage architecture as a marketing tack, differentiating itself from EMC and other storage vendors that have multiple product lines for different storage functions.
Bercovici also says: "Going forward it’s important to look beyond the controller for dramatic and powerful new storage and data management functionality. Primitive storage controllers will give way to powerful primitives implemented as storage controllers – forming part of a rich system of storage and data management services available in the Cloud."
This could mean that storage and data management operations affecting several individual arrays, with their individual controllers, will be organised by functionality above the controller, in a server linked to the controllers and orchestrating them. ®