Marketing veeps love hockey sticks and the idea that sales growth could accelerate in a way resembling the curve of such a stick on a chart really turns them on.
Hockey sticks are hard to find if you are in object storage, but glimmers can be seen, especially through marketeers' telescopes, and storage biz Caringo thinks, and hopes, that vast vaults of unstructured data will deliver the hockey stick it wants.
Caringo CEO Mark Goros spoke at a press briefing in Sunnyvale and said that CAStor, Caringo's content-addressed store, is in its fifth version and rock-solid.
It is OEMed by Dell for its DX6000 object storage array and that has given Caringo lots of extra heft in the marketplace. Every object storage supplier is convinced that scads of unstructured data are flooding into business filers and overwhelming them. Most of it is fixed and rarely, if ever, needs changing, so there is no need to store it in filers built for storing data that changes often.
Step forward object storage, which says it can store this fixed content more efficiently than filers, with higher performance, and much greater scale. Here's where El Reg starts nit-picking, as these claims are relatively untested and unproven. Most actual implementations of object storage have not been at fantastic scale and have not shown up the inadequacies of filesystem-based storage in performance, efficiency, and scalability terms.
Indeed, with EMC buying Isilon and IBM boosting SONAS performance with flash, object storage looks as if it's struggling to keep up.
Forget the scalability and performance and efficiency points. The key thing is this: object storage is cheaper than filers because you don't need RAID, you don't need fancy array interconnects, and you can use cheap and cheerful JBODs. Yes, it does scale, and yes, filesystems can slow down as they fill up, but until you're storing multiple tens of petabytes these limitations won't necessarily show up.
Nits picked, let's return to Caringo. It has about 400 customers with 100 or so of them coming in through the year-old Dell OEM deal; that's how important Dell is to Caringo.
What a dose of CAStor oil does
CAStor provides a single flat address space for objects, which contain all of a file's data plus all of the system and any user metadata, and are identified by a globally unique 128-bit UUID string.
Objects are written sequentially and a fresh object is written at the end of the current objects on a node's drives. In other words it is appended. Changed objects are written as new objects and the older version of the object marked for delete and space recovery by a background garbage collection process.
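The append-and-collect behaviour described above can be sketched in a few lines of Python. This is a toy illustration only, not Caringo's code: the `AppendOnlyStore` class, its method names and the in-memory list standing in for a node's drives are all assumptions made for the example.

```python
import uuid


class AppendOnlyStore:
    """Toy append-only object log (hypothetical; not CAStor's on-disk format)."""

    def __init__(self):
        # Records are kept in write order; new ones only ever go on the end.
        self.log = []

    def write(self, data: bytes) -> str:
        """Append a fresh object and return its newly generated ID."""
        object_id = uuid.uuid4().hex
        self.log.append({"id": object_id, "data": data, "deleted": False})
        return object_id

    def update(self, old_id: str, data: bytes) -> str:
        """Write the changed object as a new object; mark the old one for delete."""
        new_id = self.write(data)
        for record in self.log:
            if record["id"] == old_id:
                record["deleted"] = True
        return new_id

    def read(self, object_id: str) -> bytes:
        """Return the data of a live object (linear scan, fine for a toy)."""
        for record in self.log:
            if record["id"] == object_id and not record["deleted"]:
                return record["data"]
        raise KeyError(object_id)

    def collect_garbage(self) -> int:
        """Background-style pass: drop marked records, return how many went."""
        before = len(self.log)
        self.log = [r for r in self.log if not r["deleted"]]
        return before - len(self.log)
```

Nothing is ever rewritten in place in this sketch: an update costs one append plus a flag flip, and space only comes back when the collection pass runs.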
Object UUIDs are kept in RAM for extremely fast look-up, and this UUID table is built afresh whenever the system is booted. System metadata holds lifecycle information such as whether the object is immutable or not. An object is contiguous on disk and not split into 4K blocks as with a file system, so reading is faster as it's sequential.
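As a rough picture of how a RAM look-up table might be rebuilt at boot, here is a hedged Python sketch. The record layout (a 16-byte UUID plus a 4-byte length in front of each payload), the `bytearray` standing in for a disk, and the function names are all invented for illustration; Caringo did not describe its on-disk format.

```python
import struct
import uuid

# Hypothetical record header: 16-byte UUID followed by a 4-byte payload length.
HEADER = struct.Struct(">16sI")


def append_record(disk: bytearray, payload: bytes) -> uuid.UUID:
    """Append one contiguous object to the end of the 'disk'."""
    object_id = uuid.uuid4()
    disk.extend(HEADER.pack(object_id.bytes, len(payload)))
    disk.extend(payload)
    return object_id


def build_index(disk: bytearray) -> dict:
    """Boot-time scan: walk the records once and map UUID -> (offset, length)."""
    index = {}
    offset = 0
    while offset < len(disk):
        raw_id, length = HEADER.unpack_from(disk, offset)
        start = offset + HEADER.size
        index[uuid.UUID(bytes=raw_id)] = (start, length)
        offset = start + length
    return index


def read_object(disk: bytearray, index: dict, object_id: uuid.UUID) -> bytes:
    """One dictionary look-up in RAM, then one contiguous read."""
    start, length = index[object_id]
    return bytes(disk[start:start + length])
```

The index holds nothing that cannot be recovered from the records themselves, which is why, in this sketch at least, it can be thrown away at shutdown and rebuilt by a single sequential pass.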
The minimum cluster size is three nodes and the nodes are peers. An object write gets a copy written to a second node for protection. There is no central map of which node has which object. If the cluster loses a node, the system rebuilds its contents from the replicas distributed around the other nodes. That rebuild is itself a re-replication: each object that had a replica on the lost node is copied again to a surviving node.
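The replicate-then-rebuild idea can be modelled with a short Python sketch. It is an assumption-laden toy: the `Cluster` class, the random choice of target nodes and the ask-every-node look-up are stand-ins chosen for the example, since the briefing did not say how CAStor picks nodes or locates objects.

```python
import random
import uuid

REPLICAS = 2  # one original plus one copy, as described in the article


class Cluster:
    """Toy peer cluster with no central object map (hypothetical model)."""

    def __init__(self, node_names):
        # Each node only knows about the objects it holds itself.
        self.nodes = {name: {} for name in node_names}

    def write(self, data: bytes) -> str:
        """Store the object on two distinct, randomly chosen nodes."""
        object_id = uuid.uuid4().hex
        for name in random.sample(sorted(self.nodes), REPLICAS):
            self.nodes[name][object_id] = data
        return object_id

    def holders(self, object_id: str) -> list:
        """Find replicas by asking every node, as there is no central map."""
        return [n for n, objs in self.nodes.items() if object_id in objs]

    def read(self, object_id: str) -> bytes:
        for objs in self.nodes.values():
            if object_id in objs:
                return objs[object_id]
        raise KeyError(object_id)

    def lose_node(self, name: str) -> None:
        """Drop a node, then re-replicate every object it was holding."""
        lost = self.nodes.pop(name)
        for object_id in lost:
            survivors = self.holders(object_id)
            spare = [n for n in self.nodes if n not in survivors]
            if survivors and spare and len(survivors) < REPLICAS:
                target = random.choice(spare)
                self.nodes[target][object_id] = self.nodes[survivors[0]][object_id]
```

With three nodes, which is the stated minimum, losing one still leaves two places to keep the two copies, so every object in this model gets back to full protection.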
A hot object can be replicated in RAM to avoid bottlenecking on spindles.
Caringo and filers
Goros said: "CAStor is biased to storing fixed content that doesn't change much. Changing data is not a use case for us... We are not looking to replace file storage. Our customers tend to be building new applications in medical, government, and the media and entertainment areas ... The original mission of the company was to change the economics of fixed content storage."
Caringo, and by extension object storage in general, is not aiming to replace filesystems with object storage. Instead it aims to provide a more cost-effective alternative to filers for fixed content data.
Some object storage marketing says object storage is simply better than filesystem storage. For example, a Caringo spokesperson thinks CAStor is four times faster than general filers. Another point made is that CAStor doesn't use RAID – and RAID rebuilds are slower than CAStor drive rebuilds.
But there are no actual numbers saying CAStor gets data faster than a file system or rebuilds its drives faster than a RAID rebuild. And we infer that customers are not buying this because it's a faster alternative to filers.
CAStor oil's roadmap
A new version of CAStor, release 5.5, is due in December. With the upgrade, object size is virtually unlimited and objects can span across multiple disks. Maximum object size is currently limited to the size of the disks in the system, effectively 3TB. The release will feature chunked encoding, and there are management improvements, with extended hardware reporting via Net-SNMP.
The content router function gets more robust replication speed and progress reporting. There will be graphical storage reporting covering operational activity, CPU and network loading. The NAS function (content file server) gets high availability features.
In this release, CAStor can be used instead of Swift in the OpenStack system through API integration. Goros says CAStor is enterprise-ready whereas Swift is not.
There is also a version 6.0 due next year. This will introduce so-called smarter objects with a smaller footprint.
Caringo is also working on CAStor Cloud Services to enable CAStor to be used in public, private and hybrid clouds, with an API approach in release 5.5 and possibly a portal approach in version 6.
The missing hockey stick
Why is object storage penetration so relatively low? Goros said: "It takes time for a huge industry to change … File systems are more popular and more in use than object storage. We see the world moving towards object storage because of unstructured data rising and big data … We believe we're just at the beginning of the marketplace. What we've done over the past five versions is missionary work."
There is a strong feeling in Caringo that the fixed content portion of the unstructured data flood will steadily grow until it becomes the single largest component of an enterprise's stored data. At that point enterprises will have to take a more serious look at object storage and, it's hoped, Caringo's advantages will come to the fore.
There is a lot of competition around: Goros says there are 10, even 20, players. Prominent rivals include EMC Atmos and Isilon, Scality, Amplidata, RackSpace and OpenStack.
Goros claims: "We're much more cost-effective than Isilon. We can mix and match HW" – which Isilon cannot – and: "We have compliance built in."
Caringo hit profitability in the second quarter and the Dell deal is unquestionably a big win. There is everything to play for and Goros and his colleagues are going for it in their steady and thorough way; serious people building a serious product and hoping that the market comes to them. If and when it does, and growth accelerates, then they may well relax a little and say: "Jolly hockey sticks!" ®