Interest in solid state drives (SSDs) is growing as their initially terrifying price plummets: Intel recently announced a 60 per cent reduction in the cost of its X25-M Mainstream SATA SSD since its launch last year.
SSDs are attractive because they radiate less heat than traditional spinning disks, they're smaller, quieter and consume less energy. And, of course, they're fast. So naturally we bung them into laptops.
However, there's growing use of SSDs in servers, and particularly database servers. Teradata, one of the world's leading business intelligence companies, recently unveiled a prototype data warehouse running entirely on SSDs.
Do the same reasons for using SSDs in a laptop also apply to database servers? And what do SSDs mean for the applications that will access the data they hold?
Performance improvement is certainly top of the list of appealing factors, but uptake is also being driven by energy issues. Keeping servers cool is an ongoing battle, so anything that reduces heat generation is welcome. How do the two technologies stack up?
A 600GB HDD might use about 16 watts when active: that's about 30W per terabyte. The X25-M has an active power consumption of 150mW, so running six of the 160GB SSDs gives a figure of about 1W/TB - 30 times less. If that wasn't reason enough, the cooler disks can be packed more closely together, saving space - another server-room plus point.
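The back-of-envelope sums above can be checked in a few lines. The figures are the illustrative vendor numbers quoted in the text, not measurements:

```python
# Active power per terabyte: one 600GB HDD at ~16W versus
# six 160GB X25-M SSDs at ~150mW each (vendor figures, illustrative).

hdd_watts, hdd_tb = 16.0, 0.6        # one 600GB HDD
ssd_watts, ssd_tb = 0.150, 0.160     # one 160GB SSD

hdd_w_per_tb = hdd_watts / hdd_tb    # ~27 W/TB - call it ~30 in round numbers
ssd_w_per_tb = ssd_watts / ssd_tb    # ~0.94 W/TB - call it ~1

print(round(hdd_w_per_tb, 1))        # 26.7
print(round(ssd_w_per_tb, 2))        # 0.94
print(round(hdd_w_per_tb / ssd_w_per_tb))  # 28 - roughly the 30x in the text
```

The exact ratio depends on which drives you pick, but the order of magnitude is the point.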
So, what of the speed? To read from a particular place on disk the read/write head in an HDD must move to the right position, which takes "seek" time - quite variable, but let's say an average of four milliseconds. Once the head is in position there is a "latency" as the sector containing the data crawls painfully around to its appointed position under the head.
A 15,000 rpm HDD might have an average latency of two milliseconds. An SSD, having no moving parts, has no true equivalent, though a latency figure may be quoted - a recent SSD, for example, was rated at 65 microseconds.
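Putting those figures together gives a feel for the per-access gap. A sketch using the numbers above (all illustrative):

```python
# Rough random-access comparison: HDD pays seek plus rotational latency,
# SSD pays only its quoted latency. Figures are the ones from the text.

hdd_seek_ms = 4.0
# Half a revolution at 15,000 rpm: (60_000 ms / 15_000 rev) / 2 = 2 ms
hdd_rotational_ms = 2.0
ssd_latency_ms = 0.065        # 65 microseconds

hdd_access_ms = hdd_seek_ms + hdd_rotational_ms   # ~6 ms per random read
print(round(hdd_access_ms / ssd_latency_ms))      # ~92x faster per access
```

For sequential reads the gap narrows considerably; it's random access where the mechanical penalty bites.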
All the advantages of using SSDs for general servers hold true for database use, but there are other potential advantages. Databases have always, if you ignore punched cards, used rotating media and whilst HDDs aren't serial access, they certainly aren't random access either, so all manner of database mechanisms have been developed to improve data access performance. I'll pick on just two: indexing and column/row oriented databases.
A non-clustered index is often a sorted list of pointers to data and for performance reasons should be kept on a separate spindle. Of course if the table is small enough, and particularly if it's marked as read-only - as it may well be for analysis - any database engine worth its salt will run a full table scan, pop the table into memory and ignore the index. Why? Because memory is much faster than disk. Intelligent DBAs understand all this and modify their indexing strategies accordingly.
Moving down the scale, there are doubtless DBAs who index all fields needing fast access regardless of table size, those who do this and keep the indexes on the same disk as the data and even those who don't know about indexes.
SSDs can be thought of as a hybrid between rotating media and volatile memory and their adoption should significantly change intelligent indexing behavior. At its broadest this simply means that good DBAs will use fewer indexes. Also, the use of SSDs will almost certainly affect decisions about the placement of indexes: for example there may be far less of a performance hit from keeping data and indexes on the same device because SSDs are essentially random access.
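The "separate spindle" rule exists because alternating between index and data regions on a single HDD forces a head seek per alternation. A toy cost model (all numbers are assumptions for illustration, not benchmarks) shows why that rule matters far less on an SSD:

```python
# Toy model: a workload that alternates between index reads and data
# reads, jumping between two regions of the storage device each time.

def alternating_read_ms(accesses, base_ms, seek_ms):
    """Total time when every access jumps regions, paying seek_ms each jump."""
    return accesses * (base_ms + seek_ms)

accesses = 1_000
print(alternating_read_ms(accesses, base_ms=2.0, seek_ms=4.0))    # shared HDD: 6000 ms
print(alternating_read_ms(accesses, base_ms=2.0, seek_ms=0.0))    # separate spindles: 2000 ms
print(alternating_read_ms(accesses, base_ms=0.065, seek_ms=0.0))  # single SSD: 65 ms
```

On the SSD there is no seek term at all, so co-locating index and data costs essentially nothing - which is the placement point made above.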
Ingres and Postgres creator Michael Stonebraker is well known for his controversial opinion of MapReduce and he has also voiced strong views on row- and column-oriented databases. In transactional systems we tend to read most of the data in a row and in analytical systems we tend to read most of the contents of a column - somewhat of a simplification, but essentially true.
Stonebraker's argument is that in analytical databases we should store the data column-by-column so that the data most likely to be read together is stored together. It's a perfectly reasonable argument when discussing an HDD and - whilst I'm not suggesting that the argument evaporates completely with random access storage - I am saying that the use of SSDs will materially alter the balance here. In other words, for many systems it simply may not matter whether we use column or row-oriented storage.
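The row-versus-column distinction is easy to see in miniature. A sketch with a made-up three-row table (names and figures are invented for illustration):

```python
# The same table laid out two ways. A row store keeps each record's
# values together; a column store keeps each attribute's values together.

rows = [("alice", 30, 5000), ("bob", 25, 4200), ("carol", 41, 6100)]

# Row-oriented: one flat sequence, record by record.
row_store = [value for row in rows for value in row]

# Column-oriented: one contiguous sequence per attribute.
col_store = {
    "name":   [r[0] for r in rows],
    "age":    [r[1] for r in rows],
    "salary": [r[2] for r in rows],
}

# Analytic query: total salary. The column store reads one contiguous
# list; the row store must take every third element of the whole table -
# cheap in memory, but strided access across the entire dataset on disk.
print(sum(col_store["salary"]))   # 15300
print(sum(row_store[2::3]))       # 15300
```

On an HDD the strided pattern means seeks; on an SSD both patterns cost roughly the same, which is exactly why the layout argument loses some of its force.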
Clearly SSDs are going to be seen in wider deployment on general purpose servers. And, just as we've developed ways of extracting the best from rotating media, we must now set about doing the same for solid state drives.
Then once we've grasped the idea that storage need not rotate, alternatives to SSDs also enter the picture. Why not oscillate instead? DataSlide, for example, is developing a Hard Rectangular Drive (HRD) with a "massively parallel 2D array of magnetic heads" past which the media moves. All sorts of good vibrations surround this device.
Bootnote: SSD explained
An SSD is a block of memory acting as a storage medium (meaning, incidentally, that the D in SSD is a misnomer as there's nothing physically disk-like about an SSD). The latest models, like Intel's X25-M, use NAND flash memory technology, which doesn't use power to maintain stored data. NAND is short for 'Not AND', a Boolean logic operator: the name refers to the NAND-gate-like way the memory cells are wired together. The technology's use in SSDs was pioneered by Samsung.
There are two flavors of NAND flash: single-level cell (SLC), which stores one bit per cell, and multi-level cell (MLC), which - you guessed it - stores two. Both can be read very rapidly, but writing to SLC is around twice as fast as writing to MLC.
On the other hand, MLC technology is cheaper: the 80GB X25-M is MLC and the new bulk purchase price - 1,000 units - is $225, which works out at under $3 per gigabyte. One SLC Intel 32GB X25-E costs around $350 or $11-ish per gig which, whilst not directly comparable for several reasons, at least gives a feel for the price differential. ®