There are two points of view on databases. Database experts tend to be enthusiastic, as you'd expect, and point to excellent scalability; business resilience (including transaction processing and automated backup/restore) and the availability of utility programs that let you add functionality without programming. Others sometimes regard the whole idea as a con and point to the greater efficiency with which any single function can be coded with specialist code and a simple data structure such as ISAM (indexed sequential access method), although indexing can constrain scalability and flexibility.
We come down on the side of the database – few data stores remain small and/or single use for long and a little investment in database technology now pays dividends with fewer unwelcome surprises later. But there are two or three caveats.
To start with, many programmers don't understand database design or query efficiency. One benefit of a database is that you can always find your data – but you can easily find it with horribly inefficient queries. Part of your investment in database should include tuning tools, training and mentoring.
Another issue is the upgrade cycle. Database functionality is complex and constantly evolving. Vendors like Oracle want you to upgrade regularly – which can be a nuisance if you are only using a little bit of database functionality, which hasn't changed since the year dot. This explains (partly) the popularity of Open Source databases, which largely remove the pressure for constant upgrade. If you go this route, however, remember there are alternatives to the excellent (particularly in simple web applications) MySQL – Pervasive Postgres, for example, or CA's Ingres having enterprise functionality rivalling Oracle or DB2.
And, finally, remember that there is life beyond the big relational database management systems. Embedded databases such as Pervasive PSQL or Gupta SQLBase; or 4GL databases such as Progress OpenEdge, perhaps need less support than the big RDBMSs, as do alternative technologies such as the excellent Caché or FileMaker.
You needn't think conventionally, and along that route here's an interesting new idea from Coppereye, which originally made its name by indexing large Oracle databases faster than Oracle could do it for itself.
It has an indexing SDK for database developers and links with IBM for optimising Informix indexing but why not try a plug-in generalised product as well? So, it applies its proprietary indexing technology to very large flat files, such as you might find in compliance and RFID applications, where updating the underlying data is positively discouraged (leaving out update logic certainly streamlines indexing).
Coppereye's Greenwich product targets write-once business event recording, characterized by high transaction rates and long retention requirements – and hundreds of terabytes of data – but only on the main UNIX platforms (including Linux). It doesn't alter or copy the underlying data, just indexes it, and presents a conventional SQL (ODBC) interface to other applications.
So, you get fast indexed access that lets you retrieve specific data efficiently from a huge store – a small subset of the whole - which you can then process with comparatively cheap and simple BI software. OK, so people like SAS, BMC and Information Builders have alternative technology to do this, but the Coppereye approach deserves a look.
It seems to have the merit of simplicity and should enable you to use analysis and presentation technology you may have already. "Proper" relational databases are good, but aren't always the only effective approach – and perhaps a clever high-performance index on a flat file is effectively some sort of specialist database anyway... ®
- Caché a "post relational" database which presents alternative relational or object views into the underlying data.
- Filemaker Pro the latest version of this popular Mac and Windows database has just been released.
- Ingres now a CA open source project, and the database underlying many CA management products).
- Pervasive PSQL the evolution of a useful database called BTrieve.
- Postgres (just one distribution of OpenSource PostgreSQL, an evolution of Oracle's original, and superior, competitor, Ingres.
- Progress OpenEdge: Progress OpenEdge is part of a 4GL platform).
- SQLBase from Gupta available for both Windows and Linux.