Microsoft has had single instance storage (SIS) in Exchange since its earliest versions and has now removed it from Exchange 2010, ensuring that duplicated attachments will be stored in all their redundant, space-gobbling glory on Exchange servers' disks. Why has Microsoft taken this apparently retrograde step?
A Microsoft Social TechNet entry reads:
Header data for all mailbox items is stored in a single database table — this change makes the database more efficient because it can process a single table for a mailbox during a client session instead of accessing different tables for different mailbox folders. A side effect of this schema change is that Exchange no longer uses Single Instance Storage (SIS) to keep just one copy of message content per database. Most servers support multiple databases, so the efficiency gained from SIS is less and less as time goes on.
This talks only of Exchange server efficiency. It doesn't mention storage array efficiency, where storing gigabytes of duplicate data is to be abhorred.
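For readers unfamiliar with the idea, single instancing can be sketched in a few lines of Python. This is a toy illustration, not how Exchange's store implemented it: the class name, the use of a SHA-256 digest as the content key and the per-item reference count are all assumptions made for the example.

```python
import hashlib


class SingleInstanceStore:
    """Toy content-addressed store: identical payloads are kept only once.

    Hypothetical illustration; not Exchange's actual database schema.
    """

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}  # digest -> payload, stored once
        self._refs: dict[str, int] = {}     # digest -> number of references

    def put(self, payload: bytes) -> str:
        """Store a payload and return its key; duplicates only add a reference."""
        digest = hashlib.sha256(payload).hexdigest()
        if digest not in self._blobs:
            self._blobs[digest] = payload
        self._refs[digest] = self._refs.get(digest, 0) + 1
        return digest

    def get(self, digest: str) -> bytes:
        """Fetch the single stored copy for a key."""
        return self._blobs[digest]

    def stored_bytes(self) -> int:
        """Bytes physically held, with single instancing."""
        return sum(len(blob) for blob in self._blobs.values())

    def logical_bytes(self) -> int:
        """Bytes that would be held if every reference had its own copy."""
        return sum(len(self._blobs[d]) * n for d, n in self._refs.items())


store = SingleInstanceStore()
attachment = b"x" * 1_000_000   # a made-up 1 MB attachment
for _ in range(50):             # delivered to 50 mailboxes in one database
    store.put(attachment)

print(store.stored_bytes(), store.logical_bytes())
```

The gap between the two figures is the space single instancing saves, and the space that is spent once it is gone.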
A change in the opposite direction concerns compression:
The Store compresses attachments — Microsoft calculates that the CPU time spent compressing and decompressing attachments is less than the work required to manage the storage of very large uncompressed data within the database. This change also reduces the overall size of Exchange databases, which speeds up operations such as backups.
Again the focus of optimisation is the server, with storage efficiency a byproduct.
Previously we learnt that Exchange 2010's I/O improvements meant its database could use cheap, high-capacity SATA drives instead of faster, more expensive Fibre Channel or SAS disk drives. Now we see that high-capacity SATA drives may well be needed anyway because of attachment duplication.
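A back-of-the-envelope calculation gives a feel for the scale. The figures below are invented for illustration, not measurements, and the helper name is hypothetical. The sum also ignores Exchange 2010's attachment compression, which would shrink the real number, and assumes all recipients sit in the same database, the only case in which SIS would have helped.

```python
MB = 1024 * 1024


def duplicated_bytes(recipients: int, attachment_bytes: int) -> int:
    """Extra bytes stored when every recipient keeps a private copy
    instead of all recipients sharing a single instance."""
    if recipients < 1:
        raise ValueError("need at least one recipient")
    return (recipients - 1) * attachment_bytes


# Hypothetical example: one 5 MB attachment mailed to 200 people
extra = duplicated_bytes(recipients=200, attachment_bytes=5 * MB)
print(f"{extra / MB:.0f} MB of duplicate data")  # 199 extra copies of 5 MB
```

One such message accounts for close to a gigabyte on its own, so it takes only a handful of widely circulated attachments to reach the tens-of-gigabytes range.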
Across a large enterprise with tens of thousands of Exchange users, the wasted storage space could easily run to tens of gigabytes, if not terabytes. ®