I was thinking about re-creating single instance storage...
Assuming a server has capacity to spare (local resources, anyway: CPU, RAM, disk I/O)... if the .edb were to contain an additional hash table, each attachment could be given a hash value (not necessarily requiring a full scan either, perhaps based only on the first 20 bytes, as you'll see in a moment), and the table would also record the location of each attachment. With this in place, if a user ever needed to be moved (between servers, storage groups, etc.) or imported via ExMerge, a quick hash table lookup could determine LIKELY matches for single instance storage.
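To make the idea concrete, here is a minimal sketch in Python. It is purely illustrative: Exchange exposes no such table, and `AttachmentIndex`, `prefix_hash`, and the location strings are hypothetical names chosen for the example. It hashes only the first 20 bytes of each attachment and maps that hash to every location where a matching attachment is stored.

```python
import hashlib
from collections import defaultdict

PREFIX_LEN = 20  # bytes hashed per attachment, as suggested in the post


def prefix_hash(data: bytes) -> str:
    """Hash only the first PREFIX_LEN bytes, so no full scan is needed."""
    return hashlib.sha1(data[:PREFIX_LEN]).hexdigest()


class AttachmentIndex:
    """Hypothetical hash table: prefix hash -> list of attachment locations."""

    def __init__(self) -> None:
        self._table = defaultdict(list)

    def add(self, location: str, data: bytes) -> None:
        # Record where this attachment lives under its prefix hash.
        self._table[prefix_hash(data)].append(location)

    def likely_matches(self, data: bytes) -> list:
        # Same prefix hash means a LIKELY match only, not a confirmed one.
        return list(self._table.get(prefix_hash(data), []))


index = AttachmentIndex()
index.add("store1/attachment/1", b"A" * 30 + b"x")
print(index.likely_matches(b"A" * 30 + b"y"))  # same first 20 bytes, different tail
print(index.likely_matches(b"B" * 30))         # different prefix, no candidates
```

Many file formats begin with a fixed header, so a 20-byte prefix on its own would probably return a lot of candidates. Adding the attachment size to the key would be one cheap refinement (my assumption, not part of the original suggestion).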
Clearly a unique identifier can't be used across all orgs/servers/etc., as the bandwidth in multiple-server environments would be overwhelming for such menial traffic... but using the contents as the basis for the hash, with a lookup table for likely matches, Exchange could, whenever a likely match is found, scan each file completely to determine whether it is a true SIS match or not.
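The confirmation step could look something like the sketch below, again in Python and again hypothetical: `confirm_match` and the `read_attachment` callback are made-up names, standing in for however the store would actually read an attachment's bytes. A likely match only counts as a true single-instance match if the full contents are identical.

```python
from typing import Callable, Iterable, Optional


def confirm_match(
    new_data: bytes,
    candidates: Iterable[str],
    read_attachment: Callable[[str], bytes],
) -> Optional[str]:
    """Return the location of a stored attachment identical to new_data, or None."""
    for location in candidates:
        stored = read_attachment(location)
        # Cheap size check first, then the full byte-for-byte comparison.
        if len(stored) == len(new_data) and stored == new_data:
            return location
    return None


# A dict stands in for the store; both entries share a prefix but differ later.
store = {"a": b"hello world", "b": b"hello there"}
print(confirm_match(b"hello there", ["a", "b"], store.__getitem__))
print(confirm_match(b"hello!", ["a", "b"], store.__getitem__))
```

Only the candidates returned by the lookup are ever read in full, so the expensive comparison runs on a handful of attachments rather than the whole store.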
I say this if for no other reason than to stress how important this is. I maintain the Exchange server at my company of 50 employees. Because of the nature of our email, we do not have quotas defined for any of our users. (To reduce wasted space I have an Outlook GPO to always "ask user to remove anything from 'deleted items' upon exit".) As we look at upgrading to Exchange 2003, I will be forced to look at our 10 GB database with an SIS ratio of 6 (AFTER upgrading from our 6 GB Exchange 5.5 store via ExMerge, and losing the first round of SIS).
I've looked at archive copies, multiple servers, blah blah... our email is critical... it's going to take space on our servers (as opposed to workstations/laptops) anyway, and it's going to be part of our nightly offsite tape backup, and I'd rather it stay in a single-instance-storage container such as Exchange.
I would imagine such a scan would be easy to offer as an option, letting administrators choose whether it's a task worth burdening their servers with... heck, if I had the option I might ONLY turn it on when I had to import data (via the Move Mailbox wizard, ExMerge, or an Outlook PST import).
Thanks again,
-Scott