Bas Raayman, an SAP consultant, has a post on StorageMonkeys that questions the emphasis on de-dupe and thin provisioning when evidence is mounting that storage at many enterprises is largely underutilized. This got me thinking.
Bas is asking a good question: Why bother shrinking your data when you’re only at 50% capacity? Now the premise is debatable, as certain verticals such as life sciences, media and entertainment, social networks, etc. are as prolific as ever and driving a lot of the 2009 revenue of our NAS partners Isilon, HDS, BlueArc, HP, and EMC.
However, for customers where utilization is low, it’s legitimate to ask the question “why mess with de-dupe?” The answer lies in understanding the resources saved beyond just storage capacity.
Here are some obvious opportunities:
1. Higher-performance data migration, backup, and replication (thus improving reliability and reducing bandwidth costs);
2. Consolidation into fewer filers (thus reducing management cost and OpEx);
3. For Internet businesses, reduced distribution and CDN costs from delivering smaller files.
Of course, end-to-end optimization is a pipe dream for most vendors, where de-dupe is a low-level, embedded, block-based solution. These point solutions lead to the problem of “de-dupe headache” (yes, you heard it here first!), where islands of disparate de-dupe solutions work in isolation and any data movement requires re-hydrating the data and re-shrinking it in the next solution. This is something that Curtis Preston takes up in a recent article in SearchStorage.
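To make the headache concrete, here is a minimal sketch of two isolated block-based de-dupe stores. Everything in it is hypothetical and for illustration only: the `BlockStore` class, the `migrate` helper, the tiny block sizes, and the file name are assumptions, not any vendor's actual design. It simply shows that moving a file between islands means rebuilding the full data first and then de-duping it all over again.

```python
import hashlib


class BlockStore:
    """Toy fixed-size block de-dupe store (hypothetical, illustration only)."""

    def __init__(self, block_size):
        self.block_size = block_size
        self.blocks = {}  # digest -> block bytes, each unique block kept once
        self.files = {}   # file name -> ordered list of block digests

    def write(self, name, data):
        """Split data into fixed-size blocks and keep only unique ones."""
        digests = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)
            digests.append(digest)
        self.files[name] = digests

    def read(self, name):
        """Re-hydrate: rebuild the full file from its block list."""
        return b"".join(self.blocks[d] for d in self.files[name])

    def stored_bytes(self):
        """Bytes actually held after de-dupe."""
        return sum(len(b) for b in self.blocks.values())


def migrate(src, dst, name):
    """Move a file between two isolated stores; returns bytes moved."""
    data = src.read(name)   # re-hydrate to full size in the source
    dst.write(name, data)   # re-shrink from scratch in the target
    return len(data)


primary = BlockStore(block_size=4)   # assumed block size
backup = BlockStore(block_size=8)    # a different, incompatible block size
payload = b"ABCD" * 6                # 24 bytes of highly redundant data

primary.write("report.doc", payload)
moved = migrate(primary, backup, "report.doc")
```

In this toy case the primary store holds 4 bytes and the backup store holds 8, yet all 24 bytes had to travel between them because neither store understands the other's blocks.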
Here’s where I think an out-of-band, content-aware, tunable optimization solution like the Ocarina ECOSystem has bridged the gap. It delivers fully optimized end-to-end workflows: not just storage, but bandwidth, power, and reliability. This is really the end-game and where we’ve set our sights.
One example: Ocarina has been shipping the capability to optimize files while leaving them in their native format. In other words, no reconstitution mechanism is necessary. This is particularly useful for photo sites, where JPG images are optimized using our visually lossless algorithms but are still retained as JPG files. So the photo site can distribute shrunken images without quality impact. We may not deliver the 8x reduction we see on a bunch of office docs, but 30-50% optimization delivers massive savings in bandwidth cost in addition to storage savings.
So even for storage consumers with low utilization, it’s becoming clearer that end-to-end data optimization will lower costs across the entire workflow and infrastructure. In short, as the headline says, it’s not just about storage anymore.

Hey Mike,
First off, I want to thank you for your comment, but my first name is Bas, not Raayman.
Second, I wanted to reply a bit earlier, but my schedule gave me some conflicts. But here goes…
I don’t have any doubts that features like de-dupe and thin provisioning have their very valid uses. As long as you are not shrinking and blowing up the files in the various steps between the block device and the end user, this can be a very effective way of actually reducing your storage cost and footprint.
I can see an ideal environment in the combination of de-dupe, TP and “Storage as a Service”. But only if such environments allow me to reduce my storage footprint when I don’t need the data anymore, or to change the service level when the value of the data changes for me.
Say, for example, that I write code for a living. I could use the model described above to develop and back up everything, for example in an online development repository. I want to pay as little as possible, so things like de-dupe will help me reduce the storage footprint. But the point I was trying to make was whether older programs, or older program versions, really need to be kept. And if so, do I need them on the same storage tier as my productive development environment? Why not move them to cheaper bulk storage and change the backup frequency? Dedupe will help me keep my backup volume lower and will still be of use, but the question of how valuable the data is to me sits at a much higher level. It is a question that is very seldom asked by the providers of such services, or even by your own internal service provider if you work in a big company.
Cheers,
Bas