With all the talk about the Data Domain acquisition, less attention has been paid to EMC’s native de-dupe features in Celerra, not to mention its related partnerships, such as the one with Ocarina for optimizing vertical applications. Last week I had the privilege of attending a webinar, “Surviving the Data Explosion through Data Reduction,” with John Hayden, CTO of NAS Engineering at EMC, where I got a fuller picture of Celerra’s latest optimization features.
John provided us with insights on how the new Celerra NAS product integrates data optimization. And while he never mentioned Data Domain directly, an astute observer could see how well EMC is integrating prior acquisitions into its architecture, and draw conclusions from that.
First, he provided us with a couple of interesting factoids from the Digital Universe research EMC sponsored for IDC:
- In 2009 digital content keeps growing, but IT spending on servers and storage is down 6%
- Over the next 4 years, data will grow 5x, but IT budgets will only grow 1.2x
- The administrative and overhead cost of storage is 4-7x the CapEx
This was all a prelude to John’s discussion of the new data optimization features in the Celerra NAS product. It’s great to see NAS vendors recognizing data optimization as a central part of the NAS stack. Drilling a little deeper, EMC essentially combined file-level deduplication (single instance storage, or SIS) from the Avamar acquisition with generic LZ77 compression from the RecoverPoint acquisition. SIS + LZ77 offers good price-performance for generic office files and text documents, but it doesn’t make much of a dent where we see the real capacity and scalability challenges: vertical applications such as life sciences, oil & gas, and media. In fact, generic compression is becoming ineffective against the latest MS Office documents, which already use ZIP as a container. Change a single text character in an Office document and the entire compressed file changes, so file-level dedupe finds no duplicate and a second pass of generic compression finds little redundancy left to remove.
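To make the SIS + LZ77 trade-off concrete, here is a minimal sketch in Python. It is not EMC’s implementation: SIS is modeled as “keep one copy per SHA-256 digest,” Python’s `zlib` (DEFLATE, which is LZ77 plus Huffman coding) stands in for the LZ77 compressor, and `make_container` is a hypothetical helper that wraps text in a ZIP the way an Office document does. The sample “document” is seeded random words so the run is reproducible.

```python
import hashlib
import io
import random
import zipfile
import zlib


def sis_store(files: dict[str, bytes]) -> dict[str, bytes]:
    """File-level single instance storage: keep one copy per unique SHA-256 digest."""
    store: dict[str, bytes] = {}
    for data in files.values():
        store.setdefault(hashlib.sha256(data).hexdigest(), data)
    return store


def deflate_ratio(data: bytes) -> float:
    """Original size divided by zlib (DEFLATE: LZ77 + Huffman) compressed size."""
    return len(data) / len(zlib.compress(data, 9))


def make_container(text: str) -> bytes:
    """Hypothetical stand-in for an Office-style document: XML text inside a ZIP."""
    info = zipfile.ZipInfo("word/document.xml", date_time=(1980, 1, 1, 0, 0, 0))
    info.compress_type = zipfile.ZIP_DEFLATED  # member is compressed, as in a .docx
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(info, text)
    return buf.getvalue()


# Synthetic "document": seeded random words so results are reproducible.
vocab = "gene sample read align variant frame render seismic trace volume".split()
rng = random.Random(42)
text = " ".join(rng.choice(vocab) for _ in range(20000))
edited = text[:1000] + "#" + text[1001:]  # change exactly one character

raw_a, raw_b = text.encode(), edited.encode()
doc_a = make_container(text)

# SIS collapses the exact copy but must keep the one-character edit as a new file.
files = {"a.txt": raw_a, "a_copy.txt": raw_a, "b.txt": raw_b}
print("SIS copies kept:", len(sis_store(files)), "of", len(files))
print(f"DEFLATE on plain text:    {deflate_ratio(raw_a):.2f}x")
print(f"DEFLATE on ZIP container: {deflate_ratio(doc_a):.2f}x")
```

The first line prints `SIS copies kept: 2 of 3`. The exact ratios depend on the zlib build, but plain text should shrink several-fold while the ZIP container, already DEFLATE-compressed inside, should stay at roughly 1x.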
So there’s a reason that Ocarina has a solid partnership with EMC, with an optimization solution that’s complementary to Celerra’s. For customers with serious capacity issues and data growth (gene sequencing, post-production houses, and so on), there is little to gain from deduplication and little to gain from generic compression. Not only does the optimization solution need to unwind and understand the file structure more intelligently, it also needs to make better decisions about which algorithms get applied to specific file sub-objects. This is where Ocarina comes in. Like the native Celerra de-dupe solution, the Ocarina ECOsystem integrates with the FileMover API for a tightly knit, policy-based optimization solution that works even on media and ZIP files that are already compressed.
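The idea of unwinding a file and treating its sub-objects separately can be illustrated with a toy sketch. This is not Ocarina’s algorithm; it only assumes a ZIP-based container, dedupes members by SHA-256, and applies a deliberately simple per-member policy (DEFLATE if it saves at least 10%, otherwise store raw). `build_zip`, `optimize`, the member names, and the 0.9 threshold are all hypothetical choices for illustration, and random bytes stand in for an already-compressed image.

```python
import hashlib
import io
import os
import zipfile
import zlib


def build_zip(members: dict[str, bytes]) -> bytes:
    """Pack members into a ZIP with fixed timestamps (stand-in for a .docx)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in members.items():
            info = zipfile.ZipInfo(name, date_time=(1980, 1, 1, 0, 0, 0))
            info.compress_type = zipfile.ZIP_DEFLATED
            zf.writestr(info, data)
    return buf.getvalue()


def optimize(containers: list[bytes]) -> dict[str, tuple[str, int]]:
    """Unwind each ZIP, dedupe members by SHA-256, and pick a codec per member.

    Returns {digest: (codec, stored_size)} using a hypothetical toy policy.
    """
    store: dict[str, tuple[str, int]] = {}
    for blob in containers:
        with zipfile.ZipFile(io.BytesIO(blob)) as zf:
            for name in zf.namelist():
                data = zf.read(name)  # decompressed member bytes
                digest = hashlib.sha256(data).hexdigest()
                if digest in store:
                    continue  # identical sub-object already stored
                packed = zlib.compress(data, 9)
                if len(packed) < 0.9 * len(data):
                    store[digest] = ("deflate", len(packed))
                else:
                    store[digest] = ("raw", len(data))  # incompressible, e.g. media
    return store


image = os.urandom(50_000)  # stands in for an already-compressed JPEG
styles = b"<styles>" + b"<style name='body'/>" * 500 + b"</styles>"
v1 = build_zip({"word/document.xml": b"<doc>draft one</doc>",
                "word/styles.xml": styles, "word/media/image1.jpg": image})
v2 = build_zip({"word/document.xml": b"<doc>draft two</doc>",
                "word/styles.xml": styles, "word/media/image1.jpg": image})

store = optimize([v1, v2])
print(len(store), "unique sub-objects from 6 members")
for digest, (codec, size) in store.items():
    print(digest[:12], codec, size)
```

Whole-file SIS sees `v1` and `v2` as two unrelated files, while the member-level pass keeps four sub-objects (two versions of `document.xml` plus one shared copy each of the styles and the image) and only spends compression effort on the member that benefits from it.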
We look forward to our collaborations with EMC, and will be very interested to watch how they continue to integrate dedupe and compression across their offerings.
