Our Thoughts on Performance and Recent Dedupe/Compression Comments


A recent InfoStor article on primary storage optimization (http://www.infostor.com/index/articles/display/2460996926/articles/infostor/storage-management/2010/april-2010/consider-compression.html) mentioned Ocarina along with some other vendors whose offerings provide either dedupe or compression for primary storage. Ocarina is characterized there as a post-process solution. This is a theme that we’ve seen in several product review pieces, and it’s worth clarifying. Ocarina does offer post-process optimization, but the product can also be configured to do inband optimization. The common wisdom seems to be that post-process gets the best data reduction, but is slower than inband and therefore can only be used for cold data. Ocarina is happy to be recognized for having the best data reduction, but we’re not post-process only, nor are we willing to concede that we are for cold data only.

User access to optimized data is always inband and real time. This whole inband versus post-process discussion only applies to the question of when you shrink the data.

Ocarina’s ECOsystem is a configurable multi-stage data reduction pipeline, and it can be set up to run post-process, inband, or both. Ocarina’s family of storage optimization appliances comes pre-configured to do post-process dedupe and compression, but where the ECOsystem has been embedded as software inside our storage partners’ products, some of those deployments run it inband.

The ECOsystem pipeline has four elements, all optional: object dedupe, block dedupe, regular compression, and content-aware compression. To run inband, an element has to be fast enough to keep up with the storage system’s I/O speed. Just as you want compression to be data invisible (it’s invisible when a user gets their file back bit-for-bit the way it was originally without ever knowing it was compressed), you want it to be performance invisible too. Adding dedupe and compression should not affect the perceived performance of a storage system. Some of Ocarina’s data reduction elements are fast enough to run inband, and can be configured that way. Some elements, especially advanced content-aware compressors, are slow enough that in most cases you would want to run them as background post-processes. With post-process, you also have the option of using policies to decide which data to compress, and when. With most inband solutions, you have to compress everything, all the time.
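To make the inband versus post-process distinction concrete, here is a minimal Python sketch. It is not Ocarina's implementation: the `BlockStore` class, the fixed 4096-byte block size, and the use of SHA-256 and zlib as stand-ins for block dedupe and regular compression are all assumptions made for illustration. It shows a fast pass at write time, an optional slower pass run later in the background, and reads that return the original bytes either way.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # assumed fixed block size, chosen only for illustration


class BlockStore:
    """Toy store: fixed-size block dedupe plus zlib compression (hypothetical)."""

    def __init__(self, compress_inband=True):
        self.compress_inband = compress_inband
        self.blocks = {}  # digest -> (is_compressed, payload)
        self.files = {}   # file name -> ordered list of block digests

    def write(self, name, data):
        """Inband path: dedupe each block, optionally with fast compression."""
        digests = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:  # store each unique block once
                if self.compress_inband:
                    self.blocks[digest] = (True, zlib.compress(block, 1))
                else:
                    self.blocks[digest] = (False, block)
            digests.append(digest)
        self.files[name] = digests

    def post_process(self, level=9):
        """Background path: recompress stored blocks at a slower, higher level."""
        for digest, (is_compressed, payload) in list(self.blocks.items()):
            raw = zlib.decompress(payload) if is_compressed else payload
            self.blocks[digest] = (True, zlib.compress(raw, level))

    def read(self, name):
        """Reads always expand on the fly and return the original bytes."""
        parts = []
        for digest in self.files[name]:
            is_compressed, payload = self.blocks[digest]
            parts.append(zlib.decompress(payload) if is_compressed else payload)
        return b"".join(parts)

    def stored_bytes(self):
        return sum(len(payload) for _, payload in self.blocks.values())
```

In this sketch, the choice between `compress_inband=True` and a later call to `post_process()` only changes when the data is shrunk. `read()` behaves the same in both cases.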

In the latest issue of Storage Magazine, Curtis Preston, editor in TechTarget’s Storage Media Group and an independent backup expert, said in his coverage of data reduction vendors, “Ocarina takes a very different approach to data reduction than many other vendors. Where most vendors apply compression and deduplication without any knowledge of the data, Ocarina has hundreds of different compression and deduplication algorithms that it uses depending on the specific type of data.”


Because we have such a rich toolbox, we can do different things with it. If you used Ocarina to build a backup solution, you might use our block dedupe and fast regular compression only. If you used us for a deep archive, you might use object dedupe and advanced content-aware compression as post-process only. For primary NAS, you might do fast regular compression inband and dedupe as a post-process, and so forth.
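One way to picture these use cases is as per-stage settings. The Python sketch below is purely illustrative: `Mode`, `PipelineConfig`, and `PROFILES` are hypothetical names, not an Ocarina configuration format. Two details are assumptions that go beyond the text: the backup profile is shown running inband, and the primary NAS profile uses block dedupe for its post-process dedupe stage.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    OFF = "off"
    INBAND = "inband"
    POST_PROCESS = "post-process"


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical per-stage settings for the four optional elements."""
    object_dedupe: Mode = Mode.OFF
    block_dedupe: Mode = Mode.OFF
    regular_compression: Mode = Mode.OFF
    content_aware_compression: Mode = Mode.OFF

    def stages(self, mode):
        """Return the names of the stages configured to run in the given mode."""
        return [name for name, value in vars(self).items() if value is mode]


PROFILES = {
    # Assumption: the text names the stages for backup but not the mode.
    "backup": PipelineConfig(
        block_dedupe=Mode.INBAND,
        regular_compression=Mode.INBAND,
    ),
    "deep_archive": PipelineConfig(
        object_dedupe=Mode.POST_PROCESS,
        content_aware_compression=Mode.POST_PROCESS,
    ),
    # Assumption: "dedupe" for primary NAS is modeled as block dedupe.
    "primary_nas": PipelineConfig(
        block_dedupe=Mode.POST_PROCESS,
        regular_compression=Mode.INBAND,
    ),
}
```

For example, `PROFILES["primary_nas"].stages(Mode.INBAND)` lists only the stages that would sit in the I/O path for that profile.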


To make this point a bit clearer, we’ll publish some performance results in the next week or two showing how we attack different data sets inband, post-process, and with both. We’ll make the data sets public, so if other vendors want to try their wares, they can download the data and see how they do, apples to apples.

To directly address the issue of whether Ocarina can be fast enough to be used for “true” primary storage: if you run our fast regular compressor only, inband, we have been benchmarked at over 3,000 MB/sec. Most of our customers will elect to do more than just simple regular compression, so that’s not a number we’d claim for the real world. In the real world, people want better data reduction than you get with regular compression alone.


At the end of the day, going fast is important, but only if you actually do something useful. If a solution goes really fast, but can’t actually shrink your data, then we don’t think that’s very interesting. The right solution will get the best possible data reduction while still meeting performance requirements. Different use cases have different performance requirements, and what you want is something that can be configured to hit the sweet spot for performance while still getting smokin’ dedupe and compression results.


About Carter George

Carter runs storage strategy for Dell
