Protecting Compressed Data and Reducing Costs


As @storagebod (Martin Glassborow) noted today in a blog post, the issue of protecting deduplicated data is an important one for any business.

Deduplication and other data reduction technologies offer the opportunity for increased data protection at a reduced cost.

When you hit a threshold of 50% or greater overall data reduction, it lowers the cost of performing a full mirror of data. For example, if you have 100 terabytes of data, mirroring today would require 200 terabytes of disk space. But if the data were reduced by 75%, you could store the same data in 25 terabytes, and full mirroring would require only 50 terabytes of disk, a significant savings. In other words, it is possible to fully protect your storage, including full mirroring, with less disk than it takes to store the unprotected data today.

Any amount of space savings lowers the cost of mirroring, but once savings reach 50% (a dedupe ratio of 2:1 or better), the mirrored copy fits within the disk that the unprotected data occupies today, so the net added storage cost of full mirroring is effectively zero.
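
To make the arithmetic concrete, here is a minimal sketch of the calculation. The function names and the `copies` parameter are my own, introduced only for illustration, and the model assumes the reduction applies evenly to all of the data.

```python
def mirrored_footprint_tb(raw_tb: float, reduction: float, copies: int = 2) -> float:
    """Disk (in TB) needed to hold `copies` full copies of the reduced data.

    `reduction` is the fraction of space saved, e.g. 0.75 for 75%.
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in the range [0, 1)")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return raw_tb * (1.0 - reduction) * copies


def break_even_reduction(copies: int = 2) -> float:
    """Reduction at which `copies` reduced copies fit in the original footprint."""
    return 1.0 - 1.0 / copies


print(mirrored_footprint_tb(100, 0.0))   # 200.0 TB: mirroring unreduced data
print(mirrored_footprint_tb(100, 0.75))  # 50.0 TB: mirroring after 75% reduction
print(break_even_reduction())            # 0.5: the 2:1 threshold for a two-way mirror
```

At exactly 50% reduction the mirrored pair occupies the same 100 terabytes as the original unprotected data, which is where the threshold comes from.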

There are a number of options that your deduplication provider should be pursuing to complement your data protection strategy. For example, a vendor should allow you to do two key things with your dedupe configuration:

  1. Allow a minimum level of duplicate blocks to accumulate prior to starting deduplication. For example, you could allow data to be written twice, and then deduplicate all subsequent occurrences. So long as the dedupe solution is aware of both original occurrences, you have a form of mirroring without needing to do full mirroring at the disk level. Call this duplicate mirroring.
  2. Set a threshold for a maximum number of duplicates. To cap your exposure to the loss of a physical disk, you should be able to specify that once ‘n’ instances of a block have been found, the solution starts over with a freshly stored copy. You could set that number at 8, 32, 128 or whatever value makes business sense to you, so that the loss of the sector on which the block is stored would affect only a limited number of files.
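
The two settings above can be sketched together as a toy bookkeeping model. This is not any vendor's implementation: `DedupeStore`, `min_copies`, and `max_instances` are hypothetical names, block identity is assumed to be a SHA-256 digest, and the model only counts copies rather than writing data to disk.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Generation:
    """One run of up to `max_instances` occurrences of the same block."""
    instances: int = 0        # logical occurrences seen in this run
    physical_copies: int = 0  # occurrences that were actually stored


@dataclass
class DedupeStore:
    min_copies: int = 2      # hypothetical knob: copies kept before deduplicating
    max_instances: int = 32  # hypothetical knob: the 'n' after which we start over
    generations: Dict[str, List[Generation]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not 1 <= self.min_copies <= self.max_instances:
            raise ValueError("need 1 <= min_copies <= max_instances")

    def write(self, data: bytes) -> Tuple[str, int, bool]:
        """Record one occurrence; return (digest, generation index, stored?)."""
        digest = hashlib.sha256(data).hexdigest()
        runs = self.generations.setdefault(digest, [])
        if not runs or runs[-1].instances >= self.max_instances:
            runs.append(Generation())  # cap reached: start over
        run = runs[-1]
        run.instances += 1
        stored = run.physical_copies < self.min_copies
        if stored:
            run.physical_copies += 1  # still accumulating real duplicates
        return digest, len(runs) - 1, stored

    def physical_copies(self) -> int:
        return sum(r.physical_copies for runs in self.generations.values() for r in runs)

    def logical_instances(self) -> int:
        return sum(r.instances for runs in self.generations.values() for r in runs)


store = DedupeStore(min_copies=2, max_instances=4)
for _ in range(10):
    store.write(b"same block")
print(store.logical_instances(), store.physical_copies())  # 10 6
```

With these example settings, ten writes of the same block leave six stored copies: each run of four instances keeps two real duplicates, and no single stored copy backs more than four instances. A real product would also need to place those copies on separate disks for the duplicate mirroring to protect against a disk failure.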

These two examples are not a complete answer in and of themselves, but they do point to the kinds of tools you can work with as part of an overall data protection strategy. Deduplication is becoming a storage fundamental, in place across multiple tiers of storage and on multiple products in your data center. Understanding the impact of dedupe on your data protection strategy will be key, and your dedupe vendor should be providing you with the right tools to manage it.


About Carter George

Carter runs storage strategy for Dell

