
Online Storage Optimization

Exploring Next Generation Storage Solutions

Object-Based Storage - New Possibilities

Posted by Carter George On May - 26 - 2009

There was a nice piece this past week in the “Storage Tips” section of SearchStorage by contributor Alan Radding about the object-based direction that SAN is taking. The piece also ran in the recent issue of Storage Magazine, which is now an electronic-only publication.

Object-based storage opens up some very interesting possibilities. Radding raises some of them in his article, noting that there are trade-offs in moving toward wider use of object metadata. To me, there is a great deal to explore in this potential trend. In essence, storing data as objects with rich metadata makes object dedupe a natural thing to do. In this scenario, you not only dedupe objects (as opposed to blocks), but you can also drive decisions about dedupe, compression, and other optimizations based on the metadata of the object.

Essentially, a block in a SAN array is just a fixed-size chunk of data that the array knows very little about, whereas an object is a variable-size chunk of data that you may know quite a bit about: file and data type, owners, use cases, compliance policy, and so on. Storing all that metadata is only interesting if you do intelligent things with it. With this in mind, my view is that object dedupe is a natural winner.
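To make the idea of object dedupe concrete, here is a minimal sketch in Python. It is not taken from Radding's article or from any product; the `ObjectStore` class and its method names are hypothetical, and it assumes dedupe by SHA-256 content hash on whole objects, with metadata kept per object name.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ObjectStore:
    """Toy in-memory object store (hypothetical) that dedupes whole objects."""

    blobs: dict = field(default_factory=dict)    # digest -> payload, stored once
    catalog: dict = field(default_factory=dict)  # object name -> (digest, metadata)

    def put(self, name: str, payload: bytes, metadata: dict) -> bool:
        """Store an object; return True if its payload was already present."""
        digest = hashlib.sha256(payload).hexdigest()
        duplicate = digest in self.blobs
        if not duplicate:
            self.blobs[digest] = payload
        # Metadata is kept per object even when the payload is shared.
        self.catalog[name] = (digest, dict(metadata))
        return duplicate

    def get(self, name: str) -> bytes:
        """Return the payload for a named object."""
        digest, _metadata = self.catalog[name]
        return self.blobs[digest]

    def stored_bytes(self) -> int:
        """Bytes actually held after dedupe."""
        return sum(len(payload) for payload in self.blobs.values())
```

Two objects with identical content but different owners would occupy the space of one payload here, while each keeps its own metadata record. A block-level array could not keep that second part, because it never sees owners or types in the first place.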

Because the information you know about an object is discoverable, it can be used to make intelligent decisions about how to reduce the space it takes up: smart object dedupe, file-type (content-aware) compression, even selective deletion. For example, a medical image object might include a large lossless original MRI image and a small thumbnail for convenience. If you know that an object is a medical image, that HIPAA compliance applies, and that the object is considered archived, there are several things you might decide about storing that object. You know that dedupe probably won’t do much for that object type. But you might compress the big image with a bit-for-bit lossless compressor that is specific to MRIs, and you might delete the thumbnail, knowing that you can recreate it on the fly at any time, automatically, from the lossless original you’ve kept.
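The medical image example can be sketched as a small metadata-driven policy. Everything here is an assumption made for illustration: the metadata keys (`content_type`, `compliance`, `lifecycle`), the function names, and the rules themselves are hypothetical, and `zlib` stands in for an MRI-specific lossless compressor, which this sketch does not implement.

```python
import zlib


def plan_storage(metadata: dict) -> dict:
    """Choose a storage plan from object metadata alone (hypothetical rules)."""
    plan = {"dedupe": True, "drop_derived": False}
    if metadata.get("content_type") == "medical-image":
        # Dedupe probably won't do much for this object type.
        plan["dedupe"] = False
        # Only archived objects give up parts that can be regenerated.
        if metadata.get("lifecycle") == "archived":
            plan["drop_derived"] = True
    return plan


def apply_plan(parts: dict, plan: dict) -> dict:
    """Return the parts to keep, each losslessly compressed.

    zlib is a stand-in for a content-aware, bit-for-bit lossless compressor.
    """
    kept = {}
    for name, payload in parts.items():
        if plan["drop_derived"] and name != "original":
            continue  # e.g. the thumbnail, recreatable from the original
        kept[name] = zlib.compress(payload, 9)
    return kept
```

For an object tagged as an archived medical image with `original` and `thumbnail` parts, `apply_plan` would keep only the compressed original, and `zlib.decompress` on it returns the original bytes unchanged. The point is that the decision is made from the metadata, which is exactly what a block never carries.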

You might end up reducing the space required to store a medical image archive by 75% without losing a single bit of information, while still meeting all HIPAA requirements. You can’t make those kinds of decisions about a block in a SAN storage array. That’s just 4K of data that belongs to a file system, database, or application, and at the SAN array controller or switch level you can’t do much with it because you don’t know what it is. You can only do physical things: mirror, replicate, move to a faster or slower tier, and so on. You can’t make decisions based on content, use case, or lifecycle.

When people talk about object stores, the examples are almost always about compliance. But object stores could emerge as a natural tier-two storage for NAS and files in general, both in the data center and in the cloud, where the object store is already the dominant model (just look at Amazon S3). If they do, then object stores provide very rich possibilities for dedupe, compression, and other cost- and space-saving optimizations.

All worth thinking through, especially in light of the need for more and better optimization technology to manage the massive upsurge in data across many industries.


One Response to “Object-Based Storage - New Possibilities”

  1. David Slik says:

    Object Storage opens up many possibilities, of which more intelligent de-duplication is just one. I’ve written a series of posts about Object Storage on my blog, which discuss many of these emerging capabilities, that I think you may find interesting:

    Object Storage, Part 1 - Introduction
    Object Storage, Part 2 - Metadata
    Object Storage, Part 3 - Explicit and Implicit Policies
    Object Storage, Part 4 - Query
    Object Storage, Part 5 - Security

    Please let me know what you think.
