
Online Storage Optimization

Exploring Next Generation Storage Solutions

Archive for July, 2009

Dedupe Grows Up

Posted by Sunshine On July - 29 - 2009

George Crump has a piece in Byte and Switch today that poses an important question: “Can we get to a single point of deduplication?” This is a question that we have taken up in one form or another in some of our recent posts, such as this one and this one.

In the article, Crump asks the question in another way: “… can you have all your data tiers; primary, archive and backup deduplicated by a single engine?”

In light of the recent focus on deduplication, this, in my view, is a question that really does need to be raised. How long will the industry continue to silo these different tiers in its deduplication solutions? And how much sense does it make to rehydrate data every time you move it, only to deduplicate it once again? Not a lot.

Crump writes: “The current deduplication vendors could work on building out their solutions to either scale up into primary storage performance (see Data Domain’s DD880) or they could move their existing data duplication technology into other markets; see the increased speed of Ocarina Networks and Permabit as well as their move into cloud storage.”

At the same time, as we’ve pointed out here, online storage is quite a bit different from backup, and so far, at least, none of the successful backup dedupe vendors (Data Domain, Diligent, Quantum, etc.) has been able to break into it. Rather, NetApp and Ocarina have been the trailblazers.

Crump makes another key point:

“NetApp and Ocarina could continue to enhance and improve the re-hydration speed of their technologies to make read performance a non-issue, making primary storage a viable platform. Ocarina can already maintain the deduplicated format as they move through tiers, so landing on backup or archive disk would simply be another move for them.”

This is an interesting observation, and one that is often missed in reporting on both of these solutions. We look forward to more debate and discussion on this issue, which this piece has kicked off well.

Who’s Afraid of the Big, Bad, Dedupe?

Posted by Ocarina On July - 28 - 2009

Martin Glassborow, on his Storagebod blog, has written a controversial piece that raises questions about the two hottest technologies in storage at the moment: dedupe and thin provisioning. In his post, entitled “Living on a Prayer,” he suggests that both technologies could be the road to a storage nightmare in which “you could be many times over-subscribed with de-duped storage.” He gives the example of someone turning on encryption and all the dupes reappearing at once, suddenly requiring storage capacity that wasn’t needed until then.

He also sounds the alarm on migration, saying, “migrating deduped primary storage between arrays  … is going to need a lot of planning. Deduping primary storage may well be one of the ultimate vendor lock-ins if we are not careful.”

Here are some of my responses to this thought-provoking post, which will no doubt be getting a lot of attention.

On oversubscription:

I agree with Martin that there is a real risk here. When a bulk operation could cause massive rehydration, it’s essential that you have the proper warning and planning tools. There is also an economic component to this–essentially, you’re weighing paying for disk now or later.

A good dedupe solution will allow you to control the degree of over-subscription. While this does not matter so much for backup dedupe, it does matter for online storage. So you should be able to say, for example, “make a new copy of the data every time the reference count on a duplicate hits 10” (or whatever number you choose). That way, while you limit your space savings to 10:1, you also limit your exposure to an application-level decision that would cause all the duplicates to be rehydrated and returned to primary storage.
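To make the reference-count cap concrete, here is a minimal sketch of a toy block store that starts a new physical copy whenever an existing one reaches `max_refs` references. The class name, the `max_refs` parameter, and whole-block SHA-256 matching are illustrative assumptions, not how any particular product implements this.

```python
import hashlib
from collections import defaultdict


class CappedDedupeStore:
    """Toy dedupe store (hypothetical) that caps references per physical copy."""

    def __init__(self, max_refs: int = 10):
        self.max_refs = max_refs
        self.copies: list[bytes] = []    # physical copies actually stored
        self.refcounts: list[int] = []   # refcounts[i] = logical references to copies[i]
        self.by_hash: dict[str, list[int]] = defaultdict(list)  # fingerprint -> copy indices

    def write(self, block: bytes) -> int:
        """Store a block and return the index of the physical copy it references.

        Assumes equal SHA-256 fingerprints mean equal content (collisions ignored).
        """
        fp = hashlib.sha256(block).hexdigest()
        for idx in self.by_hash[fp]:
            if self.refcounts[idx] < self.max_refs:  # existing copy still has room
                self.refcounts[idx] += 1
                return idx
        # Every matching copy is at the cap (or none exists): store a fresh copy.
        self.copies.append(block)
        self.refcounts.append(1)
        self.by_hash[fp].append(len(self.copies) - 1)
        return len(self.copies) - 1

    def reduction_ratio(self) -> float:
        """Logical / physical blocks. This is also the worst-case growth factor if every
        reference were rehydrated at once, and it can never exceed max_refs."""
        return sum(self.refcounts) / len(self.copies)


store = CappedDedupeStore(max_refs=10)
for _ in range(25):
    store.write(b"x" * 4096)  # 25 identical 4 KiB blocks
print(store.refcounts)  # [10, 10, 5]
```

With the cap at 10, the 25 duplicates land on three physical copies instead of one. You give up some savings, but a mass rehydration can grow this data by at most 10x rather than 25x.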

Encryption is a good example. If encryption is done at the application or file level, most dedupe solutions will not be able to find duplicates at all. Increasingly, we’re seeing encryption move to the drive level, and in that case it will be transparent to primary dedupe. That’s not to say there aren’t other cases in which you could end up oversubscribed.
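Here is a minimal sketch of why file-level encryption hides duplicates. It assumes a toy XOR keystream built from SHA-256 in counter mode with a fresh nonce per file. That toy is a stand-in for a real cipher and must not be used for actual security. The key, nonces, and function names are hypothetical.

```python
import hashlib


def toy_encrypt(plaintext: bytes, key: bytes, nonce: bytes) -> bytes:
    """XOR with a SHA-256 counter-mode keystream. A toy for illustration, NOT a real cipher.

    Applying it twice with the same key and nonce returns the original bytes.
    """
    out = bytearray()
    for block_no, offset in enumerate(range(0, len(plaintext), 32)):
        keystream = hashlib.sha256(key + nonce + block_no.to_bytes(8, "big")).digest()
        chunk = plaintext[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, keystream))
    return bytes(out)


def fingerprint(data: bytes) -> str:
    """Content fingerprint a dedupe engine might use to spot duplicates."""
    return hashlib.sha256(data).hexdigest()


doc = b"Q2 storage budget review. " * 100
key = bytes(range(32))                          # hypothetical shared key
nonce_a, nonce_b = b"\x00" * 16, b"\x01" * 16   # per-file nonces (random in practice)

copy_a = toy_encrypt(doc, key, nonce_a)  # user A saves the document, encrypted per file
copy_b = toy_encrypt(doc, key, nonce_b)  # user B saves the identical document

print(fingerprint(doc) == fingerprint(bytes(doc)))  # True: plaintext duplicates are visible
print(fingerprint(copy_a) == fingerprint(copy_b))   # False: the duplicate has disappeared
```

With drive-level encryption, the dedupe layer works on `doc` before the drive encrypts it, so it still sees the duplicate.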

The lesson here is clear: your online or primary storage dedupe tool must give you the means to manage that risk.

On Migrating Deduped Data

The topic of end-to-end deduplication is the natural next step in the maturation of the deduplication market. Today, many vendors have built dedupe into their filers as a feature. Every time you move data, you have to rehydrate it. This is often the case even when you are moving deduped data from one filer to another from the same vendor! NetApp dedupe will rehydrate every file any time you move it off the filer, whether for SnapMirror, an NDMP backup, etc. There are really two things that the IT user wants to see. First, you want to be able to move optimized data in its most efficient form (deduped, compressed) not only across filers, but across vendors and storage tiers.

For example, why dedupe data on the filer, then rehydrate it, back it up to a VTL target, and then dedupe it again? Why not dedupe it once, and move the already-optimized data to the backup target, to the DR site, to the tier 2 filer? In the backup case, you’ll still get additional benefit from your dedupe appliance. Because backups are so repetitive, a file you back up over and over will still benefit from being deduped again with each backup, even if it was already deduped on the filer. But you ought to have less data to move to the backup appliance, and you ought not to have to burn a bunch of filer CPU cycles rehydrating files that are just headed off to backup.
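The “dedupe once, move optimized data” idea can be sketched as replication that ships only the chunks the target does not already hold. This is a minimal sketch under stated assumptions: fixed 4 KiB chunks, SHA-256 fingerprints, and an in-memory dict standing in for the backup or DR target. All of the names are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed-size chunking; real products often use variable-size chunks


def chunks(data: bytes, size: int = CHUNK_SIZE) -> list[tuple[str, bytes]]:
    """Split data into fixed-size chunks, each paired with its SHA-256 fingerprint."""
    return [
        (hashlib.sha256(data[i:i + size]).hexdigest(), data[i:i + size])
        for i in range(0, len(data), size)
    ]


def replicate(files: dict[str, bytes], target: dict[str, bytes]) -> tuple[dict[str, list[str]], int]:
    """Copy files to a deduplicating target, sending only chunks it does not already hold.

    Returns a manifest (ordered chunk fingerprints) per file and the number of bytes sent.
    """
    manifests: dict[str, list[str]] = {}
    bytes_sent = 0
    for name, data in files.items():
        manifest = []
        for fp, chunk in chunks(data):
            if fp not in target:  # target lacks this chunk: ship it once
                target[fp] = chunk
                bytes_sent += len(chunk)
            manifest.append(fp)
        manifests[name] = manifest
    return manifests, bytes_sent


def rebuild(manifest: list[str], target: dict[str, bytes]) -> bytes:
    """Rehydrate a file at the target only when it is actually read."""
    return b"".join(target[fp] for fp in manifest)


# Example: two versions of a 16 KiB file that differ only in their last few bytes.
v1 = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(512))
v2 = v1[:-10] + b"0123456789"
files = {"report_v1": v1, "report_v2": v2}

dr_site: dict[str, bytes] = {}
manifests, sent = replicate(files, dr_site)
print(sent, "bytes sent instead of", len(v1) + len(v2))  # 20480 bytes sent instead of 32768
_, resent = replicate(files, dr_site)  # repeat backup of unchanged files
print(resent)  # 0
```

The second `replicate` call mirrors the repetitive-backup case: the target already holds every chunk, so nothing crosses the wire and the filer never rehydrates anything.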

Second, you want dedupe and compression that is not a lock-in feature of one vendor, but a vendor-neutral data reduction solution that the IT shop can deploy across multiple filers (primary, nearline, etc.), archive, and backup. So the lesson, again, is to take a close look at the dedupe product and be sure you’re not headed for vendor lock-in.

We look forward to seeing what others are saying about this provocative post.

EMC Dedupe - Beyond Data Domain

Posted by Mike Davis On July - 27 - 2009

With all the talk about the Data Domain acquisition, less attention has been paid to EMC’s native dedupe features in Celerra, not to mention its related partnerships, such as the one with Ocarina for optimizing vertical applications. Last week I had the privilege of attending a webinar, “Surviving the Data Explosion through Data Reduction,” with John Hayden, CTO of NAS Engineering at EMC, where I got a fuller picture of Celerra’s latest optimization features.

John provided us with insights on how the new Celerra NAS product integrates data optimization. And while he never mentioned Data Domain directly, an astute observer could see how well EMC is integrating prior acquisitions into its architecture, and draw conclusions from that.

First, he shared a couple of interesting factoids from the IDC Digital Universe research that EMC sponsored:

  • In 2009 there is positive growth in digital content, but IT spending on servers & storage is down 6%
  • Over the next 4 years, data will grow 5x, but IT budgets will only grow 1.2x
  • The administrative and overhead cost of storage is 4-7x the CapEx
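To put the second factoid in perspective, here is the arithmetic. It assumes, as a simplification, that both figures compound evenly over the four years:

```latex
\begin{aligned}
\text{annual data growth} &= 5^{1/4} \approx 1.495 \quad (\approx 50\%\ \text{per year})\\
\text{annual budget growth} &= 1.2^{1/4} \approx 1.047 \quad (\approx 4.7\%\ \text{per year})\\
\frac{\text{budget after 4 years}}{\text{data after 4 years}} &= \frac{1.2}{5} = 0.24
\end{aligned}
```

In other words, spend per unit of stored data has to fall to about 24% of today’s level, a 76% cut. If the cost per physical terabyte stayed flat (another assumption), closing that gap would take roughly 5/1.2 ≈ 4.2:1 data reduction.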

This was all a prelude to John discussing the new data optimization features in the Celerra NAS product. It’s great to see NAS vendors recognizing data optimization as a central part of the NAS stack. Drilling a little deeper, EMC basically combined file-level deduplication (single instance storage, or SIS) from the Avamar acquisition with LZ77 data-generic compression from the RecoverPoint acquisition. SIS + LZ77 is a good price-performance combination for generic office files and text documents, but it doesn’t make much of a dent where we see the real capacity and scalability challenges: vertical applications such as life sciences, oil & gas, and media. In fact, generic compression is becoming increasingly ineffective against the latest MS Office documents, which use ZIP as a container. If you change a single text character in an Office document, the entire file changes.
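A rough sketch of the SIS + LZ77 pairing shows both its strength and its limits. Whole-file SHA-256 hashing stands in for SIS. Python’s `zlib` (DEFLATE, which combines LZ77 with Huffman coding) stands in for Celerra’s compressor. Both substitutions are assumptions, and the file names are hypothetical.

```python
import hashlib
import zlib


def sis_store(files: dict[str, bytes]) -> dict[str, bytes]:
    """File-level single instance storage: keep one copy per unique whole-file SHA-256."""
    store: dict[str, bytes] = {}
    for data in files.values():
        store.setdefault(hashlib.sha256(data).hexdigest(), data)
    return store


text = b"Quarterly revenue summary for the storage group. " * 200
edited = text.replace(b"revenue", b"Revenue", 1)  # one-character edit

files = {"memo.txt": text, "memo_copy.txt": text, "memo_edited.txt": edited}
print(len(sis_store(files)))  # 2: the exact copy is shared; the edited file is stored whole

# LZ77-family compression (zlib's DEFLATE) shrinks repetitive text dramatically...
print(len(zlib.compress(text)) < len(text) // 10)  # True

# ...but not high-entropy bytes, which stand in here for already-compressed content.
noise = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(2048))
print(len(zlib.compress(noise)) > 0.99 * len(noise))  # True
```

SIS gets nothing from a near-duplicate, and a generic compressor gets almost nothing from data that is already compressed. Both of those describe a ZIP-based Office file that has been edited.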

So there’s a reason that Ocarina has a solid partnership with EMC, with an optimization solution that complements Celerra’s. For customers with serious capacity issues and data growth (we’re talking about gene sequencing, post-production houses, and so on), there is little to gain from deduplication and little to gain from generic compression. Not only does the optimization solution need to unwind and understand the file structure more intelligently, but it also needs to make better decisions about which algorithms get applied to specific file sub-objects. This is where Ocarina comes in. Like the native Celerra dedupe solution, the Ocarina ECOsystem integrates with the FileMover API for a tightly knit, policy-based optimization solution that works even on media and ZIP files that are already compressed.
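“Unwinding” a container can be illustrated with a simple swapped-in technique: open the ZIP, then fingerprint each decompressed member instead of the whole file. This is not Ocarina’s algorithm, which the post does not describe. The two-member layout, the high-entropy stand-in for an image, and the function names are all hypothetical.

```python
import hashlib
import io
import zipfile


def make_container(doc_xml: bytes, image: bytes) -> bytes:
    """Build a tiny ZIP container loosely modeled on an Office file (hypothetical layout)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("word/document.xml", doc_xml)
        zf.writestr("media/image1.png", image)
    return buf.getvalue()


def member_fingerprints(container: bytes) -> dict[str, str]:
    """Unwind the container and fingerprint each decompressed member separately."""
    with zipfile.ZipFile(io.BytesIO(container)) as zf:
        return {name: hashlib.sha256(zf.read(name)).hexdigest() for name in zf.namelist()}


# High-entropy bytes stand in for an embedded, already-compressed image.
image = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(4096))
v1 = make_container(b"<w:t>Hello world</w:t>", image)
v2 = make_container(b"<w:t>Hello World</w:t>", image)  # one character changed

print(hashlib.sha256(v1).digest() == hashlib.sha256(v2).digest())  # False: whole files differ
fp1, fp2 = member_fingerprints(v1), member_fingerprints(v2)
print([name for name in fp1 if fp1[name] == fp2[name]])  # ['media/image1.png']
```

At the whole-file level, the two versions share nothing. Once unwound, the 128 KiB image is clearly a duplicate. A real optimizer would go further and pick a different algorithm for each kind of sub-object.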

We look forward to our collaborations with EMC, and will be very interested to watch how they continue to integrate dedupe and compression across their offerings.

It’s Getting Cloudy up There

Posted by Sunshine On July - 24 - 2009

Seems Uncle Sam is trying to tighten his belt through cloud computing. Government, at both the federal and state level, is debating how to make better use of this cost-cutting innovation. As we reported a few weeks ago, there are already some initiatives coming down the pike, such as Nirvanix providing cloud storage for NASA moon orbiter photos.

Now it looks as if NASA is getting seriously hooked on the cloud. As NextGov is reporting, the Obama administration is considering making NASA an IT service provider, using Nebula, the cloud computing platform it has in development, to manage and share all kinds of government data.

As the article explains:

“Federal CIO Vivek Kundra, Obama’s top technology executive, is examining many alternatives for innovation in the cloud, including using Nebula as a centralized platform to service multiple agencies, OMB officials said. Chris Kemp, CIO at NASA’s Ames Research Center, who is spearheading the program, is working with the federal government’s cloud working group, officials added.”

I know this sounds like a good idea in theory, but I do wonder whether the folks who brought us the Columbia should be entrusted with vast amounts of federal data. Just saying…

Vivek Kundra has also been busy coming up with federal cloud security standards. As Tim Greene reports in Network World, Kundra is proposing a “storefront model” in which a set of standards can “designate acceptable cloud service providers that government agencies can hire quickly without each agency having to independently determine that they are secure. The goal is to cut the cost and time needed to expand computing resources of government agencies by embracing the well known economic advantages of cloud computing.”

Meanwhile, blogger Christopher Hoff (known to many as “Beaker”) says such a standard is already available, as he sketches out in a recent post on his Rational Survivability blog.

It’s not just the feds that are jumping on the cloud. (Which reminds me of this article from “The Onion,” but I digress.) Earlier this week, we read a report that a couple of Washington state legislators are attempting to derail a potential plan for a $300 million data center, because, they argue, the data it’s being built to store could be handled far more cheaply by a cloud provider.

Data Center Knowledge notes that they argue: “…Washington state is ‘home to many of the leading providers of this rapidly evolving commodity service … Still, our own state government has yet to move in this direction in any material way.’ Both Amazon (AMZN) and Microsoft (MSFT) are headquartered in the Seattle area and have in-state data centers that host cloud services.”

Some are concerned that this cloud mania might be getting out of hand. As one Twitterer, Bas Raayman, argues, right now all one has to say is “cloud” and people line up for it, without knowing whether it will really be a good match for the type of data they are trying to store and/or manage.

Time will tell whether this is a flash in the pan, or a flashy new plan that could save the government oodles of much-needed dough.

Doing More With Less

Posted by Sunshine On July - 23 - 2009

If you’re trying to figure out how to do more with less when it comes to your storage, I’d strongly suggest you participate in an upcoming Webinar, “How to Use Storage Tiering to Create Cost Efficient Storage of your Online Data.” It will take place on August 5 at 9 a.m. PDT (12 p.m. EDT).

The time to register for this event is now, and visiting the above link will walk you through the steps to do so.

Sponsored by Ocarina and BlueArc, the webcast will delve into the practicalities involved in achieving storage efficiency. The focus will be on use of intelligent storage tiering and capacity optimization technologies to reduce data footprint and effectively manage data center storage resources.

Featured panelists will be Noemi Greyzdorf, Research Manager, Storage Software at IDC; Victoria Kepnik, Sr. Product Manager at BlueArc; and Eric Scollard, VP of Sales at Ocarina Networks.

As we have discussed here in the past, storage tiering can be one important way to reduce disk costs. As Carter George put it in a recent post: “To keep up, you have to cut the flab out of your storage. This, too, calls for a two-pronged approach. … This means doing a better job of tiering, and keeping files only as long as you really need them … The second part is the ‘exercise’ element of keeping your storage slim and trim. That is, run a storage efficiency tool (may we suggest Ocarina as one example?) that will efficiently trim the fat out of your data. That kind of combination means that you really can tighten your belt on your storage budget.”

And while storage tiering and capacity optimization are frequently discussed in storage publications, this is the first time I’ve seen this particular group of storage experts come together and take a serious and significant look at the details of how this can best be achieved for your enterprise. We look forward to your participation.

Thanks to our Readers

Posted by Sunshine On July - 22 - 2009

As the new vendor storage blog on the block, Online Storage Optimization has a lot to be grateful for. Since our relaunch in February, we have become a known source of opinion and information on the very hot topic of data deduplication. You, our readers, are the reason we exist. Your comments and thoughts are what keep us going–and thinking. Your visits to our site, which according to our analytics software are increasing almost daily, make us realize that we’re providing a useful service to you.

So, this post is simply to say “thank you.” We appreciate your interest in what we have to say. We hope that you will tell others about this blog, especially those who are seeking to better understand the ways in which data reduction can offer cost savings and earth savings. We hope to hear more of your comments. And for those of you whom we are meeting on Twitter, thank you for your retweets, of which there have been so many. Not only are you helping to garner more interest in this blog, but you are also showing us that what we are saying has value to you.

If you disagree with us, that’s fine as well. We know that vendor blogs are used as a forum for healthy debate, and we welcome any and all respectful comments–and so far, all of your comments have been respectful. We look forward to meeting all of you on these pages and in the outside world.

Storage News and Views July 21

Posted by Sunshine On July - 21 - 2009

What is it about mid-summer that turns everyone a little bit mad? EMC is now the proud parent of a baby DDUP. NetApp is $57 million richer. And here in the Bay Area, the weather veers from freezing to sweltering, depending on how close one is to the coast.

Here are a few headlines that caught my eye this morning:

SearchStorage: Cornell University, Shopzilla deploy primary storage data reduction to consolidate storage, keep up with data growth - Beth Pariseau reports on the way that Ocarina and Storwize are reducing data for their customers. Thanks to Ocarina, Cornell University has slashed its storage costs, and can now consider better economies of scale by consolidating storage for other departments.

StorageIO Blog: Summer Weddings: EMC+Datadomain and HP+IBRIX - Greg Schulz on the two storage mergers of the season. Some interesting thoughts on the HP-IBRIX merger that I haven’t seen anywhere else. And what is this Mass./Calif. love affair all about?

Chuck’s Blog: Data Domain: The Cone of Silence is Lifted - EMC’s Chuck Hollis drops some hints about how Data Domain will be integrated into its existing storage offerings. The comments are also worth reading. Chuck does a great job fielding the fevered speculation that’s going around.

Network Storage is Back

Posted by Sunshine On July - 20 - 2009

One of our favorite blogs, Network Storage, has been something of an on-again, off-again project. It is written by Anil Gupta, a Quantum systems engineer, though the ideas are his own.

Yesterday, he posted an observation: several different spreadsheet programs had miscalculated the answer to his factorial query. Welcome back to blogging, Anil, and we hope to see more of your posts.

How Much Are You Saving on Storage?

Posted by Sunshine On July - 17 - 2009

Lately it seems that everyone out there is telling you how much you can save on storage. In many ways, we truly are experiencing a new era in storage, in which the base costs are being reduced through advancements such as virtualization, thin provisioning, and of course deduplication and compression. But how much are you really saving? It’s sometimes difficult to know.

Ocarina has decided to make it much easier to answer this question. You purchase Ocarina to save money, right? Now you can find out how much you’ll be saving with one quick trip to the website and the Ocarina ROI Calculator.

Nothing could be simpler to use: you enter the key data into the calculator, press the “Calculate Savings” button and find out the answer. Obviously, this is really just a first step, but it’s a powerful way to get a sense of the kind of reduction you can expect based on your fileset.
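For a sense of the basic arithmetic behind a savings estimate, here is a back-of-the-envelope sketch. This is not the Ocarina ROI Calculator’s formula, which the post does not give. The function name, the inputs, and the dollar figures in the example are hypothetical.

```python
def estimate_savings(capacity_tb: float, reduction_ratio: float, cost_per_tb: float) -> float:
    """Rough storage savings from data reduction (hypothetical formula).

    capacity_tb: logical data you need to keep
    reduction_ratio: e.g. 4.0 for 4:1 reduction
    cost_per_tb: assumed fully loaded cost per physical TB
    """
    if reduction_ratio < 1:
        raise ValueError("reduction_ratio must be >= 1")
    physical_tb = capacity_tb / reduction_ratio          # disk actually consumed
    return (capacity_tb - physical_tb) * cost_per_tb     # disk you no longer buy


print(estimate_savings(100, 4.0, 5000))  # 375000.0
```

A real calculator would also account for things like growth over time and the price of the optimization product itself. The reduction ratio you actually achieve depends heavily on your fileset.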

If you find this a useful tool, we hope you’ll pass on the word.

Data Centers Grow Up, Chill Out

Posted by Sunshine On July - 16 - 2009


Lots of news about improvements in data centers this week. It’s as if suddenly, a whole bunch of folks woke up and realized that these things are here to stay, and so need some extra attention.

For example, Data Center Knowledge is reporting that Datapipe has built a beautiful, glass-enclosed atrium for its data center. The pictures of the lobby posted on Flickr are impressive to say the least.

Meanwhile, UK publication TechWorld is reporting that IT consultant Glasshouse is offering a data center “greening” service to ensure that your data center is getting a “green bill of health.” Its Energy Proficiency Impact Analysis is the path to success. I personally would’ve come up with a different phrase, so as to get a better acronym–something that spells out GREEN ME, perhaps? Just a thought…

Finally, blogger/consultant Steve Duplessie has already picked up on some very interesting data center news–Google is planning to build a data center in Belgium that dispenses entirely with chillers to support its cooling systems. As we’ve discussed on this blog in the past, there is a growing recognition that cooling can be done in a far more energy-efficient manner by simply using the air from the outside. Let’s hope this trend continues, as it’s obviously good for the planet.