Online Storage Optimization

Exploring Next Generation Storage Solutions

Archive for May, 2009

Storage News and Notes - May 29

Posted by Sunshine On May - 29 - 2009

This has been a very interesting week in the storage blog-o-tweet-osphere, and the hottest topic was, somewhat ironically, an announcement that seemed to fall flat. Wednesday, HDS brought out its High-Availability Manager for USP-V (quickly dubbed “HAM” by bloggers), and several bloggers called it underwhelming and confusing.

Chris Evans, The Storage Architect - Enterprise Computing: USP-V - So Long And Thanks For All The Fish

Stephen Foskett, Gestalt IT - HDS’ HAM-Fisted Announcement Can’t Be All

Storagebod’s Blog - I Wanted Bacon not Ham

To its credit, HDS immediately fired up a whole boatload of responses. Consultant Tony Asaro can be found arguing each point in all of these blog posts.

He also posted this on his Blog Bytes HDS blog:

Real World Implications and Impact of Hitachi High Availability Manager

HDS’s Hu Yoshida also put out a short post that clarified some of the issues:

Hu’s Blog - High Availability Cluster

In the end, there was this Seussian wrap-up of the whole debacle by Stephen Foskett - A Taste of HAM

I have to admit, this last one made me laugh.

In other news, there was some actual news out there this week! A Massachusetts court has ruled that Dave Donatelli, formerly of EMC, may work at HP, but he isn’t allowed to work in the storage division–the result of a non-compete clause the storage veteran signed with his former employer.

And finally, this blog’s parent Ocarina Networks was profiled in The UK Register this week:

Chris Mellor - Ocarina makes waves with lossless image compression

The article takes a look at the company’s compression technology, and it is the first I’ve seen that goes into this level of detail. Definitely worth a read for those who are wondering about the magic behind its amazing results with image compression.

Backup to the Future

Posted by Carter George On May - 28 - 2009

I’ve been thinking a lot about backups lately. Yesterday, blogger Stephen Foskett put up a post on his Nirvanix blog, Enterprise Storage Strategies, that got me thinking about the subject some more. In seeking to answer the seemingly simple question, “What is a Backup?” he garnered input from several industry experts — coming up with a brand new, working definition of the term.

Here’s my take. Almost all traditional backups today are based on a model developed for 1990s Unix machines. A backup software agent ran on the Unix machine, passed files to be backed up to a media server, and the media server packaged up those files to write to a tape drive. Why did you package up files - which users and applications can understand perfectly well - into proprietary backup software formats? Well, primarily because to use tape efficiently, you needed large files that could keep a tape drive streaming at a reasonable rate.

In the process, you ended up with tapes that had “backup data” in a format that only the backup software could read. Great for backup vendors, not so great for users. Now, in the modern world, people don’t have Unix machines, they don’t have tape drives, and the open question is, do they really need that backup software and media server at all? All sorts of vendors bend over backwards to build products - like disk arrays with virtual tape library interfaces - so that their new technology can look like old tape technology. Backup vendors still package up files in “saveset” file formats even though they are not writing to tapes whose heads have to be kept streaming.

The scale of data has also changed dramatically. In the ’90s, 1 terabyte was a huge amount of data. Now you can buy that at Fry’s for 200 bucks. At this point, we’re in the petabyte era, and there’s no sign that data growth is slowing. Technology designed in the ’90s is sure to begin to fall apart at this scale.

Here’s the obvious question, then: Why not just move files that are candidates for being backed up to a separate tier of storage, keeping them as files in their native format, and organizing them in time-coherent views? Sure, you can restore whole volumes or directories, but with a little search engine capability, end users can find and restore any file, from any point in time, themselves, without backup software or storage admins. This also makes it much more straightforward to integrate intelligent dedupe, compression and other mechanisms for storing large amounts of backup data and files efficiently.
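To make the idea of a time-coherent view concrete, here is a minimal sketch. It assumes a hypothetical in-memory `BackupCatalog` that records where each captured version of a file lives on the backup tier; the class, its methods, and the paths are illustrative only and do not describe any particular product.

```python
from bisect import bisect_right
from collections import defaultdict


class BackupCatalog:
    """Hypothetical catalog: remembers every captured version of each file path."""

    def __init__(self):
        # path -> list of (timestamp, stored_location), kept sorted by timestamp
        self._versions = defaultdict(list)

    def record(self, path, timestamp, stored_location):
        """Register that `path` was captured at `timestamp` into `stored_location`."""
        versions = self._versions[path]
        versions.append((timestamp, stored_location))
        versions.sort()

    def find_as_of(self, path, timestamp):
        """Return where the newest version at or before `timestamp` lives, or None."""
        versions = self._versions.get(path, [])
        times = [t for t, _ in versions]
        idx = bisect_right(times, timestamp)
        return versions[idx - 1][1] if idx else None

    def view_as_of(self, timestamp):
        """Time-coherent view: every known file as it stood at `timestamp`."""
        view = {}
        for path in self._versions:
            location = self.find_as_of(path, timestamp)
            if location is not None:
                view[path] = location
        return view


# Illustrative data: timestamps are plain integers, locations are made-up names.
catalog = BackupCatalog()
catalog.record("/home/ann/report.doc", 100, "tier2/0001")
catalog.record("/home/ann/report.doc", 200, "tier2/0002")
catalog.record("/home/ann/budget.xls", 150, "tier2/0003")

print(catalog.view_as_of(160))
```

Because each stored version is still an ordinary file in its native format, an end user could open whatever `find_as_of` returns directly, with no restore step through backup software.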

By making backups sets of files in their native file format, as opposed to backup software-specific saveset files, it also starts to become possible to blur and merge the distinction between backups and archives. Retention policies, compliance and regulatory rules can be applied to files based on their metadata, their contents, their owners, their business context - because all of that information is there in the backup file set just the way it was in the primary copy of the data. Widespread adoption may be years away, since people are very conservative about changes in their backup technology and workflows. But the technology for next-gen backups, designed and built to look and act like modern storage instead of masquerading as tape, is here and ready today.

Object-Based Storage - New Possibilities

Posted by Carter George On May - 26 - 2009

There was a nice piece this past week in the “Storage Tips” section of SearchStorage, by contributor Alan Radding, about the object-based direction that SAN is taking. The piece also ran in the recent issue of Storage Magazine, which is now an electronic-only publication.

There are very interesting possibilities opened up by object-based storage. Radding raises some of them in his article, noting that there are trade-offs in moving towards wider use of object metadata. To me, there is a great deal to explore when looking at this potential trend. In essence, storing data as objects with rich metadata means that object dedupe becomes a natural thing to do. In this scenario, you not only dedupe objects (as opposed to blocks), but you can also drive decisions about dedupe, compression, and other optimizations based on the metadata of the object.

Essentially, a block in a SAN array is just a fixed-size chunk of data that the array knows very little about, whereas an object is a variable chunk of data that you may know quite a bit about. This includes: file and data type, owners, use cases, compliance policy, etc. Storing all that metadata is only interesting if you do intelligent things with it. With this in mind, my view is that object dedupe is a natural winner.
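As a rough illustration of whole-object dedupe, the sketch below keeps one copy of each distinct payload and lets every object keep its own metadata. The `ObjectStore` class is hypothetical, and identifying duplicates by SHA-256 digest is an assumption made for the example, not a description of how any vendor’s dedupe works.

```python
import hashlib


class ObjectStore:
    """Hypothetical object store that keeps one copy of each distinct payload."""

    def __init__(self):
        self._payloads = {}  # content digest -> bytes (stored once)
        self._objects = {}   # object name -> (digest, metadata)

    def put(self, name, data, metadata):
        digest = hashlib.sha256(data).hexdigest()
        # Whole-object dedupe: identical content is stored only once.
        self._payloads.setdefault(digest, data)
        # Each object still carries its own metadata (owner, type, and so on).
        self._objects[name] = (digest, dict(metadata))

    def get(self, name):
        digest, _ = self._objects[name]
        return self._payloads[digest]

    def stored_bytes(self):
        return sum(len(payload) for payload in self._payloads.values())


store = ObjectStore()
deck = b"slides" * 1000  # stand-in for a 6,000-byte presentation
store.put("q1/deck.ppt", deck, {"owner": "ann", "type": "ppt"})
store.put("archive/deck-copy.ppt", deck, {"owner": "bob", "type": "ppt"})

print(store.stored_bytes())  # two objects, one stored payload
```

A block array could not make the same call as cheaply, because it sees only fixed-size chunks and has no notion of where one object ends and another begins.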

Because the information you know about an object is discoverable, it can be used to make intelligent decisions about how to reduce its space - using smart object dedupe, file type (content-aware) compression, even selective deletion. For example, a medical image might include a large lossless original MRI image and a small thumbnail for convenience. If you know that an object is a medical image, that HIPAA compliance applies, and that the object is considered archived, there are several things you might decide about storing that object. First, you know that dedupe probably won’t do much for that object type, but you might compress the big image with a bit-for-bit lossless compressor that is specific to MRIs, and you might delete the thumbnail, knowing that you can recreate it on the fly any time, automatically, from the lossless original you’ve kept.

You might end up reducing the space required to store a medical image archive by 75% without losing a single bit of information, and meeting all HIPAA requirements. You can’t make those kinds of decisions on a block in a SAN storage array. That’s just 4K of data that belongs to a file system or database or application, and at the SAN array controller or switch level, you can’t do much with it because you don’t know what it is. You can only do physical things, such as: mirror, replicate, move to a faster or slower tier, etc. You can’t make decisions based on content, use-case, or lifecycle.
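The medical-image example above can be sketched as a small decision function. Everything here is hypothetical: the metadata keys, the step names, and the rules themselves are illustrative stand-ins for whatever a real policy engine would use.

```python
def plan_optimization(meta):
    """Pick storage optimizations for one object from its metadata (illustrative rules only)."""
    steps = []
    if meta.get("type") == "medical_image" and meta.get("archived"):
        # Unique scans rarely dedupe well, so skip dedupe and use a lossless codec.
        steps.append("lossless_mri_compression")
        if meta.get("has_thumbnail"):
            # Safe to drop: the thumbnail can be rebuilt from the lossless original.
            steps.append("delete_thumbnail")
    else:
        # Default for everything else in this sketch.
        steps.append("object_dedupe")
    return steps


# Every step chosen for the MRI is lossless, which is what lets the original
# survive bit for bit under the compliance rule carried in the metadata.
mri = {"type": "medical_image", "compliance": "HIPAA", "archived": True, "has_thumbnail": True}
print(plan_optimization(mri))
```

The point of the sketch is only that the decision depends on fields a block device never sees.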

When people talk about object stores, the examples are almost always about compliance. But really, object stores — both in the data center and in the cloud, where the object store is the dominant model already (just look at Amazon S3) — could emerge as a natural tier two storage for NAS and files in general. If they do, then object stores provide very rich possibilities for dedupe, compression, and other cost and space saving optimizations.

All worth thinking through, especially in light of the need for more and better optimization technology to manage the massive upsurge in data across many industries.

Got Ocarina?

Posted by Carter George On May - 26 - 2009

With so much talk about dedupe lately, it’s hard to know which aspect of it to discuss first. We’ll begin by addressing a question we occasionally hear, most recently in a comment on this very blog. The question can be summed up as: why would someone pay extra for a solution such as Ocarina, rather than simply go with the dedupe that is free from NetApp (or another vendor)? This is not unlike the old saw “Why buy the cow when you can get the milk for free?” Well, the really quick answer in this case is that you are likely going to get a LOT more milk than you would otherwise be able to.

In any case, this question was one of many indications that we need to write a post or two to help clear up some points that may be confusing to those who aren’t deeply involved in this particular corner of the storage industry. So we thought it worthwhile to get into some detail with our response.

NetApp is the market leader in NAS, and they were pioneers in dedupe for online storage. If we at Ocarina are going to be successful, we have to explain why our technology is different and better than what they have.

NetApp is a model for both good technology and execution - and so is their recent acquisition, Data Domain. We have great respect and admiration for both companies, but we do have a better mousetrap when it comes to dedupe for online storage. It’s our job to tell the world about that. We’d love to partner with NetApp. In fact, we have NetApp customers who have purchased our product and are running it in production, having evaluated us against NetApp Dedupe. However, NetApp clearly feels they own “dedupe for primary” for their filers, and their purchase of Data Domain makes it clear that they intend to make storage efficiency and dedupe a major focus of the company going forward.

We are going to have to compete with their offerings. That’s fine - competition is what drives innovation and forward progress, and what makes the technology industry so much fun to work in! That said, then, if a company like Ocarina is going to be successful, we clearly have to do two things:  1) have a better mousetrap — if we’re not better than the thing you get for free, why would anyone buy our product? and 2) have a solution that works for both NetApp customers and non-NetApp shops.

NetApp Dedupe is pretty good for some things, but it has limitations. One obvious one is that it doesn’t help the EMC, HP, Isilon, Dell, IBM, BlueArc, or HDS NAS shop. It goes without saying that every NetApp customer will try the free dedupe first, and would only come to Ocarina after realizing that they need better results. If the NetApp dedupe is “good enough,” we do not expect customers to bring in Ocarina. However, there are a number of customers for whom it is not.

Keep in mind that the NAS market is huge, and growing faster than any other storage market segment. There are many other NAS vendors besides NetApp, and all the customers of all those storage vendors - from Windows file/print servers, to big players like EMC and HDS, to aggressive technology-leading NAS players like BlueArc and Isilon - are interested in the benefits of dedupe and storage efficiency for online storage. Ocarina wants those customers to know that that technology is available for all those platforms. You don’t have to buy a NetApp filer to get dedupe for online. And we want them to know that if they do go with Ocarina, they are not just paying for something on their storage that’s no better than what they’d get for free from NetApp. They are getting something better.

Now to the Storage Switzerland report that started so much of this discussion. It was about one thing: how well Ocarina object dedupe and content-aware compression shrink data compared to NetApp dedupe. We think we did pretty well.

But there are other issues to consider. Two that come up with every customer are 1) how fast can you dedupe a data set and 2) how fast can users and applications access data after it has been optimized? We’ll cover both of those topics in future benchmarks and reports, as they’re both very important.

I’d like to point out that there are also two other big considerations for customers who want to reduce their storage footprint using dedupe or compression technology. These are things that are less measurable in a lab report, but just as important.

One of those is the ability for a customer to buy one “dedupe for online” solution and have it work across storage from multiple vendors that they might have in house. Sure, some customers have only one file serving platform, but many have multiple. They have multiple tiers, or they have old stuff and new stuff, or they have a standard of Vendor A but got a lot of Vendor B stuff when they acquired and integrated another company.

With Ocarina, a customer can choose a single “dedupe for online” solution that would work across all those vendors’ storage - including NetApp - with a single interface, a unified management console, and even the ability to dedupe across platforms.

The second product design differentiator, and why people will pay for us instead of deploying something that’s free, is the granularity of optimization. With NetApp Dedupe, you either dedupe a volume or not. It only works on NetApp, and your choice for a given NetApp volume is dedupe or don’t dedupe. With Ocarina, you have any number of dials to choose from based on your specific needs. You can choose to optimize sets of files within a volume by fine-grained policy. Examples might be: don’t dedupe database files at all; dedupe Office and PDF files that are two days old; dedupe and compress all media, video and photo files that have not been modified for 10 days. Really, any mix of file type, age, or metadata characteristic can be used to create policies that determine not only whether a file gets optimized, but how aggressively - object dedupe only, object and subfile dedupe, object and subfile dedupe and lightweight compression, or all of the above and sophisticated content-aware compression. You can match your level of dedupe and compression to the SLAs, business value, and characteristics of your files. I don’t know how you benchmark that, but it’s pretty significant.
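To show what policy-level granularity might look like, here is a minimal sketch of the three example rules above. It is not Ocarina’s actual policy syntax; the `FileInfo` type, the extension lists, and the action names are all assumptions made up for illustration.

```python
from typing import NamedTuple


class FileInfo(NamedTuple):
    name: str
    extension: str
    days_since_modified: int


# Hypothetical extension groups; a real deployment would define its own.
DATABASE = {".mdf", ".dbf"}
OFFICE = {".doc", ".xls", ".ppt", ".pdf"}
MEDIA = {".jpg", ".mpg", ".mov", ".tif"}

# Each policy is (description, predicate, action); the first match wins.
POLICIES = [
    ("never touch database files",
     lambda f: f.extension in DATABASE,
     "skip"),
    ("Office and PDF files at least two days old",
     lambda f: f.extension in OFFICE and f.days_since_modified >= 2,
     "dedupe"),
    ("media files unmodified for 10 days",
     lambda f: f.extension in MEDIA and f.days_since_modified >= 10,
     "dedupe+compress"),
]


def choose_action(file_info):
    """Return the action of the first matching policy, or 'skip' if none match."""
    for _description, matches, action in POLICIES:
        if matches(file_info):
            return action
    return "skip"


print(choose_action(FileInfo("plan.pdf", ".pdf", 3)))
print(choose_action(FileInfo("clip.mov", ".mov", 4)))
```

Contrast this with a per-volume switch, where the only available "policy" is a single on/off flag for everything on the volume.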

With all this in mind, we plan a second post to delve into yet more issues related to the Ocarina-NetApp lab report, so please stay tuned.

A Q&A with Michael Callahan, HP

Posted by Sunshine On May - 22 - 2009

We’ve been hearing about high capacity storage systems, such as HP’s Extreme Data Storage 9100 System (ExDS9100), for the past year or so. There’s clearly huge potential for using these types of systems to manage many terabytes of data. We decided to sit down with Michael Callahan, chief technologist for network-attached storage in the HP StorageWorks Unified Storage Division, and get his views on the trends towards larger capacity storage, deduplication and other responses to the rising tide of data. Below is our interview.

Sunshine: What kinds of trends are you seeing in storage, from where you stand?

Michael Callahan: We feel like we’re very well aligned with two of the big trends that are underway. The first trend is that storage systems are consolidating. Many enterprises are realizing they’re creating lots of individual silos, and data is spreading across many filers. While it’s natural for this to happen, it’s very inefficient. You wake up one morning and realize you’re dealing with immense and costly complexity, as well as poor utilization. That’s why there’s been a huge amount of interest in consolidation as an IT project, really across all industries. It’s just now that people are getting focused on consolidation as regards storage in some of these environments.

The second trend, of course, is around data reduction and other similar efficiencies.

Sunshine: Why is dedupe such a hot topic these days, in your view?

Michael Callahan: Well, obviously it has to do with economics. Much of the last decade has been focused on cost control: being able to do more with less. The more geeky answer is that, on the technology side, machines are far faster than they were even five years ago. Over the years, the compute power in environments has risen significantly as compared to storage speeds. So it becomes much more appealing to spend some CPU cycles to reduce data before putting it into storage.

Sunshine: Wow. I’ve never heard it put in quite this way.

Michael Callahan: The way I look at it is that ten years ago, someone might’ve proposed something along the lines of Ocarina, but it would’ve been harder to justify. Today in systems like our ExDS9100 (the Extreme Data Storage System, and the space where we partner with Ocarina) we use industry standard components. That is, HP blade technology. So we’re able to leverage an engine that is building incredibly powerful, frugal blade systems with a lot of compute power, right in the storage system. Customers should be able to deploy that compute power into the storage tier very effectively in order to make it do interesting things, such as running Ocarina. And because the ExDS9100 is based on Linux, we can run the Ocarina software right within the storage system itself.

Sunshine: What do you see as the advantage of your storage system?

Michael Callahan: We feel we have huge advantages in being able to build systems that leverage the fact that HP has the most successful, widely-used industry standard blade infrastructure.

People are asking, what can we do to be efficient in the way we spend our dollars for storage, and utilize our data center space? Actually, this goes beyond cost. It’s not just the money, but also the space for storage. There is literally no place to put the storage even if you can pay for it. So in light of that, there’s this incredible push for some set of tools that will allow customers to optimize their use of storage.

One thing that we at HP do is to build a system in the ExDS that is in itself very dense, power efficient, and simple to manage. That’s a good starting point. But then it’s very compelling to be able to go beyond that and say furthermore, we’re able to do some very advanced things with Ocarina around compressing the types of data that tend to show up on these huge data sets.

We think we have a better integration with Ocarina than the other systems out there. For the customers who will be using Ocarina with our solution, there’s no box that says Ocarina. The same blade that’s in the storage system would have Ocarina running as software within it.

In our design, the expectation is that you’ll have lots of disk storage, and then you’ll want some number of blades. The architecture allows the choice about how much storage and how many blades to be made completely independently of each other, and to be revised without having to do any complicated repartitioning or migration of data.

Sunshine: That’s the PolyServe aspect of the architecture?

Michael Callahan: Yes, and the relevance to Ocarina is that Ocarina is a capability that consumes CPU cycles to process your data, and you might well need to have some flexibility in the amount of CPU power in a system. Suppose you have a system where you’re going to load up some huge data set–100s of terabytes of data–but then access it at a relatively low rate.

In our ExDS system, you can put 16 servers into one rack unit; each of those blades has 2 CPUs, and each CPU has 4 cores. You can actually have as many as 128 cores in one compact box. If you’re using Ocarina, you might choose to have some relatively large number of blades, because you’re compressing data and there’s significant computation involved in that. So during the ingest process, you can configure the system in such a way to optimize computation. But then once it’s all filled up, there’s no more need for lots of CPU to support the (Ocarina) Optimizer. So, at that point, you can take those blades out of the system and just run the Ocarina Reader on many fewer blades, proportional to the rate it’s being accessed.

In our approach there’s no partitioning of the data; every blade can access every part of that data set completely equally. You’re not required to go through some horrific rebalancing process to accommodate the number of blades. So it’s a really natural fit.
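For readers who want to check the blade arithmetic in the answer above, here is a small worked example. The blade, CPU, and core counts come from the interview; the choice of 4 blades for the read-mostly phase is a hypothetical figure picked only to show the scaling.

```python
BLADES_FULL = 16     # blades in one fully populated unit, per the interview
CPUS_PER_BLADE = 2   # per the interview
CORES_PER_CPU = 4    # per the interview


def total_cores(blades):
    """Cores available for a given number of blades."""
    return blades * CPUS_PER_BLADE * CORES_PER_CPU


ingest_cores = total_cores(BLADES_FULL)  # compute-heavy ingest and optimization phase
read_cores = total_cores(4)              # hypothetical: 4 blades kept for read serving

print(ingest_cores, read_cores)
```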

Michael Callahan is Chief Technologist for network-attached storage in the HP StorageWorks Unified Storage Division. Previously, he was Chief Technology Officer at PolyServe, a software company that delivered scalable, highly-available shared data clustering solutions, from its founding until it was acquired by HP in April 2007. Before that he led advanced development at Ask Jeeves and did mathematics research at the Mathematical Sciences Research Institute in Berkeley. He has a BA from Harvard University and was a Rhodes Scholar and Junior Research Fellow in Mathematics at Oxford University.

Dedupe the News of the Week

Posted by Sunshine On May - 21 - 2009

What a week this has been. For those of us who are in the middle of the deduplication market, it’s amazing to see just how much ink is being spilled to discuss the various possible scenarios that could unfold as a result of the NetApp-Data Domain merger announced yesterday. And of course, everyone is having fun on Twitter cracking jokes about NetApp deduping its cash, and so on and so forth. Yet, no one can deny that this puts a spotlight on this technology like never before.

Many of the articles raise the question of where Ocarina might fit into the competitive landscape. Here are a few that caught my eye. Please feel free to add any others of note in the comments field:

InformationWeek, Antone Gonsalves - NetApp Buying Data Domain for $1.5 Billion

“…there’s a growing trend in storage in which network-attached storage vendors are teaming up with deduplication companies in order to make a stronger offering. In this case, NetApp plans to strengthen its deduplication capabilities by buying a competitor. But examples of partnerships include NAS suppliers like BlueArc, Isilon, and Hewlett-Packard (NYSE: HPQ) partnering with Ocarina Networks and others.”

George Crump, Storage Switzerland - NetApp Buys Data Domain - Storage Impact

“We know that Ocarina can dedupe post process and move the resulting optimized data to any tier and manufacturer of storage, we know that Storwize can compress inline with little to no performance impact, but it remains to be seen if NetApp can or even tries to do both, and while they can support a limited number of other types of storage with their V-Series they have yet to master transparent data moves between classes of storage.”

Chris Mellor, The Register - Dedupe This: NetApp Buying Data Domain

“So far it [Data Domain] has no answer to Ocarina’s content-aware compression, the ability to dedupe graphic images and videos such as JPEG and MPEG files that traditional dedupe can’t touch.”

Storage Newsletter - NetApp Buys Data Domain for $1.5 Billion

“NetApp acquired Data Domain too late and will have difficulties to realize a return of this huge investment, even if the acquired company is one of the current most successful entity of the worldwide storage industry, with probably the best de-dupe technology (with others including EMC/Avamar, Exagrid, IBM/Diligent, Ocarina, Quantum, Riverbed or Sepaton).”

NetApp-Data Domain: A Sign of the Times

Posted by Carter George On May - 20 - 2009

Today’s revelation that NetApp will acquire Data Domain for $1.5 billion in cash and stock has many in the industry reeling in shock. It’s a huge acquisition, and it shows how important dedupe is to the storage industry. It also leaves one wondering how HP, Dell, EMC and others are going to compete on storage efficiency, data reduction, and dedupe over the next few years.

Of course, at Ocarina we think this is great - the market is going to need leading-edge data reduction products and technologies, and now one vendor, NetApp, has the market leadership position in both dedupe for online with NetApp Dedupe and dedupe for backup with Data Domain.

This clearly creates a situation where the rest of the storage industry is going to have to respond to stay competitive, and we think we have the best products, technology, and IP to enable the industry to compete with what is now the NetApp juggernaut.

The Data Domain technology gives NetApp an immediate leadership position in dedupe for backup, but the DDUP technology will not translate easily into dedupe for primary, where NetApp Dedupe already exists in the filer. It will be interesting to see whether NetApp keeps Data Domain focused strictly on backup and archive markets, or tries to incorporate some of its IP into dedupe for online over time.

At $25 a share, NetApp paid a premium of roughly 40% over yesterday’s closing DDUP price of about $18/share. This is a great benchmark for not only the excitement around data reduction, but also the difficulty of doing it right. In short, if it were easy to copy, then NetApp would not have paid the premium. It’s a win for the Data Domain team, which is a top notch group of people who have built a great product and executed extremely well in all aspects of their business.

We’ll be very interested to see what happens in the coming weeks and months following the acquisition.

Ocarina Bests NetApp

Posted by Sunshine On May - 20 - 2009

In case you haven’t heard yet, the news is out that Ocarina Networks provides better data reduction results than NetApp. An independent study from Storage Switzerland, commissioned by Ocarina, pitted the two solutions in a head-to-head trial. It found that the complete Ocarina solution, a combination of content-aware compression and deduplication, beat NetApp dedupe by as much as 57x. The results of this test were not subtle. You can do your own calculations to consider the space savings you would gain by using Ocarina.

This is yet another validation that a next generation, content-aware approach to data reduction is what’s called for when it comes to online data sets. I have created this graph to illustrate the respective reduction results by percentage. See below:

[Graph: data reduction by percentage for each data set, NetApp vs. Ocarina]

Here’s the breakdown by data set comparing NetApp’s complete solution against Ocarina’s complete solution in terms of the amount of reduction each one was able to achieve:

Home Shares: NetApp 27%, Ocarina 54%

Internet Media: NetApp 2%, Ocarina 51%

Media and Entertainment: NetApp 21%, Ocarina 49%

Oil and Gas: NetApp 0%, Ocarina 48%

Life Sciences: NetApp 6%, Ocarina 46%
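If you want to run your own numbers, the sketch below turns the reduction percentages above into capacity remaining after optimization. The percentages are the ones reported in this post; the 100 TB starting size is a hypothetical figure chosen only to make the arithmetic easy to read.

```python
# (NetApp reduction %, Ocarina reduction %) per data set, as listed in this post
RESULTS = {
    "Home Shares": (27, 54),
    "Internet Media": (2, 51),
    "Media and Entertainment": (21, 49),
    "Oil and Gas": (0, 48),
    "Life Sciences": (6, 46),
}


def remaining_tb(original_tb, reduction_pct):
    """Capacity still needed after shrinking `original_tb` by `reduction_pct` percent."""
    return original_tb * (100 - reduction_pct) / 100


ORIGINAL_TB = 100  # hypothetical starting size

for name, (netapp_pct, ocarina_pct) in RESULTS.items():
    print(
        f"{name}: {ORIGINAL_TB} TB -> "
        f"NetApp {remaining_tb(ORIGINAL_TB, netapp_pct):.0f} TB, "
        f"Ocarina {remaining_tb(ORIGINAL_TB, ocarina_pct):.0f} TB"
    )
```

Swap in your own starting capacity for `ORIGINAL_TB` to estimate what each reported reduction rate would mean for a data set of your size, bearing in mind that results vary by data type.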

For more information and to download the complete study, please visit: http://www.storage-switzerland.com/Articles/Entries/2009/5/11_Lab_Report_Overview_-_The_Deduplication_of_Primary_Storage.html.

The report is also available on the Ocarina resources page.

Storage Community Pulls Together to Save a Life

Posted by Sunshine On May - 19 - 2009

Sometimes it seems that storage bloggers do nothing but bicker. But this week, competitive differences were all set aside as everyone pulled together to help save a man’s life.

As EMCers Chuck Hollis and Storagezilla have both posted, an EMC employee based in their Pleasanton, California office is in need of an emergency bone marrow transplant. With lightning speed, this story has made its way across Twitter and, we hope, into the living room of someone who can help him.

Update as of 5/21 - there is now a blog that consolidates the effort, at: http://markfredrickson.wordpress.com/.

As Chuck explains:

“Nick Glasgow, of our Renewals team in Pleasanton, needs a bone marrow transplant.  Nick was a healthy 27 year old when he came down with what was at first believed to be strep throat about  nine to ten  weeks ago.  Eight weeks ago, he was informed that he has Leukemia and was admitted to the hospital immediately for chemotherapy.  Nick has endured two rounds of chemotherapy, and received blood, platelet, saline, and antibiotic infusions.  These have all failed to put Nick into remission.  Nick’s white cell blood count is too low for another round of chemo, and has been sent home as antibiotics have been stopped and his immune system is compromised … The doctors have advised that they think it is highly unlikely that they can find a match for Nick as a match would need to be 3/4 Caucasian and 1/4 Asian.”

What the doctors don’t know is that we’re using the power of social media to get the word out far and wide. Let’s make sure this works. If you or someone you know could be a match for Nick, please consider going to: http://www.marrow.org/ and finding out how to make a donation.

Put Your Storage on a Diet!

Posted by Carter George On May - 18 - 2009

As anyone who has tried to lose weight knows, it’s no easy feat to get and stay slim. No doubt you’ve seen the barrage of ads that promise you a quick fix (Hydroxycut, anyone?), but most of us know these fad programs don’t work in the long run. Rather, the key is a combination of eating less (and making healthy choices about what we do eat) and exercising.

Well, something similar could be said of your storage budget. If you are like the vast majority of enterprises today, you’re dealing with what can be described as data obesity. Whatever type of array you’re running, it’s likely gaining files every day. Many are no doubt those pesky rich media and compound Office documents like PowerPoint and PDF that add more and more weight–stressing your resources to the breaking point.

To keep up, you have to cut the flab out of your storage. This, too, calls for a two-pronged approach. First, there’s the “diet” element. This means doing a better job of tiering, and keeping files only as long as you really need them. Of course, there are many examples of near-active and inactive data that nevertheless must be kept online for compliance and other similar reasons. But the plain fact is that many of the files that are clogging up storage systems at small- to medium-sized enterprises have nothing to do with their core business at all. They can be everything from family photos to funny videos to old documents that no one will EVER look at again.

The second part is the “exercise” element of keeping your storage slim and trim. That is, run a storage efficiency tool (may we suggest Ocarina as one example) that will efficiently trim the fat out of your data. That kind of combination means that you really can tighten your belt on your storage budget. With our solution, for example, a typical enterprise will reclaim half again as much storage space. We can shrink down your files to a slim and trim size, even on the types that stymie most conventional dedupe solutions.

To find out more about how Ocarina stacks up against the leading dedupe solution, give this recent article by Chris Mellor in the UK Register a read: Ocarina Dedupes Better than NetApp.