
Online Storage Optimization

Exploring Next Generation Storage Solutions

Archive for June, 2009

The Jacko Effect

Posted by Sunshine On June - 30 - 2009

As news spread of Michael Jackson’s death, the internet also went into cardiac arrest. News reports show that spikes in just about every type of online traffic, from text messaging to news sites to celebrity gossip sites and more, pushed the Web to its limits. The latest is that the pop star’s death is even yielding new malware and spam. Venturebeat is calling the whole debacle a “wake-up call for the web, and for those who are building its infrastructure and plumbing for it.”

In short, this unexpected moment of shared mourning has revealed weaknesses in the network at many levels.

As Chris Preimesberger points out in a recent eWeek post, we must also deal with the short- and long-term storage consequences of a massive number of photos and YouTube videos of Jackson being posted in the wake of his death. Writes Preimesberger:

“While all the surge talk has been about Web servers going nuts, not much has been written about the storage that has to handle all these new documents, most of which will be kept forever somewhere. We’re not just talking about all the billions of text messages, e-mails, Facebook and Twitter messages, and the like. How about all the photos and YouTube videos of Jacko, plus all the affiliated videos being posted that he’s not even in?”

There’s truth to this: I even caught myself passing around a few videos, and to be frank, I’m not much of a fan.

Data Center Knowledge offers a graphical representation of the slowing of news sites during the course of the cycle (thanks @skenniston for link). In a post entitled “The Web Creaks as Jackson Fans Mourn,” it shows that the availability of news sites dropped dramatically as news of Jackson’s demise leaked out. Some have said this is a perfect argument for a more cohesive cloud computing strategy.

The post cites Reuven Cohen at Elastic Vapor, who writes: “There is no longer any good reason for a professional website property to go down because of load … Cloud computing provides an almost infinite supply of computing capacity, be it a infrastructure as a service or platform as a service or even a traditional CDN. Not have a cloud bursting strategy in the age of cloud computing isn’t just wrong - it’s idiotic.”

In light of this experience, is it time to take stock?

Um… can I have an extension on that?

Posted by Sunshine On June - 29 - 2009

I remember asking the above question time and time again during my college years. Well, now EMC is doing the same thing in its offer to Data Domain.

EMC extended the deadline on its offer, which was set to expire today, to July 10 as a way of enticing Data Domain away from its other suitor, NetApp.

Will it make a difference?

Have a Green Storage Day

Posted by Sunshine On June - 26 - 2009

Even in these recessionary times, Green IT continues to gain momentum. That is the finding of a recent survey from Symantec, which states: “Virtually all the companies surveyed are discussing their Green strategy. They are not just talking, either. Green budgets are on the rise and IT is more than willing to pay a premium on energy efficient products…”

Citing this survey, analyst and consultant George Crump argues in an article in Byte and Switch today that energy savings are possible in many areas of IT, including storage.

Update: also found this article on TechTarget that’s a little incomplete, but worth a read nevertheless — Green storage best practices control costs, increase energy efficiency.

Writes Crump:

“I admit it, I was wrong. I assumed that green IT initiatives would be put on the back burner as we slogged our way through the current recession but according to a recent survey by Symantec apparently just the opposite is happening. Will green storage be a key part of the green IT effort?”

The answer could well be “yes.” In fact, there is a green side to data reduction, which as many are aware has been a very hot topic lately–especially in light of the recent battle between storage giants NetApp and EMC over archive deduplication leader Data Domain.

Crump argues that one of the greenest forms of storage is tape, which doesn’t require any power or cooling.

He also notes: “While disk archives that leverage clustered storage will have some difficulty in powering down drives since data is distributed across nodes in the cluster, they could power manage the nodes in the cluster itself. Of course, they gain power efficiency through greater density per node - bigger drives, compression and deduplication.”

Crump is discussing archive storage here, and cites Permabit as a solution for that realm. There is no doubt that data reduction for primary storage is also a key way to reduce one’s storage footprint. Primary storage, by its very nature, doesn’t lend itself to tape, because the data is being kept online. Dedupe is therefore becoming a must-have, with all the leading vendors offering some type of it for their primary storage. For those who would like notably better results, a next-generation solution such as Ocarina is a way to further shrink files. The results: lower power, cooling and space usage. Green indeed.

Storage News and Views - June 25

Posted by Sunshine On June - 25 - 2009

It’s almost July 4, and I hope all of you are starting to stock up on hot dogs, buns, American flags and sparklers. And for all you Florida folks, enjoy setting your own fireworks. When I was a kid, we had an uncle who lived in Miami who brought them up to us every year–completely terrifying me with a far too close up and personal fireworks display. But I digress. Here are some of the storage and IT industry headlines that caught my eye this week.

Chris Mellor, The Register (UK) - Adaptec Adds NAND Cache to Raid Cards

Blog Stu - The Summer of FCoE

Beth Pariseau, Storage Soup - New DR SaaS startup buddies up with Data Domain, offers SLA

Tony’s Blog Bytes - Intelligent Tiering - Recent Discussions

Happy Tuesday everyone.

Dedupe for Primary - Recent Coverage

Posted by Sunshine On June - 22 - 2009

As we keep noting on this blog, data reduction is becoming the topic du jour as storage budgets are squeezed and deduplication becomes more and more viable and effective. Dave Simpson, Editor-in-Chief of Infostor, came out with a very thorough article today on primary storage optimization. It’s a practical guide for customers who may be struggling to understand the differences between key vendors’ offerings in this new and exciting data reduction arena. The vendors covered are NetApp, EMC, Ocarina (this blog’s parent), Storwize, Hifn, and greenBytes.

According to Simpson’s article, performance is a key issue to consider when assessing primary storage optimization products. He also quotes Eric Burgener, formerly an analyst with Taneja Group (now with InMage), who notes that the much-touted differences in reduction rates can often be overplayed.

“… a handful of vendors are addressing the performance requirements associated with data compression and de-duplication on primary storage, and … users should understand that there’s not a huge difference between, say, an 8:1 data-reduction ratio and a 20:1 ratio.”

An interesting point, and one that is often overlooked in the race to show results. As we have reported on this blog in the past, the real comparisons should be about the percentage of space reclaimed, not the ratios, which can be misleading. So for example, the Ocarina ECOsystem achieved twice the reduction of NetApp dedupe on a typical home shares file mix: 54% vs. NetApp’s 27%. These are real numbers that can give you a sense of the amount of storage space you’re likely to reclaim when deploying one of these solutions.
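To see why ratios can mislead, it helps to convert them into the percentage of space actually saved. The snippet below is just an illustrative sketch of that arithmetic; the function names are our own and are not taken from any vendor tool.

```python
def ratio_to_percent_saved(ratio: float) -> float:
    """Convert an N:1 data-reduction ratio into percent of space saved."""
    if ratio < 1:
        raise ValueError("a reduction ratio must be at least 1 (i.e. 1:1)")
    return (1 - 1 / ratio) * 100


def percent_saved_to_ratio(percent: float) -> float:
    """Convert percent of space saved back into an N:1 ratio."""
    if not 0 <= percent < 100:
        raise ValueError("percent saved must be in the range [0, 100)")
    return 100 / (100 - percent)


if __name__ == "__main__":
    # The ratios quoted by Burgener, plus 2:1 for comparison.
    for ratio in (2, 8, 20):
        print(f"{ratio}:1 saves {ratio_to_percent_saved(ratio):.1f}% of the space")

    # The home shares percentages quoted above, expressed as ratios.
    for percent in (27, 54):
        print(f"{percent}% saved is about {percent_saved_to_ratio(percent):.2f}:1")
```

Going from 8:1 to 20:1 sounds like a big jump, but in space terms it is the difference between saving 87.5% and saving 95%. At the lower end of the scale, by contrast, small-looking changes in ratio translate into large changes in reclaimed capacity.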

And by the way, Eric Burgener had a really nice post back in February when he was still at Taneja Group.  Called Pulling One Out of the Hat, it gives great advice and details about how to make the best use of your primary storage budget in these times. Definitely worth a read.

Happy Monday everyone!

Why Dedupe? It’s the Economy, Stupid

Posted by Sunshine On June - 19 - 2009

Beth Pariseau at TechTarget has an interesting article this week about how a combination of storage tiering and dedupe is just right in these recessionary times. It talks in some depth about Ocarina’s deployment at Rainmaker Entertainment on BlueArc and Isilon.

The article is worth a read all the way through, as it gives some real-life examples of two very different ways that tiered storage and data deduplication together added up to storage savings. For example, Clackamas County in Oregon was able to reduce storage costs by utilizing a combination of F5 for migration to lower tiers and Data Domain to dedupe archives.

Christopher Fricke, senior IT administrator for the county is quoted in the article saying: “…It helps us not have to chase capacity while we go through a budget crunch — we can focus on performance rather than capacity…”

The article also delves into how a combination of migration and dedupe/compression can greatly reduce storage costs and simplify life at entertainment studios. Rainmaker, a digital animation studio, deployed Ocarina in order to ensure that they could keep all their files online, rather than having to back up to tape while in the midst of a project.

The article quotes Ron Stinson, Rainmaker’s director of IT and operations, who said: “We’re looking at compressing 6 terabytes down to two, and possibly storing 300 terabytes on the Isilon system in the future.”

A very interesting set of use cases that help highlight the value of dedupe in very practical ways.

The Dedupe (R)evolution

Posted by Ocarina On June - 18 - 2009

The fight between NetApp and EMC over Data Domain has put deduplication into the spotlight as never before. Yet, dedupe for backup is now a mature market, and in many ways it represents the past. The next step, in my view, is that dedupe must evolve for object stores and fixed content archives. Ultimately, a dedupe product is going to have to work across the entire spectrum of data storage - hot primary data to archive to backup and beyond.

So while dedupe for backups is where it began, next up was dedupe for primary storage. As we at Ocarina quickly discovered, online storage is quite a bit different from backups. None of the successful backup dedupe vendors (Data Domain, Diligent, Quantum and others) has been able to make a mark in dedupe for primary storage. Rather, NetApp, with its dedupe in the WAFL file system, and Ocarina have emerged as the two vendors who really have a data reduction and dedupe strategy that works for primary storage.

There’s a third, rapidly-growing segment of storage that is explicitly archive storage. While the lines between nearline, archive, and backup can sometimes be blurry, true archives tend to have certain characteristics.  They are mostly object stores, they have object rather than file system interfaces (get/put/post rather than read/write), they have WORM and persistence guarantees, and they have features that allow you to manage compliance with various regulations. Examples of true archive storage are HDS’ HCAP, EMC’s Centera (and Atmos, for that matter), Caringo, and several of the cloud storage offerings like Amazon S3 or Iron Mountain Digital.

Archive storage is a prime candidate in some ways for deduplication and compression. The nature of the storage is that things are put in for long term storage, the archives grow and grow over time – so keeping costs down and making room for more data is important – and the access patterns tend to be WORN (not WORM) - that is, “write once, read (almost) never.”

But object stores in general, and archives in particular, present some interesting issues for dedupe technology, and I think that just taking backup dedupe or primary dedupe and applying it against object stores is not going to work. First of all, archives often offer guarantees of immutability. That means, if I put something in an archive, I am guaranteed that it will remain unchanged. What does that really mean? If an archive is going to last for 100 years, or even 30 years, you can be assured that the hardware it is deployed on will change 5 or more times during the life of the object archived.

So if I move an object from an archive from Vendor A with block size of 4K to a new archive from Vendor B with a block size of 8K, has that object changed? Or is how I store it separate from the guarantee that the contents of the object have not changed? It’s a key question, because it is at the heart of whether it is okay to compress and/or dedupe objects in an archive.  If I put an object in an object store and take a 512 bit checksum of that object’s contents, then I compress it, is that OK as long as when I decompress it, the 512 bit checksum tells me that the file is bit-for-bit identical to when it was put in?
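On the purely technical side, the check described above is easy to sketch. The example below is a minimal illustration, not how any particular archive product works: it assumes SHA-512 as the 512-bit checksum and zlib as the compressor, and the function names are hypothetical. Whether passing this check satisfies an immutability guarantee is the legal question, not something the code can answer.

```python
import hashlib
import zlib
from typing import Tuple


def put_compressed(content: bytes) -> Tuple[str, bytes]:
    """Checksum the original bytes, then compress them for storage.

    Returns the SHA-512 hex digest of the uncompressed content together
    with the compressed bytes that would actually be written to disk.
    """
    digest = hashlib.sha512(content).hexdigest()
    stored = zlib.compress(content, 9)
    return digest, stored


def get_verified(digest: str, stored: bytes) -> bytes:
    """Decompress and confirm the result matches the original checksum."""
    content = zlib.decompress(stored)
    if hashlib.sha512(content).hexdigest() != digest:
        raise ValueError("object contents no longer match the checksum taken at put time")
    return content


if __name__ == "__main__":
    memo = b"Quarterly results memo from the CEO. " * 200
    digest, stored = put_compressed(memo)
    print(f"original: {len(memo)} bytes, stored: {len(stored)} bytes")
    assert get_verified(digest, stored) == memo
```

Note that the checksum is taken over the object’s contents, not over how those contents are laid out, so it comes out the same whether the underlying store uses 4K blocks, 8K blocks, or compression.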

What about dedupe? In dedupe, the whole idea is that I store redundant information only once.  In an archive, a user may be checking in a memo from the CEO that has to be kept for Sarbox compliance – but it turns out that a graphic in the memo was already stored from a PowerPoint.  So it could be deduped. That would save space. To ensure that I can always get back the CEO’s memo, though, any dedupe solution must keep reference counts on how many objects are referencing a given object.
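Here is a toy sketch of what that reference counting looks like. It is an assumption-laden illustration rather than a description of any shipping product: objects are handed in as pre-split parts, parts are keyed by their SHA-256 hash, and the class and method names are made up for the example.

```python
import hashlib
from typing import Dict, List


class DedupeStore:
    """Toy in-memory object store that keeps each unique part only once."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}        # part hash -> part bytes
        self.refcounts: Dict[str, int] = {}       # part hash -> number of references
        self.objects: Dict[str, List[str]] = {}   # object name -> ordered part hashes

    def put(self, name: str, parts: List[bytes]) -> None:
        if name in self.objects:
            raise KeyError(f"object {name!r} already exists")
        keys = []
        for part in parts:
            key = hashlib.sha256(part).hexdigest()
            if key not in self.chunks:
                self.chunks[key] = part           # first copy: store the bytes
            self.refcounts[key] = self.refcounts.get(key, 0) + 1
            keys.append(key)
        self.objects[name] = keys

    def get(self, name: str) -> bytes:
        return b"".join(self.chunks[key] for key in self.objects[name])

    def delete(self, name: str) -> None:
        for key in self.objects.pop(name):
            self.refcounts[key] -= 1
            if self.refcounts[key] == 0:          # last reference gone: reclaim space
                del self.refcounts[key]
                del self.chunks[key]


if __name__ == "__main__":
    graphic = b"<bytes of a shared graphic>"
    store = DedupeStore()
    store.put("powerpoint", [b"slide text", graphic])
    store.put("ceo-memo", [b"memo text", graphic])
    store.delete("powerpoint")
    # The memo is still fully retrievable because the graphic's count was 2.
    assert store.get("ceo-memo") == b"memo text" + graphic
```

The graphic is stored once but counted twice, so deleting the PowerPoint cannot take the CEO’s memo down with it. Notice, too, that every put and delete touches the count on data that is supposed to be unchanging.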

A reference count is a dynamic value – it can and must change every time a reference to an object is either added or removed. But if an object is immutable, am I allowed to change a field on it, such as a metadata value for ref count? Is it okay to mix dedupe domains across the Sarbox archive and the email archive or the medical images archive? What does XAM mean for dedupe?

Some of these are not necessarily technical problems; they are legal ones. But several issues all point to some fundamentally different things that are going to have to be done for a vendor to have a true “Dedupe for Archive” or “Dedupe for Object Store” solution: the architecture of how you do dedupe for object stores, how dedupe works in a get/put environment with no standard file system, and how portability of objects across many years of hardware refresh would still work correctly if the object store has been compressed and deduped. It’s the third leg of the stool, along with dedupe for primary, and dedupe for backups. The holy grail, which we’ll post about soon, is end-to-end dedupe – how you keep file data in its most space-optimal form throughout its lifecycle. But you can’t get to the holy grail until you have viable, appropriate solutions for each of the main types and tiers of storage.

Storage News and Views - June 17

Posted by Sunshine On June - 17 - 2009

Summer is almost here, and despite rumors of a recession, the malls are filling up with shoppers seeking bathing suits, sunblock, iPhones, and other de rigueur gear of the season. Here in storage land, the latest industry news continues to amaze, amuse and baffle.

Here are a few headlines that caught our eyes:

Data Domain Board Rejects EMC Takeover Offer - Computerworld, Lucas Mearian

HDS Expands Thin Provisioning - Search Storage, Beth Pariseau

VCs and IT Execs Discuss IT’s Brave New World in Boston - Storage Soup, Beth Pariseau

VMware and HP Announce Co-developed Plan - Byte and Switch, Mike Fratto

And, for a little summertime diversion, one 3Par storage rapper’s response to the strange and wondrous EMC-NetApp-Data Domain tale:

3P’s Open Rap to Data Domain Employees - Storagerap, Marc Farley

Blog Review - Storagebod

Posted by Sunshine On June - 16 - 2009

Note: this is the first in a series of posts on the blogs that make up the Online Storage Optimization blogroll. Please look out for future reviews of other storage bloggers.

Every once in a while I find myself enjoying a blog so much that I end up reading several posts in one sitting. Such was the case today with Storagebod’s Blog. Who else, I thought, could integrate references to Winnie-the-Pooh with cloud storage while making subtle points about storage infrastructure costs? This must be a sign I’m becoming a fan.

Storagebod, whose real name is Martin Glassborow, is an independent storage blogger who covers a wide swath of storage and tech-related topics. His bio states that he’s responsible for storage infrastructure for a large UK media company, which he doesn’t name. He also says in posts that he utilizes both EMC and NetApp storage, which puts him in an interesting position vis-à-vis the two competitors.

I’ve gotten chatting with Martin on Twitter on several occasions (as have some other contributors to this blog), and one thing that stands out about him is that while he has strong opinions about storage products, they always seem to come from a customer perspective — that is, he’s not interested in slamming a vendor for its own sake. Rather, he takes a pragmatic approach that speaks to a larger mission of helping other storage and IT professionals who are also struggling to control costs, keep data safe, and so on.

So, even while mocking IBM’s latest cloud offerings with his Milne-inspired ditty, he gives it the benefit of the doubt, saying, “…I’ve been a bit unfair, it’s not just tin, it comes with a raft of management software as well…”

Another post, about a recent Amazon AWS outage, doesn’t slam the company for losing a data center, but instead argues for better planning for such an eventuality.

“When Amazon lose a data-centre in their cloud, this should not be news! It will happen, it may be a whole data centre, it may be a partial loss. This not a failure of the Cloud as a concept; it is not even a failure of the public Cloud…”

In short, this is a blogger I recommend for anyone who would like to read spirited, opinionated yet fair coverage of storage from the point of view of someone who knows your pain. And while he never seems to quite find the best way to alleviate it, the process he goes through should be enlightening to many, both within and outside the industry.

EMC-NetApp: The Blog-Off

Posted by Sunshine On June - 15 - 2009

Storage giants EMC and NetApp have been fierce competitors for some time now, and so it’s not unusual to see dueling blog posts between them. However, nothing beats the back and forth blog posts we’ve been seeing since the bidding war began to heat up between them for deduplication specialist Data Domain. And even in light of today’s news that the Data Domain board is rejecting EMC’s offer in favor of NetApp, the dance might still continue, at least according to eWeek’s Chris Preimesberger.

While some of the bloggers are throwing zingers at each other over which is the better acquirer/place to work, the net effect, I think, has been a very interesting discussion around deduplication–which is fast becoming recognized as the most significant technology in these “do more with less” times.

Here are some of the most notable entries, first from EMC:

Storagezilla - Data Domain Plus Plus

In this post, ‘Zilla argues for the Data Domain acquisition by laying out a potential scenario in which its deduplication technology becomes the “cornerstone” of a complete backup and recovery division for EMC.

Chuck’s Blog - Why do I work for EMC?

In response to a letter to Data Domain employees by Joe Tucci, CEO of EMC, Chuck Hollis writes quite eloquently about what makes EMC a great place to work.

The Backup Blog - Putting the Pieces Together: Deduplication Technologies

In this blog, Scott Waterhouse gets into the question of what is needed to get beyond tape backups. Or, as he succinctly puts it, “backup sucks.” (For our take on backups, please read Carter George’s post “Backup to the Future.”)

NetApp, meanwhile, came out swinging with these posts:

Jay’s Blog - Deduplicating Customer Choice

In this post, Jay lays out an argument as to why the merger will make NetApp a stronger player, with an across-the-board deduplication solution for both nearline and backups. He also argues that EMC would have the overwhelming market share in deduplication for both VTLs and backup appliances if it won Data Domain. An interesting read overall.

Exposed Blog - It’s Always Calmest Before the Storm

In this, NetApp blogger Val Bercovici goes after EMC on the question of which is the better place to work–complete with a comparison table.

Extensible NetApp Blog - Ex-Chain Smokers

In this post, NetApper Kostadis Roussos posts a video that has also been making the rounds on Twitter from an all-hands NetApp meeting immediately following the initial announcement that NetApp would be acquiring Data Domain. In it, Frank Slootman, CEO of Data Domain, talks about why the acquisition makes sense to him. He also makes some disparaging remarks about EMC.