Online Storage Optimization

Exploring Next Generation Storage Solutions

Make the right call

Posted by Sunshine On March - 10 - 2010

Four out of five college students agree, this is not the way to deal with data growth. How about this instead?

[Image: stuffed phone booth]


Fast and Effective Dedupe

Posted by Ocarina On March - 3 - 2010

I’ve noticed a few blog posts recently about the speed of deduplication in the modern data center. I agree that speed is an important factor, but keep in mind that not all dedupe is created equal. That is to say, fast is good, but only if you are also effective. One of the tricky things has been that the easiest data to compress is also usually the most carefully performance tuned. A great example of this is a database. Databases are composed of simple alphanumeric fields and sparse tables, all of which is easy to reduce in size.

However, a company’s core transactional database is the most conservative asset in the data center. Introducing compression would save space, for sure, but you could only use very fast, simple compressors there. At the same time, customers will be hesitant to deploy a new layer of processing in their most sensitive application.

So, where is most data growth? In fact, it’s being driven by unstructured data – Office documents, rich media, email with attachments, PDFs, Flash videos, and so forth. This complex data does not lend itself to fast simple compressors. But perhaps we should back up for a moment and think about how customers have been behaving all along.

Throughout the history of storage, there have always been tradeoffs available between fast expensive storage, and slower but cheaper alternatives. This is not a bad thing. It gives users alternatives based on their priorities and budgets. Back in the old mainframe days, these choices were between very expensive mainframe memory and “offline” storage like drums, cards, and tapes. Today the technology is all much bigger, faster, cheaper and sexier. But really, the tradeoffs are the same.

Data reduction technology adds another layer of choice above and beyond the traditional hardware choices. Now in addition to choosing whether you want fast, expensive solid state disk (SSD) or slower but very cost-effective SATA, you can also choose whether you want to compress and/or deduplicate the data that is stored on those disks.

Just like physical disks, compression and dedupe come in a range of speeds and capabilities. There are simple and very fast compressors that are essentially invisible in terms of their impact on storage performance. There are more complex compressors that get better results, but which may take longer, either to compress or to decompress the data. Deduplication, done well, should always be pretty fast, and streaming dedupe rates of well over 300 MB/sec are now available from many vendors (including Data Domain and Ocarina).
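To get a feel for that range, here is a minimal Python sketch. It assumes the standard library's zlib and lzma modules as stand-ins for a fast, simple compressor and a slower, more thorough one; it is not how any vendor's product works. The sample data and the names `sample_rows` and `compare_compressors` are made up for illustration, and the sizes and timings will vary with your machine and your data.

```python
import lzma
import time
import zlib


def sample_rows(n=20000):
    """Build deterministic, database-like text (hypothetical sample data)."""
    return "".join(
        f"row {i}, customer {i % 97}, balance {(i * 37) % 1000}\n"
        for i in range(n)
    ).encode("ascii")


def compare_compressors(data):
    """Return {name: (compressed_size, seconds)} for three settings."""
    candidates = {
        "zlib level 1": lambda d: zlib.compress(d, 1),  # fast, simple
        "zlib level 9": lambda d: zlib.compress(d, 9),  # same format, more effort
        "lzma": lzma.compress,                          # slower, more thorough
    }
    results = {}
    for name, compress in candidates.items():
        start = time.perf_counter()
        out = compress(data)
        results[name] = (len(out), time.perf_counter() - start)
    return results


if __name__ == "__main__":
    data = sample_rows()
    print(f"original: {len(data)} bytes")
    for name, (size, seconds) in compare_compressors(data).items():
        print(f"{name:13s} {size:8d} bytes  {seconds * 1000:8.1f} ms")
```

Run it against a sample of your own data rather than the synthetic rows to see where the speed versus savings tradeoff falls for you.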

The emergence of tools to automatically tier data to its appropriate place helps make the use of all of these technologies more feasible. That applies as much to solid state disks as it does to dedupe and compression. When data tiering can be made invisible to end-users and applications, then implementing multiple physical and logical tiers of storage becomes practical. Good examples would include EMC’s new FAST tools, Compellent’s “Fluid Data Storage”, and HDS’s Data Migrator. When users or administrators have to move data by hand to get it to a compressed tier or a solid state disk, then the operational costs offset the capital savings.

You might want to be wary when someone’s biggest claim to fame is fast dedupe. Just as the old mainframe admin had to decide whether something was important enough to live in RAM, or could be stored on cheaper tapes instead, today’s IT shops have to decide where it is most important to try to get data reduction, and what tool will get the most bang for the buck for that kind of data. You need the whole story, and then you can decide based on your own priorities.

News from the Holodeck

Posted by Sunshine On February - 16 - 2010

As regular readers of this blog know, we’re obsessed with out-there tech. Anything that smacks of Star Trekkian futurism gets our blood pumping. This week, Deep Storage’s Howard Marks reports on something we’ve been watching for some time: holographic storage.

The news is sad. The company that was developing it, InPhase, is out of business. Their web site is still up, but according to the article, the company, a Bell Labs spin-off, was shuttered in early February and the Colorado Dept. of Revenue is now seizing its assets. As he points out, for now, technologies like deduplication make it hard to justify spending $10K on a holographic drive.

Despite this terrible setback, I for one don’t want to believe this idea will die out entirely. It promises a new generation in storage at a time when data growth is spiraling out of control, threatening to overtake data centers worldwide. And who says we can’t add compression and deduplication on top of that? Howard and I both predict that sooner or later someone else will follow the holographic storage clarion call. As he so succinctly put it: “It’s just so cool.”

Image from: Geek Stuff

Tagged Gets Shrunk

Posted by Sunshine On January - 29 - 2010

Here’s an interesting story from the vault of the Ocarina case study library. Tagged is the third-largest social network in the U.S. It has seen traffic increase 10x over the past two years. With its focus on making new friends rather than simply getting to know existing ones, it has carved out a successful niche and is building an international subscriber base of over 80 million members.

The cost of this success? Data growth. Tagged’s storage infrastructure has been doubling every single year. With 1 million new photos uploaded every single day, Tagged needed a way to expand capacity and fast.

Compression with Ocarina meant about 10 TB of additional free space, which in turn meant they could put off buying new NAS equipment by several months. The lower average image size also meant reduced bandwidth and 15%-20% reduced monthly content delivery network (CDN) costs.

The company chose to go with Ocarina’s newest specialized image reduction technique, native format optimization (NFO). This is visually lossless compression of images that nevertheless delivers significant space savings–a technology that’s perfectly suited to the social networking environment.

The other crucial benefit of reducing image size was improved site responsiveness. “We’re sure that using Ocarina to reduce image sizes has helped improve our page rendering times,” said company CTO Johann Schleier-Smith. “That’s a big deal because it creates a better user experience, which means improved customer loyalty and higher market share.”

Read the entire case study by clicking here. Or visit the Ocarina resources page and click on the Case Studies tab, where you’ll find several others.

Databases - Compression Targets?

Posted by Ocarina On January - 16 - 2010

The headline of this post poses a question that was raised in a recent comments discussion between Dave Vellante of Wikibon and myself on this blog. Dave wanted to know if there are use cases in which generic compression might still be useful. As I wrote in my post, most of the storage industry still relies on generic, or LZ, compression. This is a shame, because it’s severely limited compared to the possibilities inherent in more advanced, file type specific compression algorithms such as we at Ocarina use. My main point was that these algorithms can be applied to the bulk of the files one finds in the modern data center–MS Office, Zip, PDF, video, images, and so on.

However, Dave was interested in hearing whether there are use cases in which generic compression could be commercially viable. My response was that data sets made up entirely of text files, and databases, are the two examples in which it really doesn’t matter what type of compression you use–the generic type will work fine because essentially all you have to do is reduce text and/or alphanumeric data. But, I added, databases aren’t likely to be a compression target because there is too much of a performance trade-off. Also, this is unlikely to be a good commercial target as databases are the most conservative part of the data center. Dave pressed his case. He wanted to know if perhaps there are times when compressing a database would make sense.

He wrote: “I agree with your comments on a production database but what % of an organization’s database storage would you consider the ‘family jewels’ vs. copies of the database for things like decision support/data warehousing, snapshots, and other copies/clones for recovery purposes? If I can compress those supporting copies down 50-80%…why not?”

My answer: it varies by organization, but sometimes a large percentage of database data is in star schema data warehouses. Those databases, unlike the transactional databases, tend to support frequent whole table scans. That is, instead of fast small writes (transactions) into the middle of a table, they see very large reads of everything in a table. Databases tend to be very compressible, and if you can compress them and still support the I/O rates you need for performance, by all means do so!

Transactional database performance tends to be measured in TPS (transactions per second) and TPS in turn is largely bounded by the speed at which the database can do direct I/O writes of transaction logs to stable store. Putting compression or dedupe in that path is risky. I’m not saying it can’t be done, but people will want to be quite sure it doesn’t mess up years of performance tuning. With data warehouses, you may have hundreds of terabytes of data in simple so-called star schema databases, and the kinds of queries run against these databases tend to go through and read every row in every table.

Consequently, performance is bound by the ability of disk systems to sustain sequential reads of very large data sets. In this case, as long as decompression can happen at the rate of physical disk reads, then I see no reason not to compress or dedupe those databases. As I mentioned earlier, data in databases is largely alphanumeric. That means that both compression and decompression on that kind of data can be very fast - it lends itself to coprocessors like HiFN, for example. If your architecture provides a place to insert something like that, or if you have CPU cycles free enough on your database servers, I think data warehouses can be good candidates for both compression and dedupe.
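The condition above, that decompression keeps up with the physical disk reads, can be put into a back-of-the-envelope model. The sketch below is a simplification under stated assumptions: reads and decompression overlap perfectly, so a full scan takes as long as the slower of the two stages. The function name `scan_seconds` and the example rates are hypothetical, not measurements.

```python
def scan_seconds(logical_mb, disk_mb_per_s, reduction=0.0, decompress_mb_per_s=None):
    """Estimate the time for one full sequential scan of a table.

    logical_mb           size of the data as the database sees it
    disk_mb_per_s        sustained sequential read rate of the disks
    reduction            fraction of space saved (0.5 means stored at half size)
    decompress_mb_per_s  rate at which uncompressed output is produced,
                         or None when the data is stored uncompressed
    """
    stored_mb = logical_mb * (1.0 - reduction)
    read_time = stored_mb / disk_mb_per_s
    if decompress_mb_per_s is None:
        return read_time
    decompress_time = logical_mb / decompress_mb_per_s
    # Assumption: the two stages are pipelined, so the slower one dominates.
    return max(read_time, decompress_time)


if __name__ == "__main__":
    print(scan_seconds(1000, 100))            # uncompressed baseline
    print(scan_seconds(1000, 100, 0.5, 400))  # fast decompressor: the scan gets shorter
    print(scan_seconds(1000, 100, 0.5, 50))   # slow decompressor: the scan gets longer
```

In this model, compression helps a scan-heavy warehouse whenever the decompressor produces output faster than the disks could have delivered the uncompressed data, and hurts otherwise.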

With all that said, the future of compression is in reducing unstructured data. Why? Because this is where the greatest data growth is occurring. In order to address this problem, we’ll have to start looking at far more advanced algorithms than those that did the trick in the past.

Storage Industry Lags Behind Advances in Compression

Posted by Ocarina On January - 13 - 2010

There’s a lot of talk about compression these days, but how much do we know about it? Well, for one thing, compression as a research area for mathematics has evolved much faster than most people realize. The thing is, most compressors used in computer products, including dedupe appliances, use generic algorithms rather than making use of these advances.

Most storage products use Lempel-Ziv (LZ) or derivatives, and try to use that single compressor to compress everything. These algorithms have been around forever, and for the most part, have not evolved much in the last ten years other than in the area of performance. This is too bad, because compression has advanced in exciting ways. LZ and its cousins work well on the kinds of data that were around 10 or 20 years ago - plain text, plain numbers, or combinations of those things. They do not work so well on a lot of modern data - images, video, Office documents, PDFs, already-compressed files like Zip, encrypted data, etc. What’s important to understand is that all the most notable advances in compression that apply to storage have taken place not in generic compression algorithms, but in file type specific ones. File type specific compressors can, in fact, deal with all those modern data types.

Compression is all about pattern recognition and prediction. You look for patterns in a file and if you can find those patterns you try to predict their occurrence. If you can predict a pattern, you can compress it. So understanding the kinds of patterns that might show up in a file - video, a Zip file, music, and a PowerPoint are all very different - is the key to building a compressor for that file type.

What’s especially relevant is that the most important thing in compression of data today is recompression. Almost all of the file formats that are driving data growth, and taking up the most space on backups, are already compressed. Think of a file type that’s eating up space, and it’s likely to be an already-compressed format: JPEG, video, Office, PDF, mp3, medical images … all compressed already.

A generic compressor won’t get any results at all on an already-compressed file. That’s because the first compression obscures the patterns that a compressor would look for. That’s why if you try to compress, say, a Zip file, if anything you’re likely to make it bigger. Recompression means first decompressing the file and then recompressing it with a better compressor. To do that, you have to recognize what kind of file it is, what kind of compression has been applied, and how to decompress it. By first decompressing it, you are able to see and process the patterns that make better prediction and compression possible.
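Here is a small Python sketch of that idea, using only the standard library. It is an illustration under assumptions, not Ocarina's method: zlib stands in for the compression already built into a file format, LZMA stands in for the "better compressor", and the log-like sample data and function names are invented for the example. Real recompression also has to recognize the file type and rebuild the original container, which this sketch skips.

```python
import lzma
import zlib


def sample_log(n=20000):
    """Deterministic, log-like text (hypothetical sample data)."""
    return "".join(
        f"2010-01-{1 + i % 28:02d} host{i % 12} "
        f"GET /photos/{(i * 7919) % 100000}.jpg 200\n"
        for i in range(n)
    ).encode("ascii")


def second_pass(already_compressed):
    """Run a generic compressor over bytes that are already compressed."""
    return zlib.compress(already_compressed, 9)


def recompress(already_compressed):
    """Decompress first, then recompress the exposed content with LZMA."""
    original = zlib.decompress(already_compressed)
    return lzma.compress(original)


if __name__ == "__main__":
    raw = sample_log()
    first = zlib.compress(raw)  # stands in for a format's built-in compression
    print("raw bytes:               ", len(raw))
    print("already compressed:      ", len(first))
    print("generic second pass:     ", len(second_pass(first)))
    print("decompress + recompress: ", len(recompress(first)))
```

On this sample, the second pass should leave the size roughly where it was, while the decompress-then-recompress path should come out smaller, because the stronger compressor gets to see the patterns in the original content.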

Almost every market has a set of well-defined file types that make up the bulk of its unstructured data. In medical imaging, it’s DICOM (which in turn contains JPEG 2000, JPEG LS, and TIFF). In seismic, it’s SEG-Y. In satellite imaging, it’s NTF, MrSID, GeoTIFF and a few others. In the average business, it’s Office, PDF, photos and video.

In specific industries, you see very advanced compression implemented in the application layer, not in storage. Video is a great example - the whole concept of the video codec is all about compression. Whole companies exist specifically to do better video compression (On2 is a good example), but this compression is done primarily for transmission, and implemented as part of the video application workflow, not as a storage technology.

In a world that had all plain ASCII text data, generic compressors would be great. But that’s not the world we live in. For compression to have any meaningful impact on today’s data sets, you have to have file type aware recompression.

It’s a shame that most storage products today have not implemented the most exciting advances in modern compression mathematics. My company Ocarina is quite frankly one of the few exceptions. The compressors found in tape drives or in dedupe appliances represent the best of the evolution of the generic compressor. The thing to look for going forward is the emergence in storage products of the next generation set of file type aware compressors, which is where all the action has been over the last ten years.

The Year in Images

Posted by Sunshine On December - 30 - 2009

This past year, we at Online Storage Op gathered all manner of images to illustrate our posts. So as a way of looking back at 2009, here are some of the ones we liked the best–and the stories that went with them:

Holodeck fun:

In February, Robin Harris at StorageMojo wrote about a potential breakthrough in storage technology that could change the landscape forever: quantum holographic storage. Online Storage Op was on the scene. It also gave us a chance to upload a pic of a Geordi La Forge doll. Admit it… this is one cool toy.

Squeezing into your Genes:

This blog’s parent Ocarina had quite a year–inking partnerships with a number of major storage vendors and becoming a noted player in the hot dedupe space. It was also the year that genomics labs woke up to the need for better data reduction to deal with the coming onslaught of genetic data. In short, compression can be a matter of life and death. We reported on it here, and our readers got to relive their 10th grade biology class by looking at images like the one above.

Racing for Dedupe

As many pundits are now opining, dedupe really was one of the biggest stories of 2009, not least because of the high profile battle for Data Domain between storage titans EMC and NetApp. In the end, EMC nabbed the dedupe specialist for an eye-popping $2.1 billion.

Booth Babe Mania:

We know our readers are sophisticated types who come here only to absorb information and opinion, and to better themselves for the benefit of all humankind. But for some odd reason we saw a major traffic spike the day we ran our post on the great Booth Babe Controversy. When we asked, everyone quickly told us, “I read the articles.” Mmmhmm!

VMworld a hit

And speaking of images that make storage folks drool, one of the most mesmerizing sights of the year was at VMworld, held in August in San Francisco. Participants descended the escalator to be greeted by a gleaming rack of servers and storage–which we later learned was the result of a plan drawn on a napkin by the VMware GETO team. In any case, this year’s VMworld was a major event–and as we rightly noted, it foretold more economic activity in storage and virtualization.

Industry puts aside differences to try to save a life

This is one of the saddest stories of 2009, and one that demonstrates an activist and caring streak in the storage community. When word got out in May 2009 that EMC employee Nick Glasgow was in need of a bone marrow transplant, folks within the storage industry put aside competitive differences and pulled together to find him a match. Sadly, Nick passed away in October. The degree to which he inspired others will not be forgotten.

And, finally…

We never did have an egg and spoon race, but…

In November, Ocarina participated in the first ever Gestalt IT Tech Field Day, which brought independent bloggers from around the world to Silicon Valley for two days of tech deep dives. Our “bring out your data” challenge started tongues wagging well before the event began. Participants brought us their toughest data sets, and aside from those who used archaic encryption software to stump our algorithms, the results were impressive–an average of about 30% reduction on these tougher-than-tough data sets. Plus, the whole event was just a ton of fun. And it didn’t even require that we slog around the mud clapping coconut shells together.

Dedupe - The Big News in 2009

Posted by Sunshine On December - 7 - 2009

It’s been a tough year — a worldwide recession, a sluggish housing market, rising unemployment … and on top of all that, the tarnished image of one of sports’ most squeaky clean players. Well, actually, there have been some bright spots. As DCIG blogger and storage analyst Jerome Wendt notes while looking back at the past year, “Deduplication is the Big Success Story of 2009.”

Wendt writes: “Deduplication is arguably one of the most notable trends of 2009 as it has been widely adopted by users after bursting onto the scene just a few years ago and has grown to be included in both software and hardware products.”

Wendt focuses on dedupe for backups, where there has been much publicized activity over the past year. The big storage story of 2009 was of course the battle between storage titans EMC and NetApp over backup dedupe specialist Data Domain. He cites an industry survey from SearchDataBackup that indicates that 41% of enterprises either are using or are seriously considering dedupe to control data growth and costs. He also notes that despite the predicted demise of Quantum, that dedupe company remains strong.

Dedupe for backups is one part of the cost reduction puzzle. Another part is to reduce data at the source, in primary storage. This is of course the specialty of this blog’s parent Ocarina, which implements a unique combination of content-aware dedupe and compression to achieve startling results. It focuses on the very types of unstructured data that are driving storage growth today–emails, images, documents, and so on. The company has been partnering with almost every leading storage provider, including HP, EMC, HDS, BlueArc, and Isilon. Another leader in this space is NetApp, which has a strong dedupe for primary offering that has also garnered a great deal of attention.

Here’s the thing: the economy might be slowing down, but data growth continues apace. This is one reason that the storage industry has been thriving this year. But rather than standing still, what it spells is a concerted effort to keep that data under control. As Wendt notes, another of the year’s big trends is cloud storage, which offers companies more flexibility for storing some percentage of their data. I would also add that virtualization has taken a huge leap forward, not only in terms of the technology itself, but also in terms of adoption over the past year. Yet another way to attack the problem.

So if 2009 was all about dedupe for backups, I’m going to guess that 2010 will be very much about data reduction at all points on the data life cycle. What do you predict?

Image: Gizmodo

Dedupe Deep Dive - Video

Posted by Sunshine On November - 25 - 2009

Lots of special treats awaited the participants of Gestalt IT Tech Field Day. During their visit to our offices on November 13, Goutham Rao, CTO of Ocarina Networks, stood at the whiteboard and offered a deep dive into the technology behind the company. It was a big hit with the participants. For those who would like a peek under the covers to discover what content-aware dedupe and compression entail, this video is quite a find. Thanks to Simon Seagrave at TechHead for allowing us to repost this video, which he took during the event. We hope you enjoy it.

Ocarina Networks - De-duplication & Compression Deep Dive from Simon Seagrave on Vimeo.

For the entire library of Tech Field Day videos, go to this Vimeo page.

Bring Out Your Data - The Deets

Posted by Sunshine On November - 4 - 2009

Lots of speculation this past week in the storage tweet-o-blog-osphere around our “Bring Out Your Data” Challenge for Tech Field Day. We can’t wait to see what these smart and savvy participants bring us, and we’re confident about the results. There will be prizes awarded for those who stymie us and those who get the greatest reduction. This morning, we sent out a brief email giving a few more details about it. In the spirit of transparency, here is what we sent to the attendees:

Dear Tech Field Day attendee,

Ocarina Networks has issued a challenge to you for Tech Field Day: bring out your data. In brief, we’re asking you to arrive on November 13 at our offices with a thumb drive containing your toughest data set. We will compress and dedupe that data for you right in front of your eyes. This will be a chance for you to see the Ocarina ECOsystem in action so that you can assess data reduction and performance for yourself in real time.

Here are a few guidelines.

1. Try to keep it under 2 GB. This is to ensure that as many participants as possible have an opportunity to shrink their data during the four-hour time period you will be at the Ocarina offices.

2. If you would like to see both deduplication and compression, we recommend that you bring data that includes duplicates. In other words, one 2GB file is not going to be deduplicatable, but several different files that have shared objects will show much more interesting results. If you’re only interested in seeing our compression capabilities, then this isn’t necessary, but please keep in mind that the results you get in that case won’t reflect the deduplication feature.

3. Give us a mix of files from your local hard drive.

4. Label your stick. Put your name somewhere on the physical thumb drive. Also, give the directory your own first and last name.

A final note: we will return your flash drive to you at the end of the day, but please don’t bring us a sole copy of an important piece of data, as we may return it to you with the data in a compressed format.

Thanks for your participation in Tech Field Day, and we look forward to meeting you next week!

Best wishes,

The Ocarina Social Media Team

Carter George, Mike Davis, Sunshine Mugrabi, and Helen Miller-Montana
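
For readers wondering why guideline 2 in the email asks for files with shared objects, here is a minimal Python sketch of deduplication. It assumes the simplest possible scheme, fixed-size chunks identified by a SHA-256 hash, which is not the content-aware approach Ocarina uses. All names here (`unique_blob`, `dedupe`, `restore`, `stored_bytes`) and the file names are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4096


def unique_blob(label, n_chunks):
    """Build n_chunks chunks of deterministic filler, each one distinct."""
    return b"".join(
        hashlib.sha256(f"{label}-{i}".encode()).digest() * (CHUNK_SIZE // 32)
        for i in range(n_chunks)
    )


def dedupe(files):
    """Store each distinct chunk once; return (store, recipes)."""
    store = {}    # chunk hash -> chunk bytes, kept once
    recipes = {}  # file name -> ordered list of chunk hashes
    for name, data in files.items():
        recipe = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            store.setdefault(digest, chunk)
            recipe.append(digest)
        recipes[name] = recipe
    return store, recipes


def restore(store, recipe):
    """Rebuild a file from its list of chunk hashes."""
    return b"".join(store[digest] for digest in recipe)


def stored_bytes(store):
    """Total bytes kept after deduplication."""
    return sum(len(chunk) for chunk in store.values())


if __name__ == "__main__":
    shared = unique_blob("shared", 8)
    files = {
        "a.doc": shared + unique_blob("a", 2),
        "b.doc": shared + unique_blob("b", 2),
    }
    store, recipes = dedupe(files)
    logical = sum(len(data) for data in files.values())
    print(f"two files with shared content: {logical} -> {stored_bytes(store)} bytes")

    single, _ = dedupe({"big.bin": unique_blob("big", 10)})
    print(f"one file, nothing shared:      {10 * CHUNK_SIZE} -> {stored_bytes(single)} bytes")
```

The two files that share eight chunks are stored in 12 chunks instead of 20, while the single file with no repeated content gets no benefit from dedupe at all, which is the point of the guideline.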