
Online Storage Optimization

Exploring Next Generation Storage Solutions

Archive for the ‘capacity optimization’ Category

Make the right call

Posted by Sunshine On March - 10 - 2010

Four out of five college students agree, this is not the way to deal with data growth. How about this instead?

stuffed-phonebooth


Fast and Effective Dedupe

Posted by Ocarina On March - 3 - 2010

I’ve noticed a few blog posts recently about speed of deduplication in the modern data center. I agree that speed is an important factor, but keep in mind that not all dedupe is created equal. That is to say, fast is good, but only if you are also effective. One of the tricky things has been that the easiest data to compress is also usually the most carefully performance-tuned. A great example of this is a database, because databases are composed of simple alphanumeric fields and sparse tables. All of that is easy to reduce in size.

However, a company’s core transactional database is the most conservative asset in the data center. Introducing compression would save space, for sure, but you could only use very fast, simple compressors there. At the same time, customers will be hesitant to deploy a new layer of processing in their most sensitive application.

So, where is most data growth? In fact, it’s being driven by unstructured data – Office documents, rich media, email with attachments, PDFs, Flash videos, and so forth. This complex data does not lend itself to fast simple compressors. But perhaps we should back up for a moment and think about how customers have been behaving all along.

Throughout the history of storage, there have always been tradeoffs available between fast expensive storage, and slower but cheaper alternatives. This is not a bad thing. It gives users alternatives based on their priorities and budgets. Back in the old mainframe days, these choices were between very expensive mainframe memory and “offline” storage like drums, cards, and tapes. Today the technology is all much bigger, faster, cheaper and sexier. But really, the tradeoffs are the same.

Data reduction technology adds another layer of choice above and beyond the traditional hardware choices. Now in addition to choosing whether you want fast, expensive solid state disk (SSD) or slower but very cost-effective SATA, you can also choose whether you want to compress and/or deduplicate the data that is stored on those disks.

Just like physical disks, compression and dedupe come in a range of speeds and capabilities. There are simple and very fast compressors that are essentially invisible in terms of their impact on storage performance. There are more complex compressors that get better results, but which may take longer, either to compress or to decompress the data. Deduplication, done well, should always be pretty fast, and streaming dedupe rates of well over 300MB/sec are now available from many vendors (including Data Domain and Ocarina).

The emergence of tools to automatically tier data to its appropriate place helps make the use of all of these technologies more feasible. That applies as much to solid state disks as it does to dedupe and compression. When data tiering can be made invisible to end-users and applications, then implementing multiple physical and logical tiers of storage becomes practical. Good examples would include EMC’s new FAST tools, Compellent’s “Fluid Data Storage”, and HDS’s Data Migrator. When users or administrators have to move data by hand to get it to a compressed tier or a solid state disk, then the operational costs offset the capital savings.

You might want to be wary when someone’s biggest claim to fame is fast dedupe. Just as the old mainframe admin had to decide whether something was important enough to live in RAM, or could be stored on cheaper tapes instead, today’s IT shops have to decide where it is most important to try to get data reduction, and what tool will get the most bang for the buck for that kind of data. You need the whole story, and then you can decide based on your own priorities.

Tagged Gets Shrunk

Posted by Sunshine On January - 29 - 2010


Interesting story from the vault of the Ocarina case study library. Social network Tagged is the third largest social network in the U.S. It has seen traffic increase 10x over the past two years. With its focus on making new friends rather than simply getting to know existing ones, it has carved out a successful niche and is building an international subscriber base of over 80 million members.

The cost of this success? Data growth. Tagged’s storage infrastructure has been doubling every single year. With 1 million new photos uploaded every single day, Tagged needed a way to expand capacity and fast.

Compression with Ocarina meant about 10 TB of additional free space, which in turn meant they could put off buying new NAS equipment by several months. The lower average image size also meant reduced bandwidth and 15%-20% reduced monthly content delivery network (CDN) costs.

The company chose to go with Ocarina’s newest specialized image reduction technique, native format optimization (NFO). This is visually lossless compression of images that nevertheless delivers significant space savings–a technology that’s perfectly suited to the social networking environment.

The other crucial benefit of reducing image size was improved site responsiveness. “We’re sure that using Ocarina to reduce image sizes has helped improve our page rendering times,” said company CTO Johann Schleier-Smith. “That’s a big deal because it creates a better user experience, which means improved customer loyalty and higher market share.”

Read the entire case study by clicking here. Or visit the Ocarina resources page and click on the Case Studies tab, where you’ll find several others.

Dedupe - The Big News in 2009

Posted by Sunshine On December - 7 - 2009

niketigerswoosh

It’s been a tough year — a worldwide recession, a sluggish housing market, rising unemployment … and on top of all that, the tarnished image of one of sports’ most squeaky clean players. Well, actually, there have been some bright spots. As DCIG blogger and storage analyst Jerome Wendt notes while looking back at the past year, “Deduplication is the Big Success Story of 2009.”

Wendt writes: “Deduplication is arguably one of the most notable trends of 2009 as it has been widely adopted by users after bursting onto the scene just a few years ago and has grown to be included in both software and hardware products.”

Wendt focuses on dedupe for backups, where there has been much publicized activity over the past year. The big storage story of 2009 was of course the battle between storage titans EMC and NetApp over backup dedupe specialist Data Domain. He cites an industry survey from SearchDataBackup that indicates that 41% of enterprises either are using or are seriously considering dedupe to control data growth and costs. He also notes that despite the predicted demise of Quantum, that dedupe company remains strong.

Dedupe for backups is one part of the cost reduction puzzle. Another part is to reduce data at the source, in primary storage. This is of course the specialty of this blog’s parent Ocarina, which implements a unique combination of content-aware dedupe and compression to achieve startling results. It focuses on the very types of unstructured data that are driving storage growth today–emails, images, documents, and so on. The company has been partnering with almost every leading storage provider, including HP, EMC, HDS, BlueArc, and Isilon. Another  leader in this space is NetApp, which has a strong dedupe for primary offering that has also garnered a great deal of attention.

Here’s the thing: the economy might be slowing down, but data growth continues apace. This is one reason that the storage industry has been thriving this year. But rather than standing still, what it spells is a concerted effort to keep that data under control. As Wendt notes, another of the year’s big trends is cloud storage, which offers companies more flexibility for storing some percentage of their data. I would also add that virtualization has taken a huge leap forward, not only in terms of the technology itself, but also in terms of adoption over the past year. Yet another way to attack the problem.

So if 2009 was all about dedupe for backups, I’m going to guess that 2010 will be very much about data reduction at all points on the data life cycle. What do you predict?

Image: Gizmodo

Data Deluge - Are you Prepared?

Posted by Sunshine On December - 4 - 2009

stuffed-phonebooth

Dell’s Inside Enterprise IT blog has identified “10 Trends to watch carefully.” The post is from the Gartner Data Center conference that wraps up today in Las Vegas. One of the biggest and most important trends? The coming “data deluge” that will pile onto company IT departments like a load of bricks.

Over the next five years, enterprise data growth will increase by a whopping 650%. And here’s the kicker: 80% of this data will be unstructured. That means emails, documents, photos, and all the other files not in databases. The answer, according to the experts: attack with virtualization and deduplication.

Might we also suggest that a combination of content-aware deduplication and compression would yield even better results? The modern enterprise is dealing with all manner of data. These are the types of files that often stymie traditional block-level dedupe. What can it do with images, video, audio–not to mention compound documents such as PDFs and Zip files? Often, very little.

As we showed at a recent event, Tech Field Day, Ocarina reduced the toughest data sets by an average of about 30%. (Final results will be out soon.) In fact, the only ones that stymied the system were those that were deliberately encrypted using unusual or outdated methods–not a typical use case to say the least!

What this means is that over the next few years, the flow of data will not only increase, but it will become far more complex to handle. If you think about the speed of innovation, there’s a strong chance that there will be files we aren’t even aware exist yet. What do you think? How is your enterprise handling unstructured data now? What will it do differently in the next five years? Comments encouraged!

Tech Field Day - Video

Posted by Sunshine On November - 16 - 2009

Tech Field Day may be over, but it lives on in digital form–scattered like so many tiny shreds of confetti across the interwebs. One of the delegates at the event, Rod Haywood, put together this video on his Musings of Rodos blog about Day 2 of the event, featuring interviews with Ocarina’s own Goutham Rao, plus Peter Pistek of Nirvanix, W. Curtis Preston of Truth in IT, and Jim Sherhart of Data Robotics.  Rod was kind enough to allow me to repost it, and so here it is for your viewing pleasure:

Gestalt IT Field Day 2 from Rodney Haywood on Vimeo.

Bring Out Your Data - The Deets

Posted by Sunshine On November - 4 - 2009

Lots of speculation this past week in the storage tweet-o-blog-osphere around our “Bring Out Your Data” Challenge for Tech Field Day. We can’t wait to see what these smart and savvy participants bring us, and we’re confident about the results. There will be prizes awarded for those who stymie us and those who get the greatest reduction. This morning, we sent out a brief email giving a few more details about it. In the spirit of transparency, here is what we sent to the attendees:

Dear Tech Field Day attendee,

Ocarina Networks has issued a challenge to you for Tech Field Day: bring out your data. In brief, we’re asking you to arrive on November 13 at our offices with a thumb drive containing your toughest data set. We will compress and dedupe that data for you right in front of your eyes. This will be a chance for you to see the Ocarina ECOsystem in action so that you can assess data reduction and performance for yourself in real time.

Here are a few guidelines.

1. Try to keep it under 2 GB. This is to ensure that as many participants as possible have an opportunity to shrink their data during the four-hour time period you will be at the Ocarina offices.

2. If you would like to see both deduplication and compression, we recommend that you bring data that includes duplicates. In other words, one 2GB file is not going to be deduplicatable, but several different files that have shared objects will show much more interesting results. If you’re only interested in seeing our compression capabilities, then this isn’t necessary, but please keep in mind that the results you get in that case won’t reflect the deduplication feature.

3. Give us a mix of files from your local hard drive.

4. Label your stick. Put your name somewhere on the physical thumb drive. Also, give the directory your own first and last name.

A final note: we will return your flash drive to you at the end of the day, but please don’t bring us a sole copy of an important piece of data, as we may return it to you with the data in a compressed format.

Thanks for your participation in Tech Field Day, and we look forward to meeting you next week!

Best wishes,

The Ocarina Social Media Team

Carter George, Mike Davis, Sunshine Mugrabi, and Helen Miller-Montana

Going Native CIFS

Posted by Ocarina On November - 2 - 2009

A recent comment on this blog got me thinking, and this post is the result. The commenter, who identified him or herself only as “Sto Rage” asked: “When can we expect native CIFS support on the Ocarina platforms? The current implementation is outright clunky. So until you have a working CIFS implementation, I don’t think you can compete with NetApp. You may get better compression results, but it works only for NFS data.”

It’s a good point to raise–although I disagree with the “clunky” characterization. But as to the CIFS issue, I wish the answer was as simple as “it’s in the next release,” but this is actually one of the more complex and interesting topics in storage. So hold on to your hats, I’m going to go through Ocarina and CIFS in some detail.

Here’s the short answer: We give you native CIFS support on EMC, BlueArc, HDS, and HP. Several more NAS vendors will be putting “Ocarina Inside” soon. We give you native CIFS support if you can use our Native Format Optimization. For those customers who use our appliance as a CIFS proxy, we provide good but not perfect CIFS support today, with a roadmap of continual improvement, including the possibility of a native CIFS stack inside the appliance in the first half of next year.

Here’s the longer and more detailed answer.

Ocarina can be deployed in one of three ways:

“Ocarina Inside”: Ocarina is embedded inside or alongside a NAS vendor’s solution.
Ocarina Appliance: A split-band appliance
Ocarina Native Format Optimization (NFO): files are optimized in their native format

Each one of these deployment options has different implications for the CIFS client.

In the “Ocarina Inside” case, the NAS vendor handles all the protocol stacks, and the client gets the full, rich native CIFS implementation of each vendor. Ocarina only dedupes or compresses the data stream. We are not involved in the protocol traffic at all. Examples of “Ocarina Inside” are EMC Celerra, HP Enterprise NAS, BlueArc, and HDS HNAS. Additional “Ocarina Inside” partners will be announced soon. This is the best form of integration, because it makes deduplication and compression completely transparent to users and applications, and lets each storage vendor deliver all their full value-add, including in the CIFS protocol stack.

In the Ocarina Appliance case, Ocarina’s optimization happens out of the customer data path, but in order to expand files to their original state upon user access, the Ocarina appliance intercepts read requests in-band. If an I/O (over CIFS or NFS) is to an Ocarina-optimized file, we step in, rehydrate the file, and pass it on to the user. This involves being a proxy for NFS and CIFS (and other protocols including WebDAV and HTTP). It’s fairly easy to be a proxy for NFS and HTTP, but CIFS is more challenging. Ocarina has done a lot of infrastructure work to ensure that we preserve all of the Windows file attributes necessary for good CIFS integration – ACLs, Extended Attributes, Alternate Data Streams, Windows share modes and oplocks, etc.

However, we have not written our own CIFS protocol, so our Windows semantic completeness is only as good as the protocol implementation that we sit on. On the appliance, today, that is Samba. Samba has improved a great deal over the last few years, but it is still not a “native” implementation of CIFS. While many storage vendors use variants of Samba for their CIFS stack, it is admittedly not as rich as, say, CIFS on Windows (the only true native CIFS) or CIFS on NetApp.

Ocarina has multiple customers who have implemented Ocarina using both NFS and CIFS on our appliance, and while there may be corner cases where it’s just not as good as the richest CIFS implementations, it’s not “outright clunky” either. There is room for improvement, though, and this is an area of primary focus for our next set of releases. It’s probably a topic for an entirely separate post, but there is a lot going on in the CIFS world these days, and we see some pretty exciting opportunities emerging in this space.
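The read-path behavior described for the appliance (pass untouched files straight through, rehydrate optimized ones before handing them to the client) can be illustrated with a small sketch. This is not Ocarina's code: the `MAGIC` marker, the function names, and the use of zlib as the compressor are all assumptions made purely for illustration.

```python
import zlib

# Hypothetical marker that tags a file as optimized (an assumption for this sketch).
MAGIC = b"OPT1"


def optimize(data: bytes) -> bytes:
    """Out-of-band step: store a compressed copy tagged with the marker."""
    return MAGIC + zlib.compress(data)


def proxied_read(stored: bytes) -> bytes:
    """In-band step: rehydrate optimized files, pass everything else through."""
    if stored.startswith(MAGIC):
        return zlib.decompress(stored[len(MAGIC):])
    # Files that were never optimized are returned untouched.
    return stored


original = b"quarterly report " * 1000
on_disk = optimize(original)
print(len(original), "bytes logical,", len(on_disk), "bytes on disk")
print("round trip ok:", proxied_read(on_disk) == original)
```

A real proxy would track optimized files in metadata rather than by sniffing a prefix, and would also have to carry the protocol semantics (ACLs, oplocks, share modes) discussed above, which this sketch ignores.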

The third case is “Native Format Optimization.” This is a special use of Ocarina where we take certain rich media file types – photos, images and video – and compress them in a special way. What we do is compress them, but have the output be a new, smaller file but in the same native format it started out in. We’ll take a JPEG photo, compress it, and produce as output another perfectly formed JPEG photo….just smaller. The same is true for example for Flash videos. Now in this case, there is no need for a decompressor or for Ocarina to be in the read path or on the protocol at all. We can read files from your NetApp, shrink them, write them back on to your NetApp and Ocarina need not be involved at all when users or applications go to access those files.

In fact, we have a major Fortune 100 company who uses our technology on a large farm of NetApp filers in just this way. In this case, users access the files over all the native protocols that the NetApp supports, including NFS, “native” CIFS, and dual protocol support (NFS and CIFS at the same time). NFO only applies to certain file types, and so it is not the right fit for every data set. However, it is worth pointing out that one of the complaints you see about other deduplication and compression solutions for primary storage is that you save space at the cost of slowing performance down. With NFO, since there is NO decompression, just a smaller file in its original native format, performance is actually and always better.  There are simply fewer bytes to read off disk, fewer bytes to move over a network and no extra hop or decompression step to go through.  It’s a fantastic solution for customers with lots of image, photo, or video data, and it works with all native CIFS implementations.

So there you have it. CIFS support in more detail than you probably ever dreamed or imagined. We look forward to your further comments.

Dedupe Misconceptions

Posted by Ocarina On October - 20 - 2009

As most in the industry are aware, dedupe has become a standard offering from every major vendor. Dedupe for primary has become the technology of the moment, and for good reason–the rising tide of unstructured data is forcing data centers worldwide to rethink capacity planning, tiering, and storage efficiency. But there are still a few lone voices out there who are clinging to the notion that dedupe is unnecessary.

Take for example this recent post from Compellent’s Bruce Kornfeld, “Is dedupe the only answer?” Kornfeld is responding to a recent SearchStorage article, “Is Data Duplication Right For Your Primary Storage?”

Dedupe and compression can both be applied directly to primary data, and the savings there can be comparable to what’s seen in backup. On backup data, vendors claim 20x data reduction, and on primary data we think that most customers will see about 5x.

So, you say, “That means that you get four times more space savings on backup, right?” Wrong! Actually, 20x means a savings of 95% against the size of the original data set, and 5x means a savings of 80%. That’s a difference of only 15 percentage points - and an 80% space reduction is a huge win for the primary storage user. Of course, vendors who do not have a dedupe solution are likely to tell you you don’t need it anyway. There are some valid concerns about dedupe for primary, but there are also some misperceptions, and there’s no reason to let misinformation be propagated.
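The arithmetic behind those figures is simple: an N-to-1 reduction leaves 1/N of the data on disk, so the space saved is 1 - 1/N. A minimal sketch (the function name is ours, chosen for illustration):

```python
def savings_fraction(reduction_ratio: float) -> float:
    """Fraction of the original space saved by an N-to-1 reduction."""
    if reduction_ratio < 1:
        raise ValueError("reduction ratio must be at least 1")
    return 1.0 - 1.0 / reduction_ratio


for ratio in (5, 20):
    print(f"{ratio}x reduction -> {savings_fraction(ratio):.0%} saved")
```

Because the curve flattens quickly, each additional multiple of reduction buys less and less extra space.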

The biggest difference between dedupe for backup and dedupe for primary is that in backup, you dedupe all of the data. There’s no reason not to. In primary data, you might not want to dedupe everything - there are some data sets it does not make sense for. That’s not a knock on dedupe for primary. It just means you should choose which data sets make sense to dedupe.

The first common misperception about dedupe for primary data is that performance will be worse. But this is really not the case. When primary data has been deduped (but not compressed), an application asks the storage for a block, and that block is retrieved. There is one lookup to map the logical block request to the physical one - but those kinds of lookups are already being done in every storage array that has any kind of storage virtualization, such as thin provisioning. The response time on a block read for deduped data is hardly different than for un-deduped data, and this is true for all primary dedupe solutions - including both NetApp and Ocarina. There’s no more overhead to retrieving a deduped block than there would be in any other block read I/O on any intelligent array –and Compellent, being a leader in arrays with lots of smarts, is well aware of this. The fact that another file may also be sharing that block has zero impact on the time it takes to read it.

It’s true that for sets of blocks that are changing all the time, you won’t get as much benefit from dedupe. That’s not because the performance will be bad. It is because when you change a block, it’s no longer a dupe. Therefore it has to be stored again as a new block. If you read a deduped block, modify it, and write it back out, it would have been a write in an un-deduped case anyway, so performance, again, is even-steven between deduped and non-deduped volumes. Everyone doing dedupe for primary - NetApp and Ocarina - does the deduplication as a post-process, so there’s no impact at all to write performance. No one is trying to dedupe that block as it is being written.
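To make the read and write paths concrete, here is a toy model of a volume with a logical-to-physical block map and a post-process dedupe pass. It is a sketch under our own assumptions (SHA-256 fingerprints, an in-memory dictionary as the block map, a class name we made up), not a description of how NetApp or Ocarina implement it.

```python
import hashlib


class DedupedVolume:
    """Toy volume: logical block numbers map to physical blocks."""

    def __init__(self):
        self.block_map = {}  # logical block number -> physical block id
        self.physical = {}   # physical block id -> block contents
        self.next_id = 0

    def write(self, lbn: int, data: bytes) -> None:
        # Writes always land in a fresh physical block; nothing is deduped inline.
        self.physical[self.next_id] = data
        self.block_map[lbn] = self.next_id
        self.next_id += 1

    def read(self, lbn: int) -> bytes:
        # One map lookup, whether or not other logical blocks share this one.
        return self.physical[self.block_map[lbn]]

    def dedupe_pass(self) -> None:
        # Post-process: point identical blocks at a single physical copy,
        # then drop physical blocks that nothing references any more.
        first_seen = {}
        for lbn, pid in sorted(self.block_map.items()):
            digest = hashlib.sha256(self.physical[pid]).digest()
            self.block_map[lbn] = first_seen.setdefault(digest, pid)
        live = set(self.block_map.values())
        self.physical = {pid: d for pid, d in self.physical.items() if pid in live}


vol = DedupedVolume()
vol.write(0, b"A" * 4096)
vol.write(1, b"A" * 4096)
vol.write(2, b"B" * 4096)
vol.dedupe_pass()
print("physical blocks after dedupe:", len(vol.physical))

vol.write(1, b"C" * 4096)  # modifying a shared block is just an ordinary write
print("block 0 unchanged:", vol.read(0) == b"A" * 4096)
```

The read path is identical for shared and unshared blocks, and an overwrite of a shared block simply allocates a new block, which is why the changed block stops being a dupe.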

What is different, though, is that in a high rate-of-change application like a transactional database, you won’t see as much space savings with dedupe. That’s because if most of your blocks are either new or have just been changed, they won’t be dupes. Here’s misperception number 2: while there are some applications in primary storage where dedupe does not apply (the hot tablespaces in Oracle or SQL Server, for example), what you’ll find is that most data is a good candidate for dedupe on primary and nearline storage. In fact, much more data is stored in files that are good candidates for dedupe than not. All of the typical file/print files are great candidates for dedupe, but the misperception is that applications like Exchange and virtual machines shouldn’t be deduped. As it turns out, both are great candidates for dedupe (and compression, for that matter). Let’s take a look at VMs.

In a virtual machine environment, a storage array may be storing thousands of VMDKs, the VMware files that store a given virtual machine. Inside each VMDK file is a complete virtual machine image, including the operating system, application files and user data. If you have 1,000 VMDKs that each hold a virtual Windows machine, you’ll have tens of thousands of “files” inside each VMDK file, including a copy of Microsoft Windows, the application you are running in the virtual machine, and often the data for that application as well. How much of the Windows operating system do you suppose is duplicated across the 1,000 VMDKs in this example? Well, almost all of it. What’s more, the thousands of files that make up Windows do not change - are not changeable, in fact, unless you do an OS upgrade.

Large parts of the VMDK file are duplicate with others, and they stay the same, day after day. Perfect candidates for dedupe. Sure, the user data in a VMDK may change, but any competent dedupe solution is not deduplicating whole files - the dedupe solution is deduplicating something at sub-file granularity: blocks, objects, chunks, etc. NetApp dedupes 4K WAFL file system blocks. Ocarina dedupes sub-file objects. The point is, regardless of which approach you take, if most of a VMDK file stays the same, and some part changes, dedupe will work great. The parts of the VMDK file that are changing won’t be deduped, and the vast majority of the file - the OS and application binaries - will be deduped. The space savings on your storage is great, and the performance impact minimal.
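A quick way to see the effect is to fingerprint fixed-size blocks across a set of synthetic "VM images" that share an OS portion but carry their own user data. Everything here is made up for illustration: the 4K block size, the SHA-256 fingerprints, and the image contents (100 shared blocks plus 5 unique blocks per image) are assumptions, and real products may chunk data quite differently.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size for this sketch


def make_block(label: str) -> bytes:
    """Build one block of synthetic data padded out to BLOCK_SIZE."""
    return label.encode().ljust(BLOCK_SIZE, b"\0")


def dedupe_stats(images):
    """Return (total blocks, unique blocks) across all images."""
    total = 0
    unique = set()
    for image in images:
        for offset in range(0, len(image), BLOCK_SIZE):
            total += 1
            unique.add(hashlib.sha256(image[offset:offset + BLOCK_SIZE]).digest())
    return total, len(unique)


# Ten images: the same 100 "OS" blocks plus 5 blocks of per-VM user data.
os_blocks = b"".join(make_block(f"os-{i}") for i in range(100))
images = [
    os_blocks + b"".join(make_block(f"vm{n}-user-{k}") for k in range(5))
    for n in range(10)
]

total, unique = dedupe_stats(images)
print(f"{total} blocks stored logically, {unique} unique after dedupe")
```

The shared OS blocks are stored once no matter how many images reference them, while the per-VM user blocks remain unique, which is the pattern described above.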

In important ways, dedupe for primary storage is the perfect complement to thin provisioning. In thin provisioning, a storage solution virtualizes (i.e., lies about) the amount of disk space unused. With dedupe, the same storage solution can virtualize (i.e., lie about) how much space is used. The two together provide the maximum storage efficiency.

Ocarina-EMC vs. NetApp: No Contest

Posted by Ocarina On October - 6 - 2009


Great news–Ocarina has announced that its solution for EMC Celerra is available immediately, offering its advanced, content-aware dedupe and compression to EMC NAS customers. While Ocarina already had a solution for EMC Celerra, this announcement means that Ocarina has been admitted to the EMC Velocity Technology and ISV Program. A major step forward.

EMC has other dedupe options available for Celerra, but Ocarina gives them a distinct competitive edge against NetApp and NetApp dedupe. This EMC-specific release is also one of the more elegant implementations of Ocarina.  We use EMC’s mature FileMover interface to be able to insert Ocarina completely transparently on the Celerra. Users access their files on the Celerra through all supported protocols – including both CIFS and NFS. Like our BlueArc and HDS releases, we are called from within the file system, rather than intercepting calls on their way in to it.

Ocarina optimizes files out of band, and is only called on reads and writes when an optimized file is accessed. This means we are not in the path at all for accesses to hot files. EMC has a rigorous set of tests you have to pass to work with FileMover, and getting through those tests was a good validation exercise for Ocarina. I think EMC customers can feel very comfortable about how solid this solution is.

One technical feature worth noting is that using Ocarina to optimize a Celerra volume means that it is possible to greatly increase the logical size of that volume. Like NetApp, the current release of EMC Celerra has a 16TB volume size limit. FileMover – as the name implies – lets you move the contents of a file somewhere other than the original volume. The FileMover stub that is left behind makes it appear to applications and users that the file is still in the original volume. The way Ocarina works, we read the file from the original volume, optimize it, and store it in another volume on the same Celerra.  We’re not really moving it off the filer at all, but we are using FileMover to allow you to spread files out across multiple volumes on your Celerra.

A FileMover stub is left behind in the original volume. The stub does not take much physical space. So if you had a 16TB volume called volume A, and Ocarina started optimizing files and storing them in volumes B, C, and D, you could keep creating new files in volume A using the free space we just created. As you create those new files in volume A, we could keep optimizing and moving them. We can also simply move them, without optimizing them. This is completely policy-based.

The net effect, over time, is that a user could mount a single share, Volume A, and have direct access to much, much more storage than 16TB. Let’s say we get 75% optimization on average across a set of files. That means you could store 64TB in one volume. With FileMover and the example above where we are using 3 volumes as targets, you could store 192TB in Volume A (though the file contents would actually be distributed across Volumes B, C, and D). This works extremely well for all typical NAS file data. The Celerra also supports unified storage, where Celerra volumes are used for iSCSI and for databases. And while Ocarina is not targeting our solution for those use cases yet, please do stay tuned.
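The capacity figures follow from a one-line calculation: if files shrink by 75%, each physical terabyte holds four logical terabytes, and each additional target volume adds the same amount again. The helper below is a back-of-the-envelope sketch; its name and parameters are ours, and it ignores the small amount of space the stubs occupy in the original volume.

```python
def logical_capacity_tb(volume_tb: float, optimization: float,
                        target_volumes: int = 1) -> float:
    """Logical data that fits when files shrink by `optimization` (0 to <1)."""
    if not 0 <= optimization < 1:
        raise ValueError("optimization must be in the range [0, 1)")
    return volume_tb / (1 - optimization) * target_volumes


print(logical_capacity_tb(16, 0.75))     # one 16TB volume
print(logical_capacity_tb(16, 0.75, 3))  # three 16TB target volumes
```

With lower optimization the multiplier drops quickly (50% optimization only doubles capacity), so the real-world number depends heavily on the data set.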

And here’s a quick note that may interest those of you who are already Ocarina EMC shops–or who have a customer or client that is. If your company or client has benefited from the groundbreaking Ocarina solution, we are initiating a special program that may be of great interest to you. For more information about this new opportunity for a data reduction package at an exceptional price, contact: info@ocarinanetworks.com.