
Online Storage Optimization

Exploring Next Generation Storage Solutions

Archive for the ‘Featured’ Category

I dream of data reduction

Posted by Sunshine On March - 29 - 2010


Data is growing at a dizzying rate. We need only look at our home computers to get a sense of how easy it is to fill our hard drives to overflowing with all manner of flotsam and jetsam. From family photos to LOLcats to videos of our kids, we’re finding it difficult if not impossible to keep down the rising tide of files.

There is a cost to this, as many if not most enterprises are now recognizing. Recently, InfoWorld launched a special section, Data Explosion, that guides companies through the myriad problems that arise from having too much data to handle. With headlines like “The big data addiction,” the new section promises to address the issue with step-by-step guides, white papers, and other instructional pieces.

InfoWorld blogger Matt Prigge delves into the topic in a post today, “The high cost of lazy storage.” He says that users need to take responsibility for keeping their data under control. Despite this admonishment, he admits that he himself is an “excellent example of the problem.” He saves all of his email, because he never knows what he might need later. Sound familiar? If someone whose blog is called “Information Overload” can’t get control of his personal data, it’s hard to imagine how anyone else can.

Prigge writes, “The bigger that data gets, the more effort required to put the genie back in the bottle.” He pushes the metaphor even further (and more gruesomely) by suggesting that at some point it’s easier to kill the genie and throw away the bottle. Now, that does strike us here at Online Sto Op as rather extreme. Why not simply put the genie back into that nice, compact bottle where she was living perfectly happily for so many years?

As we all know from 70s TV, those bottles were well-upholstered and downright comfortable living spaces for many a genie. And while it’s true that some genies (or Jeannies) would get so angry they’d stomp their feet when they were magically sent back there, they eventually settled back onto the purple pillows, kicked off their metallic platform heels, dug their toes into the shag carpeting and relaxed. Same goes for data reduction. A combination of approaches seems the most sensible answer. Data needs to be managed. There is something that is known as 100% compression–it’s called “deletion.” But short of that, there are ways to reduce data by as much as 90%. There are solutions for reducing the types of files that are driving the fastest storage growth, such as JPEGs, documents, videos, graphics, and other large files. An intelligent, content-aware approach that includes both deduplication and compression is what this blog’s parent Ocarina provides.

Storing Health Care

Posted by Sunshine On March - 22 - 2010

This Monday morning we all awoke to the news that the U.S. House of Representatives passed H.R. 3590, a major health care reform package. Known as the “Patient Protection and Affordable Care Act,” it aims to ensure that more Americans will be covered by health insurance. It also makes it more difficult for insurance companies to deny coverage. Whatever your views on the debate around this legislation, one thing is clear: this reform shines a light on the many IT and storage challenges associated with the health care industries. On the most basic level, making changes and adding millions to the insured will mean an influx of new paperwork, and that translates to files that must be stored and managed.

As ZDNet’s Dana Blankenhorn reports, at least one investor, Bill Miller of Legg Mason, is bullish on healthcare IT as a result of this bill. As he sees it, this reform actually gives insurers a boost because it requires a raft of new signups. This will benefit IBM and GE, among many others. It’s easy to forget that a year ago, our industry was in a lather over the fact that President Obama was committing stimulus money to the health care industry for a push to digitize medical records. We are still sorting out the myriad implications of this new mandate, which sets aside nearly $20 billion for the effort. We do know, for example, that tech giant Microsoft jumped all over it by introducing something called HealthVault Community Connect–specialized software to manage these records.

Taking a step backwards, the new landscape of medical research has put a strong demand on storage resources. As our blogger Mike Davis reported from a recent “next generation sequencing” conference, the pace of cutting-edge genomic research and the types of files it produces have led to an upsurge in the need for storage capacity. As it happens, Ocarina just put out a case study about the work it’s doing with the Cornell University Center for Advanced Computing, which handles a massive influx of genomics files on a daily basis. For the full case study, go to the company Resources page and click on Cornell Case Study (may require signup).

What do you think? Are there storage challenges you or your organization predict as a result of this new legislation? We’d like to hear. The forum (below, in comments) is open!

Tech Field Day Redux

Posted by Sunshine On March - 22 - 2010

It’s back… That’s right, on Gestalt IT you can now find details on the upcoming Tech Field Day, to be held in Boston April 8 and 9. This event brings together bloggers from around the world for two days of deep dives at tech companies. The result is expected to be a multitude of tweets, blog posts, videos and photos. The concept is a clever one, and as participants in the first Tech Field Day, we’re thrilled to see it continue. The upcoming Boston event has an impressive list of presenting sponsors: EMC, Cisco, Data Robotics, HP, and something called VKernel, a VM optimization company.

Today I spoke to organizer and Gestalt IT Publisher Stephen Foskett about his plans. I couldn’t help but wonder, when looking at the list of companies that will be presenting, whether the profusion of big names represents a new chapter for Tech Field Day.

“I was trying to find interesting companies in Boston and they just happened to be more of the big IT companies,” said Stephen. “As always, I’m inviting great companies. That’s the real story.”

There is plenty of new blood among the invited bloggers. About half of the delegates weren’t at the last Tech Field Day. The list for Boston includes some well-known virtualization bloggers: Jason Boche, David Davis, Edward Haletky, and Gabrie Van Zanten. Another new face is Matt Simmons, a system administrator with a popular blog called Standalone SysAdmin. Meanwhile, the previous delegates developed some documents designed to help both sponsors and bloggers. These are now posted on Gestalt IT as well, at this link.

I also got the scoop on the upcoming Seattle Tech Field Day, slated for mid-July. While he declined to name the companies, Stephen did say it’s already half booked with presenting sponsors.

“Redmond is home to a huge number of tech companies. I underestimated the number,” he said. “Some of the companies have lots of different product lines and want to do repeats.”

For the many of us who aren’t going to be able to attend, the Twitter hash tag #techfieldday is the way to get a sense of the proceedings in real time. There’s also a handy Twitter list of the Boston delegates. We’ll be watching!

Where are the big chunks of storage space?

Posted by Sunshine On March - 5 - 2010

This headline doesn’t refer to data in any kind of virtual sense of the word. Rather, there is an interesting factoid buried in a piece on the site Data Center Knowledge. Companies are finding it difficult to find big chunks of contiguous floor space, despite a growing demand.

Citing a recent survey by Digital Realty Trust, the article reads: “… 70 percent of companies planning data center expansions say they envision large projects of at least 15,000 square feet in size or 2 megawatts or more of power.”

Of all the companies surveyed, a whopping 83 percent said they plan to expand their data centers in the coming 12-24 months. Yet, the availability of this space is dropping precipitously. This could lead to a serious supply and demand crunch, according to Data Center Knowledge. Not only that, but the cost of powering these data centers is already the number one concern for many companies.

What do you think? Is this a concern for your company or those with whom you partner or serve?

Fast and Effective Dedupe

Posted by Carter George On March - 3 - 2010

I’ve noticed a few blog posts recently about speed of deduplication in the modern data center. I agree that speed is an important factor, but keep in mind that not all dedupe is created equal. That is to say, fast is good, but only if you are also effective. One of the tricky things has been that the easiest data to compress is also usually the most carefully performance tuned. A great example of this is a database. This is because databases are comprised of simple alphanumeric fields and sparse tables. All of that is easy to reduce in size.

However, a company’s core transactional database is the most conservative asset in the data center. Introducing compression would save space, for sure, but you could only use very fast, simple compressors there. At the same time, customers will be hesitant to deploy a new layer of processing in their most sensitive application.

So, where is most data growth? In fact, it’s being driven by unstructured data – Office documents, rich media, email with attachments, PDFs, Flash videos, and so forth. This complex data does not lend itself to fast simple compressors. But perhaps we should back up for a moment and think about how customers have been behaving all along.

Throughout the history of storage, there have always been tradeoffs available between fast expensive storage, and slower but cheaper alternatives. This is not a bad thing. It gives users alternatives based on their priorities and budgets. Back in the old mainframe days, these choices were between very expensive mainframe memory and “offline” storage like drums, cards, and tapes. Today the technology is all much bigger, faster, cheaper and sexier. But really, the tradeoffs are the same.

Data reduction technology adds another layer of choice above and beyond the traditional hardware choices. Now in addition to choosing whether you want fast, expensive solid state disk (SSD) or slower but very cost-effective SATA, you can also choose whether you want to compress and/or deduplicate the data that is stored on those disks.

Just like physical disks, compression and dedupe come in a range of speeds and capabilities. There are simple and very fast compressors that are essentially invisible in terms of their impact on storage performance. There are more complex compressors that get better results, but which may take longer, either to compress or to decompress the data. Deduplication, done well, should always be pretty fast, and streaming dedupe rates of well over 300MB/sec are now available from many vendors (including Data Domain and Ocarina).
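The speed-versus-results tradeoff described above shows up even in a generic compressor. The following is only a rough sketch, assuming Python's standard `zlib` module as a stand-in (it is not any vendor's compressor) and a made-up, log-like sample payload. It compresses the same data at a fast setting and at more thorough settings and reports size and time for each.

```python
import time
import zlib
from typing import Tuple


def compress_at_level(data: bytes, level: int) -> Tuple[int, float]:
    """Compress data at the given zlib level; return (compressed size, seconds taken)."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    # Round-trip check: the compressor must be lossless.
    if zlib.decompress(compressed) != data:
        raise ValueError("round trip failed")
    return len(compressed), elapsed


# Hypothetical sample: repetitive, log-like text standing in for real files.
sample = b"".join(
    b"2010-03-03 host%03d status=OK latency=%dms\n" % (i % 50, i % 97)
    for i in range(20000)
)

for level in (1, 6, 9):  # 1 = fastest setting, 9 = most thorough setting
    size, seconds = compress_at_level(sample, level)
    share = size / len(sample)
    print(f"level {level}: {size} bytes ({share:.1%} of original) in {seconds:.4f}s")
```

Timings will vary by machine and by data, so treat the numbers as a way to compare settings on your own files rather than as a benchmark.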

The emergence of tools to automatically tier data to its appropriate place helps make the use of all of these technologies more feasible. That applies as much to solid state disks as it does to dedupe and compression. When data tiering can be made invisible to end-users and applications, then implementing multiple physical and logical tiers of storage becomes practical. Good examples would include EMC’s new FAST tools, Compellent’s “Fluid Data Storage”, and HDS’s Data Migrator. When users or administrators have to move data by hand to get it to a compressed tier or a solid state disk, then the operational costs offset the capital savings.

You might want to be wary when someone’s biggest claim to fame is fast dedupe. Just as the old mainframe admin had to decide whether something was important enough to live in RAM, or could be stored on cheaper tapes instead, today’s IT shops have to decide where it is most important to try to get data reduction, and what tool will get the most bang for the buck for that kind of data. You need the whole story, and then you can decide based on your own priorities.

The Environment Still Matters

Posted by Sunshine On February - 22 - 2010

With all the talk about the data inconsistencies around climate change theory, one issue that I’d hate to see lost in the shuffle is the actual environment. That is, while I personally have been skeptical for some time about the alarmist tone many scientists took regarding global warming, it would be a shame if there was such a backlash that people forget about the much more crucial, larger issue at stake. That is, we need to look at all the ways –on macro- and micro-scales–that we can reduce the overall pollution we generate through our daily habits.

One of the persistent myths about the Internet is that it is clean and green. We overestimate the value of going “paperless” while lowballing the effect on the environment of data centers. One need only look at an online pub like Data Center Knowledge to see that one of the most talked about issues in data centers today is how to reduce rack space, cooling and other energy costs associated with storage. (Another great resource is Greg Schulz’s StorageI/O blog.) This is particularly true of the data being generated through our new Web 2.0 sharing habits. Jon Toigo can laugh about the exploding digital universe all he likes, but it’s still the case that data growth is going like gangbusters in this socially networked era. Recession or no recession, there is a growing demand for ways to make storage more efficient.

Large players in this space are all too aware of the environmental and financial costs of such rapid data growth. Every time you share a photo or video, you’re contributing to it. And who among us doesn’t do this nowadays? In response, companies are experimenting with all kinds of techniques, including new building designs making use of outside air, reducing overall rack space usage with data reduction such as is offered by this blog’s parent Ocarina, cloud adoption, and so on and so forth. Companies like Google, Yahoo and Facebook are also creating next generation storage architectures that are more efficient for handling the realities of today’s internet. In short, let’s be sure, as we discuss the fallout from the latest global warming debate, that we don’t start acting too lax about the effect of our actions on the planet.

Storage Trends - Customer is King

Posted by Sunshine On February - 4 - 2010

Last week’s BD Event was more than just a deal making event. It was a chance to learn about new product releases and trends in the storage industry. The big picture: gone are the days when end users had to accept whatever the storage industry handed down to them. Today’s small-to-medium-sized storage operations are all about designing systems in response to customer needs. Whether that’s developing end-to-end dedupe, refining and improving processes for data recovery management, delivering automated marketing tools, improving data migration, or creating storage that is more energy efficient, the push is towards designing systems with real world customer needs in mind.

The BD Event organizers’ deep connections within the storage arena meant that the two-day conference in Palo Alto drew a who’s who of industry folks. I was particularly pleased to see the number of analysts and consultants on site, including Jerome Wendt, George Crump, Deni Connor, Dave Vellante, Stephen Foskett and Tony Asaro (who unveiled his new project, Voices of IT). I also spoke at length with storage writer Howard Marks, who has a new project called DeepStorage.net that looks very promising for companies seeking solid research that they can use as outbound marketing.

Pleasingly enough, this blog’s parent Ocarina was very much the talk of the conference after kicking off the first day’s emerging vendor showcase. Carter George, VP Products, gave away the fact that end-to-end dedupe is becoming a part of the overall strategy for the company. This information set tongues wagging. As DCIG’s Jerome Wendt later blogged: “Ocarina Networks is another company that is adapting to new demands from its customers. Originally it started out doing post-process deduplication of large image files (JPGs, MPEGs, etc.) that had been dormant for 30 days or more - great stuff! But now its customers and even OEMs (Ocarina did not say who) are coming to it and asking for it to do end-to-end data deduplication from primary disk to backup disk without ever reconstructing it. After all, once the data is deduplicated on primary storage, why reconstruct it to then deduplicate it again when it is backed up?”

A good question, and one that was hotly debated and discussed among those in attendance. As Jerome notes, this is a perfect example of the customer responsiveness trend. It’s also an acknowledgment of something that’s been obvious to end users for some time–data reduction shouldn’t have to be isolated within each storage sector. In this day and age you really shouldn’t have to buy separate products to dedupe within primary, nearline, and backup. It’s like having to buy a separate dishwasher for your pre-rinse, wash, and dry cycles.
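To make the "why reconstruct it?" point concrete, here is a toy sketch of the idea. It is not Ocarina's implementation. It assumes fixed-size chunks identified by SHA-256 digests, and every name in it (`ChunkStore`, `backup_chunks`, `CHUNK_SIZE`) is hypothetical. The primary tier stores each unique chunk once, and the backup tier receives only the chunks it does not already hold, so the file is never rebuilt in between.

```python
import hashlib
from typing import Dict, List

CHUNK_SIZE = 4096  # hypothetical fixed chunk size, chosen only for illustration


class ChunkStore:
    """Toy content-addressed store: each unique chunk is kept once, keyed by digest."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}

    def write(self, data: bytes) -> List[str]:
        """Store unseen chunks and return the file's 'recipe' (ordered digests)."""
        recipe = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # duplicates are not stored again
            recipe.append(digest)
        return recipe

    def read(self, recipe: List[str]) -> bytes:
        """Rebuild the original bytes from a recipe."""
        return b"".join(self.chunks[digest] for digest in recipe)


def backup_chunks(recipe: List[str], primary: ChunkStore, target: ChunkStore) -> int:
    """Copy only the chunks the target lacks; return how many chunks were sent."""
    sent = 0
    for digest in recipe:
        if digest not in target.chunks:
            target.chunks[digest] = primary.chunks[digest]
            sent += 1
    return sent


primary_tier, backup_tier = ChunkStore(), ChunkStore()
original = bytes(CHUNK_SIZE) * 3 + b"unique tail"  # three identical chunks plus a tail
recipe = primary_tier.write(original)
sent_first = backup_chunks(recipe, primary_tier, backup_tier)
sent_again = backup_chunks(recipe, primary_tier, backup_tier)
print(len(recipe), len(primary_tier.chunks), sent_first, sent_again)
```

The recipe travels with the file, so the backup tier can restore the data with `read` even though it only ever received the unique chunks.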

Other standouts at the event included Bocada, which has updated its DR management software by introducing a new product, Prism. I plan to have the CEO Nancy Hurley on my podcast, and so will learn more about how this update is serving its existing and new customers. I confess that I went to her presentation mainly because I wanted her on my show, but I quickly realized that there was something here of note. That is, the company is addressing a real gap in how well these processes are managed and improved, a key consideration with a crucial component like data protection. She gave a brief overview of the user interface, and on its face it seemed intuitive and flexible.

TechValidate also served as a great example of a company that has evolved based on customer needs. As CEO and founder Brad O’Neill explained during his emerging vendor presentation, originally the company was formed to serve companies that were having trouble getting customer references. These all-important testimonials are sometimes difficult to get–as many industries are gun shy about trumpeting their connections with too many IT and storage vendors. However, O’Neill soon recognized a larger need among its customers for usable marketing materials that could be generated from the information they were gathering. Now, the company has a wide range of customers across numerous industries that are using it as a way to serve up marketing publications.

One final highlight of the event–I got to speak with the NetApp blogger known as “Dr. Dedupe,” Larry Freeman. Larry is best known for running around in a lab coat and stethoscope asking people if they know anything about dedupe. The videos of these shenanigans are posted on his blog and on NetApp TV on YouTube. I suppose in a sense he and I are competitors. Turns out, he’s been writing a book, “Evolution of the Storage Brain” and posting it as he writes it, chapter by chapter, on his blog. This means that readers have a chance to comment on it and shape it as it goes along. Check it out!

Tagged Gets Shrunk

Posted by Sunshine On January - 29 - 2010


Interesting story from the vault of the Ocarina case study library. Social network Tagged is the third largest social network in the U.S. It has seen traffic increase 10x over the past two years. With its focus on making new friends rather than simply getting to know existing ones, it has carved out a successful niche and is building an international subscriber base of over 80 million members.

The cost of this success? Data growth. Tagged’s storage infrastructure has been doubling every single year. With 1 million new photos uploaded every single day, Tagged needed a way to expand capacity and fast.

Compression with Ocarina meant about 10 TB of additional free space, which in turn meant they could put off buying new NAS equipment by several months. The lower average image size also meant reduced bandwidth and 15%-20% reduced monthly content delivery network (CDN) costs.

The company chose to go with Ocarina’s newest specialized image reduction technique, native format optimization (NFO). This is visually lossless compression of images that nevertheless delivers significant space savings–a technology that’s perfectly suited to the social networking environment.

The other crucial benefit of reducing image size was improved site responsiveness. “We’re sure that using Ocarina to reduce image sizes has helped improve our page rendering times,” said company CTO Johann Schleier-Smith. “That’s a big deal because it creates a better user experience, which means improved customer loyalty and higher market share.”

Read the entire case study by clicking here. Or visit the Ocarina resources page and click on the Case Studies tab, where you’ll find several others.

Databases - Compression Targets?

Posted by Carter George On January - 16 - 2010

The headline of this post poses a question that was raised in a recent comments discussion between Dave Vellante of Wikibon and myself on this blog. Dave wanted to know if there are use cases in which generic compression might still be useful. As I wrote in my post, most of the storage industry still relies on generic, or LZ compression. This is a shame, because it’s severely limited compared to possibilities inherent in more advanced, file type specific compression algorithms such as we at Ocarina use. My main point was that the more advanced, file type specific compression algorithms can be applied to the bulk of the files one finds in the modern data center–MS Office, Zip, PDF, video, images, and so on.
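A quick way to see the limits of generic LZ compression is to run it over two kinds of input. The sketch below is an illustration only, with two loudly labeled assumptions: Python's standard `zlib` module stands in for "generic compression", and seeded pseudo-random bytes stand in for a file that is already compressed (a JPEG, Zip archive, or video). The text payload is invented.

```python
import random
import zlib


def ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means better reduction)."""
    return len(zlib.compress(data, 6)) / len(data)


# Hypothetical alphanumeric payload, similar in spirit to a text export.
text = b"customer_id,order_id,status,amount\n" + b"".join(
    b"%06d,%08d,SHIPPED,%d.00\n" % (i, i * 7, i % 500) for i in range(10000)
)

# Assumption: random bytes approximate the statistics of already compressed media.
rng = random.Random(0)
already_compressed = bytes(rng.getrandbits(8) for _ in range(100_000))

text_ratio = ratio(text)
media_ratio = ratio(already_compressed)
print(f"text: {text_ratio:.1%} of original, compressed-like data: {media_ratio:.1%}")
```

The generic compressor shrinks the text substantially and leaves the compressed-like data essentially unchanged, which is the gap that file type specific approaches aim to close.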

However, Dave was interested in hearing whether there are use cases in which generic compression could be commercially viable. My response was that data sets that are made up entirely of text files, and databases, are the two examples in which it really doesn’t matter what type of compression you use–the generic type will work fine because essentially all you have to do is reduce text and/or alphanumeric data. But, I added, databases aren’t likely to be a compression target because there is too much of a performance trade-off. Also, this is unlikely to be a good commercial target as databases are the most conservative part of the data center. Dave pressed his case. He wanted to know if perhaps there are times when compressing a database would make sense.

He wrote: “I agree with your comments on a production database but what % of an organization’s database storage would you consider the ‘family jewels’ vs. copies of the database for things like decision support/data warehousing, snapshots, and other copies/clones for recovery purposes? If I can compress those supporting copies down 50-80%…why not?”

My answer: it varies by organization, but sometimes a large percentage of database data is in star schema data warehouses. Those databases, unlike the transactional databases, tend to support frequent whole table scans. That is, instead of fast small writes (transactions) into the middle of a table, they see very large reads of everything in a table. Databases tend to be very compressible, and if you can compress them and still support the I/O rates you need for performance, by all means do so!

Transactional database performance tends to be measured in TPS (transactions per second) and TPS in turn is largely bounded by the speed at which the database can do direct I/O writes of transaction logs to stable store. Putting compression or dedupe in that path is risky. I’m not saying it can’t be done, but people will want to be quite sure it doesn’t mess up years of performance tuning. With data warehouses, you may have hundreds of terabytes of data in simple so-called star schema databases, and the kinds of queries run against these databases tend to go through and read every row in every table.

Consequently, performance is bound by the ability of disk systems to sustain sequential reads of very large data sets. In this case, as long as decompression can happen at the rate of physical disk reads, then I see no reason not to compress or dedupe those databases. As I mentioned earlier, data in databases is largely alphanumeric. That means that both compression and decompression on that kind of data can be very fast - it lends itself to coprocessors like HiFN, for example. If your architecture provides a place to insert something like that, or if you have CPU cycles free enough on your database servers, I think data warehouses can be good candidates for both compression and dedupe.
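The condition above can be written down as a back-of-the-envelope model. This is a sketch under stated assumptions, not a sizing tool: the function names and the example rates are hypothetical, and the model ignores CPU contention, caching, and seek behavior. It treats a scan as limited either by how fast the disks deliver compressed data or by how fast the decompressor can produce uncompressed rows.

```python
def effective_scan_rate(disk_mb_s: float, compression_ratio: float,
                        decompress_mb_s: float) -> float:
    """Uncompressed MB/s delivered to the query engine during a sequential scan.

    disk_mb_s: sequential read rate of the disks, in MB/s of stored data.
    compression_ratio: compressed size / original size (0.25 means 4:1).
    decompress_mb_s: decompressor output rate, in MB/s of uncompressed data.
    """
    if not 0 < compression_ratio <= 1:
        raise ValueError("compression_ratio must be in (0, 1]")
    # Each stored MB expands to 1 / ratio MB once decompressed.
    disk_limited = disk_mb_s / compression_ratio
    return min(disk_limited, decompress_mb_s)


def scan_not_slower(disk_mb_s: float, compression_ratio: float,
                    decompress_mb_s: float) -> bool:
    """True when a compressed scan is at least as fast as reading uncompressed data."""
    rate = effective_scan_rate(disk_mb_s, compression_ratio, decompress_mb_s)
    return rate >= disk_mb_s


# Hypothetical numbers: 200 MB/s disks, 4:1 compression, two decompressor speeds.
fast = effective_scan_rate(200.0, 0.25, 1000.0)  # disks are the limit
slow = effective_scan_rate(200.0, 0.25, 500.0)   # decompressor is the limit
print(fast, slow, scan_not_slower(200.0, 0.25, 150.0))
```

In this model the scan only gets slower when the decompressor cannot match the raw disk rate, which is the same threshold the paragraph above describes.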

With all that said, the future of compression is in reducing unstructured data. Why? Because this is where the greatest data growth is occurring. In order to address this problem, we’ll have to start looking at far more advanced algorithms than those that did the trick in the past.

Shameless Plug - Vote Online Sto Op Today!

Posted by Sunshine On January - 14 - 2010

It’s that time again. Storage Monkeys is running a contest for the Top 10 Vendor Blogs, and once again, Online Storage Optimization is a nominee! Even more exciting, this blogger is listed on there, making it the only entrant with a woman blogger. Not to play the gender card or anything, but to me this is good news for the industry. And no doubt next year, there will be even more diversity represented in the list.

Here’s the full list of blogs that have been nominated. It really is an honor to be listed among these top bloggers such as Stephen Foskett, Marc Farley, Vaughn Stewart, Mark Twomey, Chuck Hollis, Hu Yoshida and so on. If you think there are some they’ve missed, it’s not too late to put a suggestion in the comments field at the bottom of the page. Note: you must be a member of Storage Monkeys to vote. Which, quite frankly, you should already be–this is a fantastic community site for sharing tips, information and opinions about storage.

Marc Farley (3Par) - http://www.storagerap.com/
Mark Twomey / Storagezilla (EMC) - http://storagezilla.typepad.com/
Chuck Hollis (EMC) - http://chucksblog.emc.com/
Stephen Foskett (Nirvanix) - http://developer.nirvanix.com/blogs/strategies/default.aspx
Barry Burke (EMC) - http://thestorageanarchist.typepad.com/
Hu Yoshida (HDS) - http://blogs.hds.com/hu/
Zetta Blog - http://www.zetta.net/blog.php/
Dave Graham (EMC) - http://flickerdown.com/
Val Bercovici (NetApp) - http://blogs.netapp.com/exposed/
Vaughn Stewart (NetApp) - http://blogs.netapp.com/virtualstorageguy
HP StorageWorks Blog - http://www.hp.com/storage/blog
Barry Whyte (IBM) - http://bit.ly/glxKh
Carter George and Sunshine Mugrabi (Ocarina) - http://onlinestorageoptimization.com/
Xiotech Blog - http://blog.xiotech.com/blog/
Cleversafe blog - http://dev.cleversafe.org/weblog/
Pete Steege (Seagate) - http://storageeffect.media.seagate.com/
Jay Livens (Sepaton) - http://www.aboutrestore.com/
Nick Triantos (NetApp) - http://blogs.netapp.com/storage_nuts_n_bolts/
Dave Hitz (NetApp) - http://blogs.netapp.com/dave/
Michael Hay (HDS) - http://blogs.hds.com/michael/
David Merrill (HDS) - http://blogs.hds.com/david/
Chris Poelker (FalconStor) - http://blog.falconstor.com/ChrisPoelker/
Pete Gerr (HDS) - http://blogs.hds.com/pete/
Storage Efficiency Insights (NetApp) - http://blogs.netapp.com/efficiency/
Larry Freeman (NetApp) - http://blogs.netapp.com/drdedupe/
Mike Workman (Pillar) - http://blog.pillardata.com/
Moshe Yanai (IBM) - http://www.xivstorage.com/blog/
Alex McDonald (NetApp) - http://blogs.netapp.com/shadeofblue/
Steve Klinkner (NetApp) - http://blogs.netapp.com/simple_steve/