
Online Storage Optimization

Exploring Next Generation Storage Solutions

Archive for January, 2010

The BD Event Video

Posted by Sunshine On January - 31 - 2010

Last week, a group of us participated in a groundbreaking new anti-trade show, The Business Development Event. Organized by industry veterans Greg and VaNessa Duplessie, the event was the second of its kind and the first in the Silicon Valley area. Held in Palo Alto, California, it drew dozens of storage industry members who spent three days talking, networking, making deals happen and sharing their skills and expertise.

Online Storage Optimization was on the scene–tweeting, talking and taking the occasional sip from a glass of wine that happened our way. Our parent Ocarina Networks was also featured in the “emerging vendors” showcase. And this blogger was on a panel on social media with VaNessa and Stephen Foskett, director of consulting at Nirvanix and publisher of Gestalt IT.

Here’s a small video “tribute” to the event that I hope gives a sense of it:

The BD Event, January 2010 from Sunshine Mugrabi on Vimeo.

In this video:

Nancy Hurley, CEO, Bocada

Bill Basinas, Director of Business Development, Tarmin

Alan Atkinson, President & CEO, Xiotech

Jerome M. Wendt, DCIG

Steve Sicola, CTO, Xiotech

Camberley Bates, Managing Director, Evaluator Group

Julie Ryan, Director of Alliances, Engenio Storage Group, LSI

The next BD Event will be held in Boston this summer. Woe betide any East Coast storage folks foolish enough to miss it!

Tagged Gets Shrunk

Posted by Sunshine On January - 29 - 2010


An interesting story from the vault of the Ocarina case study library: Tagged is the third-largest social network in the U.S. Its traffic has increased tenfold over the past two years. With its focus on making new friends rather than simply getting to know existing ones, it has carved out a successful niche and is building an international subscriber base of over 80 million members.

The cost of this success? Data growth. Tagged’s storage infrastructure has been doubling every year. With 1 million new photos uploaded every day, Tagged needed a way to expand capacity, and fast.

Compression with Ocarina freed up about 10 TB of space, which in turn meant Tagged could put off buying new NAS equipment by several months. The lower average image size also meant reduced bandwidth and a 15-20% reduction in monthly content delivery network (CDN) costs.
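To make the capacity side of this concrete, here is a back-of-the-envelope sketch. It assumes a deliberately simple model that is not from the case study: the storage footprint doubles once a year (as Tagged’s did), and only the existing data gets shrunk, so the freed space buys exactly the time it takes growth to eat it up again. The function name is hypothetical, and the 40 TB footprint in the usage line is a made-up figure, since the case study doesn’t say how much Tagged was storing.

```python
import math

def months_deferred(freed_tb: float, used_tb: float, doublings_per_year: float = 1.0) -> float:
    """Months until exponential growth consumes capacity freed by compression.

    Simple model (an assumption): stored data grows as
    used_tb * 2 ** (doublings_per_year * t), and data written after the
    one-time compression pass is not reduced.
    """
    if freed_tb < 0 or used_tb <= 0 or doublings_per_year <= 0:
        raise ValueError("need freed_tb >= 0, used_tb > 0, doublings_per_year > 0")
    # Solve used * 2**(r * t) = used + freed for t (in years), then convert to months.
    years = math.log2(1 + freed_tb / used_tb) / doublings_per_year
    return 12 * years

# Hypothetical: 10 TB freed on a 40 TB footprint that doubles every year.
print(f"{months_deferred(10, 40):.1f} months")
```

With those made-up numbers the sketch prints 3.9 months. The smaller the existing footprint relative to the space freed, the longer the purchase can be pushed out; freeing as much space as you currently use buys a full doubling period.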

The company chose to go with Ocarina’s newest specialized image reduction technique, native format optimization (NFO). This is visually lossless compression of images that nevertheless delivers significant space savings–a technology that’s perfectly suited to the social networking environment.

The other crucial benefit of reducing image size was improved site responsiveness. “We’re sure that using Ocarina to reduce image sizes has helped improve our page rendering times,” said company CTO Johann Schleier-Smith. “That’s a big deal because it creates a better user experience, which means improved customer loyalty and higher market share.”

The full case study is available on the Ocarina resources page: click on the Case Studies tab, where you’ll also find several others.

One small bit for mankind…

Posted by Sunshine On January - 22 - 2010

Thanks to Data Center Knowledge for picking up this science-geeky piece of news: it turns out that the SNAFU-prone CERN Large Hadron Collider is quite the data beast. The detectors built into the giant experiment are coughing out gigabytes of data every second. One detector has 100 million readout channels. The video below is a mind-blowing journey into the data center that powers this experiment. As someone points out in the comments, despite the multiple petabytes of data being generated there, the collider experiment really comes down to one bit of data: the Higgs boson that the gigantic experiment is designed to produce.

Of course, some of us are still wondering if our grandchildren are trying to stop us discovering it.

Meanwhile, enjoy this video.

Storage News and Views - January 19

Posted by Sunshine On January - 19 - 2010

Bleary-eyed, the storage industry has begun to wake up from its holiday stupor. VMware has decided to go into the email business. EMC continues to vacuum up talent like a Roomba on a tear through the world’s biggest living room. Meanwhile, the jokers over at Gestalt IT are picking up the “Fake Steve Jobs” meme and running with it. Their version is actually funnier than the original — at least to this blogger, perhaps because I know the players and situations.

The increasingly crowded and competitive Storage Monkeys Top Vendor Blogs contest is about to screech to its exciting conclusion. Voting ends Friday. The front-runners are EMC bloggers Chuck Hollis and Storagezilla. In third place at the moment is the HP StorageWorks blog, helmed by fearless blogger Calvin Zito. This puts early front-runner Marc Farley, founder of the vaunted Steering Wheel Camera Society of America, in fourth place. Step on it, Marc! In fifth right now is the Storage Anarchist, Barry Burke, who is just barely edging out NetApp’s Val Bercovici. Well, it ain’t over till it’s over; these things can change fast.


So before the week is out–why not VOTE?

Speaking of which, blogger extraordinaire Stephen Foskett has started a series that delves into the whole vendor blogging question. He has two posts up on the topic, “Vendor Bloggers 1: Why Does It Matter?” and “The Spectrum of Vendor Blogs.” Mr. F cites none other than Online Storage Op as an example of a hybrid “independent-seeming official” blog, but credits us for being transparent about the fact that our parent is a vendor, Ocarina Networks. No doubt Stephen and I will hash this out further when we give a talk on social media to a group of storage industry pros at The BD Event in Palo Alto next Wednesday.

But wait… there’s yet more news, and this is actual news:

Nexsan and FalconStor are teaming up to try to defeat rival Data Domain. It can get really interesting when two vendors come up with a combination product that serves a larger purpose than either could have served alone. Two pieces on the topic caught my eye this week:

Beth Pariseau, SearchDataBackup - Nexsan and FalconStor gun for EMC Data Domain with Dedupe SG 2 data deduplication backup device

Writes Beth: “Analysts say a series of updates to Dedupe SG — comprised of FalconStor’s dedupe software and Nexsan enterprise data storage systems — put it into closer competition with the 800-pound gorilla Data Domain.”

She quotes ESG’s Lauren Whitehouse, who says that the high-availability configuration of this combo is a poke in Data Domain’s eye. And Dave Vellante of Wikibon calls the bundle the “best of both worlds” because it’s compatible with existing home office systems and reduces data over the WAN, though he questions how it will fare in real-world deployments.

Joseph Kovar, ChannelWeb - Nexsan, FalconStor Join Forces On Newest Backup Appliance

Joe, for his part, talked to Greg Knieriemen at Chi Corp., which partners with both Nexsan and FalconStor. Knieriemen is impressed with, among other things, the potential of the appliance’s 10 Gigabit Ethernet option. Hmm, where have I heard the name Greg Knieriemen before?

Well, that’s all for now folks. Maybe next time I talk to you I’ll be checking my zMail.

Databases - Compression Targets?

Posted by Ocarina On January - 16 - 2010

The headline of this post poses a question that came up in a recent comments discussion on this blog between Dave Vellante of Wikibon and me. Dave wanted to know whether there are use cases in which generic compression might still be useful. As I wrote in my post, most of the storage industry still relies on generic, or LZ, compression. This is a shame, because it’s severely limited compared with the possibilities of more advanced, file type specific compression algorithms such as those we use at Ocarina. My main point was that these algorithms can be applied to the bulk of the files one finds in the modern data center: MS Office, Zip, PDF, video, images, and so on.

However, Dave was interested in hearing whether there are use cases in which generic compression could be commercially viable. My response was that data sets made up entirely of text files, and databases, are the two cases in which it really doesn’t matter what type of compression you use. The generic type will work fine because essentially all you have to do is reduce text and/or alphanumeric data. But, I added, databases aren’t likely to be a compression target because there is too much of a performance trade-off. Also, they are unlikely to be a good commercial target, as databases are the most conservative part of the data center. Dave pressed his case: he wanted to know whether there are times when compressing a database would make sense.

He wrote: “I agree with your comments on a production database but what % of an organization’s database storage would you consider the ‘family jewels’ vs. copies of the database for things like decision support/data warehousing, snapshots, and other copies/clones for recovery purposes? If I can compress those supporting copies down 50-80%…why not?”

My answer: it varies by organization, but sometimes a large percentage of database data is in star schema data warehouses. Those databases, unlike transactional databases, tend to support frequent whole-table scans. That is, instead of fast small writes (transactions) into the middle of a table, they see very large reads of everything in a table. Databases tend to be very compressible, and if you can compress them and still support the I/O rates you need for performance, by all means do so!
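As a rough illustration of why warehouse-style data squeezes down so well, the sketch below generates a made-up fact table as plain CSV and runs it through Python’s zlib, a generic LZ-family compressor. The schema (date, store, region, SKU, quantity and price columns) and the helper name are hypothetical; real tables and real database compression features will behave differently. The point is only that repetitive alphanumeric rows give even a generic compressor plenty to work with.

```python
import random
import zlib

def make_fact_rows(n: int, seed: int = 0) -> bytes:
    """Build n hypothetical star-schema fact rows as ASCII CSV (made-up schema)."""
    rng = random.Random(seed)
    regions = ["north", "south", "east", "west"]
    prices = ["4.99", "9.99", "19.99", "49.99"]
    lines = []
    for _ in range(n):
        lines.append(
            f"2010-01-{rng.randint(1, 31):02d},"   # date key
            f"store_{rng.randint(1, 50):03d},"     # store dimension key
            f"{rng.choice(regions)},"              # region
            f"sku_{rng.randint(1, 1000):04d},"     # product dimension key
            f"{rng.randint(1, 10)},"               # quantity
            f"{rng.choice(prices)}"                # unit price
        )
    return ("\n".join(lines) + "\n").encode("ascii")

rows = make_fact_rows(20_000)
packed = zlib.compress(rows, 9)  # generic DEFLATE: LZ77 matching plus Huffman coding
ratio = len(rows) / len(packed)
print(f"{len(rows):,} bytes -> {len(packed):,} bytes ({ratio:.1f}x)")
```

The exact ratio depends on the made-up data, but it is the repeated column values and limited character set that a generic compressor exploits here, which is why databases are one of the places where LZ-style compression does fine.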
Transactional database performance tends to be measured in TPS (transactions per second), and TPS in turn is largely bounded by the speed at which the database can do direct I/O writes of transaction logs to stable storage. Putting compression or dedupe in that path is risky. I’m not saying it can’t be done, but people will want to be quite sure it doesn’t mess up years of performance tuning. With data warehouses, you may have hundreds of terabytes of data in simple so-called star schema databases, and the queries run against these databases tend to read every row in every table.

Consequently, performance is bound by the ability of disk systems to sustain sequential reads of very large data sets. In this case, as long as decompression can keep up with the rate of physical disk reads, I see no reason not to compress or dedupe those databases. As I mentioned earlier, data in databases is largely alphanumeric. That means both compression and decompression of that kind of data can be very fast; it lends itself to hardware coprocessors such as those from Hifn, for example. If your architecture provides a place to insert something like that, or if you have enough free CPU cycles on your database servers, I think data warehouses can be good candidates for both compression and dedupe.
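The “keep up with the disk” condition can be written down as simple arithmetic. In this sketch (a simplified model, not a benchmark), the disk streams compressed bytes, each stored byte expands by the compression ratio, and the decompressor’s output rate caps what the scan actually sees. The function names and the throughput figures in the usage loop are hypothetical.

```python
def effective_scan_rate(disk_mb_s: float, ratio: float, decompress_mb_s: float) -> float:
    """Logical MB/s a full-table scan sees when the table is stored compressed.

    disk_mb_s:       sustained sequential read rate of the stored (compressed) data
    ratio:           logical bytes per stored byte (3.0 means 3:1)
    decompress_mb_s: decompressor throughput, measured in logical (output) MB/s
    """
    if disk_mb_s <= 0 or ratio <= 0 or decompress_mb_s <= 0:
        raise ValueError("rates and ratio must be positive")
    # The disk can feed disk_mb_s * ratio logical MB/s; the decompressor may be the bottleneck.
    return min(disk_mb_s * ratio, decompress_mb_s)

def compression_is_safe(disk_mb_s: float, ratio: float, decompress_mb_s: float) -> bool:
    """True if the compressed scan is at least as fast as scanning uncompressed data,
    which would run at disk_mb_s."""
    return effective_scan_rate(disk_mb_s, ratio, decompress_mb_s) >= disk_mb_s

# Hypothetical array: 400 MB/s sequential reads, 3:1 compression.
for decomp in (300, 800, 2000):
    print(decomp, effective_scan_rate(400, 3.0, decomp), compression_is_safe(400, 3.0, decomp))
```

Under these assumed numbers, a 300 MB/s decompressor throttles the scan below raw disk speed; at 800 MB/s the scan runs twice as fast as an uncompressed one, because the disk is moving a third as many bytes; and beyond 1,200 MB/s the disk becomes the limit again. With any ratio of at least 1:1, a decompressor that matches the physical read rate never makes the scan slower.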

With all that said, the future of compression is in reducing unstructured data. Why? Because this is where the greatest data growth is occurring. In order to address this problem, we’ll have to start looking at far more advanced algorithms than those that did the trick in the past.

Shameless Plug - Vote Online Sto Op Today!

Posted by Sunshine On January - 14 - 2010

It’s that time again. Storage Monkeys is running a contest for the Top 10 Vendor Blogs, and once again, Online Storage Optimization is a nominee! Even more exciting, this blogger is listed, making it the only entrant with a woman blogger. Not to play the gender card or anything, but to me this is good news for the industry. And no doubt next year, there will be even more diversity represented in the list.

Here’s the full list of blogs that have been nominated. It really is an honor to be listed among top bloggers such as Stephen Foskett, Marc Farley, Vaughn Stewart, Mark Twomey, Chuck Hollis, Hu Yoshida and so on. If you think there are some they’ve missed, it’s not too late to put a suggestion in the comments field at the bottom of the page. Note: you must be a member of Storage Monkeys to vote. Which, quite frankly, you should already be: this is a fantastic community site for sharing tips, information and opinions about storage.

Marc Farley (3Par) - http://www.storagerap.com/
Mark Twomey / Storagezilla (EMC) - http://storagezilla.typepad.com/
Chuck Hollis (EMC) - http://chucksblog.emc.com/
Stephen Foskett (Nirvanix) - http://developer.nirvanix.com/blogs/strategies/default.aspx
Barry Burke (EMC) - http://thestorageanarchist.typepad.com/
Hu Yoshida (HDS) - http://blogs.hds.com/hu/
Zetta Blog - http://www.zetta.net/blog.php/
Dave Graham (EMC) - http://flickerdown.com/
Val Bercovici (NetApp) - http://blogs.netapp.com/exposed/
Vaughn Stewart (NetApp) - http://blogs.netapp.com/virtualstorageguy
HP StorageWorks Blog - http://www.hp.com/storage/blog
Barry Whyte (IBM) - http://bit.ly/glxKh
Carter George and Sunshine Mugrabi (Ocarina) - http://onlinestorageoptimization.com/
Xiotech Blog - http://blog.xiotech.com/blog/
Cleversafe blog - http://dev.cleversafe.org/weblog/
Pete Steege (Seagate) - http://storageeffect.media.seagate.com/
Jay Livens (Sepaton) - http://www.aboutrestore.com/
Nick Triantos (NetApp) - http://blogs.netapp.com/storage_nuts_n_bolts/
Dave Hitz (NetApp) - http://blogs.netapp.com/dave/
Michael Hay (HDS) - http://blogs.hds.com/michael/
David Merril (HDS) - http://blogs.hds.com/david/
Chris Poelker (FalconStor) - http://blog.falconstor.com/ChrisPoelker/
Pete Gerr (HDS) - http://blogs.hds.com/pete/
Storage Efficiency Insights (NetApp) - http://blogs.netapp.com/efficiency/
Larry Freeman (NetApp) - http://blogs.netapp.com/drdedupe/
Mike Workman (Pillar) - http://blog.pillardata.com/
Moshe Yanai (IBM) - http://www.xivstorage.com/blog/
Alex McDonald (NetApp) - http://blogs.netapp.com/shadeofblue/
Steve Klinkner (NetApp) - http://blogs.netapp.com/simple_steve/

Storage Industry Lags Behind Advances in Compression

Posted by Ocarina On January - 13 - 2010

There’s a lot of talk about compression these days, but how much do we really know about it? For one thing, compression as an area of mathematical research has evolved much faster than most people realize. The catch is that most compressors used in computer products, including dedupe appliances, use generic algorithms rather than taking advantage of these advances.

Most storage products use Lempel-Ziv (LZ) or its derivatives, and try to use that single compressor to compress everything. These algorithms have been around forever and, for the most part, have not evolved much in the last ten years other than in performance. This is too bad, because compression has advanced in exciting ways. LZ and its cousins work well on the kinds of data that were around 10 or 20 years ago: plain text, plain numbers, or combinations of those things. They do not work so well on a lot of modern data: images, video, Office documents, PDFs, already-compressed files like Zip, encrypted data, etc. What’s important to understand is that the most notable advances in compression that apply to storage have taken place not in generic compression algorithms, but in file type specific ones. File type specific compressors can, in fact, deal with all those modern data types.
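A quick way to see the difference is to point a generic LZ-family compressor at plain text and then at data that has already been compressed. The sketch below uses Python’s zlib (DEFLATE, which combines LZ77 with Huffman coding) on randomly generated storage-jargon text; a zlib-compressed copy of that text stands in for a JPEG, Zip or Office file. The word list and helper name are made up for the example, and it’s a toy, not a measurement of any product.

```python
import random
import zlib

def ratio(data: bytes) -> float:
    """Original size divided by zlib (LZ77 + Huffman) compressed size."""
    return len(data) / len(zlib.compress(data, 9))

rng = random.Random(42)
words = ["storage", "backup", "dedupe", "array", "volume", "snapshot", "tier", "cloud"]
text = " ".join(rng.choice(words) for _ in range(50_000)).encode("ascii")

# Stand-in for an already-compressed file format such as JPEG, Zip or an Office document.
already_compressed = zlib.compress(text, 9)

print(f"plain text:         {ratio(text):.2f}x")
print(f"already compressed: {ratio(already_compressed):.2f}x")
```

The plain text shrinks several-fold, while the already-compressed copy comes out at roughly 1x, typically a few bytes larger than it went in, because the first pass has already removed the repetition the compressor looks for.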

Compression is all about pattern recognition and prediction. You look for patterns in a file and if you can find those patterns you try to predict their occurrence. If you can predict a pattern, you can compress it. So understanding the kinds of patterns that might show up in a file - video, a Zip file, music, and a PowerPoint are all very different - is the key to building a compressor for that file type.
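One way to put numbers on “if you can predict a pattern, you can compress it” is to compare empirical entropies under two simple models: one that predicts each character from overall character frequencies (order-0), and one that predicts it from the character just before it (order-1). The result is the number of bits per character an ideal coder using that model could approach, ignoring the cost of storing the model itself. This is a textbook illustration with hypothetical helper names, not how any particular product’s compressors are built.

```python
import math
from collections import Counter, defaultdict

def order0_bits(s: str) -> float:
    """Bits per character when each character is predicted from overall frequencies."""
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())

def order1_bits(s: str) -> float:
    """Bits per character when each character is predicted from the one before it."""
    following = defaultdict(Counter)  # previous character -> counts of the next character
    for prev, cur in zip(s, s[1:]):
        following[prev][cur] += 1
    bits = 0.0
    for counts in following.values():
        total = sum(counts.values())
        bits -= sum(c * math.log2(c / total) for c in counts.values())
    return bits / (len(s) - 1)

sample = "ab" * 500
print(order0_bits(sample), order1_bits(sample))
```

For the repeating string, the frequency-only model needs 1 bit per character, but the context model needs none: once you’ve seen an “a”, you know a “b” comes next. Richer models (longer contexts, or knowledge of a file format’s structure) are what let a compressor predict, and therefore shrink, more.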

What’s especially relevant is that the most important thing in data compression today is recompression. Almost all of the file formats that are driving data growth, and taking up the most space on backups, are already compressed. Think of a file type that’s eating up space, and it’s likely to be an already-compressed format: JPEG, video, Office, PDF, MP3, medical images … all compressed already.

A generic compressor gets little or nothing out of an already-compressed file. That’s because the first compression obscures the patterns that a compressor would look for. That’s why if you try to compress, say, a Zip file, you’re likely to make it bigger, if anything. Recompression means first decompressing the file and then recompressing it with a better compressor. To do that, you have to recognize what kind of file it is, what kind of compression has been applied, and how to decompress it. By first decompressing it, you are able to see and process the patterns that make better prediction and compression possible.
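Here is a minimal sketch of the recompression steps described above (recognize, decompress, recompress), assuming gzip as the “already compressed” format and Python’s LZMA as the stand-in “better compressor”. It is not Ocarina’s method: real file type specific recompressors model the content itself (image data, document structure and so on) and must be able to hand back the original compressed bytes exactly, whereas this sketch only restores an equivalent gzip file with the same contents. The tag strings and function names are hypothetical.

```python
import gzip
import lzma

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream

def recompress(blob: bytes) -> tuple[str, bytes]:
    """Return (tag, payload). Recognized gzip input is decompressed and
    re-encoded with LZMA; anything else is stored unchanged."""
    if blob[:2] == GZIP_MAGIC:                              # 1. recognize the format
        plain = gzip.decompress(blob)                       # 2. undo the first compression
        return "gzip->xz", lzma.compress(plain)             # 3. recompress with a stronger coder
    return "raw", blob

def restore(tag: str, payload: bytes) -> bytes:
    """Undo recompress(). Rebuilds a gzip file with the same contents,
    not necessarily byte-identical to the original gzip file."""
    if tag == "gzip->xz":
        return gzip.compress(lzma.decompress(payload))
    if tag == "raw":
        return payload
    raise ValueError(f"unknown tag {tag!r}")

text = b"".join(b"volume %d snapshot %d\n" % (i % 97, i % 13) for i in range(20_000))
original = gzip.compress(text)
tag, stored = recompress(original)
print(tag, len(original), "->", len(stored), "bytes")
```

The key design point is the middle step: feeding the gzip bytes straight into LZMA would gain next to nothing, while decompressing first exposes the repetition that the second compressor can exploit.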

Almost every market has a set of well-defined file types that make up the bulk of its unstructured data. In medical imaging, it’s DICOM (which in turn contains JPEG 2000, JPEG-LS, and TIFF). In seismic, it’s SEG-Y. In satellite imaging, it’s NTF, MrSID, GeoTIFF and a few others. In the average business, it’s Office, PDF, photos and video.
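Recognizing the file type is the first step before any type-specific compressor can be chosen, and a common approach is to look at a file’s leading “magic” bytes. The sketch below checks a handful of well-known signatures for formats mentioned above. The labels and the compressor names in the routing table are made up for illustration, and formats without a fixed signature, such as SEG-Y, would need other heuristics.

```python
def sniff_format(header: bytes) -> str:
    """Guess a file's format from its leading bytes; returns a short label or 'unknown'."""
    # DICOM Part 10 files: a 128-byte preamble followed by the letters "DICM".
    if header[128:132] == b"DICM":
        return "dicom"
    if header.startswith(b"\xff\xd8\xff"):                      # JPEG start-of-image marker
        return "jpeg"
    if header.startswith(b"\x00\x00\x00\x0cjP  \r\n\x87\n"):    # JPEG 2000 (JP2) signature box
        return "jpeg2000"
    if header.startswith((b"II*\x00", b"MM\x00*")):             # TIFF, little/big-endian (GeoTIFF too)
        return "tiff"
    if header.startswith(b"%PDF-"):
        return "pdf"
    if header.startswith(b"PK\x03\x04"):                        # Zip, including .docx/.xlsx/.pptx
        return "zip"
    if header.startswith(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"):  # OLE2: legacy .doc/.xls/.ppt
        return "ole2"
    return "unknown"

def choose_compressor(header: bytes) -> str:
    """Route to a (hypothetical) type-specific compressor; fall back to generic LZ."""
    routes = {
        "dicom": "medical-image-recompressor",
        "jpeg": "jpeg-recompressor",
        "jpeg2000": "jpeg2000-recompressor",
        "tiff": "tiff-recompressor",
        "pdf": "pdf-recompressor",
        "zip": "zip-recompressor",
        "ole2": "office-legacy-recompressor",
    }
    return routes.get(sniff_format(header), "generic-lz")

print(choose_compressor(b"%PDF-1.4\n"), choose_compressor(b"plain old text"))
```

Note that newer Office documents are Zip containers, so a real system would look inside the archive before deciding how to handle its contents; a magic-byte check only gets you to the right door.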

In specific industries, you see very advanced compression implemented in the application layer, not in storage. Video is a great example: the whole concept of the video codec is all about compression. Whole companies exist specifically to do better video compression (On2 is a good example), but that compression is done primarily for transmission and implemented as part of the video application workflow, not as a storage technology.

In a world that had all plain ASCII text data, generic compressors would be great. But that’s not the world we live in. For compression to have any meaningful impact on today’s data sets, you have to have file type aware recompression.

It’s a shame that most storage products today have not implemented the most exciting advances in modern compression mathematics. My company Ocarina is quite frankly one of the few exceptions. The compressors found in tape drives or in dedupe appliances represent the best of the evolution of the generic compressor. The thing to look for going forward is the emergence in storage products of the next generation set of file type aware compressors, which is where all the action has been over the last ten years.

Going Social - EMC stands tall

Posted by Sunshine On January - 11 - 2010

If you haven’t yet tuned into the weekly podcast known as Infosmack on community site Storage Monkeys, you’re missing out. Every Monday, hosts Greg Knieriemen and Marc Farley bring on guests to dish about the latest storage industry news, and they do so in a very entertaining and informative way. This week’s show was particularly enjoyable, as they departed from the usual format and turned the mirror around, so to speak, discussing social media such as blogging and Twitter, and how well big companies like EMC, HDS, IBM and HP are doing on that front.

The guests this week were Louis Gray and Mark Twomey–two guys who have made a serious mark on the social media landscape. Louis, who blogs daily at LouisGray.com is a recognized social media expert whose reputation extends far beyond the storage and networking industries. He is the co-founder of social media consulting firm Paladin Advisors and was at BlueArc for many years. He now advises such diverse clients as Emulex, My6Sense, Brazen Careerist, and Simler on social media strategy. I interviewed Louis on video recently–check out Part 1 and Part 2 to get his views on the latest social media debates.

Mark, who goes by the moniker Storagezilla, is a trailblazer at EMC with his controversial blog and Twitter persona. His blog, he explains on this week’s podcast, was originally written anonymously. He was then “outed” by someone at HDS. Meanwhile, within EMC there were forces that tried to suppress him. But nothing seems to stop this saurian storage monster from terrorizing anyone who shows the slightest sign of hypocrisy, ignorance or self-inflation. (Yes, even this blogger has been a target, but no worries, we’ll get ours back…)

Nowadays, EMC has embraced the new social landscape with a vengeance; in fact, the panelists agreed it’s doing social media better than almost any other big storage company. The secret: go ahead and let your employees blog and tweet to their heart’s content. Though they can be a liability, they’re also the best evangelists for your products and services. Storagezilla has had his wrist slapped more than once for his NDA-breaking, irreverent blog posts. But he’s also a popular and well-known figure who brings the word of EMC to the masses. Other EMC bloggers like “The Storage Anarchist” Barry Burke are also controversial. And that, in many ways, is a good thing. This is no blank, corporate face, but rather one that’s full of lively (sometimes, some might say, too lively) discussion and debate.

As it happens, my own podcast TechnoGirlTalk takes up a similar topic on this week’s show. My guests and I discuss how storage titan EMC may well have set the tone for the entire industry, one marked by aggressive, intense competitiveness. The Twitter smackdowns that are common among storage folks are as easy to find here as they are rare in other communities. As ESG analyst and EMC alum Terri McClure explained, this is really the history of the storage industry. EMC started out as a tiny David taking on the Goliath known as IBM, a gamble that required it to be tough as nails and not pull any punches. Another of the guests, Christina LeBlanc, elaborated on this. She’s on the front lines as an account executive at EMC, and gets “beaten up” out there every day.

When she first attended the EMC tour, she and the other new hires were told that EMC’s original gambit was to hire football players as salesmen. They figured that these guys would be too tough to back down, and wouldn’t know enough to realize how impossible it is to beat IBM. Christina explained that nowadays, sales folks have to know their stuff or they’ll be laughed out of the office. And while it’s still a tough, competitive job, she puts a greater emphasis on being sensitive to the customer’s needs and seeking to serve them.

Even though EMC has come a long way from its bull-headed beginnings, that reputation still hangs over the company like a miasma. As blogger Stephen Foskett writes in a post on Gestalt IT (and his own blog) this week,  “I’ve known literally dozens of IT shops who refused to buy from EMC, even though the sleazy sales tactics that turned them off (and indeed the sales reps themselves) are reportedly long gone from the company.” But, he argues, today’s competitive landscape is so tough that EMC now just seems like one of the crowd. “With the market getting tougher, the tough guy doesn’t look so bad anymore,” he writes.

As Storagezilla and the others on the podcast noted, there’s been something of a detente among storage bloggers of late. The winds of peace might be blowing through the industry. Or, maybe everyone’s just tired from the holidays and will be back out in no time with guns blazing.

And speaking of healthy competition, if you like this blog, why not vote for it on Storage Monkeys this week? They’re once again running their contest for the Top 10 Storage Vendor Blogs and Online Storage Op is one of the finalists. You must be a member of Storage Monkeys to vote–so now’s your chance to sign up and join the conversation there if you haven’t already.

The BD Event - Are you going?

Posted by Sunshine On January - 8 - 2010

Once in a great while someone comes up with an idea that makes you slap your palm to your forehead and ask, “Why didn’t I think of that?” Such is the case with The Business Development Networking Event, or “BD Event.” Organized by storage industry veterans Greg and VaNessa Duplessie, this conference fills a clear need that has arisen for industry insiders to meet and network among themselves. The costs are reasonable, and there are no sponsors or exhibitors to clutter up the place.

The description on the home page of the site sums it up: “Our mission is to create a compelling and inexpensive business development and networking event for industry insiders – one that focuses on networking and building relationships and that does not require exhibiting or catering to end-users.”

The next event will take place in Palo Alto, California, on January 26-28. The Ocarina crew will be there. The panels all look like they’re designed with a “need to know” agenda in mind. (Full disclosure: this blogger is on one of them, a panel on social media.) They cover such topics as IT sales strategies, M&A and growth strategies for storage companies, channel partners and so on. The main event is networking. There is plenty of time set aside for it, in both structured and unstructured forms.

Hope to see you there!

Storage News and Views, January 7

Posted by Sunshine On January - 7 - 2010

An earthquake shook up San Jose (not to mention Twitter) today. But that didn’t stop the storage industry’s movers and shakers from making all kinds of interesting news. Here’s a quick roundup from where we sit.

A game of musical chairs…

EMC lost storage tech consultant and blogger Steve Kenniston to inline dedupe player Storwize, where he will be Vice President of Technology Strategy. Steve continues to blog avidly and well at The Storage Alchemist. We may find ourselves crossing swords with him occasionally over here at Online Storage Op, but we always read his posts with interest.

And EMC has been no slouch in scooping up major talent:

Gestalt IT contributor Ed Saipetch (known to many of us as edsai) started this week at EMC as Senior VCE Specialist. Prior to that, he was a systems engineer at Network Storage.

And Scott Lowe set tongues wagging when he announced last week on his blog that he’ll also be joining EMC, as a VMware-Cisco Solutions Principal. This seems a very shrewd move on their part, as Scott is well-known for his Cisco expertise and virtualization knowhow–both of which no doubt will be extremely handy as UCS takes off.

Nice going, EMC.

And in other news…acquisition fever!

Disk drive array subsystem provider Dot Hill has bagged Israeli storage virtualization company Cloverleaf for $12M in cash and stock, the Register reports. Clearly, they’re locking onto the virtualization and cloud storage trends with a vengeance. This may also give them a new edge in their battles with competitors like LSI and Xyratex.

Plenty of takeover rumors swirling around 3Par, although for now no one’s confirming anything. Whatever happens, it seems that everyone’s impressed by 3Par, the little thin provisioning engine that could. Over here at this blog, we’re consistently impressed and amused by their creative blogger Marc Farley. Financial pub Barron’s, in addition to initiating the speculation, called 3Par a “small but scrappy” possible takeover target. But who will be the suitor? Feel free to add in any and all rumors and speculation in our comments field below.

And let’s not forget EMC, which in addition to snagging talent has picked up Archer Technologies, an IT governance software company that will be rolled into the security offerings of its RSA division. As Beth Pariseau reports on Storage Soup, the acquisition could affect others in the industry.

Writes Beth, “Archer brings with it a business continuity software module, which could affect those who manage disaster recovery in the storage environment. It also extends EMC’s move to inject automation into its software offerings, which we’ve seen in the storage market with last month’s first release of FAST, and is a part of EMC’s vision for archiving and e-Discovery.”

Well, that’s all for now. No doubt we’ll be seeing all manner of intrigue, rumor, speculation and other fascinating stuff now that 2010 is upon us.