This week Online Storage Optimization will be blogging live from the floor of VMWorld. The conference, which kicked off today and runs through Thursday, has become a premier event that draws folks from all corners of the IT and storage industries. And no wonder, considering the transformative power of virtualization. Stay tuned for updates throughout the week. You might also want to follow me on Twitter.
And if you’re wondering where to go and mingle in the evening, there will be a Tweetup for storage folks tomorrow (Sept. 1) at B Restaurant and Bar in downtown San Francisco. To attend, RSVP here.
Data deduplication has become a very hot topic these days, especially in light of EMC’s recent and very high profile acquisition of Data Domain. This week, analyst George Crump of Storage Switzerland made some predictions as to where this technology is heading. His post, The Foundation of DeDupe’s Next Era, asserts that it will require many different approaches–likely from a number of vendors–in order to best reduce the multiple types of data found in primary storage. I agree with much of what he says, but here are some further thoughts on the topic.
First, a general observation. In every new major market, there is always an early winner, and then that early winner is typically leap-frogged by a 2.0 approach that solves the problems of the first wave. There are a number of examples of this. Browsers, for starters. Netscape made the market, only to be wiped out by Internet Explorer. In the file serving market, Auspex created the market, but NetApp blew them away. The list goes on.
With that in mind, there are four elements that I believe will define the winning architecture in Dedupe 2.0:
1. Global dedupe: Deduplication will find duplicates across multiple nodes and multiple storage pools. No matter where a data stream comes in to the solution, if it has a dupe, it will be found.
2. Post-Process: The second wave of dedupe will be a post-process architecture. Data Domain tells us as much when they focus so much of the marketing for their latest product (the 800 series) on why in-band is the right answer. They’re the market leader, they have a smoking fast new product – why are they so worried about post-processing that they make it the focus of their release messaging? Who are they worried about? Not the vendors they’ve already beaten. No, they’re worried because they know the 2.0 generation will be done this way. They are already positioning now for the new competitors they know they’ll see in the future; they’re being defensive, because they understand their own limitations better than anyone else.
There are several reasons dedupe will move to a post-process architecture, but the main one is better results in data reduction. Dedupe 2.0 won’t be just dedupe – it will be dedupe plus content-aware compression. This means two- and three-dimensional compressors need to see the context of data, not just the small window of data passing through memory in an in-band appliance. Done right, there’s no reason why post-processing can’t be just as fast as in-band, and data reduction will be dramatically better.
3. Scale-out Processing: In Dedupe 2.0 you will be able to scale out throughput by adding more nodes to your dedupe cluster to process incoming streams. The Dedupe 2.0 cluster will look like one single target to backup (or other) sources. It will have a load-balanced global namespace, but behind that you could have one cheap server or 32 big fast ones. You’ll be able to start small and grow big, without changing anything on the backup software or writer side. Data streams can get load-balanced to any node, and because of global dedupe, any node can dedupe in real time with data coming to any other node. Instead of having to pick which model has the right throughput for you, start with one node, and if you grow from needing half a terabyte an hour to 5 terabytes an hour of throughput, you add a few more nodes.
4. Scale-out Capacity: As the line between backups (with short retention windows) and archives (potentially long retention periods) continues to blur, the Dedupe 2.0 store wants to scale out to massive amounts of storage. That should be independent of processing capacity. For example, the shop that does not back up that much every day should not have to buy some top-of-the-line model just so that it can get enough storage to keep its backups online for 7 years.
Just like processing and throughput capability, capacity should scale independently. You also should be able to add as much storage as you want – inside a dedupe 2.0 cluster node, on a SAN, or network-attached – independently of whether you bought the small cheap dedupe node or the big fast one or a cluster of many of them.
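To make the first two requirements a little more concrete, here is a minimal sketch in Python of how global, post-process dedupe could work across nodes. It is an illustration under stated assumptions, not any vendor's implementation: the class names (GlobalChunkIndex, DedupeNode), the fixed 4 KB chunk size, and the use of SHA-256 fingerprints are all hypothetical choices made for the example.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size, chosen only for illustration


class GlobalChunkIndex:
    """Cluster-wide map from chunk fingerprint to chunk data (assumed shared by every node)."""

    def __init__(self):
        self.chunks = {}

    def store(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Keep only the first copy of a chunk, no matter which node saw it.
        self.chunks.setdefault(digest, data)
        return digest


class DedupeNode:
    """One node in the cluster: streams land raw, then get deduped in a later pass."""

    def __init__(self, name: str, index: GlobalChunkIndex):
        self.name = name
        self.index = index
        self.landing = {}  # stream name -> raw bytes awaiting post-processing
        self.recipes = {}  # stream name -> ordered list of chunk fingerprints

    def ingest(self, stream_name: str, data: bytes) -> None:
        # Post-process model: accept the stream at full speed, dedupe afterwards.
        self.landing[stream_name] = data

    def post_process(self) -> None:
        for stream_name, data in self.landing.items():
            self.recipes[stream_name] = [
                self.index.store(data[i:i + CHUNK_SIZE])
                for i in range(0, len(data), CHUNK_SIZE)
            ]
        self.landing.clear()

    def restore(self, stream_name: str) -> bytes:
        return b"".join(self.index.chunks[d] for d in self.recipes[stream_name])


# The same backup arrives at two different nodes; only one copy of each chunk is kept.
index = GlobalChunkIndex()
node_a = DedupeNode("a", index)
node_b = DedupeNode("b", index)
payload = b"x" * CHUNK_SIZE + b"y" * CHUNK_SIZE
node_a.ingest("monday", payload)
node_b.ingest("tuesday", payload)
node_a.post_process()
node_b.post_process()
unique_chunks = len(index.chunks)  # 2
```

Because the fingerprint index is shared, adding a third node changes nothing for the backup software; the new node simply consults the same index. A real cluster would also need to distribute that index and handle concurrent writers, which this sketch leaves out.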
Some vendor will deliver a dedupe 2.0 cluster solution that meets these four must-have requirements. Who knows? That might be Data Domain, the winner of the first wave. But it might be someone else, too.
The question of what to do with already-deduped input streams is a separate but interesting topic. For the most part, customers voted with their wallets against doing source dedupe for backups. After all, EMC bought Data Domain even though it already had source-based dedupe technology Avamar.
More and more, file servers and even database servers are going to be doing dedupe of the primary and nearline file systems, not for backup, but for storage efficiency in primary storage. That means that data streams going to the backup solution with dedupe are going to be already deduped in some way.
All of which raises even more questions–which will have to wait for a later post. What’s the right way to deal with that? Is the answer something that needs to be done on the source side or the backup side? Meanwhile, I invite your comments.
Many organizations are struggling to manage their storage in the face of a massive upsurge in the amount of data that must be stored and made accessible. To save on primary storage costs, it’s imperative to make sure that it is being deployed in the most efficient way possible. As we’ve already been discussing at length on this blog and elsewhere, storage tiering is one key way of approaching this.
A few weeks ago, IDC co-presented a webinar with BlueArc and Ocarina on just this topic. It is now online and available for your viewing/listening pleasure at the link above.
Here are some of the highlights:
Noemi Greyzdorf, Research Manager, Storage Software, IDC discusses the process for putting a successful tiered storage infrastructure in place. She gives information on how significant savings can be gained in both acquisition and operating costs.
Victoria Koepnick, Sr. Manager, Product Management, BlueArc talks about how her company has developed technologies to fulfill these goals. She shows how intelligent, policy-based tiering can save on storage costs and ensure ease of use. Using the example of the recent death of pop star Michael Jackson, she explains how, with dynamic read caching, the many images could have been pulled from the archive once so that successive accesses to the files would have been immediate under such emergency circumstances. Great example!
Eric Scollard, VP of Sales at Ocarina gets down to brass tacks on the results from storage optimization and tiering. He shows how having a toolbox of data reduction techniques at one’s disposal can make a vast difference when it comes to the amount of money spent on storage.
We hope you enjoy the webinar, and please feel free to comment on it below!
We’re just one week away from VMWorld, to be held here in San Francisco Aug. 31-Sept. 3 and the storage blogo-tweet-osphere is lighting up like a Christmas tree. This blogger will be there with her trusty iPhone, ready to send out tweets on every imaginable topic, rumor, random thought, and food item she encounters.
We’re also hoping to make it to the VMWorld Tweetup on the first night of the conference, Monday, August 31. As with most such events, a major reason to attend is to sit around and gossip with industry folks. For me, this will also mean a chance to meet some people in person that I have so far only encountered virtually.
It’s obviously a big year for EMC, the majority owner of VMware, and we expect to see many of its bloggers there, offering up plenty of blow-by-blows on what’s going on in each of the labs at the conference. We’ve already got word that Dave Graham will be in attendance at one of the EMC booths. We’ll see who else is on the ground.
We’ve also been told by HP’s Calvin Zito that he will be at VMWorld, and will be easily identifiable in his highly informative polo shirt. Let’s hope he has two or three of them or an in-room washer/dryer at his hotel.
On another topic, StorageRap’s Marc Farley has managed to turn the storage blogosphere into one big singing contest with his newest and goofiest video creation. See below for a screen grab–to watch the video you must visit his site:
Among other bloggers, yours truly makes an appearance (in avatar form), representing one of the few female storage bloggers. Still trying to understand the connection with former U.S. Presidents, but perhaps this is just my Monday brain. Thanks to Marc for starting the week out with a laugh. OK everyone, back to work.
As many people know, Ocarina Networks has been living up to its name lately. It really is becoming a “network”-oriented company, inking partnerships with just about all of the top storage vendors–HP, BlueArc, Isilon, and so on and so forth. This is great news for storage customers, who can now depend on the very best in data reduction, slashing storage costs.
For those who want to get a quick and entertaining hit on how one of these partnerships works–this one with BlueArc, might we suggest this new animated demo on the Ocarina site? This demo offers a case study in how a world-class CGI animation studio, Rainmaker Entertainment, deployed BlueArc storage with Ocarina to achieve astounding compression results. (For more on this, you might also want to take a look at our Q&A with Shmuel Shottan, CTO of BlueArc from last February.)
And for more on how Ocarina is joining forces with the top storage vendors to help media and entertainment companies maximize storage capacity, check out these recent news stories:
There’s been a lot of discussion about tiered storage lately. Most notably, Stephen Foskett has written a series of posts on the topic on his Nirvanix blog, Enterprise Storage Strategies. In his latest post, he essentially argues that tiered storage hasn’t turned out to be cost effective and that cloud storage could be the best option for the lower tier.
We certainly agree with him that unstructured data has become unmanageable due to the proliferation of rich media and other large files. We also agree that tiered storage hasn’t lived up to its promise to a large extent. However, let’s not be too quick to throw out the baby with the bathwater. As Hu Yoshida has discussed in a recent post, tiering has come a long way in light of new technologies, particularly virtualization. In our view, by combining virtual tiering at the block level (as described in Hu’s post) with virtual tiering at the file level you can get the best of both worlds.
Tiered storage used to be about moving data from one physical storage place to another. The premise there was that some storage was fast and expensive, and other storage was slower but cheaper, and that you could save a lot of money by moving data to the appropriate place.
This was a good idea in theory, but as it turned out there were a number of unforeseen problems. First, the tools for moving files were themselves sometimes expensive. There goes your cost savings. On top of that, they were sometimes good at moving the files but not at getting them back. And further, in situations where the fast tier and the cheap tier were not from the same vendor, it often proved difficult to make finding files that had been moved transparent to users and applications. As you can guess, these types of problems often made the whole thing more trouble than it was worth.
The fact remains, though, that most files are stored on storage that has more performance, and costs more, than is necessary for that file. Most storage admins know that 80% of their files could be stored on a cheaper tier, if it wasn’t a hassle or too expensive to do so.
One solution with immense potential is to have virtual tiers within a single filer or namespace. Virtual tiers are levels of dedupe and compression applied to a file, making it cheaper to store because it’s taking up less space. In a virtual tier, the file does not have to move anywhere – it can stay right where it is, but you reduce the cost of storing it by shrinking it. With dedupe and compression, there are lots of choices for trading off performance versus space savings.
Sun’s file system ZFS allows this, and cloud storage like Nirvanix can do this too, with the added advantages that the provider can use the latest technology and that the technology behind the cloud interface is invisible to the user of the cloud. Either way, let’s look at how you can implement virtual tiers while keeping files in the same place that they were created in.
Let’s say Tier 1 is for your fast hot files - they live on your Tier 1 filer, uncompressed. In that case, you might have a Virtual Tier 2 be all the files that have not been modified in 7 days, and Tier 2 would be that same filer, same volume, but with a policy that those files that meet the Tier 2 definition are deduped. No compression, just dedupe. In that case, read back times will be quite fast. Maybe not exactly as fast as reading the original un-deduped file, but almost.
A Virtual Tier 3 might be “files that have not been modified in 30 days” and the tier might be defined as dedupe plus light compression. Read back will be a tad slower, but space savings greater than dedupe alone. Finally, you might have a virtual Tier 4 – dedupe and maximum compression. This might fire more complex compressors that take longer to compress (and decompress) a file, but will get excellent space savings. Read back performance for tier 4 might be quite a bit slower, but the space savings might be 90% or more reduction in the file sizes in that tier.
Here’s the kicker: All of this can be done without moving a file off the filer it started on. Users and applications can still find the file right where it always was. If they access the file, the optimization solution will transparently “rehydrate” the file.
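The tier definitions above boil down to a small policy table, which can be sketched in a few lines of Python. Treat this as a rough illustration rather than how any product does it: the names (VirtualTier, pick_tier, optimize, rehydrate) are made up for the example, the 90-day threshold for Tier 4 is an assumption (the 7-day and 30-day thresholds are the ones described above), zlib levels stand in for the real "light" and "maximum" compressors, and the dedupe flag is only recorded, not implemented.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualTier:
    name: str
    min_idle_days: int  # days since last modification needed to qualify
    dedupe: bool        # recorded as policy only; dedupe itself is not implemented here
    zlib_level: int     # 0 means no compression; zlib is a stand-in compressor


# Checked from coldest to hottest, so a file gets the most aggressive tier it qualifies for.
TIERS = [
    VirtualTier("tier4", 90, True, 9),  # 90 days is a hypothetical threshold
    VirtualTier("tier3", 30, True, 1),  # dedupe plus light compression
    VirtualTier("tier2", 7, True, 0),   # dedupe only
    VirtualTier("tier1", 0, False, 0),  # hot files, left untouched
]


def pick_tier(idle_days: int) -> VirtualTier:
    """Return the virtual tier for a file that has been unmodified for idle_days."""
    for tier in TIERS:
        if idle_days >= tier.min_idle_days:
            return tier
    raise ValueError("idle_days must be non-negative")


def optimize(data: bytes, tier: VirtualTier) -> bytes:
    """Shrink the file in place according to its tier; the file never moves."""
    return zlib.compress(data, tier.zlib_level) if tier.zlib_level else data


def rehydrate(stored: bytes, tier: VirtualTier) -> bytes:
    """Transparently restore the original bytes when a user or application reads the file."""
    return zlib.decompress(stored) if tier.zlib_level else stored
```

The trade-off described above shows up in the zlib_level field: a higher level spends more time compressing in exchange for a smaller file, while Tier 1 and Tier 2 skip compression entirely so reads stay fast.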
There are different solutions that can do some or all of these things today. NetApp’s dedupe can only dedupe all of the files in a volume or none, so it can’t be used today to create logical or virtual tiers within a volume. But other solutions, like the Ocarina ECOsystem, are policy-based and can be used to create multiple different logical (or virtual) tiers within a single filer or volume, with multiple dedupe settings (including Ocarina’s patented Object Dedupe) and multiple levels of compression, with choices of over 100 compressors for different file types.
Ocarina has been tightly integrated with certain types of storage – including cloud solutions like Nirvanix – and the most transparent virtual tiers would be with the combination of Ocarina and one of the filer choices that have tight integration with Ocarina: BlueArc, EMC, HDS HNAS, HP, Isilon and Nirvanix (in alphabetical order – no vendor preferences implied!).
Of course, virtual tiers can be combined with real physical tiers, so that you can combine the level of storage optimization (dedupe, compression) with storage of different physical characteristics (expensive filers, cheap filers, cloud storage) to provide an environment that is not just a simple two-tiered model but a policy-driven environment of possibly a dozen or more logical tiers, with files being tiered-in-place or migrated-and-optimized automatically based on policy with little or no storage admin involvement.
As you can see, there is vast potential in this new approach to tiering. Even better, it can be achieved in such a way that storage admins’ jobs become easier, rather than harder. Like a lot of things, storage tiering has always been a good idea, but sometimes the technology has to catch up with the idea for implementation to become a good idea. Given the growth of storage, and the improvements in physical and virtual tiering, I think doing a better job of tiering must rank close to the top of the list for many customers.
Steve Duplessie has a very interesting post up on his blog today. It is part of a series he’s doing on scarcity. He seeks to understand what made deduplication vendor Data Domain so attractive to both EMC and NetApp. What he comes to is that scarcity is now focused around power, cooling and rackspace. At one time, these were hardly considerations. Nowadays you can’t always get the amount of power you need at any cost, he says.
Writes Duplessie: “Today, things like power/cooling and floor space are the new scarcity factors of record. If CPU or capacity or bandwidth are essentially free, then the other considerations take precedence. Now in many major metropolitan areas, you can’t buy any more power - no matter how rich you are. That is a scarcity issue that drives value.”
I have heard this kind of thing firsthand from Ocarina customers. When I spoke with Graham Hobson, CTO of Photobox, Europe’s most popular photo sharing service, he told me power costs have quadrupled in the past few years in Europe. Meanwhile, many data centers aren’t really equipped to handle their 32 amps/rack storage systems. Rather, they were designed for telecom, where 8 or 16 amps are the norm. Ocarina’s data compression was an absolute lifesaver for his company, he said.
From Steve’s point of view, there won’t be a need for technologies such as dedupe once this is fixed at the source. Perhaps, but I doubt it. The more likely scenario is that there will need to be a number of different approaches that all converge on the problem at once, and data reduction will continue to play a key role for the foreseeable future.
As it happens, there’s a piece in Forbes today by HP’s Chief Strategy and Technology Officer Shane Robison that takes up the data center power issue in greater depth. HP is certainly investing in dedupe–in fact, it recently announced a partnership with Ocarina in order to introduce data reduction into its NAS offerings.
Robison, in his article, describes HP’s newest energy-saving data center, which seems likely to serve as a model for many others to come. Located in Wynyard, a village on the Northeast coast of England, the new data center makes use of just about every power-saving approach out there. In addition to techniques such as virtualization and intelligent software, the building itself is equipped with an entire system of environmental designs inside and out. This includes using cool air from the North Sea to lower equipment temperatures–a trend we’ve been following with great interest on this blog.
So to me the answer isn’t to home in on one solution, but to make use of the multiplicity of options that are out there — lowering costs and benefiting the environment at the same time.
InfoStor editor Dave Simpson has a post on his storage blog about how Storage Networking World (SNW) can get back some of its lost luster. As he notes, trade shows in general are feeling the pinch in the current recession and SNW–which just posted the agenda for its upcoming fall conference–is no exception. For the post, he spoke with Mike Alvarado, a consultant in the storage industry and former Storage Networking Industry Association (SNIA) board member.
Alvarado has some pretty radical new ideas. Mainly, he recommends that SNW shift its focus away from end users towards channel partners, or so-called VARs as a way of bringing some energy back to the show.
Says Alvarado: “Resellers and integrators are a vital network storage industry segment. I have seen many great conversations take place between vendors and these partners at different SNWs; those exchanges represent opportunity to drive great value for our industry. I believe if SNW focused on optimizing the interaction between vendors and resellers/integrators, it would pay large dividends. Calling the show something else or founding a new show with different sponsorship would be needed, but whatever it takes the sooner this happens the better.”
Wow. A new name, even? This would be a major change of direction for the show. One question I had in reading this: would it benefit innovation in the industry? As we noted in a post following last Spring’s show, one of the more disappointing aspects of it was the dearth of startups.
As our lead blogger Carter George wrote last April: “(Startups were) … always an exciting part of SNW for me: a peek into the future and a chance to see where the big guys might be headed next. As recently as last year I recall seeing maybe 15 or 20 small vendors taking up floorspace, all of them hoping to not only find end user leads, but also to catch the eye of potential big partners.”
Overall, it is good to see that people are talking about ways to keep the show going–and relevant. I can’t help thinking however that it might be worthwhile to wait out the recession and see if the energy just naturally returns to the show before making it turn in a whole new direction.
A few of the Ocarina crew recently returned from Siggraph2009, the 36th International Conference and Exhibition on Computer Graphics and Interactive Techniques. Held in New Orleans the week of August 3-7, the event drew participants from around the world. We were newbies to the event, and so decided to get some perspective from an industry veteran, who has been attending Siggraph every year for the past decade and a half.
Michael Zachary Huber is an animator and educator who has worked with the top names in the business, including director James Cameron, Digital Domain, and Electronic Arts (EA). He’s an assistant professor at Cogswell Polytechnical College, an animation and engineering school in Sunnyvale. Over the years he’s witnessed peaks and valleys when it comes to Hollywood’s love affair with visual effects.
“In the 1990s they were a novelty,” he said. “It was similar to the Internet, which came into its own later in that same decade. People were just on fire!”
The early 1990s were truly the heyday of Siggraph, he recalled. It wasn’t unusual to see stars like Danny DeVito and Arnold Schwarzenegger in attendance, and there was tremendous “buzz” in Hollywood. Yet some studios were burned by spending millions of dollars for lavish effects for movies that flopped.
He compares the new animation-driven effects such as CGI to the craze around 3-D movies in the 1950s. That type of special effect didn’t make movies better, just more novel.
“In some ways visual effects are the same thing. They are a tool, not an end unto themselves. The movie still has to be good for audiences to respond,” said Huber.
Nowadays directors and studios are getting smarter about where visual effects need to be used and where they don’t, he said.
Siggraph itself has been something of a barometer as to how the animation and effects side of the industry is faring. And this year, the attendance was far lower than in recent years, perhaps by as much as 25-50% in his estimation.
Huber’s interpretation–it’s not that the recession has meant that the industry is in real trouble, simply that this is a year in which studios are more cautious, but are still very much investing in the coming year. Said Huber, “I firmly believe there’s going to be a nice rebound.”
An exciting and fun part of the conference he said, is the Computer Animation Festival, where participants from around the world screen their latest work. Check out this extremely cool vid showing some of the work:
Huber himself has a short animation film, a co-production with Cogswell that he plans to show at next year’s event. Called “The Offering,” it’s a story that draws from Hindu legend yet includes elements from everything from Marvel Comics to Bollywood. It was made at the school, with students playing a large role in its creation. (The poster for the movie is shown at the top of this page.)
So, what about the geeky side of Siggraph?
“The conference definitely gets a steady stream of techno fans–people who are interested in the technology and want to come for the white papers,” he reported, adding that the need for efficient storage is something that many animation houses are recognizing.
“Looking at the first Transformers movie, which was made four or five years ago, Industrial Light and Magic (ILM), which did most of the effects, used about 30 TB of space for all of the files they needed during the production process. Compare that with the latest Transformers movie, which took up closer to 150-250 TB of space,” said Huber.
Even his own short film took up three terabytes of space to make, he said. So even small institutions that have limited budgets could benefit from some kind of compression or optimization technology.
Indeed, he said, the computer animation industry is perhaps the art form most closely tied to technology. This, he said, is another reason that this year’s Siggraph conference was less well attended–the tech industry is smarting from the effects of the recession.
Yet it’s also the pace of technological innovation that drives visual effects/animation studios to continually improve what they’re able to achieve. Effects such as 3-D animation are getting more complex, he said, especially now that studios have moved from 8-bit to 16- or 32-bit technology. One exciting new innovation is OpenEXR, a high dynamic-range (HDR) image file format developed by ILM. This allows studios to ingrain details in visual effects like never before, according to Huber.
“You’re not going to get the true richness unless you get the file formats like that,” he said.
However, he added that such new file formats are demanding and require a lot of space. This is no doubt why so many post-production and animation studios are looking for ways to optimize files in order to save disk space. As it happens, Ocarina stands alone in that it has algorithms that are designed specifically for over 100 file types–OpenEXR included. As many studios are already discovering, Ocarina is their ticket to space savings of as much as 80%.
Overall, Huber sees a bright future for his industry.
“We see films today that seem amazingly more complex and rich. In ten years, all of that will be topped by what is coming.”
Well, I for one can’t wait for those coming attractions.
The storage press has sniffed out a good story recently. Today, Beth Pariseau has a piece up on her Storage Soup blog that homes in on the drama surrounding the technology du jour–deduplication.
The post, “HP to EMC/Data Domain: Bring it On” has a headline that’s reminiscent of the sort of fighting words we heard from our former president.
Pariseau writes: “Admittedly late to the data deduplication game, Hewlett-Packard Co. is brewing new dedupe offerings to compete with the market’s new 800-pound gorilla — EMC/Data Domain. … HP partners with Sepaton for high-end VTLs and Ocarina for primary storage data reduction, but also develops deduplication software for its entry-level disk backup devices.”
Earlier this week, Chris Mellor at The Register covered the HP-Ocarina partnership news, also talking about it in terms of the rising competition for a complete dedupe solution. His article “HP Makes Ocarina Music” has a subhead that speaks volumes: “Ocarina close to clean sweep of file vendors.”
Mellor writes: “Ocarina has similar partnerships with BlueArc, EMC and Isilon. It looks almost inevitable that every other filer supplier must be looking at the Ocarina product and thinking a reseller deal might be a good idea. Otherwise, it could lose sales to the competition when a lot of image-type data is being stored.”
It will be interesting to see how this story unfolds. We vendor bloggers are already chattering about the recent partnership announcement, such as this post on the HP Storageworks “Around The Storage Block” blog. The post, by Pete Brey, WW Extreme Storage Business Development Manager, homes in on two recent HP announcements. First, its recent acquisition of IBRIX, and second its partnership with Ocarina.
Brey writes: “Now multi-petabyte systems are great when you have zillions of files that need to be stored but so is a multi-petabyte system that is optimized so that in the same space tens of zillions can be contained. This is where Ocarina’s ECOsystem software adds its value to our NAS products. The ECOsystem software transforms your storage with its content-aware storage optimization that compresses data up to 10:1 with added features such as deduplication, ECOsnap snapshots, and its own global name space capability. The unique thing about our reseller partnership is that HP can run the ECOsystem software right on our NAS nodes, further optimizing your infrastructure. Now there aren’t too many storage vendors out there who can talk about that now, are there?”
Bragging rights, indeed.
Clearly, it’s too soon to say exactly how each player in this space will benefit and/or lose out. As Mellor’s piece obliquely suggests, this isn’t about Ocarina setting itself in opposition to any vendors–in fact, it has a partnership with EMC. Rather, it shows how each provides a piece of the puzzle. In the big picture, there needs to be a shift in thinking towards something more along the lines of end-to-end dedupe–something that our lead blogger Carter George talked about at length in his popular post, “The Dedupe (R)evolution.” But in the short run it’s certainly good to see how each vendor is distinguishing itself, and working hard to provide the most efficient, cost-effective storage options to its customers.