
The term “deduplication for primary storage” has become the latest industry buzz phrase. Yet, how much do any of us know about it? This week, I sat down with Shmuel Shottan, CTO of BlueArc, and learned a great deal about what makes this emerging technology such a crucial one at this time.
I was impressed by his soft-spokenness and his ability to discuss these innovations in layman’s terms. This is clearly a great year for BlueArc, which recently won the Gold prize in the SearchStorage Product of the Year awards for the Titan 3200. Our conversation is below.
Sunshine: Who are your customers, and what is the biggest problem that you solve for them?
Shmuel Shottan: BlueArc’s customers are mainly focused on high performance applications and are in data intensive markets. To be more specific, for many of our customers, BlueArc helps in driving value creation and revenue generation–for example, drug discovery, computer generated effects, and design and simulation. These are all customers that appreciate the value created by deploying a BlueArc system, which accelerates their applications. Another example is focused on consolidation. By consolidating many storage islands through the deployment of a BlueArc system, the customers lower their total cost of ownership and simplify their infrastructure.
Sunshine: There’s been a lot of talk about “dedupe for primary.” What are your thoughts about this technology, and where do you see it going?
Shottan: Dedupe for primary lagged dedupe for backup. Dedupe for backup became an embedded feature in every VTL system. Backup lends itself to dedupe because you have so much duplication. Primary was the next wave, because everybody started to talk about OpEx–operational expenses–or how much power, cooling, and space it takes to store all the data, which was increasing by a factor of 10 for many enterprises.
Data has a lifecycle. If you look at it holistically, you can apply the 80/20 rule. Only 20% of data needs to be accessed at the kind of performance level BlueArc realizes, while 80% of data is rarely accessed after a certain time period. Yet, all of it must be available 24/7. We’re solving the problem by adding a virtual tier of storage that is online, but which is compressed, and therefore is 1/3 or even 1/4 the cost.
Dedupe for primary is more challenging than dedupe for backup. Primary data lends itself less to repetition, because it’s all different files. In tape libraries, you can do 30x because it is all backup. The easiest way for a storage appliance is file-based dedupe, but that is not very efficient. Depending on the data set, you might get 80%, or maybe as much as 50% efficiency. The most relevant technology is the one taken from backup dedupe: variable block. So while the need was there, the ratios were not good enough to justify the compression in many cases.
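To make the distinction between file-based and variable-block dedupe concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how BlueArc, Ocarina, or any VTL product implements it: the function names, the 16-byte window, the CRC-based boundary test, and the random sample data are all hypothetical choices made for the example.

```python
import hashlib
import random
import zlib


def file_level_stored_bytes(files):
    """Whole-file dedupe: keep one copy per distinct file content."""
    unique = {hashlib.sha256(data).digest(): len(data) for data in files}
    return sum(unique.values())


def variable_chunks(data, window=16, mask=0x3F, min_size=32):
    """Split data at content-defined boundaries (illustrative parameters).

    A chunk ends where a checksum of the trailing `window` bytes has its
    low bits all zero, so boundaries follow the content, not fixed offsets.
    """
    chunks, start = [], 0
    for end in range(1, len(data) + 1):
        if end - start < min_size:
            continue  # enforce a minimum chunk size
        if zlib.crc32(data[end - window:end]) & mask == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])  # trailing remainder
    return chunks


def chunk_level_stored_bytes(files):
    """Variable-block dedupe: keep one copy per distinct chunk."""
    unique = {}
    for data in files:
        for chunk in variable_chunks(data):
            unique[hashlib.sha256(chunk).digest()] = len(chunk)
    return sum(unique.values())


# Two "files" that differ by a single inserted byte (made-up sample data).
rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(8000))
edited = original[:4000] + b"X" + original[4000:]
files = [original, edited]

whole = file_level_stored_bytes(files)
chunked = chunk_level_stored_bytes(files)
print("file-level bytes stored:    ", whole)
print("variable-block bytes stored:", chunked)
```

Because the two files are not byte-identical, file-level dedupe has to store both in full, while the chunk-level store shares every chunk the one-byte edit did not touch. The sketch recomputes a checksum over the window at every position for clarity; production systems typically use a rolling hash so that step costs constant time per byte.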
Sunshine: What types of industry verticals are facing the biggest storage challenges?
Shottan: Some of the industries that need this kind of accessible data storage include: media and entertainment, oil and gas, and bioinformatics. The data that is being collected and the applications that manipulate it are huge. “Huge” is a technical term, of course. (Laughs.)
Our solution includes primary storage compression, for which we have partnered with Ocarina Networks. Ocarina represents the next generation solution, which goes beyond block-level or file-based deduplication, and is far, far more effective at increasing capacity.
Sunshine: What are the specific storage needs of these industries?
Shottan: At a very high level, those are industries which have the following two requirements. First, they are data intensive industries, which means lots of primary storage needs. Second, their ability to run their business efficiently depends on how fast their applications run. This is why the two key attributes of the BlueArc system, performance and scalability, apply well.
Let me tie this need with the reason we have partnered with Ocarina. For industries such as oil and gas and bioinformatics, the situation is this: upon completion of a processing run, all the data needs to be kept around. However, for successive processing runs, the data can be compressed. This is where our multi-tiered storage comes in, and beyond that, we’ve been able to seamlessly integrate with Ocarina’s appliance to achieve this compression without having to invest in any new storage. Ocarina is the only offering we found that could successfully compress media rich files such as those created in genomics labs.
Sunshine: What about movies? Why are they so data intensive?
Shottan: The biggest storage costs for the media and entertainment industry are in production. We work with studios that do animated films, and while the final product fits on a DVD, the production phase can be tens or even hundreds of terabytes. And it is dynamic data, not static data. Rendering a scene for a movie takes time. And time is money.
For example, it can take almost a week to render hair on top of the head of an animated creature. Say you want to reuse that hair. If you as the animator have to recreate it from scratch because it’s now on tape, that’s weeks of extra work. This is why you want to keep that data online, because once you put something on backup–whether tape or VTL–you can no longer easily access previous scenes.
Sunshine: I’m guessing that for production, OpEx costs are extremely high.
Shottan: Yes, and we realized that this is a perfect situation for compression with Ocarina. They have designed compression algorithms for industry specific file types, including those used by Hollywood film studios. We’ve been able to integrate our two offerings and significantly reduce their storage footprint.
Sunshine: Thanks for taking the time to speak with me.
Shottan: It was my pleasure.
Shmuel Shottan is an industry veteran with thorough experience in the research and development of hardware and software, and in engineering management for firms ranging from start-ups to Fortune 500 companies. He holds a BSEE degree from the Technion, Israel Institute of Technology.
BlueArc is a leading provider of primary storage solutions to enterprise markets, as well as such data intensive markets as electronic discovery, entertainment, federal government, higher education, Internet services, oil and gas and life sciences.




Having been previously involved in the Ocarina and BlueArc integration effort, my perspective is that the Ocarina solution shows potential. Exposure to real-world applications and specific field deployments will ultimately determine its worth.