
We’ve been hearing about high-capacity storage systems, such as HP’s Extreme Data Storage 9100 System (ExDS9100), for the past year or so. There’s clearly huge potential for using these types of systems to manage many terabytes of data. We decided to sit down with Michael Callahan, chief technologist for network-attached storage in the HP StorageWorks Unified Storage Division, and get his views on the trends toward larger-capacity storage, deduplication and other responses to the rising tide of data. Below is our interview.
Sunshine: What kinds of trends are you seeing in storage, from where you stand?
Michael Callahan: We feel like we’re very well aligned with two of the big trends that are underway. The first trend is that storage systems are consolidating. Many enterprises are realizing they’re creating lots of individual silos, and data is spreading across many filers. While it’s natural for this to happen, it’s very inefficient. You wake up one morning and realize you’re dealing with immense and costly complexity, as well as poor utilization. That’s why there’s been a huge amount of interest in consolidation as an IT project, really across all industries. People are only now getting focused on consolidating storage in some of these environments.
The second trend, of course, is around data reduction and other similar efficiencies.
Sunshine: Why is dedupe such a hot topic these days, in your view?
Michael Callahan: Well, obviously it has to do with economics. Much of the last decade has been focused on cost control, on being able to do more with less. The more geeky answer is that on the technology side, machines are far faster than they were even five years ago. Over the years, the compute power in environments has risen significantly as compared to storage speeds. So it becomes much more appealing to spend some CPU cycles to reduce data before putting it into storage.
Sunshine: Wow. I’ve never heard it put in quite this way.
Michael Callahan: The way I look at it is that ten years ago, someone might’ve proposed something along the lines of Ocarina, but it would’ve been harder to justify. Today in systems like our ExDS9100 (the Extreme Data Storage System, and the space where we partner with Ocarina) we use industry standard components. That is, HP blade technology. So we’re able to leverage an engine that is building incredibly powerful, frugal blade systems with a lot of compute power, right in the storage system. Customers should be able to deploy that compute power into the storage tier very effectively in order to make it do interesting things such as running Ocarina. And because the ExDS9100 is based on Linux, we can run the Ocarina software right within the storage system itself.
Sunshine: What do you see as the advantage of your storage system?
Michael Callahan: We feel we have huge advantages in being able to build systems that leverage the fact that HP has the most successful, widely-used industry standard blade infrastructure.
People are asking, what can we do to be efficient in the way we spend our dollars for storage, and utilize our data center space? Actually, this goes beyond cost. It’s not just the money, but also the space for storage. There is literally no place to put the storage even if you can pay for it. So in light of that, there’s this incredible push for some set of tools that will allow customers to optimize their use of storage.
One thing that we at HP do is to build a system in the ExDS that is in itself very dense, power efficient, and simple to manage. That’s a good starting point. But then it’s very compelling to be able to go beyond that and say furthermore, we’re able to do some very advanced things with Ocarina around compressing the types of data that tend to show up on these huge data sets.
We think we have a better integration with Ocarina than the other systems out there. For the customers who will be using Ocarina with our solution, there’s no box that says Ocarina. The same blade that’s in the storage system would have Ocarina running as software within it.
In our design, the expectation is that you’ll have lots of disk storage, and then you’ll want some number of blades. The architecture allows the choice about how much storage and how many blades to be made completely independently of each other, and to be revised without having to do any complicated repartitioning or migration of data.
Sunshine: That’s the PolyServe aspect of the architecture?
Michael Callahan: Yes, and the relevance to Ocarina is that Ocarina is a capability that consumes CPU cycles to process your data, and you might well need to have some flexibility in the amount of CPU power in a system. Suppose you have a system where you’re going to load up some huge data set, hundreds of terabytes of data, but then access it at a relatively low rate.
In our ExDS system, you can put 16 servers into one rack unit. Each of those blades has 2 CPUs, and each CPU has 4 cores, so you can actually have as many as 128 cores in one compact box. If you’re using Ocarina, you might choose to have some relatively large number of blades, because you’re compressing data and there’s significant computation involved in that. So during the ingest process, you can configure the system in such a way as to optimize computation. But then once it’s all filled up, there’s no more need for lots of CPU to support the (Ocarina) Optimizer. So, at that point, you can take those blades out of the system and just run the Ocarina Reader on many fewer blades, proportional to the rate at which the data is being accessed.
In our approach there’s no partitioning of the data. Every blade can access every part of that data set completely equally. You’re not required to go through some horrific rebalancing process to accommodate the number of blades. So it’s a really natural fit.
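Editor’s note: the core count Callahan quotes, and the idea of sizing blades differently for ingest and for read-mostly access, can be worked through in a few lines of Python. This is only a rough sketch. The 16 blades, 2 CPUs per blade and 4 cores per CPU come from the interview; the function names and the two workload figures (100 cores during ingest, 12 cores afterward) are hypothetical numbers chosen for illustration, not HP or Ocarina sizing guidance.

```python
import math

# Figures quoted in the interview.
MAX_BLADES = 16
CPUS_PER_BLADE = 2
CORES_PER_CPU = 4
CORES_PER_BLADE = CPUS_PER_BLADE * CORES_PER_CPU  # 8 cores per blade


def total_cores(blades: int) -> int:
    """Total cores available with the given number of blades."""
    return blades * CORES_PER_BLADE


def blades_needed(required_cores: int) -> int:
    """Smallest blade count that supplies at least required_cores (minimum 1)."""
    return max(1, math.ceil(required_cores / CORES_PER_BLADE))


# Hypothetical workloads (assumptions, not from the interview).
ingest_cores = 100  # compression-heavy ingest phase
read_cores = 12     # low-rate access once the data set is loaded

print("fully populated:", total_cores(MAX_BLADES), "cores")
print("blades during ingest:", blades_needed(ingest_cores))
print("blades after ingest:", blades_needed(read_cores))
```

Because, as Callahan describes it, every blade sees the whole data set, moving from the larger blade count to the smaller one in this sketch would not imply any repartitioning of the data.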
Michael Callahan is Chief Technologist for network-attached storage in the HP StorageWorks Unified Storage Division. Previously, he was Chief Technology Officer at PolyServe, a software company that delivered scalable, highly-available shared data clustering solutions, from its founding until it was acquired by HP in April 2007. Before that he led advanced development at Ask Jeeves and did mathematics research at the Mathematical Sciences Research Institute in Berkeley. He has a BA from Harvard University and was a Rhodes Scholar and Junior Research Fellow in Mathematics at Oxford University.
