I just got back from the Exploring Next Generation Sequencing conference held at the Rhode Island Conference Center in Providence, RI. I was pleased to encounter a number of folks who were extremely enthusiastic about the Ocarina solution for Life Sciences. In a way, I shouldn’t have been all that surprised. We’re the only vendor in the industry to have delivered an industry-specific data-reduction solution. In other words, we understand their file types (SRF, TIFF, FASTA, etc.) and therefore have the power to reduce those files far more effectively than other data reduction options.
It’s a very exciting time for the entire Life Sciences industry, which is in a period of rapid advancement that could soon translate to cures for some of the most deadly diseases of our times, such as heart disease, cancer, and Parkinson’s. This is mainly due to the fact that there have been quantum leaps in gene sequencing computing that enable genetic data to be analyzed to levels that weren’t possible even a few years ago.
As I mentioned in the Luncheon Keynote that I delivered, the genomics space is in many ways parallel to film-making circa 1998. (See our recent interview with animator Mike Huber.) There are a few early adopters with big capex investments, but the digital workflow is really just emerging, and as yet is only narrowly deployed in the commercial marketplace (pharma, healthcare, agriculture, etc.).
Here are some of the facts about this industry that made me sit up and take notice:
- 80% of all sequencing is performed as part of research and 20% as a commercial effort.
- We still see strong investments in vertical integration: For example, Complete Genomics is designing a scanner for their internal use only.
- Thus far, only 12 people have had their DNA completely sequenced.
- No one knows what the variation is across the human genome, or what the “normal” human sequence is.
- There are already efforts to resequence known genomes to improve accuracy to acceptable levels.
- Entire fields of genomic research are just emerging: genotyping, large-scale correlation (GWAS), transcriptomics and RNA, agriculture, genealogy, pharmacogenomics, elective sequencing, discovering the microbiome, etc.
- Within 2 years, the cost to sequence an individual’s entire genome will be less than the cost of many specific chromosomal deficiency tests today.
- Within 3 years, some hospitals will generate a sequence for every incoming patient, partly to mitigate the risk of adverse drug reactions.
- Only an extremely small fraction of physicians have the knowledge to meaningfully leverage genomics in caring for patients.
- Life Sciences storage per employee has grown 500% year-over-year [Corporate Tech survey].
- In 2005, sequencing throughput growth accelerated from 2-3x per year to 10x per year, leading to dramatically reduced cost per sequence and an anticipated explosion of usage models. [Duke University preso]
- Existing sequencer models use still-image-based data acquisition (typically 8- to 14-bit TIFF images), and those images are commonly thrown away to relieve storage load.
- Next generation sequencers from companies like Pacific Biosciences will use video-based data acquisition.
The conclusions you should draw from these factoids are the following:
1) This is a highly immature industry (in the MBA sense, not the people sense!) with dramatic market growth anticipated as adoption begins to take off in non-research markets, and
2) Data growth at any given customer will be nonlinear and dramatic, with the cost of storage infrastructure taking an increasing share of the IT budget.
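To put the throughput figures quoted above in perspective (2-3x per year before 2005, 10x per year after), here is a minimal Python sketch of how the two rates compound. The baseline of 1.0 is an arbitrary normalized starting point, and `compound_growth` is a hypothetical helper written for this illustration; neither is measured data from the conference.

```python
def compound_growth(start, factor_per_year, years):
    """Return the value at year 0 and after each year of multiplicative growth."""
    values = [start]
    for _ in range(years):
        values.append(values[-1] * factor_per_year)
    return values


# Normalized throughput: 1.0 is an arbitrary baseline, not a measured figure.
slow = compound_growth(1.0, 3, 3)   # upper end of the 2-3x-per-year range
fast = compound_growth(1.0, 10, 3)  # the 10x-per-year rate

for year, (s, f) in enumerate(zip(slow, fast)):
    print(f"year {year}: 3x/yr -> {s:g}, 10x/yr -> {f:g}")
```

After three years the 10x rate is already well over an order of magnitude ahead of the 3x rate, which is the gap that storage planning has to absorb if raw output tracks throughput.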
Clearly Ocarina has an important role to play in the sequencing workflow, not only in reducing storage consumption for inactive data, but also in simplifying and improving data packaging, movement, replication, and backup. In short, there are going to be a lot of opportunities to add value. Fortunately, we’re getting a lot of good feedback from industry leaders like Illumina and Roche/454 to keep us working on the right things. Stay tuned and come see us at Bio-IT World Oct 6-8!

