Everyone knows data is growing at exponential rates. In fact, the numbers can be mind-numbing. That’s certainly the case when it comes to genomic data where 40,000PB of storage each year will be needed by 2025. Understanding, managing and storing this massive amount of data was the topic at our SNIA Cloud Storage Technologies Initiative webcast “Moving Genomics to the Cloud: Compute and Storage Considerations.” If you missed the live presentation, it’s available on-demand along with presentation slides.
Our live audience asked many interesting questions during the webcast, but we did not have time to get to all of them. As promised, our experts, Michael McManus, Torben Kling Petersen and Christopher Davidson, have answered them here.
Q. Human genomes differ by only 1% or so, so there’s an immediate 100x improvement in terms of data compression: 2743EB could become 27430PB, which is 2.743M HDDs of 10TB each. With ~200 countries for the 7.8B people, and 10 sequencing centers per country on average, each center would need a mere 1.4K HDDs. Is there really a big challenge here?
A. Unfortunately, the problem is not that simple. Both the location and the size of genetic differences vary a lot from person to person. Still, there are compression methods like CRAM and PetaGene that can save a lot of space. Also consider all of the sequencing for rare disease, cancer, single cell sequencing, etc., plus sequencing for agricultural products.
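For readers who want to check the back-of-the-envelope figures in the question, here is a minimal sketch in Python. It only reproduces the questioner's arithmetic; it does not validate the 100x compression premise, which the answer above disputes. The function name `storage_estimate` and its parameter names are hypothetical, and decimal units are assumed (1EB = 1,000PB, 1PB = 1,000TB).

```python
def storage_estimate(raw_eb=2743, compression_ratio=100, hdd_tb=10,
                     countries=200, centers_per_country=10):
    """Reproduce the questioner's estimate (hypothetical helper).

    Assumes decimal units: 1 EB = 1,000 PB and 1 PB = 1,000 TB.
    All default values are the figures quoted in the question.
    """
    # Apply the assumed compression ratio to the raw volume, in PB.
    compressed_pb = raw_eb * 1000 / compression_ratio
    # Number of drives needed to hold the compressed volume.
    hdds = compressed_pb * 1000 / hdd_tb
    # Spread the drives evenly across all sequencing centers.
    centers = countries * centers_per_country
    return {
        "compressed_pb": compressed_pb,
        "hdds": hdds,
        "centers": centers,
        "hdds_per_center": hdds / centers,
    }


if __name__ == "__main__":
    est = storage_estimate()
    print(f"Compressed volume: {est['compressed_pb']:,.0f} PB")
    print(f"Drives needed:     {est['hdds']:,.0f}")
    print(f"Drives per center: {est['hdds_per_center']:,.1f}")
```

With the quoted inputs this gives 27,430PB, 2,743,000 drives, and about 1,372 drives per center, which matches the "1.4K" in the question. Changing `compression_ratio` shows how sensitive the result is to that single assumption.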
Q. What’s the best compression ratio for human genome data?