SCHEDULE: NOV 15-20, 2015

Big Omics Data Experience

SESSION: State of the Practice: Infrastructure Management


EVENT TAG(S): State of Practice, Facilities, Workflows, Data-Intensive Computing, Scientific Computing

TIME: 11:00AM - 11:30AM

SESSION CHAIR(S): Naoya Maruyama

AUTHOR(S): Patricia Kovatch, Anthony Costa, Zachary Giles, Eugene Fluder, Hyung Min Cho, Svetlana Mazurkova



As personalized medicine becomes more integrated into healthcare, the rate at which humans are being sequenced is rising quickly, with a concomitant acceleration in compute and data requirements. To achieve the most effective solution for genomic workloads without re-architecting the industry-standard software, we performed a rigorous analysis of usage statistics, benchmarks and available technologies to design a system for maximum throughput. We share our experiences designing a system optimized for Genome Analysis ToolKit pipelines, based on an evaluation of compute, workload and I/O characteristics. The characteristics of genomic workloads differ vastly from those of traditional HPC workloads, requiring radically different configurations of the scheduler and I/O to achieve scalability. By understanding how our researchers and clinicians work, we were able to employ new techniques that not only speed their workflows, yielding improved and repeatable performance, but also make efficient use of storage and compute nodes.

Chair/Author Details:

Naoya Maruyama (Chair) - RIKEN

Patricia Kovatch - Icahn School of Medicine at Mount Sinai

Anthony Costa - Icahn School of Medicine at Mount Sinai

Zachary Giles - Icahn School of Medicine at Mount Sinai

Eugene Fluder - Icahn School of Medicine at Mount Sinai

Hyung Min Cho - Icahn School of Medicine at Mount Sinai

Svetlana Mazurkova - Icahn School of Medicine at Mount Sinai

