Sponsored by ACM and IEEE
The International Conference for High Performance Computing, Networking, Storage and Analysis

SCHEDULE: NOV 15-20, 2015

A Practical Approach to Reconciling Availability, Performance, and Capacity in Provisioning Extreme-Scale Storage Systems

SESSION: State of the Practice: Measuring Systems

EVENT TYPE: Papers

EVENT TAG(S): Storage, State of Practice, Facilities, Performance

TIME: 2:30PM - 3:00PM

SESSION CHAIR(S): Neil Chue Hong

AUTHOR(S): Lipeng Wan, Feiyi Wang, Sarp Oral, Devesh Tiwari, Sudharshan S. Vazhkudai, Qing Cao

ROOM: 18AB

ABSTRACT:

The increasing data demands of high-performance computing applications significantly raise the capacity, capability, and reliability requirements of storage systems. As systems scale, component failures and repair times increase, with significant impact on data availability. Designing such systems requires balancing a wide array of decision points.

We propose a systematic approach that balances and optimizes both initial and continuous spare provisioning, based on a detailed investigation of the anatomy of extreme-scale storage systems and an analysis of their field failure data. We simultaneously consider component failure characteristics, cost, and system-level impact. We build a provisioning tool to evaluate different provisioning schemes, and the results demonstrate that our optimized provisioning can reduce the unavailable duration by as much as 52% under a fixed budget. We also observe that non-disk components have much higher failure rates than disks and therefore warrant careful consideration in the overall provisioning process.

Chair/Author Details:

Neil Chue Hong (Chair) - University of Edinburgh

Lipeng Wan - University of Tennessee, Knoxville

Feiyi Wang - Oak Ridge National Laboratory

Sarp Oral - Oak Ridge National Laboratory

Devesh Tiwari - Oak Ridge National Laboratory

Sudharshan S. Vazhkudai - Oak Ridge National Laboratory

Qing Cao - University of Tennessee, Knoxville

Paper provided by the ACM Digital Library

Paper also available from IEEE Computer Society