The International Conference for High Performance Computing, Networking, Storage and Analysis
Sponsored by ACM and IEEE

SCHEDULE: NOV 15-20, 2015


VOCL-FT: Introducing Techniques for Efficient Soft Error Coprocessor Recovery

SESSION: Resilience


EVENT TAG(S): Power, Performance, System Software, Resiliency

TIME: 2:00PM - 2:30PM

SESSION CHAIR(S): Frank Mueller

AUTHOR(S): Antonio J. Peña, Wesley Bland, Pavan Balaji



Popular accelerator programming models rely on offloading computation operations and their corresponding data transfers to the coprocessors, leveraging synchronization points where needed. In this paper we identify and explore how such a programming model enables optimization opportunities not utilized in traditional checkpoint/restart systems, and we analyze them as the building blocks for an efficient fault-tolerant system for accelerators. Although we leverage our techniques to protect from detected but uncorrected ECC errors in the device memory in OpenCL-accelerated applications, coprocessor reliability solutions based on different error detectors and similar API semantics can directly adopt the techniques we propose. Adding error detection and protection involves a tradeoff between runtime overhead and recovery time. Although optimal configurations depend on the particular application, the length of the run, the error rate, and the temporary storage speed, our test cases reveal a good balance with significantly reduced runtime overheads.
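The tradeoff between runtime overhead and recovery time mentioned in the abstract can be illustrated with a generic checkpoint-interval model. The sketch below is not the method of the paper: it uses Young's classic first-order approximation for periodic checkpointing, and every name and number in it (young_interval, expected_overhead_fraction, the 2-second checkpoint cost, the 10,000-second mean time between errors) is a hypothetical chosen for illustration only.

```python
import math


def young_interval(checkpoint_cost_s, mtbf_s):
    """Young's first-order estimate of the checkpoint interval (seconds)
    that minimizes expected overhead: sqrt(2 * checkpoint cost * MTBF)."""
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


def expected_overhead_fraction(interval_s, checkpoint_cost_s, mtbf_s,
                               restart_cost_s=0.0):
    """First-order expected overhead as a fraction of useful compute time.

    checkpoint_term: time spent writing checkpoints per unit of work.
    rework_term: on each error, roughly half an interval of work is lost,
                 plus a fixed restart cost (hypothetical parameter).
    """
    checkpoint_term = checkpoint_cost_s / interval_s
    rework_term = (interval_s / 2.0 + restart_cost_s) / mtbf_s
    return checkpoint_term + rework_term


if __name__ == "__main__":
    # Hypothetical inputs: slower temporary storage raises checkpoint_cost_s,
    # a higher error rate lowers mtbf_s.
    cost, mtbf = 2.0, 10_000.0
    best = young_interval(cost, mtbf)
    for interval in (best / 2.0, best, best * 2.0):
        overhead = expected_overhead_fraction(interval, cost, mtbf)
        print(f"interval={interval:7.1f}s  overhead={overhead:.4f}")
```

In this model, checkpointing more often than the estimated interval raises the runtime overhead, while checkpointing less often raises the work lost on each error. That is consistent with the abstract's remark that the best configuration depends on the error rate and the temporary storage speed.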

Chair/Author Details:

Frank Mueller (Chair) - North Carolina State University

Antonio J. Peña - Argonne National Laboratory

Wesley Bland - Argonne National Laboratory

Pavan Balaji - Argonne National Laboratory


Paper provided by the ACM Digital Library

Paper also available from IEEE Computer Society