SCHEDULE: NOV 15-20, 2015
VOCL-FT: Introducing Techniques for Efficient Soft Error Coprocessor Recovery
SESSION: Resilience
EVENT TYPE: Papers
EVENT TAG(S): Power, Performance, System Software, Resiliency
TIME: 2:00PM - 2:30PM
SESSION CHAIR(S): Frank Mueller
AUTHOR(S): Antonio J. Peña, Wesley Bland, Pavan Balaji
ROOM: 19AB
ABSTRACT:
Popular accelerator programming models rely on offloading computation operations and their corresponding data transfers to the coprocessors, leveraging synchronization points where needed. In this paper we identify and explore how such a programming model enables optimization opportunities not utilized in traditional checkpoint/restart systems, and we analyze them as the building blocks for an efficient fault-tolerant system for accelerators. Although we leverage our techniques to protect from detected but uncorrected ECC errors in the device memory in OpenCL-accelerated applications, coprocessor reliability solutions based on different error detectors and similar API semantics can directly adopt the techniques we propose. Adding error detection and protection involves a tradeoff between runtime overhead and recovery time. Although optimal configurations depend on the particular application, the length of the run, the error rate, and the temporary storage speed, our test cases reveal a good balance with significantly reduced runtime overheads.
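The tradeoff between runtime overhead and recovery time mentioned in the abstract can be illustrated with a generic first-order checkpoint model. The sketch below is not the VOCL-FT method and does not reproduce the paper's results; it uses Young's classic approximation for the checkpoint interval, and every function name and sample number in it is a hypothetical chosen for illustration.

```python
import math


def checkpoint_cost_s(data_bytes: float, storage_bandwidth_bps: float) -> float:
    """Time to write one checkpoint, assuming cost is dominated by storage speed."""
    return data_bytes / storage_bandwidth_bps


def young_interval_s(cost_s: float, mtbf_s: float) -> float:
    """Young's first-order approximation of the interval between checkpoints.

    cost_s: time to take one checkpoint (seconds)
    mtbf_s: mean time between failures (seconds), i.e. 1 / error rate
    """
    return math.sqrt(2.0 * cost_s * mtbf_s)


def overhead_fraction(interval_s: float, cost_s: float, mtbf_s: float) -> float:
    """First-order expected overhead as a fraction of useful run time.

    The first term is the runtime overhead of checkpointing; the second is
    the expected rework after a failure (half an interval on average).
    """
    return cost_s / interval_s + interval_s / (2.0 * mtbf_s)


if __name__ == "__main__":
    # Hypothetical numbers: 2 GB of device state, 1 GB/s temporary storage,
    # one uncorrected error every 10,000 seconds on average.
    cost = checkpoint_cost_s(2e9, 1e9)
    mtbf = 10_000.0
    best = young_interval_s(cost, mtbf)
    for interval in (best / 2, best, best * 2):
        print(f"interval={interval:.0f}s "
              f"overhead={overhead_fraction(interval, cost, mtbf):.4f}")
```

In this simplified model, checkpointing more often raises the runtime overhead, checkpointing less often raises the expected recovery cost, and faster temporary storage or a lower error rate shifts the balance point. This matches the abstract's remark that the optimal configuration depends on the error rate and the storage speed.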
Chair/Author Details:
Frank Mueller (Chair) - North Carolina State University
Antonio J. Peña - Argonne National Laboratory
Wesley Bland - Argonne National Laboratory
Pavan Balaji - Argonne National Laboratory