SCHEDULE: NOV 15-20, 2015
Exploiting Asynchrony from Exact Forward Recovery for DUE in Iterative Solvers
SESSION: Linear Algebra
EVENT TYPE: Papers, Best Paper Finalists
EVENT TAG(S): Algorithms, Scientific Computing, Solvers
TIME: 3:30PM - 4:00PM
SESSION CHAIR(S): Gabriel Tanase
AUTHOR(S): Luc Jaulmes, Marc Casas, Miquel Moretó, Eduard Ayguadé, Jesús Labarta, Mateo Valero
ROOM: 18AB
ABSTRACT:
This paper presents a method to protect iterative solvers from Detected and Uncorrected Errors (DUE), relying on error detection techniques already available in commodity hardware. Detection operates at the memory page level, which enables the use of simple algorithmic redundancies to correct errors. Such redundancies would be inapplicable under coarse-grained error detection, but they become very powerful when the hardware is able to detect errors precisely.
Relations extracted straightforwardly from the solver allow lost data to be recovered exactly. This method is free of the overheads of backward recovery techniques such as checkpointing, and does not compromise the mathematical convergence properties of the solver as restarting would. We apply this recovery to three widely used Krylov subspace methods: CG, GMRES and BiCGStab.
We implement and evaluate our resilience techniques on CG, showing very low overheads compared to state-of-the-art solutions. Overlapping recoveries with the normal work of the algorithm reduces overheads further.
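To make the idea of recovering lost data from a solver relation concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation, and the abstract does not spell out which relations the paper uses. The sketch assumes the standard CG invariant r = b - Ax, a dense symmetric positive definite matrix A, and a lost memory page modeled as an index array `lost`. The function names `recover_residual_block` and `recover_iterate_block` are hypothetical.

```python
import numpy as np


def recover_residual_block(A, b, x, lost):
    """Rebuild the lost entries of r from the invariant r = b - A x.

    Assumes x is intact; only rows of A indexed by `lost` are needed.
    """
    return b[lost] - A[lost, :] @ x


def recover_iterate_block(A, b, r, x, lost):
    """Rebuild the lost entries of x from the same invariant.

    Rows `lost` of r = b - A x give
        A[I, I] x_I = b_I - r_I - A[I, J] x_J,
    where I = lost and J = all other indices. For a symmetric positive
    definite A, the diagonal block A[I, I] is also positive definite,
    so this small system has a unique solution.
    """
    kept = np.setdiff1d(np.arange(len(b)), lost)
    rhs = b[lost] - r[lost] - A[np.ix_(lost, kept)] @ x[kept]
    return np.linalg.solve(A[np.ix_(lost, lost)], rhs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 8
    M = rng.standard_normal((n, n))
    A = M @ M.T + n * np.eye(n)        # symmetric positive definite test matrix
    b = rng.standard_normal(n)
    x = rng.standard_normal(n)         # stands in for a CG iterate
    r = b - A @ x                      # residual consistent with x

    lost = np.array([2, 3, 4])         # indices on the hypothetical lost page
    x_damaged = x.copy()
    x_damaged[lost] = np.nan           # simulate the detected, uncorrected error

    x_damaged[lost] = recover_iterate_block(A, b, r, x_damaged, lost)
    print("max error in recovered x:", np.max(np.abs(x_damaged - x)))
    print("max error in recovered r:",
          np.max(np.abs(recover_residual_block(A, b, x, lost) - r[lost])))
```

The recovery touches only the rows and columns associated with the lost page, which is why page-level detection matters here: a small lost block leads to a small linear system rather than a restart of the whole solve.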
Chair/Author Details:
Gabriel Tanase (Chair) - IBM Corporation
Luc Jaulmes - Barcelona Supercomputing Center
Marc Casas - Barcelona Supercomputing Center
Miquel Moretó - Barcelona Supercomputing Center
Eduard Ayguadé - Barcelona Supercomputing Center
Jesús Labarta - Barcelona Supercomputing Center
Mateo Valero - Barcelona Supercomputing Center