The International Conference for High Performance Computing, Networking, Storage and Analysis (sponsored by ACM and IEEE)
SCHEDULE: NOV 15-20, 2015

Exploiting Asynchrony from Exact Forward Recovery for DUE in Iterative Solvers

SESSION: Linear Algebra

EVENT TYPE: Papers, Best Paper Finalists

EVENT TAG(S): Algorithms, Scientific Computing, Solvers

TIME: 3:30PM - 4:00PM

SESSION CHAIR(S): Gabriel Tanase

AUTHOR(S): Luc Jaulmes, Marc Casas, Miquel Moretó, Eduard Ayguadé, Jesús Labarta, Mateo Valero

ROOM: 18AB

ABSTRACT:

This paper presents a method to protect iterative solvers from Detected and Uncorrected Errors (DUE) that relies on error detection techniques already available in commodity hardware. Detection operates at the memory page level, which enables the use of simple algorithmic redundancies to correct errors. Such redundancies would be inapplicable under coarse-grained error detection, but they become very powerful when the hardware is able to detect errors precisely.

Relations extracted directly from the solver allow lost data to be recovered exactly. This method is free of the overheads of backward recovery schemes such as checkpointing, and it does not compromise the mathematical convergence properties of the solver as a restart would. We apply this recovery to three widely used Krylov subspace methods: CG, GMRES and BiCGStab.
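As an illustration of the kind of relation meant here, the following is a minimal sketch and not the authors' implementation. It assumes the well-known CG invariant r = b - A x, which holds in exact arithmetic, and uses it to rebuild only the entries of the residual that sat on a lost memory page. The function names, the 4x4 system, and the choice of which rows are "lost" are all hypothetical.

```python
# Hypothetical sketch: rebuild lost entries of the CG residual r
# from the invariant r = b - A x. Names and sizes are illustrative
# assumptions, not taken from the paper.

def matvec_rows(A, x, rows):
    """Return the entries of A @ x for the given row indices only."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in rows]

def recover_residual_block(A, b, x, r, lost_rows):
    """Overwrite r[i] for each lost row i with b[i] - (A x)[i]."""
    Ax = matvec_rows(A, x, lost_rows)
    for i, ax_i in zip(lost_rows, Ax):
        r[i] = b[i] - ax_i
    return r

# Small symmetric positive definite system (made up for the example).
A = [[4.0, 1.0, 0.0, 0.0],
     [1.0, 3.0, 1.0, 0.0],
     [0.0, 1.0, 2.0, 1.0],
     [0.0, 0.0, 1.0, 5.0]]
b = [1.0, 2.0, 3.0, 4.0]
x = [0.5, -0.25, 1.0, 0.75]  # current iterate, assumed intact

# Reference residual, then simulate a detected error on one "page".
r_true = [b[i] - sum(A[i][j] * x[j] for j in range(4)) for i in range(4)]
r = list(r_true)
lost = [2, 3]                # indices covered by the lost page
for i in lost:
    r[i] = float("nan")      # data flagged as unusable by detection

recover_residual_block(A, b, x, r, lost)
```

Only the rows covered by the lost page are recomputed, which is why fine-grained detection matters: the cost scales with the size of the damaged block rather than with the whole vector. In a running solver the recovered values match the lost ones only up to floating-point rounding, since r is normally updated by a recurrence rather than recomputed from x.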

We implement and evaluate our resilience techniques on CG, showing very low overheads compared to state-of-the-art solutions. Overlapping recoveries with normal work of the algorithm decreases overheads further.

Chair/Author Details:

Gabriel Tanase (Chair) - IBM Corporation

Luc Jaulmes - Barcelona Supercomputing Center

Marc Casas - Barcelona Supercomputing Center

Miquel Moretó - Barcelona Supercomputing Center

Eduard Ayguadé - Barcelona Supercomputing Center

Jesús Labarta - Barcelona Supercomputing Center

Mateo Valero - Barcelona Supercomputing Center


Paper provided by the ACM Digital Library

Paper also available from IEEE Computer Society