SC15 Austin, TX

Parallel Execution of Workflows Driven by a Distributed Database Management System

Authors: Renan Souza (Federal University of Rio de Janeiro), Vítor Silva (Federal University of Rio de Janeiro), Daniel de Oliveira (Fluminense Federal University), Patrick Valduriez (French Institute for Research in Computer Science and Automation), Alexandre A. B. Lima (Federal University of Rio de Janeiro), Marta Mattoso (Federal University of Rio de Janeiro)

Abstract: Scientific Workflow Management Systems (SWfMS) that execute large-scale simulations need to manage many-task computing in high-performance environments. Given the number of tasks and processing cores involved, SWfMS require efficient distributed data structures to manage data related to scheduling, data movement, and provenance gathering. Although related systems store these data in multiple log files, some existing approaches store them in a Database Management System (DBMS) at runtime, which provides powerful analytical capabilities such as execution monitoring, anticipated result analyses, and user steering. Despite these advantages, approaches relying on a centralized DBMS introduce a point of contention, jeopardizing performance in large-scale executions. In this paper, we propose an architecture that relies on a distributed DBMS both to manage the parallel execution of tasks and to store those data at runtime. Our experiments show an efficiency of over 80% on 1,000 cores without giving up the analytical capabilities at runtime.

Poster: pdf
Two-page extended abstract: pdf
