Forecasting Storms in Parallel File Systems
Student: Ryan McKenna (University of Delaware)
Supervisor: Michela Taufer (University of Delaware)
Abstract: Large-scale scientific applications rely on the parallel file system (PFS) to store checkpoints and outputs. When the PFS is over-utilized, applications can slow down significantly as they compete for scarce bandwidth. To prevent this type of “filesystem storm”, schedulers must avoid running many I/O-intensive jobs at the same time. To implement such a strategy effectively, schedulers must predict the I/O workload and runtime of future jobs. In this poster, we explore the use of machine learning methods to forecast file system usage and to predict the runtimes of queued jobs from historical data. We show that over 80% of our runtime predictions fall within 10 minutes of the actual runtime.
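The accuracy figure quoted in the abstract can be read as the fraction of jobs whose predicted runtime lies within a fixed tolerance of the actual runtime. The following is a minimal sketch of that metric under this reading, not the authors' evaluation code; the function name `accuracy_within`, the use of seconds as the unit, and the sample values are all assumptions made for illustration.

```python
from typing import Sequence


def accuracy_within(
    predicted_s: Sequence[float],
    actual_s: Sequence[float],
    tolerance_s: float = 600.0,  # 10 minutes, expressed in seconds (assumed unit)
) -> float:
    """Return the fraction of predictions within tolerance_s of the actual runtime."""
    if len(predicted_s) != len(actual_s):
        raise ValueError("predicted and actual runtimes must have the same length")
    if len(predicted_s) == 0:
        raise ValueError("at least one job is required")
    # Count the jobs whose absolute prediction error is within the tolerance.
    hits = sum(
        1 for p, a in zip(predicted_s, actual_s) if abs(p - a) <= tolerance_s
    )
    return hits / len(predicted_s)


# Hypothetical runtimes in seconds, invented purely to exercise the function.
predicted = [3600.0, 1800.0, 7200.0, 900.0, 300.0]
actual = [3500.0, 2500.0, 7000.0, 950.0, 1000.0]
score = accuracy_within(predicted, actual)
```

With these made-up values, three of the five predictions are off by 10 minutes or less, so `score` is 0.6. A result above 0.8 on real job logs would correspond to the claim in the abstract under this interpretation.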
Two-page extended abstract: pdf