PEAK: Parallel EM Algorithm using Kd-tree
Student: Laleh Aghababaie Beni (University of California, Irvine)
Supervisor: Aparna Chandramowlishwaran (University of California, Irvine)
Abstract: The data mining community voted Expectation Maximization (EM) algorithm as one of the top ten algorithms having the most impact on data mining research. EM is a popular iterative algorithm for learning mixture models with applications in various areas from computer vision, astronomy, to signal processing. We present a new high-performance parallel algorithm on multicore systems that impacts all stages of EM. We use tree data structures and user-controlled approximations to reduce the asymptotic runtime complexity of EM with significant performance improvements. PEAK utilizes the same tree and algorithmic framework for all the stages of EM.
Experimental results show that our parallel algorithm significantly outperforms the state-of-the-art algorithms and libraries on all dataset configurations (varying number of points, dimensionality of the dataset, and number of mixtures). Looking forward, we identify approaches to extend this idea to a larger scale of similar problems.
Two-page extended abstract: pdf