BEGIN:VCALENDAR
PRODID:-//Microsoft Corporation//Outlook MIMEDIR//EN
VERSION:2.0
BEGIN:VEVENT
DTSTART:20151116T193000Z
DTEND:20151116T230000Z
LOCATION:14
DESCRIPTION;ENCODING=QUOTED-PRINTABLE:ABSTRACT: Scientific developers face challenges adapting software to leverage increasingly heterogeneous architectures. Many systems feature nodes that couple multi-core processors with GPU-based computational accelerators, like the NVIDIA Kepler, or many-core coprocessors, like the Intel Xeon Phi. To use these systems effectively, application developers must expose an extremely high level of parallelism while also coping with the complexities of multiple programming paradigms, including MPI, OpenMP, CUDA, and OpenACC.=0A=0AThis tutorial explores parallel debugging and optimization, focusing on techniques that can be used with accelerators and coprocessors. We cover debugging techniques such as grouping, advanced breakpoints and barriers, and MPI message queue graphing. We discuss optimization techniques such as profiling, tracing, and cache memory optimization with tools such as TAU, VTune, and the NVIDIA Visual Profiler. Participants will spend approximately half the time on hands-on GPU and Intel Xeon Phi debugging and profiling. Additionally, up-to-date capabilities in accelerator and coprocessor computing (e.g., OpenMP 4.0 device constructs, CUDA Unified Memory, CUDA core file debugging) and their peculiarities with respect to error finding and optimization will be discussed. For the hands-on sessions, attendees must have SSH and NX clients installed on their laptops.
SUMMARY:Debugging and Performance Tools for MPI and OpenMP 4.0 Applications for CPU and Accelerators/Coprocessors
PRIORITY:3
END:VEVENT
END:VCALENDAR