Lab Home | Phone | Search
Center for Nonlinear Studies  Center for Nonlinear Studies
 Home 
 People 
 Current 
 Affiliates 
 Visitors 
 Students 
 Research 
 ICAM-LANL 
 Publications 
 Conferences 
 Workshops 
 Sponsorship 
 Talks 
 Colloquia 
 Colloquia Archive 
 Seminars 
 Postdoc Seminars Archive 
 Quantum Lunch 
 Quantum Lunch Archive 
 CMS Colloquia 
 Q-Mat Seminars 
 Q-Mat Seminars Archive 
 P/T Colloquia 
 Archive 
 Kac Lectures 
 Kac Fellows 
 Dist. Quant. Lecture 
 Ulam Scholar 
 Colloquia 
 
 Jobs 
 Postdocs 
 CNLS Fellowship Application 
 Students 
 Student Program 
 Visitors 
 Description 
 Past Visitors 
 Services 
 General 
 
 History of CNLS 
 
 Maps, Directions 
 CNLS Office 
 T-Division 
 LANL 
 
Wednesday, August 17, 2016
3:00 PM - 4:00 PM
CNLS Conference Room (TA-3, Bldg 1690)

Seminar

Model-driven Automatic GPU Compilation and Why You Want to Run MPI On Your GPU

Torstten Hoefler
ETH Zurich

Auto-parallelization of programs that have not been developed with parallelism in mind is one of the holy grails in computer science. It allows to stick to a simple programming model but requires understanding the source code's data flow to automatically distribute the data, parallelize the computations, and infer synchronizations where necessary. We introduce a technique to analytically count iterations of a large class of loops. This illustrates how data-flow can be modeled in practice. Then, we discuss our new LLVM-based research compiler Polly-ACC that enables automatic compilation to accelerator devices such as GPUs. Unfortunately, its applicability is limited to codes for which the iteration space and all accesses can be described as affine functions. In the second part of the talk, we will discuss dCUDA, a way to express parallel codes in MPI-RMA, a well-known communication library, to map them automatically to GPU clusters. The dCUDA approach enables simple and portable programming across heterogeneous devices due to programmer-specified locality. Furthermore, dCUDA enables hardware-supported overlap of computation and communication and is applicable to next-generation technologies such as NVLINK. We will demonstrate encouraging initial results and show limitations of current devices in order to start a discussion.

Host: Christoph Junghans