Moore’s law has delivered immense performance improvements for scientific computing over several decades. However, as we approach the limits of semiconductor technology, we face numerous challenges, including performance, energy efficiency, and maintaining portable, efficient software across different compute architectures. Structured performance engineering (PE) is an important tool for addressing these challenges. The talk will present a PE approach that uses white-box performance models to guide code optimization and parallelization at the compute-node level. Its basic idea and potential will be demonstrated with a code porting and optimization project for the High-Performance Computational Mechanics code Alya, one of the two CFD codes in the Unified European Benchmark Suite. In an iterative process, a unified code base for CPUs and GPUs has been developed for the assembly of the right-hand-side term in the incompressible flow module. Compared with the original version, the final implementation is up to 50x faster and achieves a sustained performance of 2.5 TF/s on an NVIDIA A100.

Bio: Gerhard Wellein is a Professor of High Performance Computing at the Department of Computer Science of the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and holds a PhD in theoretical physics from the University of Bayreuth. Since 2024, he has been a Visiting Professor of HPC at the Delft Institute of Applied Mathematics at the Delft University of Technology. He is the director of the Erlangen National Center for High Performance Computing (NHR@FAU) and a member of the board of directors of the German NHR-Alliance, which coordinates the national HPC Tier-2 infrastructure at German universities. Gerhard Wellein has more than twenty years of experience in teaching HPC techniques to students and scientists.
His research interests focus on performance modeling and performance engineering, architecture-specific code optimization, novel parallelization approaches, hardware-efficient building blocks for sparse linear algebra, and stencil solvers. He has conducted and led numerous national and international HPC research projects and has authored or co-authored more than 100 peer-reviewed publications.

Note: Selected past CNLS & IC-APT Colloquium presentations are available on the IC-APT Wiki: https://ic-wiki.lanl.gov/Home

Host: Anya Matsekh (IC-APT)