In this talk, we briefly introduce the astrophysics application Octo-Tiger, which simulates the evolution of star systems based on the fast multipole method on adaptive octrees. It is the most advanced HPX application with support for CUDA and AMD accelerator cards. In the remainder of the talk, we discuss the pure CUDA integration and the recently added Kokkos integration, which provides portability across heterogeneous accelerator cards, especially AMD GPUs. We showcase scaling results on ORNL’s Summit and CSCS’s Piz Daint, as well as recent performance measurements on RIKEN’s Fugaku. Another aspect is performance profiling in asynchronous applications. We recently added CUDA profiling to APEX, HPX’s performance framework, so we can now collect combined CPU and GPU profiles from distributed runs to analyze the performance of HPX and Octo-Tiger. We show runs on Summit and Piz Daint to compare performance across different architectures and GPUs. The final aspect is the overhead introduced by the performance framework. We study the overhead of CPU-only profiling and the much higher overhead added by CUDA profiling.

Host: Christoph Junghans