Because many large scientific algorithms cannot use a memory cache efficiently, large-scale engineering and science calculations have seen little benefit from hardware performance improvements over the last decade. However, with the advent of programmable graphics processors four years ago, and of a C/C++-style programming model for GPUs (CUDA) roughly a year ago, it is now possible to make up for that deficit and obtain, on average, ten to twenty times the calculation throughput of the CPU when solving large scientific problems.
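As a rough illustration of the data-parallel style that CUDA exposes, the sketch below launches one GPU thread per array element to perform a SAXPY update; the kernel name, problem size, and launch configuration are illustrative assumptions rather than details taken from the talk.

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Each thread updates one element: y[i] = a*x[i] + y[i].
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

int main()
{
    const int n = 1 << 20;                  // one million elements (illustrative)
    const size_t bytes = n * sizeof(float);

    // Host arrays
    float *hx = (float *)malloc(bytes);
    float *hy = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { hx[i] = 1.0f; hy[i] = 2.0f; }

    // Device arrays
    float *dx, *dy;
    cudaMalloc(&dx, bytes);
    cudaMalloc(&dy, bytes);
    cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);

    // One thread per element, 256 threads per block
    int threads = 256;
    int blocks  = (n + threads - 1) / threads;
    saxpy<<<blocks, threads>>>(n, 2.0f, dx, dy);

    cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost);
    printf("y[0] = %f\n", hy[0]);           // expect 4.0

    cudaFree(dx); cudaFree(dy);
    free(hx); free(hy);
    return 0;
}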
The peculiarities of graphics processor (GPU) hardware, and how they affect the structure and performance of scientific algorithms, are discussed.
Examples are presented from a range of application domains including:
partial differential equation solution, large sequence matching
(bio-informatics), and graph traversal and manipulation.
The challenges and possibilities of using many GPUs in an
MPI-cluster environment are also presented along with the
performance of a GPU-based desktop supercomputer with 1920
processing cores in a single PC.
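One common pattern for using many GPUs in an MPI-cluster environment is to bind each MPI rank to a single GPU on its node and exchange boundary data between ranks with ordinary MPI calls. The sketch below shows only that device-assignment step; the modulo mapping and rank placement it assumes are illustrative, not a description of the cluster used in the talk.

#include <mpi.h>
#include <cuda_runtime.h>
#include <cstdio>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    // Assume ranks are placed consecutively on each node, so
    // (rank % GPUs-per-node) selects a distinct device for each rank.
    int ngpus = 0;
    cudaGetDeviceCount(&ngpus);
    cudaSetDevice(rank % ngpus);

    printf("rank %d of %d using GPU %d of %d on this node\n",
           rank, nranks, rank % ngpus, ngpus);

    // ... each rank launches kernels on its own GPU and exchanges
    // halo/boundary data with neighboring ranks via MPI ...

    MPI_Finalize();
    return 0;
}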