The Reconfigurable Computing Cluster project centers on an experimental parallel computing platform that consists exclusively of Platform FPGA nodes. Each of these highly configurable devices is capable of hosting Linux/OpenMPI, application-specific hardware accelerators, and an integrated on-chip/off-chip network on a single, power-efficient chip. Current work is investigating the feasibility of scaling this model to tens of thousands of nodes (in terms of power, size, and speed) and the benefits of various MPI compute and communication assists implemented in hardware. It was recently realized that this platform is also an excellent testbed for experiments in resiliency. Specifically, hardware cores can readily be introduced that (1) perturb the system in various reproducible ways that mirror the undesirable behavior found in today's very large HPC machines and (2) observe system behavior without disturbing the application, which runs at wall-clock speed (i.e., not in simulation). The first half of this talk will introduce Spirit, a 64-node FPGA cluster constructed at the University of North Carolina at Charlotte, and the HPC applications that run on it. The second half will focus on our nascent experiments in resiliency.

Host: Nathan DeBardeleben