Center for Nonlinear Studies
Tuesday, July 18, 2017
1:00 PM - 2:00 PM
CNLS Conference Room (TA-3, Bldg 1690)

Seminar

PortHadoop-R: Support the Merging of HPC and Cloud

Xian-He Sun
Illinois Institute of Technology

High Performance Computing (HPC) is becoming data intensive, while big data applications demand more and more computing power. The merging of HPC and big data analytics is inevitable. However, the conventional HPC ecosystem, represented by MPI and Parallel File System (PFS) environments, and the newly emerged Cloud/big data ecosystem, represented by MapReduce/Spark and Hadoop Distributed File System (HDFS) environments, are designed for different applications and follow different design principles. They do not work together naturally. Even worse, by the CAP theorem, neither ecosystem can be extended to have all the merits of the other. In other words, the two ecosystems will co-exist, and the best we can have is a merged system that provides the functionality and merits of both. In this study, we present PortHadoop-R, a solution that supports the merging of HPC and Cloud at the file level. PortHadoop-R allows data to be read directly from PFS into the memory of Hadoop nodes and integrates the data transfer with R data analysis and visualization. PortHadoop-R is carefully optimized to exploit the merits of PFS and MapReduce to achieve concurrent data transfer and latency hiding. PortHadoop-R has been tested on NASA climate modeling applications. Experimental results show that PortHadoop-R delivers a 15x speedup. Even without this speedup, the MapReduce environment is already significantly faster than MPI clusters at processing climate data. PortHadoop-R further demonstrates the potential of merging HPC and Cloud.

Host: H.B. Chen, 505-665-3591, hbchen@lanl.gov