Center for Nonlinear Studies
Tuesday, July 07, 2009
3:30 PM - 5:00 PM
CNLS Conference Room (TA-3, Bldg 1690)


Impact of Non-optimal Checkpoint Intervals on Application Efficiency in Cluster Computing

William Jones
Coastal Carolina University

Under certain conditions, an application's optimal checkpoint interval can be determined as a function of the dump time and the application mean time to interrupt (AMTTI). In practice, assigning an optimal checkpoint interval therefore requires an estimate of AMTTI for each application. This estimate is based on a number of job and system parameters that can be difficult to determine and may even change over time. Errors in estimating AMTTI lead to errors in the assigned checkpoint intervals, which in turn affect average application efficiency. Using BeoSim, a discrete-event-driven multi-cluster simulator, we study the impact of non-optimal checkpoint intervals on overall application efficiency. With the simulator parameterized by LANL's Pink cluster and workload, we find that dramatically overestimating the AMTTI has a fairly minor impact on application efficiency. The first two-thirds of the talk will introduce BeoSim and this recent study of non-optimal checkpoint intervals, while the final third will detail some previous work on using a checkpoint-migration scheme to mitigate network over-subscription in a grid environment.
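
For readers unfamiliar with how a checkpoint interval follows from the dump time and AMTTI, the relationship can be illustrated with a minimal Python sketch. It assumes Young's well-known first-order approximation (interval = sqrt(2 × dump time × AMTTI)) and a simple first-order overhead model; the abstract does not say which formula the study uses, and this sketch is not BeoSim or the study's method. The function names and the numeric values (a 5-minute dump, a 24-hour AMTTI) are hypothetical.

```python
import math


def young_interval(dump_time, amtti):
    """Young's first-order approximation of the optimal checkpoint interval.

    Both arguments are in the same time unit (seconds here).
    Assumption: this formula stands in for whatever the study used.
    """
    return math.sqrt(2.0 * dump_time * amtti)


def overhead_fraction(interval, dump_time, true_amtti):
    """First-order estimate of the fraction of time lost.

    dump_time / interval         : time spent writing checkpoints
    interval / (2 * true_amtti)  : expected rework after an interrupt
    Valid only when dump_time << interval << true_amtti.
    """
    return dump_time / interval + interval / (2.0 * true_amtti)


if __name__ == "__main__":
    dump_time = 300.0        # hypothetical: 5-minute dump
    true_amtti = 86400.0     # hypothetical: 24-hour AMTTI

    # Interval chosen with a correct AMTTI estimate.
    best = young_interval(dump_time, true_amtti)

    # Interval chosen with an AMTTI estimate that is 4x too large.
    skewed = young_interval(dump_time, 4.0 * true_amtti)

    for label, tau in (("correct estimate", best), ("4x overestimate", skewed)):
        loss = overhead_fraction(tau, dump_time, true_amtti)
        print(f"{label}: interval={tau:.0f}s, overhead={loss:.3f}")
```

In this toy model, an estimate that is off by a factor of k stretches the interval by sqrt(k), so the overhead grows only as (sqrt(k) + 1/sqrt(k)) / 2 times its minimum. This is a property of the simplified analytic model and not a result taken from the simulation study.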

Dr. Will Jones is an assistant professor of Computer Science at Coastal Carolina University (CCU) in Myrtle Beach, South Carolina. Before joining CCU, he was an assistant professor of Electrical Engineering at the United States Naval Academy for two years. His research interests include parallel job scheduling and resilience in computational clusters. He earned a Ph.D. in Computer Engineering from Clemson University in 2005.

Host: Nathan DeBardeleben