In Situ Performance Analysis of Event-driven Simulations to Support the Codesign of Extreme-Scale Systems
Time: Sunday, November 11th, 2:42pm - 2:45pm
Description: Parallel discrete event simulation (PDES) is a productive and cost-effective tool for exploring the design space of high performance computing (HPC) system architectures. In discrete event simulation, the modeled entities (e.g., network routers) interact by exchanging timestamped messages. In a parallel simulation, a synchronization protocol ensures the causal correctness of the simulation. PDES synchronization protocols are classified as either conservative or optimistic. Conservative methods process an event only when it is safe to do so, while optimistic methods process events speculatively and provide mechanisms to detect and recover from out-of-order events. Because optimistic protocols can better exploit the parallelism inherent in models, optimistic simulations tend to scale better than conservative ones. However, tuning an optimistic simulation to minimize unproductive work (i.e., rolling back the simulation state to fix causality errors) is not a trivial task. The factors that affect optimistic performance span multiple levels of the simulation, from the physical hardware running it, to the communication between MPI ranks, to the characteristics of the simulated model itself. The interplay of these factors is hard to understand, which makes it difficult to use optimistic PDES efficiently, especially for simulation users such as network architects.
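To make the idea of speculative processing and rollback concrete, the following is a minimal sketch of one optimistic logical process, assuming a Time Warp style scheme with state saving. It is illustrative only: the class and field names (`OptimisticLP`, `Event`, `rolled_back`) and the toy model state (a running sum of payloads) are hypothetical and not taken from the authors' simulator, and it omits anti-messages, GVT computation, and fossil collection. The `rolled_back` counter stands in for the "unproductive work" that an optimistic simulation tries to minimize.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Event:
    timestamp: float
    payload: int = field(compare=False)  # hypothetical event data


class OptimisticLP:
    """One logical process (e.g., a modeled router) that processes events
    speculatively and rolls back when a straggler arrives. Hypothetical sketch."""

    def __init__(self):
        self.state = 0           # toy model state: running sum of payloads
        self.lvt = 0.0           # local virtual time (last processed timestamp)
        self.history = []        # (event, state_before) for every processed event
        self.pending = []        # min-heap of events not yet processed
        self.rolled_back = 0     # events undone so far (unproductive work)

    def receive(self, event):
        # A straggler has a timestamp earlier than work already done: roll back.
        if event.timestamp < self.lvt:
            self._rollback(event.timestamp)
        heapq.heappush(self.pending, event)

    def _rollback(self, to_time):
        # Undo processed events later than the straggler, restoring saved
        # state and re-queueing the undone events for re-execution.
        while self.history and self.history[-1][0].timestamp > to_time:
            event, state_before = self.history.pop()
            self.state = state_before
            heapq.heappush(self.pending, event)
            self.rolled_back += 1
        self.lvt = self.history[-1][0].timestamp if self.history else 0.0

    def process_all(self):
        # Speculatively execute every available event in timestamp order,
        # saving the pre-event state so the event can be undone later.
        while self.pending:
            event = heapq.heappop(self.pending)
            self.history.append((event, self.state))
            self.state += event.payload
            self.lvt = event.timestamp


if __name__ == "__main__":
    lp = OptimisticLP()
    for ts, p in [(1.0, 10), (3.0, 20), (5.0, 30)]:
        lp.receive(Event(ts, p))
    lp.process_all()               # speculatively reaches time 5.0
    lp.receive(Event(2.0, 5))      # straggler: undoes the events at 3.0 and 5.0
    lp.process_all()
    print(lp.state, lp.rolled_back, lp.lvt)
```

In this example the final state matches what a sequential, in-order execution would produce, which is the causal-correctness guarantee; the cost is that two already-executed events had to be undone and re-executed.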