This paper reports our efforts to perform a 50-m resolution simulation of the Wenchuan earthquake (Ms 8.0, China) on Sunway TaihuLight. To accurately capture the surface topography, we adopt a curvilinear-grid finite-difference method with a traction-image free-surface implementation and redesign the algorithm to reduce memory access costs on heterogeneous many-core architectures. We then derive a performance model of our algorithm and use it to guide further optimization, tuning the various parameters with a genetic algorithm. We also propose a data layout transformation to further improve direct memory access (DMA) efficiency. Together, these efforts improve the simulation efficiency from 0.05% to 7.6% and achieve a sustained performance of 9.07 Pflops using the entire Sunway TaihuLight machine (over 10 million cores), enabling a large-scale simulation of the Wenchuan earthquake with accurate surface topography and improved coda wave effects.
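The abstract does not say which data layout transformation is used, so the following is only a minimal sketch of one common choice: converting an interleaved array-of-structures (AoS) layout into a structure-of-arrays (SoA) layout. The field names (vx, vy, vz), the array sizes, and the helper functions are assumptions made for illustration and are not taken from the paper. The sketch counts how many separate contiguous runs must be read to fetch a single field, a rough proxy for the number of DMA transfers.

```python
# Hypothetical illustration only: field names and sizes are assumptions,
# not the layout used in the paper.
FIELDS = ("vx", "vy", "vz")


def aos_to_soa(aos, n_fields):
    """Split a flat interleaved array [vx0, vy0, vz0, vx1, ...] into
    one contiguous list per field."""
    if len(aos) % n_fields != 0:
        raise ValueError("array length must be a multiple of n_fields")
    return [aos[f::n_fields] for f in range(n_fields)]


def contiguous_runs(indices):
    """Count maximal runs of consecutive indices, a rough proxy for the
    number of separate contiguous transfers needed."""
    runs = 0
    prev = None
    for i in indices:
        if prev is None or i != prev + 1:
            runs += 1
        prev = i
    return runs


n_points = 8
n_fields = len(FIELDS)

# Interleaved (AoS) storage: all fields of point 0, then point 1, and so on.
aos = [float(p * 10 + f) for p in range(n_points) for f in range(n_fields)]

# Transformed (SoA) storage: all vx values, then all vy, then all vz.
soa = aos_to_soa(aos, n_fields)

# Positions touched when reading only vx for every grid point.
aos_indices = [p * n_fields for p in range(n_points)]  # strided access
soa_indices = list(range(n_points))                    # contiguous within soa[0]

aos_runs = contiguous_runs(aos_indices)
soa_runs = contiguous_runs(soa_indices)

print("contiguous runs to read vx (AoS):", aos_runs)
print("contiguous runs to read vx (SoA):", soa_runs)
```

In this toy setting, reading one field from the interleaved layout touches one short run per grid point, while the transformed layout needs a single long run. Fewer, longer contiguous transfers are the usual motivation for this kind of transformation on DMA-based architectures.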