DescriptionOperating systems have historically had to manage only a single type of memory device. The imminent availability of heterogeneous memory devices based on emerging memory technologies confronts the classic single memory model and opens a new spectrum of possibilities for memory management. Transparent data movement based on access patterns of applications is a desired feature to hide the complexity of memory management to end users. However, capturing memory access patterns of an application at runtime comes at a cost, which is particularly challenging for large scale parallel applications that may be sensitive to system noise.In this work, we focus on the access pattern profiling. We study the feasibility of using Intel’s Processor Event Based Sampling (PEBS) feature to record memory accesses by sampling at runtime and study the overhead at scale. We have implemented a custom PEBS driver in the IHK/McKernel lightweight multi-kernel operating system, one of whose advantages is minimal system interference due to the lightweight kernel’s simple design compared to other OS kernels such as Linux. We present the PEBS overhead of a set of scientific applications and show the access patterns identified in noise sensitive HPC applications. Our results show that clear access patterns can be captured with 10% overhead in the worst case when running on up to 128k CPU cores (2,048 Intel Xeon Phi Knights Landing nodes). We conclude that online memory access profiling using PEBS at large scale is promising for memory management in heterogeneous memory environments.