Description

We present our recent optimizations of the ultra-soft pseudo-potential (USPP) code path of the ab initio molecular dynamics program CPMD (www.cpmd.org). Following the internal instrumentation of CPMD, all relevant USPP routines have been revised to fully support hybrid MPI+OpenMP parallelization. For two time-critical routines, namely the multiple distributed 3D FFTs of the electronic states and a key distributed matrix-matrix multiplication, we have implemented hybrid parallel algorithms that overlap computation and communication. The improvements in performance and scalability are demonstrated on a small reference system of 128 water molecules and on further systems of increasing size. Performance evaluation shows gains of up to one order of magnitude and around 50% of peak performance for simulation systems routinely used in production.