Training Speech Recognition Models on HPC Infrastructure
Time: Sunday, November 11th, 5pm - 5:30pm
Description: Automatic speech recognition is used extensively in speech interfaces and spoken dialogue systems. To accelerate the development of new speech recognition models and techniques, developers at Mozilla have open-sourced a deep-learning-based speech-to-text engine, project DeepSpeech, built on Baidu’s DeepSpeech research. To shorten training time on CPUs, we have developed optimizations to the Mozilla DeepSpeech code that scale model training to a large number of Intel® CPU-based systems, including integrating Horovod into DeepSpeech for distributed training. We have also implemented a novel dataset partitioning scheme that mitigates compute imbalance across the nodes of an HPC cluster. We demonstrate that we can train the DeepSpeech model on the LibriSpeech clean dataset to its state-of-the-art accuracy in 6.45 hours on a 16-node Intel® Xeon® based HPC cluster.
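The dataset partitioning idea above can be sketched in a few lines. The abstract does not specify the exact scheme, so the greedy longest-first assignment below (balancing per-node workload by total audio duration, a standard load-balancing heuristic) is an illustrative assumption, not the method presented in the talk; the function name and example durations are hypothetical.

```python
# Hedged sketch: balance utterances across nodes so that each node's
# compute load (approximated by total audio seconds) is roughly equal.
# This greedy longest-processing-time heuristic is an assumption for
# illustration, not the talk's actual partitioning scheme.

def partition_by_duration(durations, num_nodes):
    """Assign each utterance index to the currently least-loaded node."""
    shards = [[] for _ in range(num_nodes)]   # utterance indices per node
    loads = [0.0] * num_nodes                 # total seconds per node
    # Place the longest utterances first, so the short ones placed last
    # can fine-tune the balance between nodes.
    for idx in sorted(range(len(durations)), key=lambda i: -durations[i]):
        node = loads.index(min(loads))        # pick the least-loaded node
        shards[node].append(idx)
        loads[node] += durations[idx]
    return shards, loads

# Example: eight utterances of varying length split across four nodes.
durations = [12.0, 3.5, 7.2, 9.9, 4.1, 6.6, 2.0, 8.8]
shards, loads = partition_by_duration(durations, 4)
```

Without such balancing, a node that happens to receive mostly long utterances becomes a straggler and stalls every synchronous gradient exchange; equalizing per-node audio duration keeps all workers finishing their steps at roughly the same time.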