Authors:
Abstract: This poster presents CNNForward, a fast inference library for the ARM architecture. CNNForward improves the efficiency of convolutional neural network inference on ARM-based multi-core and many-core architectures through both mathematical formula reconstruction/simplification and in-depth NEON instruction optimization. Experimental results show that, for VGG-16 forward computation on a server with 64 ARM A72 cores, CNNForward scales up to 32 cores with a parallel efficiency of 33% and achieves 35.4x, 8.7x, and 10.6x speedups over Caffe+OpenBLAS, Caffe2+Eigen, and Caffe2+NNPACK, respectively.
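As a side note on the reported scaling figures: under the standard definition of parallel efficiency (speedup divided by core count), a 33% efficiency at 32 cores corresponds to roughly a 10.6x self-speedup over a single core. A minimal sketch of this arithmetic, assuming that standard definition:

```python
def parallel_efficiency(speedup: float, n_cores: int) -> float:
    """Standard definition: parallel efficiency = speedup / number of cores."""
    return speedup / n_cores

# 33% efficiency at 32 cores implies a self-speedup of about 0.33 * 32 = 10.56x
implied_speedup = 0.33 * 32
print(round(implied_speedup, 2))                    # implied single-core speedup
print(round(parallel_efficiency(implied_speedup, 32), 2))  # recovers the 0.33 efficiency
```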
Best Poster Finalist (BP): no