<span class="var-sub_title">Binarized ImageNet Inference in 29us</span> SC18 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Binarized ImageNet Inference in 29us


Authors: Tong Geng (Boston University, Pacific Northwest National Laboratory), Ang Li (Pacific Northwest National Laboratory), Tianqi Wang (Boston University), Shuaiwen Leon Song (Pacific Northwest National Laboratory), Martin Herbordt (Boston University)

Abstract: In this work we propose a single-FPGA accelerator for ultra-low-latency ImageNet inference. The design completes inference of binarized AlexNet within 29us, with accuracy comparable to other BNN implementations. We achieve this performance with the following contributions: 1. We completely remove floating point from the network through layer fusion. 2. By using model parallelism rather than data parallelism, we can configure all layers and their control-flow graphs simultaneously; the design is also flexible enough to achieve nearly perfect load balancing, leading to extremely high resource utilization. 3. All convolution layers are fused and processed in parallel through inter-layer pipelining, so once the pipeline is full, latency is just the delay of a single convolution layer plus that of the FC layers. Note that the dependency pattern of the FC layers prevents them from being integrated into the convolution pipeline.
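The abstract's central claim, removing all floating point through layer fusion, follows a technique common in BNN accelerators: batch normalization followed by the sign activation can be precomputed offline into one integer threshold per output neuron, and the binarized multiply-accumulate itself reduces to XNOR plus popcount. The C sketch below illustrates both ideas; it is a minimal software analogue under those assumptions, not the authors' FPGA design, and all identifiers (bdot64, fused_layer64, thresh) are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Portable popcount over a 64-bit word. */
static int popcount64(uint64_t x) {
    int n = 0;
    while (x) { x &= x - 1; n++; }
    return n;
}

/* Binarized dot product over 64 inputs packed one bit each
 * (bit 1 = +1, bit 0 = -1): multiply-accumulate collapses to
 * XNOR + popcount, so sum = 2 * (#matching bits) - 64. */
static int bdot64(uint64_t acts, uint64_t wts) {
    return 2 * popcount64(~(acts ^ wts)) - 64;
}

/* Layer fusion: batch normalization plus the sign activation fold
 * into one precomputed integer threshold per output neuron, so the
 * inference path needs no floating point at all. */
static uint64_t fused_layer64(uint64_t acts, const uint64_t wts[64],
                              const int thresh[64]) {
    uint64_t out = 0;
    for (int o = 0; o < 64; o++)          /* 64 output neurons */
        if (bdot64(acts, wts[o]) >= thresh[o])
            out |= 1ULL << o;
    return out;
}

int main(void) {
    uint64_t acts = 0xF0F0F0F0F0F0F0F0ULL;  /* packed +1/-1 activations */
    uint64_t wts[64];
    int thresh[64];
    for (int o = 0; o < 64; o++) { wts[o] = acts; thresh[o] = 0; }
    /* Every neuron sees perfectly matching weights, so every bit fires. */
    printf("%016llx\n", (unsigned long long)fused_layer64(acts, wts, thresh));
    return 0;
}
```

In a pipelined design like the one the abstract describes, a stage of this kind would be instantiated per layer, which is what lets steady-state latency shrink to roughly one convolution stage plus the FC layers.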

Best Poster Finalist (BP): no


