Poster: Binarized ImageNet Inference in 29us
Event Type: Poster
Registration Categories: TP, EX
Time: Thursday, November 15th, 8:30am - 5pm
Description: In this work we propose a single-FPGA accelerator for ultra-low-latency ImageNet inference. The design completes inference of a binarized AlexNet within 29us, with accuracy comparable to other BNN implementations. We achieve this performance with the following contributions:
1. We completely remove floating-point arithmetic from the normalization layers (NL) through layer fusion (see the sketch after this list).
2. By using model parallelism rather than data parallelism, we can configure all layers and their control-flow graphs simultaneously. The design is also flexible enough to achieve nearly perfect load balancing, leading to extremely high resource utilization.
3. All convolution layers are fused and processed in parallel through inter-layer pipelining. Once the pipeline is full, the latency is therefore just the delay of a single convolution layer plus that of the FC layers. Note that the dependency pattern of the FC layers prevents them from being integrated into the current pipeline.
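The abstract does not spell out the fusion itself; below is a minimal, self-contained C++ sketch of the standard BNN technique it most likely refers to: folding the floating-point batch-norm parameters (gamma, beta, mu, var) together with the sign activation into a single per-channel integer threshold, so the on-chip datapath is a pure integer compare. The function names, parameter values, and the BN formulation z = gamma * (y - mu) / sqrt(var + eps) + beta are illustrative assumptions, not taken from the poster.

```cpp
// bn_fusion.cpp -- a hedged sketch (not the authors' code) of folding
// batch norm + sign activation into one integer threshold, removing
// floating point from the BNN inference datapath.
//
// Assumed BN form: z = gamma * (y - mu) / sqrt(var + eps) + beta,
// followed by the binary activation sign(z), where y is the integer
// popcount accumulator of a binarized convolution.

#include <cassert>
#include <cmath>
#include <cstdio>

struct FusedThreshold {
    int  tau;      // precomputed integer threshold
    bool flipped;  // true when gamma < 0 reverses the comparison
};

// Offline step: fold the float BN parameters into one integer
// threshold per output channel.
FusedThreshold fuse_bn_sign(float gamma, float beta,
                            float mu, float var, float eps = 1e-5f) {
    assert(gamma != 0.0f);
    float sigma = std::sqrt(var + eps);
    // sign(gamma * (y - mu) / sigma + beta) == +1  <=>
    //   y >= mu - beta * sigma / gamma   (gamma > 0)
    //   y <= mu - beta * sigma / gamma   (gamma < 0)
    float t = mu - beta * sigma / gamma;
    FusedThreshold f;
    f.flipped = (gamma < 0.0f);
    f.tau = f.flipped ? (int)std::floor(t) : (int)std::ceil(t);
    return f;
}

// Online (on-chip) step: pure integer compare, no floating point.
inline int binarize(int popcount_acc, const FusedThreshold& f) {
    bool pos = f.flipped ? (popcount_acc <= f.tau)
                         : (popcount_acc >= f.tau);
    return pos ? +1 : -1;
}

int main() {
    FusedThreshold f = fuse_bn_sign(/*gamma=*/0.8f, /*beta=*/-1.2f,
                                    /*mu=*/37.0f, /*var=*/4.0f);
    // Check the fused integer path against the float reference.
    for (int y = 0; y < 100; ++y) {
        float z = 0.8f * (y - 37.0f) / std::sqrt(4.0f + 1e-5f) - 1.2f;
        int ref = (z >= 0.0f) ? +1 : -1;
        assert(binarize(y, f) == ref);
    }
    std::printf("fused threshold tau=%d, flipped=%d\n", f.tau, f.flipped);
    return 0;
}
```

Because the threshold is computed offline, each fused conv + BN + sign stage reduces at run time to a popcount and one integer comparison per output channel, which is consistent with the abstract's claim of removing floating point from the pipeline entirely.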