Introduction

This page presents a brief overview of all conducted experiments as a cross product of hardware platform (including deployment settings) and topology (including quantization and pruning information). Each Machine Learning task has its own table.

Tables

In each table, the rows show the type of hardware platform used for the task (for example, FPGA or GPU), followed by the exact name of each individual platform. The columns correspond to CNN topologies; the final column lists the sweep of deployment parameters (batch, stream, or thread sizes, operating modes, etc.) used during experimentation. When a topology was implemented on a given hardware platform, the corresponding cell shows the precisions (quantization information) and the channel pruning scales, where "*" denotes the cross product of the two lists. Otherwise, "na" indicates that the topology was not executed on that hardware platform; many combinations of topology and hardware platform are simply not supported by the vendors' dedicated software environments. INTx denotes a fixed-point integer representation with x bits, and FPy denotes a floating-point representation with y bits; for example, FP32 is single-precision floating point.
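As a minimal sketch of how the cell notation expands (illustrative Python only, not part of the benchmark code), the snippet below enumerates the experiment configurations implied by a cell such as [INT2, INT4] * [100%, 50%, 25%, 12.5%] combined with a platform's Batch/Stream/Thread sweep:

```python
from itertools import product

# Hypothetical illustration: the "*" in a table cell is a Cartesian product
# of precisions and channel pruning scales; each resulting model variant is
# then swept over the platform's batch/stream/thread sizes.
precisions = ["INT2", "INT4"]             # quantization options
pruning_scales = [1.0, 0.5, 0.25, 0.125]  # 100%, 50%, 25%, 12.5%
batch_sizes = [2, 4, 8, 16, 32, 64, 128]  # e.g. the ZCU104-BISMO sweep

configs = [
    {"precision": p, "pruning": s, "batch": b}
    for p, s, b in product(precisions, pruning_scales, batch_sizes)
]
print(len(configs))  # 2 precisions * 4 pruning scales * 7 batch sizes = 56
```

The tables follow below.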

MNIST

MNIST Classification

| Hardware | Platform | MLP | Batch/Stream/Thread |
| --- | --- | --- | --- |
| FPGA | ZCU102-DPU | na | [1,2,3,4,5,6,7,8] |
| FPGA | ZCU104-DPU | na | [1,2,3,4,5,6,7,8] |
| FPGA | Ultra96-DPU | na | [1,2,3,4,5,6,7,8] |
| FPGA | ZCU104-FINN | [INT2, INT4] * [100%, 50%, 25%, 12.5%] | [1,2,4,8,16,32,64,128,256,512,10000] |
| FPGA | ZCU104-BISMO | [INT2, INT4] * [100%, 50%, 25%, 12.5%] | [2,4,8,16,32,64,128] |
| GPU | TX2-maxn | [FP16, FP32] * [100%, 50%, 25%, 12.5%] | [1,2,4,8,16,32,64,128] |
| GPU | TX2-maxp | [FP16, FP32] * [100%, 50%, 25%, 12.5%] | [1,2,4,8,16,32,64,128] |
| GPU | TX2-maxq | [FP16, FP32] * [100%, 50%, 25%, 12.5%] | [1,2,4,8,16,32,64,128] |
| TPU | TPU-fast clk | na | [1] |
| TPU | TPU-slow clk | na | [1] |
| VLIW | NCS | [FP16] * [100%, 50%, 25%, 12.5%] | [1,2,4,8,16,32,64,128] |
| CPU | U96-Quadcore A53 | [INT2, INT4] * [100%, 50%, 25%, 12.5%] | [2,4,8,16,32,64,128] |

ImageNet

ImageNet Classification

| Hardware | Platform | ResNet50 | GoogLeNetV1 | MobileNet | Batch/Stream/Thread |
| --- | --- | --- | --- | --- | --- |
| FPGA | ZCU102-DPU | [INT8] * [100%, 80%, 50%, 30%] | INT8 | na | [1,2,3,4,5,6,7,8] |
| FPGA | ZCU104-DPU | INT8 | INT8 | na | [1,2,3,4,5,6,7,8] |
| FPGA | Ultra96-DPU | [INT8] * [100%, 80%, 50%, 30%] | INT8 | INT8 | [1,2,3,4,5,6,7,8] |
| FPGA | ZCU104-FINN | na | na | na | [1,2,4,8,16,32,64,128,256,512,10000] |
| FPGA | ZCU104-BISMO | na | na | na | [2,4,8,16,32,64,128] |
| GPU | TX2-maxn | FP16, FP32 | FP16, FP32 | na | [1,2,4,8,16,32,64,128] |
| GPU | TX2-maxp | FP16, FP32 | FP16, FP32 | na | [1,2,4,8,16,32,64,128] |
| GPU | TX2-maxq | FP16, FP32 | FP16, FP32 | na | [1,2,4,8,16,32,64,128] |
| TPU | TPU-fast clk | na | INT8 | INT8 | [1] |
| TPU | TPU-slow clk | na | INT8 | INT8 | [1] |
| VLIW | NCS | FP16 | na | na | [1,2,4,8,16,32,64,128] |
| CPU | U96-Quadcore A53 | na | na | na | [2,4,8,16,32,64,128] |

CIFAR-10

CIFAR-10 Classification

| Hardware | Platform | CNV | Batch/Stream/Thread |
| --- | --- | --- | --- |
| FPGA | ZCU102-DPU | na | [1,2,3,4,5,6,7,8] |
| FPGA | ZCU104-DPU | na | [1,2,3,4,5,6,7,8] |
| FPGA | Ultra96-DPU | na | [1,2,3,4,5,6,7,8] |
| FPGA | ZCU104-FINN | [INT2, INT4] * [100%, 50%, 25%, 12.5%] | [1,2,4,8,16,32,64,128,256,512,10000] |
| FPGA | ZCU104-BISMO | [INT2, INT4] * [100%, 50%, 25%, 12.5%] | [2,4,8,16,32,64,128] |
| GPU | TX2-maxn | [FP16, FP32] * [100%, 50%, 25%, 12.5%] | [1,2,4,8,16,32,64,128] |
| GPU | TX2-maxp | [FP16, FP32] * [100%, 50%, 25%, 12.5%] | [1,2,4,8,16,32,64,128] |
| GPU | TX2-maxq | [FP16, FP32] * [100%, 50%, 25%, 12.5%] | [1,2,4,8,16,32,64,128] |
| TPU | TPU-fast clk | na | [1] |
| TPU | TPU-slow clk | na | [1] |
| VLIW | NCS | [FP16] * [100%, 50%, 25%, 12.5%] | [1,2,4,8,16,32,64,128] |
| CPU | U96-Quadcore A53 | [INT2, INT4] * [100%, 50%, 25%, 12.5%] | [2,4,8,16,32,64,128] |