Introduction

This page presents a brief overview of all conducted experiments as a cross product of hardware platform (including deployment settings) and topology (including quantization and pruning information). Each Machine Learning task has its own table.

Tables

In each table, the rows show the type of hardware platform used for the task (for example, FPGA or GPU), followed by the exact name of each individual platform. The columns correspond to CNN topologies; the final column lists the sweep of deployment parameters (batch, stream, or thread sizes, operating modes, etc.) used during experimentation. When a topology was implemented on a given hardware platform, the corresponding cell shows the precisions (quantization information) and the channel pruning scales, where "*" denotes the cross product of the two lists. Otherwise, "na" indicates that the topology was not executed on that hardware platform; many combinations of topology and hardware platform are simply not supported by the vendors' dedicated software environments. INTx denotes a fixed-point integer representation with x bits, and FPy denotes a floating-point representation with y bits; for example, FP32 is single-precision floating point.
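As a minimal sketch of how the cell notation expands (illustrative Python only, not part of the benchmark code), the snippet below enumerates the experiment configurations implied by a cell such as [INT2, INT4] * [100%, 50%, 25%, 12.5%] combined with a platform's Batch/Stream/Thread sweep:

```python
from itertools import product

# Hypothetical illustration: the "*" in a table cell is a Cartesian product
# of precisions and channel pruning scales; each resulting model variant is
# then swept over the platform's batch/stream/thread sizes.
precisions = ["INT2", "INT4"]             # quantization options
pruning_scales = [1.0, 0.5, 0.25, 0.125]  # 100%, 50%, 25%, 12.5%
batch_sizes = [2, 4, 8, 16, 32, 64, 128]  # e.g. the ZCU104-BISMO sweep

configs = [
    {"precision": p, "pruning": s, "batch": b}
    for p, s, b in product(precisions, pruning_scales, batch_sizes)
]
print(len(configs))  # 2 precisions * 4 pruning scales * 7 batch sizes = 56
```

The tables follow below.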

MNIST

MNIST Classification

| Hardware | Platform | MLP | Batch/Stream/Thread |
| --- | --- | --- | --- |
| FPGA | ZCU102-DPU | na | [1,2,3,4,5,6,7,8] |
| FPGA | ZCU104-DPU | na | [1,2,3,4,5,6,7,8] |
| FPGA | Ultra96-DPU | na | [1,2,3,4,5,6,7,8] |
| FPGA | ZCU104-FINN | [INT2, INT4] * [100%, 50%, 25%, 12.5%] | [1,2,4,8,16,32,64,128,256,512,10000] |
| FPGA | ZCU104-BISMO | [INT2, INT4] * [100%, 50%, 25%, 12.5%] | [2,4,8,16,32,64,128] |
| GPU | TX2-maxn | [FP16, FP32] * [100%, 50%, 25%, 12.5%] | [1,2,4,8,16,32,64,128] |
| GPU | TX2-maxp | [FP16, FP32] * [100%, 50%, 25%, 12.5%] | [1,2,4,8,16,32,64,128] |
| GPU | TX2-maxq | [FP16, FP32] * [100%, 50%, 25%, 12.5%] | [1,2,4,8,16,32,64,128] |
| TPU | TPU-fast clk | na | [1] |
| TPU | TPU-slow clk | na | [1] |
| VLIW | NCS | [FP16] * [100%, 50%, 25%, 12.5%] | [1,2,4,8,16,32,64,128] |
| CPU | U96-Quadcore A53 | [INT2, INT4] * [100%, 50%, 25%, 12.5%] | [2,4,8,16,32,64,128] |

ImageNet

ImageNet Classification

| Hardware | Platform | ResNet50 | GoogLeNetV1 | MobileNet | Batch/Stream/Thread |
| --- | --- | --- | --- | --- | --- |
| FPGA | ZCU102-DPU | [INT8] * [100%, 80%, 50%, 30%] | INT8 | na | [1,2,3,4,5,6,7,8] |
| FPGA | ZCU104-DPU | INT8 | INT8 | na | [1,2,3,4,5,6,7,8] |
| FPGA | Ultra96-DPU | [INT8] * [100%, 80%, 50%, 30%] | INT8 | INT8 | [1,2,3,4,5,6,7,8] |
| FPGA | ZCU104-FINN | na | na | na | [1,2,4,8,16,32,64,128,256,512,10000] |
| FPGA | ZCU104-BISMO | na | na | na | [2,4,8,16,32,64,128] |
| GPU | TX2-maxn | FP16, FP32 | FP16, FP32 | na | [1,2,4,8,16,32,64,128] |
| GPU | TX2-maxp | FP16, FP32 | FP16, FP32 | na | [1,2,4,8,16,32,64,128] |
| GPU | TX2-maxq | FP16, FP32 | FP16, FP32 | na | [1,2,4,8,16,32,64,128] |
| TPU | TPU-fast clk | na | INT8 | INT8 | [1] |
| TPU | TPU-slow clk | na | INT8 | INT8 | [1] |
| VLIW | NCS | FP16 | na | na | [1,2,4,8,16,32,64,128] |
| CPU | U96-Quadcore A53 | na | na | na | [2,4,8,16,32,64,128] |

CIFAR-10

CIFAR-10 Classification

| Hardware | Platform | CNV | Batch/Stream/Thread |
| --- | --- | --- | --- |
| FPGA | ZCU102-DPU | na | [1,2,3,4,5,6,7,8] |
| FPGA | ZCU104-DPU | na | [1,2,3,4,5,6,7,8] |
| FPGA | Ultra96-DPU | na | [1,2,3,4,5,6,7,8] |
| FPGA | ZCU104-FINN | [INT2, INT4] * [100%, 50%, 25%, 12.5%] | [1,2,4,8,16,32,64,128,256,512,10000] |
| FPGA | ZCU104-BISMO | [INT2, INT4] * [100%, 50%, 25%, 12.5%] | [2,4,8,16,32,64,128] |
| GPU | TX2-maxn | [FP16, FP32] * [100%, 50%, 25%, 12.5%] | [1,2,4,8,16,32,64,128] |
| GPU | TX2-maxp | [FP16, FP32] * [100%, 50%, 25%, 12.5%] | [1,2,4,8,16,32,64,128] |
| GPU | TX2-maxq | [FP16, FP32] * [100%, 50%, 25%, 12.5%] | [1,2,4,8,16,32,64,128] |
| TPU | TPU-fast clk | na | [1] |
| TPU | TPU-slow clk | na | [1] |
| VLIW | NCS | [FP16] * [100%, 50%, 25%, 12.5%] | [1,2,4,8,16,32,64,128] |
| CPU | U96-Quadcore A53 | [INT2, INT4] * [100%, 50%, 25%, 12.5%] | [2,4,8,16,32,64,128] |