Introduction

This page presents a theoretical analysis of both the hardware platforms and the CNN topologies. To give a general overview of all CNNs and hardware platforms included in our experiments, we provide the following three tables.

Tables

CNNs and Their Accuracy Over All Pruning and Quantization Variants

The table below provides a complete overview of all CNNs included in the experiments, together with their accuracy across all pruning and quantization variants.

All values are top-1 (top-5) accuracy in [%]; nm = not measured.

CNN             Pruning [%]   INT2    INT4    INT8            FP16            FP32
GoogLeNetv1     100           nm      nm      69.24 (88.45)   66.93 (87.83)   66.96 (87.84)
MobileNetv1     100           nm      nm      69.57 (87.71)   nm              nm
EfficientNet-S  100           nm      nm      77              nm              nm
EfficientNet-M  100           nm      nm      78.6            nm              nm
EfficientNet-L  100           nm      nm      80.2            nm              nm
ResNet-50       100           nm      nm      73.29 (91.26)   75.14 (92.12)   75.15 (92.11)
ResNet-50       80            nm      nm      73.30 (91.40)   nm              nm
ResNet-50       50            nm      nm      69.49 (91.00)   nm              nm
ResNet-50       30            nm      nm      68.83 (90.16)   nm              nm
CNV             100           86.86   87.4    nm              87.02           87.06
CNV             50            84.29   84.88   nm              85.55           85.6
CNV             25            79.89   81.09   nm              83.28           83.25
CNV             12.5          73.64   75.85   nm              77.82           77.84
MLP             100           98.75   98.77   nm              97.3            97.31
MLP             50            98.49   98.62   nm              97.45           97.46
MLP             25            98.04   98.29   nm              97.49           97.44
MLP             12.5          96.85   97.54   nm              97.95           97.15

CNNs and Their Compute and Memory Requirements

The next table shows the compute and memory requirements for all CNNs: the total number of operations in Giga-operations ([GOPs]), the model size in millions of weight elements ([ME]), and the operational intensity ([OI]) in operations per byte read from or written to memory.

CNN             Pruning [%]   Total OPs   Model Size   OI (INT2)    OI (INT4)    OI (INT8)    OI (FP16)    OI (FP32)
                              [GOPs]      [ME]         [Ops/Byte]   [Ops/Byte]   [Ops/Byte]   [Ops/Byte]   [Ops/Byte]
GoogLeNetv1     100           3.1         6            2093.97      1046.99      523.49       261.75       130.87
MobileNetv1     100           1.1         4.2          1075.47      537.74       268.87       134.43       67.22
ResNet-50       100           7.7         25.5         1210.84      605.42       302.71       151.36       75.68
ResNet-50       80            6.5         23.7         1086.59      543.3        271.65       135.82       67.91
ResNet-50       50            3.8         15.8         949.85       474.93       237.46       118.73       59.37
ResNet-50       30            2.5         10.1         970.16       485.08       242.54       121.27       60.64
EfficientNet-S  100           4.7         5.4          3481.48      1740.74      870.37       435.18       217.59
EfficientNet-M  100           7.4         6.9          4289.86      2144.93      1072.46      536.23       268.12
EfficientNet-L  100           19.4        10.6         7313.21      3656.6       1828.3       914.15       457.08
CNV             100           0.47        6.16         304.95       152.48       76.24        38.12        19.06
CNV             50            0.12        1.54         308.32       154.16       77.08        38.54        19.27
CNV             25            0.03        0.39         315.01       157.51       78.75        39.38        19.69
CNV             12.5          0.01        0.1          332.61       166.3        83.15        41.58        20.79
MLP             100           0.02        10.01        8            4            2            1            0.5
MLP             50            0.00582     2.91         8            4            2            1            0.5
MLP             25            0.0019      0.93         8            4            2            1            0.5
MLP             12.5          0.0007      0.33         8            4            2            1            0.5
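As a sketch, the OI columns above can be roughly reproduced from the total operations and model size, assuming that memory traffic is dominated by reading each weight element once at the given datatype width (an approximation; small deviations from the table come from rounding of the GOPs and model-size figures):

```python
def operational_intensity(total_ops: float, num_elements: float, bits: int) -> float:
    """OI in operations per byte, assuming memory traffic is dominated by
    reading each weight element once at the given datatype width."""
    bytes_moved = num_elements * bits / 8  # weight traffic scales with width
    return total_ops / bytes_moved

# ResNet-50 (100% pruning factor): 7.7 GOPs, 25.5 M weight elements
for bits in (2, 4, 8, 16, 32):
    oi = operational_intensity(7.7e9, 25.5e6, bits)
    print(f"{bits}-bit: {oi:.2f} Ops/Byte")
```

This also shows why OI doubles each time the datatype width is halved: the operation count is unchanged while the bytes moved shrink proportionally.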

We created interactive bar charts to better illustrate the compute and memory requirements of all CNNs from the previous table.

Hardware Platforms

The table below summarizes all included hardware platforms, each with its peak performance for the different datatypes (INTx, FPx), its memory bandwidth and memory capacity, as well as its thermal design power.

Hardware Platform        INT2       INT4       INT8       FP16       FP32       Memory Bandwidth   Memory Capacity   Power
                         [TOP/sec]  [TOP/sec]  [TOP/sec]  [TOP/sec]  [TOP/sec]  [GBps]             [GB]              [Watt]
Ultra96-DPU              na         na         0.96       na         na         4.26               2                 na
ZCU104-DPU               na         na         4.6        na         na         19.2               4                 na
ZCU102-DPU               na         na         6.71       na         na         19.2               4                 na
ZCU104-FINN              30.7       8.8        na         na         na         19.2               4                 na
ZCU104-BISMO             30.7       8.8        na         na         na         19.2               4                 na
TX2 - maxn               na         na         na         1.33       0.67       59.7               8                 15
TX2 - maxp               na         na         na         1.15       0.57       59.7               8                 15
TX2 - maxq               na         na         na         0.87       0.44       59.7               8                 15
EdgeTPU-fast             na         na         4          na         na         25.6               1                 2
EdgeTPU-slow             na         na         2          na         na         25.6               1                 2
NCS (MyriadX)            na         na         1          0.5        na         12.8               2                 1
U96-Quadcore A53-INT8    0.192      0.192      0.192      na         na         4.26               2                 na

To better illustrate the hardware platforms' peak performance and memory bandwidth, an interactive bar chart can be found below. Note that only the performance for natively supported datatypes is shown.

Overview of Theoretical Evaluation

Rooflines for All Hardware Platforms and CNNs

Combining application requirements with hardware platform characteristics enables performance predictions using UC Berkeley's roofline model. Assumptions about where a neural network's weights, activation tensors, and state are stored, combined with the sizes of the datatypes used, allow us to derive the arithmetic intensity of the network during inference. Combined with the roofline of a given hardware platform, this provides insight into whether a neural network will be memory or compute bound, as well as guidance on the theoretically achievable throughput.

*Applies to the following pruning factors: 100%, 50%, 25%, and 12.5%
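As a minimal sketch of this roofline logic (not the exact evaluation code; the peak performance and memory bandwidth figures are taken from the hardware table above):

```python
def attainable_perf(peak_ops: float, mem_bw_bytes: float, oi: float) -> float:
    """Roofline model: attainable performance [Op/s] is capped by either
    the compute roof (peak_ops) or the memory roof (mem_bw_bytes * OI)."""
    return min(peak_ops, mem_bw_bytes * oi)

# ZCU104-DPU at INT8: 4.6 TOP/sec peak, 19.2 GBps memory bandwidth
peak, bw = 4.6e12, 19.2e9
for name, oi in [("ResNet-50 100% (INT8)", 302.71), ("MLP 100% (INT8)", 2.0)]:
    perf = attainable_perf(peak, bw, oi)
    bound = "compute" if perf >= peak else "memory"
    print(f"{name}: {perf / 1e12:.3f} TOP/sec, {bound} bound")
```

On this platform, ResNet-50 sits to the right of the roofline's ridge point and is compute bound at the full 4.6 TOP/sec, while the MLP's very low OI leaves it memory bound at only about 0.038 TOP/sec.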

Performance Prediction

The following heatmaps show the theoretical performance of the listed hardware platforms across the machine learning tasks MNIST, ImageNet, and CIFAR-10. The metric used for theoretical performance is inputs per second.
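As a sketch of how such a theoretical inputs-per-second figure can be derived, the roofline-attainable performance is divided by the per-input workload (platform and network numbers taken from the tables above):

```python
def predicted_inputs_per_sec(peak_ops: float, mem_bw: float,
                             oi: float, ops_per_input: float) -> float:
    """Theoretical throughput: roofline-attainable Op/s divided by the
    number of operations required to process one input."""
    return min(peak_ops, mem_bw * oi) / ops_per_input

# ZCU104-DPU at INT8 running ResNet-50 100%: 7.7 GOPs per input, OI = 302.71
print(predicted_inputs_per_sec(4.6e12, 19.2e9, 302.71, 7.7e9))  # ≈ 597 inputs/second
```

Since this is a compute-bound case, the prediction is simply peak performance over the per-input operation count; for memory-bound cases the memory roof takes over.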

MNIST

For MNIST, the combination of quantization and pruning delivers some of the best performance results.

ImageNet

For ImageNet, the combination of quantization and pruning likewise delivers some of the best performance results.

CIFAR-10

Finally, for CIFAR-10, the combination of quantization and pruning again delivers some of the best performance results.

Theoretical Pareto Curves

In the following plots, we present a theoretical Pareto curve for each classification task.
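Such a curve can be computed as a sketch: from the set of (throughput, accuracy) points of all design variants, keep only those that no other point dominates in both dimensions. The design points below are hypothetical:

```python
def pareto_front(points):
    """Return the (throughput, accuracy) points not dominated by any other
    point, where higher is better in both dimensions."""
    return sorted(
        p for p in points
        if not any(q != p and q[0] >= p[0] and q[1] >= p[1] for q in points)
    )

# Hypothetical (inputs/second, top-1 accuracy [%]) design points
pts = [(100, 75.1), (400, 73.3), (400, 69.5), (50, 75.2)]
print(pareto_front(pts))  # [(50, 75.2), (100, 75.1), (400, 73.3)]
```

The point (400, 69.5) drops out because (400, 73.3) matches its throughput at higher accuracy; the remaining points trace the theoretical accuracy-throughput trade-off.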

MNIST

ImageNet

CIFAR-10