Theoretical Analysis
Theoretical Analysis (Level 0) for all CNN topologies and hardware platforms
- Introduction
- Tables
- Rooflines for All Hardware Platforms and CNNs
- Performance Prediction
- Theoretical Pareto Curves
This page presents a theoretical analysis of both the hardware platforms and the CNN topologies. To give a general overview of all CNNs and hardware platforms included in our experiments, we present the following three tables.
The table below provides a complete overview of all CNNs included in the experiments, together with their accuracy across all pruning and quantization variants.
The next table shows the compute and memory requirements for all CNNs: the number of operations ([GOPs]), the model size ([ME]), and the operational intensity ([OI]) in operations per byte read or written from memory.
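Operational intensity is simply total compute divided by total memory traffic. The following is a minimal sketch of that computation; the function name and the example figures are illustrative assumptions, not values taken from the table.

```python
# Minimal sketch of the operational intensity (OI) metric used in the
# table above: operations per byte read or written from memory.
# All example values below are assumed, not taken from the table.

def operational_intensity(total_ops: float, bytes_moved: float) -> float:
    """OI in ops/byte: total compute divided by total memory traffic."""
    return total_ops / bytes_moved

# Example: a CNN with 3.9 GOPs per inference whose weights and activations
# cause roughly 0.1 GB of memory traffic per inference (assumed figure).
oi = operational_intensity(total_ops=3.9e9, bytes_moved=0.1e9)
print(f"OI = {oi:.1f} ops/byte")
```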
To make the compute and memory requirements from the previous table easier to explore, we also provide them as interactive bar charts.
The table below summarizes all included hardware platforms, each with its peak performance for different datatypes (INTx, FPx), its memory bandwidth, its memory capacity, and its thermal design power (TDP).
To better illustrate the hardware platforms' peak performance and memory bandwidth, an interactive bar chart can be found below. Please note that performance is shown only for natively supported datatypes.
Combining application requirements with hardware platform characteristics enables performance prediction using UC Berkeley's roofline model. Assumptions about where a neural network's weights, activation tensors, and state are stored, combined with the sizes of the datatypes used, allow us to derive the arithmetic intensity of the network during inference. Together with the roofline of a given hardware platform, this tells us whether a neural network will be memory bound or compute bound, and gives an upper bound on its theoretically achievable throughput.
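The sketch below shows how such a prediction can be derived from the roofline model: attainable performance is capped by either peak compute or by memory bandwidth times operational intensity, whichever is lower. The platform and network figures in the example are illustrative assumptions, not measurements of any platform in the tables.

```python
# Hedged sketch of a roofline-based throughput prediction, following the
# reasoning in the paragraph above. All numbers are assumed examples.

def attainable_performance(peak_ops: float, mem_bw: float, oi: float) -> float:
    """Roofline: min of peak compute (ops/s) and memory bandwidth (bytes/s)
    times operational intensity (ops/byte)."""
    return min(peak_ops, mem_bw * oi)

def predicted_throughput(peak_ops: float, mem_bw: float,
                         oi: float, ops_per_input: float) -> float:
    """Theoretical inputs/second: attainable ops/s over ops per inference."""
    return attainable_performance(peak_ops, mem_bw, oi) / ops_per_input

# Example: a platform with 10 TOP/s peak (INT8) and 100 GB/s memory
# bandwidth, running a CNN with OI = 25 ops/byte and 3.9 GOPs per input.
perf = attainable_performance(peak_ops=10e12, mem_bw=100e9, oi=25.0)
bound = "memory" if perf < 10e12 else "compute"
print(f"{bound} bound, "
      f"{predicted_throughput(10e12, 100e9, 25.0, 3.9e9):.0f} inputs/s")
```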
*Applies to the following pruning factors: 100%, 50%, 25%, and 12.5%
The following heatmaps show the theoretical performance of the listed hardware platforms across the machine learning tasks MNIST, ImageNet, and CIFAR-10. The metric used for theoretical performance is inputs/second.
For MNIST, quantization combined with pruning delivers some of the best performance results.
For ImageNet, quantization combined with pruning likewise delivers some of the best performance results.
Finally, for CIFAR-10, quantization combined with pruning again delivers some of the best performance results.
In the following plots we present a theoretical Pareto curve for each classification task.
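For reference, a Pareto curve keeps only the design points that are not dominated in both accuracy and throughput. Below is a minimal sketch of how such a frontier can be extracted; the data points are hypothetical, while the actual curves are derived from the tables and rooflines above.

```python
# Minimal sketch of extracting a theoretical Pareto front from
# (accuracy, predicted throughput) pairs. The design points below are
# hypothetical examples, not results from the plots.

def pareto_front(points):
    """Return the points not dominated in both accuracy and throughput."""
    front = []
    for acc, thr in points:
        dominated = any(a >= acc and t >= thr and (a, t) != (acc, thr)
                        for a, t in points)
        if not dominated:
            front.append((acc, thr))
    return sorted(front)

# Hypothetical design points: (top-1 accuracy [%], throughput [inputs/s])
designs = [(70.1, 500.0), (68.3, 1200.0), (71.0, 300.0), (66.0, 900.0)]
print(pareto_front(designs))  # (66.0, 900.0) is dominated and dropped
```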