New levels of accuracy in computer vision, from image recognition and detection to image generation with GANs, have been achieved by increasing the size of trained models. Fast turn-around times while iterating on the design of such models would greatly improve the rate of progress in this new era of computer vision.
This tutorial will describe techniques that utilize half-precision floating point representations to allow deep learning practitioners to accelerate the training of large deep networks while also reducing memory requirements.
The talks and sessions below provide a deep dive into the available software packages that enable easy conversion of models to mixed precision training, practical application examples, tricks of the trade (mixed precision arithmetic, loss scaling, etc.), and considerations relevant to training many popular models in commonly used deep learning frameworks, including PyTorch and TensorFlow.
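As a concrete illustration of the workflow the sessions cover, the sketch below shows mixed precision training with loss scaling using PyTorch's `torch.cuda.amp` API. The model, data, and hyperparameters are placeholders for illustration only; they are not taken from any of the talks.

```python
# Minimal sketch of automatic mixed precision (AMP) training in PyTorch.
# The model, data, and hyperparameters are placeholders.
import torch
import torch.nn as nn

model = nn.Linear(1024, 10).cuda()                  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()                # handles dynamic loss scaling

for _ in range(100):                                # dummy training loop
    inputs = torch.randn(32, 1024, device="cuda")   # synthetic batch
    targets = torch.randint(0, 10, (32,), device="cuda")

    optimizer.zero_grad()
    with torch.cuda.amp.autocast():                 # run the forward pass in mixed precision
        outputs = model(inputs)
        loss = criterion(outputs, targets)

    scaler.scale(loss).backward()                   # scale the loss to avoid FP16 gradient underflow
    scaler.step(optimizer)                          # unscale gradients, then take the optimizer step
    scaler.update()                                 # adjust the loss scale for the next iteration
```

The key ingredients, mixed precision arithmetic via `autocast` and dynamic loss scaling via `GradScaler`, are the techniques discussed in the talks; frameworks such as TensorFlow expose analogous mechanisms.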
| Topic | Video | Speaker |
|---|---|---|
| Introduction | video | Arun Mallya |
| Basics and Fundamentals | | |
| Training Neural Networks with Tensor Cores | video | Dusan Stosic |
| Code Optimization Tricks | | |
| PyTorch Performance Tuning Guide | video | Szymon Migacz |
| Application Case Studies | | |
| Mixed Precision Training for Conditional GANs | video | Ming-Yu Liu |
| Mixed Precision Training for FAZE: Few-shot Adaptive Gaze Estimation | video | Shalini De Mello |
| Mixed Precision Training for Video Synthesis | video | Ting-Chun Wang |
| Mixed Precision Training for Convolutional Tensor-Train LSTM | video | Wonmin Byeon |
| Mixed Precision Training for 3D Medical Image Analysis | video | Dong Yang |
Speakers: Dusan Stosic, Paulius Micikevicius, Szymon Migacz, Ming-Yu Liu, Shalini De Mello, Ting-Chun Wang, Wonmin Byeon, Dong Yang, and Arun Mallya.