Full-Stack, GPU-based Acceleration of Deep Learning




This tutorial describes techniques that allow deep learning practitioners to accelerate the training and inference of large deep networks, while also reducing memory requirements, across a spectrum of off-the-shelf hardware for important applications such as autonomous driving and large language models. Topics include, but are not limited to:
  • Deep learning specialized hardware overview. We review the architecture of the most widely used deep learning acceleration hardware, including the main computational processors and memory modules.
  • How deep learning is performed on this hardware. We cover algorithmic intensity and give an overview of the theoretical aspects of computing. Attendees will learn how to estimate processing time and latency by looking only at hardware specs and the network architecture.
  • Best practices for acceleration. We provide an overview of best practices for designing efficient neural networks, including channel number selection, compute-heavy operations, and reduction operations, among others.
  • Existing tools for model acceleration. In this part we focus on existing tools to accelerate a trained neural network on GPU devices. In particular, we discuss operation folding, TensorRT, ONNX graph optimization, and sparsity.
  • Research overview of recent techniques. In the last part, we focus on recent advanced techniques for post-training model optimization, including pruning, quantization, model distillation, and neural architecture search (NAS), among others.
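
As a rough illustration of the second topic, estimating processing time from hardware specs and the network architecture, the sketch below applies a roofline-style estimate to a single fully connected layer. It is not taken from the tutorial materials. It assumes that "algorithmic intensity" refers to the usual ratio of floating-point operations to bytes moved, that each tensor is read or written exactly once, and that the hardware reaches its peak numbers. The `Hardware` values are hypothetical placeholders, not the specs of any real GPU, and all function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Hardware:
    peak_flops: float     # peak arithmetic throughput, FLOP/s (hypothetical)
    mem_bandwidth: float  # peak memory bandwidth, bytes/s (hypothetical)


def linear_layer_cost(batch, in_features, out_features, bytes_per_elem=2):
    """Return (flops, bytes_moved) for y = x @ W without bias.

    Assumes every tensor (input, weights, output) crosses the memory
    bus exactly once, which ignores caching and tiling effects.
    """
    flops = 2 * batch * in_features * out_features  # one multiply + one add
    elements = (batch * in_features            # input activations
                + in_features * out_features   # weights
                + batch * out_features)        # output activations
    return flops, bytes_per_elem * elements


def estimate_time(flops, bytes_moved, hw):
    """Lower-bound runtime in seconds: the slower of compute and memory."""
    return max(flops / hw.peak_flops, bytes_moved / hw.mem_bandwidth)


def is_compute_bound(flops, bytes_moved, hw):
    """True if FLOPs per byte is at least the hardware's FLOP/s-to-bandwidth ratio."""
    return flops / bytes_moved >= hw.peak_flops / hw.mem_bandwidth


# Hypothetical accelerator: 100 TFLOP/s and 1 TB/s.
hw = Hardware(peak_flops=100e12, mem_bandwidth=1e12)

for batch in (1, 4096):
    flops, moved = linear_layer_cost(batch, 4096, 4096)
    bound = "compute" if is_compute_bound(flops, moved, hw) else "memory"
    print(f"batch={batch}: {estimate_time(flops, moved, hw):.2e} s ({bound}-bound)")
```

Under these assumptions the same layer is memory-bound at batch size 1 and compute-bound at batch size 4096, which is why batch size and channel counts matter when designing for a given device. Measured latency will generally be higher than this lower bound.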


13:30 - 13:35   Opening Remarks
13:35 - 14:15   Jason Clemons: Foundations of Deep Learning hardware.
14:15 - 15:00   Pavlo Molchanov: DNN Performance optimization: How to achieve more with less cost, software perspective.
15:00 - 15:30   Coffee Break
15:30 - 16:15   Maying Shen: Sparsity in DNN and model compression.
16:15 - 17:00   Hongxu (Danny) Yin: Towards Efficient and Reliable Deep Learning - Research Insights.


Maying Shen is currently a senior autonomous driving research engineer at NVIDIA. Prior to joining NVIDIA, she graduated from CMU, majoring in computer vision, where she developed her interest in seeing the world through the computer's eyes. Her interests include deep learning efficiency on both the training and inference sides, covering aspects such as neural network pruning, distillation, and quantization, among others.
Jason Clemons received his Ph.D. in computer science and engineering from the University of Michigan, Ann Arbor, MI, USA, where he researched computer architectures for mobile computer vision. In his role as a senior research scientist at NVIDIA, his current research focuses on domain-specific computing, in particular the intersection of machine learning, computer vision, and computer architecture. He has worked on machine learning accelerators, computer vision accelerators, accelerating DNN training on GPUs, and accelerating RL using GPUs. He is an IEEE senior member and serves on the steering committee of the IEEE International Symposium on Performance Analysis of Systems and Software.
Hongxu (Danny) Yin is a senior research scientist at Learning and Perception Research (LPR) at NVIDIA. He obtained his Ph.D. from Princeton University, New Jersey, USA, and his B. Eng. from Nanyang Technological University, Singapore. He is a recipient of the Princeton Yan Huo 94* Graduate Fellowship, Princeton Best Dissertation Finalist within Department, the Princeton Natural Sciences and Engineering Fellowship, the Defense Science & Technology Agency gold medal, and the Thomson Asia Pacific Holdings gold medal. His research interests mainly include data-/execution-efficient and secure deep learning, covering CNNs and transformers. He has organized several tutorials and workshops at CVPR and ICCV. He has been recognized with the Global Outstanding Chinese Power 100 Award by 36Kr and named among the Top 60 Elite Chinese in North America by Forbes.
Pavlo Molchanov has been a principal research scientist and research lead at NVIDIA Research since 2015. His research is focused on efficient deep learning and human-centric computer vision in the LPR team led by Jan Kautz. In the area of network efficiency, he works on methods for model acceleration, inversion, novel architectures, and adaptive/conditional inference. In the area of human-centric vision, he works on face/body/hand landmarks and pose estimation, action/gesture recognition, and the design of novel human-computer interaction systems. He holds a degree in signal processing obtained from Tampere University of Technology, Finland, in 2014. He served as a program committee member of IEEE AAAI. He co-organized the Accelerating Computer Vision with Mixed Precision tutorial in conjunction with ICCV 2019.


  • Maying Shen, Senior Research Engineer
  • Jason Clemons, Senior Research Scientist
  • Hongxu (Danny) Yin, Senior Research Scientist
  • Pavlo Molchanov, Principal Research Scientist
  • Jose M. Alvarez, Director, Applied Research
  • Jan Kautz, VP of Research