When to Prune? A Policy towards Early Structural Pruning

CVPR 2022

News
  • [Mar 2022] Accepted as a poster presentation.
  • [Mar 2022] Our paper has been accepted to CVPR 2022.
Abstract

Pruning enables appealing reductions in network memory footprint and time complexity. Conventional post-training pruning techniques lean towards efficient inference while overlooking the heavy computation for training. Recent exploration of pre-training pruning at initialization hints at training cost reduction via pruning, but suffers noticeable performance degradation. We attempt to combine the benefits of both directions and propose a policy that prunes as early as possible during training without hurting performance. Instead of pruning at initialization, our method exploits initial dense training for a few epochs to quickly guide the architecture, while constantly evaluating dominant sub-networks via neuron importance ranking. This unveils dominant sub-networks whose structures become stable, allowing conventional pruning to be pushed earlier into the training. To do this early, we further introduce an Early Pruning Indicator (EPI) that relies on sub-network architectural similarity and quickly triggers pruning when the sub-network's architecture stabilizes. Through extensive experiments on ImageNet, we show that EPI empowers a quick tracking of early training epochs suitable for pruning, offering the same efficacy as an otherwise "oracle" grid-search that scans through epochs and requires orders of magnitude more compute. Our method yields a 1.4% top-1 accuracy boost over state-of-the-art pruning counterparts and cuts GPU training cost by 2.4x, hence offering a new efficiency-accuracy boundary for network pruning during training.

Key Approach

In this work, we aim to take full advantage of early-stage dense model training, which is beneficial for rapid learning and optimal architecture exploration, while identifying the best sub-network as early as possible rather than waiting until training ends as in conventional pruning. To understand how the starting point of pruning can be set automatically, we start by analyzing in depth the evolution of pruned architectures, rigorously performing trimming at every epoch and comparing each epoch's suitability for pruning. Two observations, 1) pruning at early epochs results in different final architectures, but 2) a dominant architecture emerges within just a few epochs and stabilizes thereafter until training ends, inspire our novel metric, the Early Pruning Indicator (EPI). EPI estimates the structural similarity between the sub-networks obtained by pruning the same base model at consecutive epochs, indicating the earliest point at which pruning can start during training. Augmented by EPI, our pruning-aware training outperforms pruning-at-initialization alternatives by a large margin. We also show that EPI is agnostic to the pruning criterion, demonstrating efficacy for both magnitude-based and gradient-based pruning and enabling a new state-of-the-art boundary for training speedup through in-situ pruning.
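To make the trigger concrete, below is a minimal sketch of how an EPI-style stability check could be wired into training. The per-layer similarity measure, the `window` parameter, and the threshold value are illustrative assumptions for this sketch, not the paper's exact formulation; in practice the kept-neuron counts would come from the neuron-importance ranking of the current dense model.

    import numpy as np

    def subnetwork_similarity(kept_a, kept_b):
        """Similarity between two pruned sub-networks, each described by the
        number of neurons kept per layer. The per-layer ratio below is an
        illustrative stand-in for the paper's structure-similarity measure."""
        kept_a = np.asarray(kept_a, dtype=float)
        kept_b = np.asarray(kept_b, dtype=float)
        # Per-layer agreement: 1 when both epochs keep the same count, lower otherwise.
        per_layer = 1.0 - np.abs(kept_a - kept_b) / np.maximum(kept_a, kept_b).clip(min=1.0)
        return float(per_layer.mean())

    def early_pruning_indicator(history, window=3):
        """EPI-style stability score: mean similarity between the sub-networks
        selected at consecutive recent epochs (`window` is a hypothetical knob)."""
        if len(history) < window + 1:
            return 0.0
        recent = history[-(window + 1):]
        return float(np.mean([subnetwork_similarity(a, b)
                              for a, b in zip(recent[:-1], recent[1:])]))

    # Toy demonstration with synthetic per-layer kept-neuron counts.
    # During training, pruning would be triggered once EPI crosses the threshold.
    history, threshold = [], 0.97
    for epoch, kept in enumerate([[64, 120, 250], [58, 118, 252],
                                  [57, 119, 251], [57, 119, 251]]):
        history.append(kept)
        epi = early_pruning_indicator(history, window=2)
        print(f"epoch {epoch}: EPI = {epi:.3f}, prune now: {epi >= threshold}")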

The final ImageNet Top-1 accuracy of the pruned network when pruning occurs at different epochs during the early stage of training. We observe that pruning at initialization tends to result in an untrainable network with the magnitude-based pruning method. For the gradient-based method, we observe higher degradation when more filters are pruned, and show pruning ratios up to 90% on ResNet-50. [Pruned Ratio] denotes the percentage of neurons removed.
Structure stability analysis for ResNet architectures with (a) magnitude-based and (b) gradient-based pruning. The dashed line shows the EPI threshold selected for each network under the pruning criterion.
Bibtex

    @inproceedings{shen2022prune,
        title={When to Prune? A Policy towards Early Structural Pruning},
        author={Shen, Maying and Molchanov, Pavlo and Yin, Hongxu and Alvarez, Jose M},
        booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
        year={2022}
    }