HBP: Hierarchically Balanced Pruning and Accelerator Co-Design for Efficient DNN Inference
Description
Weight pruning accelerates DNN inference by reducing parameters and computation. Irregular pruning achieves high sparsity but suffers from low computation parallelism and imbalanced workloads, while coarse-grained structured pruning sacrifices sparsity for higher parallelism. To strike a better balance, we propose Hierarchically Balanced Pruning (HBP), which applies fine-grained yet structured adjustments on top of irregular pruning. In addition, HBP partitions the weight matrix into hierarchical blocks and constrains each block's sparsity to balance workloads. Furthermore, we propose a co-designed accelerator to unleash the full potential of the pruning method. Experimental results show that our method significantly outperforms prior studies.
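The core idea of constraining per-block sparsity can be illustrated with a minimal sketch. The snippet below is not the paper's actual HBP algorithm (which layers fine-grained structured adjustments over irregular pruning across a multi-level block hierarchy); it only shows the single-level balanced case, where every block of a weight matrix keeps the same number of largest-magnitude weights so that hardware lanes receive equal workloads. The function name and parameters are illustrative assumptions.

```python
import numpy as np

def balanced_block_prune(weights, block_rows, block_cols, sparsity):
    """Magnitude-prune `weights` so every (block_rows x block_cols) block
    keeps exactly the same number of nonzeros, yielding balanced workloads.

    Illustrative sketch only, not the paper's HBP method: HBP additionally
    applies fine-grained structured adjustments and a hierarchical partition.
    Assumes the matrix dimensions are multiples of the block dimensions.
    """
    rows, cols = weights.shape
    pruned = np.zeros_like(weights)
    keep = int(round(block_rows * block_cols * (1.0 - sparsity)))
    for r in range(0, rows, block_rows):
        for c in range(0, cols, block_cols):
            block = weights[r:r + block_rows, c:c + block_cols]
            flat = block.ravel()
            # Indices of the `keep` largest-magnitude entries in this block.
            keep_idx = np.argsort(np.abs(flat))[flat.size - keep:]
            mask = np.zeros(flat.size, dtype=bool)
            mask[keep_idx] = True
            pruned[r:r + block_rows, c:c + block_cols] = \
                (flat * mask).reshape(block.shape)
    return pruned
```

With 50% sparsity and 2x2 blocks, every block of the result holds exactly two nonzeros, whereas unconstrained irregular pruning could leave some blocks empty and others dense, the workload imbalance the abstract describes.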
Event Type
Research Manuscript
Time
Wednesday, July 12th, 4:40pm - 4:55pm PDT
Location
3010, 3rd Floor
AI/ML Architecture Design