Presentation

Design Space Exploration of Layer-Wise Mixed-Precision Quantization with Tightly Integrated Edge Inference Units
Description
Layer-wise mixed-precision quantization (MPQ) has become prevalent for edge inference because it balances accuracy and efficiency better than uniform quantization. In this work, we propose a novel algorithm that first samples layer-wise sensitivity with respect to a newly proposed metric that incorporates both accuracy and a proxy for hardware cost. At the hardware level, we propose a new processing-in-memory (PIM) architecture that tightly integrates the optimal MPQ policies into the processor pipeline through Instruction Set Architecture (ISA) and micro-architecture co-design. The proposed solution achieves 3%-11% higher inference accuracy at similar hardware cost.
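The description does not define the sensitivity metric or the search procedure, so the following is only a minimal sketch of how a layer-wise sensitivity score that combines accuracy and a hardware-cost proxy might drive bit-width assignment. Everything in it is an assumption made for illustration: the cost proxy (MAC count times bit width), the metric (accuracy drop per unit of proxy cost saved), the greedy assignment, the function names, and the per-layer numbers. None of it is taken from the poster.

```python
from typing import Dict, Tuple

def hardware_cost(macs: int, bits: int) -> int:
    """Assumed hardware-cost proxy: MAC count scaled by bit width."""
    return macs * bits

def sensitivity(acc_drop: float, macs: int, bits: int, ref_bits: int = 8) -> float:
    """Assumed metric: accuracy lost per unit of proxy cost saved vs. ref_bits."""
    saved = hardware_cost(macs, ref_bits) - hardware_cost(macs, bits)
    return acc_drop / saved if saved > 0 else float("inf")

def assign_bits(
    layers: Dict[str, dict], budget: int, low_bits: int = 4, ref_bits: int = 8
) -> Tuple[Dict[str, int], int]:
    """Greedy sketch: start every layer at ref_bits, then lower the least
    sensitive layers to low_bits until the total proxy cost fits the budget."""
    policy = {name: ref_bits for name in layers}
    total = sum(hardware_cost(info["macs"], ref_bits) for info in layers.values())
    ranked = sorted(
        layers,
        key=lambda n: sensitivity(
            layers[n]["drop"][low_bits], layers[n]["macs"], low_bits, ref_bits
        ),
    )
    for name in ranked:
        if total <= budget:
            break
        macs = layers[name]["macs"]
        total -= hardware_cost(macs, ref_bits) - hardware_cost(macs, low_bits)
        policy[name] = low_bits
    return policy, total

# Hypothetical sampled data: accuracy drop (in points) when only that layer
# is quantized to 4 bits. These numbers are made up for the example.
layers = {
    "conv1": {"macs": 100, "drop": {4: 2.0}},
    "conv2": {"macs": 1000, "drop": {4: 0.5}},
    "fc": {"macs": 200, "drop": {4: 0.4}},
}
policy, total = assign_bits(layers, budget=6000)
print(policy, total)
```

With these made-up numbers, the layers whose accuracy degrades least per unit of cost saved are lowered to 4 bits first, while the most sensitive layer stays at 8 bits.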
Event Type
Work-in-Progress Poster
Time
Tuesday, July 11th, 6:00pm - 7:00pm PDT
Location
Level 2 Lobby
Topics
AI
Autonomous Systems
Cloud
Design
EDA
Embedded Systems
RISC-V
Security