Presentation
Occamy: Memory-efficient GPU Compiler for DNN Inference
Description
This work proposes Occamy, a new memory-efficient DNN compiler that reduces the memory usage of a DNN model without affecting its accuracy. For each DNN operation, Occamy analyzes the dimensions of the input and output tensors and their liveness within the operation. Across all operations, Occamy analyzes the liveness of all tensors, generates a memory pool after calculating the maximum required memory size, and schedules when and where to place each tensor in the pool. Compared to PyTorch on an integrated embedded GPU, across six DNNs, Occamy reduces memory usage by 33.75% while achieving a geomean speedup of 1.365x.
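The abstract describes liveness-driven memory pooling: tensors whose live ranges do not overlap can share the same region of a preallocated pool. A minimal illustrative sketch of this idea is below; it is not Occamy's actual algorithm, just a greedy offset-assignment heuristic under assumed inputs (each tensor given as a name, a size in bytes, and a live interval over the operation schedule).

```python
# Illustrative sketch (NOT Occamy's implementation): greedy placement of
# tensors into one shared memory pool based on liveness. Two tensors may
# occupy overlapping byte ranges only if their live intervals are disjoint.

def plan_memory_pool(tensors):
    """tensors: list of (name, size, start_op, end_op), intervals inclusive.
    Returns (pool_size, {name: byte offset in the pool})."""
    placed = []   # (offset, size, start, end) of already-placed tensors
    offsets = {}
    # Common greedy heuristic: place larger tensors first.
    for name, size, start, end in sorted(tensors, key=lambda t: -t[1]):
        offset = 0
        while True:
            conflict_end = None
            for o, s, st, en in placed:
                overlaps_time = not (end < st or en < start)
                overlaps_space = not (offset + size <= o or o + s <= offset)
                if overlaps_time and overlaps_space:
                    conflict_end = o + s
                    break
            if conflict_end is None:
                break
            offset = conflict_end  # retry just past the conflicting block
        placed.append((offset, size, start, end))
        offsets[name] = offset
    pool_size = max((o + s for o, s, _, _ in placed), default=0)
    return pool_size, offsets
```

For example, tensors A (4 bytes, live over ops 0-1) and B (4 bytes, live over ops 2-3) never coexist, so both get offset 0 and the pool needs only 4 bytes rather than 8; a tensor C (2 bytes, live over ops 1-2) overlapping both in time is placed after them at offset 4, for a 6-byte pool instead of a naive 10.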
Event Type
Research Manuscript
Time
Wednesday, July 12th, 10:55am - 11:10am PDT
Location
3012, 3rd Floor
Design
AI/ML System and Platform Design