Near-Memory Computing with Compressed Embedding Table for Personalized Recommendation
Description
An embedding layer in personalized recommendation models requires sparse accesses to a large memory space, leading to high latency and wasted energy. To alleviate this, we propose an embedding-vector element quantization and compression method that reduces the memory footprint of the embedding tables, achieving compression ratios of 3.95--4.14. We also propose a near-memory acceleration hardware architecture with an SRAM buffer that stores frequently accessed embedding vectors. Applied in 3D-stacked DRAM memories, our acceleration technique speeds up the embedding layer by 4.9X--5.4X compared to 8-core CPU-based execution while reducing memory energy consumption by 5.9X--12.1X, on average.
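To illustrate how element quantization shrinks an embedding table, here is a minimal NumPy sketch of per-row uint8 quantization. The abstract does not specify the authors' exact scheme; the row-wise min/max affine mapping below is an assumption used only to show how 4-byte floats become 1-byte codes (roughly 4X smaller, minus small per-row metadata, in the ballpark of the reported 3.95--4.14 ratios).

```python
import numpy as np

def quantize_rows(table: np.ndarray):
    """Quantize each float32 embedding row to uint8 codes.

    Assumed scheme (not necessarily the authors'): affine min/max
    mapping per row, storing a (scale, offset) pair alongside the
    1-byte codes instead of 4-byte floats.
    """
    lo = table.min(axis=1, keepdims=True)
    hi = table.max(axis=1, keepdims=True)
    scale = (hi - lo) / 255.0
    scale = np.where(scale == 0, 1.0, scale)  # guard flat rows against div-by-zero
    codes = np.round((table - lo) / scale).astype(np.uint8)
    return codes, scale.astype(np.float32), lo.astype(np.float32)

def dequantize_rows(codes, scale, lo):
    """Reconstruct approximate float32 embeddings from the codes."""
    return codes.astype(np.float32) * scale + lo

# Hypothetical table: 1000 embedding vectors of dimension 64.
rng = np.random.default_rng(0)
emb = rng.normal(size=(1000, 64)).astype(np.float32)
codes, scale, lo = quantize_rows(emb)
recon = dequantize_rows(codes, scale, lo)
```

In a near-memory setting, the compact codes reduce DRAM traffic per lookup, and the most frequently accessed rows could additionally be held in a small on-chip SRAM buffer, as the abstract describes.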
Time
Wednesday, July 12th, 6:00pm - 7:00pm PDT
Location
Level 2 Lobby