OASR-WFBP: An Overlapping Aware Startup Sharing Gradient Merging Strategy for Efficient Communication in Distributed Deep Learning
Description: While Wait-Free Back-Propagation (WFBP) is a practical method in distributed deep learning, it suffers from large communication overhead. Ideally, this overhead can be reduced by overlapping gradient communication with computation and by sharing the startup time across multiple gradient communication phases. However, existing optimizations share the startup time greedily and fail to jointly exploit the overlap opportunity between computation and communication. We propose an overlapping-aware startup-sharing WFBP (OASR-WFBP), with an analytic model designed to guide the sharing procedure. Evaluations show that OASR-WFBP reduces iteration time by 5%-13% over the state-of-the-art WFBP algorithm.
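The benefit of startup sharing can be sketched with the common alpha-beta communication cost model: each collective call pays a fixed startup latency plus a per-byte transfer cost, so merging several gradient communications into one amortizes the startup term. This is a minimal illustrative sketch, not the authors' code; the ALPHA, BETA, and layer-size values below are assumptions chosen only for demonstration.

```python
# Hypothetical sketch of startup sharing under the alpha-beta cost model:
# time(one collective) = ALPHA + BETA * message_size.
# ALPHA and BETA are illustrative assumptions, not measured values.

ALPHA = 1e-4   # assumed per-message startup latency (seconds)
BETA = 1e-9    # assumed per-byte transfer time (seconds)

def comm_time(size_bytes: float) -> float:
    """Alpha-beta model for a single collective communication call."""
    return ALPHA + BETA * size_bytes

def unmerged_time(grad_sizes):
    """One communication per gradient tensor: startup cost paid every time."""
    return sum(comm_time(s) for s in grad_sizes)

def merged_time(grad_sizes):
    """All gradients merged into one buffer: startup cost paid once."""
    return comm_time(sum(grad_sizes))

# Example per-layer gradient sizes in bytes (hypothetical).
sizes = [4e6, 1e6, 2.5e5, 1e5]

# Merging k messages into one saves (k - 1) * ALPHA of startup time,
# at the cost of delaying communication and shrinking the window for
# overlap with computation -- the trade-off OASR-WFBP's model navigates.
saving = unmerged_time(sizes) - merged_time(sizes)
```

The same model makes the trade-off visible: greedy merging maximizes the saved startup time but can serialize communication after back-propagation finishes, which is why an analytic model is needed to decide where to share.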
Time: Wednesday, July 12th, 6:00pm - 7:00pm PDT
Location: Level 2 Lobby