An Automated Approach to Improving Communication-Computation Overlap in Clusters
Author : Fishgold, Lewis; Danalis, Anthony; Pollock, Lori; Swany, Martin
Booktitle : Parallel Computing (ParCo ’05)
Date : Sep 2005
Keyword(s) : parallel computing, automated optimization, MPI optimization, communication-computation overlapping, compuniformer, source-to-source optimization
Document Type : In Conference Proceedings
Applications that execute on parallel clusters face scalability concerns because of the high communication overhead usually associated with such environments. Modern network technologies that support Remote Direct Memory Access (RDMA) can offer true zero-copy communication and reduce communication overhead by overlapping it with computation. For this approach to be effective, however, the parallel application running on the cluster must be structured in a way that enables communication-computation overlap. Unfortunately, the trade-off between maintainability and performance often leads to a code structure that prevents this potential for overlap from being exploited. This paper describes a source-to-source optimizing transformation that an automatic (or semi-automatic) system can perform to restructure MPI codes so as to maximize communication-computation overlap.
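The kind of restructuring the abstract alludes to can be illustrated with a small sketch. It is not taken from the paper: it assumes one plausible form of the transformation, in which a "compute everything, then send everything" pattern is split into tiles so that the transfer of one tile proceeds while the next tile is computed. All names (`compute_tile`, `send`, `blocking_version`, `overlapped_version`) are hypothetical, and a single-worker thread pool stands in for non-blocking MPI calls; the list `wire` stands in for the network.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_tile(data, start, stop):
    """Stand-in computation kernel: square each element of one tile."""
    return [x * x for x in data[start:stop]]


def send(buffer, wire):
    """Stand-in for a network transfer: append the payload to `wire`."""
    wire.extend(buffer)


def blocking_version(data):
    """Original structure: finish all computation, then communicate."""
    wire = []
    result = compute_tile(data, 0, len(data))
    send(result, wire)
    return wire


def overlapped_version(data, tile_size):
    """Restructured form: the send of tile k runs while tile k+1 is computed."""
    wire = []
    pending = None  # handle of the transfer in flight (assumed analogue of a request)
    with ThreadPoolExecutor(max_workers=1) as pool:
        for start in range(0, len(data), tile_size):
            stop = min(start + tile_size, len(data))
            tile = compute_tile(data, start, stop)
            if pending is not None:
                pending.result()  # wait for the previous transfer to finish
            pending = pool.submit(send, tile, wire)  # start the next transfer
        if pending is not None:
            pending.result()  # drain the last transfer
    return wire


if __name__ == "__main__":
    values = list(range(10))
    print(blocking_version(values) == overlapped_version(values, tile_size=3))
```

Both functions deliver the same data in the same order; only the schedule differs. The sketch shows the control structure only: Python threads will not speed up this CPU-bound loop, whereas on a cluster the in-flight transfer would be carried out by RDMA-capable hardware while the processor continues computing.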