Declarative Parallel Programming
Introduction
Parallel programming is a necessity on modern multi-core processors. And these processors are becoming more and more “parallel”—not only are there projected to be more cores on each chip in near future, but the chips are also expected to contain multiple types of cores, such as GPGPUs. In order to get additional computational power it is common for multiple such processing nodes to be networked together into a cluster. It is very likely that programmers will soon find themselves programming massively parallel machines to achieve high performance in their applications, which will be extremely difficult and error prone, if not impossible, with the current software techniques and practices of writing parallel programs
A Practical Approach
This research is motivated by the observation that it takes several years to decades for entirely new languages and programming models to become mainstream, while the problem of parallel programming is right upon us with no “silver bullet” in sight. A much more pragmatic approach is to build on the parallel programming models that are already well established and proven. What these models lack today are maintainability and performance portability on the rapidly changing parallel architectures, which is the what this research attempts to address.
A great amount of effort in writing parallel programs goes into carefully tuning them for a specific architecture. Unfortunately, no sooner than the application gets optimized for a specific platform, it is time to move to a more modern and completely different parallel machine. Consider parallel programs written with MPI, which is the dominant way to write parallel programs with distributed address spaces. Tuning MPI programs involves carefully overlapping communication and computation by strategically placing sends and receives. To ensure best performance the programmer must use the asynchronous variety of communication primitives, placing the communication calls far from where the data will be used. This makes the programs much harder to debug and maintain. Moreover, the amount of overlap is platform dependent. Another optimization technique is to combine multiple small messages. Once again, the tradeoffs between the amount of coalescing and overloading the send / receive queues is dependent on the platform as well as the MPI implementation. Further complication arises when the MPI program needs to be optimized for a cluster of multi-core nodes—an increasingly common occurrence.
This research aims to develop a parallel programming technique that will effectively separate the computation and communication specifications, thus providing a declarative approach to parallel programming, while retaining the benefits of using the most optimized and highly tuned communication libraries, such as MPI. One way to think about this approach is by comparing it to the parser-generation tool yacc that cleanly separates the specification of the language and the actions that need to be taken upon parsing. Such a separation makes it possible to map the parallel program effectively and efficiently on to a distributed memory cluster, a shared memory machine, or a hybrid cluster containing both. Since parallelism is explicitly specified programmers, who have knowledge at the algorithmic level—something a compiler lacks—control data and task distribution. At the same time they are relieved of platform-specific optimizations that can be performed by a compiler. Furthermore, the declarative specification makes it possible to equip the compiler with techniques to automatically partially verify the correctness of the program, e.g., by ensuring that the specifications do not lead to deadlocks.
Goals
- Developing a declarative syntax to specify communication and computation separately.
- Developing a compiler to automatically translate the declarative specification into code that uses lower-level libraries, such as MPI or threads.
- Developing automatic verification techniques.
Funded by NSF, award CNS-0834722
People
Collaborators
- Andrew Lumsdaine
- Douglas Gregor
- Tim Mattox
- Prabhanjan (Anju) Kambadur
- Torsten Hoefler
- Matt Bone