'Parallelization'

Success with NSF!

Finally, there is some success with NSF (somewhat old news by now :).  I got funding for two projects this summer—first for starting work on source-level modeling of memory behavior of applications and second, in joint work with Andrew Lumsdaine, on a declarative approach to parallel programming.  To me both are very interesting directions.

The first idea builds on the notion of reuse distance or stack-distance that has been used before to estimate locality at the binary level.  I want to extend that to source level so that it could be used to direct compiler optimizations.  Several challenges lie in doing this:

  • What is the correlation of source-level reuse distance with the actual memory behavior of the program?
  • Can a source-level reuse distance metric be defined that is amenable to efficient computation?
  • Can empirical methods be developed to handle those situations in which sources are not available?

 
However, the potential is great!  Such a metric could give compilers a handle on the memory behavior of programs that does not exist today.

The second idea, of declarative approach to parallel programming, is in fact a brainchild of Andrew.  Everybody agrees that writing parallel programs is hard.  So, we are not even attempting to create a silver bullet that will magically “parallelize” a sequential program.  Instead we wish to focus on making the task of writing parallel programs less painful.  New programming technologies do not just get adopted overnight. They usually go through a process of validation by advanced users and only then get widely accepted.  This is particularly true of high-performance applications that we want to specially target in this research.  Rather than create yet another language that will solve all parallel programming problems, we have a relatively modest goal: let users specify parallelism in a way that cleanly separates computation from communication.  Given that a lot of performance optimizations in parallel programs is based on communication optimization, this provides a ”declarative” way for the users to specify their program.  The intention is not to create a new language, but to only provide an abstraction—similar to what yacc does for writing parsers.

Finally, there is some success with NSF (somewhat old news by now :).  I got funding for two projects this summer—first for starting work on source-level modeling of memory behavior of applications and second, in joint work with Andrew Lumsdaine, on a declarative approach to parallel programming.  To me both are very interesting directions.

The first idea builds on the notion of reuse distance or stack-distance that has been used before to estimate locality at the binary level.  I want to extend that to source level so that it could be used to direct compiler optimizations.  Several challenges lie in doing this:

  • What is the correlation of source-level reuse distance with the actual memory behavior of the program?
  • Can a source-level reuse distance metric be defined that is amenable to efficient computation?
  • Can empirical methods be developed to handle those situations in which sources are not available?

 
However, the potential is great!  Such a metric could give compilers a handle on the memory behavior of programs that does not exist today.

The second idea, of declarative approach to parallel programming, is in fact a brainchild of Andrew.  Everybody agrees that writing parallel programs is hard.  So, we are not even attempting to create a silver bullet that will magically “parallelize” a sequential program.  Instead we wish to focus on making the task of writing parallel programs less painful.  New programming technologies do not just get adopted overnight. They usually go through a process of validation by advanced users and only then get widely accepted.  This is particularly true of high-performance applications that we want to specially target in this research.  Rather than create yet another language that will solve all parallel programming problems, we have a relatively modest goal: let users specify parallelism in a way that cleanly separates computation from communication.  Given that a lot of performance optimizations in parallel programs is based on communication optimization, this provides a ”declarative” way for the users to specify their program.  The intention is not to create a new language, but to only provide an abstraction—similar to what yacc does for writing parsers.

Add comment August 14th, 2008

No Comments

Estimating dependences

A thought occurred to me.  Can we estimate dependences based on the pattern of branches and values?  Hypothetically, if (say) runtime profiling told me that for a certain pattern of branches and input values a certain loop has no carried-dependences then we could parallelize the loop without carrying out a detailed dependence analysis.  And, if the process works then it could be sufficiently light-weight to be done just-in-time at runtime!

A thought occurred to me.  Can we estimate dependences based on the pattern of branches and values?  Hypothetically, if (say) runtime profiling told me that for a certain pattern of branches and input values a certain loop has no carried-dependences then we could parallelize the loop without carrying out a detailed dependence analysis.  And, if the process works then it could be sufficiently light-weight to be done just-in-time at runtime!

Add comment March 3rd, 2008

No Comments

Macro-dataflow

Nobody seems to be exploring a macro-dataflow style parallelization strategy for MATLAB. I think there is great potential here, especially on multi-core systems. If we can implement a very low overhead queue of tasks we ought to be able to leverage the multi-core revolution and gain reasonable performance improvements completely automatically. The beauty of this approach is that it would work with the interpreter as well as the compiled code. And, as long as we are able to extract enough parallelism, it is inherently scalable.

Extracting parallelism is a well-studied problem. I imagine that we will be able to get significant performance gains without any significant program transformations initially. Eventually, we would want to capture more and more cases through enabling transformations. It could open the gates to several possible research studies!

Nobody seems to be exploring a macro-dataflow style parallelization strategy for MATLAB. I think there is great potential here, especially on multi-core systems. If we can implement a very low overhead queue of tasks we ought to be able to leverage the multi-core revolution and gain reasonable performance improvements completely automatically. The beauty of this approach is that it would work with the interpreter as well as the compiled code. And, as long as we are able to extract enough parallelism, it is inherently scalable.

Extracting parallelism is a well-studied problem. I imagine that we will be able to get significant performance gains without any significant program transformations initially. Eventually, we would want to capture more and more cases through enabling transformations. It could open the gates to several possible research studies!

Add comment July 16th, 2007

No Comments

Distributions

I am getting more and more convinced that automatically distributing data is a critical problem to solve in automatic parallelization and that it has to be done in a way that allows arbitrary data distributions. Restricting to a small set of distributions was the nemesis of HPF and we need to make sure we don’t fall into the same trap.

So, to make the problem more precise, we can start with a collection of well-known data distributions, but we must develop techniques that would allow addition of new, user defined data distributions. Perhaps, a way out (the one that I suggested to Christine) would be to abstract out the properties of a data distribution on which the decision about the best distribution could be based. Then, as long as the user supplies those properties of any distribution they want to add to the repertoire we would be able to handle it. If the properties are reasonable we could also think of automatically deriving them even from procedural descriptions of distributions (e.g., automatic differentiation).

Of course, the main challenge is to actually determine a data distribution out of the given repertoire for each variable. I think the abstract properties of each data distribution will somehow need to be related to the cost associated with each data distribution to perform operations. May be, the abstract “properties” of a distribution will need to be parameterized by various operations. Much is unknown at this point, but one thing appears certain: any analytical model that works at a micro-level basing itself on precisely predicting the cost of moving given amounts of data is doomed to fail! The cost of data movement is highly unpredictable on modern systems. So, any model for automatic data distribution will have to work with partial information. Perhaps, also involve dynamic tuning, at run time. I have a strong hunch that working with macro-level operations (as we do), rather than low-level scalar operations, will greatly help.

I am getting more and more convinced that automatically distributing data is a critical problem to solve in automatic parallelization and that it has to be done in a way that allows arbitrary data distributions. Restricting to a small set of distributions was the nemesis of HPF and we need to make sure we don’t fall into the same trap.

So, to make the problem more precise, we can start with a collection of well-known data distributions, but we must develop techniques that would allow addition of new, user defined data distributions. Perhaps, a way out (the one that I suggested to Christine) would be to abstract out the properties of a data distribution on which the decision about the best distribution could be based. Then, as long as the user supplies those properties of any distribution they want to add to the repertoire we would be able to handle it. If the properties are reasonable we could also think of automatically deriving them even from procedural descriptions of distributions (e.g., automatic differentiation).

Of course, the main challenge is to actually determine a data distribution out of the given repertoire for each variable. I think the abstract properties of each data distribution will somehow need to be related to the cost associated with each data distribution to perform operations. May be, the abstract “properties” of a distribution will need to be parameterized by various operations. Much is unknown at this point, but one thing appears certain: any analytical model that works at a micro-level basing itself on precisely predicting the cost of moving given amounts of data is doomed to fail! The cost of data movement is highly unpredictable on modern systems. So, any model for automatic data distribution will have to work with partial information. Perhaps, also involve dynamic tuning, at run time. I have a strong hunch that working with macro-level operations (as we do), rather than low-level scalar operations, will greatly help.

Add comment May 15th, 2007

No Comments


Links

Calendar

January 2009
M T W T F S S
« Dec    
 1234
567891011
12131415161718
19202122232425
262728293031  

Categories

Admin

Feeds