November 14th, 2008 by cshei
While I was really hoping to have some experimental test results this week, I ran into a host of new problems while trying to test the example cases. I did find some interesting code from the MathWorks site, and it looks like Outage, Nasmg, and Arnoldi will be suitable examples too.
I uncovered yet another bug in SSA where it was handling certain ifs incorrectly. Strangely, there also seems to be a case where the type inference pass tries to call free() twice during its constant propagation phase. I’m currently trying to track that down. There were also many small fixes I had to make in various other things… which were mostly a result of certain assumptions I had made while writing them. Getting the compiler to run end-to-end is turning out to be significantly harder than I expected, but I feel that I made quite a bit of progress toward that goal this week.
Unfortunately, with only 9 days left before the HIPS deadline, I’m not sure if we’ll have everything done by then, but I will keep trying my best to make it. It looks like the LDTA deadline is shortly after HIPS, and it may be a good backup target. The CFP specifically mentions source-to-source compilers. 
Posted in Weekly Goals | 1 Comment »
November 7th, 2008 by cshei
Finishing up the specialization of for loops and array references took a little longer than I expected, but that is now complete. I believe I’ve also ironed out the bugs in DCE too. I’ve been looking for example test cases, but haven’t really found any interesting ones yet. There are a lot of examples at the Mathworks site, so I should be able to find something after sifting though there some more. I really hope to have a good set of example test cases by early next week because as I found out last semester, it will undoubtedly take some time to run the tests and get useful results.
Posted in Weekly Goals | No Comments »
November 7th, 2008 by cshei
This week, I plan to wrap up the specialization of for loops and array references, clean up/test the existing dead-code elimination code, and search for examples to run our tests on.
Posted in Weekly Goals | No Comments »
October 31st, 2008 by cshei
This week, I finished up copy propagation and added code to de-SSA arrays. I’ve returned to the work on specializing for loops and array references, and am close to finishing that. I expect to have that done this weekend, and then we can finally get to the experiments!
Posted in Weekly Goals | No Comments »
October 27th, 2008 by cshei
Today, I’ve been proofreading and reformatting my eScience 2008 extended abstract in order to submit it by today’s deadline. Once that’s done, the things on my plate are: wrapping up copy propagation, implementing dead code elimination, finding a solution/workaround for our SSA woes, and finally coming back to the specialization of for loops and array subscripts.
Posted in Weekly Goals | No Comments »
October 24th, 2008 by cshei
I received a notification a couple of days ago that my eScience 2008 poster submission has been accepted, so I plan to make one final pass over my extended abstract before resumitting it on Moday. I’ll also have to prepare a 250 word description as well.
Earlier this week, I had thought that I could finish the copy propagation in a day or so, but it turns out I underestimated the amount of work involved in this task. First of all, unlike the case of constant propagation, the use of “regular” dynamic rules is insufficient. “Dependent dynamic rules” are required to correctly implement copy propagation, and I hadn’t read up on them until this week. Secondly, the copy propagation example given at the Stratego/XT website mentions how it’s necessary to worry about scope while performing copy propagation in the example language, but after thinking about it for a bit, I don’t think this is necessary in the case of MATLAB (except maybe for global variables). So, the implementation of copy propagation is coming along slowly and surely, and I believe it’s almost complete. Next up will be dead code elimination, which should go more smoothly now that I’ve read up on dependent dynamic rules.
I also realized this week that there is another issue in SSA that will prove to be a showstopper for my matrix multiplication test case I’ve been working towards. It turns out that the inner loop references array C1 and writes to C2, even though it is supposed to be accumulating the results in one array. I’m not sure if this is a result of a bug in the SSA implementation, or if it’s because we haven’t implemented something like array-SSA. It seems like every time we get closer to being able to run a test case end-to-end though the compiler, another bug pops up! I’m almost tempted to sit down and re-write SSA, but with our possible/probable shift to Ruby in the near future, it wouldn’t really be worth it. 
Posted in Weekly Goals | 1 Comment »
October 24th, 2008 by cshei
While doing some testing with the output of SSA, I realized that the extra copies introduced by converting to SSA form and back greatly affect the performance of the resulting code. Therefore, this weeks goals will include implementing copy propagation and dead code elimination, as well as finishing last week’s specialization goals. Once all of these are in place, we should finally be able to see some performance improvement in our compiler output!
Posted in Weekly Goals | No Comments »
October 17th, 2008 by cshei
While I was hoping to complete the specialization of both for loops and array subscripts, I ran into a few major snags in generatessa and ssatoexe. Apparently, the ATerms they were outputting did not quite conform to the structure of the AST we expect/operate on. The biggest problem I ran into was that the code that inserted copies for the phi-functions in ssatoexe sometimes resulted in excessive nesting of the assignments it inserted (a list of these ended up inside a StmtList()). While the pretty printer seemed to print these out with the intended meaning, any subsequent pass that actually operated on the ATerm would fail, including lowertocpp. There were also a couple of minor problems elsewhere.
I spent much of this week tracking down the source of these bugs, which ended up being a significant, yet necessary, distraction from my primary goals. Now that they’ve been ironed out, I’m back on track for making a push to complete the specialization code.
I’ve also been looking though the list of workshops associated with IPDPS, and found the following interesting:
International Heterogeneity in Computing Workshop
International Workshop on
High-Level Parallel Programming Models and
Supportive Environments
IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing
Posted in Weekly Goals | 1 Comment »
October 13th, 2008 by cshei
This week, I plan to complete the specialization of for loops, and continue to try to lower the overhead of ParaMRange. Once that is complete, I’ll need to find examples where large segments of can be specialized, as it’s becoming clear that transitioning between specialized/non-specialized code incurs quite a significant amount of overhead. I also plan to read up on max-flow min-cut and think about how to use the idea to decide what to specialize.
Edit: another observation I made today was that the overhead of ParaMRange is actually very small at this point, even with the copy constructor being called. The main source of overhead I was observing for the matrix multiplication loops was a call to subsref to get the upper bounds of the inner loops. I had forgotten that in our implementation, subsasgn/subsref is VERY expensive. Specializing array subscripts is now also a goal for this week.
Posted in Weekly Goals | 1 Comment »
October 10th, 2008 by cshei
While looking over the code for the ParaMRange object, I noticed that it was creating a new Value object (and underlying mxArray) for every value it returned. This led to massive memory usage and was a huge source of inefficiency. I’ve re-written it to use a single Value object that has its underlying double updated every time the range object is incremented. This greatly reduces the memory usage, and running large loops no longer uses all 16 GB of memory on pumbaa.
While this change has led to a huge improvement in the speed of ParaMRange objects, I feel that there is still much room for improvement. Since it is now simply incrementing a double, I’ve been sidetracked from the specialization of for loops since I believe a highly optimized ParaMRange object can approach the speed of the specialized version. Arun and I had discussed how the specialized version would use an integer as the loop induction variable, and keep a “shadow” double that is used in the body for computations. By minimizing the overhead of a ParaMRange, we should be able to simply increment the ParaMRange’s underlying double and just use it.
The biggest problem I’m having right now is that MEX files cannot be profiled with gprof. I have a hunch that the biggest speed gain can be obtained from reducing the number of times the Value copy constructor is invoked, but I have no way of determining how much overhead is currently introduced by its use. The ParaMRange version of loops is still quite a bit slower than the version using C++ loops directly, and there isn’t much going on in the ParaMRange code anymore besides incrementing a double and returning a Value wrapping it (this is where the copy constructor is used).
While I didn’t meet my goal of finishing the specialization of for loops, I believe that what I’ve been working on instead will eliminate the benefit of specialization completely if I can sufficiently reduce the amount of overhead introduced by ParaMRange.
Posted in Weekly Goals | No Comments »