Over the last three decades both the computer architectures and applications have become very complex. Applications have become complex and span millions of lines of code. Modern architectures typically have multiple cores and multiple levels of caches.
Compilers fail to statically model and predict application behavior that are large. Compilers based optimizations are restricted to those supplied by the vendors and they typically cannot incorporate custom optimizations. Library based approaches require manual guidance for sophisticated low-level optimizations and is often error prone.
POET is an embedded scripting language that aims at decoupling empirical tuning aspect from the compiler or the library. It is specifically targeted to Scientific applications. It can be embedded in code written in any language like C,C++ or Fortran by interpreting input code fragments as as parametrized string and not trying to interpret the underlying language. POET scripts can be generated by an optimizing compiler or by a professional library developer.
Capabilities of POET
- Generic restructuring transformations (loop unrolling, loop interchange, loop blocking) can be written by library builders to define their custom transformation without having to write any specialized compiler.
- Important properties and special semantics can be easily expressed in the description of input code. For e.g. for the matrix multiplication kernel we can use the Function, loop, Nest and Sequence. We can also encode the domain specific knowledge within the input code description.
- To allow for re-configuration, the transformations can be parametrized e.g. integral parameter to the loop unrolling transformation that allows for the level of unrolling.
POET supports four different kind of sections. The code and xform sections define the generic transformations that can be applied to arbitrary code. The define and output sections define the transformations that can be applied to any particular input code.
(1) poet : {section}
(2) section :
"<" "code" ID {codeAttr} ">" Exp "<" "/code" ">"
(3) | "<" "xform" ID {xformAttr} ">" Exp "<" "/xform" ">"
(4) | "<" "define" ID Exp ">"
(5) | "<" "output" FNAME Exp ">"
(6) codeAttr : "pars" "=" "(" ID {"," ID} ")"
(7) xformAttr : "pars" "=" "(" ID {"," ID} ")"
(8) | "tune" "=" "("ID"="INT{","INT} {";"ID"="INT{","INT}}")"
(9) Exp : Op
| Op ";" Exp | Op "," Exp | Op Exp
(10)Op : "if" Op "then" Op "else" Op
(11) | ID {"." ID} "=" Op
(12) | ID ":" Op
(13) | ID {"." ID} "#" Unit | ["car","cdr",INT] "#" Op
(14) | REPLACE "#" "(" Op "," Op "," Op ")"
(15) | PERMUTE "#" "(" Op "," Op ")"
(16) | DUPLICATE "#" "(" Op "," Op "," Op ")"
(17) | Op ["+","-","*"] Op | "-" Op
(18) | Op ["<","<=",">",">=","==","!="] Op
(19) | "!" Op | Op "&&" Op | Op "||" Op
(20) | Unit
(21)Unit : ID {"." ID} | INT | SOURCE | "(" Exp ")"
ID identifiers, e.g. Loop, Nest;
FNAME file names, e.g., mm.c, ../dgemm.c;
INT integer literals, e.g., 1, 0, 16, 64;
SOURCE strings of code fragments,
e.g. "c[i+j*m]+= alpha*b[k+j*l] * a[i+m*k];"
{s1s2…sn} Zero or more occurrences s1s2…sn
e.g., {", "ID} (zero or more of ", ID");
[t1, t2, .., tm] one token out of t1, t2, .., tm
e.g., [" + ", " - ", " ∗ "] ("+", "-", or "*").
Formal Grammar of POET language
<code Function pars=(head,decl,body)>
@head@
{
@decl@
@body@
}
</code>
<code Nest pars=(loop, body) >
@loop@ {
@body@
}
</code>
<code Sequence pars=(s1,s2) >
@s1@
@s2@
</code>
<code Loop pars=(i,start,stop,step) >
@init=(if start=="" then "" else (i "=" start ));
test=(if stop=="" then "" else (i "<=" stop ));
incr=(if step=="" then "" else (i "+=" step ));
@for (@init@; @test@; @incr@)
</code>
<xform Unroll pars=(input,uloop) tune=(ur=16)>
if (!(input : Nest#(loop,body))) then (
if (input : Function#(head,decl,body)) then
Function#(head,decl,Unroll#(body,uloop))
else if (input : Sequence#(s1,s2)) then
Sequence#(Unroll#(s1,uloop),Unroll#(s2,uloop))
else input
)
else if (loop != uloop) then
Nest#(loop, Unroll#(body,uloop))
else if (!ur || ur == 1) then input
else (
loop : Loop#(ivar,start,stop,step);
dup = (body DUPLICATE#(_, ur-1,(ivar "+=1;" ENDL body)));
Sequence#(Nest#(Loop#(ivar,start,(stop "-" ur-1),step),dup),
Nest#(Loop#(ivar,"",stop,step),body))
)
</xform>
==================================================================
Matrix Multiplication as defined by POET
==================================================================
include opt.poet
<define loopJ Loop#("j",0,"n",1)>
<define loopI Loop#("i",0,"m",1)>
<define loopK Loop#("k",0,"l",1)>
<define mmStmt "c[i+j*m]+= alpha*b[k+j*l] * a[i+m*k];">
<define nest1 Nest#(loopK,mmStmt)>
<define nest2 Nest#(loopI,nest1)>
<define nest3 Nest#(loopJ,nest2)>
<define mmHead "void dgemm(int m, int n, int l, double alpha,
double *a, double *b, double *c)">
<define dgemm Function#(mmHead, "int i, j, k;", nest3)>
<define mm_unroll (Unroll#(dgemm,loopK)) >
<define mm_block (Block#(dgemm,nest3,mmStmt))>
<define mm_permute (Permute#(dgemm,nest3,mmStmt))>
<define mm_block_unroll (Unroll#(mm_block,
InnermostLoop#(mm_block)))>
<output mm.c dgemm>
<output mm_unroll.c (Unroll.ur=8; mm_unroll)>
<output mm_block.c (Block.bsize=(8 64 128); mm_block)>
<output mm_permute.c (Permute.order=(3 2 1); mm_permute)>
<output mm_block_unroll.c
(Block.bsize=(16 16 8);Unroll.ur=8; mm_block_unroll)>
==================================================================
- Code: Defines parametrized code templates that convey special semantics The input code needs to be defined in terms of these code templates. This is essential for the transformation defined in the xform section to recognize the input program and apply transformations accordingly. e.g. for the loop unrolling applied to matrix-multiplication the four sections(Loop,Nest Sequence and Function) are shown above. Notice the ‘@’ which helps the POET lexical analyzer to switch between the POET definitions and the source strings of underlying languages.
- xform: Defines generic code transformation that can be applied to arbitrary code. The xform section relies on a collection of predefined code templates to recognize the structure of input code. It can also have additional tuning parameters that can be used to re-configuration of transformation.
- define and output: Used to split the input code into pre-defined code templates and to dynamically apply different transformations to input code. The output section defines the name of the file to which the the result of optimization should be applied.
More to Come!
References