pyop2.coffee package

Submodules

pyop2.coffee.ast_autotuner module

COFFEE’s autotuning system.

class pyop2.coffee.ast_autotuner.Autotuner(variants, itspace, include, compiler, isa, blas)

Bases: object

Initialize the autotuner.

Parameters:
  • variants – list of (ast, used_optimizations) for autotuning
  • itspace – kernel’s iteration space
  • include – list of directories to be searched for header files
  • compiler – backend compiler info
  • isa – instruction set architecture info
  • blas – COFFEE’s dense linear algebra library info
tune(resolution)

Return the fastest kernel implementation.

Parameters:resolution – the amount of time in milliseconds a kernel is run.

pyop2.coffee.ast_base module

This file contains the hierarchy of classes that implement a kernel’s Abstract Syntax Tree (ast).

pyop2.coffee.ast_base.point(p)
pyop2.coffee.ast_base.point_ofs(p, o)
pyop2.coffee.ast_base.point_ofs_stride(p, o)
pyop2.coffee.ast_base.assign(s, e)
pyop2.coffee.ast_base.incr(s, e)
pyop2.coffee.ast_base.incr_by_1(s)
pyop2.coffee.ast_base.decr(s, e)
pyop2.coffee.ast_base.decr_by_1(s)
pyop2.coffee.ast_base.idiv(s, e)
pyop2.coffee.ast_base.imul(s, e)
pyop2.coffee.ast_base.wrap(e)
pyop2.coffee.ast_base.bracket(s)
pyop2.coffee.ast_base.decl(q, t, s, a)
pyop2.coffee.ast_base.decl_init(q, t, s, a, e)
pyop2.coffee.ast_base.for_loop(s1, e, s2, s3)
pyop2.coffee.ast_base.ternary(e, s1, s2)
pyop2.coffee.ast_base.as_symbol(s)
class pyop2.coffee.ast_base.Perfect

Bases: object

Dummy mixin class used to decorate classes which can form part of a perfect loop nest.

class pyop2.coffee.ast_base.Node(children=None)

Bases: object

The base class of the AST.

gencode()
class pyop2.coffee.ast_base.Root(children=None)

Bases: pyop2.coffee.ast_base.Node

Root of the AST.

gencode()
class pyop2.coffee.ast_base.Expr(children=None)

Bases: pyop2.coffee.ast_base.Node

Generic expression.

class pyop2.coffee.ast_base.BinExpr(expr1, expr2, op)

Bases: pyop2.coffee.ast_base.Expr

Generic binary expression.

gencode()
class pyop2.coffee.ast_base.UnaryExpr(expr)

Bases: pyop2.coffee.ast_base.Expr

Generic unary expression.

class pyop2.coffee.ast_base.Neg(expr)

Bases: pyop2.coffee.ast_base.UnaryExpr

Unary negation of an expression

gencode(scope=False)
class pyop2.coffee.ast_base.ArrayInit(values)

Bases: pyop2.coffee.ast_base.Expr

Array initializer. An n-dimensional array A can be statically initialized to some values. For example

A[3][3] = {{0.0}} or A[3] = {1, 1, 1}.

At the moment, initial values like {{0.0}} and {1, 1, 1} are passed in as simple strings.

gencode()
class pyop2.coffee.ast_base.ColSparseArrayInit(values, nonzero_bounds, numpy_values)

Bases: pyop2.coffee.ast_base.ArrayInit

Array initializer in which zero-columns, i.e. columns full of zeros, are explicitly tracked. Only bi-dimensional arrays are allowed.

Zero columns are tracked once the object is instantiated.

Parameters:
  • values – string representation of the values the array is initialized to
  • nonzero_bounds – a tuple of two integers indicating the indices of the first and last nonzero columns
gencode()
class pyop2.coffee.ast_base.Par(expr)

Bases: pyop2.coffee.ast_base.UnaryExpr

Parenthesis object.

gencode()
class pyop2.coffee.ast_base.Sum(expr1, expr2)

Bases: pyop2.coffee.ast_base.BinExpr

Binary sum.

class pyop2.coffee.ast_base.Sub(expr1, expr2)

Bases: pyop2.coffee.ast_base.BinExpr

Binary subtraction.

class pyop2.coffee.ast_base.Prod(expr1, expr2)

Bases: pyop2.coffee.ast_base.BinExpr

Binary product.

class pyop2.coffee.ast_base.Div(expr1, expr2)

Bases: pyop2.coffee.ast_base.BinExpr

Binary division.

class pyop2.coffee.ast_base.Less(expr1, expr2)

Bases: pyop2.coffee.ast_base.BinExpr

Compare two expressions using the operator <.

class pyop2.coffee.ast_base.FunCall(function_name, *args)

Bases: pyop2.coffee.ast_base.Expr, pyop2.coffee.ast_base.Perfect

Function call.

gencode(scope=False)
class pyop2.coffee.ast_base.Ternary(expr, true_stmt, false_stmt)

Bases: pyop2.coffee.ast_base.Expr

Ternary operator: expr ? true_stmt : false_stmt.

gencode()
class pyop2.coffee.ast_base.Symbol(symbol, rank=(), offset=())

Bases: pyop2.coffee.ast_base.Expr

A generic symbol. The length of rank is the tensor rank:

  • 0: scalar
  • 1: array
  • 2: matrix, etc.
Parameters:rank (tuple) – entries represent the iteration variables the symbol depends on, or explicit numbers representing the entry of a tensor the symbol is accessing, or the size of the tensor itself.
gencode()
class pyop2.coffee.ast_base.AVXSum(expr1, expr2)

Bases: pyop2.coffee.ast_base.Sum

Sum of two vector registers using AVX intrinsics.

gencode(scope=False)
class pyop2.coffee.ast_base.AVXSub(expr1, expr2)

Bases: pyop2.coffee.ast_base.Sub

Subtraction of two vector registers using AVX intrinsics.

gencode()
class pyop2.coffee.ast_base.AVXProd(expr1, expr2)

Bases: pyop2.coffee.ast_base.Prod

Product of two vector registers using AVX intrinsics.

gencode()
class pyop2.coffee.ast_base.AVXDiv(expr1, expr2)

Bases: pyop2.coffee.ast_base.Div

Division of two vector registers using AVX intrinsics.

gencode()
class pyop2.coffee.ast_base.AVXLoad(symbol, rank=(), offset=())

Bases: pyop2.coffee.ast_base.Symbol

Load of values in a vector register using AVX intrinsics.

gencode()
class pyop2.coffee.ast_base.AVXSet(symbol, rank=(), offset=())

Bases: pyop2.coffee.ast_base.Symbol

Replicate the symbol’s value in all slots of a vector register using AVX intrinsics.

gencode()
class pyop2.coffee.ast_base.Statement(children=None, pragma=None)

Bases: pyop2.coffee.ast_base.Node

Base class for command productions.

class pyop2.coffee.ast_base.EmptyStatement(children=None, pragma=None)

Bases: pyop2.coffee.ast_base.Statement, pyop2.coffee.ast_base.Perfect

Empty statement.

gencode()
class pyop2.coffee.ast_base.FlatBlock(code, pragma=None)

Bases: pyop2.coffee.ast_base.Statement

Treat a chunk of code as a single statement, i.e. a C string

gencode(scope=False)
class pyop2.coffee.ast_base.Assign(sym, exp, pragma=None)

Bases: pyop2.coffee.ast_base.Statement, pyop2.coffee.ast_base.Perfect

Assign an expression to a symbol.

gencode(scope=False)
class pyop2.coffee.ast_base.Incr(sym, exp, pragma=None)

Bases: pyop2.coffee.ast_base.Statement, pyop2.coffee.ast_base.Perfect

Increment a symbol by an expression.

gencode(scope=False)
class pyop2.coffee.ast_base.Decr(sym, exp, pragma=None)

Bases: pyop2.coffee.ast_base.Statement, pyop2.coffee.ast_base.Perfect

Decrement a symbol by an expression.

gencode(scope=False)
class pyop2.coffee.ast_base.IMul(sym, exp, pragma=None)

Bases: pyop2.coffee.ast_base.Statement, pyop2.coffee.ast_base.Perfect

In-place multiplication of a symbol by an expression.

gencode(scope=False)
class pyop2.coffee.ast_base.IDiv(sym, exp, pragma=None)

Bases: pyop2.coffee.ast_base.Statement, pyop2.coffee.ast_base.Perfect

In-place division of a symbol by an expression.

gencode(scope=False)
class pyop2.coffee.ast_base.Decl(typ, sym, init=None, qualifiers=None, attributes=None, pragma=None)

Bases: pyop2.coffee.ast_base.Statement, pyop2.coffee.ast_base.Perfect

Declaration of a symbol.

Syntax:

[qualifiers] typ sym [attributes] [= init];

E.g.:

static const double FE0[3][3] __attribute__((aligned(32))) = {{...}};
size()

Return the size of the declared variable. In particular, return

  • (0,), if it is a scalar
  • a tuple, if it is a N-dimensional array, such that each entry represents the size of an array dimension (e.g. double A[20][10] -> (20, 10))
gencode(scope=False)
get_nonzero_columns()

If the declared array:

  • is a bi-dimensional array,
  • is initialized to some values,
  • the initialized values are of type ColSparseArrayInit

Then return a tuple of the first and last non-zero columns in the array. Else, return an empty tuple.
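
As a rough illustration of what "first and last non-zero columns" means, the following sketch computes those bounds for a plain 2-D list. The helper name nonzero_column_bounds is hypothetical and is not part of pyop2.coffee; the real method reads the bounds from a ColSparseArrayInit object.

```python
def nonzero_column_bounds(rows):
    """Return (first, last) indices of the nonzero columns of a 2-D list.

    Hypothetical helper, not part of pyop2.coffee. Returns an empty tuple
    when there are no nonzero columns, mirroring the "else" case above.
    """
    ncols = len(rows[0]) if rows else 0
    # A column is nonzero if at least one of its entries is nonzero
    nonzero = [j for j in range(ncols) if any(row[j] != 0 for row in rows)]
    if not nonzero:
        return ()
    return (nonzero[0], nonzero[-1])


table = [
    [0.0, 0.0, 1.5, 2.0, 0.0],
    [0.0, 0.0, 0.0, 3.0, 0.0],
]
print(nonzero_column_bounds(table))  # (2, 3)
```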

class pyop2.coffee.ast_base.Block(stmts, pragma=None, open_scope=False)

Bases: pyop2.coffee.ast_base.Statement

Block of statements.

gencode(scope=False)
class pyop2.coffee.ast_base.For(init, cond, incr, body, pragma=None)

Bases: pyop2.coffee.ast_base.Statement

Represent the classic for loop of an imperative language, with one restriction: only a single iteration variable can be declared and modified. For example, the following is not supported:

for (int i = 0, j = 0; ...)
it_var()
start()
end()
size()
increment()
gencode(scope=False)
class pyop2.coffee.ast_base.Switch(switch_expr, cases)

Bases: pyop2.coffee.ast_base.Statement

Switch construct.

Parameters:
  • switch_expr – The expression over which to switch.
  • cases – A tuple of pairs ((case, statement),...)
gencode()
class pyop2.coffee.ast_base.FunDecl(ret, name, args, body, pred=[], headers=None)

Bases: pyop2.coffee.ast_base.Statement

Function declaration.

Syntax:

[pred] ret name ([args]) {body};

E.g.:

static inline void foo(int a, int b) {return;};
gencode()
class pyop2.coffee.ast_base.AVXStore(sym, exp, pragma=None)

Bases: pyop2.coffee.ast_base.Assign

Store of values in a vector register using AVX intrinsics.

gencode(scope=False)
class pyop2.coffee.ast_base.AVXLocalPermute(r, mask)

Bases: pyop2.coffee.ast_base.Statement

Permutation of values in a vector register using AVX intrinsics. The intrinsic function used is _mm256_permute_pd.

gencode(scope=True)
class pyop2.coffee.ast_base.AVXGlobalPermute(r1, r2, mask)

Bases: pyop2.coffee.ast_base.Statement

Permutation of values in two vector registers using AVX intrinsics. The intrinsic function used is _mm256_permute2f128_pd.

gencode(scope=True)
class pyop2.coffee.ast_base.AVXUnpackHi(r1, r2)

Bases: pyop2.coffee.ast_base.Statement

Unpack of values in a vector register using AVX intrinsics. The intrinsic function used is _mm256_unpackhi_pd.

gencode(scope=True)
class pyop2.coffee.ast_base.AVXUnpackLo(r1, r2)

Bases: pyop2.coffee.ast_base.Statement

Unpack of values in a vector register using AVX intrinsics. The intrinsic function used is _mm256_unpacklo_pd.

gencode(scope=True)
class pyop2.coffee.ast_base.AVXSetZero(children=None, pragma=None)

Bases: pyop2.coffee.ast_base.Statement

Set to 0 the entries of a vector register using AVX intrinsics.

gencode(scope=True)
class pyop2.coffee.ast_base.PreprocessNode(prep)

Bases: pyop2.coffee.ast_base.Node

Represent directives which are handled by the C preprocessor.

gencode(scope=False)
pyop2.coffee.ast_base.indent(block)

Indent each row of the given string block with n*2 spaces.

pyop2.coffee.ast_base.semicolon(scope)
pyop2.coffee.ast_base.c_sym(const)
pyop2.coffee.ast_base.c_for(var, to, code, pragma='#pragma pyop2 itspace')
pyop2.coffee.ast_base.c_flat_for(code, parent)
pyop2.coffee.ast_base.c_from_itspace_to_fors(itspaces)

pyop2.coffee.ast_linearalgebra module

class pyop2.coffee.ast_linearalgebra.AssemblyLinearAlgebra(assembly_optimizer, kernel_decls)

Bases: object

Convert assembly code into sequences of calls to external dense linear algebra libraries. Currently, MKL, ATLAS, and EIGEN are supported.

Initialize an AssemblyLinearAlgebra object.

Parameters:
  • assembly_optimizer – an AssemblyOptimizer object of the AST
  • kernel_decls – list of declarations used in the AST
transform(library)

Transform perfect loop nests representing matrix-matrix multiplies into calls to a dense linear algebra library.

Parameters:library – the BLAS library that should be used (mkl, atlas, eigen).

pyop2.coffee.ast_optimizer module

class pyop2.coffee.ast_optimizer.AssemblyOptimizer(loop_nest, pre_header, kernel_decls, is_mixed)

Bases: object

Assembly optimiser interface class

Initialize the AssemblyOptimizer.

Parameters:
  • loop_nest – root node of the local assembly code.
  • pre_header – parent of the root node
  • kernel_decls – list of declarations of variables which are visible within the local assembly code block.
  • is_mixed – true if the assembly operation uses mixed (vector) function spaces.
extract_itspace()

Remove fully-parallel loops from the iteration space. These are the loops that were marked by the user/higher layer with a pragma pyop2 itspace.

rewrite(level)

Rewrite an assembly expression to minimize floating point operations and relieve register pressure. This involves several possible transformations:

  1. Generalized loop-invariant code motion
  2. Factorization of common loop-dependent terms
  3. Expansion of constants over loop-dependent terms
  4. Zero-valued columns avoidance
  5. Precomputation of integration-dependent terms
Parameters:level

The optimization level (0, 1, 2, 3, 4). The higher the level, the more invasive the rewriting of the assembly expressions, which tries to eliminate unnecessary floating point operations.

  • level == 1: performs “basic” generalized loop-invariant code motion
  • level == 2: level 1 + expansion of terms, factorization of basis functions appearing multiple times in the same expression, and finally another run of loop-invariant code motion to move invariant sub-expressions exposed by factorization
  • level == 3: level 2 + avoid computing zero-columns
  • level == 4: level 3 + precomputation of read-only expressions out of the assembly loop nest
slice(slice_factor=None)

Perform slicing of the innermost loop to enhance register reuse. For example, given a loop:

for i = 0 to N
  f()

the following sequence of loops is generated:

for i = 0 to k
  f()
for i = k to 2k
  f()
# ...
for i = (ceil(N/k)-1)k to N
  f()

The goal is to improve register re-use by relying on the backend compiler unrolling and vector-promoting the sliced loops.
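
The chunk boundaries of a sliced loop can be sketched as follows. This is only an illustration of the iteration ranges, assuming k is the slice size; slice_ranges is a hypothetical helper and does not manipulate a COFFEE AST.

```python
def slice_ranges(n, k):
    """Split range(0, n) into consecutive (start, end) chunks of at most k."""
    if k <= 0:
        raise ValueError("slice size must be positive")
    # The last chunk is shorter when k does not divide n evenly
    return [(start, min(start + k, n)) for start in range(0, n, k)]


print(slice_ranges(10, 4))  # [(0, 4), (4, 8), (8, 10)]
```

Each (start, end) pair corresponds to one of the generated loops; together they cover exactly the original iteration space.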

unroll(loops_factor)

Unroll loops in the assembly nest.

Parameters:loops_factor

dictionary from loops to unroll (factor, increment). Loops are specified as integers:

  • 0 = integration loop,
  • 1 = test functions loop,
  • 2 = trial functions loop.

A factor of 0 denotes that the corresponding loop is not present.

permute()

Permute the integration loop with the innermost loop in the assembly nest. This transformation is legal if _precompute was invoked. Storage layout of all 2-dimensional arrays involved in the element matrix computation is transposed.

split(cut=1)

Split assembly expressions into multiple chunks exploiting sum’s associativity. Each chunk will have cut summands.

For example, consider the following piece of code:

for i
  for j
    A[i][j] += X[i]*Y[j] + Z[i]*K[j] + B[i]*X[j]

If cut=1 the expression is cut into chunks of length 1:

for i
  for j
    A[i][j] += X[i]*Y[j]
for i
  for j
    A[i][j] += Z[i]*K[j]
for i
  for j
    A[i][j] += B[i]*X[j]

If cut=2 the expression is cut into chunks of length 2, plus a remainder chunk of size 1:

for i
  for j
    A[i][j] += X[i]*Y[j] + Z[i]*K[j]
# Remainder:
for i
  for j
    A[i][j] += B[i]*X[j]
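
The grouping of summands into chunks can be sketched in a few lines. The helper split_summands is hypothetical and works on strings for readability, whereas the real method operates on AST nodes and also replicates the enclosing loops.

```python
def split_summands(summands, cut=1):
    """Group a list of summands into chunks of `cut`, plus any remainder."""
    if cut < 1:
        raise ValueError("cut must be at least 1")
    return [summands[i:i + cut] for i in range(0, len(summands), cut)]


terms = ["X[i]*Y[j]", "Z[i]*K[j]", "B[i]*X[j]"]
for chunk in split_summands(terms, cut=2):
    # One increment statement per chunk, each in its own loop nest
    print("A[i][j] += " + " + ".join(chunk))
```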
class pyop2.coffee.ast_optimizer.AssemblyRewriter(expr, int_loop, syms, decls, parent, hoisted, expr_graph)

Bases: object

Provide operations to re-write an assembly expression:

  • Loop-invariant code motion: find and hoist sub-expressions which are invariant with respect to an assembly loop
  • Expansion: transform an expression (a + b)*c into (a*c + b*c)
  • Distribute: transform an expression a*b + a*c into a*(b+c)

Initialize the AssemblyRewriter.

Parameters:
  • expr – provide generic information related to an assembly expression, including the for loops it depends on.
  • int_loop – the loop along which integration is performed.
  • syms – list of AST symbols used to evaluate the local element matrix.
  • decls – list of AST declarations of the various symbols in syms.
  • parent – the parent AST node of the assembly loop nest.
  • hoisted – dictionary that tracks hoisted expressions
  • expr_graph – expression graph that tracks symbol dependencies
licm()

Perform loop-invariant code motion.

Invariant expressions found in the loop nest are moved “after” the outermost independent loop and “after” the fastest varying dimension loop. Here, “after” means that if the loop nest has two loops i and j, and j is in the body of i, then i comes after j (i.e. the loop nest has to be read from right to left).

For example, if a sub-expression E depends on [i, j] and the loop nest has three loops [i, j, k], then E is hoisted out from the body of k to the body of i. All hoisted expressions are then wrapped within a suitable loop in order to exploit compiler autovectorization. Note that this applies to constant sub-expressions as well, in which case hoisting after the outermost loop takes place.
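
The effect of the hoisting can be illustrated with ordinary Python loops. This is a sketch of the transformation's result, not of COFFEE's implementation: the function names and the arrays X, Y, Z are hypothetical, and the sub-expression X[i]*Y[j] plays the role of E.

```python
def nest_naive(X, Y, Z):
    n, m, p = len(X), len(Y), len(Z)
    A = [[[0.0] * p for _ in range(m)] for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for k in range(p):
                # X[i] * Y[j] does not depend on k, yet is recomputed here
                A[i][j][k] = (X[i] * Y[j]) * Z[k]
    return A


def nest_hoisted(X, Y, Z):
    n, m, p = len(X), len(Y), len(Z)
    A = [[[0.0] * p for _ in range(m)] for _ in range(n)]
    for i in range(n):
        # E is hoisted into the body of i and wrapped in its own loop over j
        E = [X[i] * Y[j] for j in range(m)]
        for j in range(m):
            for k in range(p):
                A[i][j][k] = E[j] * Z[k]
    return A


X, Y, Z = [1.0, 2.0], [3.0, 4.0, 5.0], [0.5, 0.25]
assert nest_naive(X, Y, Z) == nest_hoisted(X, Y, Z)
```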

count_occurrences(str_key=False)

For each variable in the assembly expression, count how many times it appears as involved in some operations. For example, for the expression a*(5+c) + b*(a+4), return {a: 2, b: 1, c: 1}.
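
The counting in the example above can be reproduced with a small sketch. It parses the expression with Python's standard ast module rather than COFFEE's AST classes, so it only illustrates the expected result; count_symbol_occurrences is a hypothetical name.

```python
import ast
from collections import Counter


def count_symbol_occurrences(expression):
    """Count how many times each identifier appears in an expression string."""
    tree = ast.parse(expression, mode="eval")
    names = (node.id for node in ast.walk(tree) if isinstance(node, ast.Name))
    return dict(Counter(names))


counts = count_symbol_occurrences("a*(5+c) + b*(a+4)")
print(sorted(counts.items()))  # [('a', 2), ('b', 1), ('c', 1)]
```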

expand()

Expand assembly expressions such that:

Y[j] = f(...)
(X[i]*Y[j])*F + ...

becomes:

Y[j] = f(...)*F
(X[i]*Y[j]) + ...

This may be useful for several purposes:

  • Relieve register pressure; when, for example, (X[i]*Y[j]) is computed in a loop L' different from the loop L'' in which Y[j] is evaluated, and cost(L') > cost(L'')
  • It is also a step towards exposing well-known linear algebra operations, like matrix-matrix multiplies.
distribute()

Apply the distributivity property to the assembly expression. E.g.

A[i]*B[j] + A[i]*C[j]

becomes

A[i]*(B[j] + C[j]).
simplify()

Scan the hoisted terms one by one and eliminate duplicate sub-expressions. Remove useless assignments (e.g. a = b, and b never used later).

class pyop2.coffee.ast_optimizer.ExpressionExpander(var_info, expr_graph, expr)

Bases: object

Expand assembly expressions such that:

Y[j] = f(...)
(X[i]*Y[j])*F + ...

becomes:

Y[j] = f(...)*F
(X[i]*Y[j]) + ...
CONST = -1
ITVAR = -2
expand(node, parent, it_vars, exp_var)

Perform the expansion of the expression rooted in node. Terms are expanded along the iteration variable exp_var.

class pyop2.coffee.ast_optimizer.LoopScheduler(expr_graph, root)

Bases: object

Base class for classes that handle loop scheduling; that is, loop fusion, loop distribution, etc.

Initialize the LoopScheduler.

Parameters:
  • expr_graph – the ExpressionGraph tracking all data dependencies involving identifiers that appear in root.
  • root – the node where loop scheduling takes place.
class pyop2.coffee.ast_optimizer.PerfectSSALoopMerger(expr_graph, root)

Bases: pyop2.coffee.ast_optimizer.LoopScheduler

Analyze data dependencies and iteration spaces, then merge fusable loops. Statements must be in “soft” SSA form: they can be declared and initialized at declaration time, then they can be assigned a value in only one place.

merge()

Merge perfect loop nests rooted in self.root.

class pyop2.coffee.ast_optimizer.ExprLoopFissioner(expr_graph, root, cut)

Bases: pyop2.coffee.ast_optimizer.LoopScheduler

Analyze data dependencies and iteration spaces, then fission associative operations in expressions. Fissioned expressions are placed in a separate loop nest.

Initialize the ExprLoopFissioner.

Parameters:cut – number of operands requested to fission expressions.
expr_fission(expr, copy_loops)

Split an expression containing x summands into x/cut chunks. Each chunk is placed in a separate loop nest if copy_loops is true, in the same loop nest otherwise. Return a dictionary of all of the split chunks, in which each entry has the same format as expr.

Parameters:
  • expr – the expression that needs to be split. This is given as a tuple of two elements: the former is the expression root node; the latter includes info about the expression, particularly iteration variables of the enclosing loops, the enclosing loops themselves, and the parent block.
  • copy_loops – true if the split expressions should be placed in two separate, adjacent loop nests (iterating, of course, along the same iteration space); false, otherwise.
class pyop2.coffee.ast_optimizer.ZeroLoopScheduler(expr_graph, root, decls)

Bases: pyop2.coffee.ast_optimizer.LoopScheduler

Analyze data dependencies, iteration spaces, and domain-specific information to perform symbolic execution of the assembly code so as to determine how to restructure the loop nests to skip iteration over zero-valued columns. This implies that loops can be fissioned or merged. For example:

for i = 0, N
  A[i] = C[i]*D[i]
  B[i] = E[i]*F[i]

If the evaluation of A requires iterating over a region of contiguous zero-valued columns in C and D, then A is computed in a separate (smaller) loop nest:

for i = 0, (N-k)
  A[i+k] = C[i+k]*D[i+k]
for i = 0, N
  B[i] = E[i]*F[i]
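
The restructuring above can be checked with a small sketch, assuming the first k entries of C and D are zero and A starts out zero-initialized. The function names are hypothetical and the code works on plain lists, not on the AST.

```python
def full_loop(C, D):
    n = len(C)
    A = [0.0] * n
    for i in range(n):
        A[i] = C[i] * D[i]
    return A


def restricted_loop(C, D, k):
    """Skip the first k entries, which are assumed to be zero in C and D."""
    n = len(C)
    A = [0.0] * n
    for i in range(n - k):
        A[i + k] = C[i + k] * D[i + k]
    return A


C = [0.0, 0.0, 2.0, 3.0]
D = [0.0, 0.0, 5.0, 7.0]
# Both versions agree, but the second performs two fewer iterations
assert full_loop(C, D) == restricted_loop(C, D, k=2)
```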

Initialize the ZeroLoopScheduler.

Parameters:decls – lists of array declarations. A 2-tuple is expected: the first element is the list of kernel declarations; the second element is the list of hoisted temporaries declarations.
reschedule()

Restructure the loop nests rooted in self.root based on the propagation of zero-valued columns along the computation. This, therefore, involves fissioning and fusing loops so as to remove iterations spent performing arithmetic operations over zero-valued entries. Return a list of dictionaries, one dictionary per loop nest encountered. Each entry in a dictionary is of the form {stmt: (itvars, parent, loops)}, in which stmt is a statement found in the loop nest from which the dictionary derives, itvars is the tuple of the iteration variables of the enclosing loops, parent is the AST node in which the loop nest is rooted, and loops is the tuple of loops composing the loop nest.

class pyop2.coffee.ast_optimizer.ExpressionGraph

Bases: object

Track read-after-write dependencies between symbols.

add_dependency(sym, expr, self_loop)

Extract symbols from expr and create a read-after-write dependency with sym. If sym already has a dependency, then sym has a self dependency on itself.

has_dep(sym, target_sym=None)

If target_sym is not provided, return True if sym has a read-after-write dependency with some other symbols. This is the case if sym has either a self dependency or at least one input edge, meaning that other symbols depend on it. Otherwise, if target_sym is not None, return True if sym has a read-after-write dependency on it, i.e. if there is an edge from target_sym to sym.

pyop2.coffee.ast_plan module

Transform the kernel’s AST according to the backend we are running over.

class pyop2.coffee.ast_plan.ASTKernel(ast, include_dirs=[])

Bases: object

Manipulate the kernel’s Abstract Syntax Tree.

The single functionality present at the moment is provided by the plan_gpu() method, which transforms the AST for GPU execution.

plan_gpu()

Transform the kernel suitably for GPU execution.

Loops decorated with a pragma pyop2 itspace are hoisted out of the kernel. The list of arguments in the function signature is enriched by adding the iteration variables of hoisted loops. The sizes of the kernel’s non-constant tensors modified in hoisted loops are adjusted accordingly.

For example, consider the following function:

void foo (int A[3]) {
  int B[3] = {...};
  #pragma pyop2 itspace
  for (int i = 0; i < 3; i++)
    A[i] = B[i];
}

plan_gpu modifies its AST such that the resulting output code is

void foo(int A[1], int i) {
  A[0] = B[i];
}
plan_cpu(opts)

Transform and optimize the kernel suitably for CPU execution.

gencode()

Generate a string representation of the AST.

pyop2.coffee.ast_plan.init_coffee(isa, comp, blas)

Initialize COFFEE.

pyop2.coffee.ast_utils module

Utility functions for AST transformation.

pyop2.coffee.ast_utils.increase_stack(asm_opt)

Increase the stack size if the total space occupied by the kernel’s local arrays is too big.

pyop2.coffee.ast_utils.unroll_factors(sizes, ths)

Return a list of unroll factors to run, given loop sizes in sizes. The return value is a list of tuples, where each element in a tuple represents the unroll factor for the corresponding loop in the nest.

For example, if there are three loops i, j, and k, a tuple (2, 1, 1) in the returned list indicates that the outermost loop i should be unrolled by a factor two (i.e. two iterations), while loops j and k should not be unrolled.

Parameters:ths – unrolling threshold that cannot be exceeded by the overall unroll factor
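
One way to enumerate such tuples is sketched below. It rests on two assumptions that the description above does not confirm: that only factors dividing the loop size evenly are considered, and that the "overall unroll factor" is the product of the per-loop factors. The name candidate_unroll_factors is hypothetical.

```python
from itertools import product


def candidate_unroll_factors(sizes, ths):
    """Enumerate per-loop unroll factors whose product does not exceed ths.

    Assumption: only factors that divide the loop size evenly are considered.
    """
    divisors = [[f for f in range(1, size + 1) if size % f == 0]
                for size in sizes]
    result = []
    for combo in product(*divisors):
        total = 1
        for factor in combo:
            total *= factor
        if total <= ths:
            result.append(combo)
    return result


# Three loops i, j, k of sizes 4, 3, 2 and an overall threshold of 2
print(candidate_unroll_factors((4, 3, 2), ths=2))
# [(1, 1, 1), (1, 1, 2), (2, 1, 1)]
```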
pyop2.coffee.ast_utils.ast_update_ofs(node, ofs)

Given a dictionary ofs s.t. {'itvar': ofs}, update the various iteration variables in the symbols rooted in node.

pyop2.coffee.ast_utils.itspace_size_ofs(itspace)

Given an itspace in the form

(('itvar', (bound_a, bound_b)), ...),

return

((('itvar', bound_b - bound_a), ...), (('itvar', bound_a), ...))
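
The mapping described above can be sketched directly from the input and output shapes. The function name size_and_offset is hypothetical; the sketch only mirrors the documented tuple layout.

```python
def size_and_offset(itspace):
    """Turn (itvar, (start, end)) pairs into separate size and offset tuples."""
    sizes = tuple((itvar, end - start) for itvar, (start, end) in itspace)
    offsets = tuple((itvar, start) for itvar, (start, _) in itspace)
    return sizes, offsets


print(size_and_offset((("i", (2, 5)), ("j", (0, 3)))))
# ((('i', 3), ('j', 3)), (('i', 2), ('j', 0)))
```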
pyop2.coffee.ast_utils.itspace_merge(itspaces)

Given an iterator of iteration spaces, each iteration space represented as a 2-tuple containing the start and end point, return a tuple of iteration spaces in which contiguous iteration spaces have been merged. For example:

[(1,3), (4,6)] -> ((1,6),)
[(1,3), (5,6)] -> ((1,3), (5,6))
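
A minimal sketch that reproduces the two examples above follows. It assumes inclusive bounds, so (1,3) and (4,6) count as contiguous, and it additionally sorts the input and merges overlapping spaces, which the description does not specify. The name merge_contiguous is hypothetical.

```python
def merge_contiguous(itspaces):
    """Merge iteration spaces whose inclusive bounds touch or overlap."""
    merged = []
    for start, end in sorted(itspaces):
        if merged and start <= merged[-1][1] + 1:
            # Adjacent to (or overlapping) the previous space: extend it
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return tuple(merged)


print(merge_contiguous([(1, 3), (4, 6)]))  # ((1, 6),)
print(merge_contiguous([(1, 3), (5, 6)]))  # ((1, 3), (5, 6))
```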

pyop2.coffee.ast_vectorizer module

class pyop2.coffee.ast_vectorizer.AssemblyVectorizer(assembly_optimizer, intrinsics, compiler)

Bases: object

Loop vectorizer

alignment(decl_scope)

Align all data structures accessed in the loop nest to the size in bytes of the vector length.

padding(decl_scope, nz_in_fors)

Pad all data structures accessed in the loop nest to the nearest multiple of the vector length. Adjust trip counts and bounds of all innermost loops where padded arrays are written to. Since padding enforces data alignment of multi-dimensional arrays, add suitable pragmas to inner loops to inform the backend compiler about this property.

outer_product(opts, factor=1)

Compute outer products according to opts.

  • opts = V_OP_PADONLY : no peeling, just use padding
  • opts = V_OP_PEEL : peeling for autovectorisation
  • opts = V_OP_UAJ : set unroll_and_jam factor
  • opts = V_OP_UAJ_EXTRA : as above, but extra iterations avoid the remainder loop

factor is an additional parameter to specify things like the unroll-and-jam factor. Note that factor is just a suggestion to the compiler, which can freely decide to use a higher or lower value.
class pyop2.coffee.ast_vectorizer.OuterProduct(stmt, loops, intr, nest)

Generate outer product vectorisation of a statement.

OP_STORE_IN_MEM = 0
OP_REGISTER_INC = 1
class Alloc(intr, tensor_size)

Bases: object

Handle allocation of register variables.

get_reg()
free_regs(regs)
get_tensor()
OuterProduct.generate(rows)

Generate the outer-product intrinsics-based vectorisation code.

pyop2.coffee.ast_vectorizer.vect_roundup(x)

Return x rounded up to the nearest multiple of the vector length.

pyop2.coffee.ast_vectorizer.vect_rounddown(x)

Return x rounded down to the nearest multiple of the vector length.
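
The rounding performed by vect_roundup and vect_rounddown can be sketched with integer arithmetic. The vector length of 4 is an assumption made for the example (four doubles fit in a 256-bit AVX register); the real functions take the vector length from COFFEE's configuration, and the names round_up and round_down are hypothetical.

```python
VECTOR_LENGTH = 4  # assumed for illustration: four doubles per 256-bit register


def round_up(x, vl=VECTOR_LENGTH):
    """Smallest multiple of vl that is greater than or equal to x."""
    return ((x + vl - 1) // vl) * vl


def round_down(x, vl=VECTOR_LENGTH):
    """Largest multiple of vl that is less than or equal to x."""
    return (x // vl) * vl


print(round_up(10), round_down(10))  # 12 8
```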

pyop2.coffee.ast_vectorizer.inner_loops(node)

Find inner loops in the subtree rooted in node.

Module contents