PyOP2 Concepts

Many numerical algorithms and scientific computations on unstructured meshes can be viewed as the independent application of a local operation everywhere on a mesh. This local operation is often called a computational kernel and its independent application lends itself naturally to parallel computation. An unstructured mesh can be described by sets of entities (vertices, edges, cells) and the connectivity between those sets forming the topology of the mesh.

PyOP2 is a domain-specific language (DSL) for the parallel executions of computational kernels on unstructured meshes or graphs.

Sets and mappings

A mesh is defined by sets of entities and mappings between these sets. Sets are used to represent entities in the mesh (nodes in the graph) or degrees of freedom of data (fields) living “on” the mesh (graph), while maps define the connectivity between entities (links in the graph) or degrees of freedom, for example associating an edge with its incident vertices. Sets of mesh entities may coincide with sets of degrees of freedom, but this is not necessarily the case e.g. the set of degrees of freedom for a field may be defined on the vertices of the mesh and the midpoints of edges connecting the vertices.


There is a requirement for the map to be of constant arity, that is each element in the source set must be associated with a constant number of elements in the target set. There is no requirement for the map to be injective or surjective. This restriction excludes certain kinds of mappings e.g. a map from vertices to incident egdes or cells is only possible on a very regular mesh where the multiplicity of any vertex is constant.

In the following we declare a Set vertices, a Set edges and a Map edges2vertices between them, which associates the two incident vertices with each edge:

vertices = op2.Set(4)
edges = op2.Set(3)
edges2vertices = op2.Map(edges, vertices, 2, [[0, 1], [1, 2], [2, 3]])


PyOP2 distinguishes three kinds of user provided data: data that lives on a set (often referred to as a field) is represented by a Dat, data that has no association with a set by a Global and data that is visible globally and referred to by a unique identifier is declared as Const. Examples of the use of these data types are given in the Parallel loops section below.


Since a set does not have any type but only a cardinality, data declared on a set through a Dat needs additional metadata to allow PyOP2 to interpret the data and to specify how much memory is required to store it. This metadata is the datatype and the shape of the data associated with any given set element. The shape is not associated with the Dat directly, but with a DataSet. One can associate a scalar with each element of the set or a one- or higher-dimensional vector. Similar to the restriction on maps, the shape and therefore the size of the data associated which each element needs to be uniform. PyOP2 supports all common primitive data types supported by NumPy. Custom datatypes are supported insofar as the user implements the serialisation and deserialisation of that type into primitive data that can be handled by PyOP2.

Declaring coordinate data on the vertices defined above, where two float coordinates are associated with each vertex, is done like this:

dvertices = op2.DataSet(vertices, dim=2)
coordinates = op2.Dat(dvertices,
                      [[0.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0]],


In contrast to a Dat, a Global has no association to a set and the shape and type of the data are declared directly on the Global. A 2x2 elasticity tensor would be defined as follows:

elasticity = op2.Global((2, 2), [[1.0, 0.0], [0.0, 1.0]], dtype=float)


Data that is globally visible and read-only to kernels is declared with a Const and needs to have a globally unique identifier. It does not need to be declared as an argument to a par_loop(), but is accessible in a kernel by name. A globally visible parameter eps would be declared as follows:

eps = op2.Const(1, 1e-14, name="eps", dtype=float)


In a PyOP2 context, a (sparse) matrix is a linear operator from one set to another. In other words, it is a linear function which takes a Dat on one set A and returns the value of a Dat on another set B. Of course, in particular, A may be the same set as B. This makes the operation of at least some matrices equivalent to the operation of a particular PyOP2 kernel.

PyOP2 can be used to assemble matrices, which are defined on a sparsity pattern which is built from a pair of DataSets defining the row and column spaces the sparsity maps between and one or more pairs of maps, one for the row and one for the column space of the matrix respectively. The sparsity uniquely defines the non-zero structure of the sparse matrix and can be constructed purely from those mappings. To declare a Mat on a Sparsity only the data type needs to be given.

Since the construction of large sparsity patterns is a very expensive operation, the decoupling of Mat and Sparsity allows the reuse of sparsity patterns for a number of matrices without recomputation. In fact PyOP2 takes care of caching sparsity patterns on behalf of the user, so declaring a sparsity on the same maps as a previously declared sparsity yields the cached object instead of building another one.

Defining a matrix of floats on a sparsity which spans from the space of vertices to the space of vertices via the edges is done as follows:

sparsity = op2.Sparsity((dvertices, dvertices),
                        [(edges2vertices, edges2vertices)])
matrix = op2.Mat(sparsity, float)

Parallel loops

Computations in PyOP2 are executed as parallel loops of a Kernel over an iteration set. Parallel loops are the core construct of PyOP2 and hide most of its complexity such as parallel scheduling, partitioning, colouring, data transfer from and to device and staging of the data into on chip memory. Computations in a parallel loop must be independent of the order in which they are executed over the set to allow PyOP2 maximum flexibility to schedule the computation in the most efficient way. Kernels are described in more detail in PyOP2 Kernels.

Loop invocations

A parallel loop invocation requires as arguments, other than the iteration set and the kernel to operate on, the data the kernel reads and/or writes. A parallel loop argument is constructed by calling the underlying data object (i.e. the Dat or Global) and passing an access descriptor and the mapping to be used when accessing the data. The mapping is required for an indirectly accessed Dat not declared on the same set as the iteration set of the parallel loop. In the case of directly accessed data defined on the same set as the iteration set the map is omitted and only an access descriptor given.

Consider a parallel loop that translates the coordinate field by a constant offset given by the Const offset. Note how the kernel has access to the local variable offset even though it has not been passed as an argument to the par_loop(). This loop is direct and the argument coordinates is read and written:

op2.Const(2, [1.0, 1.0], dtype=float, name="offset");

translate = op2.Kernel("""void translate(double * coords) {
  coords[0] += offset[0];
  coords[1] += offset[1];
}""", "translate")

op2.par_loop(translate, vertices, coordinates(op2.RW))

Access descriptors

Access descriptors define how the data is accessed by the kernel and give PyOP2 crucial information as to how the data needs to be treated during staging in before and staging out after kernel execution. They must be one of pyop2.READ (read-only), pyop2.WRITE (write-only), pyop2.RW (read-write), pyop2.INC (increment), pyop2.MIN (minimum reduction) or pyop2.MAX (maximum reduction).

Not all of these descriptors apply to all PyOP2 data types. A Dat can have modes READ, WRITE, RW and INC. For a Global the valid modes are READ, INC, MIN and MAX and for a Mat only WRITE and INC are allowed.

Loops assembling matrices

We declare a parallel loop assembling the matrix via a given kernel which we’ll assume has been defined before over the edges and with coordinates as input data. The matrix is the output argument of this parallel loop and therefore has the access descriptor INC since the assembly accumulates contributions from different vertices via the edges2vertices mapping. Note that the mappings are being indexed with the iteration indices op2.i[0] and op2.i[1] respectively. This means that PyOP2 generates a local iteration space of size arity * arity with the arity of the Map edges2vertices for any given element of the iteration set. This local iteration space is then iterated over using the iteration indices on the maps. The kernel is assumed to only apply to a single point in that local iteration space. The coordinates are accessed via the same mapping, but are a read-only input argument to the kernel and therefore use the access descriptor READ:

op2.par_loop(kernel, edges,
             matrix(op2.INC, (edges2vertices[op2.i[0]],
             coordinates(op2.READ, edges2vertices))

You can stack up multiple successive parallel loops that add values to a matrix, before you use the resulting values, you must explicitly tell PyOP2 that you want to do so, by calling assemble() on the matrix. Note that executing a solve() will do this automatically for you.

Loops with global reductions

Globals are used primarily for reductions where a given quantity on a field is reduced to a single number by summation or finding the minimum or maximum. Consider a kernel computing the L2 norm of the pressure field defined on the set of vertices as l2norm. Note that the Dat constructor automatically creates an anonymous DataSet of dimension 1 if a Set is passed as the first argument. We assume pressure is the result of some prior computation and only give the declaration for context.

pressure = op2.Dat(vertices, [...], dtype=float)
l2norm = op2.Global(dim=1, data=[0.0])

norm = op2.Kernel("""void norm(double * out, double * field) {
  *out += field[0] * field[0];
}""", "norm")

op2.par_loop(pressure, vertices,