PyOP2 Concepts¶
Many numerical algorithms and scientific computations on unstructured meshes can be viewed as the independent application of a local operation everywhere on a mesh. This local operation is often called a computational kernel and its independent application lends itself naturally to parallel computation. An unstructured mesh can be described by sets of entities (vertices, edges, cells) and the connectivity between those sets forming the topology of the mesh.
PyOP2 is a domain-specific language (DSL) for the parallel execution of computational kernels on unstructured meshes or graphs.
Sets and mappings¶
A mesh is defined by sets of entities and mappings between these sets. Sets are used to represent entities in the mesh (nodes in the graph) or degrees of freedom of data (fields) living “on” the mesh (graph), while maps define the connectivity between entities (links in the graph) or degrees of freedom, for example associating an edge with its incident vertices. Sets of mesh entities may coincide with sets of degrees of freedom, but this is not necessarily the case: the set of degrees of freedom for a field may, for example, be defined on the vertices of the mesh and the midpoints of edges connecting the vertices.
Note
There is a requirement for the map to be of constant arity, that is each element in the source set must be associated with a constant number of elements in the target set. There is no requirement for the map to be injective or surjective. This restriction excludes certain kinds of mappings: a map from vertices to incident edges or cells is only possible on a very regular mesh where the multiplicity of every vertex is constant.
In the following we declare a Set vertices, a Set edges and a Map edges2vertices between them, which associates the two incident vertices with each edge:
vertices = op2.Set(4)
edges = op2.Set(3)
edges2vertices = op2.Map(edges, vertices, 2, [[0, 1], [1, 2], [2, 3]])
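As noted above, sets of degrees of freedom need not coincide with sets of mesh entities. As a sketch, a field with one degree of freedom per vertex and one per edge midpoint would live on a set of their combined size (assuming Set exposes its cardinality as size):
dofs = op2.Set(vertices.size + edges.size)  # 4 vertex dofs + 3 edge midpoint dofs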
Data¶
PyOP2 distinguishes three kinds of user provided data: data that lives on a set (often referred to as a field) is represented by a Dat, data that has no association with a set by a Global, and data that is visible globally and referred to by a unique identifier is declared as Const. Examples of the use of these data types are given in the Parallel loops section below.
Dat¶
Since a set does not have any type but only a cardinality, data declared on a set through a Dat needs additional metadata to allow PyOP2 to interpret the data and to specify how much memory is required to store it. This metadata is the datatype and the shape of the data associated with any given set element. The shape is not associated with the Dat directly, but with a DataSet. One can associate a scalar with each element of the set or a one- or higher-dimensional vector. Similar to the restriction on maps, the shape and therefore the size of the data associated with each element needs to be uniform. PyOP2 supports all common primitive data types supported by NumPy. Custom datatypes are supported insofar as the user implements the serialisation and deserialisation of that type into primitive data that can be handled by PyOP2.
Declaring coordinate data on the vertices defined above, where two float coordinates are associated with each vertex, is done like this:
dvertices = op2.DataSet(vertices, dim=2)
coordinates = op2.Dat(dvertices,
[[0.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0]],
dtype=float)
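As a sketch of the custom-datatype approach mentioned above, a complex-valued field could be serialised into (real, imaginary) pairs of doubles, since PyOP2 itself only handles primitive NumPy types (the names here are illustrative):
import numpy as np

dcomplex = op2.DataSet(vertices, dim=2)  # one (re, im) pair per vertex
complex_field = op2.Dat(dcomplex,
                        np.array([[1.0, 0.0], [0.0, 1.0],
                                  [-1.0, 0.0], [0.0, -1.0]]),
                        dtype=float)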
Global¶
In contrast to a Dat, a Global has no association to a set and the shape and type of the data are declared directly on the Global. A 2x2 elasticity tensor would be defined as follows:
elasticity = op2.Global((2, 2), [[1.0, 0.0], [0.0, 1.0]], dtype=float)
Const¶
Data that is globally visible and read-only to kernels is declared with a Const and needs to have a globally unique identifier. It does not need to be declared as an argument to a par_loop(), but is accessible in a kernel by name. A globally visible parameter eps would be declared as follows:
eps = op2.Const(1, 1e-14, name="eps", dtype=float)
Mat¶
In a PyOP2 context, a (sparse) matrix is a linear operator from one set to another. In other words, it is a linear function which takes a Dat on one set A and returns the value of a Dat on another set B. In particular, A may be the same set as B. This makes the operation of at least some matrices equivalent to the operation of a particular PyOP2 kernel.
PyOP2 can be used to assemble matrices, which are defined on a sparsity pattern built from a pair of DataSets defining the row and column spaces the sparsity maps between, and one or more pairs of maps, one for the row and one for the column space of the matrix respectively. The sparsity uniquely defines the non-zero structure of the sparse matrix and can be constructed purely from those mappings. To declare a Mat on a Sparsity, only the data type needs to be given.
Since the construction of large sparsity patterns is a very expensive operation, the decoupling of Mat and Sparsity allows the reuse of sparsity patterns for a number of matrices without recomputation. In fact, PyOP2 takes care of caching sparsity patterns on behalf of the user, so declaring a sparsity on the same maps as a previously declared sparsity yields the cached object instead of building another one.
Defining a matrix of floats on a sparsity which spans from the space of vertices to the space of vertices via the edges is done as follows:
sparsity = op2.Sparsity((dvertices, dvertices),
[(edges2vertices, edges2vertices)])
matrix = op2.Mat(sparsity, float)
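Since the sparsity is cached and decoupled from the matrix, further matrices with the same non-zero structure reuse it without recomputation; a minimal sketch (the names are illustrative):
mass = op2.Mat(sparsity, float)       # shares the cached sparsity pattern
stiffness = op2.Mat(sparsity, float)  # no pattern is rebuilt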
Parallel loops¶
Computations in PyOP2 are executed as parallel loops of a Kernel over an iteration set. Parallel loops are the core construct of PyOP2 and hide most of its complexity, such as parallel scheduling, partitioning, colouring, data transfer to and from the device and staging of the data into on-chip memory. Computations in a parallel loop must be independent of the order in which they are executed over the set to allow PyOP2 maximum flexibility to schedule the computation in the most efficient way. Kernels are described in more detail in PyOP2 Kernels.
Loop invocations¶
A parallel loop invocation requires as arguments, other than the iteration set and the kernel to operate on, the data the kernel reads and/or writes. A parallel loop argument is constructed by calling the underlying data object (i.e. the Dat or Global) and passing an access descriptor and the mapping to be used when accessing the data. The mapping is required for an indirectly accessed Dat not declared on the same set as the iteration set of the parallel loop. In the case of directly accessed data defined on the same set as the iteration set, the map is omitted and only an access descriptor is given.
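For example, using the declarations from above, arguments for the coordinates Dat would be constructed like this (argument expressions only, not complete loops; full examples follow below):
# Direct: coordinates lives on vertices, so iterating over vertices needs no map.
coordinates(op2.RW)
# Indirect: iterating over edges, coordinates is reached via edges2vertices.
coordinates(op2.READ, edges2vertices)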
Consider a parallel loop that translates the coordinate field by a constant offset given by the Const offset. Note how the kernel has access to the local variable offset even though it has not been passed as an argument to the par_loop(). This loop is direct and the argument coordinates is read and written:
op2.Const(2, [1.0, 1.0], dtype=float, name="offset")
translate = op2.Kernel("""void translate(double * coords) {
coords[0] += offset[0];
coords[1] += offset[1];
}""", "translate")
op2.par_loop(translate, vertices, coordinates(op2.RW))
Access descriptors¶
Access descriptors define how the data is accessed by the kernel and give PyOP2 crucial information as to how the data needs to be treated during staging in before and staging out after kernel execution. They must be one of pyop2.READ (read-only), pyop2.WRITE (write-only), pyop2.RW (read-write), pyop2.INC (increment), pyop2.MIN (minimum reduction) or pyop2.MAX (maximum reduction).
Not all of these descriptors apply to all PyOP2 data types. A Dat can have modes READ, WRITE, RW and INC. For a Global the valid modes are READ, INC, MIN and MAX, and for a Mat only WRITE and INC are allowed.
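As a sketch of a reduction with the MIN descriptor, using the coordinates declared above (min_x and the kernel are illustrative names), the smallest x-coordinate over all vertices could be computed like this:
min_x = op2.Global(dim=1, data=[1e30], dtype=float)
min_kernel = op2.Kernel("""void min_x(double * out, double * coords) {
  if (coords[0] < out[0]) out[0] = coords[0];
}""", "min_x")
op2.par_loop(min_kernel, vertices,
             min_x(op2.MIN),
             coordinates(op2.READ))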
Loops assembling matrices¶
We declare a parallel loop assembling the matrix over the edges via a given kernel, which we'll assume has been defined before, with coordinates as input data. The matrix is the output argument of this parallel loop and therefore has the access descriptor INC, since the assembly accumulates contributions from different vertices via the edges2vertices mapping. Note that the mappings are being indexed with the iteration indices op2.i[0] and op2.i[1] respectively. This means that PyOP2 generates a local iteration space of size arity * arity, with the arity of the Map edges2vertices, for any given element of the iteration set. This local iteration space is then iterated over using the iteration indices on the maps. The kernel is assumed to only apply to a single point in that local iteration space. The coordinates are accessed via the same mapping, but are a read-only input argument to the kernel and therefore use the access descriptor READ:
op2.par_loop(kernel, edges,
matrix(op2.INC, (edges2vertices[op2.i[0]],
edges2vertices[op2.i[1]])),
coordinates(op2.READ, edges2vertices))
You can stack up multiple successive parallel loops that add values to a matrix. Before you use the resulting values, you must explicitly tell PyOP2 that you want to do so by calling assemble() on the matrix. Note that executing a solve() will do this automatically for you.
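A minimal sketch of this pattern, assuming the assembly loop above has been executed one or more times:
matrix.assemble()  # make the accumulated values available before use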
Loops with global reductions¶
Globals are used primarily for reductions where a given quantity on a field is reduced to a single number by summation or finding the minimum or maximum. Consider a kernel computing the L2 norm of the pressure field defined on the set of vertices as l2norm. Note that the Dat constructor automatically creates an anonymous DataSet of dimension 1 if a Set is passed as the first argument. We assume pressure is the result of some prior computation and only give the declaration for context.
pressure = op2.Dat(vertices, [...], dtype=float)
l2norm = op2.Global(dim=1, data=[0.0])
norm = op2.Kernel("""void norm(double * out, double * field) {
*out += field[0] * field[0];
}""", "norm")
op2.par_loop(norm, vertices,
             l2norm(op2.INC),
             pressure(op2.READ))
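Note that the kernel only accumulates the sum of squares; the norm itself is obtained on the host afterwards. A sketch, assuming the Global exposes its value through its data attribute:
import numpy as np
result = np.sqrt(l2norm.data)  # complete the L2 norm computation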