PyOP2 Architecture

As described in PyOP2 Concepts, PyOP2 exposes an API that allows users to declare the topology of unstructured meshes in the form of Sets and Maps, and data in the form of Dats, Mats, Globals and Consts. Computations on this data are expressed as Kernels, described in PyOP2 Kernels, and executed by parallel loops.
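
To make this concrete, the following is a minimal sketch of such a declaration and a parallel loop. The constructor signatures and argument forms are approximate and differ between PyOP2 versions, and the mesh, data and kernel are purely illustrative:

    from pyop2 import op2
    import numpy as np

    op2.init()  # sequential backend by default

    # Mesh topology: edges connected to vertices via a map of arity 2
    vertices = op2.Set(4, "vertices")
    edges = op2.Set(3, "edges")
    edge2vertex = op2.Map(edges, vertices, 2,
                          np.array([0, 1, 1, 2, 2, 3], dtype=np.int32))

    # Data defined on the mesh: one value per vertex and one per edge
    coords = op2.Dat(vertices, np.array([0.0, 1.0, 2.0, 3.0]),
                     np.float64, "coords")
    lengths = op2.Dat(edges, np.zeros(3), np.float64, "lengths")

    # A kernel (see PyOP2 Kernels) computing each edge length from its vertices
    length = op2.Kernel("""
    void length(double *len, double *x0, double *x1) {
      len[0] = x1[0] - x0[0];
    }""", "length")

    # Execute the kernel over the edge set, accessing vertex data via the map
    op2.par_loop(length, edges,
                 lengths(op2.IdentityMap, op2.WRITE),
                 coords(edge2vertex[0], op2.READ),
                 coords(edge2vertex[1], op2.READ))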

The API is the frontend to the PyOP2 runtime compilation architecture, which supports the generation and just-in-time (JIT) compilation of low-level code for a range of backends described in PyOP2 Backends and the efficient scheduling of parallel computations. A schematic overview of the PyOP2 architecture is given below:

[Figure: Schematic overview of the PyOP2 architecture]

From an outside perspective, PyOP2 is a conventional Python library, with performance-critical library functions implemented in Cython. A user’s application code makes calls to the PyOP2 API, most of which are conventional library calls. The exceptions are par_loop() calls, which encapsulate PyOP2’s runtime core functionality and perform backend-specific code generation. Executing a parallel loop comprises the following steps, sketched in code after the list:

  1. Compute a parallel execution plan, including information for efficient staging of data and partitioning and colouring of the iteration set for conflict-free parallel execution. This process is described in Parallel Execution Plan and does not apply to the sequential backend.

  2. Generate backend-specific code for executing the computation for the given set of par_loop() arguments, according to the execution plan computed in the previous step, as detailed in PyOP2 Backends.

  3. Pass the generated code to a backend-specific toolchain for just-in-time compilation, producing a shared library that is dynamically loaded and callable as a Python module. This module is cached on disk to avoid recompilation when the same par_loop() is called again for the same backend.

  4. Build the backend-specific list of arguments to be passed to the generated code, which may initiate host to device data transfer for the CUDA and OpenCL backends.

  5. Call into the generated module to perform the actual computation. For distributed parallel computations this involves separate calls for the regions owned by the current processor and the halo as described in MPI.

  6. Perform any necessary reductions for Globals.

  7. Call the backend-specific matrix assembly procedure on any Mat arguments.
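
The following pseudo-code sketches how these seven steps fit together. The helper names are hypothetical and not part of the PyOP2 API; they only mirror the steps above for illustration:

    def execute_par_loop(kernel, iterset, *args, backend="sequential"):
        # 1. Plan staging, partitioning and colouring (not used by the
        #    sequential backend)
        plan = compute_plan(iterset, args) if backend != "sequential" else None

        # 2. Generate backend-specific source for this argument list
        source = generate_code(kernel, args, plan, backend)

        # 3. JIT-compile to a shared library, reusing the disk cache if the
        #    same par_loop() has been compiled for this backend before
        module = load_cached_module(source, backend) or jit_compile(source, backend)

        # 4. Marshal arguments, transferring host data to the device where
        #    required (CUDA and OpenCL backends)
        exec_args = build_argument_list(args, backend)

        # 5. Run the generated code; under MPI, owned entities and the halo
        #    are computed in separate calls
        module.execute(exec_args, region="owned")
        module.execute(exec_args, region="halo")

        # 6. Reduce any Global arguments
        perform_reductions(args)

        # 7. Trigger backend-specific assembly for any Mat arguments
        assemble_matrices(args)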

Multiple Backend Support

The backend is selected by passing the keyword argument backend to the init() function. If omitted, the sequential backend is selected by default. This choice can be overridden by exporting the environment variable PYOP2_BACKEND, which allows switching backends without having to touch the code. Once chosen, the backend cannot be changed for the duration of the running Python interpreter session.
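
For example (assuming the backends are identified by lowercase strings such as "sequential", "cuda" and "opencl"):

    from pyop2 import op2

    # Explicit selection at initialisation time; without the keyword the
    # sequential backend is used
    op2.init(backend="opencl")

    # Alternatively, leave the keyword out and select the backend from the
    # environment, e.g.  PYOP2_BACKEND=cuda python app.py
    # op2.init()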

PyOP2 provides a single API to the user, regardless of which backend the computations are running on. All classes and functions that form the public API defined in pyop2.op2 are interfaces, whose concrete implementations are initialised according to the chosen backend. A metaclass takes care of instantiating a backend-specific version of the requested class and setting the corresponding docstrings such that this process is entirely transparent to the user. The implementation of the PyOP2 backends is completely orthogonal to the backend selection process and free to use established practices of object-oriented design.
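
The dispatch mechanism can be illustrated as follows. This is not PyOP2's actual code, only a minimal sketch of the metaclass technique described above, with a hypothetical registry mapping interface classes to backend-specific implementations:

    _backend = "sequential"  # set once at init() time

    class BackendSelector(type):
        """Metaclass that instantiates the backend-specific subclass."""

        _registry = {}  # (interface name, backend) -> implementation class

        def __call__(cls, *args, **kwargs):
            # Look up the implementation registered for the current backend,
            # falling back to the interface class itself
            impl = cls._registry.get((cls.__name__, _backend), cls)
            return type.__call__(impl, *args, **kwargs)

    class Set(metaclass=BackendSelector):
        """Public-facing interface class (stands in for the classes in pyop2.op2)."""

    class SequentialSet(Set):
        """Concrete implementation for the sequential backend."""

    BackendSelector._registry[("Set", "sequential")] = SequentialSet

    s = Set()                     # transparently builds a SequentialSet
    assert type(s) is SequentialSet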