Parallel Computations with Dolfinx using MPI#

The aim of this tutorial is to show how variational problems can be solved with Python using Dolfinx running in parallel. The Message Passing Interface (MPI) standard will be used to carry out parallel computations. We will use the mpi4py package to interface MPI in Python.

First, we will look at some basic examples of MPI usage.

Next, we will cover how to define finite element function spaces and functions on several processes.

Furthermore, creating and distributing a finite element mesh in parallel will be demonstrated.

Finally, the elements of the tutorial are combined to show how the variational problem related to a partial differential equation can be solved in parallel.

This tutorial is inspired by and based on https://newfrac.gitlab.io/newfrac-fenicsx-training/05-dolfinx-parallel/dolfinx-parallel.html and https://jsdokken.com/dolfinx_docs/meshes.html.

## Parallel programming imports
import ipyparallel as ipp
from mpi4py import MPI

The demo is created for a specific version of DOLFINx, check the environment.yml file in the Github repository for compatible versions, or look at the version printed on below (on the web-page)

import dolfinx
print(f"Tutorial is compatible with DOLFINx {dolfinx.__version__}")
Tutorial is compatible with DOLFINx 0.7.2

Setting up a cluster#

ipyparallel is used to set up a local cluster consisting of 2 processors. To run Jupyter Notebook cells in parallel, we use %%px cell magic. To learn more about this, see the first parts of the MPI tutorial (Introduction to MPI) as well as https://ipyparallel.readthedocs.io/en/latest/tutorial/magics.html#px-cell-magic.

cluster = ipp.Cluster(engines = "mpi", n = 2)
rc = cluster.start_and_connect_sync()
Starting 2 engines with <class 'ipyparallel.cluster.launcher.MPIEngineSetLauncher'>
INFO:ipyparallel.cluster.cluster.1704465271-hrup:Starting 2 engines with <class 'ipyparallel.cluster.launcher.MPIEngineSetLauncher'>

MPI communication in Dolfinx#

When constructing a mesh in Dolfinx, the type of communicator must be specified. The mesh is partitioned by distributing the nodes of the mesh over different processes.

A parameter ‘ghost_mode’ must be specified. This determines how shared nodes are distributed as ghost nodes between the processes, i.e. which nodes are owned by the local processes and which nodes are ghost nodes that belong to the neighboring processes. We will used the ‘shared_facet’ option, where facet nodes are shared after mesh partitioning.

%%px
import dolfinx as dfx
import ufl

comm = MPI.COMM_WORLD # MPI communicator

# Define a function used to print stuff with the processor rank number in front
def mpi_print(s):
    print(f"Rank {comm.rank}: {s}")

Nx, Ny = 2, 2 # Mesh size

# Create a unit square mesh
mesh = dfx.mesh.create_unit_square(comm, Nx, Ny, ghost_mode = dfx.cpp.mesh.GhostMode.shared_facet)
tdim = mesh.topology.dim

The connectivity mapping between cells, facets and vertices of the mesh must be created. If the problem at hand does not require e.g. the mapping between cells and facets, one can omit creating the respective connectivity map to save computation time. Let’s create the mapping between cells (for the unit square these are of dimension 2) and facets (dimension 1) and print them to see how they are distributed over the two processes

%%px
mesh.topology.create_connectivity(tdim, tdim-1)
print(f"Cell (dim = {tdim}) to facet (dim = {tdim-1}) connectivity:")
mpi_print(mesh.topology.connectivity(tdim, tdim-1))
[stdout:0] 
Cell (dim = 2) to facet (dim = 1) connectivity:
Rank 0: <AdjacencyList> with 6 nodes
  0: [0 3 1 ]
  1: [4 3 8 ]
  2: [2 6 4 ]
  3: [5 6 7 ]
  4: [8 11 10 ]
  5: [7 9 12 ]

[stdout:1] 
Cell (dim = 2) to facet (dim = 1) connectivity:
Rank 1: <AdjacencyList> with 6 nodes
  0: [7 1 0 ]
  1: [3 1 2 ]
  2: [9 5 3 ]
  3: [6 5 4 ]
  4: [10 8 7 ]
  5: [12 11 9 ]

The ghost nodes for each processor rank is stored in the index map of the mesh topology:

%%px
mpi_print(f"Ghost cells (global numbering): {mesh.topology.index_map(tdim).ghosts}")
[stdout:0] 
Rank 0: Ghost cells (global numbering): [4 6]
[stdout:1] 
Rank 1: Ghost cells (global numbering): [1 3]

Dolfinx function spaces#

The degrees of freedom of a finite element function space in dolfinx is distributed over the nodes of the mesh. To illustrate, we create a function space with 1st order Lagrange elements and print the global and local sizes of the dofmap, as well as the ghost nodes.

%%px
V = dfx.fem.FunctionSpace(mesh, ("Lagrange", 1))

mpi_print(f"Global dofmap size: {V.dofmap.index_map.size_global}")
mpi_print(f"Local dofmap size: {V.dofmap.index_map.size_local}")
mpi_print(f"Ghosts: {V.dofmap.index_map.ghosts}")
[stdout:0] 
Rank 0: Global dofmap size: 9
Rank 0: Local dofmap size: 4
Rank 0: Ghosts: [5 6 4 7]
[stdout:1] 
Rank 1: Global dofmap size: 9
Rank 1: Local dofmap size: 5
Rank 1: Ghosts: [3 1 2]

Dolfinx functions#

The degrees of freedom of a dolfinx function is distributed over the nodes of the mesh in the same way as function spaces, as the functions created from a function space inherit the dofmap of the space that they live in. We create a function from the previously defined space \(V\) and print the size of the array.

%%px
u = dfx.fem.Function(V)
mpi_print(f"Local size of array: {u.x.index_map.size_local}")
mpi_print(f"Global size of array: {u.x.index_map.size_global}")
[stdout:0] 
Rank 0: Local size of array: 4
Rank 0: Global size of array: 9
[stdout:1] 
Rank 1: Local size of array: 5
Rank 1: Global size of array: 9

Since we have a scalar function, the size of the array of the function values is the same as the number of nodes in the mesh. If we e.g. had a two-dimensional vector function, the size of the array would be double the amount of mesh nodes.

We can also print the ghost nodes and the rank of the processor owning the ghost nodes:

%%px
mpi_print(f"Ghosts: {u.x.index_map.ghosts}")
mpi_print(f"Ghost owners: {u.x.index_map.owners}")
[stdout:0] 
Rank 0: Ghosts: [5 6 4 7]
Rank 0: Ghost owners: [1 1 1 1]
[stdout:1] 
Rank 1: Ghosts: [3 1 2]
Rank 1: Ghost owners: [0 0 0]

Assembling scalars, vectors, matrices in parallel#

To solve continuous problems numerically, we have to assemble a linear system of equations arising from discretization. Assembling scalars, vectors and matrices in dolfinx has to be carried out carefully when using several processes. We have to make sure that the processors communicate changes in values of overlapping nodes. We start by creating trial and test functions \(u\) and \(v\) from our function space \(V\).

%%px

# Trial and test functions
u = ufl.TrialFunction(V)
v = ufl.TestFunction (V)

Let us consider a linear form

(1)#\[\begin{equation} L(v) = \int_{\Omega}f v dx \end{equation}\]

where \(v\) is a test function, \(\Omega\) is the domain that we have discretized with our mesh and \(f\) is a scalar-valued function. The test function is discretized with 1st order continuous Lagrange elements, and to assemble it as a vector we can run

%%px

import ufl
import dolfinx.fem.petsc as dfx_petsc

# UFL form of right-hand side
L = ufl.inner(1.0, v) * ufl.dx
L = dfx.fem.form(L)

# Assemble UFL form into a vector
_b = dfx.fem.Function(V)
dfx_petsc.assemble_vector(_b.vector, L)
Out[0:8]: <petsc4py.PETSc.Vec at 0x7f1540e6a340>
Out[1:8]: <petsc4py.PETSc.Vec at 0x7f8ff4622570>

Now, after assembling, it is important to distribute the node values from the different processes. After the initial assembly, our vector holds the values

%%px
print("After assembly, prior to communication")
mpi_print(_b.x.array)
[stdout:0] 
After assembly, prior to communication
Rank 0: [0.04166667 0.125      0.08333333 0.04166667 0.08333333 0.125
 0.         0.        ]
[stdout:1] 
After assembly, prior to communication
Rank 1: [0.08333333 0.04166667 0.125      0.125      0.04166667 0.08333333
 0.         0.        ]

First, we well add values from ghost regions and accumulate them on the owning process. To do this, we use the scatter_reverse function with insertion mode “add”:

%%px

# Add values from ghost regions and accumulate them on the owning process
_b.x.scatter_reverse(dfx.la.InsertMode.add)

print("After ADD/REVERSE update")
mpi_print(_b.x.array)   
[stdout:0] 
After ADD/REVERSE update
Rank 0: [0.04166667 0.125      0.08333333 0.125      0.08333333 0.125
 0.         0.        ]
[stdout:1] 
After ADD/REVERSE update
Rank 1: [0.08333333 0.125      0.25       0.125      0.04166667 0.08333333
 0.         0.        ]

The ghost nodes are still not updated, and now we must distribute the values from the owning process to the ghost nodes

%%px

# Get value from owning process and update the ghosts
_b.x.scatter_forward()

print("After INSERT/FORWARD update")
mpi_print(_b.x.array)
[stdout:0] 
After INSERT/FORWARD update
Rank 0: [0.04166667 0.125      0.08333333 0.125      0.125      0.25
 0.08333333 0.125     ]
[stdout:1] 
After INSERT/FORWARD update
Rank 1: [0.08333333 0.125      0.25       0.125      0.04166667 0.125
 0.125      0.08333333]

The same procedures apply to assembly of scalars and matrices. Now, we will look at how we can combine what we have learnt and use it all to solve a variational problem derived from a partial differential equation (PDE).

Putting it all together: Solving a variational problem#

We will consider solving a Poisson problem on the unit square domain, denoted \(\Omega\). The strong form of the problem is: determine \(u\) such that

(2)#\[\begin{align} -\nabla^2 u &= f \quad \mathrm{in} \ \Omega, \\ u &= g \quad \mathrm{on} \ \partial\Omega, \end{align}\]

where \(\partial\Omega\) is the boundary of the domain. The weak form of the problem is derived by multiplying the PDE with a test function \(v\), integrating over the domain and applying integration by parts. This yields

(3)#\[\begin{equation} \int_{\Omega} \nabla u \cdot \nabla v dx = \int_{\Omega}f v dx \end{equation}\]

where the boundary integral vanishes because \(v = 0\) on the boundary due to the Dirichlet boundary condition. For simplicity we set \(g = 0\).

The finite element problem can now be defined as: find \(u_h \in V_h\) such that

(4)#\[\begin{equation} a(u_h, v_h) = L(v_h), \forall \ v_h \in V_h, \end{equation}\]

where \(V_h\) is the finite element space and

(5)#\[\begin{equation} a(u, v) = \int_{\Omega} \nabla u \cdot \nabla v dx \end{equation}\]

and

(6)#\[\begin{equation} L(v) = \int_{\Omega}f v dx. \end{equation}\]

The subscript \(h\) emphasizes that the variables are defined on a discrete mesh.

PETSc is the linear algebra backend used for solving the linear system of equations that defines the weak form. For more information on the Krylov solver used here and its options, see: https://petsc.org/release/.

To visualize the solution, we use pyvista (https://docs.pyvista.org/). For a simple introduction to defining and solving variational problems with FEniCSx, see https://jsdokken.com/dolfinx-tutorial/.

We start by import PETSc and pyvista and create a mesh:

%%px

import ufl
import pyvista as pv
pv.start_xvfb()
comm = MPI.COMM_WORLD # MPI communicator

Nx, Ny = 2, 2 # Mesh size

# Create a unit square mesh
mesh = dfx.mesh.create_unit_square(comm, Nx, Ny, ghost_mode = dfx.cpp.mesh.GhostMode.none)

For this problem, we are going to define boundary conditions on the boundaries of the domains, i.e. the facets of the mesh. We therefore need to create these entities (which are of dimension 1 less than the mesh dimension), as well as the connectivity between the facets and the cells (which are of dimension equal to the mesh dimension).

%%px

# Create mesh facet entities and conncectivity between facets and cells
mesh.topology.create_entities(mesh.topology.dim - 1)
mesh.topology.create_connectivity(mesh.topology.dim - 1, mesh.topology.dim)

Next, we define our finite element space \(V_h\). Here, we choose the space of first order (linear) continuous Lagrange elements. We create trial and test functions, as well as a function for storing the solution from our space.

%%px

# Create a first-order Lagrange finite element space
V = dfx.fem.FunctionSpace(mesh, ("Lagrange", 1))

# Trial and test functions
u = ufl.TrialFunction(V)
v = ufl.TestFunction (V)

u_h = dfx.fem.Function(V) # Solution function

In this problem we will set the source term function \(f = 1\) in the entire domain. With this, we are ready to define the bilinear and linear forms:

%%px

f = dfx.fem.Constant(mesh, dfx.default_scalar_type(1.0)) # Constant source term f = 1

# UFL form of the bilinear form
a = ufl.inner(ufl.grad(u), ufl.grad(v)) * ufl.dx
bilinear_form = dfx.fem.form(a)

# UFL form of right-hand side
L = f * v * ufl.dx
linear_form = dfx.fem.form(L)

The last thing that remains before we can assemble our system of equations is to define our boundary conditions. The following code creates a function \(g = 0\), finds the degrees of freedom corresponding to the boundary facets of the mesh and uses these to create a Dirichlet boundary condition object.

%%px

# Boundary condition function
g = dfx.fem.Function(V) # Dolfinx function, default function value = 0

# Get the dofs of the boundary facets
boundary_facets = dfx.mesh.exterior_facet_indices(mesh.topology)
boundary_dofs   = dfx.fem.locate_dofs_topological(V, mesh.topology.dim - 1, boundary_facets)
bc_g = dfx.fem.dirichletbc(g, boundary_dofs)

bcs = [bc_g]

Now we are ready to assemble the matrix and vectors of our system:

%%px

# Assemble matrix from the bilinear form
A = dfx_petsc.assemble_matrix(bilinear_form, bcs = bcs)
A.assemble()

# Assemble UFL form into a vector
_b = dfx.fem.Function(V) # Dolfinx function of right-hand side
dfx_petsc.assemble_vector(_b.vector, linear_form)
dfx_petsc.apply_lifting(_b.vector, [bilinear_form], bcs = [bcs])
_b.x.scatter_reverse(dfx.la.InsertMode.add)
_b.x.scatter_forward()
dfx_petsc.set_bc(_b.vector, bcs = bcs)
mpi_print(_b.x.array)
[stdout:0] 
Rank 0: [0.    0.    0.    0.    0.125 0.25 ]
[stdout:1] 
Rank 1: [0.    0.    0.    0.25  0.    0.125]

Note the use of apply_lifting and scatter_reverse/scatter_forward before using set_bc. This is important to not lose any information due to lack of communication between the processors running in parallel.

To solve the problem, we will use a direct Krylov solver with LU decomposition as preconditioner. MUMPS is used as the factor solver type.

%%px

# Create a (direct) linear solver
from petsc4py.PETSc import KSP
solver = KSP().create(mesh.comm)
solver.setOperators(A)
solver.setType("preonly")
solver.getPC().setType("lu")
solver.getPC().setFactorSolverType("mumps")

The solution is acquired by running

%%px

solver.solve(_b.vector, u_h.vector)

After solving the system of equations in parallel, it is important to use scatter_forward() to distribute the information from the owning processes to their ghost nodes.

%%px

print("Prior to communication")
mpi_print(u_h.x.array)   

# Get value from owning process and update the ghosts
u_h.x.scatter_forward()

print("After scatter_forward() update")
print(f"Rank: {comm.rank}: {u_h.x.array}")   
[stdout:0] 
Prior to communication
Rank 0: [0. 0. 0. 0. 0. 0.]
After scatter_forward() update
Rank: 0: [0.     0.     0.     0.     0.     0.0625]
[stdout:1] 
Prior to communication
Rank 1: [0.     0.     0.     0.0625 0.     0.    ]
After scatter_forward() update
Rank: 1: [0.     0.     0.     0.0625 0.     0.    ]

The following code visualizes with pyvista the solutions calculated on the two individual processes.

%%px

# Visualize the solution
topology, cell_types, x = dfx.plot.vtk_mesh(V)
grid = pv.UnstructuredGrid(topology, cell_types, x)

# Set output data
grid.point_data["u"] = u_h.x.array.real
grid.set_active_scalars("u")

# Create a pyvista plotter object and plot the datagrid
pl = pv.Plotter()
pl.add_text(f"Rank: {comm.rank}", font_size = 12)
pl.add_mesh(grid)
pl.show()
[stderr:0] 
ERROR:asyncio:Exception in callback Task.task_wakeup(<Future finis...ture pending>>)
handle: <Handle Task.task_wakeup(<Future finis...ture pending>>)>
Traceback (most recent call last):
  File "/__w/mpi-tutorial/mpi-tutorial/3/envs/mpi-tutorial/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
RuntimeError: Cannot enter into task <Task pending name='Task-3' coro=<Kernel.poll_control_queue() running at /__w/mpi-tutorial/mpi-tutorial/3/envs/mpi-tutorial/lib/python3.10/site-packages/ipykernel/kernelbase.py:296> wait_for=<Future finished result=<Future pending>> cb=[_chain_future.<locals>._call_set_state() at /__w/mpi-tutorial/mpi-tutorial/3/envs/mpi-tutorial/lib/python3.10/asyncio/futures.py:392]> while another task <Task pending name='Task-2' coro=<Kernel.dispatch_queue() running at /__w/mpi-tutorial/mpi-tutorial/3/envs/mpi-tutorial/lib/python3.10/site-packages/ipykernel/kernelbase.py:534> cb=[IOLoop.add_future.<locals>.<lambda>() at /__w/mpi-tutorial/mpi-tutorial/3/envs/mpi-tutorial/lib/python3.10/site-packages/tornado/ioloop.py:685]> is being executed.
[stderr:1] 
ERROR:asyncio:Exception in callback Task.task_wakeup(<Future finis...ture pending>>)
handle: <Handle Task.task_wakeup(<Future finis...ture pending>>)>
Traceback (most recent call last):
  File "/__w/mpi-tutorial/mpi-tutorial/3/envs/mpi-tutorial/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
RuntimeError: Cannot enter into task <Task pending name='Task-3' coro=<Kernel.poll_control_queue() running at /__w/mpi-tutorial/mpi-tutorial/3/envs/mpi-tutorial/lib/python3.10/site-packages/ipykernel/kernelbase.py:296> wait_for=<Future finished result=<Future pending>> cb=[_chain_future.<locals>._call_set_state() at /__w/mpi-tutorial/mpi-tutorial/3/envs/mpi-tutorial/lib/python3.10/asyncio/futures.py:392]> while another task <Task pending name='Task-2' coro=<Kernel.dispatch_queue() running at /__w/mpi-tutorial/mpi-tutorial/3/envs/mpi-tutorial/lib/python3.10/site-packages/ipykernel/kernelbase.py:534> cb=[IOLoop.add_future.<locals>.<lambda>() at /__w/mpi-tutorial/mpi-tutorial/3/envs/mpi-tutorial/lib/python3.10/site-packages/tornado/ioloop.py:685]> is being executed.
[output:0]
[output:1]
# Stop the cluster
rc.cluster.stop_cluster_sync()
Stopping controller
INFO:ipyparallel.cluster.cluster.1704465271-hrup:Stopping controller
Controller stopped: {'exit_code': 0, 'pid': 3827, 'identifier': 'ipcontroller-1704465271-hrup-3801'}
INFO:ipyparallel.cluster.cluster.1704465271-hrup:Controller stopped: {'exit_code': 0, 'pid': 3827, 'identifier': 'ipcontroller-1704465271-hrup-3801'}
Stopping engine(s): 1704465272
INFO:ipyparallel.cluster.cluster.1704465271-hrup:Stopping engine(s): 1704465272
engine set stopped 1704465272: {'exit_code': 1, 'pid': 3861, 'identifier': 'ipengine-1704465271-hrup-1704465272-3801'}
WARNING:ipyparallel.cluster.cluster.1704465271-hrup:engine set stopped 1704465272: {'exit_code': 1, 'pid': 3861, 'identifier': 'ipengine-1704465271-hrup-1704465272-3801'}