The final of six posts covering content from SHPC4001 and SHPC4002.
Welcome to the final post in a six-part retrospective of UWA’s SHPC units. This will be a very brief discussion of parallelism using OpenMP and MPI (via the Open MPI implementation) in Fortran.
OpenMP
OpenMP is an API for multithreading. At its core, a single parent thread forks into a team of child threads that all share memory. In Fortran, we can specify such a parallel region using the standard directive-pair syntax:
!$OMP PARALLEL
write(*,*)"Hello world!"
!$OMP END PARALLEL
while in C we need to use a pragma directive:
#pragma omp parallel
printf("Hello world!\n");
These snippets will simply print “Hello world!” \( N \) times, where \( N \) is the default number of threads.
To use OpenMP’s runtime library routines, we must load the API explicitly, either as a header in C (#include <omp.h>) or as a module in Fortran (use OMP_LIB). To compile (with GCC/GFortran), we must also pass a flag to the compiler:
gcc -fopenmp -o [program] [code.c]
gfortran -fopenmp -o [program] [code.f95]
Setting and getting threads
We can manually specify the number of threads with the OMP_SET_NUM_THREADS() subroutine. We can also get the ID of the current thread using OMP_GET_THREAD_NUM(). Consider the following:
program test
use OMP_LIB
call OMP_SET_NUM_THREADS(4)
!$OMP PARALLEL
if (OMP_GET_THREAD_NUM() == 0) then
write(*,*)"Hello world!"
end if
!$OMP END PARALLEL
end program test
This program creates 4 threads, but only the master thread (with a thread ID of 0) prints “Hello world!” The MASTER directive can be used to do the same thing, but more on that later. For now, let’s talk more about what happens to variables.
Private variables
Let’s look at this code snippet from my dark matter halo assignment:
!$OMP PARALLEL DO PRIVATE(new_ll)
do i = 1,npart,1
call ll_init(new_ll, (/0.0,0.0,0.0/), 0)
neighbours(i)%p => new_ll
end do
!$OMP END PARALLEL DO
This code is populating an array with linked lists. First things first, PARALLEL DO is used to parallelise a do loop: the loop iterations are divided among the threads, each of which executes its share of them. Why is new_ll private? This is to stop multiple threads creating the same linked list and then adding duplicates into the array. What PRIVATE does is give the variable local scope within each thread (i.e. each thread has its own local copy of the variable). In contrast, SHARED specifies that the variable has global scope across all threads. In OpenMP, all variables are shared by default (apart from the loop index of a PARALLEL DO, which is private).
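As a minimal sketch of the two clauses (with hypothetical arrays x and y, a scratch variable tmp, and loop bound n), declaring the sharing explicitly looks like this:
!$OMP PARALLEL DO PRIVATE(tmp) SHARED(x, y)
do i = 1,n,1
tmp = 2.0*x(i)      !each thread works on its own private copy of tmp
y(i) = tmp + 1.0    !x and y are shared: all threads read/write the same arrays
end do
!$OMP END PARALLEL DO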
Thread Execution
Suppose we have chunks of code that we each only want one thread to execute. We can use SECTIONS (combined here with PARALLEL so the snippet stands on its own); the work inside each SECTION is handed to a single thread:
!$OMP PARALLEL SECTIONS
!$OMP SECTION
write(*,*)"Hello"
!$OMP SECTION
write(*,*)"world!"
!$OMP END PARALLEL SECTIONS
We can also use SINGLE to specify code that should only be executed by one thread (whichever arrives first), or MASTER to restrict it to the master thread.
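As a minimal sketch of both directives inside a parallel region:
!$OMP PARALLEL
!$OMP SINGLE
write(*,*)"Printed by one thread only (whichever arrives here first)"
!$OMP END SINGLE
!$OMP MASTER
write(*,*)"Printed by the master thread only"
!$OMP END MASTER
!$OMP END PARALLEL
One difference worth knowing: END SINGLE implies a barrier (unless you add NOWAIT), whereas MASTER does not.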
Now, what if we have some sensitive operation that we want inside a parallel region but only ever want one thread to perform at a time (such as writing to a file)? We do this using CRITICAL. An example from my dark matter halo project is populating the cubic mesh with each particle:
!$OMP PARALLEL DO PRIVATE(j_x, j_y, j_z, i)
do i = 1,npart,1
!$OMP CRITICAL
j_x = floor(abs(x(i) - (bds_mv(1)+0.0001))/mesh_subsize) + 1
j_y = floor(abs(y(i) - (bds_mv(2)+0.0001))/mesh_subsize) + 1
j_z = floor(abs(z(i) - (bds_mv(3)+0.0001))/mesh_subsize) + 1
call ll_insert(mesh(j_x,j_y,j_z)%p, (/x(i),y(i),z(i)/), i)
!$OMP END CRITICAL
end do
!$OMP END PARALLEL DO
Another way to ensure that multiple threads only ever update a single memory location one thread at a time is using ATOMIC.
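As a rough sketch (counting particles into a hypothetical shared counter total), ATOMIC protects a single update statement and is typically cheaper than CRITICAL:
total = 0
!$OMP PARALLEL DO
do i = 1,npart,1
!$OMP ATOMIC
total = total + 1   !the update to total happens one thread at a time
end do
!$OMP END PARALLEL DO
(For a simple sum like this a REDUCTION clause would be the more idiomatic choice, but the pattern generalises to any single shared update.)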
Synchronisation
Suppose your parallel region is composed of two chunks of code, such as (1) performing a bunch of calculations and (2) analysing the results. In such a scenario, you wouldn’t want to start analysing before the calculations have finished. We can use the BARRIER directive to mark a point in our code where every thread must wait until all the other threads have reached it. For example:
!$OMP PARALLEL
!some important subroutine
!$OMP BARRIER
!code that depends on previous subroutine having been completed
!$OMP END PARALLEL
Now you must be very careful with BARRIER statements or else you could end up with a deadlock. Consider the following:
!$OMP PARALLEL
!$OMP CRITICAL
!some important subroutine
!$OMP BARRIER
!code that depends on previous subroutine having been completed
!$OMP END CRITICAL
!$OMP END PARALLEL
Because the barrier sits inside a critical region, only one thread will ever reach it, and that thread will be stuck forever waiting for the others (which can never enter the critical region).
Sometimes, especially before processing some massive shared data structure (such as a cubic mesh of linked lists), you might want to ensure that every thread sees an up-to-date view of shared variables. We can do this using FLUSH.
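As a hedged sketch of the classic producer/consumer pattern (with hypothetical shared variables val and ready), FLUSH forces a thread’s temporary view of the listed variables to be synchronised with memory:
integer :: val
logical :: ready
ready = .false.
!$OMP PARALLEL SHARED(val, ready)
if (OMP_GET_THREAD_NUM() == 0) then
val = 42             !the producer writes the shared value
!$OMP FLUSH(val)     !make the write to val visible before raising the flag
ready = .true.
!$OMP FLUSH(ready)
else
do
!$OMP FLUSH(ready)   !re-read ready from memory on every iteration
if (ready) exit
end do
!$OMP FLUSH(val)     !make sure we see the producer's write to val
write(*,*)"val = ", val
end if
!$OMP END PARALLEL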
MPI
The Message Passing Interface is a communication protocol for parallel computing. MPI is used to execute a parallel program over several compute nodes (i.e. different CPUs). Where OpenMP is concerned with coordinating threads over shared memory, MPI generally involves partitioning data and/or splitting tasks across multiple processes, each with its own memory (the SPMD model: single program, multiple data). Naturally, the dark matter halo project lends itself well to MPI; the cubic mesh can be split across multiple processes, with each process analysing a certain region within the mesh. This section is designed to give you a brief overview of how MPI works as well as its commonly used routines. For a considerably more in-depth overview, consider this MPI Tutorial.
Compiling and running
There are several implementations of MPI. I used Open MPI, but I know of others who used Intel MPI (as part of Intel Parallel Studio). Open MPI includes the mpifort compiler wrapper, which is needed to compile your code properly:
$mpifort -o [program] [code.f95]
Remember to include -fopenmp if your code also uses OpenMP. Executing is a little different:
$mpirun -n [processes] [program]
Here you must specify the number of processes. What mpirun then does is create and run n copies of your program.
Setting up your world
This is the basic structure of a program that uses MPI:
program test
use MPI
implicit none
integer :: ierr, mpi_rank, mpi_size
call MPI_INIT(ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD, mpi_rank, ierr) !get process rank
call MPI_COMM_SIZE(MPI_COMM_WORLD, mpi_size, ierr) !get num of processes
if (mpi_rank == 0) then
write(*,*) 'I am the master process'
end if
call MPI_FINALIZE(ierr)
end program test
We initialise MPI with MPI_INIT and terminate it with MPI_FINALIZE. In between, MPI_COMM_RANK gives each process its rank (an ID from 0 to N-1) and MPI_COMM_SIZE gives the total number of processes, N.
Interprocess communication
Let’s jump into the core feature of MPI. Suppose you want to send an array from process 0 (the master process) to process 1 (a slave process). We use MPI_SEND and MPI_RECV. Consider the following:
integer :: a(10), status(MPI_STATUS_SIZE), src, dest, tag
src = 0   !the sending process (the master)
dest = 1  !the receiving process
tag = 0   !a label for this particular message
if (mpi_rank == src) then
call MPI_SEND(a,size(a),MPI_INTEGER,dest,tag,MPI_COMM_WORLD,ierr)
end if
if (mpi_rank == dest) then
call MPI_RECV(a,size(a),MPI_INTEGER,src,tag,MPI_COMM_WORLD,status,ierr)
end if
There’s a lot going on here, so let’s walk through the parameters. First, we need to specify the start of the buffer (this is just the array a). Second comes the number of elements to be sent/received; this is the size of the array. Third comes the data type; this must be an MPI-compatible type (such as MPI_INTEGER). Fourth comes the destination/source, an integer giving the rank (or process ID) that the data is sent to/received from. Fifth is the tag, an integer label used to match a send with its corresponding receive, and sixth is the communicator, MPI_COMM_WORLD. For MPI_RECV, we also have a status parameter, which holds information about the received message (its source, tag, and error code). The last parameter, ierr, is required in Fortran but does not appear in C/C++, where the error code is instead the function’s return value.
It is worth noting that both MPI_SEND and MPI_RECV are so-called blocking calls: the subroutine does not return until the data has been sent/received. This seems innocuous, but it can pretty easily lead to deadlocks. Consider the following example:
if (mpi_rank == 0) then
call MPI_RECV(recv,size(recv),MPI_INTEGER,1,tag,MPI_COMM_WORLD,status,ierr)
call MPI_SEND(send,size(send),MPI_INTEGER,1,tag,MPI_COMM_WORLD,ierr)
end if
if (mpi_rank == 1) then
call MPI_RECV(recv,size(recv),MPI_INTEGER,0,tag,MPI_COMM_WORLD,status,ierr)
call MPI_SEND(send,size(send),MPI_INTEGER,0,tag,MPI_COMM_WORLD,ierr)
end if
Here both processes are stuck waiting for a light that never comes. One solution is to swap the order of the recv and send calls in process 0, but another is to use the non-blocking commands MPI_ISEND and MPI_IRECV. These subroutines return immediately, so on their own there is no way of knowing whether the transfer has actually completed. For this reason, they return a request handle that can be passed to MPI_WAIT to explicitly wait until the operation has completed.
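As a sketch of how the deadlocked example above could be fixed with non-blocking receives (reusing its send, recv, tag and status variables, plus a new request handle), post the receive with MPI_IRECV, do the send, then wait:
integer :: request
if (mpi_rank == 0) then
call MPI_IRECV(recv,size(recv),MPI_INTEGER,1,tag,MPI_COMM_WORLD,request,ierr)
call MPI_SEND(send,size(send),MPI_INTEGER,1,tag,MPI_COMM_WORLD,ierr)
call MPI_WAIT(request,status,ierr)   !block here until the receive has completed
end if
if (mpi_rank == 1) then
call MPI_IRECV(recv,size(recv),MPI_INTEGER,0,tag,MPI_COMM_WORLD,request,ierr)
call MPI_SEND(send,size(send),MPI_INTEGER,0,tag,MPI_COMM_WORLD,ierr)
call MPI_WAIT(request,status,ierr)
end if
Because MPI_IRECV returns immediately, both sends can proceed, and MPI_WAIT then guarantees each receive has finished before the data is used.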
Collective communication
Suppose you want to send an array to all the other processes. We can do this with MPI_BCAST.
call MPI_BCAST(a,size(a),MPI_INTEGER,src,MPI_COMM_WORLD,ierr)
Here the fourth parameter is simply the source, i.e. the root process the broadcast comes from (there is no need to specify a destination, as the data is sent to every other process in MPI_COMM_WORLD).
Now suppose you don’t want to send the entire array to each process, but instead want to send a different part of the array to each process. This is where MPI_SCATTER comes in. As the name implies, it scatters the data between all the processes: if there are 4 processes, each process receives a quarter of the array. The decomposition is ordered by rank, so process 0 receives the first quarter, process 1 the second quarter, and so on.
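As a sketch (assuming 4 processes and a hypothetical 100-element array full held by the root, process 0), each process receives its own 25-element chunk:
integer :: full(100), chunk(25)
!send 25 consecutive elements of full to each process, in rank order
call MPI_SCATTER(full,25,MPI_INTEGER,chunk,25,MPI_INTEGER,0,MPI_COMM_WORLD,ierr)
Note that the count arguments are per process, not the total size of full.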
The inverse of scatter is MPI_GATHER. Here you have data spread across all the processes that you want to stitch together into one array; MPI_GATHER collects it into the root process. Suppose you then want to broadcast this new, combined array back to all the processes. Instead of calling MPI_GATHER followed by MPI_BCAST, you can use MPI_ALLGATHER, which simply gathers all of the distributed data into every process (instead of just the root).
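Continuing the scatter sketch above, gathering the 25-element chunks back looks like this; MPI_ALLGATHER takes the same arguments minus the root:
!collect every process's chunk into full on the root process (0), in rank order
call MPI_GATHER(chunk,25,MPI_INTEGER,full,25,MPI_INTEGER,0,MPI_COMM_WORLD,ierr)
!or collect into full on every process at once
call MPI_ALLGATHER(chunk,25,MPI_INTEGER,full,25,MPI_INTEGER,MPI_COMM_WORLD,ierr)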
There’s a lot more to delve into in the world of parallelism, far beyond the scope of this introductory post. Hopefully you’ll find this post and the code snippets useful for kickstarting your own multithreaded projects!