Parallel ProDiMo¶
Since revision 17791d7a, ProDiMo's chemistry (and Tgas-determination) can be run in parallel with OpenMP implementation. Please note that there is currently a hard-coded limit of 200 OpenMP threads in ProDiMo (simply speaking you cannot use more than 200 cores). There are two new switches in Parameter.in, namely
.true. ! parallel_chem
.true. ! parallel_debug
parallel_chem switches on the parallel implementation, and parallel_debug creates detailed output files for every grid point, filenames point_ix_iz_NP, where ix and iz identify the grid point, and NP is the number of OpenMP threads used. The idea of the debug is as follows. Switch on both parallel_chem and parallel_debug, say restart=.false., radtrans=.false, and cputime_max=99999.9 in Parameter.in (to avoid differences by time cutoffs), and then execute (for csh and tcsh shell system).
> setenv OMP_NUM_THREADS 1
> prodimo
> cp FlineEstimates.out FlineEstimates.out1
> setenv OMP_NUM_THREADS 4
> prodimo
> cp FlineEstimates.out FlineEstimates.out4
> diff FlineEstimates.out1 FlineEstimates.out4 | grep E-21
> diff point_ix_iz_01 point_ix_iz_04 (try various ix and iz)
Replace the setenv command for bash, zsh (Mac) by
export OMP_NUM_THREADS=1
To see the current value of an environment variable, type
printenv OMP_NUM_THREADS
There should be no differences between the point-files (except for the thread-number), which is a very hard test, because the point_ix_iz_NP-files contain the column densities, line optical depths, a log of the Newton-Raphson and time-dependent chemistry solver, and resulting Tgas, species densities, and heating/cooling functions. There can be, however, tiny differences in FlineEstimates.out. I get no differences for line fluxes > 1.E-23 W/m2, but meaningless differences for line fluxes < 1.E-23.
--> IF YOU SEE ANY RELEVANT DIFFERENCES, DON't USE THE PARALLEL MODE ! <--
At present, we have issues with different compiler versions and platforms, so different results are likely to occur! If the test fails (with the simple settings in Parameter.in as described below), you need to run further tests with/without -openmp compiler flag and OMP_NUM_THREADS=1.
--> IF THE RESULTS WITH/WITHOUT -openmp ARE DIFFERENT, DON'T USE -openmp! <--
because ProDiMo might give you wrong results even with OMP_NUM_THREADS=1.
The optimal number of threads to use depends on many factors. A simple discussion is provided in Accelerate Prodimo.
In addition, if you use any other than standard setups, you might use certain untested parts of ProDiMo which are not "threadsafe", which I explain below. The changes made at revision 50c7d2c4 are complex, and tested only for certain parameter combinations, namely ...
.true. ! Xrays : use Xray chemistry and heating?
.true. ! solve_temp : solve the heating/cooling balance?
.true. ! chi_from_RT : calculate chi from UV rad. transfer?
.true. ! Jback_from_RT : calculate background Jnu from rad. transfer?
.true. ! UVpumping : use large model atoms?
.true. ! PAH_from_RT : PAH heating from cross-sections?
.true. ! Rphoto_from_RT : calculate photorates from rad. transfer?
.false. ! Rphoto_bandint : use band-integrated photo-rates?
.false. ! Hatom_bb
.false. ! Hatom_bf
.true. ! NewChemScan
.true. ! XrayRT : Xray radiative Transfer
.true. ! XrayRT_chem : use X-Ray radiation field form the Xray-RT for the chemistry
The XrayRT module (parameter XrayRT and XrayRT_chem) also passed the parallel test (Rab C. 12.12.2012).
If you use other, particular setups, I strongly advise you to run the debug test. Only if the results are practically identical, you can use parallel_chem=.true., otherwise you can't, or you need to make those particular code parts "threadsafe".
What means "threadsafe"?¶
Under OpenMP we have to carefully disentangle shared memory and private variables. The private variables are not visible to any other threads, whereas all threads have simultaneous read/write access to the memory locations of shared variables. By default, OpenMP makes the following choices
local variables (without safe, data, or =value in declaration) :: private
local variables ( with safe, data, or =value in declaration) :: shared (!)
common-blocks :: shared (!)
module :: shared
In many cases, this works just fine, for example if a module contains non-changing physical data being initialized before the beginning of the parallel computation, or if various threads work on different points (ix,iz), and store the results in 2D-arrays simultaneously. However, there are also many cases where this default choice fails miserably. For example:
module dataset
integer, parameter :: NXX=20, NZZ=20, Nlam=60
real :: nHtot(NXX,NZZ)
real :: nH
end module dataset
subroutine tmp
use dataset,ONLY: NXX,NZZ,nHtot,nH
integer :: ix,iz
real :: Tgas
do ix=1,NXX
!$omp parallel
!$omp& private(iz,Tgas)
!$omp do schedule(dynamic,1)
do iz=1,NZZ
nH = nHtot(ix,iz)
Tgas = HEAVY_COMPUTATION(nH,ix,iz)
enddo
!$omp end parallel
enddo
end
The problem here is the "sloppy" definition of the real variable nH, which sits in the shared memory, because it is defined in a module. Thus, all threads use the same memory location to pass nH to HEAVY_COMPUTATION(nH,ix,iz) ...
Another bad example was in
!------------------------------------------------------------------------
REAL FUNCTION SPLINE_INTERPOL(NDIM,N,U,X,Y,B,C,D)
!------------------------------------------------------------------------
implicit none
integer :: NDIM,N
real :: U,X(NDIM),Y(NDIM),B(NDIM),C(NDIM),D(NDIM),DX
integer :: J,K,I2
integer,save :: I=1
...
All threads used simultaneously the interpolation index I, which sits in the shared memory because of the save attribute in the local definition ...
In both cases, the !$omp threadprivate() -command solved the problems, so!$omp threadprivate(nH) just before end module in the first example, and !$omp threadprivate(I) just after integer,save :: I=1.
THUS, if you want your code parts to work with the parallel implementation, e.g. called from RATE_COEFFS(), CHEMFUNC(), HEATCOOLGAS_IT() or STATEQ_IT(), think carefully about ALL variable declarations in the data modules, common blocks, and locally in the subroutines
array(NXX,NZZ) :: shared is probably fine
phys-data(NLAM) :: shared is probably fine, if not changing and initialised before
tmp(Nlam) :: if tmp comes from a module, you probably need to add !$omp threadprivate(tmp)
local,save :: a :: you probably need to add !$omp threadprivate(a), if a can change
common/block/a,b :: you probably need to add !$omp threadprivate(/block/), if a or b can change
In other words, if you have any variable to the left of an assignment statement (or as result of a subroutine call), that is not (ix,iz)-dependent, you probably need to make sure that this variable is private.
Good luck,
Peter Woitke, 7.10.2012
Wing-Fai Thi, 15.10.2012 add remark on the errors when the openmp flag is used even with one thread.