Mathematical (BLAS/LAPACK) libraries for ProDiMo¶
TODO: Find a proper place for this. Maybe also accelerate ProDiMo
Linking with ifx/ifort mkl libraries¶
01/06/2018 Peter Woitke
You can achieve some acceleration of ProDiMo by linking it with highly optimised linear algebra libraries. If you are using ifx/ifort, I would recommend using -qmkl=sequential as follows:
LAPACK_INC=-qmkl=sequential
FLAGS = -r8 -i4 -g -traceback -O3 -xHOST -prec-div -fp-model source -qopenmp
FCRIT = -r8 -i4 -g -traceback -O3 -xHOST -prec-div -fp-model source -qopenmp
DEFS = -fpp -DIFORT -DEXTERNAL_LAPACK
The default compilation of ProDiMo uses the self-compiled routines in slatec_routines.F (SLATEC is the older library that was later replaced by LAPACK).
If you set the lapack_solver flag in Parameter.in, ProDiMo will use lapack2006.F instead. This file contains copies of the LAPACK routines from 2006, restricted to those actually required by ProDiMo, and is still self-compiled. The flag is enabled by default if you use the quadp_solver.
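The choice between the linear algebra back-ends can be summarised as a small decision function. This is only an illustrative Python sketch, not ProDiMo code: the function name and its arguments are hypothetical, and the precedence of -DEXTERNAL_LAPACK (set in DEFS above) over the Parameter.in flags is assumed from the description in this section.

```python
def linear_algebra_backend(external_lapack: bool,
                           lapack_solver: bool = False,
                           quadp_solver: bool = False) -> str:
    """Return which linear algebra source is used (illustrative only)."""
    # Compiled with -DEXTERNAL_LAPACK: none of the internal copies are used.
    if external_lapack:
        return "external BLAS/LAPACK library"
    # lapack_solver is enabled by default when quadp_solver is used
    # (a user override of that default is not modelled here).
    if lapack_solver or quadp_solver:
        return "lapack2006.F"
    # Default compilation.
    return "slatec_routines.F"


if __name__ == "__main__":
    print(linear_algebra_backend(external_lapack=False))
    print(linear_algebra_backend(external_lapack=False, quadp_solver=True))
    print(linear_algebra_backend(external_lapack=True, lapack_solver=True))
```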
If you compile with -DEXTERNAL_LAPACK, ProDiMo uses neither of these and links to external BLAS and LAPACK libraries instead, which can result in a considerable acceleration, depending on the computer you are using. On the computers I have tested, the MKL library, which comes with the ifx/ifort compiler, seems to provide the fastest linear algebra routines. Here are some benchmarking results from ParallelTest; the numbers are user times [sec] measured for the chemistry and the line transfer parts, using 6 OpenMP threads:
FLAGS = -r8 -i4 -g -traceback -O3 -xHOST -prec-div -fp-model source -qopenmp
FCRIT = -r8 -i4 -g -traceback -O3 -xHOST -prec-div -fp-model source -qopenmp
setenv OMP_NUM_THREADS 6
parallel_chem=T
default compilation with internal linear algebra: CHEM=97.10 LineRT=22.12
with lapack_solver flag: CHEM=97.07 LineRT=20.94
-DEXTERNAL_LAPACK -mkl=sequential: CHEM=88.64 LineRT=20.81
-DEXTERNAL_LAPACK -mkl=parallel: CHEM=93.74 LineRT=20.98
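To put the timings above into perspective, the relative savings with respect to the default compilation can be computed directly. The following Python sketch only redoes the arithmetic on the numbers listed above; the dictionary keys and function names are chosen here for illustration.

```python
# User times [sec] copied from the benchmark results above.
BENCHMARKS = {
    "internal (default)": {"CHEM": 97.10, "LineRT": 22.12},
    "lapack_solver": {"CHEM": 97.07, "LineRT": 20.94},
    "mkl=sequential": {"CHEM": 88.64, "LineRT": 20.81},
    "mkl=parallel": {"CHEM": 93.74, "LineRT": 20.98},
}


def time_saving_percent(baseline: float, measured: float) -> float:
    """Relative reduction in run time with respect to the baseline, in percent."""
    return 100.0 * (baseline - measured) / baseline


def savings_table(results: dict, baseline_key: str = "internal (default)") -> dict:
    """Saving of every configuration and code part, rounded to 0.1 percent."""
    base = results[baseline_key]
    return {
        name: {part: round(time_saving_percent(base[part], t), 1)
               for part, t in times.items()}
        for name, times in results.items()
    }


if __name__ == "__main__":
    for name, parts in savings_table(BENCHMARKS).items():
        print(f"{name:20s} CHEM: {parts['CHEM']:5.1f} %   LineRT: {parts['LineRT']:5.1f} %")
```

With these numbers, the sequential MKL build saves roughly 9% of the chemistry time and 6% of the line transfer time relative to the default build, while the parallel MKL build saves only about 3.5% of the chemistry time.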
If you are running parallel chemistry as here (parallel_chem=T), you should link with -mkl=sequential (the same option as -qmkl=sequential in the makefile settings above; -mkl is the older spelling). The linear algebra calls in ProDiMo are made from within the parallelised parts, where there are no other OpenMP threads waiting/available. You might, however, get an acceleration by going from (parallel_chem=F, -mkl=sequential) to (parallel_chem=F, -mkl=parallel); I have not checked that. If you are using the gfortran compiler, you can download and compile OpenBLAS as described below.
BLAS/LAPACK with gfortran¶
Linux¶
OpenBLAS is derived from the older, fast GotoBLAS library. In contrast to Intel MKL, OpenBLAS is an open-source project. If it is installed, it can be used with gfortran and may bring significant speed improvements.
Installation of the library¶
Detailed installation instructions for OpenBLAS can be found in the Installation Guide. Often, the library can be installed via a package manager (on both Linux and Mac); see the Installation Guide for details.
However, depending on the system, it might be necessary to download, compile and install the library manually (again, see the Installation Guide).
Makefile setting¶
As with ifx/ifort, one has to enable the -DEXTERNAL_LAPACK compiler flag. In addition, settings like the following have to be added to the makefile (see makefile.gfortran in the src_develop directory for an example):
#-------------------------------------
# OPENBLAS for -DEXTERNAL_LAPACK ... uses package from ubuntu installation
# has to be adapted to your installation
#-------------------------------------
LAPACK_INC = -I/usr/include/x86_64-linux-gnu
LAPACK = -L/usr/lib/x86_64-linux-gnu/lapack -lopenblas -lpthread -lgfortran
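Before adapting these paths, it can help to check whether OpenBLAS is visible on the system at all. The following Python sketch uses ctypes.util.find_library from the standard library; the helper name locate_libraries and the list of library names are chosen here for illustration. It only reports whether a shared library is found in the default search locations and does not determine the correct -I/-L paths for the makefile.

```python
from ctypes.util import find_library


def locate_libraries(names=("openblas", "lapack", "blas")):
    """Map each library name to what the default library lookup finds, or None."""
    return {name: find_library(name) for name in names}


if __name__ == "__main__":
    for name, found in locate_libraries().items():
        print(f"{name:10s} -> {found if found else 'not found'}")
```

If openblas is reported as not found, the library is probably either not installed or installed in a non-standard location, in which case the paths in LAPACK_INC and LAPACK have to point there explicitly.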
Mac OS X¶
Xcode includes several libraries for accelerating code. To use them, one can simply add -framework accelerate to the settings in the makefile, e.g.:
LAPACK_INC = -framework accelerate
Do not forget to also set -DEXTERNAL_LAPACK. This applies to gfortran (ifort/ifx are not supported on Mac OS X). Please note that these settings tend to change with the OS X version.
Alternatively, one can install OpenBLAS as on Linux (see the Installation Guide mentioned above for installation instructions).