ChEESE performance audit

This document describes how to prepare codes for ChEESE performance audits on LUMI.

Essentially this requires four steps:

  1. Prepare Spack for tool installation
  2. Install tools through Spack
  3. Adapt jobscripts
  4. Determine baselines

Prepare Spack for tool installation

We will leverage LUMI's Spack facility for installation of tools. Any version of Spack will do, but it needs to be the same when installing the tools and when doing the runs. For the purpose of this document we will use LUMI's module spack/23.03-2.

Spack requires disk space for installed packages, bookkeeping, etc. On LUMI the location is controlled by the environment variable $SPACK_USER_PREFIX, which needs to be set before the Spack module can even be loaded. It is recommended to point this variable to a directory which is readable by your whole LUMI compute project.

export PROJECT_ID=465000533    # put your own here
export SPACK_USER_PREFIX=/project/project_${PROJECT_ID}/spack_ChEESE
module load spack/23.03-2

You might consider putting this in your .bashrc or similar, as it will be used at every stage of this document.
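A missing or unwritable prefix tends to surface only once Spack starts installing, so a small guard can be run before loading the module. The following is a minimal sketch; `check_spack_prefix` is a hypothetical helper name and not part of LUMI's Spack module.

```shell
# Hypothetical helper (not part of LUMI's Spack module): verify that a
# prefix directory exists and is writable before loading the Spack module.
check_spack_prefix() {
  prefix="$1"
  if [ -z "$prefix" ]; then
    echo "SPACK_USER_PREFIX is not set" >&2
    return 1
  fi
  # Create the directory if it is missing.
  mkdir -p "$prefix" || return 1
  if [ ! -w "$prefix" ]; then
    echo "no write access to $prefix" >&2
    return 1
  fi
  echo "ok: $prefix"
}

# Usage on LUMI (commented out so that the sketch runs anywhere):
# check_spack_prefix "$SPACK_USER_PREFIX" && module load spack/23.03-2
```

The check only covers write access for the current user; whether the rest of your project can read the directory still depends on the group permissions of the project folder.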

Now it is time to bootstrap Spack, which installs Spack's own dependencies. This needs to be done only once. Strictly speaking, this step is optional, as it is executed automatically the first time you use spack. But it takes a (long!) while and might make you nervous if it happens unexpectedly. So make it so with:

spack bootstrap now

Next, you need to set up a so-called Spack environment. Do the following:

cd $SPACK_USER_PREFIX
curl https://code.hlrs.de/ChEESE-2P/audit_LUMI/archive/main.tar.gz | tar xzv --strip-components=1

mkdir cheese_env
cp -a ChEESE_templates/spack.yaml cheese_env

cp -a ChEESE_templates/repo . 

Activating Spack environment

Finally, it is time to activate your Spack environment with

export PROJECT_ID=465000533    # put your own here
export SPACK_USER_PREFIX=/project/project_${PROJECT_ID}/spack_ChEESE
module load spack/23.03-2
eval $(spack env activate --sh $SPACK_USER_PREFIX/cheese_env)

Install tools through Spack

First, activate your Spack environment as in the previous section. Then, set up your module environment as you usually do for running or compiling your code. Repeat this every time you install into or modify your private Spack installation. Check with

module load ....
module list

And install the tools Extrae and mpiP with

# get version of currently loaded compiler and MPI
if [ "$LMOD_FAMILY_COMPILER" = "cce" ]; then
  __COMPILER=$LMOD_FAMILY_COMPILER
else
  __COMPILER=$LMOD_FAMILY_COMPILER@$LMOD_FAMILY_COMPILER_VERSION
fi
__MPI=$LMOD_FAMILY_MPI@$LMOD_FAMILY_MPI_VERSION

#spack spec -N -I cheese.extrae%${__COMPILER}~~cuda ^${__MPI} ^binutils@2.36: ^papi@7.0.1
spack install --add cheese.extrae%${__COMPILER}~~cuda ^${__MPI} ^binutils@2.36: ^papi@7.0.1

#spack spec -N -I mpip%${__COMPILER} ^${__MPI}
spack install --add mpip%${__COMPILER} ^${__MPI}
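To preview what the spec strings above expand to before starting a long installation, the same logic can be wrapped in a function and fed with explicit values. This is only a sketch: `extrae_spec` is a hypothetical helper, and the compiler and MPI versions below are illustrative placeholders, not a statement about what is installed on LUMI.

```shell
# Hypothetical helper: print the Extrae spec that the snippet above builds
# from the Lmod family variables.
# Arguments: compiler, compiler version, MPI, MPI version.
extrae_spec() {
  if [ "$1" = "cce" ]; then
    compiler="$1"            # cce is passed to Spack without a version
  else
    compiler="$1@$2"
  fi
  mpi="$3@$4"
  echo "cheese.extrae%${compiler}~~cuda ^${mpi} ^binutils@2.36: ^papi@7.0.1"
}

# Illustrative values only; the real ones come from your loaded modules.
extrae_spec gcc 12.2.0 cray-mpich 8.1.27
extrae_spec cce 16.0.1 cray-mpich 8.1.27
```

With modules loaded, the call would be `extrae_spec "$LMOD_FAMILY_COMPILER" "$LMOD_FAMILY_COMPILER_VERSION" "$LMOD_FAMILY_MPI" "$LMOD_FAMILY_MPI_VERSION"`, and the printed string can be passed to `spack spec` for a dry run.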

Extrae needs a configuration file extrae_detail_circular.xml

curl -fO https://code.hlrs.de/hpcjgrac/hawk-utils-scripts/raw/branch/main/performance/extrae/share/extrae_detail_circular.xml

Place this in your job folder.

And here is a script to calculate basic performance metrics from mpiP reports

curl -fO https://code.hlrs.de/hpcjgrac/hawk-utils-scripts/raw/branch/main/performance/mpiP/share/mpip2POP.py

Jobscript

You need to modify your jobscripts to attach the tools to your executable.

For mpiP runs add something like this to your jobscript

# activate Spack environment
export PROJECT_ID=465000533    # put your own here
export SPACK_USER_PREFIX=/project/project_${PROJECT_ID}/spack_ChEESE
module load spack/23.03-2
eval $(spack env activate --sh $SPACK_USER_PREFIX/cheese_env)

# setup your modules, etc
# module load ...

# load and configure mpiP
eval $(spack load --sh mpip)
export MPIP="-c -d"
TRACE="env LD_PRELOAD=libmpiP.so"

# add $TRACE before your application executable
srun ... ${TRACE} ./appl ...

This will produce mpiP reports which end in *.1.mpiP or *.2.mpiP.
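If the preload did not take effect, the job finishes normally but no report appears. A jobscript can check for this right after the run. The sketch below assumes the reports land in the job's working directory; `latest_mpip_report` is a hypothetical helper name.

```shell
# Hypothetical helper: print the most recently modified mpiP report in a
# directory (default: current directory), or fail if none exists.
latest_mpip_report() {
  dir="${1:-.}"
  # ls -t sorts by modification time, newest first.
  report=$(ls -t "$dir"/*.mpiP 2>/dev/null | head -n 1)
  if [ -z "$report" ]; then
    echo "no mpiP report found in $dir" >&2
    return 1
  fi
  echo "$report"
}

# Usage at the end of a jobscript (commented out so the sketch runs anywhere):
# latest_mpip_report . || echo "check that LD_PRELOAD was set" >&2
```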

For Extrae add the following to your jobscript

# activate Spack environment
export PROJECT_ID=465000533    # put your own here
export SPACK_USER_PREFIX=/project/project_${PROJECT_ID}/spack_ChEESE
module load spack/23.03-2
eval $(spack env activate --sh $SPACK_USER_PREFIX/cheese_env)

# setup your modules, etc
# module load ...

# load and configure Extrae
eval $(spack load --sh extrae)
export EXTRAE_CONFIG_FILE=/PATH/TO/extrae_detail_circular.xml
TRACE="env LD_PRELOAD=libompitracecf.so"

# add $TRACE before your application executable
srun ... ${TRACE} ./appl ...

This will create files TRACE.* and directories set-* which hold the intermediate traces. You need to merge these into regular traces with the command

eval $(spack load --sh extrae)
mpi2prv -f TRACE.mpits -o trace_YOUR_FAVORITE_NAME_NRANKS.prv

Determine baseline performance

Running your application under the control of a performance analysis tool can incur significant overhead, i.e. your code will take noticeably longer to execute. At the same time, such overhead affects the quality of your performance analysis and the robustness of your conclusions. Always be aware of the amount of overhead and try to keep it small where possible. In many cases it is possible to reduce the overhead to below 5% of the execution time, which is the same order of magnitude as the expected performance variability between runs. If your overhead is larger, be aware that performance metrics may be off by at least as much.

It is therefore important to measure the performance of your code for the particular use-case before applying any performance analysis tools. We refer to this as non-instrumented performance.
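Once both a non-instrumented and an instrumented timing are available, the overhead can be put into numbers and compared against the 5% mark mentioned above. A minimal sketch, with `overhead_percent` as a hypothetical helper and purely illustrative timings:

```shell
# Hypothetical helper: relative overhead (in percent) of an instrumented
# run with respect to the non-instrumented baseline. Times in seconds.
overhead_percent() {
  awk -v base="$1" -v instr="$2" 'BEGIN {
    if (base <= 0) exit 1      # refuse a missing or non-positive baseline
    printf "%.1f\n", 100 * (instr - base) / base
  }'
}

# Illustrative numbers, not measurements: baseline 250 s, with tool 260 s.
overhead_percent 250 260     # prints 4.0
```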

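At the very least you should determine the elapsed time of a run. Do for instance

time srun ... ./appl

and record the elapsed ("real") time from the output. The "user" time only counts CPU time of the launcher process and does not reflect the duration of the run.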
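To keep a record of the elapsed time of each run without copying numbers by hand, the launch command can be wrapped in a small function that appends to a log file. This is a sketch under the assumption that whole-second resolution is sufficient; `timed_run` and the log file name are hypothetical.

```shell
# Hypothetical helper: run a command, measure its wall-clock time in whole
# seconds and append one line per run to a log file.
# Usage: timed_run LOGFILE COMMAND [ARGS...]
timed_run() {
  log="$1"
  shift
  start=$(date +%s)
  "$@"
  status=$?
  end=$(date +%s)
  echo "elapsed=$((end - start))s exit=$status cmd=$*" >> "$log"
  return "$status"
}

# Usage in a jobscript (commented out so the sketch runs anywhere):
# timed_run baseline_times.log srun ... ./appl ...
```

The measured time includes the startup of the launcher itself, which is usually negligible compared to the run.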

Many codes keep track of an application-specific performance metric, for instance iterations per second. Often, this is a better measure than the raw elapsed time, as it disregards the initialisation and shutdown phases, which are negligible for longer production runs but not for short analysis use-cases. If your code reports such a metric, record it in addition to the elapsed time. If no such metric is available yet, you may consider adding one to your code.

Consider doing not just one run, but several to get a feeling for the variation of the non-instrumented performance across runs.
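The run-to-run variation can be summarised with a few lines of awk. The sketch below takes the elapsed times of repeated runs as arguments; `run_stats` is a hypothetical helper and the numbers are illustrative, not measurements.

```shell
# Hypothetical helper: summarise elapsed times (in seconds) of repeated runs.
# Prints mean, minimum, maximum and the spread (max - min) relative to the mean.
run_stats() {
  [ "$#" -gt 0 ] || return 1     # need at least one timing
  printf '%s\n' "$@" | awk '
    NR == 1 { min = $1; max = $1 }
    { sum += $1; if ($1 < min) min = $1; if ($1 > max) max = $1 }
    END {
      mean = sum / NR
      printf "mean=%.2f min=%.2f max=%.2f spread=%.1f%%\n", mean, min, max, 100 * (max - min) / mean
    }'
}

# Three illustrative baseline timings:
run_stats 100 102 98     # prints mean=100.00 min=98.00 max=102.00 spread=4.0%
```

If the spread of the non-instrumented runs is already of the same size as the tool overhead, differences between instrumented runs should be interpreted with corresponding caution.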

Optional: Installing DLB

Activate your Spack environment as explained above, then load your module environment as usual.

Add the latest Spack recipe for DLB with

mkdir -p $SPACK_USER_PREFIX/repo/packages/dlb
curl --output-dir $SPACK_USER_PREFIX/repo/packages/dlb -fO https://raw.githubusercontent.com/spack/spack/develop/var/spack/repos/builtin/packages/dlb/package.py

Now, install DLB with

# get version of currently loaded compiler and MPI
if [ "$LMOD_FAMILY_COMPILER" = "cce" ]; then
  __COMPILER=$LMOD_FAMILY_COMPILER
else
  __COMPILER=$LMOD_FAMILY_COMPILER@$LMOD_FAMILY_COMPILER_VERSION
fi
__MPI=$LMOD_FAMILY_MPI@$LMOD_FAMILY_MPI_VERSION

spack spec -N -I  dlb@3.3.1%${__COMPILER} ^${__MPI}
spack install --add  dlb@3.3.1%${__COMPILER} ^${__MPI}

TODO: use DLB

Optional: Fixing Modulefiles

TBD

spack module tcl refresh -y
find 23.03/0.20.0/modules/tcl -type f -exec sed -i "s|^setenv \$<SPACK.PKG.*$|#\0|" {} \;