This document describes how to prepare codes for ChEESE performance audits on LUMI.
Essentially this requires three steps:
1. Prepare Spack for tools installation
2. Install tools through Spack
3. Set up jobscripts
4. Determine baselines
Well, that is four steps :smile:.
## Prepare Spack for tool installation
We will leverage LUMI's Spack facility for installing the tools. Any version of Spack will do, but it needs to be the same at tool-installation time and when doing the runs. For the purpose of this document we will use LUMI's module `spack/23.03-2`.
Spack requires disk space to install packages, book-keeping, etc. On LUMI this is controlled by the environment variable `$SPACK_USER_PREFIX` which needs to be set before the Spack module can even be loaded. It is recommended to point this variable to a directory which is readable by your whole LUMI compute project.
You might consider putting this in your `.bashrc` or similar, as it will be needed at every stage of this document.
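For example (the project directory below is only a placeholder; use a path inside your own LUMI compute project):

```bash
# Make Spack's installation area visible to the whole compute project
export SPACK_USER_PREFIX=/project/project_465XXXXXX/spack
# Load the Spack module used throughout this document
module load spack/23.03-2
```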
Now it is time to bootstrap Spack so it can install its own dependencies. This needs to be done only once. Actually, this step is optional, as it will be executed automatically on your first use of Spack, but it takes a (long!) while and might make you nervous. Anyway, *make it so!* with:
```bash
spack bootstrap now
```
Next, you need to set up a so-called Spack environment.
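A minimal sketch of creating and activating such an environment (the environment name `cheese-tools` is purely illustrative; the exact LUMI recipe may differ):

```bash
# Create a Spack environment to hold the analysis tools, then activate it
spack env create cheese-tools
spack env activate cheese-tools
```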
First, activate your Spack environment as in the previous section.
Then, set up your module environment as usual for doing runs or compiling your code. Repeat this every time you install or modify your private Spack installation. Check your setup afterwards, for instance as shown below.
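One generic way to verify what is loaded and what Spack can see (these are standard `module` and Spack commands, not a LUMI-specific recipe):

```bash
# Show the modules currently loaded in your session
module list
# List the packages installed in the active Spack environment
spack find
```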
This will create files `TRACE.*` and directories `set-*`, which hold the intermediate traces. You need to *merge* these into regular traces.
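Assuming these files stem from Extrae (which produces `TRACE.*` and `set-*` output by default), the merge is typically done with `mpi2prv`; the file names below are the Extrae defaults:

```bash
# Merge the intermediate Extrae files into a single Paraver trace
mpi2prv -f TRACE.mpits -o app.prv
```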
Running your application under the control of a performance analysis tool can incur significant overhead, i.e. your code will take noticeably longer to execute. This overhead also affects the quality of your performance analysis and the robustness of your conclusions. Always be aware of how much overhead you incur and try to keep it small where possible. In many cases it can be reduced to below 5% of the execution time, which is of the same order of magnitude as the expected run-to-run performance variability. If your overhead is larger, be aware that performance metrics may be off by at least as much.
It is therefore important to measure the performance of your code for the particular use-case before applying any performance analysis tools. We refer to this as _non-instrumented performance._
At the very least you should determine the elapsed time of a run. For instance, do
```bash
time mpirun ... ./app
```
and record the real (wall-clock) time from the output.
Many codes keep track of an application-specific performance metric, such as iterations per second. Often, this is a better measure than the raw elapsed time, as it disregards initialisation and shutdown phases, which are negligible for longer production runs but not for short analysis use-cases. If your code reports such a metric, record it in addition to the elapsed time. Consider adding an application-specific metric to your code if one is not available yet.
Consider doing not just one run but several, to get a feeling for how the non-instrumented performance varies across runs.
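A simple way to do this is a small loop around the timed run (replace the launch line with your actual command):

```bash
# Time the same case three times to gauge run-to-run variability
for i in 1 2 3; do
    time mpirun ... ./app
done
```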
## Optional: Installing DLB
Activate your Spack environment as [explained above](#Activating%20Spack%20environment), then load your module environment as usual.
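Within the active environment, the installation might look like this (the plain `dlb` spec is an assumption; pick the variants and version you need):

```bash
# Add DLB to the active Spack environment and install it
spack add dlb
spack install
```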