This document describes how to prepare codes for ChEESE performance audits on LUMI.
Essentially this requires three steps:
1. Prepare Spack for tools installation
2. Install tools through Spack
3. Set up jobscripts
4. Determine baselines
Well, that is four steps :smile:.
## Prepare Spack for tool installation
We will leverage LUMI's Spack facility for installing the tools. Any version of Spack will do, but it needs to be the same at tool-installation time and when doing the runs. For the purpose of this document we will use LUMI's module `spack/23.03-2`.
Spack requires disk space to install packages, book-keeping, etc. On LUMI this is controlled by the environment variable `$SPACK_USER_PREFIX` which needs to be set before the Spack module can even be loaded. It is recommended to point this variable to a directory which is readable by your whole LUMI compute project.
You might consider putting this in your `.bashrc` or similar, as it will be needed at every stage of this document.
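For example (the project directory below is only a placeholder; use a path inside your own LUMI compute project):

```bash
# Make Spack's installation area visible to the whole compute project
export SPACK_USER_PREFIX=/project/project_465XXXXXX/spack
# Load the Spack module used throughout this document
module load spack/23.03-2
```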
Now it is time to bootstrap Spack so it can install its own dependencies. This needs to be done only once. Actually, this step is optional, as it will be executed automatically on your first use of Spack, but it takes a (long!) while and might make you nervous. Anyway, *make it so!* with:
```bash
spack bootstrap now
```
Next, you need to set up a so-called Spack environment.
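A minimal sketch of creating and activating such an environment (the environment name `cheese-tools` is purely illustrative; the exact LUMI recipe may differ):

```bash
# Create a Spack environment to hold the analysis tools, then activate it
spack env create cheese-tools
spack env activate cheese-tools
```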
First, activate your Spack environment as in the previous section.
Then, set up your module environment as usual for doing runs or compiling your code. Repeat this every time you install or modify your private Spack installation. Check your setup afterwards, for instance as shown below.
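One generic way to verify what is loaded and what Spack can see (these are standard `module` and Spack commands, not a LUMI-specific recipe):

```bash
# Show the modules currently loaded in your session
module list
# List the packages installed in the active Spack environment
spack find
```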
This will create files `TRACE.*` and directories `set-*`, which hold the intermediate traces. You need to *merge* these into regular traces.
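Assuming these files stem from Extrae (which produces `TRACE.*` and `set-*` output by default), the merge is typically done with `mpi2prv`; the file names below are the Extrae defaults:

```bash
# Merge the intermediate Extrae files into a single Paraver trace
mpi2prv -f TRACE.mpits -o app.prv
```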
Running your application under the control of a performance analysis tool can incur significant overhead, i.e. your code will take noticeably longer to execute. This overhead also affects the quality of your performance analysis and the robustness of your conclusions. Always be aware of how much overhead you incur and try to keep it small where possible. In many cases it can be reduced to below 5% of the execution time, which is of the same order of magnitude as the expected run-to-run performance variability. If your overhead is larger, be aware that performance metrics may be off by at least as much.
It is therefore important to measure the performance of your code for the particular use-case before applying any performance analysis tools. We refer to this as _non-instrumented performance._
At the very least you should determine the elapsed time of a run. For instance, do
```bash
time mpirun ... ./app
```
and record the real (wall-clock) time from the output.
Many codes keep track of an application-specific performance metric, such as iterations per second. Often, this is a better measure than the raw elapsed time, as it disregards initialisation and shutdown phases, which are negligible for longer production runs but not for short analysis use-cases. If your code reports such a metric, record it in addition to the elapsed time. Consider adding an application-specific metric to your code if one is not available yet.
Consider doing not just one run but several, to get a feeling for how the non-instrumented performance varies across runs.
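A simple way to do this is a small loop around the timed run (replace the launch line with your actual command):

```bash
# Time the same case three times to gauge run-to-run variability
for i in 1 2 3; do
    time mpirun ... ./app
done
```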
## Optional: Installing DLB
Activate your Spack environment as [explained above](#Activating%20Spack%20environment), then load your module environment as usual.
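Within the active environment, the installation might look like this (the plain `dlb` spec is an assumption; pick the variants and version you need):

```bash
# Add DLB to the active Spack environment and install it
spack add dlb
spack install
```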