Best practice -- Storage on Hawk
Change history:
- Initial version ; Jose Gracia, 7 May 2024
TODOs:
- Cleanup after parallel job: check if there is a HLRS recommendation to copy files from workspaces.
Available filesystems
$HOME
- get current quota:
$ na_quota
- group quota: 200 GB, no files limit
- user quota: 50 GB, no files limit
- mounted via NFS; relatively slow
Workspaces
- get current quota:
$ ws_quota
- group quota: 3 TB, 100k files
- user quota: none
- parallel file system Lustre; metadata slow, parallel access is fast
- hitting the quota limit disables the queues for the whole group
/localscratch/$UID
- total size:
df -h /localscratch
-> 22 TB
- temporary scratch space
- deleted at logout
- local SSD; fast
- available only on login nodes
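Because the workspace quota also limits the number of files, it can help to count files before a job produces many small outputs. The following is a minimal sketch, not an HLRS tool: the function name check_file_count, the 80% warning threshold and the output format are assumptions made for illustration. It counts regular files only, so the numbers may differ from what the quota accounting reports; ws_quota remains the authoritative source.

```shell
# Hypothetical helper (not part of the HLRS tools): count regular files below
# a directory and compare the count against a limit.
# Default limit: 100000, the group file quota for workspaces listed above.
# Prints "<STATUS> <count>/<limit>" and returns 0 (OK), 1 (WARN) or 2 (OVER).
check_file_count() {
    cfc_dir=$1
    cfc_limit=${2:-100000}
    # One output line per regular file; file names containing newlines would
    # be counted more than once (accepted for this sketch).
    cfc_count=$(find "$cfc_dir" -type f | wc -l)
    cfc_count=$((cfc_count + 0))          # strip padding some wc versions add
    cfc_warn=$((cfc_limit * 80 / 100))    # assumed warning threshold: 80%
    if [ "$cfc_count" -ge "$cfc_limit" ]; then
        echo "OVER $cfc_count/$cfc_limit"
        return 2
    elif [ "$cfc_count" -ge "$cfc_warn" ]; then
        echo "WARN $cfc_count/$cfc_limit"
        return 1
    fi
    echo "OK $cfc_count/$cfc_limit"
    return 0
}
```

Typical use would be check_file_count /path/to/workspace, where the path is whatever ws_allocate printed for the workspace in question.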
What to put where?
Persistent data on $HOME. E.g.:
- source code
- installed programs
Temporary data/builds on /localscratch/$UID. E.g.:
- anything temporary which does not fit on $HOME
Data for parallel jobs on a workspace. E.g.:
- inputs and outputs of jobs
- anything which is accessed through MPI-IO or similar
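Since /localscratch exists only on the login nodes, scripts that are also run elsewhere may want a fallback for the build directory. A minimal sketch follows; the function name pick_buildfs and the fallback to mktemp are assumptions for illustration, not an HLRS recommendation. It uses id -u instead of $UID so that it also works in shells that do not set $UID.

```shell
# Hypothetical helper: print a directory suitable for temporary builds.
# Prefers <scratch_root>/<numeric uid> (by default /localscratch/<uid>) and
# falls back to a fresh directory from mktemp when that directory is missing
# or not writable, e.g. on nodes other than the login nodes.
pick_buildfs() {
    pb_root=${1:-/localscratch}
    pb_candidate="$pb_root/$(id -u)"
    if [ -d "$pb_candidate" ] && [ -w "$pb_candidate" ]; then
        printf '%s\n' "$pb_candidate"
    else
        # Assumption: a temporary directory is an acceptable substitute.
        mktemp -d
    fi
}
```

In the workflows on this page it could replace the hard-coded assignment, e.g. BUILDFS=$(pick_buildfs).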
Typical workflows
Large software project with autotools, e.g. OpenMPI
SRCFS=$HOME; BUILDFS=/localscratch/$UID; INSTALLFS=$HOME/opt
git clone --depth=1 git@github.com:open-mpi/ompi.git $SRCFS/ompi.git
cd $SRCFS/ompi.git; autoreconf -fiv
mkdir -p $INSTALLFS/ompi_test_347; mkdir -p $BUILDFS/build_ompi
cd $BUILDFS/build_ompi
$SRCFS/ompi.git/configure --prefix $INSTALLFS/ompi_test_347
make && make install
Large software project with CMake, e.g. targetDART
SRCFS=$HOME; BUILDFS=/localscratch/$UID; INSTALLFS=$HOME/opt
git clone --depth=1 git@github.com:targetDART/llvm-project.git $SRCFS/TD.git
mkdir -p $INSTALLFS/TD_buggy_again; mkdir -p $BUILDFS/build_TD
cd $BUILDFS/build_TD
cmake -G Ninja $SRCFS/TD.git/llvm -DCMAKE_INSTALL_PREFIX=$INSTALLFS/TD_buggy_again
ninja && ninja install
Avoiding large git checkouts
Do you really need the full history? If not, use a shallow clone:
git clone --depth=1 ...
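To see what a shallow clone actually contains, and to fetch more history later if it turns out to be needed, something along the following lines may help. This is a sketch only: history_depth is a hypothetical helper name, and the repository path in the commented usage is a placeholder.

```shell
# Hypothetical helper: print the number of commits reachable from HEAD in the
# repository at the given path. A clone made with --depth=1 reports 1.
history_depth() {
    git -C "$1" rev-list --count HEAD
}

# Usage sketch (placeholder path, commented out so nothing runs by accident):
#   history_depth "$HOME/ompi.git"
#   git -C "$HOME/ompi.git" fetch --deepen=10    # fetch 10 more commits
#   git -C "$HOME/ompi.git" fetch --unshallow    # fetch the full history
```

Deepening on demand keeps the initial checkout small while leaving the option to recover the full history.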
Mirror git repo to $BUILDFS
Assuming you have a (possibly large) git repo my_repo.git on $HOME.
Copy/clone the source to $BUILDFS for faster access.
SRCFS=$HOME; BUILDFS=/localscratch/$UID; INSTALLFS=$HOME/opt
ls $SRCFS/my_repo.git
git clone --depth=1 file://$SRCFS/my_repo.git $BUILDFS/mirror_repo.git
mkdir -p $BUILDFS/build
cd $BUILDFS/build
# with CMake:
cmake $BUILDFS/mirror_repo.git/ -DCMAKE_INSTALL_PREFIX=$INSTALLFS/whatever
# or, with autotools:
$BUILDFS/mirror_repo.git/configure --prefix $INSTALLFS/whatever
Cleanup after parallel job
Rsync the results of the parallel job into $HOME.
PERMANENTFS=$HOME/results/
cd $(ws_allocate my_job 1)
mpirun ./app --resultsdir=results
rsync -a results $PERMANENTFS/ && rm -rf results
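The copy-then-delete step can be wrapped so that the results are removed from the workspace only when the copy succeeded. The following is a minimal sketch under stated assumptions: archive_results is a hypothetical helper name, and the fallback to cp -a when rsync is not installed is an addition for illustration. Pass the source directory without a trailing slash so that the directory itself, not only its contents, is copied.

```shell
# Hypothetical helper: copy a results directory into permanent storage and
# delete the source only if the copy succeeded.
#   $1: source directory (no trailing slash)
#   $2: destination directory; created if missing
archive_results() {
    ar_src=$1
    ar_dest=$2
    mkdir -p "$ar_dest" || return 1
    if command -v rsync >/dev/null 2>&1; then
        rsync -a "$ar_src" "$ar_dest/" || return 1
    else
        # Assumed fallback when rsync is unavailable (requires cp with -a).
        cp -a "$ar_src" "$ar_dest/" || return 1
    fi
    # Reached only after a successful copy.
    rm -rf "$ar_src"
}
```

In the workflow above this would be called as archive_results results "$PERMANENTFS" after mpirun has finished.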