# Best practice -- Storage on Hawk

Change history:
- Initial version; Jose Gracia, 7 May 2024

TODOs:
- [Cleanup after parallel job](Best_practice--Storage_on_Hawk.md#cleanup-after-parallel-job): check whether HLRS recommends a way to copy files out of workspaces.

## Available filesystems

`$HOME`
- get current quota: `$ na_quota`
- group quota: 200 GB, no file limit
- user quota: 50 GB, no file limit
- mounted via NFS; relatively slow

Workspaces
- get current quota: `$ ws_quota`
- group quota: 3 TB, 100k files
- user quota: none
- parallel file system (Lustre): metadata operations are slow, parallel access is fast
- hitting the quota limit disables the queues for the whole group

`/localscratch/$UID`
- total size: `df -h /localscratch` -> 22 TB
- temporary scratch space
- deleted at logout
- local SSD; fast
- available only on login nodes

---

## What to put where?

Persistent data on `$HOME`, e.g.:
- source code
- installed programs

Temporary data and builds on `/localscratch/$UID`, e.g.:
- anything temporary which does not fit on `$HOME`

Data for parallel jobs on a workspace, e.g.:
- inputs and outputs of jobs
- anything which is accessed through MPI-IO or similar

---

## Typical workflows

---

### Large software project with autotools, e.g. OpenMPI

```bash
SRCFS=$HOME; BUILDFS=/localscratch/$UID; INSTALLFS=$HOME/opt
# Sources on $HOME (persistent); --recursive also fetches Open MPI's git submodules
git clone --depth=1 --recursive git@github.com:open-mpi/ompi.git $SRCFS/ompi.git
# Generate configure; Open MPI git checkouts use autogen.pl (which calls autoreconf)
cd $SRCFS/ompi.git && ./autogen.pl
mkdir -p $INSTALLFS/ompi_test_347 $BUILDFS/build_ompi
# Out-of-tree build on the fast local SSD, install to $HOME
cd $BUILDFS/build_ompi
$SRCFS/ompi.git/configure --prefix=$INSTALLFS/ompi_test_347
make && make install
```

---

### Large software project with CMake, e.g. targetDART

```bash
SRCFS=$HOME; BUILDFS=/localscratch/$UID; INSTALLFS=$HOME/opt
git clone --depth=1 git@github.com:targetDART/llvm-project.git $SRCFS/TD.git
mkdir -p $INSTALLFS/TD_buggy_again $BUILDFS/build_TD
cd $BUILDFS/build_TD
# Select the Ninja generator with -G; recent LLVM versions refuse to configure without a build type
cmake -G Ninja -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=$INSTALLFS/TD_buggy_again $SRCFS/TD.git/llvm
# The Ninja generator writes build.ninja, so build with ninja rather than make
ninja && ninja install
```

---

### Avoiding large git checkouts

Do you really need the full history? If not, make a shallow clone:

```bash
git clone --depth=1 ...
```

---

### Mirror git repo to `$BUILDFS`

Assuming you have a (possibly large) git repo `my_repo.git` on `$HOME`, clone it to `$BUILDFS` for faster access. Note that a clone contains only committed changes.

```bash
SRCFS=$HOME; BUILDFS=/localscratch/$UID; INSTALLFS=$HOME/opt
ls $SRCFS/my_repo.git
# Use a file:// URL; for a plain local path git ignores --depth
git clone --depth=1 file://$SRCFS/my_repo.git $BUILDFS/mirror_repo.git
mkdir -p $BUILDFS/build
cd $BUILDFS/build
# Either a CMake project ...
cmake $BUILDFS/mirror_repo.git/ -DCMAKE_INSTALL_PREFIX=$INSTALLFS/whatever
# ... or an autotools project
$BUILDFS/mirror_repo.git/configure --prefix=$INSTALLFS/whatever
```

---

### Cleanup after parallel job

Rsync the results of a parallel job into `$HOME`.

```bash
# job.pbs
PERMANENTFS=$HOME/results/
# ws_allocate prints the path of the new workspace (here valid for 1 day)
cd $(ws_allocate my_job 1)
mpirun ./app --resultsdir=results
# Delete the workspace copy only if the copy to $HOME succeeded
rsync -a results $PERMANENTFS && rm -rf results
```

---
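### Checking file counts before filling a workspace

The workspace group quota is limited to 100k files, and hitting it disables the queues for the whole group, so it can pay to count entries before copying a tree (e.g. a source checkout) into a workspace. A minimal sketch; `check_inodes` is a hypothetical helper, not an HLRS tool, and it assumes that directories count against the file quota too (as inodes usually do on Lustre):

```bash
#!/usr/bin/env bash
# Hypothetical helper (not an HLRS tool): count all entries (files and
# directories, including DIR itself) in a tree, i.e. roughly the number
# of inodes it would use on a workspace.
# Prints the count; returns 1 if it exceeds LIMIT (default 100000, the
# group file quota of workspaces), 2 if DIR is not a directory.
# File names containing newlines are overcounted.
check_inodes() {
    local dir="$1" limit="${2:-100000}" n
    [ -d "$dir" ] || return 2
    n=$(find "$dir" | wc -l | tr -d ' ')
    printf '%s\n' "$n"
    [ "$n" -le "$limit" ]
}
```

For example, `check_inodes $HOME/my_repo.git || echo "too many files for a workspace"`. `ws_quota` still shows the actual current usage.

---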
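### Choosing a build directory

`/localscratch` exists only on login nodes, so a script that hard-codes `BUILDFS=/localscratch/$UID` fails elsewhere. A minimal sketch, assuming that a fresh `mktemp -d` directory is an acceptable fallback; `pick_builddir` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a build directory.
# Prefers BASE/$UID (BASE defaults to /localscratch, which exists only on
# login nodes); if BASE is missing or BASE/$UID is not writable, falls
# back to a new directory created by mktemp -d (under $TMPDIR or /tmp).
pick_builddir() {
    local base="${1:-/localscratch}"
    local dir="$base/$UID"
    if [ -d "$base" ] && mkdir -p "$dir" 2>/dev/null && [ -w "$dir" ]; then
        printf '%s\n' "$dir"
    else
        mktemp -d
    fi
}
```

Usage: `BUILDFS=$(pick_builddir)`. The fallback directory is not necessarily on a fast local SSD; like `/localscratch/$UID`, treat it as temporary.

---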
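### Copying results out of a workspace safely

In the job script for the cleanup workflow, results should only be deleted from the workspace once they have safely arrived on `$HOME`. A minimal sketch of that step as a hypothetical `stage_out` helper; it uses `rsync -a` when available and assumes `cp -a` is an acceptable fallback:

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy directory SRC into DEST (creating DEST/<name of SRC>,
# like "rsync -a results $PERMANENTFS"), then delete SRC only if the copy succeeded.
stage_out() {
    local src="${1%/}" dest="$2"   # strip a trailing slash so SRC itself is copied
    mkdir -p "$dest" || return 1
    if command -v rsync >/dev/null 2>&1; then
        rsync -a "$src" "$dest"/ || return 1
    else
        cp -a "$src" "$dest"/ || return 1
    fi
    rm -rf -- "$src"
}
```

In `job.pbs` this would replace the last line, e.g. `stage_out results $PERMANENTFS`.

---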