Template repository for Ray workflows on HLRS HPC Systems

Ray: How to launch a Ray Cluster on Hawk?

This guide shows you how to launch a Ray cluster on HLRS' Hawk system.

Prerequisites

Before running the application, make sure you have the following prerequisites installed in a conda environment:

Python 3.9: This Python version is used throughout this guide; you can select it when creating the Conda environment. For more information, see the documentation for Conda on HLRS HPC systems.
Conda Installation: Ensure that Conda is installed on your local system. For more information, see the documentation for Conda on HLRS HPC systems.
Ray: You can install Ray inside the Conda environment.
Conda Pack: conda-pack packages the Conda environment into a single tarball, which is then used to transfer the environment to Vulcan.
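
The prerequisites above can be checked from Python before going further. The following is a minimal sketch, not part of the repository: the function name `check_prerequisites` is hypothetical, and it assumes that Ray and conda-pack are importable under the module names `ray` and `conda_pack`.

```python
import importlib.util
import sys

# Assumptions: the guide pins Python 3.9, and Ray / conda-pack are
# importable as "ray" and "conda_pack". Adjust both to your setup.
REQUIRED_PYTHON = (3, 9)
REQUIRED_MODULES = ("ray", "conda_pack")


def check_prerequisites(required_python=REQUIRED_PYTHON, modules=REQUIRED_MODULES):
    """Return a dict mapping each prerequisite to True (present) or False."""
    report = {
        # Compare only major.minor, e.g. (3, 9)
        "python": sys.version_info[:2] == tuple(required_python),
    }
    for name in modules:
        # find_spec returns None when a top-level module cannot be found
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for item, ok in check_prerequisites().items():
        print(f"{item}: {'ok' if ok else 'MISSING'}")
```

Running this inside the activated environment gives a quick overview of what is still missing.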

Clone this repository to your local machine:

git clone <repository_url>

Go into the repository's deployment_scripts directory and create an environment using Conda and environment.yaml. Note: Be sure to add the packages you need to environment.yaml beforehand:

cd deployment_scripts
./create-env.sh <your-env>

Send all files using deploy-env.sh:

./deployment_scripts/deploy-env.sh <your-env> <destination_host>:<destination_directory>

Send all the code to the appropriate directory on Vulcan using scp:
```
scp <your_script>.py <destination_host>:<destination_directory>
```

SSH into Vulcan and start an interactive job using:

qsub -I -N DaskJob -l select=4:node_type=clx-21 -l walltime=02:00:00
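
The resource request in this command has three parts you are likely to change: the number of nodes, the node type, and the walltime. As an illustration only, the sketch below assembles the same command line from these values; `build_qsub_command` is a hypothetical helper and not part of the repository.

```python
import shlex


def build_qsub_command(nodes, node_type, walltime, job_name="DaskJob"):
    """Assemble the interactive qsub command from this guide as a string."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    args = [
        "qsub",
        "-I",            # request an interactive job
        "-N", job_name,  # job name
        "-l", f"select={nodes}:node_type={node_type}",
        "-l", f"walltime={walltime}",
    ]
    # shlex.join quotes arguments where needed so the result is shell-safe
    return shlex.join(args)


print(build_qsub_command(4, "clx-21", "02:00:00"))
```

With the values used in this guide, the printed command is identical to the one above.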

Go into the directory with all code:
```
cd <destination_directory>
```
Initialize the Dask cluster:
```
source deploy-dask.sh "$(pwd)"
```
Note: At the moment, the deployment is verbose, and there is no implementation to silence the logs.

Note: Make sure all permissions are set using chmod +x for all scripts.
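
If you prefer to set the execute permission from Python, for example as part of a setup script, a sketch along these lines should work. It assumes the scripts end in `.sh`; the helper name `make_scripts_executable` is hypothetical.

```python
import stat
from pathlib import Path


def make_scripts_executable(directory, pattern="*.sh"):
    """Add execute bits for user, group and others (like chmod a+x)."""
    changed = []
    for script in sorted(Path(directory).glob(pattern)):
        mode = script.stat().st_mode
        # Keep the existing permission bits and add the three execute bits
        script.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
        changed.append(script.name)
    return changed
```

Calling `make_scripts_executable(".")` in the directory that holds the deployment scripts returns the names of the files it touched.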

To run the application interactively, execute the following command after all the cluster's nodes are up and running:

python

Or to run a full script:

python <your-script>.py
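
As a sketch of what `<your-script>.py` might contain for a Ray workflow, the example below connects to an already running cluster and runs a toy task on it. It assumes that Ray is installed in the active environment and that a Ray cluster has been started on the allocated nodes; `square` and `run_on_cluster` are placeholder names, not part of the repository.

```python
import importlib.util


def square(x):
    """Toy workload; replace with your own function."""
    return x * x


def run_on_cluster(values):
    """Run square() as Ray tasks on an existing cluster and collect results."""
    import ray  # imported lazily so this file loads even without Ray

    # address="auto" attaches to a running cluster instead of starting
    # a new local one
    ray.init(address="auto")
    try:
        remote_square = ray.remote(square)  # wrap the plain function as a task
        return ray.get([remote_square.remote(v) for v in values])
    finally:
        ray.shutdown()


if __name__ == "__main__":
    if importlib.util.find_spec("ray") is None:
        print("Ray is not installed in the active environment.")
    else:
        try:
            print(run_on_cluster(range(8)))
        except ConnectionError as err:
            print(f"No running Ray cluster found: {err}")
```

Keeping the workload in a plain function makes it possible to test it locally before submitting it to the cluster.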

Note: If you don't see your environment in the Python interpreter, manually activate it using:

conda activate <your-env>

Do this before starting the Python interpreter.
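
To confirm from within Python which environment is active, you can inspect the interpreter path and the `CONDA_DEFAULT_ENV` variable that `conda activate` exports. This is a minimal sketch; the helper names are hypothetical, and environments activated by other means may not set this variable.

```python
import os
import sys


def active_conda_env(environ=None):
    """Return the name of the active Conda environment, or None."""
    environ = os.environ if environ is None else environ
    # conda activate exports CONDA_DEFAULT_ENV with the environment name
    return environ.get("CONDA_DEFAULT_ENV") or None


def describe_interpreter(environ=None):
    """Summarize which interpreter and Conda environment are in use."""
    env = active_conda_env(environ)
    return f"python={sys.executable} conda_env={env if env else 'none'}"


print(describe_interpreter())
```

If the reported environment is not `<your-env>`, leave the interpreter and activate the environment first.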

Note: The Dask cluster is verbose by default. To reduce the log output, add the following to your code when connecting to the Dask cluster:

client = Client(..., silence_logs='error')
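
Independently of the client option, Python's standard `logging` module can raise the threshold of a library's loggers in the current process. The sketch below assumes the library logs under the `distributed` namespace (Ray uses `ray`); it only affects the process in which it runs, not the output of remote workers, and `quiet_loggers` is a hypothetical helper.

```python
import logging


def quiet_loggers(names=("distributed",), level=logging.ERROR):
    """Raise the threshold of the named loggers and return their levels."""
    for name in names:
        # Child loggers such as "distributed.worker" inherit this level
        # unless they set their own
        logging.getLogger(name).setLevel(level)
    return {name: logging.getLogger(name).getEffectiveLevel() for name in names}


quiet_loggers()
```

Call it once near the top of your script, before connecting to the cluster.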

Note: Replace all placeholders within <> with the actual values applicable to your project.