# Ray: How to launch a Ray Cluster on Hawk?
This guide shows you how to launch a Ray cluster on HLRS' Hawk system.
## Table of Contents
- [Ray: How to launch a Ray Cluster on Hawk?](#ray-how-to-launch-a-ray-cluster-on-hawk)
- [Table of Contents](#table-of-contents)
- [Prerequisites](#prerequisites)
- [Getting Started](#getting-started)
- [Launch a Ray Cluster in Interactive Mode](#launch-a-ray-cluster-in-interactive-mode)
- [Launch a Ray Cluster in Batch Mode](#launch-a-ray-cluster-in-batch-mode)
## Prerequisites
Before building the environment, make sure you have the following prerequisites:
- [Conda Installation](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html): Ensure that Conda is installed on your local system.
- [Conda-Pack](https://conda.github.io/conda-pack/) installed in the base environment: Conda pack is used to package the Conda environment into a single tarball. This is used to transfer the environment to the target system.
- A `linux-64` platform for installing the Conda packages, because Conda/pip download and install precompiled binaries that match the architecture and OS of the local environment.

For more information, see the documentation on [Conda on HLRS HPC systems](https://kb.hlrs.de/platforms/index.php/How_to_move_local_conda_environments_to_the_clusters).
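
For example, Conda-Pack can be installed into the base environment from the conda-forge channel (as described in the Conda-Pack documentation):

```bash
conda install -c conda-forge conda-pack
```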
## Getting Started
Only the `main` and `r` channels are available through the Conda module on the clusters. To use custom packages, you need to move your local Conda environment to Hawk.
1. Clone this repository to your local machine:
```bash
git clone <repository_url>
```
2. Go into the directory and create an environment using Conda and `environment.yaml`.
Note: Be sure to add the necessary packages to `deployment_scripts/environment.yaml`:
```bash
cd deployment_scripts
./create-env.sh <your-env>
```
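
If you are unsure what `environment.yaml` should contain, a minimal sketch looks like the following (package names and versions here are illustrative, not the repository's actual file):

```yaml
name: my_env
channels:
  - conda-forge
dependencies:
  - python=3.10
  - ray-default   # Ray with dashboard support, from conda-forge
  - pip
```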
3. Package the environment and transfer the archive to the target system:
```bash
(my_env) $ conda deactivate
(base) $ conda pack -n my_env -o my_env.tar.gz # conda-pack must be installed in the base environment
```
A workspace is a suitable place to store the compressed Conda environment archive on Hawk. If you have already configured a workspace, proceed to the next step. Otherwise, use the following command to create a workspace on the high-performance filesystem; it will expire in 10 days. For more information, such as how to enable reminder emails, refer to the [workspace mechanism](https://kb.hlrs.de/platforms/index.php/Workspace_mechanism) guide.
```bash
ws_allocate hpda_project 10
ws_find hpda_project # find the path to workspace, which is the destination directory in the next step
```
You can send your data to an existing workspace using:
```bash
scp my_env.tar.gz <username>@hawk.hww.hlrs.de:<workspace_directory>
rm my_env.tar.gz # We don't need the archive locally anymore.
```
4. Clone the repository on Hawk to use the deployment scripts and project structure:
```bash
cd <workspace_directory>
git clone <repository_url>
```
## Launch a Ray Cluster in Interactive Mode
Using a single node interactively allows for faster code debugging.
1. On the Hawk login node, start an interactive job using:
```bash
qsub -I -l select=1:node_type=rome -l walltime=01:00:00
```
2. Go into the directory containing the deployment scripts:
```bash
cd <source_directory>/deployment_scripts
```
3. Deploy the Conda environment to the RAM disk:
```bash
source deploy-env.sh
```
Note: Make sure `deploy-env.sh` is executable (`chmod +x deploy-env.sh`).
4. Initialize the Ray cluster.
You can use a Python interpreter to start a Ray cluster:
```python
import ray
ray.init(dashboard_host='127.0.0.1')
```
5. Connect to the dashboard.
Warning: Always use `127.0.0.1` as the dashboard host to make the Ray cluster reachable by only you.
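
Because the dashboard binds only to `127.0.0.1` on the compute node, one way to view it from your local machine is SSH port forwarding. The following is a sketch: `8265` is Ray's default dashboard port, `<compute_node>` is the node hostname from your interactive job, and the jump through the login node is an assumption about the site's network setup:

```bash
# Jump through the Hawk login node to the compute node, forwarding
# local port 8265 to the dashboard bound on the node's loopback
ssh -J <username>@hawk.hww.hlrs.de -L 8265:localhost:8265 <username>@<compute_node>
# Then open http://localhost:8265 in your local browser
```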
## Launch a Ray Cluster in Batch Mode
1. Add execution permissions to `start-ray-worker.sh`:
```bash
cd deployment_scripts
chmod +x start-ray-worker.sh
```
2. Submit a job to launch the head and worker nodes.
You must modify the following variables in `submit-ray-job.sh`:
- Line 3 sets the cluster size; the default configuration launches a 3-node cluster.
- `$PROJECT_DIR`: the path to your project directory.
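
For orientation, a batch script like `submit-ray-job.sh` typically follows the pattern below. This is a simplified sketch, not the repository's actual script: the resource requests, the worker-launch mechanism, and the `main.py` entry point are all illustrative assumptions.

```bash
#!/bin/bash
#PBS -N ray-cluster
#PBS -l select=3:node_type=rome   # line 3: number of nodes = cluster size
#PBS -l walltime=01:00:00

export PROJECT_DIR=<project_directory>   # illustrative: set to your project path

# Start the Ray head process on the first allocated node,
# keeping the dashboard on the loopback interface
ray start --head --port=6379 --dashboard-host=127.0.0.1

# Start a Ray worker on each remaining node (e.g. via pbsdsh or ssh,
# using start-ray-worker.sh), pointing it at the head node:
#   ray start --address=<head_node_ip>:6379

# Run the workload, then shut the cluster down
python $PROJECT_DIR/main.py   # illustrative entry point
ray stop
```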