# Ray: How to launch a Ray Cluster on Hawk?
This guide shows you how to launch a Ray cluster on HLRS' Hawk system.
## Table of Contents
- [Ray: How to launch a Ray Cluster on Hawk?](#ray-how-to-launch-a-ray-cluster-on-hawk)
- [Table of Contents](#table-of-contents)
- [Prerequisites](#prerequisites)
- [Getting Started](#getting-started)
- [Launch a Ray Cluster in Interactive Mode](#launch-a-ray-cluster-in-interactive-mode)
- [Launch a Ray Cluster in Batch Mode](#launch-a-ray-cluster-in-batch-mode)
## Prerequisites
Before building the environment, make sure you have the following prerequisites:

- [Conda Installation](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html): Conda must be installed on your local system.
- [Conda-Pack](https://conda.github.io/conda-pack/) installed in the base environment: conda-pack packages a Conda environment into a single tarball, which is used to transfer the environment to the target system.
- The `linux-64` platform for installing the Conda packages, because Conda/pip downloads and installs precompiled binaries that match the architecture and OS of the local environment.

For more information, see the documentation for [Conda on HLRS HPC systems](https://kb.hlrs.de/platforms/index.php/How_to_move_local_conda_environments_to_the_clusters).
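As a quick sanity check, the platform requirement can be verified from a Python interpreter. This is a minimal sketch using only the standard library; `Linux` together with `x86_64` corresponds to Conda's `linux-64` platform:

```python
import platform

# Conda's linux-64 platform corresponds to a 64-bit x86 Linux system.
machine = platform.machine()   # e.g. 'x86_64'
system = platform.system()     # e.g. 'Linux'
ok = (system == "Linux" and machine == "x86_64")
print(system, machine, "-> linux-64" if ok else "-> not linux-64")
```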
## Getting Started

Only the `main` and `r` channels are available through the conda module on the clusters. To use custom packages, you need to move your local Conda environment to Hawk.

1. Clone this repository to your local machine:

```bash
git clone <repository_url>
```

2. Go into the directory and create an environment using Conda and `environment.yaml`.
   Note: Be sure to add the necessary packages in `deployment_scripts/environment.yaml`:

```bash
cd deployment_scripts
./create-env.sh <your-env>
```
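
For reference, a minimal `environment.yaml` might look like the following. This is a sketch, not the file shipped with the repository: the environment name, Python version, and channels are assumptions to adapt, but `ray` itself must be listed for the cluster to start:

```yaml
# deployment_scripts/environment.yaml -- minimal sketch; adjust the name,
# versions, and packages to your project.
name: my_env
channels:
  - defaults
dependencies:
  - python=3.10
  - pip
  - pip:
      - ray[default]
```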
3. Package the environment and transfer the archive to the target system:
```bash
(my_env) $ conda deactivate
(base) $ conda pack -n my_env -o my_env.tar.gz # conda-pack must be installed in the base environment
```

A workspace is a suitable place to store the compressed Conda environment archive on Hawk. If you have already configured a workspace, proceed to the next step. Otherwise, use the following commands to create a workspace on the high-performance filesystem; this one expires after 10 days. For more information, such as how to enable reminder emails, refer to the [workspace mechanism](https://kb.hlrs.de/platforms/index.php/Workspace_mechanism) guide.

```bash
ws_allocate hpda_project 10
ws_find hpda_project # find the path to workspace, which is the destination directory in the next step
```
You can send your data to an existing workspace using:

```bash
scp my_env.tar.gz <username>@hawk.hww.hlrs.de:<workspace_directory>
rm my_env.tar.gz  # the archive is no longer needed locally
```
4. Clone the repository on Hawk to use the deployment scripts and project structure:

```bash
cd <workspace_directory>
git clone <repository_url>
```
## Launch a Ray Cluster in Interactive Mode
Using a single node interactively provides opportunities for faster code debugging.
1. On the Hawk login node, start an interactive job using:

```bash
qsub -I -l select=1:node_type=rome -l walltime=01:00:00
```
2. Go into the directory with all the code:

```bash
cd <source_directory>/deployment_scripts
```
3. Deploy the Conda environment to the RAM disk:

```bash
source deploy-env.sh
```
   Note: Make sure the script has execute permissions (`chmod +x deploy-env.sh`).
4. Initialize the Ray cluster.
You can use a Python interpreter to start a Ray cluster:

```python
import ray
ray.init(dashboard_host='127.0.0.1')
```

5. Connect to the dashboard.
   Warning: Always use `127.0.0.1` as the dashboard host so that the Ray cluster is reachable only by you. To view the dashboard from your local machine, forward the dashboard port (8265 by default) over SSH.

## Launch a Ray Cluster in Batch Mode
1. Add execution permissions to `start-ray-worker.sh`:

```bash
cd deployment_scripts
chmod +x start-ray-worker.sh
```
2. Submit a job to launch the head and worker nodes.
   You must modify the following variables in `submit-ray-job.sh`:
   - Line 3 changes the cluster size; the default configuration launches a 3-node cluster.
   - `$PROJECT_DIR`

   Then submit the job script (e.g. `qsub submit-ray-job.sh`).