ray_template/README.md

# Dask: How to execute python workloads using a Dask cluster on Vulcan

This repository looks at a deployment of a Dask cluster on Vulcan, and executing your programs using this cluster.

## Table of Contents
- [Getting Started](#getting-started)
- [Usage](#usage)
- [Notes](#notes)

## Getting Started

### 1. Build and transfer the Conda environment to Hawk:

Only the `main` and `r` channels are available using the Conda module on the clusters. To use custom packages, we need to move the local Conda environment to Hawk. 

Follow the instructions in [the Conda environment builder repository](https://code.hlrs.de/SiVeGCS/conda-env-builder). The YAML file to create a test environment is available in the `deployment_scripts` directory.

### 2. Allocate workspace on Hawk:

Proceed to the next step if you have already configured your workspace. Use the following command to create a workspace on the high-performance filesystem, which will expire in 10 days. For more information, such as how to enable reminder emails, refer to the [workspace mechanism](https://kb.hlrs.de/platforms/index.php/Workspace_mechanism) guide.

```bash
ws_allocate dask_workspace 10
ws_find dask_workspace # find the path to workspace, which is the destination directory in the next step
```

### 3. Clone the repository on Hawk to use the deployment scripts and project structure:

```bash
cd <workspace_directory>
git clone <repository_url>
```

### 4. Send all the code to the appropriate directory on Vulcan using `scp`:

```bash
scp <your_script>.py <destination_host>:<destination_directory>
```

### 5. SSH into Vulcan and start a job interactively using:

```bash
qsub -I -N DaskJob -l select=1:node_type=clx-21 -l walltime=02:00:00
```
Note: For multiple nodes, it is recommended to write a `.pbs` script and submit it using `qsub`. Follow section [Multiple Nodes](#multiple-nodes) for more information.

### 6. Go into the directory with all code:

```bash
cd <destination_directory>
```

### 7. Initialize the Dask cluster:

```bash
source deploy-dask.sh "$(pwd)"
```
   Note: At the moment, the deployment is verbose, and there is no implementation to silence the logs.
   Note: Make sure all permissions are set using `chmod +x` for all scripts.

## Usage

### Single Node
To run the application interactively on a single node, follow points 4, 5, 6 and, 7 from [Getting Started](#getting-started), and execute the following command after all the job has started:

```bash
# Load the Conda module
module load bigdata/conda
source activate # activates the base environment

# List available Conda environments for verification purposes
conda env list

# Activate a specific Conda environment.
conda activate dask_environment # you need to execute `source activate` first, or use `source [ENV_PATH]/bin/activate`
```

After the environment is activated, you can run the python interpretor:

```bash
python
```

Or to run a full script:
```bash
python <your-script>.py
```

### Multiple Nodes
To run the application on multiple nodes, you need to write a `.pbs` script and submit it using `qsub`. Follow lines 1-4 from the [Getting Started](#getting-started) section. Write a `submit-dask-job.pbs` script:

```bash
#!/bin/bash
#PBS -N dask-job
#PBS -l select=3:node_type=rome
#PBS -l walltime=1:00:00

#Go to the directory where the code is
cd <destination_directory>

#Deploy the Dask cluster
source deploy-dask.sh "$(pwd)"

#Run the python script
python <your-script>.py
```

A more thorough example is available in the `deployment_scripts` directory under `submit-dask-job.pbs`.

And then execute the following commands to submit the job:

```bash
qsub submit-dask-job.pbs
qstat -anw # Q: Queued, R: Running, E: Ending
ls -l # list files after the job finishes
cat dask-job.o... # inspect the output file
cat dask-job.e... # inspect the error file
```

## Notes

Note: Dask Cluster is set to verbose, add the following to your code while connecting to the Dask cluster:
```python
client = Client(..., silence_logs='error')
```

Note: Replace all filenames within `<>` with the actual values applicable to your project.
first commit 2023-12-07 09:26:25 +00:00			`# Dask: How to execute python workloads using a Dask cluster on Vulcan`

			`This repository looks at a deployment of a Dask cluster on Vulcan, and executing your programs using this cluster.`

			`## Table of Contents`
			`- [Getting Started](#getting-started)`
			`- [Usage](#usage)`
			`- [Notes](#notes)`

			`## Getting Started`

Improved readability 2024-04-23 14:01:32 +00:00			`### 1. Build and transfer the Conda environment to Hawk:`
Changes according to issue #1 2024-04-23 13:55:58 +00:00
			Only the `main` and `r` channels are available using the Conda module on the clusters. To use custom packages, we need to move the local Conda environment to Hawk.
first commit 2023-12-07 09:26:25 +00:00
Removed files to be ignored 2024-04-23 13:59:14 +00:00			Follow the instructions in [the Conda environment builder repository](https://code.hlrs.de/SiVeGCS/conda-env-builder). The YAML file to create a test environment is available in the `deployment_scripts` directory.
first commit 2023-12-07 09:26:25 +00:00
Improved readability 2024-04-23 14:01:32 +00:00			`### 2. Allocate workspace on Hawk:`
first commit 2023-12-07 09:26:25 +00:00
Changes according to issue #1 2024-04-23 13:55:58 +00:00			`Proceed to the next step if you have already configured your workspace. Use the following command to create a workspace on the high-performance filesystem, which will expire in 10 days. For more information, such as how to enable reminder emails, refer to the [workspace mechanism](https://kb.hlrs.de/platforms/index.php/Workspace_mechanism) guide.`

			```bash
			`ws_allocate dask_workspace 10`
			`ws_find dask_workspace # find the path to workspace, which is the destination directory in the next step`
			```
first commit 2023-12-07 09:26:25 +00:00
Improved readability 2024-04-23 14:01:32 +00:00			`### 3. Clone the repository on Hawk to use the deployment scripts and project structure:`
first commit 2023-12-07 09:26:25 +00:00
Changes according to issue #1 2024-04-23 13:55:58 +00:00			```bash
			`cd <workspace_directory>`
			`git clone <repository_url>`
			```
first commit 2023-12-07 09:26:25 +00:00
Improved readability 2024-04-23 14:01:32 +00:00			### 4. Send all the code to the appropriate directory on Vulcan using `scp`:
finalized for documentation upload 2024-01-03 08:23:41 +00:00
Changes according to issue #1 2024-04-23 13:55:58 +00:00			```bash
			`scp <your_script>.py <destination_host>:<destination_directory>`
			```
finalized for documentation upload 2024-01-03 08:23:41 +00:00
Improved readability 2024-04-23 14:01:32 +00:00			`### 5. SSH into Vulcan and start a job interactively using:`
first commit 2023-12-07 09:26:25 +00:00
Changes according to issue #1 2024-04-23 13:55:58 +00:00			```bash
			`qsub -I -N DaskJob -l select=1:node_type=clx-21 -l walltime=02:00:00`
			```
added changes according to #1 2024-04-25 12:18:00 +00:00			Note: For multiple nodes, it is recommended to write a `.pbs` script and submit it using `qsub`. Follow section [Multiple Nodes](#multiple-nodes) for more information.
first commit 2023-12-07 09:26:25 +00:00
Improved readability 2024-04-23 14:01:32 +00:00			`### 6. Go into the directory with all code:`
first commit 2023-12-07 09:26:25 +00:00
Changes according to issue #1 2024-04-23 13:55:58 +00:00			```bash
			`cd <destination_directory>`
			```
first commit 2023-12-07 09:26:25 +00:00
Improved readability 2024-04-23 14:01:32 +00:00			`### 7. Initialize the Dask cluster:`
first commit 2023-12-07 09:26:25 +00:00
Changes according to issue #1 2024-04-23 13:55:58 +00:00			```bash
			`source deploy-dask.sh "$(pwd)"`
			```
first commit 2023-12-07 09:26:25 +00:00			`Note: At the moment, the deployment is verbose, and there is no implementation to silence the logs.`
			Note: Make sure all permissions are set using `chmod +x` for all scripts.

			`## Usage`

added changes according to #1 2024-04-25 12:18:00 +00:00			`### Single Node`
changes made accroding to issue #1 2024-05-22 09:43:24 +00:00			`To run the application interactively on a single node, follow points 4, 5, 6 and, 7 from [Getting Started](#getting-started), and execute the following command after all the job has started:`
first commit 2023-12-07 09:26:25 +00:00
			```bash
changes made accroding to issue #1 2024-05-22 09:43:24 +00:00			`# Load the Conda module`
			`module load bigdata/conda`
			`source activate # activates the base environment`

			`# List available Conda environments for verification purposes`
			`conda env list`

			`# Activate a specific Conda environment.`
			conda activate dask_environment # you need to execute `source activate` first, or use `source [ENV_PATH]/bin/activate`
first commit 2023-12-07 09:26:25 +00:00			```

changes made accroding to issue #1 2024-05-22 09:43:24 +00:00			`After the environment is activated, you can run the python interpretor:`

first commit 2023-12-07 09:26:25 +00:00			```bash
changes made accroding to issue #1 2024-05-22 09:43:24 +00:00			`python`
first commit 2023-12-07 09:26:25 +00:00			```

changes made accroding to issue #1 2024-05-22 09:43:24 +00:00			`Or to run a full script:`
first commit 2023-12-07 09:26:25 +00:00			```bash
changes made accroding to issue #1 2024-05-22 09:43:24 +00:00			`python <your-script>.py`
first commit 2023-12-07 09:26:25 +00:00			```

added changes according to #1 2024-04-25 12:18:00 +00:00			`### Multiple Nodes`
			To run the application on multiple nodes, you need to write a `.pbs` script and submit it using `qsub`. Follow lines 1-4 from the [Getting Started](#getting-started) section. Write a `submit-dask-job.pbs` script:

			```bash
			`#!/bin/bash`
			`#PBS -N dask-job`
changes made accroding to issue #1 2024-05-22 09:43:24 +00:00			`#PBS -l select=3:node_type=rome`
added changes according to #1 2024-04-25 12:18:00 +00:00			`#PBS -l walltime=1:00:00`

			`#Go to the directory where the code is`
			`cd <destination_directory>`

			`#Deploy the Dask cluster`
			`source deploy-dask.sh "$(pwd)"`

			`#Run the python script`
			`python <your-script>.py`
			```

changes made accroding to issue #1 2024-05-22 09:43:24 +00:00			A more thorough example is available in the `deployment_scripts` directory under `submit-dask-job.pbs`.

added changes according to #1 2024-04-25 12:18:00 +00:00			`And then execute the following commands to submit the job:`

			```bash
			`qsub submit-dask-job.pbs`
			`qstat -anw # Q: Queued, R: Running, E: Ending`
			`ls -l # list files after the job finishes`
			`cat dask-job.o... # inspect the output file`
			`cat dask-job.e... # inspect the error file`
			```

first commit 2023-12-07 09:26:25 +00:00			`## Notes`

			`Note: Dask Cluster is set to verbose, add the following to your code while connecting to the Dask cluster:`
			```python
			`client = Client(..., silence_logs='error')`
			```

			Note: Replace all filenames within `<>` with the actual values applicable to your project.