117 lines
No EOL
3.6 KiB
Markdown
117 lines
No EOL
3.6 KiB
Markdown
# Dask: How to execute python workloads using a Dask cluster on Vulcan
|
|
|
|
This repository looks at a deployment of a Dask cluster on Vulcan, and executing your programs using this cluster.
|
|
|
|
## Table of Contents
|
|
- [Getting Started](#getting-started)
|
|
- [Usage](#usage)
|
|
- [Notes](#notes)
|
|
|
|
## Getting Started
|
|
|
|
### 1. Build and transfer the Conda environment to Hawk:
|
|
|
|
Only the `main` and `r` channels are available using the Conda module on the clusters. To use custom packages, we need to move the local Conda environment to Hawk.
|
|
|
|
Follow the instructions in [the Conda environment builder repository](https://code.hlrs.de/SiVeGCS/conda-env-builder). The YAML file to create a test environment is available in the `deployment_scripts` directory.
|
|
|
|
### 2. Allocate workspace on Hawk:
|
|
|
|
Proceed to the next step if you have already configured your workspace. Use the following command to create a workspace on the high-performance filesystem, which will expire in 10 days. For more information, such as how to enable reminder emails, refer to the [workspace mechanism](https://kb.hlrs.de/platforms/index.php/Workspace_mechanism) guide.
|
|
|
|
```bash
|
|
ws_allocate dask_workspace 10
|
|
ws_find dask_workspace # find the path to workspace, which is the destination directory in the next step
|
|
```
|
|
|
|
### 3. Clone the repository on Hawk to use the deployment scripts and project structure:
|
|
|
|
```bash
|
|
cd <workspace_directory>
|
|
git clone <repository_url>
|
|
```
|
|
|
|
### 4. Send all the code to the appropriate directory on Vulcan using `scp`:
|
|
|
|
```bash
|
|
scp <your_script>.py <destination_host>:<destination_directory>
|
|
```
|
|
|
|
### 5. SSH into Vulcan and start a job interactively using:
|
|
|
|
```bash
|
|
qsub -I -N DaskJob -l select=1:node_type=clx-21 -l walltime=02:00:00
|
|
```
|
|
Note: For multiple nodes, it is recommended to write a `.pbs` script and submit it using `qsub`. Follow section [Multiple Nodes](#multiple-nodes) for more information.
|
|
|
|
### 6. Go into the directory with all code:
|
|
|
|
```bash
|
|
cd <destination_directory>
|
|
```
|
|
|
|
### 7. Initialize the Dask cluster:
|
|
|
|
```bash
|
|
source deploy-dask.sh "$(pwd)"
|
|
```
|
|
Note: At the moment, the deployment is verbose, and there is no implementation to silence the logs.
|
|
Note: Make sure all permissions are set using `chmod +x` for all scripts.
|
|
|
|
## Usage
|
|
|
|
### Single Node
|
|
To run the application interactively on a single node, execute the following command after all the cluster's nodes are up and running:
|
|
|
|
```bash
|
|
python
|
|
```
|
|
|
|
Or to run a full script:
|
|
```bash
|
|
python <your-script>.py
|
|
```
|
|
|
|
Note: If you don't see your environment in the python interpretor, then manually activate it using:
|
|
```bash
|
|
conda activate <your-env>
|
|
```
|
|
Do this before using the python interpretor.
|
|
|
|
### Multiple Nodes
|
|
To run the application on multiple nodes, you need to write a `.pbs` script and submit it using `qsub`. Follow lines 1-4 from the [Getting Started](#getting-started) section. Write a `submit-dask-job.pbs` script:
|
|
|
|
```bash
|
|
#!/bin/bash
|
|
#PBS -N dask-job
|
|
#PBS -l select=3:node_type=rome-ai
|
|
#PBS -l walltime=1:00:00
|
|
|
|
#Go to the directory where the code is
|
|
cd <destination_directory>
|
|
|
|
#Deploy the Dask cluster
|
|
source deploy-dask.sh "$(pwd)"
|
|
|
|
#Run the python script
|
|
python <your-script>.py
|
|
```
|
|
|
|
And then execute the following commands to submit the job:
|
|
|
|
```bash
|
|
qsub submit-dask-job.pbs
|
|
qstat -anw # Q: Queued, R: Running, E: Ending
|
|
ls -l # list files after the job finishes
|
|
cat dask-job.o... # inspect the output file
|
|
cat dask-job.e... # inspect the error file
|
|
```
|
|
|
|
## Notes
|
|
|
|
Note: Dask Cluster is set to verbose, add the following to your code while connecting to the Dask cluster:
|
|
```python
|
|
client = Client(..., silence_logs='error')
|
|
```
|
|
|
|
Note: Replace all filenames within `<>` with the actual values applicable to your project. |