# Reference Guide: Dask Cluster Deployment Scripts

Wiki link:

Motivation: This document shows users how to use additional Dask deployment scripts to streamline the deployment and management of a Dask cluster in a high-performance computing (HPC) environment.

Structure:
- [ ] [Tutorial](https://diataxis.fr/tutorials/)
- [ ] [How-to guide](https://diataxis.fr/how-to-guides/)
- [x] [Reference](https://diataxis.fr/reference/)
- [ ] [Explanation](https://diataxis.fr/explanation/)

To do:

---

## Overview

This repository contains a set of bash scripts that streamline the deployment and management of a Dask cluster in a high-performance computing (HPC) environment. The scripts create a Conda environment, deploy it to a remote server, and start a Dask cluster on distributed systems. The sections below describe each script and how to use it.

### Note: Permissions

Grant execute permission to each script before running it:

```bash
chmod +x script_name.sh
```

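
To check several scripts at once, a small helper along the following lines could be used. This is only a sketch and not part of the repository; the function name `not_executable` is hypothetical.

```bash
# Hypothetical helper: print every given file that is missing or lacks
# execute permission for the current user.
not_executable() {
  for f in "$@"; do
    [ -x "$f" ] || printf '%s\n' "$f"
  done
}

# Example: list the deployment scripts that still need `chmod +x`.
# not_executable create-env.sh deploy-env.sh deploy-dask.sh dask-worker.sh
```

An empty result means every listed script is ready to run.
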
## Prerequisites

Before using these scripts, ensure that the following prerequisites are met:

1. **Conda Installation**: Conda is installed on your local system. If it is not, follow the [official Conda installation guide](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html).

2. **PBS Job Scheduler**: The deployment scripts (`deploy-dask.sh` and `dask-worker.sh`) are designed for the PBS job scheduler. Modify them accordingly if you use a different job scheduler.

3. **SSH Setup**: SSH is set up and configured on your system for communication with the remote server.

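
The prerequisites above can be checked quickly from a shell. The sketch below assumes the relevant commands are named `conda`, `ssh`, and `qsub` (the usual PBS submission command); the helper name `missing_tools` is hypothetical and not part of the repository.

```bash
# Hypothetical helper: print each named command that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# On the local system (assumed command names):
# missing_tools conda ssh
# On the HPC system (assumed command name for PBS):
# missing_tools qsub
```

If the helper prints nothing, all of the named commands were found.
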
## 1. create-env.sh

### Overview

`create-env.sh` creates a Conda environment. It checks whether the specified environment exists and either creates it or notifies the user that it already exists.

Note: Define your Conda environment in `environment.yaml` before running this script.

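
The check-then-create behaviour described above might look roughly like the following. This is a sketch under assumptions, not the script's actual contents: the helper `env_exists` is hypothetical, and it assumes the environment name appears in the first column of `conda env list` output.

```bash
# Hypothetical helper: succeed if NAME appears as an environment name in
# `conda env list`-style output passed as the second argument.
env_exists() {
  name="$1"
  listing="$2"
  printf '%s\n' "$listing" |
    awk -v n="$name" '!/^#/ && $1 == n { found = 1 } END { exit !found }'
}

# Possible use (requires conda; shown as comments, not run here):
# if env_exists "$1" "$(conda env list)"; then
#   echo "Environment '$1' already exists."
# else
#   conda env create -n "$1" -f environment.yaml
# fi
```

Passing the listing as an argument keeps the check separate from the `conda` call, so it can be tried out without Conda installed.
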
### Usage

```bash
./create-env.sh <conda_environment_name>
```

### Note

- This script is intended to run on a local system where Conda is installed.

## 2. deploy-env.sh

### Overview

`deploy-env.sh` deploys the Conda environment to a remote server. If a tar.gz archive of the environment already exists, it is copied as is; otherwise, the archive is created first and then transferred.

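
The reuse-or-create logic described above can be illustrated with a dry run that only prints commands. The sketch assumes the archive is built with `conda pack` (from the separate conda-pack package) and copied with `scp`; the real script may use different tools, and the function name `plan_deploy` and the archive name `<environment_name>.tar.gz` are hypothetical.

```bash
# Hypothetical dry run: print the commands a deployment might execute.
# Assumes conda-pack for archiving and scp for transfer.
plan_deploy() {
  env_name="$1"
  destination="$2"              # e.g. user@host:/path (hypothetical form)
  archive="${env_name}.tar.gz"  # assumed archive name
  if [ ! -f "$archive" ]; then
    # The archive is missing, so it would be created first.
    echo "conda pack -n $env_name -o $archive"
  fi
  echo "scp $archive $destination"
}

# Example:
# plan_deploy dask-env user@host:/scratch/dask
```

When the archive is already present in the current directory, only the copy step is printed.
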
### Usage

```bash
./deploy-env.sh <environment_name> <destination_directory>
```

### Note

- This script is intended to run on a local system.

## 3. deploy-dask.sh

### Overview

`deploy-dask.sh` starts the Dask cluster in an HPC environment using the PBS job scheduler. It extracts the Conda environment, activates it, and starts the Dask scheduler and workers on the allocated nodes.

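
One way to decide where the scheduler and workers run is to read the node file that PBS provides to a job through `$PBS_NODEFILE`. The sketch below is an illustration under assumptions rather than the script's actual logic: it treats the first allocated host as the scheduler host and the remaining hosts as workers, and the function names are hypothetical. The commented launch lines use the `dask scheduler` and `dask worker` commands of recent Dask releases (older releases ship `dask-scheduler` and `dask-worker`), and starting workers over `ssh` is likewise an assumption.

```bash
# Print the distinct hosts in a PBS node file, preserving first-seen order.
# (A node file may list the same host once per allocated slot.)
unique_nodes() {
  awk '!seen[$0]++' "$1"
}

# Assumption: the first allocated host runs the scheduler.
scheduler_address() {
  nodefile="$1"
  port="${2:-8786}"   # 8786 is Dask's default scheduler port
  printf 'tcp://%s:%s\n' "$(unique_nodes "$nodefile" | head -n 1)" "$port"
}

# Assumption: every distinct host after the first runs a worker.
worker_nodes() {
  unique_nodes "$1" | tail -n +2
}

# Inside a PBS job this might be used as (not run here):
# dask scheduler --port 8786 &
# for host in $(worker_nodes "$PBS_NODEFILE"); do
#   ssh "$host" "dask worker $(scheduler_address "$PBS_NODEFILE")" &
# done
```
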
### Usage

```bash
./deploy-dask.sh <current_workspace_directory>
```

### Notes

- This script is designed for an HPC environment with PBS job scheduling.
- Modifications may be necessary for different job schedulers.

## 4. dask-worker.sh

### Overview

`dask-worker.sh` is a worker script that runs on each allocated node. It sets up the Dask environment, extracts the Conda environment, activates it, and starts a Dask worker that connects to the scheduler. Users do not execute this script directly.

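
The extraction step on a worker node could be sketched as follows. This assumes the environment was archived with conda-pack, whose archives contain a `bin/activate` script; the helper name `extract_env` and the variables `WORKSPACE` and `SCHEDULER_ADDRESS` are hypothetical.

```bash
# Hypothetical helper: unpack a packed environment into TARGET unless
# TARGET/bin already exists from an earlier run.
extract_env() {
  archive="$1"
  target="$2"
  [ -d "$target/bin" ] && return 0
  mkdir -p "$target" && tar -xzf "$archive" -C "$target"
}

# Possible use on a worker node (assumes a conda-pack archive; not run here):
# extract_env "$WORKSPACE/dask-env.tar.gz" "$WORKSPACE/dask-env"
# . "$WORKSPACE/dask-env/bin/activate"
# dask worker "$SCHEDULER_ADDRESS"
```

Skipping extraction when the target already exists avoids unpacking the archive again on repeated runs in the same workspace.
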
### Notes

- The script must run on each allocated node so that the node's worker connects to the Dask scheduler.
- Designed for use with PBS job scheduling.

## Workflow

1. **Create Conda Environment**: Execute `create-env.sh` to create a Conda environment locally.
2. **Deploy Conda Environment**: Execute `deploy-env.sh` to deploy the Conda environment to a remote server.
3. **Deploy Dask Cluster**: Execute `deploy-dask.sh` to start the Dask cluster on an HPC environment.

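
The three steps above can be strung together as in the sketch below, which by default only prints the commands. The environment name, directories, and the `run` and `workflow` helpers are hypothetical placeholders; the script invocations follow the usage lines documented earlier.

```bash
ENV_NAME="dask-env"          # hypothetical environment name
REMOTE_DIR="/scratch/dask"   # hypothetical destination directory
WORKSPACE="/scratch/dask"    # hypothetical workspace on the cluster

# Print each command instead of running it unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

workflow() {
  run ./create-env.sh "$ENV_NAME"                # step 1, local system
  run ./deploy-env.sh "$ENV_NAME" "$REMOTE_DIR"  # step 2, local system
  run ./deploy-dask.sh "$WORKSPACE"              # step 3, HPC system
}

workflow
```

Steps 1 and 2 are meant for the local system and step 3 for the HPC system, so in practice the third command would be issued on the cluster rather than from the same shell.
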