Reference Guide: Dask Cluster Deployment Scripts

Overview

This repository contains a set of bash scripts that streamline deploying and managing a Dask cluster in a high-performance computing (HPC) environment. The scripts create Conda environments, deploy an environment to a remote server, and start Dask clusters on distributed systems. The sections below describe the purpose and usage of each script.

Note: Permissions

Grant execute permission to each script before running it:

chmod +x script_name.sh

Prerequisites

Before using these scripts, ensure that the following prerequisites are met:

  1. Conda Installation: Ensure that Conda is installed on your local system. If it is not, follow the official Conda installation guide.

  2. PBS Job Scheduler: The deployment scripts (deploy-dask.sh and dask-worker.sh) are designed for use with the PBS job scheduler. Modify them accordingly if you use a different job scheduler.

  3. SSH Setup: Ensure that SSH is set up and configured on your system for remote server communication.
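The prerequisites above can be checked quickly before running anything. The following is a minimal sketch, not part of the repository: the helper name `missing_commands` is hypothetical, and `qsub` is assumed to be the PBS submission command available at your site. Conda and SSH are needed on the local system, while `qsub` is normally only present on the cluster, so a "missing" report for `qsub` on a laptop is expected.

```shell
#!/bin/sh
# Hypothetical helper (not from the repository): print the name of every
# command in the argument list that cannot be found on PATH.
missing_commands() {
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
        fi
    done
}

# conda and ssh come from the prerequisites list; qsub is an assumption
# about how PBS jobs are submitted on the target system.
missing=$(missing_commands conda ssh qsub)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing prerequisites:" $missing
else
    echo "All prerequisites found."
fi
```

Run the same check on the HPC login node to confirm that the PBS tools are available there.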

1. deploy-dask.sh

Overview

deploy-dask.sh initiates the Dask cluster on an HPC environment using the PBS job scheduler. It extracts the Conda environment, activates it, and starts the Dask scheduler and workers on allocated nodes.

Usage

./deploy-dask.sh <current_workspace_directory>

Notes

  • This script is designed for an HPC environment with PBS job scheduling.
  • Modifications may be necessary for different job schedulers.
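The steps described in the overview (extract the environment, activate it, start the scheduler, start workers on the allocated nodes) could look roughly like the sketch below. This is not the repository's deploy-dask.sh. Everything specific in it is an assumption made for illustration: the archive name `dask_env.tar.gz` (assumed to be a conda-pack style tarball), the `env` directory, the `scheduler.json` file, the use of `ssh` to reach each node listed in `$PBS_NODEFILE`, the workspace argument passed to dask-worker.sh, and the `DRY_RUN` switch. The `dask scheduler` command is the current Dask CLI; older releases expose it as `dask-scheduler`.

```shell
#!/bin/sh
# Hypothetical sketch of the steps deploy-dask.sh is described as performing.

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

deploy_dask() {
    workspace="$1"
    env_dir="$workspace/env"                      # assumed location
    scheduler_file="$workspace/scheduler.json"    # assumed file name

    # 1. Extract the packed Conda environment (assumed archive name).
    run mkdir -p "$env_dir"
    run tar -xzf "$workspace/dask_env.tar.gz" -C "$env_dir"

    # 2. Activate it in the current shell.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ . $env_dir/bin/activate"
    else
        . "$env_dir/bin/activate"
    fi

    # 3. Start the scheduler in the background; it writes its address
    #    to the scheduler file so workers can find it.
    run dask scheduler --scheduler-file "$scheduler_file" &

    # 4. Start the worker script once on each unique allocated node.
    #    PBS lists the allocated hosts in the file named by $PBS_NODEFILE.
    for node in $(sort -u "$PBS_NODEFILE"); do
        run ssh "$node" "$workspace/dask-worker.sh" "$workspace" &
    done

    # 5. Block until the background processes exit so the job stays alive.
    wait
}
```

Calling `DRY_RUN=1 deploy_dask /path/to/workspace` inside a PBS job prints the commands without executing them, which is a cheap way to check paths before a real run. Whether compute nodes accept SSH connections from each other is site-specific; some PBS installations expect `pbsdsh` or a similar launcher instead.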

2. dask-worker.sh

Overview

dask-worker.sh is a worker script designed to be executed on each allocated node. It sets up the Dask environment, extracts the Conda environment, activates it, and starts the Dask worker to connect to the scheduler. This script is not directly executed by the user.

Notes

  • This script runs on each allocated node to connect that node to the Dask scheduler; it is not meant to be run by hand.
  • Designed for use with PBS job scheduling.
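As a rough illustration of the worker-side steps described above, the sketch below extracts and activates the environment on the node and then starts a worker that locates the scheduler through a shared file. It is not the repository's dask-worker.sh. The function name `start_worker`, the archive name `dask_env.tar.gz`, the node-local default directory, the `scheduler.json` file, and the `DRY_RUN` switch are all assumptions. The `dask worker` command is the current Dask CLI; older releases expose it as `dask-worker`.

```shell
#!/bin/sh
# Hypothetical sketch of the steps dask-worker.sh is described as performing.

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

start_worker() {
    workspace="$1"
    # Assumed default: unpack into node-local scratch space.
    env_dir="${2:-${TMPDIR:-/tmp}/dask_env}"

    # 1. Extract the packed Conda environment on this node (assumed archive name).
    run mkdir -p "$env_dir"
    run tar -xzf "$workspace/dask_env.tar.gz" -C "$env_dir"

    # 2. Activate it in the current shell.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ . $env_dir/bin/activate"
    else
        . "$env_dir/bin/activate"
    fi

    # 3. Start the worker; it reads the scheduler address from the
    #    scheduler file (assumed to live in the shared workspace).
    run dask worker --scheduler-file "$workspace/scheduler.json"
}
```

This approach relies on the workspace being on a filesystem that every allocated node can read, so that the archive and the scheduler file are visible from the worker.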