Template repository for Dask workflows on HLRS HPC

Dask: How to execute Python workloads using a Dask cluster on Vulcan

Wiki link:

Motivation: This document shows users how to launch a Dask cluster on our compute platforms and run a simple workload on it.

Structure:

To do:

  • Made scripts for environment creation and deployment in the folder local_scripts
  • Changed scripts to deployment_scripts

This repository covers deploying a Dask cluster on Vulcan and running your programs on that cluster.


Prerequisites

Before running the application, make sure you have the following prerequisites installed in a conda environment:

  • Python 3.8.18: This specific Python version is used throughout; you can select it when creating the Conda environment. For more information, see the documentation for Conda on HLRS HPC systems.
  • Conda Installation: Ensure that Conda is installed on your local system. If it is not, follow the official Conda installation guide.
  • Dask: Install Dask using conda.
  • Conda Pack: Conda pack is used to package the Conda environment into a single tarball. This is used to transfer the environment to Vulcan.
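The prerequisites above can be checked from inside the activated Conda environment with a few lines of Python. This is a minimal sketch, not part of the repository: the function name check_prerequisites is illustrative, and the import names dask, distributed and conda_pack are assumed to match the packages installed by Conda.

```python
import importlib.util
import sys

# Python version named in the prerequisites (major, minor).
EXPECTED_PYTHON = (3, 8)

# Assumed import names: "dask" and "distributed" for Dask,
# "conda_pack" for the conda-pack package.
REQUIRED_MODULES = ("dask", "distributed", "conda_pack")


def check_prerequisites(modules=REQUIRED_MODULES):
    """Return a dict that records which prerequisites are satisfied."""
    report = {"python_ok": sys.version_info[:2] == EXPECTED_PYTHON}
    for name in modules:
        # find_spec returns None when a top-level module cannot be found.
        report[name] = importlib.util.find_spec(name) is not None
    return report


for item, ok in check_prerequisites().items():
    print(f"{item}: {'ok' if ok else 'MISSING'}")
```

Any entry reported as MISSING points to a package that still has to be installed in the environment before it is packed and transferred.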

Getting Started

  1. Clone this repository to your local machine:

    git clone <repository_url>
    
  2. Create an environment using Conda and environment.yaml:

    ./deployment_scripts/create-env.sh <your-env>
    
  3. Send all files using deploy-env.sh:

    ./deployment_scripts/deploy-env.sh <your-env> <destination_host>:<destination_directory>
    
  4. SSH into Vulcan and start a job interactively using:

    qsub -I -N DaskJob -l select=4:node_type=clx-21 -l walltime=02:00:00
    
  5. Go into the directory with all the code:

    cd <destination_directory>
    
  6. Initialize the Dask cluster:

    source deploy-dask.sh "$(pwd)"
    

    Note: At the moment, the deployment is verbose, and there is no implementation to silence the logs.

    Note: Make sure all scripts are executable; set the permission with chmod +x.
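Once the interactive job from step 4 is running, it can be useful to confirm which nodes were allocated. The sketch below assumes a PBS-style scheduler (qsub is used above) that exports the PBS_NODEFILE environment variable inside a job. The helper names unique_hosts and allocated_nodes are illustrative and not part of this repository.

```python
import os


def unique_hosts(lines):
    """Return hostnames from node-file lines, without repeats, in order."""
    hosts = [line.strip() for line in lines if line.strip()]
    # A node file may list a host once per slot; dict keys keep first-seen order.
    return list(dict.fromkeys(hosts))


def allocated_nodes(nodefile=None):
    """Read the node file of the current job and return its hostnames."""
    # Assumption: PBS_NODEFILE is set inside the job and unset on login nodes.
    path = nodefile or os.environ.get("PBS_NODEFILE")
    if path is None:
        raise RuntimeError("PBS_NODEFILE is not set; is this shell inside a job?")
    with open(path) as handle:
        return unique_hosts(handle)
```

Calling allocated_nodes() in the Python interpreter on a compute node should list one hostname per node requested with select=4, provided the assumption about PBS_NODEFILE holds on Vulcan.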

Usage

To run the application interactively, execute the following command after all the cluster's nodes are up and running:

python

Or to run a full script:

python <your-script>.py
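A full script of this kind might look like the following sketch. It is not taken from the repository: the functions square and sum_of_squares are illustrative, and the scheduler address is left as a parameter because the address reported by deploy-dask.sh is not documented here (8786 is Dask's default scheduler port, but the deployment may use another one).

```python
def square(x):
    """Toy task that stands in for real work."""
    return x * x


def sum_of_squares(scheduler_address, n_items=100, n_workers=1):
    """Sum the squares of range(n_items) on an already running Dask cluster."""
    # Imported here so the helper above stays usable without dask installed.
    from dask.distributed import Client

    # scheduler_address is a placeholder such as "tcp://<scheduler-host>:8786".
    with Client(scheduler_address) as client:
        # Block until at least n_workers workers have registered.
        client.wait_for_workers(n_workers)
        # Submit one task per input value, then collect the results.
        futures = client.map(square, range(n_items))
        return sum(client.gather(futures))
```

Waiting for workers before submitting tasks mirrors the advice to start only after all the cluster's nodes are up; n_workers would be set to the number of workers the deployment is expected to start.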

Note: If your environment is not active in the Python interpreter, activate it manually using:

conda activate <your-env>

Do this before starting the Python interpreter.

Notes

Note: The Dask cluster logs verbosely. To reduce the output, add the following to your code when connecting to the Dask cluster:

from dask.distributed import Client
client = Client(..., silence_logs='error')
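As a complement, log output in the client process can also be reduced with Python's standard logging module. This is a sketch under the assumption that the messages come from the logger named distributed. It only affects the Python process that runs it, not the scheduler and workers started by the deployment script. The helper name quiet_dask_logs is illustrative.

```python
import logging


def quiet_dask_logs(level=logging.ERROR):
    """Raise the threshold of the 'distributed' logger in this process."""
    logger = logging.getLogger("distributed")
    logger.setLevel(level)
    return logger


# Call this before creating the Client so early messages are filtered too.
quiet_dask_logs()
```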

Note: Replace all placeholders within <> with the actual values applicable to your project.