# Dask: How to execute python workloads using a Dask cluster on Vulcan

Wiki link:

Motivation: This document aims to show users how to launch a Dask cluster on our compute platforms and perform a simple workload with it.

Structure:

- [ ] [Tutorial](https://diataxis.fr/tutorials/)
- [x] [How-to guide](https://diataxis.fr/how-to-guides/)
- [ ] [Reference](https://diataxis.fr/reference/)
- [ ] [Explanation](https://diataxis.fr/explanation/)

To do:

- [x] Made scripts for environment creation and deployment in the folder `local_scripts`
- [x] Changed scripts to `deployment_scripts`
- [x] Added step about sending python file

---

This repository covers the deployment of a Dask cluster on Vulcan and the execution of your programs on that cluster.

## Table of Contents

- [Prerequisites](#prerequisites)
- [Getting Started](#getting-started)
- [Usage](#usage)
- [Notes](#notes)

## Prerequisites

Before running the application, make sure you have the following prerequisites installed in a conda environment:

- [Python 3.8.18](https://www.python.org/downloads/release/python-3818/): This specific Python version is required throughout; select it when creating the conda environment. For more information, see the documentation for Conda on [HLRS HPC systems](https://kb.hlrs.de/platforms/index.php/How_to_move_local_conda_environments_to_the_clusters).
- [Conda Installation](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html): Ensure that Conda is installed on your local system. For more information, see the documentation for Conda on [HLRS HPC systems](https://kb.hlrs.de/platforms/index.php/How_to_move_local_conda_environments_to_the_clusters).
- [Dask](https://dask.org/): Install Dask using conda.
- [Conda Pack](https://conda.github.io/conda-pack/): Conda Pack is used to package the conda environment into a single tarball, which is then transferred to Vulcan.

## Getting Started
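The prerequisites above translate into an `environment.yaml` along these lines. This is only a minimal sketch: the environment name is hypothetical, and you should extend the dependency list with whatever packages your workload needs.

```yaml
name: dask-env          # hypothetical name; choose your own
channels:
  - conda-forge
dependencies:
  - python=3.8.18       # the specific version required above
  - dask
  - conda-pack          # used to package the environment for transfer to Vulcan
```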
1. Clone [this repository](https://code.hlrs.de/hpcrsaxe/spark_template) to your local machine:

   ```bash
   git clone <repository_url>
   ```

2. Go into the directory and create an environment using Conda and `environment.yaml`. Note: Be sure to add the necessary packages to `environment.yaml`:

   ```bash
   ./deployment_scripts/create-env.sh
   ```

3. Send all files using `deploy-env.sh`:

   ```bash
   ./deployment_scripts/deploy-env.sh <destination_host>:<destination_path>
   ```

4. Send all the code to the appropriate directory on Vulcan using `scp`:

   ```bash
   scp <your_script>.py <destination_host>:<destination_path>
   ```

5. SSH into Vulcan and start a job interactively using:

   ```bash
   qsub -I -N DaskJob -l select=4:node_type=clx-21 -l walltime=02:00:00
   ```

6. Go into the directory with all the code:

   ```bash
   cd <destination_path>
   ```

7. Initialize the Dask cluster:

   ```bash
   source deploy-dask.sh "$(pwd)"
   ```

   Note: At the moment, the deployment is verbose, and there is no implementation to silence the logs.

   Note: Make sure all scripts are executable (`chmod +x`).

## Usage

To run the application interactively, start a Python interpreter after all the cluster's nodes are up and running:

```bash
python
```

Or run a full script:

```bash
python <your_script>.py
```

Note: If you don't see your environment in the Python interpreter, activate it manually before starting Python:

```bash
conda activate <env_name>
```

## Notes

Note: The Dask cluster deployment is verbose. To suppress the logs, add the following to your code when connecting to the Dask cluster:

```python
client = Client(..., silence_logs='error')
```

Note: Replace all values within `<>` with the actual values applicable to your project.