# Dask: How to execute Python workloads using a Dask cluster on Vulcan

Wiki link:

Motivation: This document shows users how to launch a Dask cluster on our compute platforms and run a simple workload on it.

Structure:
- [ ] [Tutorial](https://diataxis.fr/tutorials/)
- [x] [How-to guide](https://diataxis.fr/how-to-guides/)
- [ ] [Reference](https://diataxis.fr/reference/)
- [ ] [Explanation](https://diataxis.fr/explanation/)

To do:
- [x] Made scripts for environment creation and deployment in the folder `local_scripts`
- [x] Changed scripts to `deployment_scripts`

---

This repository shows how to deploy a Dask cluster on Vulcan and how to run your programs on that cluster.

## Table of Contents
- [Prerequisites](#prerequisites)
- [Getting Started](#getting-started)
- [Usage](#usage)
- [Notes](#notes)

## Prerequisites

Before running the application, make sure you have the following prerequisites installed in a conda environment:
- [Python 3.8.18](https://www.python.org/downloads/release/python-3818/): This specific Python version is used throughout. You can select it when creating the conda environment. For more information, see the Conda documentation for [HLRS HPC systems](https://kb.hlrs.de/platforms/index.php/How_to_move_local_conda_environments_to_the_clusters).
- [Conda Installation](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html): Ensure that Conda is installed on your local system. If it is not, follow the [official Conda installation guide](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html).
- [Dask](https://dask.org/): Install Dask using conda.
- [Conda Pack](https://conda.github.io/conda-pack/): Conda Pack packages the Conda environment into a single tarball, which is then used to transfer the environment to Vulcan.

## Getting Started

1. Clone this repository to your local machine:

```bash
git clone <repository_url>
```

2. Create an environment using Conda and `environment.yaml`:

```bash
./deployment_scripts/create-env.sh <your-env>
```

3. Send all files using `deploy-env.sh`:

```bash
./deployment_scripts/deploy-env.sh <your-env> <destination_host>:<destination_directory>
```

4. SSH into Vulcan and start an interactive job:

```bash
qsub -I -N DaskJob -l select=4:node_type=clx-21 -l walltime=02:00:00
```

5. Change into the directory that contains all the code:

```bash
cd <destination_directory>
```

6. Initialize the Dask cluster:

```bash
source deploy-dask.sh "$(pwd)"
```

Note: At the moment, the deployment is verbose, and there is no option to silence its logs.

Note: Make sure all scripts are executable. If they are not, set the permission with `chmod +x`.

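Once the interactive job from step 4 is running, it can help to check which nodes were allocated to it. The following is a minimal sketch, assuming the job runs under PBS and that PBS exposes the allocated hosts through the `PBS_NODEFILE` environment variable. The helper name `unique_hosts` is hypothetical and not part of this repository.

```python
import os


def unique_hosts(nodefile_text):
    """Return the host names in a PBS node file, in first-seen order, without repeats."""
    hosts = []
    for line in nodefile_text.splitlines():
        host = line.strip()
        if host and host not in hosts:
            hosts.append(host)
    return hosts


# Assumption: PBS sets PBS_NODEFILE inside a running job; outside a job nothing is printed.
nodefile = os.environ.get("PBS_NODEFILE")
if nodefile:
    with open(nodefile) as fh:
        for host in unique_hosts(fh.read()):
            print(host)
```

With the `select=4` request from step 4, one would expect up to four distinct host names, but the exact content of the node file depends on the scheduler configuration.
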
## Usage

To run the application interactively, start the Python interpreter once all of the cluster's nodes are up and running:

```bash
python
```

Or, to run a full script:

```bash
python <your-script>.py
```

Note: If your environment is not active in the Python interpreter, activate it manually before starting the interpreter:

```bash
conda activate <your-env>
```

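As an illustration of what a full script could look like, here is a minimal sketch of a simple workload that runs on an existing Dask cluster. It assumes that you know the address of the running scheduler and pass it as the first command-line argument. How `deploy-dask.sh` exposes that address is not covered here, and the names `square`, `run_workload`, and `main` are hypothetical.

```python
import sys


def square(x):
    """Toy task that a Dask worker runs on a single input."""
    return x * x


def run_workload(client, n=10):
    """Square 0..n-1 on the cluster, sum the results on a worker, and return the sum."""
    futures = client.map(square, range(n))  # one task per input value
    total = client.submit(sum, futures)     # Dask resolves the futures before calling sum
    return total.result()                   # block until the result is available


def main(address):
    # Imported here so that the helpers above can be used without Dask installed.
    from dask.distributed import Client

    with Client(address) as client:
        # Wait until at least one worker has joined; give up after 60 seconds.
        client.wait_for_workers(n_workers=1, timeout=60)
        print(run_workload(client))


# Only connect when a scheduler address was given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Run it with `python <your-script>.py <scheduler-address>`, where `<scheduler-address>` stands for the address on which your scheduler listens.
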
## Notes

Note: The Dask cluster is set to verbose. To reduce the log output, add the following to your code when connecting to the Dask cluster:

```python
from dask.distributed import Client
client = Client(..., silence_logs='error')
```

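If log messages still appear in your own script, another option is to raise the threshold of the `distributed` loggers through Python's standard `logging` module. This is a minimal sketch, assuming the messages come from loggers under the `distributed` namespace. The helper name `quiet_dask_logs` is hypothetical.

```python
import logging


def quiet_dask_logs(level=logging.ERROR):
    """Raise the threshold of the `distributed` logger and, by inheritance, its children."""
    logger = logging.getLogger("distributed")
    logger.setLevel(level)
    return logger


# Call this near the top of your script, before connecting to the cluster.
quiet_dask_logs()
```

This only changes logging in the Python process that calls it. Output from the scheduler and workers started by `deploy-dask.sh` is not affected.
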
Note: Replace every placeholder in angle brackets (`<...>`) with the value that applies to your project.