A cross-architecture Docker image with Miniconda for Python development

Miniconda Env Builder Docker Image for amd64 Architecture

Project Objective

The purpose of this repository is to streamline the creation and management of Conda environments compatible with the target HLRS deployment environment. It uses a Docker image based on Rocky Linux 8.8 with Miniconda, built for the amd64 architecture. This setup lets you create functional and consistent amd64 Conda environments on development machines with a different architecture, such as Apple Silicon Macs.

Note: If you only need the main and r channels when building the Conda environment, they are already accessible through the Conda module on the clusters. In that case there is no need to use this repo, which is intended for building and transferring environments with custom packages.

Prerequisites

Before you begin, ensure you have Docker installed on your machine. If you are using an Apple Silicon Mac, Docker Desktop should be configured to support multi-architecture builds, which is included out of the box with recent versions.

Building the Docker Image

To build the Docker image, follow these steps:

  1. Clone the Repository

First, clone this repository to your local machine using Git:

git clone <repository-url>
cd <repository-directory>

Replace <repository-url> with the URL of the Git repository and <repository-directory> with the name of the directory into which the repository is cloned.

  2. Build the Docker Image

Run the following command in the terminal from the root of the cloned repository:

docker build -f miniconda-rockylinux.dockerfile --platform=linux/amd64 -t miniconda-rockylinux:latest .

This command builds a Docker image named miniconda-rockylinux with the latest tag, specifying the target platform as linux/amd64. Ensure Docker's buildx feature is enabled for cross-platform builds.
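To confirm that the build really produced an amd64 image (for example after building on an Apple Silicon Mac), the image metadata can be inspected. The following is a minimal sketch; `check_image_platform` is a hypothetical helper and not part of this repository.

```shell
# Hypothetical helper (not part of this repository): compare the platform
# recorded in a local image with the expected one.
check_image_platform() {
    local image="$1"
    local expected="${2:-linux/amd64}"
    local actual
    # `docker image inspect` reads the OS and architecture from the image metadata.
    actual="$(docker image inspect --format '{{.Os}}/{{.Architecture}}' "$image")" || return 2
    if [ "$actual" = "$expected" ]; then
        echo "OK: $image is $actual"
    else
        echo "MISMATCH: $image is $actual, expected $expected" >&2
        return 1
    fi
}

# Example (requires the image built above):
# check_image_platform miniconda-rockylinux:latest
```

The function returns a non-zero status on a mismatch, so it can be used as a guard in a larger build script.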

Building and Packing a Conda Environment

This Docker image includes a utility script, build_and_pack_env.sh, that automates the process of creating a Conda environment from a YAML file, packing it using conda-pack, and preparing it for transfer to a cluster.

Using the Script

  1. Prepare Your Environment YAML File: Ensure you have a YAML file describing your Conda environment. This file should list all the packages and versions you want to include.

  2. Run the Docker Container with Volume Mount: Run the Docker container, mounting the directory that contains your environment YAML file. Replace <path-to-your-yaml-file> with the path to that directory (not to the file itself):

docker run --rm -it -v <path-to-your-yaml-file>:/envs --workdir /envs miniconda-rockylinux:latest

  3. Execute the Script Inside the Container: Once inside the container, run the build_and_pack_env.sh script with your YAML file as an argument. Replace your_environment.yaml with the name of your environment file:

build_and_pack_env.sh your_environment.yaml

The script will create the environment, pack it, and output a .tar.gz file that you can transfer to your cluster.
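As an illustration of step 1, the snippet below writes a minimal environment file into an envs directory. The environment name, the conda-forge channel, and the package list are placeholders chosen for this example; they are assumptions, not requirements of build_and_pack_env.sh.

```shell
# Write a minimal example environment file into ./envs.
# All names and versions below are illustrative assumptions.
mkdir -p envs
cat > envs/your_environment.yaml <<'EOF'
name: your_environment
channels:
  - conda-forge
dependencies:
  - python=3.11
  - numpy
EOF
```

Pinning at least the Python version keeps rebuilds of the environment more reproducible.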

Notes

  • The packed environment file will be saved in the same directory as the original YAML file.
  • Ensure the volume mount (-v) option correctly maps the local directory containing your YAML file to the /envs directory inside the container.
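Because the mount must point at the directory rather than the YAML file, and bind mounts are most reliably given as absolute host paths, it can help to derive the directory from the file path. The sketch below only prints the resulting command so it can be reviewed before running; `print_docker_run` is a hypothetical helper name.

```shell
# Hypothetical helper: print the `docker run` command for a given YAML file,
# mounting the file's parent directory (as an absolute path) at /envs.
print_docker_run() {
    local yaml_path="$1"
    local host_dir
    # Resolve the absolute path of the directory that contains the YAML file.
    host_dir="$(cd "$(dirname "$yaml_path")" && pwd)" || return 1
    printf 'docker run --rm -it -v "%s":/envs --workdir /envs miniconda-rockylinux:latest\n' "$host_dir"
}

# Example:
# print_docker_run envs/your_environment.yaml
```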

Unpacking and Activating the Environment

  1. Transfer the packed environment: Use scp to transfer the packed environment file and deploy-env.sh to the remote server:

scp deploy-env.sh user@remote-server:~/deploy-env.sh
scp envs/your_environment.tar.gz user@remote-server:~/your_environment.tar.gz

  2. SSH into the remote server:

ssh user@remote-server

  3. Deploy the environment:

chmod +x deploy-env.sh
./deploy-env.sh your_environment.tar.gz
rm your_environment.tar.gz

  4. Test the environment using a Hawk compute node:

# Request an interactive job allocation for 5 minutes on a CPU node
qsub -I -l select=1:node_type=rome -l walltime=00:05:00

# Load the Conda module
module load bigdata/conda
source activate # activates the base environment

# List available Conda environments for verification purposes
conda env list

# Activate a specific Conda environment.
conda activate your_environment # you need to execute `source activate` first, or use `source [ENV_PATH]/bin/activate`

# Ensure environment variables are set correctly for the custom Conda environment
# This is critical for applications that depend on specific paths within the Conda environment

# Add any additional commands you want to run within the Conda environment below this line.
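
The environment-variable check mentioned in the comments above can be sketched as follows. It relies on CONDA_PREFIX, which `conda activate` sets to the active environment's path; the function name `check_active_env` is a hypothetical helper, and which variables your application needs beyond this is an assumption you have to verify yourself.

```shell
# Hypothetical check: confirm that the python found first on PATH belongs to
# the active environment. CONDA_PREFIX is set by `conda activate`.
check_active_env() {
    local python_path
    if [ -z "${CONDA_PREFIX:-}" ]; then
        echo "No Conda environment is active (CONDA_PREFIX is unset)" >&2
        return 1
    fi
    python_path="$(command -v python)" || { echo "python not found on PATH" >&2; return 1; }
    case "$python_path" in
        "$CONDA_PREFIX"/bin/*)
            echo "OK: python resolves to $python_path" ;;
        *)
            echo "WARNING: python resolves to $python_path, outside $CONDA_PREFIX" >&2
            return 1 ;;
    esac
}

# Example (after activating the environment):
# check_active_env
```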

Important Notice Regarding Filesystem Space

Conda environments, especially those with many dependencies or large packages, can consume a significant amount of disk space. Given that the file system space on HOME is limited by a quota (50 GB per user, 200 GB per group), it's important to manage the space used by Conda environments carefully.

Recommendations:

  • Monitor Disk Usage: Regularly check the disk space used by your Conda environments. You can use commands like du -sh ~/.conda/envs/* to see the space used by each environment.
  • Clean Up Regularly: Remove unused environments with conda env remove --name <env-name>. Also, consider using conda clean --all to remove unused packages and caches.
  • Environment Management: Be strategic about the packages included in your environments to minimize space usage. Only include necessary packages and consider lighter alternatives for heavy dependencies.
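
The monitoring recommendation can be sketched as a small helper that lists the size of each environment and compares the total with a threshold. This is a minimal sketch: `report_env_usage` is a hypothetical name, the default directory assumes environments live under ~/.conda/envs, and the default threshold simply reuses the 50 GB per-user quota as an upper bound, although HOME also holds other data.

```shell
# Hypothetical helper: report the size of each environment under a directory
# and warn when the total exceeds a threshold (in GB).
report_env_usage() {
    local env_root="${1:-$HOME/.conda/envs}"
    local limit_gb="${2:-50}"
    local total_kb
    # Per-environment sizes in kilobytes, largest first.
    du -sk "$env_root"/* 2>/dev/null | sort -rn
    # Total size of the whole directory in kilobytes.
    total_kb="$(du -sk "$env_root" | cut -f1)" || return 2
    echo "Total: $((total_kb / 1024)) MB (threshold: ${limit_gb} GB)"
    if [ "$total_kb" -gt $((limit_gb * 1024 * 1024)) ]; then
        echo "WARNING: environments exceed ${limit_gb} GB" >&2
        return 1
    fi
}

# Example:
# report_env_usage ~/.conda/envs 50
```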