Nvidia Enroot
Enroot is a container engine developed by NVIDIA that is optimized for running unprivileged containers on NVIDIA GPU hardware in HPC environments (such as terrabyte’s GPU nodes). One of Enroot's key features is its built-in GPU support using libnvidia-container, which allows automatic configuration of containers to leverage the underlying NVIDIA GPU hardware. Additionally, Enroot works with a SLURM plugin called Pyxis, enabling the use of srun commands with Enroot containers and supporting multi-node MPI jobs. This makes Enroot a versatile container runtime for both GPU and non-GPU jobs.
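If the Pyxis plugin is enabled on the cluster, an imported container image (see the import commands below) can be passed directly to srun. The following is only an illustrative sketch; partition, image file and command have to be adapted to your needs:
# Run a command inside a container image via the Pyxis SLURM plugin (illustrative sketch, assumes Pyxis is active)
srun --clusters=hpda2 --partition=hpda2_compute_gpu --gres=gpu:1 --container-image=./nvidia+tensorflow+24.05-tf2-py3.sqsh --container-mounts=/dss:/dss nvidia-smi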
Enroot container images can be built from existing Dockerfiles or Docker images with a set of simple commands in a terminal window. Additionally, images can be obtained from the NVIDIA NGC Cloud (which are in fact also Docker container images).
If you have never worked with containers and/or Docker before, please check out the fabulous and very easy to follow Docker tutorials of our colleagues from DFD-LAX, which provide you with all the necessary knowledge in just a few minutes (access only for DLR-EOC). For a more detailed but also easy to follow documentation, see the official Docker docs.
Load Enroot on terrabyte
Unlike Charliecloud, Enroot is not available as a software module, but is preinstalled as part of the default software environment on all CPU and GPU nodes. Hence, Enroot is NOT available on the login node, which makes it necessary to launch an interactive job on the target nodes (i.e. either the CPU or the GPU nodes) to make direct use of the tool.
# Run interactive job (adapt to your needs!)
salloc --cluster=hpda2 --partition=hpda2_compute_gpu --nodes=1 --ntasks-per-node=1 --gres=gpu:1 --time=02:00:00 srun --pty bash -i
# Check if Enroot is available and get an overview of available commands
enroot
Usage: enroot COMMAND [ARG...]
Command line utility for manipulating container sandboxes.
Commands:
batch [options] [--] CONFIG [COMMAND] [ARG...]
bundle [options] [--] IMAGE
create [options] [--] IMAGE
exec [options] [--] PID COMMAND [ARG...]
export [options] [--] NAME
import [options] [--] URI
list [options]
remove [options] [--] NAME...
start [options] [--] NAME|IMAGE [COMMAND] [ARG...]
version
Generate an Enroot image from Docker registry URL
You can easily create an Enroot image from an existing image on Docker Hub. The newly created Enroot image has the same name as the imported image but with the .sqsh extension. This image can then be used to create Enroot containers. Container images are imported into the current working directory.
# Import Docker image into an Enroot image
enroot import docker://hello-world:latest
Generate an Enroot image from NVIDIA NGC Cloud registry URL or any other public docker repository
The catalogue of available Nvidia NGC container images can be consulted here: https://ngc.nvidia.com/catalog/containers. To import these container images, you need an API key, which is associated with your Nvidia NGC account. You can generate your API key here: https://ngc.nvidia.com/setup/api-key. Before generating the API key, you will be redirected to the Nvidia login prompt. If you don't have an Nvidia account yet, you can create a new account from this prompt by clicking on More Options -> Login Help -> Create Account.
To configure Enroot to use your API key, create the file enroot/.credentials within your $HOME and append the following lines to it:
machine nvcr.io login $oauthtoken password API_KEY
machine authn.nvidia.com login $oauthtoken password API_KEY
where API_KEY is the key generated as described above.
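A minimal sketch of creating this file from the command line (YOUR_API_KEY is a placeholder for the key generated above; the quoted EOF prevents the shell from expanding $oauthtoken):
# Create the credentials file in $HOME/enroot (path as described above)
mkdir -p $HOME/enroot
cat >> $HOME/enroot/.credentials << "EOF"
machine nvcr.io login $oauthtoken password YOUR_API_KEY
machine authn.nvidia.com login $oauthtoken password YOUR_API_KEY
EOF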
After doing this, you can import container images from Nvidia NGC. The container image will be written into the current working directory. For example, the latest TensorFlow container can be imported with the following command:
enroot import docker://nvcr.io#nvidia/tensorflow:24.05-tf2-py3
Make sure to add docker:// and # to the container image links provided in the Nvidia NGC Catalog or any other public Docker repository!
Example for importing a current gdal image from the public OSGeo repository (no API key needed):
enroot import docker://ghcr.io#osgeo/gdal:ubuntu-small-latest
Generate an Enroot image from an existing Docker image
If you have a Docker image on your local machine, you can convert it into an Enroot image (.sqsh) and transfer it to the HPDA terrabyte HPC system:
- Install Enroot on your system (check for latest version at https://github.com/NVIDIA/enroot/blob/master/doc/installation.md):
# Debian-based distributions
arch=$(dpkg --print-architecture)
curl -fSsL -O https://github.com/NVIDIA/enroot/releases/download/v3.5.0/enroot_3.5.0-1_${arch}.deb
sudo apt install -y ./*.deb
# RHEL-based distributions
arch=$(uname -m)
sudo dnf install -y epel-release # required on some distributions
sudo dnf install -y https://github.com/NVIDIA/enroot/releases/download/v3.5.0/enroot-3.5.0-1.el8.${arch}.rpm
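After the installation, you can verify that Enroot is available on your machine by querying the installed version:
# Check that Enroot was installed successfully
enroot version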
- Fetch the Docker image from your local Docker daemon and convert it into an Enroot container image. The image will be written into $HOME/enroot. You can specify the output file name with the -o flag.
enroot import -o gdal_image_new.sqsh dockerd://gdal_4.4.1
scp gdal_image_new.sqsh <USER_ID>@login.terrabyte.lrz.de:/path/on/the/dss
Convert an Enroot image into a runnable Enroot container
Once you have an Enroot image, you need to create an Enroot container for running your application within it. For this, the .sqsh file will be unpacked and converted into a root filesystem. The filesystem will be written into $HOME/enroot/name_of_container.
enroot create --name hello_world enroot/hello-world+latest.sqsh
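You can verify that the container was created by listing all existing Enroot containers; the name hello_world chosen above should show up:
# List all Enroot containers available on the node
enroot list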
Start a container and install custom packages
You can easily install additional software into an existing container:
# Run the container as fakeroot (--root) and make the file system writable (--rw)
enroot start --root --rw gdal
# Install the software
apt-get update
apt-get install -y python3-pip
exit
# Export the modified Enroot container as an Enroot container image
enroot export -o gdal_modified.sqsh gdal
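As a quick sanity check (a sketch using the names from the example above), you can re-create a container from the exported image and verify that the newly installed software is available:
# Create a fresh container from the modified image and check that pip was installed
enroot create --name gdal_pip gdal_modified.sqsh
enroot start gdal_pip pip3 --version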
Run a command in a container
See https://github.com/NVIDIA/enroot/blob/master/doc/cmd/start.md for more examples.
enroot start --rw gdal echo "hello"
enroot start --env TEST="Hello" --mount /dss:/dss --rw gdal bash -c 'ls /dss && echo $TEST'
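Thanks to the built-in GPU support mentioned at the beginning, the NVIDIA driver utilities are made available inside CUDA-enabled containers automatically on GPU nodes. A sketch, assuming you created a container named tensorflow from the NGC image imported above (the exact .sqsh file name may differ) and are running on a GPU node:
# Check GPU visibility inside a CUDA-enabled container (run on a GPU node)
enroot create --name tensorflow nvidia+tensorflow+24.05-tf2-py3.sqsh
enroot start tensorflow nvidia-smi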
Run the container in a SLURM job
Here is an example of how to run the container in a SLURM job. Since Enroot is preinstalled on all nodes, it does not have to be loaded via the module command.
#!/bin/bash
#SBATCH -J enroot_test
#SBATCH -o enroot_test.out
#SBATCH -e enroot_test.err
#SBATCH -D ./
#SBATCH --get-user-env
#SBATCH --clusters=hpda2
#SBATCH --partition=hpda2_compute_gpu
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:1
#SBATCH --mem=50gb
#SBATCH --mail-type=fail
#SBATCH --mail-user=example.user@dlr.de
#SBATCH --export=NONE
#SBATCH --time=00:10:00
#SBATCH --account=hpda-c
enroot start --env TEST="Hello" --mount /dss:/dss --rw gdal bash -c 'ls /dss && echo $TEST && sleep 1m'
#enroot start --mount /dss:/dss --rw pytorch_container python /path/to/your/python/script.py
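Save the script (e.g. as enroot_test.sbatch, a file name chosen here only for illustration) and submit it as usual:
# Submit the job script and check its status
sbatch enroot_test.sbatch
squeue --clusters=hpda2 --user=$USER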