
Nvidia Enroot

Enroot is a container engine developed by NVIDIA that is optimized for running unprivileged containers on NVIDIA GPU hardware in HPC environments (such as terrabyte’s GPU nodes). One of Enroot's key features is its built-in GPU support using libnvidia-container, which allows automatic configuration of containers to leverage the underlying NVIDIA GPU hardware. Additionally, Enroot works with a SLURM plugin called Pyxis, enabling the use of srun commands with Enroot containers and supporting multi-node MPI jobs. This makes Enroot a versatile container runtime for both GPU and non-GPU jobs.
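With Pyxis, srun can create and start Enroot containers directly, which is convenient for batch and multi-node jobs. A minimal sketch, assuming the Pyxis plugin is available on your target partition (the image URI and mount path are examples only):

# Run a command in an Enroot container via srun + Pyxis (sketch)
srun --container-image=nvcr.io#nvidia/tensorflow:24.05-tf2-py3 \
     --container-mounts=/dss:/dss \
     python -c "import tensorflow as tf; print(tf.__version__)"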

Enroot container images can be built from existing Dockerfiles or Docker images with a set of simple commands in a terminal window. Additionally, images can be obtained from the NVIDIA NGC Cloud (these are in fact also Docker container images).

If you have never worked with containers and/or Docker before, please check out the fabulous and very easy to follow Docker tutorials by our colleagues from DFD-LAX, which provide you with all the necessary knowledge in just a few minutes (access only for DLR-EOC). For more detailed but equally easy to follow documentation, see the official Docker docs.

Load Enroot on terrabyte

Attention

Unlike Charliecloud, Enroot is not available as a software module, but is preinstalled as part of the default software environment on all CPU and GPU nodes. Hence, Enroot is NOT available on the login node, which makes it necessary to launch an interactive job on the target nodes (i.e. either the CPU or the GPU nodes) to make direct use of the tool.

# Run interactive job (adapt to your needs!)
salloc --cluster=hpda2 --partition=hpda2_compute_gpu --nodes=1 --ntasks-per-node=1 --gres=gpu:1 --time=02:00:00 srun --pty bash -i

# Check if Enroot is available and get an overview of available commands
enroot

Usage: enroot COMMAND [ARG...]

Command line utility for manipulating container sandboxes.

Commands:
batch [options] [--] CONFIG [COMMAND] [ARG...]
bundle [options] [--] IMAGE
create [options] [--] IMAGE
exec [options] [--] PID COMMAND [ARG...]
export [options] [--] NAME
import [options] [--] URI
list [options]
remove [options] [--] NAME...
start [options] [--] NAME|IMAGE [COMMAND] [ARG...]
version

Generate an Enroot image from Docker registry URL

You can easily create an Enroot image from an existing image on Docker Hub. The newly created Enroot image has the same name as the imported image but with the .sqsh extension. This image can then be used to create Enroot containers. Container images are imported into the current working directory.

# Import Docker image into an Enroot image
enroot import docker://hello-world:latest
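The import above should produce hello-world+latest.sqsh in the current directory. A quick check (a sketch: according to the command overview above, enroot start also accepts an image file directly):

# Verify the import and run the image directly
ls -lh hello-world+latest.sqsh
enroot start hello-world+latest.sqsh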

Generate an Enroot image from NVIDIA NGC Cloud registry URL or any other public Docker repository

The catalogue of available Nvidia NGC container images can be consulted here: https://ngc.nvidia.com/catalog/containers. To import these container images, you need an API key, which is associated with your Nvidia NGC account. You can generate your API key here: https://ngc.nvidia.com/setup/api-key. Before generating the API key, you will be redirected to the Nvidia login prompt. If you don't have an Nvidia account yet, you can create one from this prompt by clicking on More Options -> Login Help -> Create Account.

To configure Enroot for using your API key, create the file enroot/.credentials within your $HOME and append the following lines to it:

machine nvcr.io login $oauthtoken password API_KEY
machine authn.nvidia.com login $oauthtoken password API_KEY

where API_KEY is the key generated as described above.
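A sketch of setting this up from the command line (the enroot/.credentials location follows the description above; API_KEY is the placeholder for your key, and restricting the file permissions is optional but recommended since the file contains your key):

# Create the credentials file and restrict access to it
mkdir -p $HOME/enroot
cat >> $HOME/enroot/.credentials <<'EOF'
machine nvcr.io login $oauthtoken password API_KEY
machine authn.nvidia.com login $oauthtoken password API_KEY
EOF
chmod 600 $HOME/enroot/.credentials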

After doing this, you can import container images from Nvidia NGC. The container image will be written into the current working directory. For example, the latest TensorFlow container can be imported with the following command:

enroot import docker://nvcr.io#nvidia/tensorflow:24.05-tf2-py3

Attention

Make sure to add docker:// and # to the container image links provided in the Nvidia NGC Catalog or any other public Docker repository!

Example of importing a current gdal image from the public OSGeo repository (no API key needed):

enroot import docker://ghcr.io#osgeo/gdal:ubuntu-small-latest

Generate an Enroot image from an existing Docker image

If you have a Docker image on your local machine, you can convert it into an Enroot image (.sqsh) and transfer it to the HPDA terrabyte HPC system:

  1. Install Enroot on your system (check for latest version at https://github.com/NVIDIA/enroot/blob/master/doc/installation.md):
# Debian-based distributions
arch=$(dpkg --print-architecture)
curl -fSsL -O https://github.com/NVIDIA/enroot/releases/download/v3.5.0/enroot_3.5.0-1_${arch}.deb
sudo apt install -y ./*.deb

# RHEL-based distributions
arch=$(uname -m)
sudo dnf install -y epel-release # required on some distributions
sudo dnf install -y https://github.com/NVIDIA/enroot/releases/download/v3.5.0/enroot-3.5.0-1.el8.${arch}.rpm
  2. Fetch the Docker image from your local Docker daemon and convert it into an Enroot container image. The image will be written into the current working directory; you can specify the output file name with the -o flag.
enroot import -o gdal_image_new.sqsh dockerd://gdal_4.4.1
scp gdal_image_new.sqsh <USER_ID>@login.terrabyte.lrz.de:/path/on/the/dss

Convert an Enroot image into a runnable Enroot container

Once you have an Enroot image, you need to create an Enroot container for running your application within it. For this, the .sqsh file will be unpacked and converted into a root filesystem. The filesystem will be written into $HOME/enroot/name_of_container.

enroot create --name hello_world enroot/hello-world+latest.sqsh
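To verify the result or clean up containers you no longer need, the list and remove commands from the overview above can be used:

# Show existing Enroot containers and remove one that is no longer needed
enroot list
enroot remove hello_world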

Start a container and install custom packages

You can easily install additional software into an existing container:

# Run the container as fakeroot and make the file system writable
enroot start --root --rw gdal

# Install the software
apt-get update
apt-get install python3-pip
exit

# Export the modified Enroot container as an Enroot container image
enroot export -o gdal_modified.sqsh gdal
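To use the modified image, create a fresh container from it; a short sketch (the container name gdal_mod and the pip3 check are just examples):

# Create a container from the exported image and verify the added software
enroot create --name gdal_mod gdal_modified.sqsh
enroot start gdal_mod pip3 --version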

Run a command in a container

See https://github.com/NVIDIA/enroot/blob/master/doc/cmd/start.md for more examples.

enroot start --rw gdal echo "hello"
enroot start --env TEST="Hello" --mount /dss:/dss --rw gdal bash -c 'ls /dss && echo $TEST'
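For GPU images (e.g. the TensorFlow image imported from NGC above), Enroot exposes the host NVIDIA driver inside the container via libnvidia-container, so a container started on a GPU node should see the allocated GPUs. A minimal sketch, assuming you have created a container named tensorflow from the imported NGC image:

# Check that the NVIDIA driver and GPUs are visible inside the container
enroot start tensorflow nvidia-smi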

Run the container in a SLURM job

Here is an example of how to run the container in a SLURM job. Since Enroot is preinstalled on all CPU and GPU nodes, it does not have to be loaded via the module command.

#!/bin/bash

#SBATCH -J enroot_test
#SBATCH -o enroot_test.out
#SBATCH -e enroot_test.err
#SBATCH -D ./
#SBATCH --get-user-env
#SBATCH --clusters=hpda2
#SBATCH --partition=hpda2_compute_gpu
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:1
#SBATCH --mem=50gb
#SBATCH --mail-type=fail
#SBATCH --mail-user=example.user@dlr.de
#SBATCH --export=NONE
#SBATCH --time=00:10:00
#SBATCH --account=hpda-c

enroot start --env TEST="Hello" --mount /dss:/dss --rw gdal bash -c 'ls /dss && echo $TEST && sleep 1m'

#enroot start --mount /dss:/dss --rw pytorch_container python /path/to/your/python/script.py