
Charliecloud (GPU)

For a container image to use GPUs, the required GPU libraries and tools have to be installed or injected into the Charliecloud image, and their versions must be identical to those installed on the GPU node.

This may have to be redone whenever the installed library versions change on the terrabyte GPU nodes (expected roughly once a year).
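To decide whether an existing GPU-enabled image needs to be re-injected, one option is to compare the driver version of the `libcuda.so.<version>` file inside the image with the host driver version. The sketch below is an assumption-laden illustration, not a provided tool: it expects to run on a GPU node with `nvidia-smi` on the PATH and assumes `unsquashfs` (from squashfs-tools) is available to list the image contents without mounting it. It ignores CUDA forward-compatibility copies under `*/compat/` and takes the first remaining match.

```shell
#!/bin/bash
# Sketch: does an existing GPU-enabled .sqfs image still match this node's driver?
# Assumptions: nvidia-smi is on PATH (GPU node) and unsquashfs is installed.

# Print the version suffix of the first libcuda.so.<major>.<minor>[...] path read
# on stdin, skipping CUDA forward-compatibility copies under */compat/.
libcuda_version_from_listing() {
    grep -v '/compat/' \
        | grep -oE 'libcuda\.so\.[0-9]+(\.[0-9]+)+$' \
        | head -n 1 \
        | sed 's/^libcuda\.so\.//'
}

# Driver version reported by the host (first GPU only).
host_driver_version() {
    nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1
}

# Exit status 0: versions match; 1: mismatch, no injected libcuda, or no host driver.
image_matches_host() {
    local img=$1 host_ver img_ver
    host_ver=$(host_driver_version)
    if [ -z "$host_ver" ]; then
        echo "Could not read the host driver version (not a GPU node?)" >&2
        return 1
    fi
    img_ver=$(unsquashfs -l "$img" | libcuda_version_from_listing)
    if [ -z "$img_ver" ]; then
        echo "No injected libcuda found in $img" >&2
        return 1
    fi
    if [ "$img_ver" != "$host_ver" ]; then
        echo "Image driver $img_ver differs from host driver $host_ver: re-inject" >&2
        return 1
    fi
    echo "Image driver matches host driver $host_ver"
}
```

Usage on a GPU node would look like `image_matches_host /path/to/myimage-gpu.sqfs`, where the path is a placeholder for your own image.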

Newer versions of Charliecloud include the tool ch-fromhost, whose --nvidia option detects the NVIDIA files that need to be injected and copies them into an image.
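ch-fromhost --nvidia relies on `nvidia-container-cli list` to discover the host binaries and libraries, so you can preview what would be injected by calling it directly. This is a minimal sketch assuming a GPU node with nvidia-container-cli installed; the `NVIDIA_CLI` variable is a hypothetical override added only so the function can be exercised without a GPU.

```shell
#!/bin/bash
# Sketch: list the host files that ch-fromhost --nvidia would pick up.
# NVIDIA_CLI is a hypothetical override; by default the real CLI is used.
preview_nvidia_files() {
    local cli=${NVIDIA_CLI:-nvidia-container-cli}
    if ! command -v "$cli" >/dev/null 2>&1; then
        echo "$cli not found: run this on a GPU node" >&2
        return 1
    fi
    # Binaries (e.g. nvidia-smi) and driver libraries known to the container CLI
    "$cli" list --binaries --libraries
}

preview_nvidia_files || echo "Preview skipped" >&2
```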

We provide a script that automates injecting the necessary files into an existing Charliecloud image. It generates a new Charliecloud image that can then be used on the GPU nodes.

GPU node only

These scripts and commands must be run on one of the GPU nodes, because only those nodes have the libraries and tools that need to be injected. Run them either in an interactive SLURM job on one of the GPU partitions or from a JupyterLab session running on a GPU partition.
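A quick way to check where you are before starting is to test for the same file the manual script checks, `/usr/bin/nvidia-container-cli`. The sketch below does that and, if the check fails, prints an example interactive job request. `GPU_PARTITION` is a hypothetical placeholder for the name of a terrabyte GPU partition, and requesting GPUs with `--gres=gpu:1` is an assumption about the cluster's SLURM configuration.

```shell
#!/bin/bash
# Sketch: confirm this shell runs on a GPU node before injecting drivers.
# GPU_PARTITION is a hypothetical placeholder; --gres=gpu:1 is an assumption.
GPU_PARTITION="${GPU_PARTITION:-<gpu-partition>}"

# Succeeds when the NVIDIA container CLI used by ch-fromhost --nvidia is present.
is_gpu_node() {
    local cli=${1:-/usr/bin/nvidia-container-cli}
    [ -x "$cli" ]
}

if is_gpu_node; then
    echo "GPU node detected: the injection can run here."
else
    echo "Not a GPU node. Start an interactive job first, for example:"
    echo "  srun --partition=$GPU_PARTITION --gres=gpu:1 --pty bash -l"
fi
```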

GPU Enabled Image

/dss/dsstbyfs01/pn56su/pn56su-dss-0020/usr/local/bin/injectNvidiaDriverIntoCharliecloudSqfs.sh <Path2yourExistingCharliecloudImage> [<Path2NewGPUEnabledCharliecloudImage>]
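After a driver update you may want to regenerate several images at once. The loop below is a sketch that calls the wrapper script once per image; `IMAGE_DIR` and the `-gpu.sqfs` naming scheme are hypothetical choices, and it assumes old GPU-enabled copies have been deleted beforehand in case the wrapper, like the manual script, refuses to overwrite an existing output file.

```shell
#!/bin/bash
# Sketch: create GPU-enabled copies of all .sqfs images in one directory.
# IMAGE_DIR and the "-gpu.sqfs" suffix are hypothetical choices.
INJECT=/dss/dsstbyfs01/pn56su/pn56su-dss-0020/usr/local/bin/injectNvidiaDriverIntoCharliecloudSqfs.sh
IMAGE_DIR=${IMAGE_DIR:-$HOME/images}

# Derive the output path: foo.sqfs -> foo-gpu.sqfs
gpu_image_name() {
    printf '%s-gpu.sqfs\n' "${1%.sqfs}"
}

if [ -x "$INJECT" ]; then
    for img in "$IMAGE_DIR"/*.sqfs; do
        [ -e "$img" ] || continue                     # glob matched nothing
        case $img in *-gpu.sqfs) continue ;; esac     # skip earlier outputs
        "$INJECT" "$img" "$(gpu_image_name "$img")" || echo "Failed: $img" >&2
    done
fi
```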

Alternatively, you can run the steps manually:

GPU Enabled Image manually

inputCharliecloudImage="<Path2yourExistingCharliecloudImage>"
outCharliecloudImage="<Path2NewGPUEnabledCharliecloudImage>"

if [ ! -f /usr/bin/nvidia-container-cli ]
then
    echo "Error: this script has to run on a GPU node, which has the NVIDIA drivers and nvidia-container-cli installed, to inject the correct NVIDIA drivers. Restart it on a GPU node." >&2
    exit 1
fi

# load Charliecloud (DLR version with SquashFS support)
echo "Loading Charliecloud module"
module use /dss/dsstbyfs01/pn56su/pn56su-dss-0020/usr/share/modules/files/
module load charliecloud

if [ ! -e "$inputCharliecloudImage" ]
then
    echo "Error: input file $inputCharliecloudImage does not exist. Exiting." >&2
    exit 2
fi

if [ -e "$outCharliecloudImage" ]
then
    echo "Error: output file $outCharliecloudImage exists. Delete it and rerun to recreate." >&2
    exit 3
fi

intermediateImage=$(basename "$inputCharliecloudImage" .sqfs)

# convert to a writable intermediate directory image on the node-local SSD
echo "Converting to intermediate filesystem version for injection"
mkdir -p "/tmp/${USER}_chNvidia"
ch-convert "$inputCharliecloudImage" "/tmp/${USER}_chNvidia/$intermediateImage" || exit 4

# inject NVIDIA drivers; stop here so no driver-less image is marked successful
echo "Injecting NVIDIA driver into image"
ch-fromhost --nvidia "/tmp/${USER}_chNvidia/$intermediateImage" || exit 5

echo "Converting to target image $outCharliecloudImage"
# on success, create a marker file next to the image (foo.sqfs -> foo.successful)
ch-convert "/tmp/${USER}_chNvidia/$intermediateImage" "$outCharliecloudImage" && touch "${outCharliecloudImage%.sqfs}.successful"

echo "Cleaning up intermediate filesystem image"
rm -r "/tmp/${USER}_chNvidia/$intermediateImage"
rmdir "/tmp/${USER}_chNvidia"
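A simple smoke test for the new image is to run `nvidia-smi` inside it with ch-run. This sketch assumes the Charliecloud module is loaded, that ch-run can start the SquashFS image directly, and that ch-fromhost --nvidia injected `nvidia-smi` along with the libraries; `OUT` is a placeholder for your `<Path2NewGPUEnabledCharliecloudImage>`.

```shell
#!/bin/bash
# Sketch: smoke-test a GPU-enabled image on a GPU node.
# OUT is a placeholder path; nvidia-smi inside the image is an assumption.
OUT=${OUT:-$HOME/images/myimage-gpu.sqfs}

# For an output path ending in .sqfs, the manual script touches <name>.successful
marker_for() {
    printf '%s.successful\n' "${1%.sqfs}"
}

if [ -f "$(marker_for "$OUT")" ] && command -v ch-run >/dev/null 2>&1; then
    ch-run "$OUT" -- nvidia-smi
else
    echo "No success marker for $OUT, or ch-run not loaded (module load charliecloud)" >&2
fi
```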

Enable NVIDIA Persistence Mode

Some use cases require NVIDIA Persistence Mode to be accessible inside the container. To make it available, bind-mount the directory containing the persistence daemon's socket into your Charliecloud container by passing this option to ch-run:

--bind=/var/run/nvidia-persistenced/.:/run/nvidia-persistenced
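As an illustration, the bind option can be wrapped in a small function and the persistence mode queried from inside the container. This is a sketch, not a provided tool: `IMG` is a placeholder path, `CH_RUN` is a hypothetical override (defaulting to ch-run) that only exists so the command line can be dry-run, and the mount may require `/run/nvidia-persistenced` to exist inside the image.

```shell
#!/bin/bash
# Sketch: run a command in a GPU-enabled image with the persistence socket mapped in.
# IMG is a placeholder; CH_RUN is a hypothetical override for dry runs.
IMG=${IMG:-$HOME/images/myimage-gpu.sqfs}
PERSIST_BIND=/var/run/nvidia-persistenced/.:/run/nvidia-persistenced

run_with_persistence() {
    "${CH_RUN:-ch-run}" --bind="$PERSIST_BIND" "$IMG" -- "$@"
}

# Query the persistence mode from inside the container (GPU node only).
if command -v ch-run >/dev/null 2>&1 && [ -d /var/run/nvidia-persistenced ]; then
    run_with_persistence nvidia-smi --query-gpu=persistence_mode --format=csv,noheader
fi
```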