Introduction
Clusters
The HPDA terrabyte compute infrastructure provides two clusters optimized for different use cases. The CPU cluster offers a large number of cores for precise, flexible and fast parallel processing and can be used for a wide range of tasks. The GPU cluster is optimized for ultra-fast processing of large amounts of data and is especially suited for machine learning tasks. Both clusters are hosted at the Leibniz Supercomputing Centre (LRZ) in Garching near Munich.
| | CPU cluster | GPU cluster |
|---|---|---|
| Number of nodes | 262 | 47 |
| Number of CPUs per node | 2 | 2 |
| Number of cores per CPU | 40 (80 Hyperthreads) | 24 (48 Hyperthreads) |
| Number of GPUs per node | 0 | 4 |
| CPU type | Intel Xeon Platinum 8380 40C 270W 2.3GHz | Intel Xeon Gold 6336Y 24C 185W 2.4GHz |
| GPU type | N/A | NVIDIA HGX A100 80GB 500W |
| RAM per node | 1024 GByte | (1024 + 320) GByte |
| Bandwidth to InfiniBand HDR per node | 200 GBit/s | 200 GBit/s |
| LINPACK computing power per node | 4.5 TFlop/s | 68.5 TFlop/s |
| Memory bandwidth per node | 409.6 GByte/s | (409.6 + 8156) GByte/s |
CPU Cluster
The HPDA terrabyte CPU cluster consists of several partitions. While some of the cluster's partitions are reserved for internal services and testing, two of them are currently available to the public:
All partitions belong to the hpda2 cluster system: Intel Xeon Platinum 8380 40C 270W 2.3GHz nodes with InfiniBand interconnect and 2 hardware threads per physical core.

| Cluster | Partition | Nodes in partition | CPU Cores and Hyperthreads per node | Typical job type | Node range per job (min-max) | Maximum runtime (hours) | Limit: CPU Cores and Hyperthreads | Limit: Memory (GByte) |
|---|---|---|---|---|---|---|---|---|
| hpda2 | hpda2_compute | 53 | 80 Cores / 160 Hyperthreads | | 1-53 | 240* | - | 1024 per node |
| hpda2 | hpda2_test | 2 | 80 Cores / 160 Hyperthreads | Do not run production jobs! | 1-1 | 2 | 80 Cores / 160 Hyperthreads | |
| hpda2 | hpda2_jupyter | 2 | 80 Cores / 160 Hyperthreads | | 1-1 | 48 | 4 Cores / 8 Hyperthreads | |
* If your job needs more than the maximum runtime of the partition, you can implement auto-requeuing of the job in your SLURM job script.
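A minimal sketch of such a self-requeuing job script (the program name, resume logic and signal lead time are placeholders, not part of the terrabyte documentation): the job asks SLURM to send a signal shortly before the time limit is reached and requeues itself from a trap.

```bash
#!/bin/bash
#SBATCH --job-name=long_running
#SBATCH --partition=hpda2_compute
#SBATCH --nodes=1
#SBATCH --time=240:00:00          # maximum runtime of the partition
#SBATCH --requeue                 # allow this job to be requeued
#SBATCH --open-mode=append        # append to the log file after a requeue
#SBATCH --signal=B:USR1@600       # send SIGUSR1 to the batch shell 10 min before the limit

# Requeue the job when the signal arrives, then exit cleanly.
requeue_job() {
    echo "Approaching time limit, requeueing job ${SLURM_JOB_ID}"
    scontrol requeue "${SLURM_JOB_ID}"
    exit 0
}
trap requeue_job USR1

# Hypothetical workload: it must be able to resume from the state
# it wrote in the previous run, otherwise requeuing only restarts it.
srun ./my_processing --resume &
wait
```

Note that the workload is started in the background and the script waits on it; otherwise the batch shell could not react to the signal while the program is running.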
GPU Cluster
All partitions belong to the hpda2 cluster system: NVIDIA HGX A100 80GB 500W GPU nodes with Intel Xeon Gold 6336Y 24C 185W 2.4GHz CPUs, InfiniBand interconnect and 2 hardware threads per physical core.

| Cluster | Partition | Nodes in partition | GPUs per node | CPU Cores and Hyperthreads per node | Typical job type | Node range per job (min-max) | Maximum runtime (hours) | Limit: GPUs | Limit: GPU memory (GByte) | Limit: CPU Cores and Hyperthreads | Limit: CPU memory (GByte) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| hpda2 | hpda2_compute_gpu | 14 | 4 | 48 Cores / 96 Hyperthreads | | 1-12 | 240* | - | 320 per node | - | 1024 per node |
| hpda2 | hpda2_testgpu | 1 | 4 | 48 Cores / 96 Hyperthreads | Do not run production jobs! | 1-1 | 2 | 4 | | 48 Cores / 96 Hyperthreads | |
* If your job needs more than the maximum runtime of the partition, you can implement auto-requeuing of the job in your SLURM job script (see the sketch after the CPU cluster table).
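As a hedged sketch, a GPU job on the hpda2_compute_gpu partition could be requested roughly as follows (the training script is a placeholder, and whether GPUs are requested via `--gres=gpu:N` or another site-specific option should be verified in the job submission section):

```bash
#!/bin/bash
#SBATCH --job-name=gpu_job
#SBATCH --partition=hpda2_compute_gpu
#SBATCH --nodes=1
#SBATCH --gres=gpu:2              # request 2 of the 4 A100 GPUs on the node (GRES name is an assumption)
#SBATCH --cpus-per-task=24
#SBATCH --time=24:00:00

# Hypothetical training script; replace with your own program.
srun python train_model.py
```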
Storage
Both the CPU and the GPU cluster are directly attached to a dedicated GPFS storage system (Data Science Storage, DSS) with a net capacity of about 50 PB. The DSS hosts a large collection of Earth observation and auxiliary data and offers the possibility to store personal data (HOME), project data (dedicated storage containers) and intermediate data (SCRATCH).
Access
On terrabyte HPC, processing jobs are created, run and managed from the command line. For this, we rely on the workload manager SLURM, a state-of-the-art scheduler used by HPC centres all around the world. Knowing how to use SLURM is a prerequisite for bringing your processing to large scale and making full use of the available hardware resources. But don't be scared if you have never heard of SLURM or HPC: everything you need is written down in this documentation, and it is quick and easy to learn. Learn about ways to run your processes on terrabyte in the job submission section. Jobs can either be started from the command line (interactive test jobs) or script-driven (for production jobs), as illustrated below.
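For orientation, a minimal sketch of both workflows (partition names are taken from the tables above; the job script name is a placeholder):

```bash
# Interactive test job: allocate one node on the test partition for 30 minutes
# and open a shell on it.
salloc --partition=hpda2_test --nodes=1 --time=00:30:00
srun --pty bash

# Production job: write a SLURM batch script and hand it to the scheduler.
sbatch my_job.sh

# Monitor and manage your jobs.
squeue -u $USER       # list your queued and running jobs
scancel <jobid>       # cancel a job
```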