Interactive (testing)
Interactive test jobs are suitable for program testing and development purposes and shall be executed on the hpda2_test (CPU-cluster) and hpda2_testgpu (GPU-cluster) partitions. As long as hpda2_testgpu is not yet available, you can use the hpda2_compute_gpu partition for testing purposes. On the test partitions you can book at most 1 node for testing. On the CPU-cluster, the minimum CPU allocation is 1 hyperthread. For a general introduction to SLURM, see: SLURM Workload Manager.
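To check the current availability and limits of these partitions, you can query SLURM directly. A minimal sketch using the standard sinfo command (the partition names are the ones mentioned above; the columns shown depend on the site configuration):
sinfo --partition=hpda2_test,hpda2_testgpu    # availability, time limit, node count and state of the test partitions
sinfo --partition=hpda2_compute_gpu           # fallback partition for GPU testing while hpda2_testgpu is unavailable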
Specify the amount of required resources wisely! Don't block resources by requesting more (e.g. number of cores, RAM, processing time) than you actually need. Since SLURM takes care of a fair share of resources, small jobs (i.e. jobs that request a small amount of resources for a short time) are preferred over large jobs (i.e. jobs that request a large amount of resources for a long time) when scheduling the queue.
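One way to find out how much your jobs actually consume is to compare the requested resources with the measured usage of finished jobs. A minimal sketch (sacct is a standard SLURM command, seff is a common but optional SLURM contrib tool, and <jobid> is a placeholder for one of your job IDs):
sacct -j <jobid> --format=JobID,Elapsed,TotalCPU,ReqMem,MaxRSS   # compare requested vs. used CPU time and memory
seff <jobid>                                                     # CPU and memory efficiency summary, if installed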
Interactive SLURM shell for simple testing
On the CPU-cluster
In order to start an interactive SLURM shell to execute simple non-parallel test tasks on the hpda2_test partition of the CPU-cluster, do:
- Get an interactive shell on hpda2_test
  - In order to get e.g. a bash shell (default) on a node segment of 16 hyperthreads (8 physical cores) and 8 GB RAM for 10 minutes, type:
    salloc --cpus-per-task=16 --mem=8G --partition=hpda2_test --time=00:10:00
    Note: In this default case, srun --pty bash -i is executed automatically (see the command below).
  - In order to get e.g. a C shell on a full node (160 hyperthreads = 80 physical cores) with 1 TB RAM for 1 hour, type:
    salloc --cpus-per-task=160 --partition=hpda2_test --time=01:00:00 srun --pty csh -i
    Note: Since the C shell is not the default, srun --pty csh -i must be added explicitly to the salloc statement.
- Run your program/script within the interactive shell (a sketch for checking the granted resources follows after this list)
  - Example: run a simple Python script; for instructions on how to use software modules, see Software Modules.
    Environment: If you want to use your own Python/software environment, you can use a mamba or conda environment or a container.
    module load python/3.7.11-extended  # load python software module
    python ./mypythonscript.py
  - Example: run an OpenMP multithreaded program
    export OMP_NUM_THREADS=28
    ./myprog.exe
- Stop the interactive SLURM shell
  exit
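Before you exit, you can verify from within the interactive shell which resources SLURM actually granted. A minimal sketch using standard SLURM and Linux commands (the exact output depends on the site configuration):
echo $SLURM_JOB_ID $SLURM_CPUS_PER_TASK   # job ID and number of allocated hyperthreads
scontrol show job $SLURM_JOB_ID           # full allocation details (nodes, memory, time limit)
nproc                                     # number of processing units visible to this shell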
On the GPU-cluster
In order to start an interactive SLURM shell for testing purposes on the GPU-cluster, you can use the hpda2_testgpu partition (or, as long as hpda2_testgpu is not yet available, the hpda2_compute_gpu partition mentioned above).
- Get an interactive shell on hpda2_testgpu
  - In order to get e.g. a bash shell (default) on a node segment with 1 GPU (80 GB RAM) for 2 hours, type:
    salloc --cluster=hpda2 --partition=hpda2_testgpu --nodes=1 --ntasks-per-node=1 --gres=gpu:1 --time=02:00:00
    Note: In this default case, srun --pty bash -i is executed automatically (see the command below).
  - In order to get e.g. a C shell on a node segment with 1 GPU (80 GB RAM) for 2 hours, type:
    salloc --cluster=hpda2 --partition=hpda2_testgpu --nodes=1 --ntasks-per-node=1 --gres=gpu:1 --time=02:00:00 srun --pty csh -i
    Note: Since the C shell is not the default, srun --pty csh -i must be added explicitly to the salloc statement.
- Run your program/script within the interactive shell (a GPU check sketch follows after this list)
  - Example: run a simple Python script; for instructions on how to use software modules, see Software Modules.
    Environment: If you want to use your own Python/software environment, you can use a mamba or conda environment or a container.
    module load python/3.7.11-extended  # load python software module
    python ./mypythonscript.py
- Stop the interactive SLURM shell
  exit
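Before you exit, you can check from within the interactive shell that the requested GPU is actually visible. A minimal sketch (nvidia-smi is the standard NVIDIA tool; whether CUDA_VISIBLE_DEVICES is set depends on the site's SLURM configuration):
nvidia-smi                   # should list exactly the 1 allocated GPU
echo $CUDA_VISIBLE_DEVICES   # GPU index/indices assigned to this job, if set by SLURM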
Interactive SLURM shell for parallel testing
In order to start an interactive SLURM shell to execute parallel MPI or hybrid MPI/OpenMP test tasks on the hpda2_test partition of the CPU-cluster, do:
- Log in to the login node
- For performing program testing and short runs, the following sequence of commands can be used: first, salloc is invoked to reserve the needed resources; then a suitably parameterized call to the parallel launcher (usually mpiexec) is used to start the program with the resources assigned by SLURM. You can book at most 1 node (= 160 hyperthreads = 80 physical cores) for testing! An illustrative command sketch follows after the table below.
| Commands for resource allocation and job run | Use case |
|---|---|
| | Start an MPP mode Intel MPI program using 1 full node on the cluster (= 80 MPI tasks, each using 1 physical core). In order to use OpenMPI instead of Intel MPI, unload the intel-mpi module before calling mpiexec. The module version in this example may not be available anymore; replace it with another version if needed. For instructions on how to use software modules, see Software Modules. |
| | Start a hybrid mode Intel MPI program using 8 MPI tasks, with 7 OpenMP threads per task and 40 GB RAM in total. |
| | Start a hybrid mode Intel MPI program using 5 MPI tasks, with 8 OpenMP threads per task, distributed across 1 node. Note that there are 80 physical cores available on a hpda2_test node; due to hyperthreading, the logical core count on each node is 160. In this example the hyperthreads are not used, but you could increase the value of OMP_NUM_THREADS to make use of them. A non-hybrid program will need to use OMP_NUM_THREADS=1, and --tasks-per-node can have any value up to 160 (larger values may result in failure to start). |
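As an illustration of how such a test allocation and program start could look, here is a minimal sketch for the second use case (8 MPI tasks with 7 OpenMP threads per task and 40 GB RAM in total). The exact module name, the mpiexec options and the binary name myprog.exe are assumptions and may differ from the commands recommended for this cluster:
salloc --ntasks=8 --cpus-per-task=7 --mem=40G --partition=hpda2_test --time=00:30:00   # reserve resources; an interactive shell is opened automatically
module load intel-mpi          # assumed module name; check "module avail" for the versions installed
export OMP_NUM_THREADS=7       # OpenMP threads per MPI task
mpiexec -n 8 ./myprog.exe      # start 8 MPI tasks within the allocation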
GOTCHA: Put interactive shells into the background and resume
For a shell started as an interactive job, the corresponding job terminates as soon as you exit the shell or close the terminal window, and it cannot be resumed.
However, a possible workaround is to use screen, which is a terminal multiplexer (see e.g. https://linuxize.com/post/how-to-use-linux-screen/):
screen -S test # opens a new screen session named "test"
salloc --cpus-per-task=2 --mem=8G --partition=hpda2_test --time=00:10:00 # starts a new job with an interactive bash shell
<Ctrl> + <A> <D> # detaches from the screen session with the job shell (without terminating the job)
screen -r test # reattaches to the screen session with the job shell
<Ctrl> + <D> # exits the shell and terminates the job
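Note that a screen session exists only on the host where it was started, so if there is more than one login node you must log in to the same node again before resuming. A minimal sketch (hostname and screen are standard Linux tools):
hostname        # note the login node before detaching
screen -ls      # after logging in again, list the detached screen sessions on this node
screen -r test  # reattach to the session named "test"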