check_cluster
Introduction to check_cluster
check_cluster is a command-line tool for visualizing the utilization of a Slurm cluster.
It shows node usage per partition and lists the jobs run by the current user.
check_cluster runs scontrol show nodes and scontrol show jobs to retrieve this information from Slurm.
The data is refreshed every 10 seconds.
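To get a feel for the kind of data behind the per-partition view, the sketch below counts node states per partition from scontrol output. It is only an illustration and not how check_cluster itself is implemented: the function name summarize_nodes is hypothetical, and it assumes the one-line-per-node format produced by scontrol --oneliner show nodes, with State= and Partitions= fields present on each line.

```shell
# Hypothetical helper (not part of check_cluster): reads the output of
# `scontrol --oneliner show nodes` on stdin and prints one line per
# partition/state pair in the form "<partition> <state> <node count>".
summarize_nodes() {
    awk '
    {
        state = ""; parts = ""
        # Each node is one line of space-separated Key=Value fields.
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^State=/)      { state = substr($i, 7) }
            if ($i ~ /^Partitions=/) { parts = substr($i, 12) }
        }
        # Skip nodes without a state or without any partition.
        if (state == "" || parts == "") next
        # A node can belong to several comma-separated partitions.
        n = split(parts, p, ",")
        for (j = 1; j <= n; j++) count[p[j] " " state]++
    }
    END {
        for (k in count) print k, count[k]
    }' | LC_ALL=C sort
}

# On a login node with Slurm available:
#   scontrol --oneliner show nodes | summarize_nodes
```

Reading from stdin keeps the parsing separate from the Slurm call, so the function can be tried on saved scontrol output as well.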
Using check_cluster with Modules
To use check_cluster on the terrabyte HPC system, load the check_cluster module with the following commands:
# consider adding the module use line to your ~/.bashrc to always make terrabyte modules available
module use /dss/dsstbyfs01/pn56su/pn56su-dss-0020/usr/share/modules/files/
module load check_cluster
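The comment above suggests adding the module use line to ~/.bashrc. A minimal sketch of doing that without creating duplicate entries on repeated runs is shown here; the helper name append_line_once is hypothetical and not part of the terrabyte modules.

```shell
# Hypothetical helper: append a line to a file only if that exact line
# is not already present (the file is created if it does not exist).
append_line_once() {
    file=$1
    line=$2
    # -x matches whole lines only, -F treats the pattern as a fixed string.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example call:
#   append_line_once ~/.bashrc 'module use /dss/dsstbyfs01/pn56su/pn56su-dss-0020/usr/share/modules/files/'
```

The change takes effect in new shells, or in the current one after running source ~/.bashrc.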
Once the module is loaded, you can use check_cluster to monitor the cluster. Below are some examples of common operations:
Usage Examples
Example 1: Check Node Status
To view the status of all nodes in the cluster:
check_cluster