Nodes and partitions

Overview

In the Linux cluster, the individual servers are referred to as "nodes". These nodes are divided into partitions. If you want to perform a calculation on the cluster, you must specify which partition the nodes should come from.

The partitions are named according to the year of release of the CPUs installed.
The following partitions are available to you:

Partition | Partition description
pub23 | University public partition for all users. Maximum 6 days of computing time per job. Includes 36 nodes, each with two AMD EPYC 7443 24-core processors from 2023, 256 GB RAM and 100 Gbit Infiniband networking for multi-node jobs.
pub23gpu | University public partition for all users. Maximum 6 days of computing time per job. Includes 2 nodes, each with an Nvidia A100 80 GB GPU, an AMD EPYC 7443 24-core processor from 2023, 256 GB RAM and 100 Gbit Infiniband networking for multi-node jobs.
pub17 | University public partition for all users. Maximum 6 days of computing time per job. Includes 8 nodes (its-cs[132-139]), each with two 12-core Intel Xeon E5-2650 processors from 2017 and 512 GB RAM, and 3 nodes (its-cs[161-163]), each with two 16-core Intel Xeon Gold 5218 processors from 2019 and 384 GB RAM. All 11 nodes have Infiniband networking for multi-node jobs.
pub15 | University public partition for all users. Maximum 2 days of computing time per job. Includes a total of 40 nodes from 2015, each with two 12-core Intel Xeon E5-2680 processors, 128 GB RAM and Infiniband networking for multi-node jobs. As these nodes also belong to the AG-Garcia partition, users who access them via the AG-Garcia partition have priority.
pub12 | University public partition for all users. Maximum 10 days of computing time per job. The nodes each have two 16-core AMD Opteron 6276 processors from 2012, 32 GB to 128 GB RAM and Infiniband networking for multi-node jobs. As these nodes also belong to the mpi and mpi1 partitions, users who access them via the mpi and mpi1 partitions have priority.
FB16 | All employees of department 16 automatically have access to this partition. Employees from other departments and students working on a project can also be granted access for a limited period of time if the computing time of 10 days in "public" or 8 days in "public2" is not sufficient for them. Please contact Daniel Bischof, who will be happy to help you with any questions regarding access to the partition. Unlimited computing time per job. Includes 12 dual-processor systems, each with two Intel Xeon 6-core processors and Infiniband networking.
Further partitions | There are further partitions that are not public. The computing nodes in these "moderated" partitions were usually financed by individual research groups or departments and are operated by them within the Linux cluster.
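
As a rough illustration of how a partition is selected for a calculation, the following is a minimal sketch of a Slurm batch script that requests the pub23 partition. The job name, the requested resources and the one-hour runtime are assumptions chosen only for this example; the script body simply reports where it is running.

```shell
#!/bin/bash
# Minimal Slurm batch script (illustrative sketch; adjust the values to your job).
# Partition to run in, taken from the list of partitions above:
#SBATCH --partition=pub23
# Number of nodes and tasks (assumed values for this example):
#SBATCH --nodes=1
#SBATCH --ntasks=1
# Requested runtime as days-hours:minutes:seconds; it must not exceed the partition limit:
#SBATCH --time=0-01:00:00
# Hypothetical job name:
#SBATCH --job-name=demo

# Slurm sets SLURM_JOB_PARTITION inside a running job; outside of Slurm the
# variable is unset, so we fall back to "none".
describe_job() {
    echo "Running on $(uname -n) in partition ${SLURM_JOB_PARTITION:-none}"
}

describe_job
```

Saved under a hypothetical file name such as job.sh, the script would be submitted with sbatch job.sh. If the --partition line is omitted, the job goes to the default partition.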

Retrieve information about partitions and nodes

The sinfo command lists the time limits and the availability of the partitions in the cluster. Abbreviated example output from sinfo:

sinfo
uk00123@its-cs1:/home/users/000/uk00123> sinfo
PARTITION  AVAIL  TIMELIMIT   NODES  STATE  NODELIST
headnodes  up     1:00        2      idle   its-cs[1,136]
pub23*     up     6-00:00:00  4      drain  its-cs[523,531-533]
pub23*     up     6-00:00:00  14     alloc  its-cs[500-501,524]
pub23*     up     6-00:00:00  16     idle   its-cs[502-505,507-522,525-530]
pub23gpu   up     6-00:00:00  1      alloc  its-cs536
pub23gpu   up     6-00:00:00  1      idle   its-cs537

...
  • In the pub23 partition, 14 nodes are already allocated (alloc), i.e. in use. 16 nodes are idle and available for jobs, while 4 are in the drain state, i.e. an administrator has taken them out of scheduling, e.g. for maintenance. The maximum runtime (TIMELIMIT) is 6 days. The asterisk after the partition name marks the default partition, which is used if no partition is specified for a job.
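
The per-state counts read off above can also be computed from the sinfo output. The following sketch assumes the column layout shown in the example (partition in column 1, node count in column 4, state in column 5); the function name count_nodes is hypothetical, and the sample lines are copied from the abbreviated output above.

```shell
#!/bin/sh
# Sum the NODES column of sinfo-style output for one partition and one state.
# Usage: sinfo | count_nodes pub23 idle
count_nodes() {
    awk -v part="$1" -v state="$2" '
        {
            name = $1
            sub(/\*$/, "", name)   # drop the asterisk that marks the default partition
            if (name == part && $5 == state) total += $4
        }
        END { print total + 0 }
    '
}

# Sample lines copied from the abbreviated sinfo output above.
sample='PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
pub23* up 6-00:00:00 4 drain its-cs[523,531-533]
pub23* up 6-00:00:00 14 alloc its-cs[500-501,524]
pub23* up 6-00:00:00 16 idle its-cs[502-505,507-522,525-530]
pub23gpu up 6-00:00:00 1 idle its-cs537'

printf '%s\n' "$sample" | count_nodes pub23 idle    # prints 16
```

On the cluster, the sample text would be replaced by the live output, e.g. sinfo | count_nodes pub23 idle. sinfo can also filter by itself with its -p (partition) and -t (state) options.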

 

There is also a graphical version of sinfo, which can be started with sview. This requires "X11 forwarding" to be enabled when logging in to the cluster (e.g. ssh -X its-cs1.its.uni-kassel.de).

Further details on entire partitions can be retrieved as follows (abbreviated output):

scontrol show partition
PartitionName=pub23
DefaultTime=00:05:00 DisableRootJobs=NO ExclusiveUser=NO ExclusiveTopo=NO
MaxNodes=36 MaxTime=6-00:00:00 MinNodes=0 LLN=NO
Nodes=its-cs[500-533]
PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
OverTimeLimit=NONE PreemptMode=OFF
State=UP TotalCPUs=1632 TotalNodes=34 SelectTypeParameters=NONE
JobDefaults=(null)
DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
TRES=cpu=1632,mem=8262000M,node=34,billing=1632
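
Because the partition record consists of key=value pairs, single values are easy to extract for quick calculations. The sketch below uses a hypothetical helper get_field and two lines copied from the output above to derive the number of CPU cores per node. It assumes that keys and values contain no spaces, as in the output shown.

```shell
#!/bin/sh
# Print the value of one key from "scontrol show partition"-style key=value output.
# Usage: scontrol show partition pub23 | get_field TotalCPUs
get_field() {
    # Put every key=value pair on its own line, then print the value of the first match.
    tr ' ' '\n' | awk -F= -v key="$1" '$1 == key { print substr($0, length(key) + 2); exit }'
}

# Two lines copied from the abbreviated output above.
record='State=UP TotalCPUs=1632 TotalNodes=34 SelectTypeParameters=NONE
TRES=cpu=1632,mem=8262000M,node=34,billing=1632'

cpus=$(printf '%s\n' "$record" | get_field TotalCPUs)
nodes=$(printf '%s\n' "$record" | get_field TotalNodes)
echo "CPU cores per node: $((cpus / nodes))"    # prints "CPU cores per node: 48"
```

The result of 48 cores per node matches the two 24-core processors per node listed for pub23.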

Retrieve information about individual nodes

scontrol show node

uk00123@its-cs1:/home/users/000/uk00123> scontrol show node its-cs214
NodeName=its-cs214 Arch=x86_64 CoresPerSocket=6
CPUAlloc=0 CPUErr=0 CPUTot=12 CPULoad=0.02 Features=12cores,NoIB
Gres=(null)
NodeAddr=its-no214 NodeHostName=its-cs214
OS=Linux RealMemory=64000 AllocMem=0 Sockets=2 Boards=1
State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1
BootTime=2015-09-10T11:42:54 SlurmdStartTime=2015-09-10T11:45:25
CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
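
The Features field of a node record lists the tags assigned to that node. As a small sketch, the hypothetical helper node_features below prints these tags one per line. It assumes the field is written as Features=tag1,tag2 as in the output above; other Slurm versions may name the field differently. What the individual tags mean is defined by the cluster administrators and is not stated here.

```shell
#!/bin/sh
# List the feature tags of a node, one per line, from "scontrol show node"-style output.
# Usage: scontrol show node its-cs214 | node_features
node_features() {
    # Capture everything between "Features=" and the next space, then split on commas.
    sed -n 's/.*Features=\([^ ]*\).*/\1/p' | tr ',' '\n'
}

# Three lines copied from the node record above.
record='NodeName=its-cs214 Arch=x86_64 CoresPerSocket=6
CPUAlloc=0 CPUErr=0 CPUTot=12 CPULoad=0.02 Features=12cores,NoIB
OS=Linux RealMemory=64000 AllocMem=0 Sockets=2 Boards=1'

printf '%s\n' "$record" | node_features
```

For the sample record this prints the two tags 12cores and NoIB on separate lines.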