Nodes and partitions

Overview

In the Linux cluster, the individual servers are referred to as "nodes". These nodes are divided into partitions. If you want to perform a calculation on the cluster, you must specify which partition the nodes should come from.
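As a minimal sketch of how a partition is specified at submission time, the following helper assembles an `sbatch` call using Slurm's `--partition` and `--time` options. The script name `job.sh`, the helper name `build_sbatch_command` and the chosen time limit are illustrative assumptions, not part of the cluster setup.

```python
import shlex


def build_sbatch_command(partition, script, time_limit=None):
    """Return the argument list for submitting `script` to `partition`."""
    cmd = ["sbatch", f"--partition={partition}"]
    if time_limit is not None:
        # Time limit in Slurm's days-hours:minutes:seconds notation.
        cmd.append(f"--time={time_limit}")
    cmd.append(script)
    return cmd


# Hypothetical job script "job.sh", submitted to pub12 with a 1-day limit.
cmd = build_sbatch_command("pub12", "job.sh", time_limit="1-00:00:00")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a login node where `sbatch` is available.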

The partitions are named after the release year of the CPUs installed in their nodes.
The following partitions are available:


Partition	Partition description
pub12	University public partition for all users. Maximum 10 days of computing time per job. Two 16-core AMD Opteron 6276 processors from 2012 per node, 32GB-128GB RAM and Infiniband networking for multi-node jobs. Because these nodes also belong to the mpi and mpi1 partitions, users who access them through mpi or mpi1 have priority.
pub17	University public partition for all users. Maximum 6 days of computing time per job. Includes 8 nodes (its-cs[132-139]), each with two 12-core Intel Xeon E5-2650 processors from 2017 and 512GB RAM, and 3 nodes (its-cs[161-163]), each with two 16-core Intel Xeon Gold 5218 processors from 2019 and 384GB RAM. All 11 nodes have Infiniband networking for multi-node jobs.
pub15	University public partition for all users. Maximum 2 days of computing time per job. Contains a total of 40 nodes from 2015, each with two 12-core Intel Xeon E5-2680 processors, 128GB RAM and Infiniband networking for multi-node jobs. Because these nodes also belong to the AG-Garcia partition, users who access them through AG-Garcia have priority.
pub15gpu	University public partition for all users. Maximum 2 days of computing time per job. Contains a total of 8 nodes whose CPU, RAM and Infiniband networking are identical to those in pub15. Because these nodes also belong to the AG-Garcia partition, users who access them through AG-Garcia have priority.
FB16	Open to all employees of department 16. Employees of other departments and students working on a project can be granted temporary access if the computing time of 10 days in "public" or 8 days in "public2" is not sufficient for them. Please contact Daniel Bischof with any questions about the partition. Unlimited computing time per job. 12 dual-processor systems, each with two 6-core Intel Xeon processors and Infiniband networking.
mpi,mpi1	Moderated partitions for MPI applications that use many nodes. Access on request. Maximum 400 hours of computing time per job. Dual-processor systems, each with two 16-core Opteron processors and Infiniband networking.
Further partitions	There are further partitions that are not public. The compute nodes in these "moderated" partitions were usually funded by research groups or departments, which operate them within the Linux cluster.
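The time limits in the table can guide the choice of partition. The following is a small sketch, assuming only the per-job limits of the four public partitions listed above; the dictionary and the function name `partitions_allowing` are illustrative and do not query the cluster.

```python
# Maximum computing time per job in days, copied from the table above.
PUBLIC_PARTITION_MAX_DAYS = {
    "pub12": 10,
    "pub17": 6,
    "pub15": 2,
    "pub15gpu": 2,
}


def partitions_allowing(days_needed):
    """Return public partitions whose limit covers `days_needed`.

    Partitions with the shortest sufficient limit come first.
    """
    fits = [
        (limit, name)
        for name, limit in PUBLIC_PARTITION_MAX_DAYS.items()
        if limit >= days_needed
    ]
    return [name for _limit, name in sorted(fits)]


print(partitions_allowing(5))
```

A job that needs 5 days fits only pub17 and pub12, while a job longer than 10 days fits none of the public partitions.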

Retrieve information about partitions and nodes

The sinfo command lists the time limits and availability of the partitions in the cluster. Abbreviated example output from sinfo:

uk00123@its-cs1:/home/users/000/uk00123> sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
public* up 10-00:00:00 1 drain its-cs[240]
public* up 10-00:00:00 22 alloc its-cs[193-205,....,216-218]
public* up 10-00:00:00 12 idle its-cs[214-215,...,228-231]
...

In this output, 22 nodes of the public partition are allocated, i.e. in use, and 12 nodes are idle and available for jobs. One node is in the drain state and accepts no new jobs. The maximum runtime (TIMELIMIT) is 10 days. The asterisk after the partition name marks the default partition, which is used if no partition is specified for a job.
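For scripted checks, the column layout shown above can be summed per state. This is a rough sketch that assumes the default six-column sinfo layout from the example; the sample text is copied from the abbreviated output, and the function name `nodes_by_state` is made up for illustration.

```python
from collections import defaultdict

# Abbreviated sinfo output as shown above (node lists remain elided).
SAMPLE = """\
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
public* up 10-00:00:00 1 drain its-cs[240]
public* up 10-00:00:00 22 alloc its-cs[193-205,....,216-218]
public* up 10-00:00:00 12 idle its-cs[214-215,...,228-231]
"""


def nodes_by_state(sinfo_text):
    """Sum the NODES column per (partition, state) pair."""
    counts = defaultdict(int)
    for line in sinfo_text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue  # ignore truncated lines such as "..."
        partition, _avail, _timelimit, nodes, state = fields[:5]
        # The trailing asterisk only marks the default partition.
        counts[(partition.rstrip("*"), state)] += int(nodes)
    return dict(counts)


for (partition, state), count in nodes_by_state(SAMPLE).items():
    print(partition, state, count)
```

On the cluster, the text could instead come from `subprocess.run(["sinfo"], capture_output=True, text=True).stdout`.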

There is also a graphical version of sinfo, which is started with sview. It requires "X11 forwarding" to be enabled when logging in to the cluster (e.g. ssh -X its-cs1.its.uni-kassel.de).

Further details on partitions can be called up as follows (abbreviated output):

uk00123@its-cs1:/home/users/000/uk00123> scontrol show partition public
PartitionName=public
   AllocNodes=ALL AllowGroups=ALL Default=YES
   DefaultTime=00:05:00 DisableRootJobs=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=2-00:00:00 MinNodes=1 MaxCPUsPerNode=UNLIMITED
   Nodes=its-cs10,its-cs[193-205],...,its-cs[228-231]
   Priority=1 RootOnly=NO ReqResv=NO Shared=NO PreemptMode=OFF
   State=UP TotalCPUs=416 TotalNodes=35 SelectTypeParameters=N/A
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
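The key=value pairs in this output are easy to read into a dictionary. The sketch below is illustrative only: it uses a shortened copy of the output above, and its time parser handles just the hours:minutes:seconds and days-hours:minutes:seconds notations that appear there, not values such as UNLIMITED. The function names are assumptions.

```python
from datetime import timedelta

# Shortened copy of the partition record shown above.
SAMPLE = (
    "PartitionName=public AllocNodes=ALL Default=YES DefaultTime=00:05:00 "
    "MaxTime=2-00:00:00 State=UP TotalCPUs=416 TotalNodes=35"
)


def parse_key_values(text):
    """Split whitespace-separated key=value tokens into a dict of strings."""
    return dict(token.split("=", 1) for token in text.split() if "=" in token)


def parse_slurm_time(value):
    """Convert [days-]hours:minutes:seconds into a timedelta."""
    days, _, clock = value.rpartition("-")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return timedelta(
        days=int(days or 0), hours=hours, minutes=minutes, seconds=seconds
    )


info = parse_key_values(SAMPLE)
print(info["TotalNodes"], parse_slurm_time(info["MaxTime"]))
```

Note that DefaultTime is only five minutes in this example, so a job submitted without an explicit time limit would be given that value.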

Call up information on individual nodes

uk00123@its-cs1:/home/users/000/uk00123> scontrol show node its-cs214
NodeName=its-cs214 Arch=x86_64 CoresPerSocket=6
   CPUAlloc=0 CPUErr=0 CPUTot=12 CPULoad=0.02 Features=12cores,NoIB
   Gres=(null)
   NodeAddr=its-no214 NodeHostName=its-cs214
   OS=Linux RealMemory=64000 AllocMem=0 Sockets=2 Boards=1
   State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1
   BootTime=2015-09-10T11:42:54 SlurmdStartTime=2015-09-10T11:45:25
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
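The node record shows how much of a node is still free: CPUTot minus CPUAlloc gives the unallocated cores, and RealMemory minus AllocMem gives the unallocated memory in megabytes. A minimal sketch using a shortened copy of the record above follows; the function name `free_resources` is illustrative.

```python
# Shortened copy of the node record shown above.
SAMPLE = (
    "NodeName=its-cs214 CoresPerSocket=6 CPUAlloc=0 CPUTot=12 "
    "RealMemory=64000 AllocMem=0 Sockets=2 State=IDLE"
)


def free_resources(node_text):
    """Return (free CPUs, free memory in MB) from a node record."""
    fields = dict(
        token.split("=", 1) for token in node_text.split() if "=" in token
    )
    free_cpus = int(fields["CPUTot"]) - int(fields["CPUAlloc"])
    # Slurm reports RealMemory and AllocMem in megabytes.
    free_mem_mb = int(fields["RealMemory"]) - int(fields["AllocMem"])
    return free_cpus, free_mem_mb


cpus, mem_mb = free_resources(SAMPLE)
print(f"{cpus} CPUs and {mem_mb} MB free")
```

For the idle node its-cs214, all 12 cores (2 sockets with 6 cores each) and the full 64000 MB are free.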