FAQ - High performance computing

Some users start their compute tasks directly on the login node its-cs1.its.uni-kassel.de, which overloads it and makes it difficult for other users to keep working. If this happens repeatedly, the user in question will be blocked; in that case, please send us an e-mail. Only its-cs10 is intended for interactive work.
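As a minimal sketch of how interactive work can be done without loading the login node (assuming that the hostname pattern of its-cs1 also applies to its-cs10 and that interactive allocations are permitted on your account; the resource values are placeholders):

    # Log in to the node intended for interactive work
    ssh its-cs10.its.uni-kassel.de

    # Or request an interactive allocation through Slurm (values are examples)
    salloc --ntasks=1 --cpus-per-task=4 --time=01:00:00
    srun --pty bash        # open a shell on the allocated compute node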

In addition to the ITS homepage for scientific data processing and the various websites about SLURM, such as the official SLURM website www.slurm.schedmd.com and the man pages (e.g. man sbatch on its-cs1), a summary of the most important information is available for download as a "quick reference".

If your job is in the queue and does not start, the NODELIST(REASON) field of the squeue output (see the "Start submit script" section) indicates the possible cause:

Resources        The job is waiting for available resources.
Priority         The job has moved further back in the queue because of jobs with higher priority.
ReqNodeNotAvail  The requested combination of resources is not available or does not exist, or the job was submitted to a reserved partition.


More information can be found in the man squeue documentation in the JOB REASON CODES section.
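As a sketch, the reason code of a pending job can be inspected like this; the job ID 123456 and the column widths are placeholders:

    # Show your own jobs; the last column is NODELIST(REASON)
    squeue -u $USER

    # Show only job ID, partition, state and reason for one specific job
    squeue -j 123456 --format="%.18i %.9P %.8T %.20R"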

To check exactly why a job aborted, some additional information is helpful. Send us an e-mail with the job number and attach the log file and submit script.
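If Slurm accounting is enabled on the cluster, a first look at the final state and exit code is possible with sacct before writing the e-mail; this is only a sketch, and the job ID 123456 is a placeholder:

    # Query the final state, exit code and runtime of a finished job
    sacct -j 123456 --format=JobID,JobName,Partition,State,ExitCode,Elapsed,MaxRSS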

This may be related to the computer architecture of the assigned node. The cluster consists of a large number of partitions whose CPUs have different speeds. When running serial jobs on a node with a processor clock rate of e.g. 2300 MHz, it is very likely that the job will take longer to run than on a desktop PC with a faster CPU.

Inefficient program code can also be responsible for long runtimes. For code and performance analyses, send an e-mail with the relevant information about your job.

Further information on hardware can be found under Hardware of the cluster.

The following features of the Linux cluster can be set with the --constraint option of the commands sbatch and salloc, or in a submit script (#SBATCH --constraint <features>); an example submit script follows the list of features:

32Cores | 24Cores | 12Cores | 8Cores
InfiniBand
NoIB
Switch1 | Switch2 | Switch3 (available in the exec, mpi and mpi1 partitions)
SwitchA | SwitchB (available in the thphysik partition)
Xeon5675
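A minimal submit-script sketch showing where --constraint goes; the job name, resource values and executable are placeholders:

    #!/bin/bash
    #SBATCH --job-name=constraint-example   # placeholder name
    #SBATCH --ntasks=1
    #SBATCH --time=00:30:00
    #SBATCH --constraint=InfiniBand         # request only nodes with the InfiniBand feature

    # Features can be combined with & (and) or | (or), e.g.:
    #   #SBATCH --constraint="32Cores&InfiniBand"

    srun ./my_program                        # placeholder executable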

Basically, anything that is available via the module system can be run on the cluster. Parallelization is supported with software such as MVAPICH, MPICH or OpenMP (see the sketch below). Other software can also be installed in the user directory, provided it runs under Linux (Windows programs cannot be installed or started, not even via Wine). Further information can be found on the page Batch operation with Slurm.
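A sketch of working with the module system; the module name mvapich2 is an assumption, and the exact module names on the cluster may differ:

    # List the software available via the module system
    module avail

    # Load an MPI implementation (exact module name is an assumption)
    module load mvapich2

    # Compile and run an MPI program inside a Slurm allocation
    mpicc -o hello hello.c
    srun ./hello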

The scratch drive /work is mounted in the cluster's file system and is globally available. However, this directory is not backed up, so take your own precautions to save important data. Further information on storage space can be found under Access.
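A sketch of the typical pattern of computing on the scratch drive and copying results back to the backed-up home directory before the job ends; the directory layout below /work is an assumption:

    # Inside a submit script: create a job-specific scratch directory
    SCRATCH_DIR=/work/$USER/$SLURM_JOB_ID
    mkdir -p "$SCRATCH_DIR"
    cd "$SCRATCH_DIR"

    # ... run the actual computation here ...

    # Copy important results back to the home directory
    cp -r results "$HOME/results-$SLURM_JOB_ID"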