Starting programs in batch mode

Starting a job with a submit script

The command sbatch myscript.sh hands a submit script over to the SLURM workload manager for later execution; SLURM confirms the submission with a job number:

uk00123@its-cs1:/home/users/000/uk00123> sbatch myscript.sh
Submitted batch job 5403542
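
What such a script contains is cluster-specific; the following is only a minimal sketch, in which my_testjob, minijobs, the 1:00:00 time limit and ./myprogram are placeholders (the partition and time limit are taken from the squeue listings further below):

#!/bin/bash
#SBATCH --job-name=my_testjob   # job name, shown by squeue and usable with scancel --jobname
#SBATCH --partition=minijobs    # partition to run in (example value from the listings below)
#SBATCH --nodes=1               # number of nodes
#SBATCH --time=1:00:00          # maximum runtime, after which SLURM aborts the job (TIMEOUT)

./myprogram                     # placeholder for the actual program call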

Canceling a job

You can cancel a job prematurely with scancel <JobID>.

uk00123@its-cs1:/home/users/000/uk00123> scancel 5403542


If you assign a name to your jobs, all jobs with this name can be canceled with scancel --jobname <JobName>:

uk00123@its-cs1:/home/users/000/uk00123> scancel --jobname "My Testjob"


The following command cancels ALL of your own jobs. Only use it if you are sure!

scancel -u $USER
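
If canceling everything is too drastic, the operation can be narrowed with scancel's --state filter, for example to remove only your jobs that are still waiting in the queue:

scancel -u $USER --state=PENDING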

Information about current jobs

The squeue command displays information about waiting and running batch jobs. Completed jobs are not displayed.

squeue -u uk00123 displays information about all jobs of the specified user. For your own jobs, you can simply write $USER instead of your UniAccount:

uk00123@its-cs1:/home/users/000/uk00123> squeue -u $USER
  JOBID PARTITION        NAME    USER ST  TIME NODES NODELIST(REASON)
5403542  minijobs myscript.sh uk00123  R  2:06     1 its-cs194
5403547  minijobs myscript.sh uk00123  R  1:02     1 its-cs256


squeue -j <job-id_list> only lists the jobs whose IDs are specified (separated by commas):

uk00123@its-cs1:/home/users/000/uk00123> squeue -j 5403542,5403547
  JOBID PARTITION        NAME    USER ST  TIME NODES NODELIST(REASON)
5403542  minijobs myscript.sh uk00123  R  2:06     1 its-cs194
5403547  minijobs myscript.sh uk00123  R  1:02     1 its-cs256


With squeue -l (long format), the maximum runtime (TIME_LIMIT) of the listed jobs is also displayed, provided the console window is wide enough for the additional columns:

uk00123@its-cs1:/home/users/000/uk00123> squeue -l -u $USER
  JOBID PARTITION        NAME    USER    STATE  TIME TIME_LIMIT NODES NODELIST(REASON)
5403542  minijobs myscript.sh uk00123  RUNNING  2:06    1:00:00     1 its-cs194
5403547  minijobs myscript.sh uk00123  RUNNING  1:02    1:00:00     1 its-cs256


With squeue -p <partition_name>, only the jobs that were submitted to this partition are shown. Caution! Because the partitions overlap, it is possible that no job is displayed for a partition even though all of its nodes are busy: the jobs occupying them may have been submitted to other, overlapping partitions.
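
For example, to list only the jobs in the minijobs partition from the listings above:

squeue -p minijobs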

Syntax:

squeue [options]

-u <user_list>      print jobs from a list of users
-i <seconds>        repeatedly gather and report the requested information every <seconds> seconds
-j <job_id_list>    print jobs from a list of job IDs
-n <name_list>      print jobs or job steps having one of the specified names
--start             report expected start time and resources to be allocated for pending jobs
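
For example, to refresh the overview of your own jobs every 10 seconds (until interrupted with Ctrl+C):

squeue -u $USER -i 10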

Detailed information about a job/node/partition

With scontrol show job <JobID>, the current status of a job and much more information about it can be displayed:

uk00123@its-cs1:/home/users/000/uk00123> scontrol show job 5403542


The most important piece of information displayed is the status of the job (JobState). As long as the job is waiting in the queue for resources to become available and the allocation to be created, it has the status PENDING. Once it is being executed, the status is RUNNING.

After successful completion of the job, the status is COMPLETED, otherwise FAILED or TIMEOUT. The latter means that SLURM aborted the job after the maximum time specified by the user in the submit script because it had not yet finished.

The standard output and error messages of the program are written to the files defined by the --output and --error parameters in the submit script (e.g. slurm.its-cs194.5403542.out and slurm.its-cs194.5403542.err).
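
The example file names above follow the pattern slurm.%N.%j.out, in which SLURM replaces %N with the node name and %j with the job ID. The corresponding lines in the submit script could therefore look like this sketch:

#SBATCH --output=slurm.%N.%j.out    # standard output (%N: node name, %j: job ID)
#SBATCH --error=slurm.%N.%j.err     # standard error messages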

Syntax:

scontrol show ENTITY_ID

job <job_id>        print job information
node <name>         print node information
partition <name>    print partition information
reservation         print list of reservations
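
For example, to inspect the partition and one of the nodes from the listings above:

scontrol show partition minijobs
scontrol show node its-cs194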