Starting programs in batch mode
Start job with submit script
The sbatch myscript.sh command submits a submit script to the SLURM workload manager for later execution; SLURM confirms the submission with the assigned job number:
uk00123@its-cs1:/home/users/000/uk00123> sbatch myscript.sh
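The sbatch options are usually set directly in the submit script via #SBATCH lines. A minimal sketch, in which the job name, partition, and resource values are placeholders rather than cluster defaults:

```shell
#!/bin/bash
#SBATCH --job-name=my_testjob     # job name (placeholder)
#SBATCH --partition=<partition>   # target partition (cluster-specific)
#SBATCH --ntasks=1                # number of tasks to start
#SBATCH --time=00:10:00           # maximum runtime (hh:mm:ss)
#SBATCH --output=slurm.%j.out     # stdout file (%j = job ID)
#SBATCH --error=slurm.%j.err      # stderr file

srun ./my_program                 # start the actual program
```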
Cancel job
You can cancel a job prematurely with scancel <JobID>.
uk00123@its-cs1:/home/users/000/uk00123> scancel 5403542
If you assign a name to your jobs, all jobs with this name can be canceled with scancel --jobname <JobName>:
uk00123@its-cs1:/home/users/000/uk00123> scancel --jobname "My Testjob"
The following command cancels ALL of your own jobs. Only use it if you are sure!
scancel -u $USER
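A somewhat safer pattern is to review the affected jobs first and, if appropriate, restrict the cancellation by job state with the standard --state filter; the choice of PENDING here is just an illustration:

```shell
# Review the candidates first ...
squeue -u $USER --state=PENDING
# ... then cancel only jobs that are still waiting in the queue:
scancel -u $USER --state=PENDING
```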
Information about current jobs
The squeue command displays information about waiting and running batch jobs. Completed jobs are not displayed.
squeue -u uk00123 displays information about all jobs of the specified user. For your own jobs, you can simply write $USER instead of the UniAccount:
uk00123@its-cs1:/home/users/000/uk00123> squeue -u $USER
squeue -j <job-id_list> only lists the jobs whose IDs are specified (separated by commas):
uk00123@its-cs1:/home/users/000/uk00123> squeue -j 5403542,5403547
If the console window is wide enough, the maximum job runtime of the currently running jobs can be displayed with squeue -l:
uk00123@its-cs1:/home/users/000/uk00123> squeue -l -u $USER
With squeue -p <partition_name>, only the jobs that were submitted to this partition are shown. Caution: because the partitions overlap, it is possible, for example, that no job is displayed for a partition even though all of its nodes are busy.
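If the default columns are too narrow, the output format can also be set explicitly with -o. The format string below is only an illustration (%i = job ID, %P = partition, %j = job name, %T = state, %M = elapsed time, %l = time limit):

```shell
# Show job ID, partition, job name, state, elapsed time and time limit:
squeue -u $USER -o "%.10i %.12P %.20j %.8T %.10M %.10l"
```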
Syntax: squeue [options]

  -u <user_list>      print jobs from a list of users
  -i <seconds>        repeatedly gather and report the requested information at the specified interval
  -j <job_id_list>    print jobs from a list of job IDs
  -n <name_list>      print jobs or job steps having one of the specified names
  --start             report the expected start time and resources to be allocated for pending jobs
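These options can be combined; for example, -i together with --start keeps refreshing the expected start times of your own pending jobs (the 30-second interval is an arbitrary choice):

```shell
# Re-query every 30 seconds and report expected start times:
squeue -u $USER -i 30 --start
```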
Detailed information about a job/node/partition
With scontrol show job <JobID> the current status and much more information about the job can be displayed:
uk00123@its-cs1:/home/users/000/uk00123> scontrol show job 5403542
The most important information displayed is the status of the job (JobState). As long as the job is waiting in the queue until the resources are available and the allocation is created, it has the status PENDING. If it is then being executed, the status is RUNNING.
After successful completion of the job, the status is COMPLETED, otherwise FAILED or TIMEOUT. The latter means that SLURM aborted the job after the maximum time specified by the user in the submit script because it was not yet finished.
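In scripts it is often convenient to reduce this to a simple check. The state string would typically be obtained with squeue -h -j <JobID> -o %T (an assumption about how you query it); the check itself is plain shell:

```shell
#!/bin/bash
# Return 0 while a job is still queued or running, 1 otherwise.
job_active() {
    case "$1" in
        PENDING|RUNNING) return 0 ;;
        *)               return 1 ;;
    esac
}

job_active "PENDING"   && echo "job still active"
job_active "COMPLETED" || echo "job finished, failed, or timed out"
```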
The standard output and error messages of the program are written to the files defined by the --output and --error parameters in the submit script (e.g. slurm.its-cs194.5403542.out and slurm.its-cs194.5403542.err).
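The filenames in the example suggest the use of sbatch filename placeholders, where %N expands to the node name and %j to the job ID; the exact pattern used here is an assumption:

```shell
#SBATCH --output=slurm.%N.%j.out   # becomes e.g. slurm.its-cs194.5403542.out
#SBATCH --error=slurm.%N.%j.err    # becomes e.g. slurm.its-cs194.5403542.err
```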
Syntax: scontrol show ENTITY_ID

  job <job_id>        print job information
  node <name>         print node information
  partition <name>    print partition information
  reservation         print a list of reservations