Skip to content

Argonne Leadership Computing Facility

Job Queuing and Submission

Job Queuing and Submission

The CS-2 systems use slurm for job submission and queueing.
csrun_cpu is used to run a cpu-only job on one or more worker nodes.
csrun_wse is used to run a job on both the wafer scale engine and one or more worker nodes.

Your job will be blocked until there are available resources.
Scheduling is in first in, first out (FIFO) order.

# csrun_cpu [--help] [--alloc_node] [--mount_dirs] command_to_execute
csrun_cpu --help
# csrun_wse [--help] [--total-nodes] [--tasks_per_node] [--cpus_per_task] [--mount_dirs] command_for_cs_execution
csrun_wse --help
#
# squeue is used to inspect the job queue
# squeue [OPTIONS]
squeue -a
# scancel is used to cleanly kill a job
# scancel [OPTIONS] [job_id[_array_id][.step_id]]
scancel JOBID

Note: slurm jobs using the wafer (started using csrun_wse) will show with two parts in the squeue -a output. The jobids will be the base ID plus 0 or 1, e.g. 555+0 and 555+1. To cancel using scancel, give just the base ID as the argument, e.g. scancel 555.

You can find a detailed documentation for slurm here or use --help to see a summary of options for slurm commands, i.e. squeue --help.
See some examples of how these commands are used to submit and queue jobs in section Steps to run a model/program.