When you log in to the Supercomputer system, you land on a login node. Login nodes are for editing, compiling, and preparing jobs; they are not for running jobs. Failure to follow this policy may result in account revocation. From the login node you can submit job scripts with sbatch or start interactive jobs with salloc.
Slurm
We use Slurm for cluster/resource management and job scheduling. Slurm allocates resources to users, provides a framework for starting, executing, and monitoring work on the allocated resources, and schedules work for future execution.
Job Submissions
sbatch
sbatch submits a batch script to Slurm. The batch script may be given to sbatch as a file name on the command line; if no file name is specified, sbatch reads the script from standard input. The batch script may contain options, each preceded by “#SBATCH”, placed before any executable commands in the script.
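As an illustration of the layout described above, a minimal batch script might look like the following sketch. The job name, task count, time limit, and output file name are illustrative assumptions, not site defaults; the shared partition is borrowed from the salloc example on this page.

```shell
#!/bin/bash
# All #SBATCH lines must appear before the first executable command.
# Hypothetical job name:
#SBATCH --job-name=example
# Partition name taken from the salloc example on this page:
#SBATCH --partition=shared
#SBATCH --nodes=1
#SBATCH --ntasks=1
# Assumed 10-minute wall-clock limit:
#SBATCH --time=00:10:00
# %j in the file name expands to the job ID:
#SBATCH --output=example-%j.out

# Executable commands start here. SLURM_JOB_ID is set by Slurm inside a job;
# when the script is run outside Slurm the message shows "unset" instead.
msg="Job ${SLURM_JOB_ID:-unset} running on $(uname -n)"
echo "$msg"
```

Because the #SBATCH lines are ordinary comments to the shell, the same file can also be run directly with bash as a quick syntax check before it is submitted.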
sbatch exits immediately after the script is successfully transferred to the Slurm controller and assigned a Slurm job ID. The batch script is not necessarily granted resources immediately; it may sit in the queue of pending jobs for some time before its required resources become available.
When you submit the job, Slurm responds with the job’s ID, which will be used to identify this job in reports from Slurm.
login:~> sbatch run_example
Submitted batch job 1234567
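If a later step needs the job ID (for example, to hold or cancel the job), it can be extracted from the message that sbatch prints. The sketch below assumes the message has exactly the form shown above, ending in the job ID; parse_job_id is a hypothetical helper name, not a Slurm command.

```shell
# parse_job_id: hypothetical helper that prints the job ID contained in a
# line such as "Submitted batch job 1234567" and fails on any other input.
parse_job_id() {
    case "$1" in
        "Submitted batch job "*)
            # Strip everything up to the last space, leaving the final word.
            printf '%s\n' "${1##* }"
            ;;
        *)
            return 1
            ;;
    esac
}

# Usage on a login node (commented out because it needs a live Slurm cluster):
#   jobid=$(parse_job_id "$(sbatch run_example)")
```

sbatch also accepts a --parsable option that prints only the job ID (followed by a semicolon and the cluster name when one is present), which avoids parsing the message at all.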
salloc
salloc is used to allocate resources for a job in real time. Typically this is used to allocate resources and spawn a shell. The shell is then used to execute srun commands to launch parallel tasks.
login:~> salloc -N 1 -p shared
salloc: Granted job allocation 1234567
salloc: Waiting for resource configuration
salloc: Nodes n1234 are ready for your job
n1234:~>
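Inside the shell that salloc spawns, Slurm sets the SLURM_JOB_ID environment variable, which gives a simple way to confirm that you are inside an allocation before launching tasks. The following is a minimal sketch; in_allocation is a hypothetical helper name, and the srun line is only an example of a task launch.

```shell
# in_allocation: hypothetical helper that succeeds only when the current
# shell has SLURM_JOB_ID set, as it is inside a Slurm allocation.
in_allocation() {
    [ -n "${SLURM_JOB_ID:-}" ]
}

if in_allocation; then
    echo "Inside allocation ${SLURM_JOB_ID}"
    # Example task launch (commented out; requires a live allocation):
    #   srun hostname
else
    echo "Not inside an allocation; start one with salloc first"
fi
```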
Monitoring Jobs
squeue
You can monitor your jobs using the squeue command. To view only your own jobs, use squeue -u your_username. Please do not script or “watch” squeue, as this can put unnecessary load on the queuing system.
Email Notifications
You can use the email options in your submission scripts to notify you when a job starts, ends, or fails. We recommend using your NETL email address for these notifications. External email addresses may experience delays or may fail to relay through the NETL email system.
#SBATCH --mail-type=begin,end,fail
#SBATCH --mail-user=user@domain.com
Cancel Jobs
Cancel a job by job ID:
login:~> scancel $jobid
Hold Jobs
Prevent a pending job from being started:
login:~> scontrol hold $jobid
Release Held Jobs
Allow a held job to accrue priority and run:
login:~> scontrol release $jobid
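The cancel, hold, and release commands above can be wrapped so that they are previewed before being run. The sketch below is one possible arrangement, not a site-provided tool: DRY_RUN, run, hold_job, release_job, and cancel_job are hypothetical names, and dry-run mode is on by default so nothing is sent to Slurm unless DRY_RUN=0 is set.

```shell
# DRY_RUN=1 (the default here) prints each command instead of executing it.
DRY_RUN=${DRY_RUN:-1}

# run: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrappers around the Slurm commands described above.
hold_job()    { run scontrol hold "$1"; }
release_job() { run scontrol release "$1"; }
cancel_job()  { run scancel "$1"; }

# Preview the commands for a job ID (1234567 is a placeholder).
hold_job 1234567
release_job 1234567
cancel_job 1234567
```

Keeping the preview as the default means a mistyped job ID shows up in the printed command before anything is actually held or cancelled.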