Quality of Service (QOS) is a Slurm concept that constrains job priority, wallclock time, and job submissions. The following Quality of Service levels can be requested on Joule 3.0:
gpu
Default Quality of Service for the gpu partition. Each gpu job is assigned a whole system and can use up to eight H100 GPUs for up to 48 hours. Up to 10 gpu jobs may be submitted, but only 1 can run at a time.
long
Allows jobs to run in the general and bigmem partitions for up to a week. Users are limited to 10 running jobs at a time on up to 3,072 cores, and may have up to 50 jobs submitted. This Quality of Service is also subject to additional group core restrictions.
normal
Default Quality of Service for the general and bigmem partitions. Users can submit up to 250 jobs that collectively use a maximum of 30,720 cores. The wallclock limit for this Quality of Service is 48 hours.
priority
Any job submitted with the priority Quality of Service receives a large priority boost, which minimizes its waiting time. Only two priority jobs may be submitted at a time, on up to 1,024 cores. The wallclock limit for this Quality of Service is 1 week.
post
Default Quality of Service for the post partition. Each post job is assigned a whole system for up to 8 hours. Up to 10 post jobs may be submitted, but only 1 can run at a time.
project
A Quality of Service that is only available upon request and subject to management review. This Quality of Service is intended for projects that need access to dedicated resources for a set period of time.
Joule 3.0 Quality of Service Overview
QOS | Max Cores | Time Limit | Running Job Limit | Submit Limit |
gpu | 128 | 48 Hours | 1 | 10 |
long | 3,072 | 1 Week | 10 | 50 |
normal | 30,720 | 48 Hours | 250 | 250 |
post | 48 | 8 Hours | 1 | 10 |
priority | 1,024 | 1 Week | 2 | 2 |
project | 25,600 | 2 Weeks | 1 | 10 |
Job requests that exceed the Quality of Service limits will receive an error message.
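To avoid that error, a request can be checked against the overview table before it is submitted. The following is a minimal sketch, not a site-provided tool: qos_check is a hypothetical helper, the limits are copied by hand from the table above rather than queried from Slurm, and it treats Max Cores as a per-request ceiling even though some limits (such as those for normal) apply to a user's jobs collectively.

```shell
#!/bin/bash
# Hypothetical helper: compare a planned request with the QOS overview table.
# Limits are hard-coded from the table above; they are NOT read from Slurm.
# Usage: qos_check <qos> <cores> <hours>
# Returns 0 if within limits, 1 if a limit is exceeded, 2 for an unknown QOS.
qos_check() {
    local qos="$1" cores="$2" hours="$3"
    local max_cores max_hours

    case "$qos" in
        gpu)      max_cores=128;   max_hours=48  ;;  # 48 hours
        long)     max_cores=3072;  max_hours=168 ;;  # 1 week
        normal)   max_cores=30720; max_hours=48  ;;  # 48 hours
        post)     max_cores=48;    max_hours=8   ;;  # 8 hours
        priority) max_cores=1024;  max_hours=168 ;;  # 1 week
        project)  max_cores=25600; max_hours=336 ;;  # 2 weeks
        *)
            echo "unknown QOS: $qos"
            return 2
            ;;
    esac

    if [ "$cores" -gt "$max_cores" ]; then
        echo "$qos: $cores cores exceeds the limit of $max_cores"
        return 1
    fi
    if [ "$hours" -gt "$max_hours" ]; then
        echo "$qos: $hours hours exceeds the limit of $max_hours"
        return 1
    fi
    echo "$qos: request is within limits"
}
```

For example, qos_check long 1024 100 reports that the request is within limits, while qos_check long 4000 100 reports that the core limit is exceeded. The limits actually configured on the system may differ from this hard-coded copy; where accounting is enabled, sacctmgr show qos lists the live values.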
Users can specify their desired Quality of Service at submission time. Listed below is an example for the long Quality of Service.
For sbatch, the line in your submit script would look like:
#SBATCH --qos=long
Interactive jobs using salloc would look like:
login:~> salloc -N 1 -q long
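Putting the sbatch directive in context, a complete submit script for the long Quality of Service might look like the sketch below. The job name, node count, and 5-day time request are illustrative assumptions rather than site recommendations; the general partition and the one-week ceiling come from the descriptions above.

```shell
#!/bin/bash
#SBATCH --job-name=qos-long-demo   # hypothetical job name
#SBATCH --partition=general        # long applies to the general and bigmem partitions
#SBATCH --qos=long                 # request the long Quality of Service
#SBATCH --nodes=1                  # illustrative node count
#SBATCH --time=5-00:00:00          # 5 days (D-HH:MM:SS), inside the 1-week limit

# Slurm sets SLURM_JOB_QOS inside the job; use a placeholder when run outside Slurm.
job_qos="${SLURM_JOB_QOS:-not-in-slurm}"
echo "QOS for this job: ${job_qos}"
```

Submitting the script with sbatch applies the #SBATCH directives; a --qos option given on the sbatch command line takes precedence over the one in the script.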