A repository of example job submission scripts is located in the /nfs/apps/Submission directory. This section provides documented examples of both an MPI-only script and an MPI+OpenMP Hybrid script. These scripts can be submitted to the queuing system using the sbatch command.
CP2K MPI-only Script
#!/bin/bash -l
## Job Name as it will appear on squeue
#SBATCH --job-name="CP2K"
##
## MPI Submission settings. Usually only the "nodes" line should be changed. This script is recommended for < 8 nodes.
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=128
##
## The partition and QOS to use
#SBATCH --partition=general
#SBATCH --qos=normal
##Load modules and Program Settings
module load cp2k/2024.1
##Run the program
mpirun cp2k.psmp input_file
Accounting
These directives define the partition and Quality of Service (QOS). The partition is a required field; the normal QOS is used if no QOS is specified. In this example, the general partition and normal QOS are used.
##Accounting
## The partition and QOS to use
#SBATCH --partition=general
#SBATCH --qos=normal
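If you are unsure which partitions and QOS values you may use, they can be checked with standard Slurm query tools on the login nodes. The commands below are a sketch; the field list passed to sacctmgr is just one reasonable choice:
## Summarize the partitions visible to your account
sinfo --summarize
## List the QOS definitions known to the scheduler
sacctmgr show qos format=Name,Priority,MaxWall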
Submission Settings
Specify the desired number of nodes, tasks, time limit, job name, and more. For this example, 2 nodes, 128 tasks per node, and the job name "CP2K" are used. Please see the Hardware Overview section of the system documentation for more information on the hardware design of Joule 3.0.
#SBATCH --job-name="CP2K"
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=128
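Additional directives can be added to the same block when needed. For example, a wall-time limit and an output file can be requested as shown below; the values are placeholders rather than site defaults:
## Optional: wall-time limit and output file (example values, adjust as needed)
#SBATCH --time=24:00:00
#SBATCH --output=cp2k_%j.out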
Modules
Specify the required module(s) for the job. In this example, a CP2K module is used. Unlike Joule 2.0, Joule 3.0 modules automatically load any dependency modules.
##Load Modules
module load cp2k/2024.1
Joule 3.0’s list of available modules is constantly evolving. The list of available modules can be viewed using the module avail command.
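For example, to search for available CP2K builds and confirm which modules are loaded in the current session:
## Search the module tree for CP2K builds
module avail cp2k
## Show the modules currently loaded in this session
module list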
Program Run Line
Specify the run line for the program.
##Run the program
mpirun cp2k.psmp input_file
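CP2K also accepts explicit input and output flags, which can make the run line easier to read; input.inp and output.out below are placeholder file names:
##Run the program with explicit input and output files
mpirun cp2k.psmp -i input.inp -o output.out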
Job Submission
Assuming the script is named run_cp2k_2024.1, it can be submitted to the queuing system using sbatch:
login:~> sbatch run_cp2k_2024.1
The job will launch when the requested resources are available in the queuing system.
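Once submitted, the job can be monitored and, if necessary, cancelled with standard Slurm commands; the job ID below is a placeholder:
login:~> squeue -u $USER
login:~> scancel 12345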
VASP MPI+OpenMP Hybrid Script
This example will primarily focus on the differences between an MPI-only run and an MPI+OpenMP Hybrid run.
#!/bin/bash -l
## Job Name as it will appear on squeue
#SBATCH --job-name="VASP"
##
## MPI/OpenMP Hybrid Submission settings. Only the "nodes" line should be changed. Recommended for use on > 8 nodes.
#SBATCH --nodes=16
#SBATCH --ntasks-per-node=16
#SBATCH --cpus-per-task=8
#SBATCH --hint=multithread
##
## The partition and QOS to use
#SBATCH --partition=general
#SBATCH --qos=normal
##Load modules and Program Settings
module load vasp/6.4.2/standard
MAP_BY=l3cache
##Run the program
mpirun --map-by ${MAP_BY}:PE=${SLURM_CPUS_PER_TASK} --bind-to core -x OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK} -x OMP_PROC_BIND=true vasp
Submission Settings
The submission settings for this script request 16 nodes, 16 tasks per node, and 8 cores per task, and suggest to the queuing system that this is a multi-threaded run. 16 tasks are requested per node to match the 16 CCDs (Core Complex Dies) in each compute node, and 8 cores per task are requested to match the 8 Zen 4 cores on each CCD. Generally, only the nodes line should be adjusted in an MPI+OpenMP Hybrid script.
## MPI/OpenMP Hybrid Submission settings. Only the "nodes" line should be changed. Recommended for use on > 8 nodes.
#SBATCH --nodes=16
#SBATCH --ntasks-per-node=16
#SBATCH --cpus-per-task=8
#SBATCH --hint=multithread
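As a quick sanity check, this request still uses every core on each node: 16 tasks per node × 8 cores per task = 128 cores per node, and 16 nodes × 128 cores = 2048 cores in total. The lines below are a sketch that prints the same totals from inside a job script using Slurm's environment variables (these variables are only defined because the corresponding #SBATCH options are set above):
## Print the per-node and total core counts for this job
echo "Cores per node: $(( SLURM_NTASKS_PER_NODE * SLURM_CPUS_PER_TASK ))"
echo "Total cores: $(( SLURM_JOB_NUM_NODES * SLURM_NTASKS_PER_NODE * SLURM_CPUS_PER_TASK ))"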
Program Run Line
The mpirun line for this Hybrid example is more complicated than in the MPI-only script. The --map-by argument maps the 16 tasks per node onto separate L3 caches, so each task corresponds to one CCD. The PE modifier of --map-by reserves 8 cores per task, placing each task's 8 OpenMP threads on the CCX within its CCD. OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK} sets the number of OpenMP threads per task to match the cores available on each CCD.
##Load modules and Program Settings
module load vasp/6.4.2/standard
MAP_BY=l3cache
##Run the program
mpirun --map-by ${MAP_BY}:PE=${SLURM_CPUS_PER_TASK} --bind-to core -x OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK} -x OMP_PROC_BIND=true vasp
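An alternative sketch is to export the OpenMP settings in the script and let Slurm's srun handle process placement; whether srun can launch this particular VASP/MPI build depends on how the site's MPI and Slurm are configured, so treat this as an assumption to verify:
## Alternative launch: export OpenMP settings and let srun bind each task to its cores
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
export OMP_PROC_BIND=true
srun --cpu-bind=cores vasp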
Note Regarding Hybrid Jobs
The example above uses a combination of MPI and OpenMP for a Hybrid VASP run. As described in the AMD EPYC 9534 Processor section, Hybrid runs can scale to larger node counts than an MPI-only submission can. However, a Hybrid run may not run faster than an MPI-only submission for small resource requests. Several MPI-only and MPI+OpenMP Hybrid VASP example scripts are provided under /nfs/apps/Submission/ to accommodate any potential VASP job submission.
Please remember that not all programs feature MPI+OpenMP functionality.