MPI Jobs

PBS script:

#PBS -q main
#PBS -l walltime=15:00:00
#PBS -l select=12:ncpus=1:mem=2625mb:mpiprocs=1:ompthreads=1
#PBS -W group_list=hpc
#PBS -r y
qhold -h u $PBS_JOBID

module load ….

mpirun your_code

This script requests:

  • the main queue
  • a wall-time of 15 hours
  • 12 chunks, each with 1 CPU, 2625 MB of memory, 1 MPI task and 1 thread
  • The job is marked as re-runnable (-r y), and the qhold line ensures that if the job is cancelled and rerun, it is put on hold by the server.
  • Intel MPI and Open MPI are PBS Pro aware, so the machine file ($PBS_NODEFILE) does not need to be specified.
  • Because nodes in the main queue are shared, PBS can spread the job over as many nodes as there are chunks. With these resource requirements, the machine file can look like this:
    node0620
    node0620
    node0620
    node0710
    node0710
    node0710
    node0710
    node0710
    node0721
    node0801
    node0846
  • In this case, the memory limits are 7875 MB, 13125 MB, 2625 MB, 2625 MB and 2625 MB on nodes node0620, node0710, node0721, node0801 and node0846 respectively, that is, 2625 MB multiplied by the number of chunks placed on each node.
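
The per-node limits can be checked from inside a job by counting how many times each node appears in the machine file and multiplying by the per-chunk memory request. The following is a minimal sketch, assuming one line per chunk in $PBS_NODEFILE (which holds here because mpiprocs=1); the function name node_mem_limits is hypothetical and not part of PBS:

```shell
# Print "<node> <chunks> <limit>MB" for each node listed in a machine file.
# Arguments: $1 = path to the machine file, $2 = memory per chunk in MB.
node_mem_limits() {
    sort "$1" | uniq -c | awk -v mb="$2" '{ print $2, $1, $1 * mb "MB" }'
}

# Inside a job, PBS sets PBS_NODEFILE; 2625 matches mem=2625mb in the script.
if [ -n "${PBS_NODEFILE:-}" ]; then
    node_mem_limits "$PBS_NODEFILE" 2625
fi
```

For a node that appears three times in the machine file, the function reports 3 chunks and a limit of 7875MB.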

If, instead of 12 chunks with 1 ncpus and 1 mpiprocs each, a single chunk with 12 ncpus and 12 mpiprocs is requested, all the processes are placed on the same node:

...
#PBS -l select=1:ncpus=12:vmem=31500mb:mpiprocs=12:ompthreads=1
#PBS -l pvmem=2625mb
...

This is equivalent to the placement directive:

#PBS -l place=pack

In this case, the memory limit is set to 31500 MB for all the processes of the job together (12 × 2625 MB), and pvmem limits each process to 2625 MB.
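
The vmem value is the per-process limit multiplied by the number of MPI processes. As a small sketch of that arithmetic, the helper below prints the two directives for a given process count and per-process memory; the name pack_directives is hypothetical, and the output is meant to be copied into a job script rather than run by PBS:

```shell
# Print the single-chunk resource directives for a packed MPI job.
# Arguments: $1 = number of MPI processes, $2 = per-process memory in MB.
pack_directives() {
    nprocs=$1
    pvmem_mb=$2
    vmem_mb=$((nprocs * pvmem_mb))   # total for the chunk, e.g. 12 * 2625
    echo "#PBS -l select=1:ncpus=${nprocs}:vmem=${vmem_mb}mb:mpiprocs=${nprocs}:ompthreads=1"
    echo "#PBS -l pvmem=${pvmem_mb}mb"
}

pack_directives 12 2625
```

With 12 processes and 2625 MB each, this reproduces the two directives shown above.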