Enforcement

 

CPU and memory enforcement

CPU and memory limits are enforced on zenobe with "cgroups". The term cgroup (pronounced see-group, short for control group) refers to a Linux kernel feature that was introduced in version 2.6.24. A cgroup may be used to restrict access to system resources and to account for resource usage. The root cgroup is the ancestor of all cgroups and provides access to all system resources. When a cgroup is created, it inherits the configuration of its parent. Once created, a cgroup may be configured to restrict access to a subset of its parent’s resources. When processes are assigned to a cgroup, the kernel enforces all configured restrictions. When a process assigned to a cgroup creates a child process, the child is automatically assigned to its parent’s cgroup.
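To see which cgroups a process belongs to, the kernel exposes the file /proc/<pid>/cgroup, where each line has the form hierarchy-ID:controller-list:cgroup-path. The following is a minimal sketch, not part of the zenobe tooling; the function names parse_proc_cgroup and current_cgroups are illustrative. Because children inherit their parent's cgroup, running it from inside a job should show the cgroup created for that job.

```python
from pathlib import Path


def parse_proc_cgroup(text):
    """Parse the contents of /proc/<pid>/cgroup into a list of dicts.

    Each line has the form 'hierarchy-ID:controller-list:cgroup-path'.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first two colons only; the path itself may contain ':'.
        hierarchy_id, controllers, path = line.split(":", 2)
        entries.append({
            "hierarchy_id": int(hierarchy_id),
            "controllers": controllers.split(",") if controllers else [],
            "path": path,
        })
    return entries


def current_cgroups(pid="self"):
    """Return the cgroup memberships of a process (Linux only)."""
    return parse_proc_cgroup(Path(f"/proc/{pid}/cgroup").read_text())
```

Calling current_cgroups() with no argument inspects the calling process itself.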

On zenobe, the Linux cgroups implementation in the PBS Pro scheduler does the following:

  • Prevent job processes from using more resources than specified; e.g. disallow bursting above limits
  • Keep job processes within defined memory and CPU boundaries
  • Track and report resource usage

Cgroups are set per node/host, not per chunk. If PBS places several chunks of a job on the same node, the resources of all these chunks are attached to the same cgroup.
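The per-host grouping can be illustrated with a small sketch. It assumes that the limits of a host's cgroup are the sum of the resources of the chunks placed on that host, which is one reading of the paragraph above; the (host, ncpus, mem_gb) tuples and the node names are hypothetical and do not come from PBS.

```python
from collections import defaultdict


def per_host_limits(chunks):
    """Sum the ncpus and mem of all chunks placed on the same host.

    `chunks` is a list of (host, ncpus, mem_gb) tuples, a hypothetical
    representation of where PBS placed each chunk of one job.
    """
    limits = defaultdict(lambda: {"ncpus": 0, "mem_gb": 0})
    for host, ncpus, mem_gb in chunks:
        limits[host]["ncpus"] += ncpus
        limits[host]["mem_gb"] += mem_gb
    return dict(limits)


# Three chunks of 12 CPUs and 30 GB each, two of them on the same node.
placement = [("node001", 12, 30), ("node001", 12, 30), ("node002", 12, 30)]
print(per_host_limits(placement))
```

In this example the two chunks on node001 share one cgroup with 24 CPUs and 60 GB, while node002 gets a cgroup with 12 CPUs and 30 GB.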

Within the memory cgroup, memory management is based on the Resident Set Size (RSS), i.e. the physical memory used. Use mem or pmem to request memory for the job.
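As a rough way to compare a process's resident memory with the amount requested, the sketch below reads the VmRSS field from the text of /proc/<pid>/status and converts a PBS-style size string to bytes. It makes several assumptions: only the byte suffixes b, kb, mb, gb and tb are handled, they are taken as powers of 1024, and the RSS of a single process is inspected, whereas the cgroup accounts for all processes of the job on the node. The function names are illustrative.

```python
import re

# Assumed multipliers for PBS-style size suffixes (powers of 1024).
_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}


def pbs_size_to_bytes(size):
    """Convert a PBS-style size string such as '4gb' or '512mb' to bytes."""
    match = re.fullmatch(r"(\d+)\s*([kmgt]?b)", size.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised size: {size!r}")
    value, unit = match.groups()
    return int(value) * _UNITS[unit]


def vmrss_bytes(status_text):
    """Extract VmRSS (resident set size) from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like 'VmRSS:     123456 kB'.
            return int(line.split()[1]) * 1024
    raise ValueError("no VmRSS line found")


def rss_fraction(status_text, requested_mem):
    """Return the RSS as a fraction of the requested memory."""
    return vmrss_bytes(status_text) / pbs_size_to_bytes(requested_mem)
```

A value of rss_fraction close to 1.0 suggests that the process is approaching the requested amount.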

When a job is killed due to hitting the memory cgroup limit, you will see something like the following in the job's output:

Cgroup memory limit exceeded: Killed process ...
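
To check a job's output for this message after the fact, a few lines of Python are enough. This is a minimal sketch: it relies only on the message prefix shown above, and the function name find_memory_kills is illustrative.

```python
# Prefix of the message shown above; the rest of the line varies per job.
KILL_MARKER = "Cgroup memory limit exceeded"


def find_memory_kills(output_text):
    """Return the lines of a job's output that report a cgroup memory kill."""
    return [line.strip() for line in output_text.splitlines() if KILL_MARKER in line]
```

Pass it the contents of the job's output file; an empty list means the message was not found.
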

Job Lifecycle with Cgroups

When PBS runs a single-host job with cgroups, the following happens:
  1. PBS creates a cgroup on the host assigned to the job. PBS assigns resources (CPUs and memory) to the cgroup.
  2. PBS places the job’s parent process ID (PPID) in the cgroup. The kernel captures any child processes that the job starts on the primary execution host and places them in the cgroup.
  3. When the job has finished, the cgroups hook reports CPU and memory usage to PBS and PBS cleans up the cgroup.
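
While a job is still running, the limit and peak usage of its memory cgroup can be read directly from the cgroup filesystem. The sketch below assumes a cgroup v1 memory controller, whose directories contain the files memory.limit_in_bytes and memory.max_usage_in_bytes. The location of the job's cgroup directory on zenobe is not stated here, so it is left as a parameter. Once the job has finished and PBS has cleaned up the cgroup, these files no longer exist.

```python
from pathlib import Path


def read_memory_usage(cgroup_dir):
    """Read limit and peak usage (bytes) from a cgroup v1 memory directory."""
    base = Path(cgroup_dir)
    # Both files hold a single integer followed by a newline.
    limit = int((base / "memory.limit_in_bytes").read_text())
    peak = int((base / "memory.max_usage_in_bytes").read_text())
    return {"limit": limit, "peak": peak, "peak_fraction": peak / limit}
```
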
When PBS runs a multi-host job, the following happens:
  1. PBS creates a cgroup on each host assigned to the job. PBS assigns resources (CPUs and memory) to the cgroup.
  2. PBS places the job’s parent process ID (PPID) in the cgroup. The kernel captures any child processes that the job starts on the primary execution host and places them in the cgroup.
    • MPI jobs:
      • PBS is integrated with IntelMPI and OpenMPI, so it places the parent process ID (PPID) in the correct cgroup. PBS communicates the PPID to any sister MoMs and adds the processes to the correct cgroup on each sister MoM.
      • For MPI jobs that do not use IntelMPI or OpenMPI, please contact it [at] cenaero [dot] be to verify the program behavior.
    • Non-MPI jobs: we must make sure that the job processes get attached to the correct job. Please contact it [at] cenaero [dot] be to verify the program behavior.
  3. When the job has finished, the cgroups hook reports CPU and memory usage to PBS and PBS cleans up the cgroup.

Node Access Enforcement

  • Users who are not running PBS jobs cannot access the compute nodes.