What is a batch system?

Because our HPC systems have many nodes and many users, we need a tool to distribute computational tasks over the available nodes. To do this, we use what is known as a batch system.

Practically everybody is comfortable using interactive computer systems: when you click to open a program, e.g. a word processor or internet browser, that program is started immediately by your computer. In a batch system, however, you prepare a set of commands (a 'job') that the computer will execute later.

This set of commands is stored in a file: the job script. For example, a job script could contain commands to copy a file, then start a program, and then copy another file. This set of commands is not executed right away, but at a later time and generally on a different node than the one from which the job was submitted. When and where a job is executed is determined by the job scheduler.
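
The idea of "prepare now, execute later" can be sketched in a few lines of Python. This is only a toy model, not how any real batch system is implemented: the class names Job and ToyScheduler, the node names, and the file names are all made up for illustration, and the commands are recorded rather than actually executed.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    """A job: an ordered list of commands, like the lines of a job script."""
    name: str
    commands: list


@dataclass
class ToyScheduler:
    """Hypothetical scheduler that stores jobs and runs them later."""
    nodes: list
    queue: deque = field(default_factory=deque)
    executed: list = field(default_factory=list)
    started: int = 0

    def submit(self, job):
        # Submitting only queues the job; no command runs yet.
        self.queue.append(job)

    def run_next(self):
        # The scheduler decides when this is called and which node is used.
        job = self.queue.popleft()
        node = self.nodes[self.started % len(self.nodes)]
        self.started += 1
        for command in job.commands:
            # A real system would execute the command on the node;
            # this sketch only records what would have been run.
            self.executed.append((node, job.name, command))
        return node


# Example job with the three steps described above (file names are made up).
job = Job("example", [
    "cp input.dat scratch/",
    "./my_program scratch/input.dat",
    "cp scratch/output.dat results/",
])
scheduler = ToyScheduler(nodes=["node01", "node02"])
scheduler.submit(job)
print(scheduler.executed)    # nothing has run at submission time
print(scheduler.run_next())  # the node the scheduler picked for the job
```

The point of the sketch is the separation between submit, which returns immediately, and run_next, which is called by the scheduler at a time and on a node of its own choosing.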

Advantages of a batch system

Advantages of a batch system are:

  • It allows many jobs (tens, hundreds or even thousands) to run at the same time. The system takes care of starting the jobs and runs as many of them as the available resources allow. Managing this interactively would not be so easy.

  • On an interactive system, too many users could start running applications on the same node during peak hours, which may cause these applications to run very slowly or even crash the node. A batch system always allows users to submit jobs, even if a lot of people are using the system at the same time. Meanwhile, the system takes care of balancing the load across nodes.

  • Resources are used efficiently. With interactive usage, the computer may be very busy at some times (e.g. during office hours) and very quiet at other times (e.g. at night), which would be inefficient. In a batch system, most jobs may be submitted during office hours, but the scheduler will continue to start jobs at night as nodes become available.
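
The last point can be illustrated with a small simulation. The sketch below assumes a very simple first-come, first-served policy in which every job needs exactly one node; real schedulers typically also consider priorities, requested resources and other factors. The function name schedule_fifo and the job data are hypothetical.

```python
import heapq


def schedule_fifo(jobs, node_count):
    """Simulate first-come, first-served scheduling, one node per job.

    jobs: list of (name, submit_hour, duration_hours) in submission order.
    Returns a dict mapping each job name to (start_hour, end_hour).
    """
    # Min-heap holding the hour at which each node becomes free.
    free_at = [0] * node_count
    heapq.heapify(free_at)

    schedule = {}
    for name, submit_hour, duration in jobs:
        # Take the node that becomes available first.
        node_free = heapq.heappop(free_at)
        # A job cannot start before it is submitted or before a node is free.
        start = max(submit_hour, node_free)
        end = start + duration
        heapq.heappush(free_at, end)
        schedule[name] = (start, end)
    return schedule


# Four jobs submitted during office hours on a (made-up) two-node system.
jobs = [("A", 9, 8), ("B", 10, 8), ("C", 11, 4), ("D", 12, 6)]
for name, (start, end) in schedule_fifo(jobs, node_count=2).items():
    print(f"job {name}: starts at hour {start}, ends at hour {end}")
```

In this example jobs C and D are submitted in the late morning but only start in the evening, once the nodes used by A and B become free, so the nodes stay busy after office hours without anyone having to be present.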

Allocation of cores 

The batch system is responsible for allocating cores, processors or nodes to a job. The granularity of allocation depends on the system: some systems allow an individual core to be allocated to a user, while other systems only allocate entire nodes.

Allocation on a per-core basis has the disadvantage that many users may be running programs on the same node. These applications then share common resources, such as the local disk and memory. If one user uses a lot of these resources, this may affect how the applications of other users run (they may become slow or unstable).

Allocation on a per-node basis has the disadvantage that even if your application only uses a single core, you will pay for all cores in the node as long as it is allocated to your job. However, the system is not intended for small computational tasks that only use a single core anyway.
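
The difference in cost between the two allocation schemes can be made concrete with a small calculation. The sketch below assumes that usage is charged in core hours and uses a made-up node size of 24 cores; the actual accounting unit and node sizes depend on the system. The function name core_hours_charged is hypothetical.

```python
import math


def core_hours_charged(cores_used, wall_hours, cores_per_node, per_node):
    """Return the core hours charged for a job (illustrative model only).

    per_node=False: only the cores actually requested are charged.
    per_node=True: every core of each allocated node is charged.
    """
    if per_node:
        # Whole nodes are allocated, so round up to full nodes.
        nodes = math.ceil(cores_used / cores_per_node)
        return nodes * cores_per_node * wall_hours
    return cores_used * wall_hours


# A single-core job that runs for 10 hours on a node with 24 cores (assumed).
print(core_hours_charged(1, 10, 24, per_node=False))
print(core_hours_charged(1, 10, 24, per_node=True))
```

Under these assumptions the same single-core job is charged 24 times as much with per-node allocation, which is why such jobs are a poor fit for nodes that are allocated as a whole.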

On our systems, we offer different allocation possibilities. See the "usage and accounting" sections for each of the systems to check which policies are currently available for the compute nodes you want to access.


Common properties of batch systems

Various batch systems exist, but they all share the following parts:

  • A method to define the requirements of a job
  • A method to define the actions that are to be performed
  • Handling of standard output and standard error
  • A system to schedule jobs
  • Utilities to monitor the progress of jobs