How to run calculations


Parallelization

SMP vs Cluster

Schematic distribution of an MPI-OMP job: each MPI process may run on a different computing node or processor and may spawn several OMP threads, e.g. on individual CPU cores. In MPI-only mode each CPU core is associated with one MPI process, which is itself a single OMP thread.

Typical Linux clusters consist of several small SMP systems interconnected with a high-speed network (e.g. InfiniBand). An ordinary SMP system is usually NUMA-type, meaning that memory access time depends on the location of the memory relative to the processor, much like the cache hierarchy of a processor. On a NUMA machine it is crucial to allocate CPU and memory on the same node. This is achieved by partitioning the system among user processes; such a partition is called a cpuset and is usually set up by the scheduler. Memory is allocated by numactl in the same manner. A constrained process is allowed to run only within its cpuset. Within a cpuset, processes can additionally be bound to individual CPUs; this is called CPU binding and is usually done by the MPI subsystem.

The CPUs and memory nodes available to a job can be checked with:

 numactl --show
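To pin a process and its memory to a given NUMA node by hand, the standard numactl options can be used. A minimal sketch (node 0 and the program name are placeholders):

 # restrict CPU and memory allocation to NUMA node 0
 numactl --cpunodebind=0 --membind=0 ./program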

Parallel Modes

Parallelization can be process-based and/or thread-based. The latter requires an SMP machine, whereas process-based techniques can span several nodes. MPI is a process-based technique; OpenMP (OMP) is a thread-based one. MPI programs need an MPI environment to start, and this environment is specific to the binary: it cannot be replaced by another MPI subsystem unless a special MPI wrapper such as SGI's PerfBoost is applied. A hybrid method uses threads within a node and processes for communication between nodes.

 Parallel Mode  | Node Span                                                         | Machine Type | Environment
 OpenMP         | Intra-node; the main process shows >100% CPU utilization          | SMP          | Threads are controlled by environment variables
 MPI            | Intra-/inter-node; separate processes                             | Cluster/SMP  | Started with an MPI runner (mpirun/mpiexec); various implementations exist
 MPI-OMP hybrid | Inter-/intra-node; a few main processes per node, each >100% CPU  | Cluster/SMP  | Processes started by the MPI runner; threads controlled by env. variables (see above)
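As an illustration (a generic sketch, not specific to this framework; the program name, process counts and thread counts are placeholders, and the exact flags depend on the MPI implementation and the scheduler), the three modes are typically launched like this:

 # OpenMP only: a single process on one SMP node, thread count set in the environment
 export OMP_NUM_THREADS=8
 ./program

 # MPI only: one process per CPU core, started by the MPI runner
 mpirun -np 32 ./program

 # MPI-OMP hybrid: a few MPI processes per node, each spawning OMP threads
 export OMP_NUM_THREADS=4
 mpirun -np 8 ./program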

Shell Framework

THE REST OF THIS PAGE CONTAINS UNSUPPORTED LEGACY INFORMATION!

Multi-step Jobs

It is possible to run multi-step jobs within the framework's run scheme. This is suitable for running the same program with different inputs in sequence. Because of its general nature, you have to write your own run kernel for each type of complex job. The kernel file must contain a mandatory function, <PROGRAM>/kernel, which usually performs the following steps:

  1. Run the first step
  2. Check and validate
  3. Save outputs
  4. Prepare new input
  5. Set new mpi mode
  6. Run the next step (goto 2.)

Be aware that the kernel function runs inside the run library, therefore you must declare only local shell variables. Also note that the kernel can only be checked at runtime, so prepare it on small test cases. You can reconfigure the MPI mode before calling the run step, so complex workflows with different resource needs can be run in a single job. The usual case is to switch from full MPI to a reduced MPI-OMP mode in order to fit into memory. The allowed job file variables are: SCKTS (CPU sockets), CORES (cores per CPU) and BIND (binding). Note that binding has to be set explicitly.

Enable the run kernel in the job file:

 KERNEL=<KERNELFILE>

To switch the MPI mode, first set the job file variables according to your needs and call:

 run/prg/mode MODE

where MODE is mpi or mpiomp (see example below).
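A minimal two-step kernel sketch (using the same framework calls as the full VASP example below; the program name myprg, the step names and the mode settings are placeholders):

function myprg/kernel() {
  import run/prg

  ### STEP 1: run the first step in full MPI mode
  local _step="step1"
  run/prg/step
  if failed $? ; then $failure; fi

  ### STEP 2: prepare the new input, then continue in a reduced MPI-OMP mode
  # ... edit the input files for the next step here ...
  _step="step2"
  SCKTS=4              # CPU sockets per node
  CORES=2              # cores (OMP threads) per socket
  BIND="omplace -s 1"  # binding has to be set explicitly
  run/prg/mode mpiomp
  run/prg/step
  if failed $? ; then $failure; fi
}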

A complex kernel example for VASP, which performs a combined three-step PBE/HSE calculation: PBE preconditioning, a fast HSE CG geometry optimization, and a final, accurate (normal-precision) HSE CG optimization.

vasp_kernel_save="${run_prg_vasp_save}"

function vasp/kernel() {
  # reserved global variables
  # ${_inp} : input
  # ${_out} : output
  # ${_cmd} : command

  import gui
  import str/f
  import run/prg
  import run/prg/vasp/check

  ###
  ### STEP 1. PBE preconditioning
  ###
  local _step="pbe"
  u/title "PBE 1scf"
  run/prg/step
  if failed $? ; then $failure; fi

  ### validate - check WAVECAR
  run/prg/vasp/check/wavecar
  if failed $? ; then $failure; fi

  ### timing
  run/prg/vasp/check/timing

  ### save outputs
  u/title "Save outputs"
  run/prg/save "${_step}" ${run_prg_vasp_check_save}


  ###
  ### STEP 2. HSE Fast CG
  ###
  u/title "HSE Fast CG"

  ### new input
  if readable CONTCAR ; then
    mv -fv CONTCAR POSCAR
  else
    msg "Not found: CONTCAR"
    $failure
  fi

  # create new input for HSE fast
  # http://cms.mpi.univie.ac.at/vasp/vasp/PRECFOCK_FFT_grid_in_Hartree_Fock_GW_related_routines.html
  # http://cms.mpi.univie.ac.at/vasp/vasp/Amount_exact_DFT_exchange_correlation_AEXX_AGGAX_AGGAC_ALDAC_tags.html
  local _incar=${_step}.INCAR
  cat ${_incar}           | \
  str/f/set NSW 100       | \
  str/f/set EDIFFG 5E-04  | \
  str/f/set ISTART 1      | \
  str/f/set LHFCALC       | \
  str/f/set AEXX 0.25     | \
  str/f/set AGGAX 0.25    | \
  str/f/set AGGAC 1.00    | \
  str/f/set HFSCREEN 0.2  | \
  str/f/set PRECFOCK Fast | \
  str/f/set IALGO 53      | \
  str/f/set ISYM 3        | \
  str/f/set TIME 0.4      > INCAR

  _step="hsef"
  SCKTS=4
  CORES=2
  BIND="omplace -s 1"
  run/prg/mode mpiomp
  run/prg/step
  if failed $? ; then $failure; fi

  ### validate - check accuracy
  run/prg/vasp/check/accuracy
  if failed $? ; then $failure; fi

  ### timing
  run/prg/vasp/check/timing

  ### save outputs
  u/title "Save outputs"
  run/prg/save "${_step}" ${run_prg_vasp_check_save}


  ###
  ### STEP 3. HSE Normal CG
  ###
  u/title "HSE Normal CG"

  ### new input
  if readable CONTCAR ; then
    mv -fv CONTCAR POSCAR
  else
    msg "Not found: CONTCAR"
    $failure
  fi

  local _incar=${_step}.INCAR
  cat ${_incar}             | \
  str/f/set EDIFFG 1E-04    | \
  str/f/set PRECFOCK Normal > INCAR

  SCKTS=4
  CORES=2
  BIND="omplace -s 1"
  run/prg/mode mpiomp
  run/prg/step
  if failed $? ; then $failure; fi

  ### validate - check accuracy
  run/prg/vasp/check/accuracy
  if failed $? ; then $failure; fi

  ### timing
  run/prg/vasp/check/timing
}

Quantum Chemistry Codes

VASP

Load the environment:

 mld vasp/proj
 mld vasp/<VERSION>

Input files are named with the following convention:

 <PREFIX>.cntl : INCAR
 <PREFIX>.geom : POSCAR
 <PREFIX>.kpts : KPOINTS
 <PREFIX>.qpts : QPOINTS

The POTCAR file is created automatically by the run script. Output files are prefixed with <PREFIX>.
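For example (a sketch only; the prefix dia is hypothetical), an existing set of standard VASP inputs can be renamed to the framework convention like this:

 cp INCAR   dia.cntl
 cp POSCAR  dia.geom
 cp KPOINTS dia.kpts
 # POTCAR is generated automatically by the run script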

Quantum Espresso

Load the following modules:

 mld espresso/pseudo
 mld espresso/<VERSION>

The control file should use ./ for the directory entries:

 &control
   pseudo_dir = './',
   outdir = './',
   prefix = '<PREFIX>'
 /
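Since pseudo_dir points to ./, the pseudopotential files have to be available in the run directory. A sketch, assuming the espresso/pseudo module exports the pseudopotential path in $ESPRESSO_PSEUDO (the variable name is an assumption):

 # link the needed UPF pseudopotential files into the run directory
 ln -s ${ESPRESSO_PSEUDO}/*.UPF ./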