Scaling of VASP

From Nano Group Budapest

HSE06 hybrid calculations

Test benchmark case: VASP 5.4.1, gamma-only version (except on the GPU, where vasp_std is used)

ENCUT = 600.00 eV (ENAUG = 1200.00 eV)

PRECFOCK=Fast

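| NAME | budapest2 | uv2 | debrecen2 | szeged | fast |
|---|---|---|---|---|---|
| Perf. per Node (10^-6 1/s) | 120.25 | 615.78 | 543.88 | 131.23 | 83.08 |
| Perf. per Core (10^-6 1/s) | 6.01 | 6.41 | 181.29 | 2.73 | 6.92 |
| Total Perf. (10^-3 1/s) | 0.84 | 0.62 | 2.18 | 0.52 | 0.33 |
| time (s) | 1188 | 1624 | 460 | 1905 | 3009 |
| NCORE | 5 | 1 | 1 | 6 | 1 |
| Total Cores | 140 | 96 | 12 | 192 | 48 |
| Node count | 7 | 1 | 4 | 4 | 4 |
| MPI per Node | 20 | 96 | 3 | 48 | 12 |
| mkl_threads | 1 | 1 | 1 | 1 | 1 |
| VASPcore | 140 | 96 | 12 | 192 | 48 |
| IALGO | DAV | DAV | DAV | DAV | DAV |
| CPU | E5-2680V2 | E5-4610 | Nvidia K20x | Opteron 6174 | E5-2620 |
| Wattage (W) | 115 | 115 | 235 | 115 | 115 |
| count per Node | 2 | 8 | 3 | 4 | 8 |
| supplement (W) | 50 | 50 | 150 | 75 | 50 |
| perNode (W) | 280 | 970 | 855 | 535 | 970 |
| total (W) | 1960 | 970 | 3420 | 2140 | 3880 |
| efficiency (10^-9 1/(W s)) | 429 | 635 | 636 | 245 | 86 |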
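
The derived rows of the table (performance per node and per core, total performance, power and efficiency) can be recomputed from the measured time, the node and core counts, and the wattage figures. The sketch below does this in Python. The scale factors (1e6, 1e3, 1e9) and the formula perNode = Wattage x count + supplement are not stated in the source; they are inferred from the printed numbers, and the function and key names are made up for illustration. The results agree with the table to rounding, except that the debrecen2 performance values differ slightly in the last digits, presumably because the printed time is rounded.

```python
# Recompute the derived benchmark rows from the measured inputs.
# Assumption: scale factors 1e6 / 1e3 / 1e9 are inferred from the table values.

runs = {
    # name: (time_s, nodes, total_cores, unit_watt, units_per_node, supplement_watt)
    "budapest2": (1188, 7, 140, 115, 2, 50),
    "uv2":       (1624, 1, 96, 115, 8, 50),
    "debrecen2": (460, 4, 12, 235, 3, 150),
    "szeged":    (1905, 4, 192, 115, 4, 75),
    "fast":      (3009, 4, 48, 115, 8, 50),
}


def derived(time_s, nodes, cores, unit_w, units, supplement_w):
    """Return the derived table rows for one machine (hypothetical helper)."""
    per_node_w = unit_w * units + supplement_w   # power drawn by one node
    total_w = per_node_w * nodes                 # power drawn by the whole run
    return {
        "perf_per_node": 1e6 / (time_s * nodes),
        "perf_per_core": 1e6 / (time_s * cores),
        "total_perf": 1e3 / time_s,
        "per_node_w": per_node_w,
        "total_w": total_w,
        "efficiency": 1e9 / (time_s * total_w),
    }


table = {name: derived(*values) for name, values in runs.items()}

for name, row in table.items():
    cells = "  ".join(f"{key}={value:.2f}" for key, value in row.items())
    print(f"{name:10s} {cells}")
```

Because efficiency is defined here as 1 / (time x total power), it is the inverse of the energy consumed by the run, so a higher value means less energy per calculation.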

Notes: - There is one MPI process per physical GPU, and debrecen2 has 3 GPUs per node. No gamma-only GPU version of VASP is available; with one, the GPU performance would be even higher.

- The AMD Opteron 6174 is considerably slower than recent Intel CPUs: one Opteron core delivers only about half the speed of an Intel core. It is best to set NCORE = 6, which groups the six cores of each physical CPU die. An Opteron 6174 is actually two six-core CPUs inside the same package, so each szeged node has 8 six-core CPUs, 48 cores in total.

- On Intel CPUs, NCORE > 1 becomes important with more than about 100 MPI processes. A value of 2 might be enough to restore linear scaling and avoid bandwidth limitations.
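
The NCORE rules of thumb from the notes above can be written down as a small helper. This is only a sketch of those two rules: the function name, the CPU labels and the fallback value of 1 are assumptions made for illustration, not part of VASP, and the right value for a given machine should still be checked with a benchmark.

```python
def suggest_ncore(cpu, total_mpi_processes):
    """Suggest an NCORE value following the rules of thumb in the notes.

    Hypothetical helper; `cpu` is an illustrative label, not a VASP setting.
    """
    if cpu == "opteron_6174":
        # Group the six cores that share one physical die.
        return 6
    if cpu == "intel" and total_mpi_processes > 100:
        # A value of 2 might be enough to restore linear scaling.
        return 2
    # Assumed default when neither rule applies.
    return 1


def incar_line(ncore):
    """Format the value as an INCAR tag line."""
    return f"NCORE = {ncore}"


# Example: the szeged run (Opteron 6174, 192 MPI processes).
print(incar_line(suggest_ncore("opteron_6174", 192)))
# Example: an Intel run with 140 MPI processes.
print(incar_line(suggest_ncore("intel", 140)))
```

The budapest2 run in the table used NCORE = 5 rather than 2, so the value returned for Intel is a lower bound suggested by the notes, not the setting used in the benchmark.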