Scaling of VASP

From Nano Group Budapest

HSE06 hybrid calculations

Benchmark test case: VASP 5.4.1, gamma-only version (except on the GPU system, which ran vasp_std)

ENCUT = 600.00 eV (ENAUG = 1200.00 eV)

PRECFOCK=Fast
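
The settings above can be collected into an INCAR fragment. The following is only a minimal sketch: `render_incar` is a hypothetical helper, and the `LHFCALC` and `HFSCREEN` lines are an assumption (the usual HSE06 switches), since this page does not list the full input. NCORE is taken per machine from the benchmark table.

```python
# Tags stated on this page for the benchmark.
BENCHMARK_TAGS = {
    "ENCUT": "600.00",
    "ENAUG": "1200.00",
    "PRECFOCK": "Fast",
}

# Assumption: the usual switches for an HSE06 run; they are not listed on this page.
ASSUMED_HSE06_TAGS = {
    "LHFCALC": ".TRUE.",
    "HFSCREEN": "0.2",
}


def render_incar(ncore, extra_tags=None):
    """Return INCAR text for one machine (hypothetical helper, not part of VASP)."""
    tags = dict(BENCHMARK_TAGS)
    tags.update(ASSUMED_HSE06_TAGS)
    tags["NCORE"] = str(ncore)
    if extra_tags:
        tags.update(extra_tags)
    # One "TAG = value" pair per line, in insertion order.
    return "\n".join(f"{name} = {value}" for name, value in tags.items()) + "\n"


if __name__ == "__main__":
    # NCORE = 5 is the value used on budapest2 in the table.
    print(render_incar(ncore=5), end="")
```

Keeping the stated tags and the assumed ones in separate dictionaries makes it easy to drop or replace the assumed part once the real input file is at hand.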

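| NAME                          | budapest2 | uv2     | debrecen2   | szeged       | fast    |
|-------------------------------|-----------|---------|-------------|--------------|---------|
| Perf. per node (10^-6 1/s)    | 120.25    | 615.78  | 543.88      | 131.23       | 89.29   |
| Perf. per core (10^-6 1/s)    | 6.01      | 6.41    | 181.29      | 2.73         | 7.44    |
| Total perf. (10^-3 1/s)       | 0.84      | 0.62    | 2.18        | 0.52         | 0.36    |
| Time of one SCF iteration (s) | 1188      | 1624    | 460         | 1905         | 2800    |
| NCORE                         | 5         | 1       | 1           | 6            | 1       |
| Total cores                   | 140       | 96      | 12          | 192          | 48      |
| Node count                    | 7         | 1       | 4           | 4            | 4       |
| MPI processes per node        | 20        | 96      | 3           | 48           | 12      |
| IALGO                         | DAV       | DAV     | DAV         | DAV          | DAV     |
| CPU / GPU                     | E5-2680V2 | E5-4610 | Nvidia K20x | Opteron 6174 | E5-2620 |
| Wattage per CPU / GPU (W)     | 115       | 95      | 235         | 115          | 115     |
| CPUs / GPUs per node          | 2         | 16      | 3           | 4            | 2       |
| Supplement per node (W)       | 50        | 50      | 150         | 75           | 50      |
| Power per node (W)            | 280       | 1570    | 855         | 535          | 280     |
| Total power (W)               | 1960      | 1570    | 3420        | 2140         | 1120    |
| Efficiency (10^-10 1/(W s))   | 4295      | 3922    | 6361        | 2453         | 3189    |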
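
The derived rows of the table appear to follow from the measured SCF time and the hardware counts: total performance is the inverse of the time per SCF iteration, node power is the unit wattage times the unit count plus the supplement, and efficiency is total performance divided by total power. The sketch below reproduces that arithmetic. The `Machine` class, the `derive` function and the scale factors (10^-3, 10^-6, 10^-10) are assumptions inferred from the numbers, not something stated on this page. Small deviations from the table are possible where the listed times were rounded.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    scf_time_s: float       # time of one SCF iteration
    nodes: int
    procs_per_node: int     # MPI processes per node
    unit_watt: float        # wattage of one CPU or GPU
    units_per_node: int     # CPUs or GPUs per node
    supplement_watt: float  # extra power per node


# Input rows copied from the table.
MACHINES = [
    Machine("budapest2", 1188, 7, 20, 115, 2, 50),
    Machine("uv2", 1624, 1, 96, 95, 16, 50),
    Machine("debrecen2", 460, 4, 3, 235, 3, 150),
    Machine("szeged", 1905, 4, 48, 115, 4, 75),
    Machine("fast", 2800, 4, 12, 115, 2, 50),
]


def derive(m):
    """Recompute the derived table rows for one machine."""
    total_perf = 1.0 / m.scf_time_s                  # 1/s
    total_cores = m.nodes * m.procs_per_node
    node_watt = m.unit_watt * m.units_per_node + m.supplement_watt
    total_watt = node_watt * m.nodes
    return {
        "total_cores": total_cores,
        "total_perf_milli": total_perf * 1e3,                    # 10^-3 1/s
        "perf_per_node_micro": total_perf / m.nodes * 1e6,       # 10^-6 1/s
        "perf_per_core_micro": total_perf / total_cores * 1e6,   # 10^-6 1/s
        "node_watt": node_watt,
        "total_watt": total_watt,
        "efficiency_e10": total_perf / total_watt * 1e10,        # 10^-10 1/(W s)
    }


if __name__ == "__main__":
    for machine in MACHINES:
        row = derive(machine)
        print(machine.name, {key: round(value, 2) for key, value in row.items()})
```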

Notes:

- There is one MPI process per physical GPU, and debrecen2 has 3 GPUs per node. No gamma-only GPU version of VASP is available; with one, the performance would be even higher.

- The AMD Opteron 6174 is considerably slower than recent Intel CPUs: one Opteron core delivers only about half the speed of an Intel core. It is best to set NCORE = 6, which groups the six cores of each physical CPU die. An Opteron 6174 is actually two six-core dies in one package, so each szeged node holds 8 six-core dies, 48 cores in total.

- On Intel CPUs, NCORE > 1 becomes important when running more than 100 MPI processes. A value of 2 might be enough to restore linear scaling and avoid bandwidth limitations.
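
The NCORE rules of thumb in the notes can be written down as a small helper. This is a sketch only: `suggest_ncore` and its vendor labels are hypothetical, and the fallback to NCORE = 1 is an assumption based on the smaller Intel runs in the table. The result is a starting point for testing rather than a tuned value; budapest2, for example, ran with NCORE = 5.

```python
def suggest_ncore(vendor, mpi_processes, cores_per_die=6):
    """Suggest a starting NCORE value from the notes on this page.

    vendor: "amd-opteron-6174" or "intel" (hypothetical labels).
    mpi_processes: total number of MPI processes in the job.
    cores_per_die: cores on one physical die (6 for the Opteron 6174).
    """
    if vendor == "amd-opteron-6174":
        # Group the cores that share one physical die.
        return cores_per_die
    if vendor == "intel" and mpi_processes > 100:
        # The notes suggest that 2 might already restore linear scaling.
        return 2
    # Assumption: smaller jobs run fine with NCORE = 1, as uv2 and fast did.
    return 1


if __name__ == "__main__":
    for vendor, procs in [("intel", 140), ("intel", 48), ("amd-opteron-6174", 192)]:
        print(vendor, procs, suggest_ncore(vendor, procs))
```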