Scaling of VASP

From Nano Group Budapest

HSE06 hybrid calculations

Benchmark case: VASP 5.4.1, gamma-only version (except for the GPU run, which uses vasp_std)

ENCUT = 600.00 eV (ENAUG = 1200.00 eV)

PRECFOCK=Fast

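| Name | budapest2 | uv2 | debrecen2 | szeged | fast |
|---|---|---|---|---|---|
| Perf. per node (10^-6 1/s) | 120.25 | 615.78 | 543.88 | 131.23 | 89.29 |
| Perf. per core (10^-6 1/s) | 6.01 | 6.41 | 181.29 | 2.73 | 7.44 |
| Total perf. (10^-3 1/s) | 0.84 | 0.62 | 2.18 | 0.52 | 0.36 |
| Time of one SCF iteration (s) | 1188 | 1624 | 460 | 1905 | 2800 |
| NCORE | 5 | 1 | 1 | 6 | 1 |
| Total cores | 140 | 96 | 12 | 192 | 48 |
| Node count | 7 | 1 | 4 | 4 | 4 |
| MPI processes per node | 20 | 96 | 3 | 48 | 12 |
| IALGO | DAV | DAV | DAV | DAV | DAV |
| Processor (CPU or GPU) | E5-2680V2 | E5-4610 | Nvidia K20x | Opteron 6174 | E5-2620 |
| TDP per processor (W) | 115 | 95 | 235 | 115 | 115 |
| Processors per node | 2 | 16 | 3 | 4 | 2 |
| Supplement per node (W) | 50 | 50 | 150 | 75 | 50 |
| Power per node (W) | 280 | 1570 | 855 | 535 | 280 |
| Total power (W) | 1960 | 1570 | 3420 | 2140 | 1120 |
| Efficiency (10^-10 1/(W s)) | 4295 | 3922 | 6361 | 2453 | 3189 |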
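
The derived rows of the table appear to follow from the time of one SCF iteration, the node and core counts, and the power figures. The Python sketch below reproduces them for two of the columns. The class and method names are hypothetical and only serve as illustration; the input numbers are taken from the table, and the scaling factors (10^-6, 10^-10) are inferred from the tabulated values rather than stated in the original.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One benchmark column; field names are illustrative, values come from the table."""
    name: str
    scf_time_s: float     # time of one SCF iteration (s)
    nodes: int            # node count
    total_cores: int      # total cores (= total MPI processes here)
    tdp_w: float          # TDP of one processor (W)
    procs_per_node: int   # processors per node
    supplement_w: float   # estimated miscellaneous draw per node (W)

    def total_perf(self) -> float:
        # Performance is taken as the inverse of the SCF iteration time (1/s).
        return 1.0 / self.scf_time_s

    def perf_per_node(self) -> float:
        return self.total_perf() / self.nodes

    def perf_per_core(self) -> float:
        return self.total_perf() / self.total_cores

    def node_power_w(self) -> float:
        # Processors at full TDP plus the heuristic supplement.
        return self.tdp_w * self.procs_per_node + self.supplement_w

    def total_power_w(self) -> float:
        return self.node_power_w() * self.nodes

    def efficiency(self) -> float:
        # Performance per watt, in 1/(W s).
        return self.total_perf() / self.total_power_w()


budapest2 = Run("budapest2", 1188, 7, 140, 115, 2, 50)
szeged = Run("szeged", 1905, 4, 192, 115, 4, 75)

for run in (budapest2, szeged):
    print(
        f"{run.name}: "
        f"per node {run.perf_per_node() * 1e6:.2f}e-6 1/s, "
        f"per core {run.perf_per_core() * 1e6:.2f}e-6 1/s, "
        f"total power {run.total_power_w():.0f} W, "
        f"efficiency {run.efficiency() * 1e10:.0f}e-10 1/(W s)"
    )
```

With these definitions the budapest2 and szeged columns are recovered to the printed precision. The debrecen2 column differs in the last digits, presumably because the tabulated SCF time is rounded.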

Notes:

- There is one MPI process per physical GPU, and debrecen2 has 3 GPUs per node. No gamma-only version of the GPU port is available, so the GPU run used vasp_std; with a gamma-only version the performance would be even higher.

- The AMD Opteron 6174 is considerably slower than recent Intel CPUs: one Opteron core delivers only about half the speed of an Intel core. It is best to set NCORE = 6, which groups the six cores of each physical CPU die. An Opteron 6174 actually consists of two six-core dies in the same package, so each szeged node has 8 six-core dies, totaling 48 cores.

- On Intel CPUs, NCORE > 1 is important when there are more than 100 MPI processes. A value of 2 might be enough to restore linear scaling and avoid bandwidth limitations.

- The GPU port is the most energy efficient, although the gain is not exceptional. Keep in mind that there is no gamma-only version of VASP for GPUs yet! Also, Intel CPUs generally run at their full TDP because of the Turbo Boost feature, whereas in typical VASP runs the GPUs draw only approximately 100-150 W. The table assumes the full 235 W TDP for the GPUs, so it probably underestimates their efficiency.

- The supplemental (miscellaneous) power draw per node is only a heuristic estimate.
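
To illustrate how sensitive the GPU efficiency figure is to the assumed power draw, the sketch below recomputes the debrecen2 efficiency for the 235 W TDP used in the table and for the 100-150 W draw mentioned in the notes. The function name `efficiency` is hypothetical, and the 150 W supplement per node is the heuristic value from the table, so the results are rough estimates rather than measurements.

```python
def efficiency(scf_time_s, nodes, devices_per_node, device_draw_w, supplement_w):
    """Performance per watt in 1/(W s), with performance = 1 / SCF iteration time."""
    total_power_w = nodes * (devices_per_node * device_draw_w + supplement_w)
    return 1.0 / (scf_time_s * total_power_w)


# debrecen2 values from the table: 460 s per SCF iteration, 4 nodes,
# 3 GPUs per node, 150 W supplement per node (heuristic).
for draw_w in (235, 150, 100):
    eff = efficiency(460, 4, 3, draw_w, 150)
    print(f"GPU draw {draw_w:3d} W -> efficiency {eff * 1e10:.0f}e-10 1/(W s)")
```

Under these assumptions the efficiency at a 100-150 W draw comes out roughly 1.4 to 1.9 times higher than the value obtained with the full TDP.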

Performance of GPU port