Scaling of VASP
HSE06 hybrid calculations
Benchmark case: VASP 5.4.1, gamma-only version (except the GPU runs, which use vasp_std)
ENCUT = 600.00 eV (ENAUG = 1200.00 eV)
PRECFOCK=Fast
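
For orientation, the settings above could be collected into an INCAR fragment roughly as follows. This is a minimal sketch, not the input file actually used for the benchmark: `render_incar` and `BENCHMARK_TAGS` are hypothetical names, and the `LHFCALC` and `HFSCREEN` entries are assumptions (the usual way of requesting HSE06 in VASP) that do not appear in the listed settings.

```python
def render_incar(tags):
    """Render a dict of INCAR tags as 'KEY = value' lines."""
    lines = [f"{key} = {value}" for key, value in tags.items()]
    return "\n".join(lines) + "\n"


BENCHMARK_TAGS = {
    # Values taken from the benchmark description above.
    "ENCUT": 600,
    "ENAUG": 1200,
    "PRECFOCK": "Fast",
    # Assumed: typical tags for an HSE06 run, not listed in the source.
    "LHFCALC": ".TRUE.",
    "HFSCREEN": 0.2,
}

if __name__ == "__main__":
    print(render_incar(BENCHMARK_TAGS), end="")
```

Machine-specific tags such as NCORE (see the table below) would be added to the dictionary per system.
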
NAME | budapest2 | uv2 | debrecen2 | szeged | fast |
---|---|---|---|---|---|
Perf. per Node (10^-6 1/s) | 120.25 | 615.78 | 543.88 | 131.23 | 89.29 |
Perf. per Core (10^-6 1/s) | 6.01 | 6.41 | 181.29 | 2.73 | 7.44 |
Total Perf. (10^-3 1/s) | 0.84 | 0.62 | 2.18 | 0.52 | 0.36 |
Time of one SCF iteration (s) | 1188 | 1624 | 460 | 1905 | 2800 |
NCORE | 5 | 1 | 1 | 6 | 1 |
Total Cores | 140 | 96 | 12 | 192 | 48 |
Node count | 7 | 1 | 4 | 4 | 4 |
MPI processes per Node | 20 | 96 | 3 | 48 | 12 |
IALGO | DAV | DAV | DAV | DAV | DAV |
CPU / GPU | E5-2680V2 | E5-4610 | Nvidia K20x | Opteron 6174 | E5-2620 |
Wattage per chip (W) | 115 | 95 | 235 | 115 | 115 |
Chips per Node | 2 | 16 | 3 | 4 | 2 |
Supplement per Node (W) | 50 | 50 | 150 | 75 | 50 |
Power per Node (W) | 280 | 1570 | 855 | 535 | 280 |
Total Power (W) | 1960 | 1570 | 3420 | 2140 | 1120 |
Efficiency (10^-10 1/(W s)) | 4295 | 3922 | 6361 | 2453 | 3189 |
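
The derived rows of the table can be reproduced from the measured SCF time and the hardware figures. The sketch below shows the arithmetic as it appears to have been done: performance is the inverse of the SCF iteration time, node power is chip wattage times chip count plus the supplement, and efficiency is total performance divided by total power. The `Run` class and its method names are hypothetical helpers introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    scf_time_s: float      # time of one SCF iteration (s)
    nodes: int             # node count
    ranks_per_node: int    # MPI processes per node
    chip_power_w: float    # wattage of one CPU or GPU (W)
    chips_per_node: int    # CPUs or GPUs per node
    supplement_w: float    # additional power per node (W)

    def total_perf(self) -> float:
        """SCF iterations per second for the whole job."""
        return 1.0 / self.scf_time_s

    def perf_per_node(self) -> float:
        return self.total_perf() / self.nodes

    def perf_per_rank(self) -> float:
        """Performance per MPI process (per core, or per GPU on debrecen2)."""
        return self.perf_per_node() / self.ranks_per_node

    def node_power_w(self) -> float:
        return self.chip_power_w * self.chips_per_node + self.supplement_w

    def total_power_w(self) -> float:
        return self.node_power_w() * self.nodes

    def efficiency(self) -> float:
        """Performance per watt, in 1/(W s)."""
        return self.total_perf() / self.total_power_w()


# Input values copied from the table above.
RUNS = {
    "budapest2": Run(1188, 7, 20, 115, 2, 50),
    "uv2": Run(1624, 1, 96, 95, 16, 50),
    "debrecen2": Run(460, 4, 3, 235, 3, 150),
    "szeged": Run(1905, 4, 48, 115, 4, 75),
    "fast": Run(2800, 4, 12, 115, 2, 50),
}

if __name__ == "__main__":
    for name, run in RUNS.items():
        print(
            f"{name:10s} "
            f"node={run.perf_per_node() * 1e6:7.2f} "
            f"rank={run.perf_per_rank() * 1e6:7.2f} "
            f"total={run.total_perf() * 1e3:5.2f} "
            f"power={run.total_power_w():6.0f} W "
            f"eff={run.efficiency() * 1e10:5.0f}"
        )
```

Recomputing from the listed times reproduces the table for budapest2, uv2, szeged and fast. For debrecen2 the recomputed values come out slightly lower (about 543.5 per node and 6356 for efficiency), which would be consistent with the listed 460 s being a rounded value.
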
Notes:
- There is one MPI process per physical GPU, and debrecen2 has 3 GPUs per node. No gamma-only GPU version of VASP is available, so the GPU runs use vasp_std; with a gamma-only version the performance would be even higher.
- The AMD Opteron 6174 is considerably slower than recent Intel CPUs: one Opteron core delivers only about half the speed of an Intel core. It is best to set NCORE = 6, which groups the six cores of each physical CPU die. An Opteron 6174 actually consists of two six-core dies in one package, so each szeged node has 8 six-core dies, totaling 48 cores.
- On Intel CPUs, NCORE > 1 becomes important when more than 100 MPI processes are used. A value of 2 may be enough to restore linear scaling and avoid bandwidth limitations.
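
The NCORE choices in the notes can be sanity-checked with a small helper. This is a rough sketch under two assumptions: that the number of band groups (NPAR in VASP) equals the total number of MPI processes divided by NCORE when k-point parallelization is not used, and that NCORE should divide the number of MPI processes per node so that a group never spans two nodes. Both function names are hypothetical, and treating a non-divisible NCORE as an error is a simplification made here.

```python
def band_groups(total_ranks: int, ncore: int) -> int:
    """Number of band groups (assumed NPAR = total_ranks / NCORE)."""
    if ncore < 1 or total_ranks % ncore != 0:
        raise ValueError("NCORE must be a positive divisor of the rank count")
    return total_ranks // ncore


def candidate_ncore(ranks_per_node: int) -> list:
    """All NCORE values that keep each group of ranks inside one node."""
    return [n for n in range(1, ranks_per_node + 1) if ranks_per_node % n == 0]


if __name__ == "__main__":
    # szeged: 4 nodes x 48 ranks, NCORE = 6 (one six-core die per group)
    print(band_groups(192, 6), candidate_ncore(48))
    # budapest2: 7 nodes x 20 ranks, NCORE = 5
    print(band_groups(140, 5), candidate_ncore(20))
```

For szeged, NCORE = 6 is among the candidates and matches the six-core die size described above; for budapest2, NCORE = 5 divides the 20 processes per node evenly.
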