Scaling of VASP
HSE06 hybrid calculations
Test benchmark case: VASP 5.4.1, gamma-only build (except on the GPU machine, which uses vasp_std)
ENCUT = 600.00 eV (ENAUG = 1200.00 eV)
PRECFOCK=Fast
NAME | budapest2 | uv2 | debrecen2 | szeged | fast |
---|---|---|---|---|---|
Perf. per Node (10^-6 1/s) | 120.25 | 615.78 | 543.88 | 131.23 | 89.29 |
Perf. per Core (10^-6 1/s) | 6.01 | 6.41 | 181.29 | 2.73 | 7.44 |
Total Perf. (10^-3 1/s) | 0.84 | 0.62 | 2.18 | 0.52 | 0.36 |
time of one SCF iteration (s) | 1188 | 1624 | 460 | 1905 | 2800 |
NCORE | 5 | 1 | 1 | 6 | 1 |
Total Cores | 140 | 96 | 12 | 192 | 48 |
Node count | 7 | 1 | 4 | 4 | 4 |
MPI processes per Node | 20 | 96 | 3 | 48 | 12 |
IALGO | DAV | DAV | DAV | DAV | DAV |
Processor (CPU or GPU) | E5-2680V2 | E5-4610 | Nvidia K20x | Opteron 6174 | E5-2620 |
TDP per processor (W) | 115 | 95 | 235 | 115 | 115 |
Processors per Node | 2 | 16 | 3 | 4 | 2 |
Supplement per Node (W) | 50 | 50 | 150 | 75 | 50 |
Power per Node (W) | 280 | 1570 | 855 | 535 | 280 |
Total power (W) | 1960 | 1570 | 3420 | 2140 | 1120 |
Efficiency (10^-10 1/(W s)) | 4295 | 3922 | 6361 | 2453 | 3189 |
Notes:
- There is one MPI process per physical GPU, and debrecen2 has 3 GPUs per node. No gamma-only GPU build of VASP is available; with one, the GPU performance would be even higher.
- The AMD Opteron 6174 is considerably slower than recent Intel CPUs: one Opteron core delivers only about half the speed of an Intel core. It is best to set NCORE = 6, which groups the six cores of each physical CPU die. An Opteron 6174 is actually two six-core dies in one package, so each szeged node holds 8 six-core dies, totaling 48 cores.
- On Intel CPUs, NCORE > 1 becomes important when running more than 100 MPI processes. A value of 2 might be enough to restore linear scaling and avoid bandwidth limitations.
- The GPU port is the most energy efficient, although the gain is not exceptional. Keep in mind that there is no gamma-only version of VASP for GPUs yet! Also, Intel CPUs generally run at their full TDP because of the Turbo Boost feature, whereas in typical VASP runs the GPUs draw only approximately 100-150 W.
- The supplemental (miscellaneous) power draw per node is only a heuristic estimate.
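
The derived rows of the table (performance, power, and efficiency) appear to follow from the primary inputs by simple arithmetic. The sketch below recomputes them in Python, assuming that performance is the inverse of the SCF iteration time and that node power is the processor wattage times the processor count plus the supplement. The names `machines` and `derive` are illustrative only, and the scaling factors used for printing are inferred from the tabulated numbers.

```python
# Primary inputs copied from the table above; everything else is derived.
# Assumption: performance = 1 / (time of one SCF iteration).
machines = {
    # name: (scf_time_s, nodes, total_cores, watt_per_proc, procs_per_node, supplement_w)
    "budapest2": (1188, 7, 140, 115, 2, 50),
    "uv2":       (1624, 1, 96, 95, 16, 50),
    "debrecen2": (460, 4, 12, 235, 3, 150),
    "szeged":    (1905, 4, 192, 115, 4, 75),
    "fast":      (2800, 4, 48, 115, 2, 50),
}


def derive(scf_time_s, nodes, total_cores, watt_per_proc, procs_per_node, supplement_w):
    """Return the derived performance, power, and efficiency figures."""
    total_perf = 1.0 / scf_time_s                      # SCF iterations per second
    per_node_w = watt_per_proc * procs_per_node + supplement_w
    total_w = per_node_w * nodes
    return {
        "total_perf": total_perf,
        "perf_per_node": total_perf / nodes,
        "perf_per_core": total_perf / total_cores,
        "per_node_w": per_node_w,
        "total_w": total_w,
        "efficiency": total_perf / total_w,            # SCF iterations per joule
    }


results = {name: derive(*values) for name, values in machines.items()}

# Print with the scaling inferred from the table:
# total perf in 1e-3 1/s, per node and per core in 1e-6 1/s, efficiency in 1e-10 1/(W s).
for name, r in results.items():
    print(
        f"{name:10s} "
        f"total={r['total_perf'] * 1e3:5.2f} "
        f"node={r['perf_per_node'] * 1e6:7.2f} "
        f"core={r['perf_per_core'] * 1e6:7.2f} "
        f"power={r['total_w']:5d} W "
        f"eff={r['efficiency'] * 1e10:5.0f}"
    )
```

Run as is, this reproduces the power rows exactly and the performance and efficiency rows to within the rounding of the SCF time; the uv2 and debrecen2 values differ slightly in the last digits, presumably because the tabulated times are rounded to whole seconds.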