NIIF Systems


NIIF systems are maintained by NIIF (the Hungarian National Information Infrastructure Development Institute).

There are useful guides about them on the NIIF wiki, but mostly in Hungarian.

Legacy information: Computing Service Legacy

How to set up NIIF computers

The following links/pages contain how-tos on setting up the NIIF supercomputers with our ab initio programs (VASP, TURBOMOLE, etc.).

Before you begin, make sure you can log in to skynet and that your shf3 is installed on skynet.
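
A quick sanity check from your own machine might look like this (a sketch only: YOUR_USER is a placeholder, and ~/shf3 as the install location is an assumption):

 # Hypothetical check: confirm skynet login works and shf3 is present
 ssh YOUR_USER@skynet 'test -d ~/shf3 && echo "shf3 found" || echo "shf3 missing"'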

Debrecen2 and Debrecen3

Mostly available. Reliable, but comes with long queues.

Debrecen2 contains NVIDIA K20X and K40X GPUs, and 16 Ivy Bridge cores per node.

Debrecen3 contains Xeon Phi accelerators which we cannot use; we can only use the 24 Haswell cores per node. (Since removed.)

Debrecen - Allocating many (4+) nodes is unreliable

Mostly available but full, with long queues. Since 2019-2020 the nodes break down often. It is not advised to start a job involving many nodes.

The InfiniBand connection can be faulty if you try to allocate more than 2-3 nodes. Please check your multi-node calculations very often; a conservative job-script sketch follows at the end of this section. P.S.: an OpenMPI build might come without these bugs, but it is still on the to-do list.


Contains 12 Westmere cores per node.
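
As a starting point, a conservative SLURM batch script might look like the following (a sketch only: partition/account options are omitted, and the binary name follows the xxxxxxxx.impi placeholder from the VASP compilations section below):

 #!/bin/bash
 #SBATCH --job-name=vasp
 #SBATCH --nodes=2
 #SBATCH --ntasks-per-node=12
 # Stay at 2-3 nodes (the InfiniBand gets flaky beyond that);
 # 12 MPI ranks per node means one rank per Westmere core.
 mpirun ./xxxxxxxx.impi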

Budapest - REMOVED

Offline since 01/31/2018. Presumably a brick now? It used to be mostly available and empty.

Contains 24 AMD K10 (Magny-Cours) cores per node. Keep in mind that a K10 core offers about 50% of the performance of an Intel Haswell core in debrecen3.

Each node is wired together from 4 six-core physical CPUs with HyperTransport. You need to set NCORE=6 for VASP here, otherwise your job won't scale, since the interconnects between the nodes and even between the CPU dies are rather slow.

Szeged - ONLINE

New software environment, with 2020-era kernels, etc.

Contains 48 AMD K10 (Magny-Cours) cores per node.

Works well on 1, 2, 4, or 8 nodes with 48, 96, 192, or 384 MPI ranks with VASP 5.4.1 or 6.2.0. Each node is wired together from 8 six-core physical CPUs with HyperTransport. You need to set NCORE=6 for VASP here, otherwise your job won't scale, since the interconnects between the nodes and even between the CPU dies are rather slow. On single-node calculations you may use NCORE=2 or 3; NCORE=12 won't work well because it would span two 6-core CPUs.
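
A minimal way to set this, assuming a standard INCAR in the run directory:

 # NCORE=6 groups the MPI ranks by six-core die, matching the HyperTransport topology
 echo 'NCORE = 6' >> INCAR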

Budapest2

Mostly available and empty. Not a big machine, but reliable. Contains Xeon Phi accelerators which we cannot use; we can only use the 20 Ivy Bridge cores per node. Use NCORE=5 here (2 or 10 may also work).

Miskolc

Mostly available, and offers 320 Ivy Bridge cores with the best shared-memory interconnect among them: NUMAlink. Use the impi build here as well. Use NCORE=8 or 4 for maximum scaling.

Pecs - REMOVED

Offline since 03/30/2016. Presumably a brick now?

The Pecs computer was below ~20% uptime ever since its installation. Due to its utterly low reliability, it was never more useful than a heating brick.

VASP compilations

xxxxxxxx.impi

Statically linked compilation, with Intel compilers and MKL.

You need an installed Intel MPI (impi) runtime to use this compilation.

Usable on: skynet, debrecen, budapest, szeged, budapest2, debrecen3
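
A hedged usage sketch (the module name impi is an assumption, so check module avail on the given machine, and substitute the actual build name for the placeholder):

 # Load the Intel MPI runtime, then launch the statically linked build
 module load impi
 # 48 ranks fill one szeged node; scale -np to your allocation
 mpirun -np 48 ./xxxxxxxx.impi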

xxxxxxxx.impi.7.0cuda

Dynamically linked compilation, against Intel compilers, MKL, and NVIDIA CUDA.

This GPU port is exclusively for the NVIDIA GPUs at debrecen2.

Keep in mind that no gamma-only version is available yet!
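
A launch sketch for debrecen2 (the module names and the one-rank-per-GPU mapping are assumptions; the .7.0cuda suffix suggests a CUDA 7.0 runtime):

 # Load the MPI and CUDA runtimes (module names are assumptions)
 module load impi cuda
 # Start with one MPI rank per GPU and tune from there
 mpirun -np 2 ./xxxxxxxx.impi.7.0cuda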

xxxxxxxx.ompi

Statically linked compilation, with Intel compilers and MKL.

You need an installed OpenMPI (ompi) runtime to use this compilation.

Not extensively tested; however, it runs on budapest2.

xxxxxxxx.mpt

Legacy builds with SGI's MPI implementation (MPT). Full of bugs, crashes nodes, and without vendor support since around 2015. DO NOT USE!

Statically linked compilation, with Intel compilers and MKL.

To use this compilation you need a proprietary SGI machine with an installed MPT runtime.

Runs on: skynet, pecs, miskolc, debrecen.

Please do not use this on debrecen: if your job tries to allocate more memory than allowed, or receives any other interrupt signal from SLURM, it poisons the nodes with memory-consuming, uninterruptible orphan processes! (I don't know whether skynet, pecs, and miskolc are affected by this.)