Wigner RCP Systems
Skynet
This is our main SGI cluster, installed in May 2011. It is a late descendant of the venerable TPA batch systems and the CEDRUS environment.
SGI Manuals
Specification
Scheduler: Slurm
name | batch | fast | fast2 | fast3 | uv |
---|---|---|---|---|---|
Nodes | n001-n036 (only 6 remaining) | n101-n104 | n105-n124 | n125-n130 | uv |
Type | SGI Rackable | SGI Rackable | HPE cluster | Supermicro | SGI UV2000 |
# of nodes | 6 (cluster) | 4 (cluster) | 20 (cluster) | 12 (cluster) | 1 (6 compute blades) |
Total # of cores | 48 "Westmere" cores | 48 "Sandy Bridge" cores | 640 "Skylake-server" cores | 768 "Icelake-server" cores | 72 "Sandy Bridge" cores |
# of CPUs / node | 2 | 2 | 2 | 2 | 12 (2 per compute blade) |
# of cores / CPU | 4 | 6 | 16 | 32 | 6 (72 total) |
Memory / node | 36 GB | 64 GB | 128 GB | 256 GB | 768 GB (128 GB per blade) |
Memory bandwidth / CPU | 34 GB/s (DDR3@1066MHz, 4 channels) | 43 GB/s (DDR3@1333MHz, 4 channels) | 128 GB/s (DDR4@2666MHz, 6 channels) | 205 GB/s (DDR4@3200MHz, 8 channels) | 43 GB/s (DDR3@1333MHz, 4 channels) |
Memory bandwidth / core | 8.5 GB/s | 7.2 GB/s | 8 GB/s | 6.4 GB/s | 7.2 GB/s |
Memory / core | 4 GB | 5 GB | 4 GB | 4 GB | 10 GB |
CPU | Intel Xeon E5620 @ 2.40 GHz | Intel Xeon E5-2620 @ 2.00 GHz | Intel Xeon Gold 6130 @ 2.10 GHz | Intel Xeon Gold 6338 @ 2.00 GHz | Intel Xeon E5-4610 @ 2.40 GHz |
Architecture | x86_64 / intel64 / em64t, little-endian | x86_64 / intel64 / em64t, little-endian | x86_64 / intel64 / em64t, little-endian | x86_64 / intel64 / em64t, little-endian | x86_64 / intel64 / em64t, little-endian |
Interconnect between nodes/blades | InfiniBand QDR - fat tree | none | InfiniBand FDR - fat tree | InfiniBand HDR - fat tree | NUMAlink 6 - enhanced hypercube |
Interconnect bandwidth | ~40 Gb/s (x4 link per node) | -- | ~56 Gb/s (x4 link per node) | ~200 Gb/s (x4 link per node) | ~160 Gb/s (x12 links per blade) |
Interconnect latency | ~1.2 μs (through an InfiniBand switch) | -- | ~0.7 μs (through an InfiniBand switch) | ~0.6 μs (through an InfiniBand switch) | ~0.1 μs (1-2 hop connections) |
Rmax | 0.5 TFlops | 0.8 TFlops | 36.2 TFlops | 37.9 TFlops | 1.8 TFlops |
# of GPUs | none | none | none | none | none |
Operating system | SUSE Linux Enterprise Server 12 SP3; all nodes boot exactly the same diskless image | SUSE Linux Enterprise Server 11 | | | |
Storage disk | 55TB (your $HOME folder) | ||||
Purpose | Large-scale MPI parallel jobs | Big-memory and SMP-only jobs |
Comments | SSH into the machine locally to run jobs |
Assignments
Project ID | Priority | Description | Participants |
---|---|---|---|
diamond | high | Nano diamonds | Gergő Thiering, Péter Udvarhelyi |
sic | normal | Silicon Carbide | Viktor Ivády, Bálint Somogyi, András Csóré |
Before Login
SSH Client Setup
At your first login you have to accept the host key. Please check the host fingerprint to avoid man-in-the-middle (MITM) attacks! On your client machine, set the following in $HOME/.ssh/config:
VisualHostKey yes
Genuine fingerprint of the system: File:Skynet fp.gif File:Fingerprint skynet.gif
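A minimal sketch of such a config entry (the host alias, address, and user name below are placeholders, not the real values):
# example $HOME/.ssh/config entry
Host skynet
    # replace the address and the user name with the real ones
    HostName skynet.example.org
    User myusername
    # show the host key fingerprint art at every login
    VisualHostKey yes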
Explicit Connection Check
To have an explicit connection check before every login, set the following in the corresponding MID file:
mid_ssh_port_check="ping nmap"
The former checks the connection, the latter the port state. To check the connection without logging in:
sshmgr -c MID
After Login
Install Shell Framework
cd $HOME
git clone git://github.com/thieringgergo/shf3.git
Source and set up the Shell Framework in $HOME/.profile:
source $HOME/shf3/bin/shfrc
# set the prompt
shf3/ps1 SKYNET[\\h]
# set framework features
shf3/alias yes
shf3/screen yes
shf3/mc/color yes
# screen workaround
if shf3/is/screen ; then
  source "/etc/profile.d/modules.sh"
fi
# tab complete
source $HOME/shf3/bin/complete
Parallel Compressor
Enable the parallel compressor for the framework:
cd $HOME
echo "sys_zip_xg_gz=pigz" > shf3/lib/sys/zip/xg/config.$USER
Module Environment
ESZR is our unified computing environment. Enable the ESZR system modules in $HOME/.profile:
# common
module use /site/eszr/mod/common
module load eszr/site
module load eszr/sys/wrcp/skynet
module load eszr/sys/wrcp/skynet.mpt
# site specific
module use /site/eszr/mod/site
module load sgi/2011
source ${ESZR_ROOT}/env/alias
Available module commands:
Command | Alias | Description |
---|---|---|
module avail | mla | Show available modules |
module list | mls | List loaded modules |
module display | mdp | About the module |
module load/unload MODULE | mld/mlu MODULE | Load / unload MODULE |
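For example, to list the available modules and load one of them (the module name is taken from the compiler table further down; adjust it to what you need):
mla                   # show available modules
mld intel/2011sp1u2   # load the Intel compiler module
mls                   # check that it is loaded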
ESZR
ESZR provides the unified directory structure and module environment. The environment can be checked with:
eszr
Directories
Accessing the scratch:
cd $ESZR_SCRATCH
Accessing the storage:
cd $ESZR_DATA
Synchronizing a directory to the storage:
dirsync DIR $ESZR_DATA
Sharing a directory with your Unix group:
dirshare DIR
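A hypothetical end-of-run workflow (the directory name myrun is a placeholder):
cd $ESZR_SCRATCH
dirsync myrun $ESZR_DATA   # copy the run directory from scratch to the storage
dirshare myrun             # make it readable for your Unix group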
Backup
Remote backups via SSH can be made by editing the MID:
sshmgr -e MID
Enter backup variables in the MID file:
mid_ssh_backup_dir="${ESZR_DATA}/MID/${mid_ssh_user}"
mid_ssh_backup_src="LIST"
where LIST is a space-separated list of directories in your remote home to be saved. Then run:
sshtx backup MID
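A hypothetical example, assuming the remote machine's MID is called skynet and you want to save the projects and results directories from your remote home:
# in the MID file
mid_ssh_backup_dir="${ESZR_DATA}/skynet/${mid_ssh_user}"
mid_ssh_backup_src="projects results"
# then run
sshtx backup skynet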
Compressing
Always compress data files to save storage space. You can use the following compression programs:
Mode | Compress File | Compress Directory | Extract File | Extract Directory |
---|---|---|---|---|
Serial | gzip -9 FILE | gzip -9 -r DIR | gzip -d FILE.gz | gzip -d -r DIR |
Parallel | pigz -9 FILE | pigz -9 -r DIR | pigz -d FILE.gz | pigz -d -r DIR |
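To pack a whole directory into a single compressed archive with the parallel compressor (a common alternative to the per-file recursive compression above; the directory name is a placeholder):
tar -cf - results | pigz -9 > results.tar.gz   # create the archive
pigz -dc results.tar.gz | tar -xf -            # extract it again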
SSH File Transfer
Copy files or directories to a MID:
sshtx put MID space separated list of files
Receive files or directories from a MID:
sshtx get MID space separated list of files
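A hypothetical example (the MID name skynet and the file names are placeholders):
sshtx put skynet INCAR POSCAR POTCAR   # upload input files
sshtx get skynet results.tar.gz        # download a result archive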
SSH Mount
Mount:
sshmount MID
Unmount:
sshumount MID
Scheduler
The job scheduler is Slurm. In Slurm each user is assigned one or more accounts, which you have to set in the queue file.
General information about the partitions:
sinfo -l
Partition | Allowed Groups | Purpose |
---|---|---|
devel | pdevel | Development (1 node) |
batch | pbatch | Production |
General information on jobs:
squeue -l or sjstat or qstat
Pending job priorities:
sprio -l
Slurm accounts and priorities:
sshare -l
Job accounting:
sacct
Detailed user statistics for the last month:
eszracct -u userid
Job Setup
Setup the Queue file and edit the parameters:
cd $HOME/shf3/mid/que
cp templates/wrcp/skynet .
mcedit skynet
Job template is in $HOME/shf3/mid/que/templates/wrcp/skynet.job
Interactive Jobs
There are two ways of running interactive multi-threaded jobs in the queue: i) array jobs (many single-threaded tasks); ii) OpenMP jobs. To run an array job:
runarr queue:sockets:cores command
where queue is the queue MID, sockets is the number of CPU sockets in a node, and cores is the number of cores per socket. The command can be a shell script as well. In the shell script/program you can read the local rank from the environment:
RANK=$SLURM_LOCALID
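A minimal sketch of such a worker script (the script and file names are placeholders; skynet is assumed to be the queue MID):
#!/bin/bash
# worker.sh - one task per core, distinguished by its local rank
RANK=$SLURM_LOCALID
./my_program "input_${RANK}.dat" > "output_${RANK}.log"
It could then be launched on a node with 2 sockets and 4 cores per socket as:
runarr skynet:2:4 ./worker.sh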
OpenMP jobs are very similar:
runomp queue:sockets:cores command
Be aware that jobs run in your shell like any other program, but they are executed by the queue on a compute node. Do not run large, long workloads as interactive jobs!
Job Monitoring
Average node utilization of a job:
jobmon JOBID
Per node utilization:
pcpview -j JOBID
Check the last 3 columns of cpu:
us - user load
sy - system load
id - idle
The user load should be around the maximum and the other two around 0. Maximum utilization is 100.
Node utilization chart:
pcpview -c -j JOBID
Maximum utilization is 8 (# of cores per node).
Parallel Modes
The parallel mode is set by the MODE key in the job file. MPI modes have an MPI selector for the corresponding MPI subsystem.
MODE | Description |
---|---|
omp | OpenMP only |
mpi/MPI | MPI-only with the selected MPI subsystem |
mpiomp/MPI | MPI-OMP hybrid with the selected MPI subsystem |
where
MPI | Description |
---|---|
mpt | SGI MPT MPI. [Manual] |
impi | Intel MPI. [Manual] |
ompi | Open MPI [Manual] |
Resource Specification
Three types of parallel mode are supported: MPI-only, OMP-only, and MPI-OMP hybrid. The Shell Framework sets environment variables for OMP and parameters for mpirun according to the following table. The number of OMP threads can be overridden with THRDS. In case of SGE you can also specify the total number of slots per node with SLTPN. In case of ESZR, setting the resource keys to eszr makes the framework use the default ESZR system settings, so you only need to set the MPI mode.
Parallel Mode | # of MPI procs | # of MPI procs per node | # of OMP threads per MPI proc. |
---|---|---|---|
MPI-only (mpi) | NODES × SCKTS × CORES | SCKTS × CORES | 1 |
OMP-only (omp) | -- | -- | SCKTS × CORES |
MPI-OMP hybrid (mpiomp) | NODES × SCKTS | SCKTS | CORES |
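A hypothetical job-file excerpt to illustrate the table (the key names NODES, SCKTS, and CORES follow the columns above; whether your job file uses exactly these keys is an assumption):
MODE=mpiomp/mpt   # MPI-OMP hybrid with SGI MPT
NODES=4           # number of nodes
SCKTS=2           # CPU sockets per node
CORES=4           # cores per socket
# resulting layout: 4 x 2 = 8 MPI processes, 2 per node, 4 OMP threads each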
CPU binding
With the BIND key you can bind processes to CPUs. Please refer to the manual of the MPI subsystem. Usually it is enough to set the parameters in the table below.
MPI | MPI-only | MPI-OMP hybrid |
---|---|---|
SGI MPT (mpt) | dplace -s 1 | omplace -s 1 |
Intel MPI (impi) | -binding pin=yes | -binding pin=yes |
Open MPI (ompi) | -bind-to-core -bycore | -- |
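For example, an MPI-only SGI MPT job would take the binding option from the table (a sketch, assuming BIND passes the option string verbatim to the launcher):
BIND="dplace -s 1"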
Compiling with Intel Compilers
The Fortran compiler for serial code is ifort, and for MPI-parallel code it is mpiifort. Load the corresponding compiler and/or parallel environment module with:
mld MODULE
Parallel environments are mutually exclusive. You can check the compiler by:
which mpiifort
MODULE | Mode | Parallel Environment | Target Systems |
---|---|---|---|
sgi/mpt/2.04 | mpt | SGI MPT | skynet, debrecen, pecs |
intel/mpi/4.0.3.008 | impi | Intel MPI | skynet, szeged, budapest |
MODULE | Compiler | Target Systems | Recommended Options |
---|---|---|---|
intel/2011sp1u2 | Intel 2011 SP 1 Update 2 | szeged, budapest | -O2 -xSSE2 -ip -vec-report0 |
intel/2011sp1u2 | Intel 2011 SP 1 Update 2 | skynet, debrecen, pecs | -O2 -xSSE4.2 -ip -vec-report0 -override_limits |
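A sketch of compiling an MPI Fortran code on skynet with Intel MPI and the recommended options (hello.f90 and hello.x are placeholder names; the module names are taken from the tables above):
mld intel/2011sp1u2
mld intel/mpi/4.0.3.008
mpiifort -O2 -xSSE4.2 -ip -vec-report0 -override_limits -o hello.x hello.f90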
Static Linking Intel 2011
Makefile parameters for some of the most common cases; alternatively, you can use the Intel link advisor.
Link parameters for Intel MKL Lapack:
MKL_PATH = $(MKLROOT)/lib/intel64
IFC_PATH = $(INTEL_IFORT_HOME)/lib/intel64
# link flags
LDFLAGS = $(MKL_PATH)/libmkl_lapack95_lp64.a \
          -Wl,--start-group \
          $(MKL_PATH)/libmkl_intel_lp64.a \
          $(MKL_PATH)/libmkl_intel_thread.a \
          $(MKL_PATH)/libmkl_core.a \
          -Wl,--end-group \
          $(IFC_PATH)/libiomp5.a -lpthread
Link parameters for Intel MKL Scalapack with SGI MPT:
MKL_PATH = $(MKLROOT)/lib/intel64
IFC_PATH = $(INTEL_IFORT_HOME)/lib/intel64
# link flags
LDFLAGS = $(MKL_PATH)/libmkl_scalapack_lp64.a \
          $(MKL_PATH)/libmkl_blacs_sgimpt_lp64.a \
          $(MKL_PATH)/libmkl_lapack95_lp64.a \
          -Wl,--start-group \
          $(MKL_PATH)/libmkl_intel_lp64.a \
          $(MKL_PATH)/libmkl_intel_thread.a \
          $(MKL_PATH)/libmkl_core.a \
          -Wl,--end-group \
          $(IFC_PATH)/libiomp5.a -lpthread
Compile and link parameters for Intel MKL FFT:
FFTW_PATH = $(INTEL_MKL_HOME)
FFTW_INC  = $(FFTW_PATH)/include/fftw
FFLAGS    = -I$(FFTW_INC)
LDFLAGS   = $(FFTW_PATH)/lib/intel64/libfftw3xf_intel.a
Profiling with Amplifier
To profile an OMP program, set the following in the job file:
PROF="amplxe-cl -collect hotspots"
To check the collected data:
amplxe-cl -report hotspots -r r000hs
Static Linking Intel 10.1
Makefile parameters for some of the most common cases.
Link parameters for Intel MKL Lapack:
MKL_PATH = $(INTEL_MKL_HOME)/lib/em64t
IFC_PATH = $(INTEL_IFORT_HOME)/lib
# link flags
LDFLAGS = $(MKL_PATH)/libmkl_lapack95_lp64.a \
          -Wl,--start-group \
          $(MKL_PATH)/libmkl_intel_lp64.a \
          $(MKL_PATH)/libmkl_intel_thread.a \
          $(MKL_PATH)/libmkl_core.a \
          -Wl,--end-group \
          $(IFC_PATH)/libiomp5.a -lpthread
Link parameters for Intel MKL Scalapack with SGI MPT:
MKL_PATH = $(INTEL_MKL_HOME)/lib/em64t
IFC_PATH = $(INTEL_IFORT_HOME)/lib
# link flags
LDFLAGS = $(MKL_PATH)/libmkl_scalapack_lp64.a \
          $(MKL_PATH)/libmkl_blacs_sgimpt_lp64.a \
          $(MKL_PATH)/libmkl_lapack95_lp64.a \
          -Wl,--start-group \
          $(MKL_PATH)/libmkl_intel_lp64.a \
          $(MKL_PATH)/libmkl_intel_thread.a \
          $(MKL_PATH)/libmkl_core.a \
          -Wl,--end-group \
          $(IFC_PATH)/libiomp5.a -lpthread
Compile and link parameters for Intel MKL FFT:
FFTW_PATH = $(INTEL_MKL_HOME)
FFTW_INC  = $(FFTW_PATH)/include/fftw
FFLAGS    = -I$(FFTW_INC)
LDFLAGS   = $(FFTW_PATH)/lib/em64t/libfftw3xf_intel.a
Visualization
Application | Required Modules | Shell Manager | Description |
---|---|---|---|
VMD | cuda/4.1.28 vmd/1.9.1 | vmdmgr | General visualizer |
Statically Linked Intel MPI VASP
You can use the statically linked VASP versions (4.3.1 and up) on skynet, or on any machine (debrecen, budapest, etc.) as long as an Intel MPI runtime is available.