Wigner RCP Systems

From Nano Group Budapest

Skynet

This is our main SGI cluster, installed in May 2011. It is a late descendant of the venerable TPA batch systems and the CEDRUS environment.

SGI Manuals

Specification

Scheduler: Slurm

Partition: batch
 Nodes: n001-n036 (only 6 remaining)
 Type: SGI Rackable cluster, 6 nodes
 CPU: 2 x Intel Xeon E5620 @ 2.40 GHz per node (4 cores/CPU), 48 "Westmere" cores in total
 Memory: 36 GB/node, 4 GB/core
 Memory bandwidth: 34 GB/s per CPU (DDR3@1066MHz, 4 channels), 8.5 GB/s per core
 Interconnect: Infiniband QDR, fat tree; ~40 Gb/sec (x4 link per node), ~1.2 μs latency (through an Infiniband switch)
 Rmax: 0.5 TFlops

Partition: fast
 Nodes: n101-n104
 Type: SGI Rackable cluster, 4 nodes
 CPU: 2 x Intel Xeon E5-2620 @ 2.00 GHz per node (6 cores/CPU), 48 "Sandy Bridge" cores in total
 Memory: 64 GB/node, 5 GB/core
 Memory bandwidth: 43 GB/s per CPU (DDR3@1333MHz, 4 channels), 7.2 GB/s per core
 Interconnect: none
 Rmax: 0.8 TFlops

Partition: fast2
 Nodes: n105-n124
 Type: HPE cluster, 20 nodes
 CPU: 2 x Intel Xeon Gold 6130 @ 2.10 GHz per node (16 cores/CPU), 640 "Skylake-server" cores in total
 Memory: 128 GB/node, 4 GB/core
 Memory bandwidth: 128 GB/s per CPU (DDR4@2666MHz, 6 channels), 8 GB/s per core
 Interconnect: Infiniband FDR, fat tree; ~56 Gb/sec (x4 link per node), ~0.7 μs latency (through an Infiniband switch)
 Rmax: 36.2 TFlops

Partition: fast3
 Nodes: n125-n130
 Type: Supermicro cluster, 12 nodes
 CPU: 2 x Intel Xeon Gold 6338 @ 2.00 GHz per node (32 cores/CPU), 768 "Icelake-server" cores in total
 Memory: 256 GB/node, 4 GB/core
 Memory bandwidth: 205 GB/s per CPU (DDR4@3200MHz, 8 channels), 6.4 GB/s per core
 Interconnect: Infiniband HDR, fat tree; ~200 Gb/sec (x4 link per node), ~0.6 μs latency (through an Infiniband switch)
 Rmax: 37.9 TFlops

Partition: uv
 Nodes: uv
 Type: SGI UV2000, 1 machine with 6 compute blades
 CPU: 12 x Intel Xeon E5-4610 @ 2.40 GHz (2 per compute blade, 6 cores/CPU), 72 "Sandy Bridge" cores in total
 Memory: 768 GB (128 GB per blade), 10 GB/core
 Memory bandwidth: 43 GB/s per CPU (DDR3@1333MHz, 4 channels), 7.2 GB/s per core
 Interconnect: NUMAlink 6, enhanced hypercube; ~160 Gb/sec (x12 link per blade), ~0.1 μs latency (1-2 hop connections)
 Rmax: 1.8 TFlops
 Comment: ssh into the machine locally to run jobs

Common to all partitions:
 Architecture: x86_64 / intel64 / em64t, little-endian
 GPUs: none
 Operating system: SUSE Linux Enterprise Server 12 SP3 on the cluster nodes (all nodes boot exactly the same diskless image); SUSE Linux Enterprise Server 11 on the UV
 Storage: 55 TB disk (your $HOME folder)
 Purpose: large-scale MPI parallel jobs on the cluster partitions; big-memory and SMP-only jobs on the UV

Assignments

Project ID  Priority  Description      Participants
diamond     high      Nano diamonds    Gergő Thiering, Péter Udvarhelyi
sic         normal    Silicon Carbide  Viktor Ivády, Bálint Somogyi, András Csóré

Before Login

SSH Client Setup

At the first login you have to accept the host key. Please check the host fingerprint to avoid MITM attacks! On your client machine set the following in $HOME/.ssh/config:

 VisualHostKey yes
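
For example, a complete client-side entry might look like the following sketch (the Host alias and address are placeholders; use the real name of the login node):

 # illustrative $HOME/.ssh/config entry - alias and address are placeholders
 Host skynet
     HostName skynet.example.org
     User YOUR_USERNAME
     VisualHostKey yes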

Genuine fingerprint of the system: see File:Skynet fp.gif and File:Fingerprint skynet.gif.

Explicit Connection Check

To have an explicit connection check before every login, set the following in the corresponding MID file:

 mid_ssh_port_check="ping nmap"

The former will check the connection and the latter the port state. To check the connection without login:

 sshmgr -c MID
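
A MID is a plain shell-style configuration file describing a remote machine. A rough sketch of its relevant lines (mid_ssh_user and mid_ssh_port_check are the keys used on this page; the file location and the host key name are assumptions and may differ in your shf3 version):

 # illustrative $HOME/shf3/mid/ssh/MID - path and mid_ssh_fqdn are assumed names
 mid_ssh_fqdn="login.example.org"
 mid_ssh_user="your_remote_user"
 mid_ssh_port_check="ping nmap"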

After Login

Install Shell Framework

 cd $HOME
 git clone git://github.com/thieringgergo/shf3.git

Source and setup the Shell Framework in $HOME/.profile:

 source $HOME/shf3/bin/shfrc
 # set the prompt
 shf3/ps1 SKYNET[\\h]
 # set framework features
 shf3/alias yes
 shf3/screen yes
 shf3/mc/color yes
 # screen workaround
 if shf3/is/screen ; then
   source "/etc/profile.d/modules.sh"
 fi
 # tab complete
 source $HOME/shf3/bin/complete

Parallel Compressor

Enable the parallel compressor for the framework:

 cd $HOME
 echo "sys_zip_xg_gz=pigz" > shf3/lib/sys/zip/xg/config.$USER

Module Environment

ESZR is our unified computing environment. Enable the ESZR system modules in $HOME/.profile:

 # common
 module use /site/eszr/mod/common
 module load eszr/site
 module load eszr/sys/wrcp/skynet
 module load eszr/sys/wrcp/skynet.mpt
 # site specific
 module use /site/eszr/mod/site
 module load sgi/2011
 source ${ESZR_ROOT}/env/alias

Available module commands:

Command                    Alias           Description
module avail               mla             Show available modules
module list                mls             List loaded modules
module display             mdp             About the module
module load/unload MODULE  mld/mlu MODULE  Load / unload MODULE
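
A typical session with these aliases, using a module name that appears in the compiler tables later on this page:

 mla                  # show available modules
 mld intel/2011sp1u2  # load the Intel compiler module
 mls                  # list what is loaded
 mdp intel/2011sp1u2  # show details about the module
 mlu intel/2011sp1u2  # unload it again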

ESZR

ESZR is the unified directory structure and module environment. The environment can be checked with:

 eszr

Directories

Accessing the scratch:

 cd $ESZR_SCRATCH

Accessing the storage:

 cd $ESZR_DATA

Synchronizing a directory to the storage:

 dirsync DIR $ESZR_DATA

Sharing a directory with your Unix group:

 dirshare DIR
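
A typical workflow combining these commands (the results directory name is only an illustration):

 cd $ESZR_SCRATCH             # work in the scratch area
 mkdir -p results             # illustrative output directory
 # ... run your job and write its output into results ...
 dirsync results $ESZR_DATA   # copy the results to the storage
 dirshare results             # make them readable for your Unix group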

Backup

Remote backup via ssh can be made by editing the MID:

 sshmgr -e MID

Enter backup variables in the MID file:

 mid_ssh_backup_dir="${ESZR_DATA}/MID/${mid_ssh_user}"
 mid_ssh_backup_src="LIST"

where LIST is a space-separated list of directories in your remote home to be saved. Then run:

 sshtx backup MID
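
A concrete sketch, assuming a MID called home and two directories to save (both names are illustrative):

 # in the MID file opened with: sshmgr -e home
 mid_ssh_backup_dir="${ESZR_DATA}/home/${mid_ssh_user}"
 mid_ssh_backup_src="src doc"
 # then start the backup
 sshtx backup home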

Compressing

Always compress data files to save storage space. You can use the following compression programs:

Mode      Compress File  Compress Directory  Extract File     Extract Directory
Serial    gzip -9 FILE   gzip -9 -r DIR      gzip -d FILE.gz  gzip -d -r DIR
Parallel  pigz -9 FILE   pigz -9 -r DIR      pigz -d FILE.gz  pigz -d -r DIR
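
Note that the recursive mode compresses every file in the directory separately. To pack a directory into a single compressed archive you can pipe tar through the parallel compressor (a generic pattern, not specific to this site):

 tar -cf - DIR | pigz -9 > DIR.tar.gz   # create a parallel-compressed archive
 pigz -dc DIR.tar.gz | tar -xf -        # extract it again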

SSH File Transfer

Copy files or directories to a MID:

 sshtx put MID space separated list of files

Receive files or directories from a MID:

 sshtx get MID space separated list of files

SSH Mount

Mount:

 sshmount MID

Unmount:

 sshumount MID

Scheduler

The job scheduler is Slurm. In Slurm each user is assigned one or more accounts, which you have to set in the queue file.

General information about the partitions:

 sinfo -l
Partition  Allowed Groups  Purpose
devel      pdevel          Development (1 node)
batch      pbatch          Production

General information on jobs:

 squeue -l
 or
 sjstat
 or
 qstat

Pending job priorities:

 sprio -l

Slurm accounts and priorities:

 sshare -l

Job accounting:

 sacct

Detailed user statistics for the last month:

 eszracct -u userid
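
For day-to-day monitoring it is often convenient to restrict these queries to your own user with the standard Slurm options, for example:

 squeue -l -u $USER            # only your jobs
 sacct -u $USER -S YYYY-MM-DD  # your job accounting since a given date
 sshare -l -u $USER            # your accounts, shares and fair-share factors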

Job Setup

Set up the queue file and edit the parameters:

 cd $HOME/shf3/mid/que
 cp templates/wrcp/skynet .
 mcedit skynet

The job template is in $HOME/shf3/mid/que/templates/wrcp/skynet.job

Interactive Jobs

There are two ways of running interactive multi-threaded jobs in the queue: i) array jobs (many single-threaded tasks), ii) OpenMP jobs. To run an array job:

 runarr queue:sockets:cores command

where queue is the queue MID, sockets is the number of CPU sockets in a node, and cores is the number of cores per socket. The command can be a shell script as well. In the shell script/program you can read the local rank from the environment:

 RANK=$SLURM_LOCALID
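
For example, to run 8 single-threaded tasks on a batch node (2 sockets, 4 cores per socket) with the skynet queue MID set up above:

 runarr skynet:2:4 ./worker.sh

where worker.sh could be as simple as the following sketch (my_program and the input/output naming are placeholders):

 #!/bin/bash
 # pick the input file based on the local rank (0..7 on a batch node)
 RANK=$SLURM_LOCALID
 ./my_program input.$RANK > output.$RANK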

OpenMP jobs are very similar:

 runomp queue:sockets:cores command

Be aware that such jobs run in your shell like any other program but are executed by the queue on a compute node. Do not run large, long workloads as interactive jobs!

Job Monitoring

Average node utilization of a job:

 jobmon JOBID

Per node utilization:

 pcpview -j JOBID

Check the last 3 columns of cpu:

 us - user load
 sy - system load
 id - idle

The user load should be around the maximum and the other two around 0. Maximum utilization is 100.

Node utilization chart:

 pcpview -c -j JOBID

Maximum utilization equals the number of cores per node (8 on the batch nodes).

Parallel Modes

The parallel mode is set by the MODE key in the job file. MPI modes have an MPI selector for the corresponding MPI subsystem.

MODE        Description
omp         OpenMP only
mpi/MPI     MPI-only with the selected MPI subsystem
mpiomp/MPI  MPI-OMP hybrid with the selected MPI subsystem

where

MPI   Description
mpt   SGI MPT MPI [Manual]
impi  Intel MPI [Manual]
ompi  Open MPI [Manual]

Resource Specification

Three types of parallel mode are supported: MPI-only, OMP-only, and MPI-OMP hybrid. The Shell Framework sets the OMP environment variables and the mpirun parameters according to the following table; a sketch of a job file follows the table. The number of OMP threads can be overridden with THRDS. In case of SGE you can also specify the total number of slots per node with SLTPN. In case of ESZR, setting the resource keys to eszr makes the framework use the default ESZR system settings, and you only need to set the MPI mode.

Parallel Mode            # of MPI procs         # of MPI procs per node  # of OMP threads per MPI proc
MPI-only (mpi)           NODES × SCKTS × CORES  SCKTS × CORES            1
OMP-only (omp)           --                     --                       SCKTS × CORES
MPI-OMP hybrid (mpiomp)  NODES × SCKTS          SCKTS                    CORES
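
A sketch of the mode and resource related part of a job file for a hybrid run (whether NODES, SCKTS and CORES appear as literal keys depends on your template, so treat the key names as assumptions):

 # illustrative excerpt of a skynet job file
 MODE="mpiomp/mpt"   # MPI-OMP hybrid with SGI MPT
 NODES="4"           # number of nodes
 SCKTS="2"           # CPU sockets per node
 CORES="4"           # cores per socket -> 8 MPI procs with 4 OMP threads each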

CPU binding

With the BIND key you can bind processes to CPUs. Please refer to the manual of the MPI subsystem. Usually, it is enough to set the parameters in the table below.

MPI               MPI-only               MPI-OMP hybrid
SGI MPT (mpt)     dplace -s 1            omplace -s 1
Intel MPI (impi)  -binding pin=yes       -binding pin=yes
Open MPI (ompi)   -bind-to-core -bycore  --
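
For example, with SGI MPT the BIND key would be set along these lines (a sketch; adapt it to your job template):

 BIND="dplace -s 1"    # MPI-only run with SGI MPT
 # or for an MPI-OMP hybrid run:
 BIND="omplace -s 1"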

Compiling with Intel Compilers

The Fortran compiler for serial code is ifort; for MPI-parallel code it is mpiifort. Load the corresponding compiler and/or parallel environment module with:

 mld MODULE

Parallel environments are mutually exclusive. You can check which compiler is in use with:

 which mpiifort

MODULE               Mode  Parallel Environment  Target Systems
sgi/mpt/2.04         mpt   SGI MPT               skynet, debrecen, pecs
intel/mpi/4.0.3.008  impi  Intel MPI             skynet, szeged, budapest

MODULE           Compiler                  Target Systems          Recommended Options
intel/2011sp1u2  Intel 2011 SP 1 Update 2  szeged, budapest        -O2 -xSSE2 -ip -vec-report0
intel/2011sp1u2  Intel 2011 SP 1 Update 2  skynet, debrecen, pecs  -O2 -xSSE4.2 -ip -vec-report0 -override_limits
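
A minimal compile session on skynet using the modules and recommended options above (the source and program names are placeholders):

 mld intel/2011sp1u2
 ifort -O2 -xSSE4.2 -ip -vec-report0 -o hello hello.f90               # serial
 mld intel/mpi/4.0.3.008
 mpiifort -O2 -xSSE4.2 -ip -vec-report0 -o hello_mpi hello_mpi.f90    # Intel MPI parallel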

Static Linking Intel 2011

Makefile parameters for some of the most common cases; alternatively, you can use the Intel link advisor.

Link parameters for Intel MKL Lapack:

 MKL_PATH = $(MKLROOT)/lib/intel64
 IFC_PATH = $(INTEL_IFORT_HOME)/lib/intel64
 # link flags
 LDFLAGS = $(MKL_PATH)/libmkl_lapack95_lp64.a \
 -Wl,--start-group \
 $(MKL_PATH)/libmkl_intel_lp64.a \
 $(MKL_PATH)/libmkl_intel_thread.a \
 $(MKL_PATH)/libmkl_core.a \
 -Wl,--end-group \
 $(IFC_PATH)/libiomp5.a -lpthread

Link parameters for Intel MKL Scalapack with SGI MPT:

 MKL_PATH = $(MKLROOT)/lib/intel64
 IFC_PATH = $(INTEL_IFORT_HOME)/lib/intel64
 # link flags
 LDFLAGS = $(MKL_PATH)/libmkl_scalapack_lp64.a \
 $(MKL_PATH)/libmkl_blacs_sgimpt_lp64.a \
 $(MKL_PATH)/libmkl_lapack95_lp64.a \
 -Wl,--start-group \
 $(MKL_PATH)/libmkl_intel_lp64.a \
 $(MKL_PATH)/libmkl_intel_thread.a \
 $(MKL_PATH)/libmkl_core.a \
 -Wl,--end-group \
 $(IFC_PATH)/libiomp5.a -lpthread

Compile and link parameters for Intel MKL FFT:

 FFTW_PATH = $(INTEL_MKL_HOME)
 FFTW_INC  = $(FFTW_PATH)/include/fftw
 FFLAGS = -I$(FFTW_INC)
 LDFLAGS = $(FFTW_PATH)/lib/intel64/libfftw3xf_intel.a

Profiling with Amplifier

To profile an OMP program, set in the job file:

 PROF="amplxe-cl -collect hotspots"

To check the collected data:

 amplxe-cl -report hotspots -r r000hs

Static Linking Intel 10.1

Makefile parameters for some of the most common cases.

Link parameters for Intel MKL Lapack:

 MKL_PATH = $(INTEL_MKL_HOME)/lib/em64t
 IFC_PATH = $(INTEL_IFORT_HOME)/lib
 # link flags
 LDFLAGS = $(MKL_PATH)/libmkl_lapack95_lp64.a \
 -Wl,--start-group \
 $(MKL_PATH)/libmkl_intel_lp64.a \
 $(MKL_PATH)/libmkl_intel_thread.a \
 $(MKL_PATH)/libmkl_core.a \
 -Wl,--end-group \
 $(IFC_PATH)/libiomp5.a -lpthread

Link parameters for Intel MKL Scalapack with SGI MPT:

 MKL_PATH = $(INTEL_MKL_HOME)/lib/em64t
 IFC_PATH = $(INTEL_IFORT_HOME)/lib
 # link flags
 LDFLAGS = $(MKL_PATH)/libmkl_scalapack_lp64.a \
 $(MKL_PATH)/libmkl_blacs_sgimpt_lp64.a \
 $(MKL_PATH)/libmkl_lapack95_lp64.a \
 -Wl,--start-group \
 $(MKL_PATH)/libmkl_intel_lp64.a \
 $(MKL_PATH)/libmkl_intel_thread.a \
 $(MKL_PATH)/libmkl_core.a \
 -Wl,--end-group \
 $(IFC_PATH)/libiomp5.a -lpthread

Compile and link parameters for Intel MKL FFT:

 FFTW_PATH = $(INTEL_MKL_HOME)
 FFTW_INC  = $(FFTW_PATH)/include/fftw
 FFLAGS = -I$(FFTW_INC)
 LDFLAGS = $(FFTW_PATH)/lib/em64t/libfftw3xf_intel.a

Visualization

Application  Required Modules       Shell Manager  Description
VMD          cuda/4.1.28 vmd/1.9.1  vmdmgr         General visualizer

Statically Linked Intel MPI VASP

You can use the statically linked VASP versions (4.3.1 and up) on skynet or on any other machine (debrecen, budapest, etc.), as long as an Intel MPI runtime is available.