National Institute of Science Education and Research
AN AUTONOMOUS INSTITUTE UNDER DAE, GOVT. OF INDIA

 
Cluster (High-Performance) Computing
  • KALINGA Cluster
    • Applications Installed
    • Scheduler
    • Submitting Jobs
    • Publications
    • User Manual
  • HARTREE Cluster
    • Applications Installed
    • Scheduler
    • Submitting Jobs
    • Publications
    • User Manual
  • Do's and Don'ts for HPC usage
Kalinga

An HPC cluster of 64 new compute nodes, 32 old compute nodes, and one GPU node with four NVIDIA Tesla K40c cards has been named KALINGA and serves the School of Physical Sciences. It has 205 TB of storage on a Lustre parallel file system and is interconnected by a 216-port EDR 100 Gb/s InfiniBand Smart Director switch populated with 108 ports.

Old compute nodes: based on the Intel® Xeon® CPU E5-2698 v3 @ 2.30GHz, with two sockets per node (32 cores per node) and 512 GB (8 × 64 GB) of memory per node.

New compute nodes: based on the Intel® Xeon® Gold 6248 CPU @ 2.50GHz, with two sockets per node (40 cores per node) and 192 GB (12 × 16 GB) of memory per node.

This cluster is listed in the January 2021 list of top supercomputers in India maintained by C-DAC Bangalore: https://topsc.cdacb.in/filterdetailstry?page=30&slug=January2021
  • New Master node:
    • Processor: 2 x 20 cores (Intel® Xeon® Gold 6248 CPU @ 2.50GHz)
    • Memory: 192 GB (12 * 16 GB)
  • Old Master node:
    • Processor: 2 x 16 cores (Intel(R) Xeon(R) CPU E5-2698 v3 @ 2.30GHz)
    • Memory: 512 GB (8 * 64 GB)
  • New compute nodes (64 nodes):
    • Processor: 2 x 20 cores (Intel® Xeon® Gold 6248 CPU @ 2.50GHz)
    • Memory: 192 GB (12 * 16 GB)
  • Old compute nodes (32 nodes):
    • Processor: 2 x 16 cores (Intel(R) Xeon(R) CPU E5-2698 v3 @ 2.30GHz)
    • Memory: 512 GB (8 * 64 GB)
  • GPU node (NVIDIA) (1 node)
    • NVIDIA Tesla K40c (4 cards)
  • Storage nodes: MDS1 & OSS1 (2 nodes)
  • Interconnect:
    • Primary Interconnect: 216-port EDR 100 Gb/s InfiniBand Smart Director switch (CS7520), populated with 108 ports.
    • Secondary Interconnect: Ethernet - 48-port Gigabit switch (SSE-G2252)
  • Storage: 205 TB with the Lustre parallel file system.
  • Resource Manager: SLURM with fair-share scheduling.
  • Monitoring Tool: Ganglia
  • KVM Switch and KVM Console.
List of Applications installed in KALINGA
  • codes/abinit/8.8.2
  • codes/atomate/0.9.4
  • codes/BerkeleyGW/1.2.0-newnodes
  • codes/BerkeleyGW/1.2.0-oldnodes
  • codes/BerkeleyGW/2.1
  • codes/BerkeleyGW/2.1-newnodes
  • codes/BerkeleyGW/2.1-oldnodes
  • codes/BerkeleyGW/updated-2.1
  • codes/dftplus/20.1
  • codes/element_spdb
  • codes/exciting/nitrogen
  • codes/fluer/MAX-R4
  • codes/kkr/3.1
  • codes/nwchem/6.8
  • codes/nwchem/7.0.1
  • codes/olam/5.3
  • codes/psi4/1.1
  • codes/pyscf/1.7.5
  • codes/qe/5.4-intel2017u3
  • codes/qe/6.4.1-newnodes
  • codes/qe/6.4.1-oldnodes
  • codes/qe/6.5
  • codes/qe/6.6
  • codes/qe/6.6-newnodes
  • codes/qe/6.6-oldnodes
  • codes/sdpb
  • codes/siesta/4.1
  • codes/transiesta/4.1
  • codes/vasp/5.4.4
  • codes/vasp/5.4.4-wannier
  • codes/vasp/5.4.4-wannier-new
  • codes/wannier90/2.1.0
  • codes/westpy/4.2.0
  • codes/wps/4.0
  • codes/wrf/4.0
  • codes/yambo/4.5.2
  • codes/yambo/4.5.2-new
  • gpaw-setups/0.9.11271
  • codes/cmake/3.0.0
  • codes/cmake/3.12.4
  • codes/cmake/3.17.5
Availability of different compilers on KALINGA

Different compilers are available on the HPC as environment modules. Some of them are listed below; a short usage sketch follows the lists.

GNU Compilers for Serial Code

  • compilers/gcc/7.5.0
  • compilers/gcc/9.2.0

Compilers for Parallel Code (OpenMPI/MPICH/MVAPICH)

  • compilers/mpich/3.2
  • compilers/mpich/3.3.2
  • compilers/mvapich2/2.2
  • compilers/openmpi/3.1.0
  • compilers/openmpi/4.0.5
  • compilers/wrf/mpich/3.3.2

Intel Compilers

  • compilers/intel/parallel_studio_xe_2017.3.191
  • compilers/intel/parallel_studio_xe_2018.3.051
  • compilers/intel/parallel_studio_xe_2020.4.912

Scientific Libraries

  • libs/boost/1.76.0-gcc-7.5
  • libs/boost/boost_1_76_0
  • libs/fftw/2.1.5
  • libs/fftw/3.3.8
  • libs/gmp/6.2.0
  • libs/gmp/6.2.1
  • libs/gsl/2.6
  • libs/hdf5/1.12.0
  • libs/hdf5/1.12.0-mpi
  • libs/hdf5/1.8.18
  • libs/jasper/1.900.1
  • libs/libcint/4.0.3
  • libs/libpng/1.2.59
  • libs/libxc/2.1.
  • libs/libxc/5.0.
  • libs/libxml2/2.7.
  • libs/libxsmm/1.1
  • libs/ncl-ncarg/6.6.
  • libs/netcdf/4.3.3.
  • libs/netcdf-c/4.7.
  • libs/netcdf-fortran/4.5.
  • libs/Q
  • libs/rapidjso
  • libs/scalapack/2.1.
  • libs/szip/2.2.
  • libs/wrf/hdf5/1.12.
  • libs/wrf/jasper/1.900.
  • libs/wrf/netcdf/4.3.3.
  • libs/wrf/szip/2.2.
  • libs/wrf/zlib/1.2.1
  • libs/zlib/1.2.1

Utilities

  • utils/automake/1.16
  • utils/cuda/11.0
  • utils/libtools/2.4.6
  • utils/make/4.3
  • utils/singularity/3.7.3
  • compilers/anaconda3/2020.7
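
Any of the modules listed above can be inspected and loaded with the standard environment-modules commands. A minimal sketch (module names are taken from the lists above; which ones you need depends on your application):

  module avail                                          # list every module available on the cluster
  module load compilers/intel/parallel_studio_xe_2020.4.912
  module load codes/qe/6.6-newnodes                     # Quantum ESPRESSO built for the new nodes
  module list                                           # confirm what is currently loaded
  module unload codes/qe/6.6-newnodes                   # drop a single module
  module purge                                          # drop everything and start clean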
Scheduler in KALINGA
  • -Not listed-
Submitting Jobs in KALINGA
  • -Not listed-
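
The job-submission details are not reproduced here, but since KALINGA's resource manager is SLURM with fair-share scheduling (see above), a minimal batch script would look roughly like the sketch below. The job name, partition, walltime, and module are assumptions and should be replaced with the values given in the KALINGA user manual.

  #!/bin/bash
  #SBATCH --job-name=qe-scf            # job name shown in the queue
  #SBATCH --nodes=1                    # number of nodes requested
  #SBATCH --ntasks-per-node=40         # 40 cores per new compute node
  #SBATCH --time=24:00:00              # walltime limit (assumed)
  #SBATCH --partition=standard         # partition name is hypothetical

  module purge
  module load codes/qe/6.6-newnodes    # module name taken from the list above

  srun pw.x -in scf.in > scf.out       # input/output file names are placeholders

Submit with sbatch job.sh and monitor with squeue -u $USER.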


Hartree

A new HPC cluster for the School of Chemical Sciences has been installed and named the Hartree cluster. It is housed inside a SmartRow solution.
A SmartRow is itself a mini data centre that contains almost all the subsystems of a full data centre.

This cluster is listed in the July 2018 list of top supercomputers in India maintained by C-DAC Bangalore: https://topsc.cdacb.in/filterdetailstry?page=30&slug=July2018

Subsystems in the SmartRow:

  • Electrical (row power, UPS power) - 150 kVA modular frame loaded with 90 kVA, with 30 minutes of backup.
  • HVAC (CRV - computer room vertical) - 2 nos. of 35 kW in-row PAC units.
  • Fire detection system - two zones (Zone 1 & Zone 2)
  • Fire suppression system (NOVEC 1230 suppression system)
  • Access control system (For Door opening)
  • Rodent repellent system - 3 nos. Standalone devices
  • CCTV
  • Monitoring with Rack Data Unit (RDU)

Hartree Cluster:

The Hartree cluster is an HPC system of 40 compute nodes and 2 master nodes in an HA configuration.
It is based on the Intel(R) Xeon(R) CPU E5-2697A v4 @ 2.60GHz, with two sockets per node (32 cores per node), on Supermicro SuperServer SYS-6028R-TR hardware with 64 GB of RAM per node.

HPL Benchmarking Results (for 36 nodes - 1152 cores):

  • Rmax of 38.81 teraflops
  • Rpeak of 53.24 teraflops

The nodes are connected by a Mellanox 108-port FDR chassis switch (MSX6506-NR) that provides 56 Gb/s of throughput.
It also has 230 TB of storage on a Lustre parallel file system with an aggregate write performance of around 5 GB/s.

  • Total IO Nodes: 4 (2 MDS & 2 OSS)
  • MDT Storage: Qscan (4TB in RAID 10)
  • OST Storage: DDN (230 TB in RAID 6)
  • Storage is from DDN, with dual controllers both working in active-active mode.
  • Controller Enclosure : SFA 7700x
  • Disk Enclosure: SS7700
List of Applications installed in HARTREE
  • codes/gcc/dalton/2016
  • codes/gcc/ompi/3.0.0/globalarray/5.6
  • codes/gcc/ompi/64bitint/3.0.0/dalton/2016
  • codes/gcc/ompi/64bitint/3.0.0/globalarray/5.6
  • codes/gcc/openmolcas
  • codes/gcc/packmol/2018_002
  • codes/gcc/sapt/2016
  • codes/gcc/tinker/6.2.03
  • codes/intel/amber/16
  • codes/intel/columbus/7.0
  • codes/intel/cp2k/all/5.1
  • codes/intel/cpmd/parallel/3.15.1
  • codes/intel/dalton/parallel/2016
  • codes/intel/dalton/serial/2016
  • codes/intel/espresso/5.4.0
  • codes/intel/ga-5.7
  • codes/intel/gromacs/parallel/double/5.1.4
  • codes/intel/gromacs/parallel/gromacs-2020.1
  • codes/intel/gromacs/parallel/gromacs-2020.2
  • codes/intel/gromacs/parallel/gromacs-2020.2-plumed2.6
  • codes/intel/gromacs/parallel/single/5.1.4
  • codes/intel/gromacs/serial/double/5.1.4
  • codes/intel/gromacs/serial/single/5.1.4
  • codes/intel/gromacs-2019.6
  • codes/intel/lammps/lammps-11Aug17
  • codes/intel/namd/namd_2.12
  • codes/intel/namd/namd_2.1
  • codes/intel/newton-x/2-b1
  • codes/intel/nwchem/parallel/double/6.
  • codes/intel/nwchem/parallel/single/6.
  • codes/intel/plumd-2.
  • codes/intel/quantumespresso/6.
  • codes/intel/vasp/parallel/5.4.
Availability of different compilers on Hartree

As on KALINGA, different compilers are available on the HPC as environment modules. Some of them are listed below; a short usage sketch follows the lists.

GNU Compilers for Serial Code

  • compilers/gcc/10.1.0
  • compilers/gcc/11.2.0
  • compilers/gcc/5.1.0

Compilers for Parallel Code (OpenMPI)

  • compilers/mpi/gcc/openmpi/1.6.5
  • compilers/mpi/gcc/openmpi/2.0.4-32bit
  • compilers/mpi/gcc/openmpi/3.0.0
  • compilers/mpi/gcc/openmpi/3.1.6-32bit
  • compilers/mpi/gcc/openmpi/64bitint/3.0.0
  • compilers/mpi/intel/openmpi/1.6.5
  • compilers/mpi/intel/openmpi/3.0.0
  • compilers/mpi/intel/openmpi/4.1.2

Intel Compilers

  • compilers/intel/18.0.1.163
  • compilers/intel/mpi/18.0.1.163
  • compilers/intel/parallel_studio/parallel_studio_xe_2018_update1_cluster_edition
  • compilers/intel/parallel_studio/parallel_studio_xe_2020_update4_cluster_edition

Scientific Libraries

  • libs/hdf5/1.10.1
  • compilers/intel/mkl/18.0.1.163

Utilities

  • pdsh
  • util/cmake/3.18.0
  • util/python/Anaconda-2018/python-3.7
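
A typical Hartree environment combines a compiler, an MPI build, and an application module from the lists above. A minimal sketch (whether these particular versions are meant to be combined is an assumption; check each module's description with module show before loading):

  module purge
  module load compilers/intel/18.0.1.163
  module load compilers/intel/mkl/18.0.1.163
  module load compilers/mpi/intel/openmpi/3.0.0
  module load codes/intel/gromacs/parallel/gromacs-2020.2
  module list                                               # verify the loaded environment
  module show codes/intel/gromacs/parallel/gromacs-2020.2   # inspect what the module sets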
Scheduler in HARTREE
  • -Not listed-
Submitting Jobs in HARTREE
  • -Not listed-
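
The scheduler for HARTREE is not listed here. Since the usage notes below refer to qsub, the sketch below assumes a PBS/Torque-style batch system; the job name, queue, resource syntax, and module are placeholders and should be checked against the HARTREE user manual.

  #!/bin/bash
  #PBS -N gromacs-md                  # job name
  #PBS -l nodes=1:ppn=32              # one node, 32 cores per node
  #PBS -l walltime=24:00:00           # walltime limit (assumed)
  #PBS -q batch                       # queue name is hypothetical

  cd $PBS_O_WORKDIR                   # run from the directory the job was submitted from
  module load codes/intel/gromacs/parallel/gromacs-2020.2

  mpirun -np 32 gmx_mpi mdrun -deffnm md    # binary and input names depend on the build

Submit with qsub job.pbs and monitor with qstat -u $USER.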


Please Do

  • Remember that you share the login nodes (and remote desktop environments, where they are provided) with many other users; be mindful of that. Test how your jobs scale so that you request an optimum number of processing cores.
  • If you need to run something "strenuous" interactively, use the batch system's qsub -I ... mechanism. This applies to compiling and/or testing code and to pre-processing or post-processing data.
  • If you need to transfer a lot of small files, it is smarter to pack them into a single tar or zip archive and transfer that one larger file instead (see the sketch after this list).
  • If you are running jobs that are identical except for input parameters or input file names, consider using job arrays instead of submitting many individual, almost identical jobs (see the sketch after this list).
  • It is strongly recommended that users back up their files and folders periodically, as NISER does not have a mechanism to back up user data.
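
A minimal sketch of the file-transfer and job-array recommendations, assuming a SLURM system such as KALINGA (PBS/Torque-style systems offer an equivalent array mechanism via qsub -t); the file names, paths, and array range are placeholders:

  # (1) Pack many small files into a single archive before transferring
  tar -czf inputs.tar.gz input_dir/               # one compressed archive instead of many files
  scp inputs.tar.gz user@cluster:workdir/         # host and path are hypothetical

  # (2) Job-array script (array.sh): indices 1-10 each read their own input file
  #!/bin/bash
  #SBATCH --job-name=sweep
  #SBATCH --array=1-10                            # ten tasks from one submission
  #SBATCH --ntasks=1
  srun ./my_code input_${SLURM_ARRAY_TASK_ID}.dat # my_code and its inputs are placeholders

  # Submit the whole array with a single command
  sbatch array.sh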

Please Don't

  • Run processing for sustained periods on login nodes (or remote desktops). Your access to resources (CPU and RAM) is capped, but you should nonetheless avoid inconveniencing other users by running work on login nodes. Use the batch system instead.
  • Write a shell or Python script that rapidly fires qsub commands at the batch system just because you can. Rapidly firing jobs at the HPC can cause issues for the batch submission subsystem, and having large numbers of small jobs queued can also impact scheduler performance. Please use job arrays or Nimrod instead when you need to run large numbers of similar jobs.
  • Abuse the login environment with the watch command. In many circumstances, the output of the command being watched changes far less frequently than watch refreshes it.