NVIDIA DGX-1 is presented by NVIDIA as the first system purpose-built for deep learning. It is based on latest-generation GPUs and, according to the vendor, delivers data-processing throughput comparable to that of 250 x86 servers.

HybriLIT includes five DGX-1 servers, each equipped with processors of two types:

  • 2 CPUs: Intel Xeon E5-2698 v4, 20 cores each;
  • 8 GPUs: NVIDIA Tesla V100.

Each server has the following specifications:

CPU                 80 logical cores (2 × 20 physical cores with Hyper-Threading)
GPU                 8 cards
RAM                 512 GB
Storage             7.6 TB
NVLink bandwidth    300 GB/s
Ethernet            10 Gbit/s
InfiniBand          40 Gbit/s

Declared peak performance of a single NVIDIA Tesla V100

Double precision                7.8 TFLOPS
Single precision                15.7 TFLOPS
Deep learning (Tensor Cores)    125 TFLOPS

Resource management via SLURM
The five DGX-1 servers are combined into a single SLURM partition named dgx. Jobs in the dgx partition can run for up to 14 days.
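
As a minimal sketch of how a job might target this partition, the batch script below requests the dgx partition with the maximum wall time stated above. The --partition and --time options are standard sbatch options. The GPU request via --gres=gpu:1, the single-node request, the job name, and the describe_job helper are assumptions made for illustration and may not match the actual HybriLIT configuration.

```shell
#!/bin/bash
#SBATCH --partition=dgx        # the DGX-1 partition described above
#SBATCH --time=14-00:00:00     # days-hours:minutes:seconds; 14 days is the stated maximum
#SBATCH --nodes=1              # assumption: the job fits on one DGX-1 server
#SBATCH --gres=gpu:1           # assumption: GPUs are exposed as a "gpu" generic resource
#SBATCH --job-name=dgx_test    # hypothetical job name

# Print a one-line summary of where the job is running.
# SLURM sets SLURM_JOB_ID and SLURM_JOB_PARTITION inside a job;
# outside SLURM both fall back to "unset".
describe_job() {
    echo "Job ${SLURM_JOB_ID:-unset} on ${HOSTNAME}, partition ${SLURM_JOB_PARTITION:-unset}"
}

describe_job

# List the GPUs visible to the job, if the NVIDIA tool is on PATH.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L
fi
```

Assuming the script is saved under a hypothetical name such as dgx_test.sh, it would be submitted with `sbatch dgx_test.sh`, and its state could then be checked with `squeue -u $USER`. Requesting less than the 14-day maximum where possible generally helps the scheduler place the job sooner.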

The following batch script parameters control resource allocation in the dgx partition: