Platform "HybriLIT"

Instructions for working with the HybriLIT heterogeneous platform

  1. HybriLIT Platform Software and Hardware Environment
  2. Job Queues
  3. Data Storage and Processing Systems
  4. Basic steps for making calculations on the platform
  5. Getting Started: Remote Access to the Platform
  6. Lmod Module System
  7. Basic Linux Commands
  8. SLURM Job Scheduler

HybriLIT Platform Software and Hardware Environment

The HybriLIT training and testing facility is built on compute nodes equipped with multi-core Intel processors and NVIDIA GPU accelerators (see the Hardware section for details).

Compute Node Types

  1. CPU nodes with Intel Xeon Phi 7290 multi-core processors
  2. CPU nodes with Intel Xeon E5-2695 v2 multi-core processors
  3. GPU nodes with Intel Xeon E5-2695 v3 multi-core processors and NVIDIA K80 GPUs
  4. GPU nodes with Intel Xeon E5-2698 v4 multi-core processors and NVIDIA V100 GPUs

Job Queues

Jobs are submitted by placing them into a queue associated with a user account. Since HybriLIT is a heterogeneous platform, separate queues have been created to support different types of computing resources.

Currently, HybriLIT provides 5 queues:

  • interactive* — includes one compute node with 2 × Intel Xeon E5-2695 v2 (12 cores each). This queue is suitable for running test programs. The maximum wall time is 2 hours. (*The asterisk indicates that this is the default queue.)
  • cpu — includes 15 compute nodes, each with one Intel Xeon Phi (72 cores). This queue is intended for CPU-based applications. The maximum wall time is 7 days.
  • long — includes 6 compute nodes, each with one Intel Xeon Phi (72 cores). This queue is suitable for long-running applications with execution times of up to 4 weeks.
  • gpu_k80 — includes one compute node with 2 × Intel Xeon E5-2695 v3 (14 cores each) and 2 × NVIDIA Tesla K80 GPUs. This queue is intended for GPU-accelerated applications. The maximum wall time is 7 days.
  • gpu_volta — includes one compute node with 2 × Intel Xeon E5-2698 v4 (20 cores each) and 8 × NVIDIA V100 GPUs. This queue is suitable for long-running GPU applications with execution times of up to 7 days.

Operating System and Software Environment

The heterogeneous platform runs AlmaLinux 9.6 and uses the SLURM workload manager.
The installed software environment includes compilers and development tools for building, debugging, and profiling parallel applications, as well as the Lmod module system.

Data Storage and Processing Systems

To improve reliability and performance, several data storage and processing systems are available to users on the platform.

1. Home

User home directory:
/zfs/store5.hydra.local/user/l/login

This directory is intended for editing files and building/compiling programs. Each user has a 100 GB quota. The home directory is not intended for running computations and is not accessible on compute nodes.

2. Project

User project directory:
/lustre/projects/l/login

This directory is intended for running computations and storing working files (executables, input, and output files) during job execution. No quota is applied, and the directory is accessible on compute nodes.
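
The path pattern above can be turned into a small helper. This is a minimal sketch, assuming that the one-letter component of the path (the l in /lustre/projects/l/login) is the first letter of the login name; the function name project_dir is hypothetical.

```shell
# Assumption: the one-letter path component is the first letter of the login.
project_dir() {
    login="$1"
    first=$(printf '%s' "$login" | cut -c1)
    printf '/lustre/projects/%s/%s\n' "$first" "$login"
}

# Print the project directory of the current user.
project_dir "$(id -un)"
```

Because the home directory is not accessible on compute nodes, executables and input files have to be placed in the project directory before a job is submitted.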

3. Scratch Lustre 12×12*

High-performance scratch directory:
/lustre/scratch/l/login (*will be available later)

This directory is located on the Lustre 12×12 file system, which is optimized for high I/O workloads. It is intended for massively parallel applications with intensive input/output operations. Files stored in Scratch Lustre 12×12 are not guaranteed to be preserved for more than 90 days.

4. Scratch NFS/ZFS

Backup scratch directory:
/zfs/scratch/l/login

This directory is hosted on an NFS/ZFS file system and is intended for jobs that process a large number of small files (less than 10 MB each). Files stored in Scratch NFS/ZFS are not guaranteed to be preserved for more than 90 days.

5. Lustre (Legacy Home Directory)

Old user home directory:
/lustre/home/user/l/login

At present, files are being migrated from the previous storage system to the current project directory /lustre/projects. This process may take some time. Please do not worry — all data are preserved and will be fully transferred.

Basic steps for making calculations on the platform

Getting Started: Remote Access to the Platform

Remote access to the HybriLIT heterogeneous computing platform is available only via SSH.

DNS address:
hydra.jinr.ru

Detailed connection instructions for different operating systems are provided below.

For Linux Users

Open a terminal and run:

ssh USERNAME@hydra.jinr.ru

where USERNAME is the login you received during registration, and hydra.jinr.ru is the server address. When prompted, enter your password. Upon successful authentication, you will see a command prompt:

[USERNAME@hydra ~] $

This indicates that you are connected to the platform and located in your home directory. During the first connection attempt, you may see a warning about an unknown host. Type yes and press Enter to add the host to the list of known hosts.

For Windows Users

Windows users must use an SSH client such as PuTTY or MobaXterm.

Connecting with PuTTY

PuTTY does not require installation. Download putty.exe from: http://the.earth.li/~sgtatham/putty/latest/x86/putty.exe. Save it to a convenient location and run it.

PuTTY configuration steps:

  • In the Host Name (or IP address) field, enter: hydra.jinr.ru
  • In Saved Sessions, enter a name for the connection (e.g., hydra.jinr.ru)
  • To enable remote graphical applications (X11):
    • Go to Connection → SSH → X11
    • Enable X11 forwarding
  • Under Connection → SSH → Tunnels, verify that the option "Local ports accept connections from other hosts" is enabled
  • Return to the Session tab and click Save
  • Click Open to connect to the HybriLIT platform
  • Enter your login and password when prompted

After successful authentication, you will see the command prompt of the platform.

This confirms that you are connected and located in your home directory. During the first connection, type yes when prompted to confirm the server identity.

Installing and Using MobaXterm

  1. Visit mobaxterm.mobatek.net and download either:
    • the portable version (no installation required), or
    • the installer version (recommended; used in the steps below)
  2. Run MobaXterm_Setup_XX.exe and follow the standard installation steps
  3. After installation, launch MobaXterm using the desktop shortcut
  4. From the top menu, select Sessions → New session
  5. Enter hydra.jinr.ru in the Remote host field and click OK
  6. A new tab will open. Enter your login and password. Upon successful authentication, you will see the command prompt of the platform.

Lmod Module System

The platform uses Lmod 9.1.2 for dynamic environment management.

Lmod allows users to:

  • Switch between different compilers
  • Build applications using major programming languages (C/C++, Fortran, Java)
  • Use parallel programming technologies (OpenMP, MPI, OpenCL, CUDA)
  • Access software packages installed on the platform

Before compiling applications, required modules must be loaded.

Common module commands
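
The subcommands below are standard Lmod commands. This is a minimal sketch: the module name example/1.0 is a placeholder, not a module known to be installed on the platform, so a real name should be taken from the output of module avail.

```shell
# MODULE_NAME is a placeholder; replace it with a name from `module avail`.
MODULE_NAME="${MODULE_NAME:-example/1.0}"

if command -v module >/dev/null 2>&1; then
    module avail                    # list modules available for loading
    module spider "$MODULE_NAME"    # search all modules for a name
    module load "$MODULE_NAME"      # load a module into the environment
    module list                     # show currently loaded modules
    module unload "$MODULE_NAME"    # unload a single module
    module purge                    # unload all loaded modules
else
    echo "module command not found: run this on the platform" >&2
fi
```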

Loaded modules are not preserved between sessions.

To load modules automatically, add the following command to ~/.bashrc:
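
A minimal sketch of such an entry, assuming a placeholder module name example/1.0 (not a module known to exist on the platform). The guard keeps the shell startup quiet on hosts where Lmod is not initialized.

```shell
# Fragment for ~/.bashrc.
# Assumption: "example/1.0" is a placeholder; use a name from `module avail`.
if command -v module >/dev/null 2>&1; then
    module load example/1.0
fi
```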

You can also use compilers and software installed via CernVM File System (CVMFS).


CernVM File System

Adding CernVM-FS to the available software stack provides access to CERN software.

To list available packages: ls /cvmfs/sft.cern.ch/lcg/releases

The directory /cvmfs/sft.cern.ch/ is mounted dynamically and may disappear after a period of inactivity.
To remount it, simply run the command above again.

To use compilers and software packages: source <PATH_TO_ENVIRONMENT_FILE>
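
Since the CVMFS directory is mounted on demand, a script may want to check that it is reachable before sourcing anything from it. The sketch below is one possible way to do this; the function name cvmfs_ready and the variable ENV_FILE are hypothetical, and the path of the environment file is deliberately left for the user to supply.

```shell
CVMFS_RELEASES="/cvmfs/sft.cern.ch/lcg/releases"

# Listing the directory triggers the automount; non-zero status means failure.
cvmfs_ready() {
    ls "$1" >/dev/null 2>&1
}

if cvmfs_ready "$CVMFS_RELEASES"; then
    echo "CVMFS is mounted"
    # ENV_FILE is a placeholder for the environment file of the chosen package.
    if [ -n "${ENV_FILE:-}" ] && [ -r "$ENV_FILE" ]; then
        . "$ENV_FILE"
    fi
else
    echo "CVMFS is not available on this host" >&2
fi
```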

Basic Linux Commands

Command                             Description
man <command>                       display manual for a command
man -k <keyword>                    search commands related to a keyword

File and directory operations

ls                                  list files
ls -la                              detailed list including hidden files
cd <directory>                      change directory
cp <source> <destination>           copy files
mv <source> <destination>           move or rename files
ln -s <target> <link>               create symbolic link
rm <file>                           delete file
rm -r <directory>                   delete directory recursively

File viewing and editing

cat <file>                          display file contents
more <file>                         paginated output
less <file>                         interactive file viewer (q to quit)
nano <file>                         edit file with Nano
vim <file>                          edit file with Vim
pico <file>                         edit file with Pico

Utilities

find <directory> -name <filename>   find files
tar -zxvf <file>                    extract tgz or tar.gz archives
mc                                  launch Midnight Commander
man mc                              Midnight Commander documentation

Standard Commands

pwd                                 print current directory
whoami                              show current username
date                                show current date and time
time <program>                      measure execution time
ps -a                               list active processes
chmod <access_rights> <file>        change access rights to a file that you own
  Access types:
    r  read
    w  write
    x  execute
    -  no permission
  User types:
    u  owner
    g  group
    o  others
  Examples:
    chmod a+r zara   grants read access to everyone (a = all = user + group + others)
    chmod o-x zara   removes execute permission from others
chown <owner> <files>               change file owner
chgrp <group> <files>               change group
ls -l <file>                        view permissions
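
The effect of the chmod forms listed above can be checked on a scratch file. The sketch below uses a temporary file created with mktemp so that no real data is touched; the variable names are illustrative only.

```shell
# Work on a temporary file so that no real data is touched.
f=$(mktemp)

chmod 600 "$f"    # owner: read and write; group and others: no access
chmod a+r "$f"    # grant read access to everyone
ls -l "$f"        # the permission field now reads -rw-r--r--

chmod o-r "$f"    # remove read access from others
perms=$(ls -l "$f" | cut -c1-10)
echo "$perms"     # -rw-r-----

rm "$f"
```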

Process Control

grep <pattern> <file>               search text in file
man grep                            manual for the grep command
ps axu | grep <username>            list user processes
kill <pid>                          terminate process
killall <program>                   terminate all processes by name

SLURM Job Scheduler

SLURM is a scalable, fault-tolerant, open-source cluster resource manager and job scheduler that provides:

  • Allocation of exclusive or shared access to compute nodes
  • Execution and monitoring of parallel jobs
  • Job queue management and load balancing

1. Basic SLURM Commands

  • sbatch — submit a batch job
  • squeue — view job queue
  • sinfo — view node and partition status
  • scancel — cancel a job
  • scontrol — view or modify SLURM state

Example: 
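
A minimal sketch of a typical command sequence follows. The script name script.sh and the helper job_id_from_sbatch are hypothetical; the helper relies on the "Submitted batch job <id>" message that sbatch prints on success.

```shell
# Extract the numeric ID from sbatch's "Submitted batch job <id>" message.
job_id_from_sbatch() {
    awk '/^Submitted batch job/ { print $4 }'
}

if command -v sbatch >/dev/null 2>&1; then
    sinfo                              # node and partition status
    # script.sh is a hypothetical job script in the current directory.
    job_id=$(sbatch script.sh | job_id_from_sbatch)
    squeue -u "$(id -un)"              # jobs of the current user
    if [ -n "$job_id" ]; then
        scontrol show job "$job_id"    # detailed state of the submitted job
        scancel "$job_id"              # cancel the job again
    fi
else
    echo "SLURM commands not found: run this on the platform" >&2
fi
```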

2. Partitions (Queues)

HybriLIT provides the following partitions:

  • interactive* — 1 node, Intel Xeon E5-2695 v2 (12 cores ×2), up to 2 hours (default)
  • cpu — 15 nodes, Intel Xeon Phi (72 cores), up to 7 days
  • long — 6 nodes, Intel Xeon Phi (72 cores), up to 4 weeks
  • gpu_k80 — 1 node, 2× Xeon E5-2695 v3 + 2× NVIDIA Tesla K80, up to 7 days
  • gpu_volta — 1 node, 2× Xeon E5-2698 v4 + 8× NVIDIA V100, up to 7 days

3. SLURM Script Files

Jobs are submitted using a script file, which is a standard Bash script with the following rules:

  • First line: #!/bin/sh or #!/bin/bash
  • Lines starting with # are comments
  • Lines starting with #SBATCH define SLURM parameters
  • All SLURM parameters must be defined before launching the application

Required / recommended parameters:

  • -p — partition
  • -n — number of processes
  • -t — wall time (required)
  • --gres — number of GPUs or coprocessors
  • --mem — memory in MB (optional)
  • -N — number of nodes
  • -o — output file

Examples

CPU job:
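
A minimal sketch of a CPU job script, written to a file with a here-document so that it can be inspected before submission. The file name cpu_job.sh, the executable my_app, and the resource values are assumptions chosen for illustration; only the cpu partition name comes from the list above.

```shell
cat > cpu_job.sh <<'EOF'
#!/bin/bash
# Partition (queue) to run in
#SBATCH -p cpu
# Number of nodes and number of processes
#SBATCH -N 1
#SBATCH -n 4
# Wall time limit in HH:MM:SS
#SBATCH -t 00:30:00
# File for the job's standard output; %j expands to the job ID
#SBATCH -o cpu_job_%j.out

# Hypothetical executable located in the project directory
srun ./my_app
EOF

# On the platform, submit the script with: sbatch cpu_job.sh
```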

GPU job:
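
A minimal sketch of a GPU job script for the gpu_k80 partition. It assumes that GPUs are exposed under the generic resource name gpu, which is a common SLURM configuration but is not confirmed for this platform; the file name gpu_job.sh, the executable my_gpu_app, and the resource values are likewise hypothetical.

```shell
cat > gpu_job.sh <<'EOF'
#!/bin/bash
# Partition (queue) with GPU nodes
#SBATCH -p gpu_k80
# One node, one process
#SBATCH -N 1
#SBATCH -n 1
# Assumption: GPUs are requested through the generic resource name "gpu"
#SBATCH --gres=gpu:1
# Memory in MB
#SBATCH --mem=4000
# Wall time limit in HH:MM:SS
#SBATCH -t 01:00:00
# File for the job's standard output; %j expands to the job ID
#SBATCH -o gpu_job_%j.out

# Hypothetical GPU executable located in the project directory
srun ./my_gpu_app
EOF

# On the platform, submit the script with: sbatch gpu_job.sh
```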

Examples for specific programming technologies will be provided in the corresponding sections.