
HPC4AI Documentation

Latest cluster news

17-01-2024

Four NVIDIA Grace Hopper Superchip boards are now available under the “gracehopper” Slurm partition.

27-11-2023

IMPORTANT: an update to the system-wide Spack repository broke compatibility with user installations.
To solve the issue, you can either start with a fresh Spack installation or update your existing Spack instance with the following commands:

  1. cd ~/spack
  2. git pull
  3. git checkout develop

If this procedure doesn’t solve the problem, we suggest starting with a fresh installation:

  1. rm -rf ~/spack ~/.spack
  2. spack_setup

26-08-2023

You can now access the Lustre filesystem on the frontend node as well.

Accessing the system

The required steps to access the cluster are:

  1. Request an account by filling out our registration form
  2. Once the account is approved, log into GitLab using the C3S Unified Login option
  3. Upload your public SSH key to your account on the corresponding page
  4. Log in through SSH at <username>@c3sfr1.di.unito.it (see the example after this list)
  5. With the same account you can log into the web calendar to book the system resources (more details below)
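
For example, a first login could look like the following sketch (the key file path is only an illustration and may differ on your machine):

  # Generate an SSH key pair if you do not already have one (example path)
  ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519_hpc4ai
  # Upload the contents of ~/.ssh/id_ed25519_hpc4ai.pub as described in step 3, then log in
  ssh -i ~/.ssh/id_ed25519_hpc4ai <username>@c3sfr1.di.unito.it
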
System Architecture

Nodes

  • 1 login node (PowerEdge R740xd)
    • 2 x Intel(R) Xeon(R) Gold 6238R CPUs
    • 251 GiB RAM
    • InfiniBand network
    • OmniPath network
  • 4 compute nodes (Supermicro SYS-2029U-TRT)
    • 2 x Intel(R) Xeon(R) Gold 6230 CPUs
    • 1536 GiB RAM
    • 1 x NVIDIA Tesla T4 GPU
    • 1 x NVIDIA Tesla V100S GPU
    • InfiniBand network

Storage

The cluster is equipped with two network storage systems:

  • a 50 TB BeeGFS filesystem, mounted at /beegfs
  • a 20 TB Lustre filesystem, mounted at /lustre/scratch (available only on OmniPath network nodes)
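
As a quick check, you can verify that both filesystems are mounted and see their available space (the Lustre mount is visible only on the frontend and on nodes attached to the OmniPath network):

  # Show size, usage, and mount point of the two network filesystems
  df -h /beegfs /lustre/scratch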

Spack environment

The system comes with a global Spack instance that is automatically sourced at login time.

Users can set up their own Spack instance in their home folder. In that case, automatic sourcing of the global Spack instance is disabled.

If needed, a setup script called “spack_setup” can be used to set up a personal Spack instance chained to the global one.
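
A minimal usage sketch, assuming the default behaviour of “spack_setup” (the package name is only an example):

  # Create a personal Spack instance chained to the global one
  spack_setup
  # A new login shell may be needed so that the personal instance is sourced
  spack find            # packages installed in the global instance remain visible
  spack install zlib    # example: new packages go into your personal instance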

Using the reservation system

Resources can be accessed for batch computations through the Slurm queue manager (version 22.05). In addition, a custom Slurm plugin allows users to book scheduled access to resources through an external calendar, the Booked Scheduler.

The system offers four job queues:

  • epito-compile (Time limit: 1h; Max resources: 2 cores, 1 node);
  • epito-1h (Time limit: 1h; Exclusive node access);
  • epito-12h (Time limit: 12h; Exclusive node access);
  • epito-booked (No time limit; Exclusive node access; reservation through the Booked Scheduler).
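
For instance, a quick test outside the booked partition could look like this (the commands and the build script are only illustrative):

  # Short interactive test on an exclusive node, limited to one hour
  srun --partition=epito-1h --nodes=1 hostname
  # Compile on the shared partition, which is capped at 2 cores on a single node
  srun --partition=epito-compile --cpus-per-task=2 ./build.sh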

Resource usage is accounted for in credits, the Booked Scheduler currency. The initial credit allocation covers a total of 5000 core-hours of computation (for example, reserving one 40-core compute node for 12 hours consumes 480 core-hours); more credits can be requested by following this procedure.

To book resources on the system, first log into the web calendar. You can then select one or more epito nodes from the resource list to create a reservation. On the creation page you can choose the start and end date and time, the resources needed, and a mandatory title and description, as shown in the image below.

Once the reservation is correctly created, the system shows a confirmation message:

Now log into the system via SSH. You can check that the corresponding Slurm reservation has been created with the command “scontrol show res”.

The output should look like the following:

You can now submit your batch script. Remember to specify the correct partition, “epito-booked”, and to use the reservation you created in the previous step; otherwise the job may not get the guaranteed resources or may be terminated after the default time limit. Here is a simple example of a script:

#!/bin/sh
#SBATCH --partition=epito-booked
#SBATCH --reservation=test_computing
#SBATCH --nodes=2
srun sleep 120

The #SBATCH directives specify, one per line, the partition name, the reservation, and the minimum number of nodes requested for the job. With “srun” you specify the command to run on each allocated node. You can find more information on the official sbatch documentation page.
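
A slightly more complete sketch of the same script, with a few optional directives added (the job name and output file name are only illustrative):

#!/bin/sh
#SBATCH --partition=epito-booked
#SBATCH --reservation=test_computing
#SBATCH --nodes=2
# The next two directives are optional; the values are illustrative
#SBATCH --job-name=my_test
#SBATCH --output=slurm-%j.out
# srun launches the command once on each of the allocated nodes
srun hostname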

Once the script is ready, you can submit the job with the “sbatch” command, passing the script file name as argument, and check its status with the “squeue” command, which lists the job queue showing the job id, partition, running time and current status:

The script’s output and error logs are written, while the job runs, to a file named slurm-{jobid}.out in the working directory from which you submitted the job.
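
Putting it all together, a typical submit-and-check sequence could look like the following (the script name and job id are only illustrative):

  sbatch job.sh              # Slurm replies with the id assigned to the job
  squeue -u $USER            # list your jobs with their id, partition, state and running time
  tail -f slurm-12345.out    # follow the output of the (hypothetical) job 12345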

HelpDesk

For any problem or question, e.g. to request the installation of additional software, please submit a ticket to the C3S helpdesk or send an email to support@hpc4ai.unito.it.