Getting Started

[Photo: what one type of compute node looks like]

Overview

Remember: all interactions with the cluster are via a Secure Shell (SSH) terminal application using a command line.

Once logged in, you access software and Python environments via Environment Modules.  To submit a job to run on the compute nodes in the cluster, use the SLURM command sbatch.

SSH Terminal Applications:

  • Mac has a terminal app under Applications/Utilities
  • Linux also has a terminal app – location in menus varies according to your linux distribution
  • Windows has clients available such as PuTTY, mRemoteNG, SmarTTY, and MobaXterm
  • There are extensions/addons for Firefox and Chrome that give you an ssh terminal window.

You need to be familiar with basic Linux commands used in terminal sessions.

It also helps to know how to use a text editor such as vi, vim, or nano (nano is the most intuitive). Here are links to tutorials for various editors:  vi     vim    nano.  You can also edit files on your own computer and then move them to the cluster.

Logging in to Hummingbird with a terminal application

Hummingbird uses your campus CruzID and Gold password.  If you need access for a visitor or a new faculty/staff member who does not yet have a CruzID, please contact the team so that we can direct you to the process for getting them access.

Nota Bene: If you are off campus, you will be required to use Campus VPN to connect to Hummingbird. To install the VPN client to your machine, all you need to do is follow the VPN Installation Instructions.

The following are examples of Secure Shell (ssh) commands that may be used to log in from terminal apps like on Mac or Linux:

ssh <your_cruzid>@hb.ucsc.edu   or,

ssh -l <your_cruzid> hb.ucsc.edu
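
If you log in frequently, you can save the host name and your CruzID in your SSH client configuration so a short alias works. A minimal sketch for Mac/Linux, added to ~/.ssh/config (the alias “hb” is just an example name, not something the cluster requires):

Host hb
    HostName hb.ucsc.edu
    User <your_cruzid>

With that in place, “ssh hb” is equivalent to the commands above.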

Moving files to your Hummingbird account requires the use of sftp (secure file transfer). One of the better graphical applications for this is FileZilla, which is available for macOS, Windows, and Linux.

Or you can use the following command to move files to and from Hummingbird from your computer’s terminal application.

sftp <your cruzid>@hbfeeder.ucsc.edu  (note: use this server name to move data to your home directory)
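
As a sketch, a typical sftp session looks like the following: “put” uploads a file from your computer to your home directory on the cluster, and “get” downloads a file in the other direction (the file names are placeholders).

$ sftp <your_cruzid>@hbfeeder.ucsc.edu
sftp> put mydata.tar.gz
sftp> get results.txt
sftp> exit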

Various ways to move data between your local machine and the cluster are described in Data Transfer (Cornell Virtual Workshop).

Putting your data on hbfeeder.ucsc.edu directly takes the load off of the head node, but remember that you still have to submit your jobs from the head node using the SLURM batch queue system.

Storage on the cluster

Hummingbird has 3 areas of storage:

  • /hb/home     (where your home directory lives; quota 1 TB)
  • /hb/groups   (assigned on request to groups, e.g. courses)
  • /hb/scratch   (best place for your data; no quota, no backup)
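
Standard Linux commands work for checking how much space you are using; for example:

$ du -sh $HOME        # total size of everything under your home directory
$ df -h /hb/scratch   # free space remaining on the scratch filesystem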

Accessing Software Applications and Python with Modules

Understanding the use of Environment Modules is required in order to access the large amount of software already installed on Hummingbird (see the “Software” tab).  This avoids redundant installation of software or Python environments in local home directories, and ensures that verified software is available for running across compute nodes.

The Environment Modules package provides for dynamic modification of your shell environment. Module commands set, change, or delete environment variables, typically in support of a particular application. They also let the user choose between different versions of the same software or different combinations of related codes.

What does this mean?  Normally, users would set PATH variables in their login profile (e.g. .bashrc or .bash_profile).  Using the module environment lets you set these without editing your login files.
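
For example, rather than hard-coding a path in your login files, you load a module and the modulefile adjusts PATH (and related variables) for the current shell only. The path below is purely illustrative:

# without modules: hard-code a path in your login files
export PATH=/path/to/some/package/bin:$PATH
# with modules: let the modulefile set the variables for this shell
module load <software package>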

We have created a set of module files that set up the environment for executing a particular software package or set of packages.  The command “module avail” gives you a list of the software packages currently available on the cluster.   If you see a package you wish to use, issue the command “module load <software package>”, e.g.  “module load nwchem”.

Several modules that determine the default environment are loaded at login time.  Execute “module list” to review them and “env” to see how this has expanded the environment variables available to you.  Python 3.6.3 is also loaded at login time (for the complete Python package list, execute “pip3 list”), though you can also swap to Python 2.7.11.
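
A typical first session with modules might look like the following (nwchem is just the example package mentioned above; substitute whatever “module avail” shows you):

$ module list                 # modules loaded by default at login
$ module avail                # list the available software packages
$ module load nwchem          # load the package you want
$ module list                 # confirm it is now part of your environment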

Useful Modules Commands

Here are some common module commands and their descriptions:

  • module list: List the modules that are currently loaded
  • module avail: List all software modules available for the gnu compiler family (for gnu7 software you will have to do a “module swap gnu gnu7”)
  • module spider <string*>: Find software applications for all compiler families. Example: “module spider R*” searches for software beginning with “R”
  • module display <module_name>: Show the environment variables used by <module_name> and how they are affected
  • module unload <module_name>: Remove <module_name> from the environment
  • module load <module_name>: Load <module_name> into the environment
  • module swap <module_one> <module_two>: Replace <module_one> with <module_two> in the environment

Loading and unloading modules

You must remove some modules before loading others.  You can do this by using module purge or module swap.

Some modules depend on others, so they may be loaded or unloaded as a consequence of another module command.

If you find yourself regularly using a set of module commands, you may want to add these to your configuration files (.bashrc for bash users, .cshrc for C shell users). Complete documentation is available in the module(1) and modulefile(4) manpages.
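
For example, bash users who always need the same environment could append lines like these to ~/.bashrc (the packages shown are only illustrative; adjust to what you actually use):

# load a preferred set of modules at every login
module swap gnu gnu7
module load R_base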

Compiling Software

If you have VERIFIED that the software is NOT already available on Hummingbird (via “module spider <string>” or “module avail”), or you have been instructed to compile for a course, then you may need to compile your own software. You can also always put in a ticket for software installs/compiles to hb-team@ucsc.edu to request a system-wide compile and module install.

Hummingbird has the GNU 5.4.0 compiler set: gcc, g++, and gfortran.  The gnu compiler family is loaded automatically into your environment.  For GNU 7 you will have to use “module swap gnu gnu7”. Most software packages use the GNU compiler by default.  Make sure that you set up the compilation to install all software in your home directory; otherwise it will fail when trying to install to default locations.

If the software you are compiling requires other libraries, issue “module av” to see the list and then do the appropriate “module load <software package>”, e.g.  “module load R_base”.
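
As a sketch, an autotools-style build installed into your home directory might look like the following (the package name, version, and install path are placeholders; your software's build steps may differ):

$ module load <library module>            # load any library modules the build requires
$ tar xzf mypackage-1.0.tar.gz
$ cd mypackage-1.0
$ ./configure --prefix=$HOME/software     # install under your home directory, not system paths
$ make
$ make install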

Intel compilers (and subsequently Intel-compiled applications) are not currently available on Hummingbird.

Using SLURM on Hummingbird

Slurm resource scheduling is required on Hummingbird. NO jobs, applications, or scripts should be run on the login node.  This includes the use of python. All applications and scripts must be run via Slurm batch job submissions.   Compilations (see Compiling Software), however, can be done on the login node without Slurm.

HB Partitions

HB Cluster nodes are grouped into partitions, and various scheduling parameters are associated with those partitions.

  • There are 4 public partitions (also called queues) on HB: 128x24, 256x44, Instruction, and 96x24gpu4 (the sinfo example after this list shows how to inspect them). Additionally, there is one large-memory partition (1024x28) that public users are allowed to use, but the researcher who owns it gets priority access, and jobs may be canceled if they decide they need to use it.
  • Single jobs are allowed to use a maximum of 72 cores (e.g. 3 nodes).
  • Hummingbird uses different partitions to distinguish architectures or delegation. Partitions do not distinguish scheduling rules at this time.
  • Jobs should always include a time limit (wall clock). For example, a job limited to one hour would have this entry:
--time=0-01:00:00       ### Wall clock time limit in Days-HH:MM:SS
  • Compute nodes used by instructors/students are under the partition name “Instruction”.
  • Compute nodes used by all others are in partitions named per “[Memory]x[CPUs]” (physical CPUs, not logical processors). Example: “128x24”.  This is the default partition, consisting of 19 compute nodes.
  • The 256x44 partition is one node that makes more memory available for running jobs.
  • Hyperthreading is not enabled on HB. Therefore all nodes have the designation “ThreadsPerCore=1” (no HT).
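
To check the partitions and their current node states for yourself, the standard Slurm query commands can be used; for example:

$ sinfo                    # list all partitions, node counts, and node states
$ sinfo -p Instruction     # limit the listing to a single partition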

HB SLURM Constraints

The end-user can control which partition, and how many nodes/tasks, are selected, but not how resources are scheduled. Users do have control over CPU resources (at a high level) and memory usage. While a memory sizing parameter and a cpu count designation are optional, both are highly advised to optimize cluster use.  Ensure you understand ALL of the following before using the Slurm scheduler to submit your jobs:

  • You cannot log in to a compute node if you do not have an active job on that node. This impacts the way you run in interactive mode on HB.  Review “Submitting JOBS to SLURM on HB”.
  • There are 4 partitions: Instruction, 128x24, 256x44, and 96x24gpu4
  • Job queues are scheduled against available resources.
  • CPU cores (processors) are a consumable resource for scheduling.
    • Usage should be limited to a maximum of 48 cores per job
    • More than one job/step may run on the same node concurrently if you use the correct srun/mpirun parameters.
    • If you only designate the number of nodes and processors in a job, a whole node will be consumed by your job (a.k.a. task). Therefore, use the qualifier “--cpus-per-task” (-c) whenever possible. By using the --cpus-per-task=X option, Slurm will also know that each task requires X processors per node, which gives you more control (see the header sketch after this list).
    • Logical cores are recognized, but hyperthreading is not supported on any partition's nodes.
    • For MPI runs, an MPI rank is mapped to a single cpu. MPI tasks will not fork threads with OpenMP.
    • Task affinity (pinning tasks to processors/cores) is not supported.
  • Memory defaults to all non-system memory per node. Therefore, a single job on a compute node will allocate all of the node's memory. This prevents others from sharing node space and avoids the potential for your program to crash, but it also ties up the whole node. Therefore, use the qualifier “#SBATCH --mem=XX” whenever possible in your SLURM script.
  • Job memory accounting is only available when using “srun” or “mpirun”.
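
Putting the last few points together, a minimal sketch of the resource directives in a job script might look like this (the partition name and the values are placeholders to adapt to your own job):

#SBATCH --partition=128x24      # one of the public partitions
#SBATCH --ntasks=4              # total number of tasks
#SBATCH --cpus-per-task=1       # processors per task
#SBATCH --mem=2G                # memory per node
#SBATCH --time=0-01:00:00       # wall clock limit, Days-HH:MM:SS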

Submitting JOBS to SLURM on HB

Submission Methods

The slurm utilities “sbatch” and “salloc” allocate resources to a job.

  • sbatch is used to submit a job script for later execution; its output is written to a file. Command options used in the job allocation are almost identical to those of srun. The most noticeable difference in options is that the sbatch command supports the concept of job arrays, while srun does not. Another significant difference is in fault tolerance: failures involving sbatch jobs typically result in the job being requeued and executed again, while failures involving srun typically result in an error message, with the expectation that the user will respond in an appropriate fashion.
  • srun is used to submit a job for execution in real time. The srun command is designed for interactive use, with someone monitoring the output. The output of the application is seen as output of the srun command, usually in the terminal. Use of “srun” on HB requires the use of “salloc” if it is NOT used within an sbatch script, because HB does not allow direct logins to compute nodes without an allocation on that compute node.  Specifically, you must first log in to the allocated node BEFORE using srun (or mpirun).  You cannot use “srun … --pty /bin/bash” on the headnode following allocation.

Creating SBATCH Scripts for use with the job scheduler

Start by reviewing the section, below, “Examples of Slurm Submissions”. But first, ensure you understand the difference between a serial run and a parallel run (See “Srun versus mpirun”, below).  There are also very helpful example scripts located in the directory  /hb/software/scripts on the cluster.

The following link provides more detail: Creating scripts for jobs

If you are more familiar with the Sun Grid Engine (or OpenGrid/PBS), then this link provides information on SLURM conversion

 “Srun” versus “mpirun”

Each invocation of “mpirun” or “srun” within a job is known as a job step. Memory is managed more effectively when using “srun” or “mpirun”, but it cannot be tracked if you do not execute your job with one of them preceding your command.

Either “srun” or “mpirun” should be used to invoke parallel programs when using an “sbatch” script.

Both “mpirun” and “srun” (when used in conjunction with “prun”) inherit, by default, the pertinent options of the sbatch or salloc under which they run.

“srun” has 2 modes of operation: 1) create a job allocation, and 2) spawn an application

  • “srun” can be used within an existing “salloc” job allocation or from within an “sbatch” script to allocate and spawn serial jobs for parallel execution (independent, sequential jobs), such as with python.
  • “sbatch” is the preferred method to invoke programs since it consumes fewer resources and is more efficient. Batch scripts can be reused with minor, if any, modification.  Batch scripts also do not tie up node resources (as “salloc” does if improperly exited).
  • Do not use srun to launch parallel tasks unless you utilize “prun” or “mpirun” (see “Examples of SLURM Submissions”, below) from an interactive bash/shell task. In other words, DO NOT execute “srun <mpi.exe>”. See the Note on MPI, below.
  • You can run interactive jobs (to see all output) if you use an “salloc” job allocation. Once you have logged into an allocated compute node, “srun” can be invoked to request other node resources. However, the use of “srun” (or “mpirun”) is not required if the 1 allocated node will suffice.  When external resources are granted, your tasks will be run across those resources as a single job.
  • The “--exclusive” option to “srun” can be used to allocate a node for exclusive use (otherwise, other jobs will share the node's CPU resources).

“mpirun” is the preferred method to launch parallel tasks across resources.  “mpirun” will inherit, by default, the pertinent options of the sbatch or salloc under which it runs.

Note: Parameters such as “cpu_bind”, “cores-per-socket”, etc. will NOT function. HB Slurm is NOT configured for “Task/Affinity” to provide node-specific resource management (e.g. pinning tasks to specific processors).

Note on MPI:  OpenHPC does not support the “direct” launch of MPI (parallel) executables via “srun” for the default Open MPI (openmpi). In lieu of “srun”, the user must launch an MPI job with “mpirun”. Specifically, you cannot directly execute “srun <mpi-program>”.

*The OpenHPC build of Open MPI (1.10.6) does not provide the PMI library for Slurm due to GPL licensing issues posed by some applications.

Examples of SLURM Submissions

Job Allocation – Batch, Parallel

Batch jobs are submitted via an “sbatch” script. Batch runs are the preferred method for executing your application scripts.  Programs should be executed with either “srun” (single) or “mpirun” (parallel) preceding the program path/name. In the case of a single (non-MPI) program, if you do NOT use srun, you will not get memory accounting.

An MPI job launched via a job script and submitted with “sbatch” follows. A good resource for learning how to use “sbatch” (after reading “HB SLURM Constraints”) is:
Slurm Job Scheduler

Example job script (job.mpi):

#!/bin/bash

#SBATCH -p Instruction   # Partition name
#SBATCH -J test        # Job name
#SBATCH --mail-user=<cruzid>@ucsc.edu
#SBATCH --mail-type=ALL
#SBATCH -o job.%j.out    # Name of stdout output file (%j expands to the job ID)
#SBATCH -N 2        # Total number of nodes requested (128x24/Instructional only)
#SBATCH -n 16        # Total number of mpi tasks requested
#SBATCH -t 01:30:00  # Run Time (hh:mm:ss) - 1.5 hours (optional)
#SBATCH --mem=1G # Memory to be allocated PER NODE


export OMPI_MCA_btl=tcp,sm,self
module load gromacs-5.1.4

# Use of -np replaces the need to use "#SBATCH --cpus-per-task"
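# Note: $EXE is a placeholder; set it earlier in the script to the path of your MPI executable (e.g. EXE=./my_mpi_program)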
mpirun -np $SLURM_NTASKS $EXE

Example submission of the above script:

$ sbatch job.mpi

Example followup to monitor:

$ squeue     (show jobs submitted and the current run state)
$ scontrol show job [job id]
$ scancel [job id]
$ sinfo -l

Example “post-run” job lookup ( memory/CPU usage):

$ sacct -o reqmem,maxrss,averss,elapsed,alloccpus -j [job id]

Job Allocation – Interactive, Serial

1. First, use “salloc” to request a node resource on which to execute your interactive program.

Allocate 500 MB of memory on 1 node.  DO NOT FORGET TO INCLUDE MEMORY and CPUS-PER-TASK or else you will prevent others from using the compute node! If you need the whole node (no CPU or memory sharing), then add the option “--exclusive”. Note: the use of the “Instruction” partition below is for demo purposes only. Please use one of the standard partitions (e.g. 128x24) unless you are doing this for a class (in which case Instruction is appropriate).

$ salloc --partition=Instruction --time=01:00:00 --mem=500M --ntasks=1 --cpus-per-task=1
salloc: Granted job allocation 184247

Display SLURM Variables to verify and identify allocation

$ export | grep SLURM


SLURM_NODELIST=hbcomp-001
SLURM_JOB_NAME=bash
SLURM_NODE_ALIASES=(null)
SLURM_NNODES=1
SLURM_JOBID=184247
SLURM_NTASKS=2
SLURM_TASKS_PER_NODE=1
SLURM_CPUS_PER_TASK=1
SLURM_JOB_ID=185121
SLURM_SUBMIT_DIR=/hb/home/
SLURM_NPROCS=2
SLURM_JOB_NODELIST=hbcomp-001
SLURM_JOB_CPUS_PER_NODE=1
SLURM_CLUSTER_NAME=hbhpc
SLURM_SUBMIT_HOST=hummingbird
SLURM_JOB_PARTITION=Instruction
SLURM_JOB_NUM_NODES=1
SLURM_MEM_PER_NODE=500

2. Log in to your allocated node

$ ssh $SLURM_NODELIST

3. Next, run Commands on your allocated Node

On the scheduled node (to which you just logged in), execute your script or commands.

$ module load <app_module>
$ <program> <parameters>

OR, use srun to schedule an executable to run on other resource(s). The implementation below is uncommon, since srun is primarily intended for parallel programs.

$ srun -N 1 -n 1 --cpus-per-task=1 --mem=100MB 'hostname'
hbcomp-018

4. Exit the node that was allocated.

$ exit

5.  Last, relinquish the resource.

WARNING: You MUST “exit” the shell created by the salloc command. Note that “squeue” will also show that you are still running a bash shell on the allocated node.  Yes, this is a second “exit”! This shell can persist even after the job has timed out and the nodes have been deallocated.

$ exit
salloc: Relinquishing job allocation 184247

Alternatively,  you can simply cancel resource allocation/job (if you don’t need the full time scheduled originally) as shown below:

$ scancel $SLURM_JOB_ID
salloc: Job allocation 184242 has been revoked

Job Allocation, Interactive Parallel:

An MPI job launched via “prun” (interactive) works the same as the serial interactive job allocation.  The only difference is in STEP 3, when multi-node allocation is desired.   STEP 3 can also be executed as a single-node run with straight “mpirun”.

Note on “prun”: when an MPI module has been loaded into your environment, “prun” will set up the appropriate launch command; with the HB defaults this will be “mpirun”.  Prun will identify the nodes (and other parameters) assigned by the scheduler (from “srun”) and pass them to the MPI program. This creates the only condition under which you can use “srun” to launch a parallel program.

STEP 1. $ salloc -N 1
STEP 2. $ ssh $SLURM_NODELIST
STEP 3.

$ module load <app_module>
$ srun -p Instruction -n 8 -N 2 --mem=1G --cpus-per-task=1 --pty /bin/bash
$ prun ./<mpi_program>
[prun] Master compute host = hbcomp-004
[prun] Resource manager = slurm
[prun] Launch cmd = mpirun ./mpi_pi
process output….
process output….
process end….

STEP 4. $ exit
STEP 5. $ exit

NOTES, STEP 3:

  1.  This requests 2 nodes (-N 2) and a total of 8 tasks (-n 8). We are also saying that we want to run a login shell (bash) on the compute nodes. The option --pty is important: it gives a login prompt and a session that looks very much like a normal interactive session, but on one of the compute nodes. If you forget --pty you will not get a login prompt and every command you enter will get run 8 times, once per task (-n 8).
  2. In lieu of “prun”, you could also have run on the allocated node (versus  multiple) with :

mpirun -n 8 ./mpi_pi
