Displaying graphical windows from compute nodes running under SALLOC allocations

For a small subset of our users, there is occasionally a need to interactively run jobs on compute nodes on the cluster, including displaying GUI windows from those interactions. This process is somewhat involved and is not recommended for users who are not comfortable with command-line concepts. You will not be able to simply copy and paste this text; you must understand the nature of the arguments you are supplying to the commands.

Requesting an allocation

To start with, you will need to get an allocation to a compute node on the cluster. The salloc command takes most of the same arguments we would supply in a typical SBATCH script, so this should all look familiar:

salloc --partition=default --time=01:00:00 --mem=500M --cpus-per-task=1 --ntasks=1

We start by running the salloc command, providing instructions on (1) which partition to run on (default in the example), (2) how long our job is allowed to run (one hour in the example), (3) how much memory to allocate on the node for our job (500 MB of RAM in this case), and (4) how many cores we want on the compute node for our allocation. We specifically set --ntasks=1 because we are only running one primary task, which is the allocation we are going to be connecting to. The number of --cpus-per-task thus defines the number of cores we have access to within that task.

Special note – Hummingbird does not have a default partition, so if you blindly use the above command, it will fail. Our primary partition is 128x24, but your circumstances may vary, so adjust accordingly.
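For example, on Hummingbird a request against the 128x24 partition would look like the following (adjust the partition, time, and memory values to match your own needs):

salloc --partition=128x24 --time=01:00:00 --mem=500M --cpus-per-task=1 --ntasks=1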

If your allocation request is successful, you should see a response like:

salloc: Granted job allocation 999999

The number at the end is your job ID, which can be viewed in the queue of running jobs via squeue. To see the details of your job, you can run

squeue -l -j 999999

You should get a result similar to:

Fri Oct  6 14:52:36 2023
             JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)
            999999    128x24  jobname rkparson  RUNNING       0:01   1:00:00      1 hbcomp-007

Of interest to us is the last item on the job line: the node where your job is running. In the example above, it is hbcomp-007.
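If you just want the node name by itself (for example, to paste into a later command), squeue can print only that field; this is optional but saves reading the full table:

squeue -h -o %N -j 999999

Here -h suppresses the header line and %N prints the list of nodes assigned to the job.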

The Hummingbird cluster has a policy that users cannot log into compute nodes unless they have a job running on that node. This is for security reasons and prevents users from having unauthorized access. That said, once you have a running allocation on a node, you can then SSH into that node from the head node. Since these devices are all part of the same shared cluster environment, you do not need to pass any additional username or password; just running

ssh hbcomp-###

is sufficient to be able to log in.
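Once logged in, a quick way to confirm you landed on the right machine is to print the node's name, which should match the one reported by squeue (hbcomp-007 in our example):

hostname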

Getting graphical windows on your allocation

Now that you have an allocation, our next step is getting visual access to the node you’ve been assigned.

How graphical windows are rendered on computers is a complicated issue, and the solutions vary depending on the operating system: Linux has native window rendering (the X Window System, often called 'X11'), macOS requires the use of an additional program called XQuartz (https://www.xquartz.org/), while Windows has multiple ways of solving this issue. The easiest one-product solution I’ve had experience working with is MobaXterm, which has most of the necessary features available for free. Additional options include running Xming and using PuTTY or some other SSH client. Users who have installed WSL should be able to install the necessary graphical libraries within their Linux instance, but this may not always be entirely sufficient and more configuration may be necessary.

Whichever OS you are using, the process for getting a graphical window on the compute node is the same; we will need to pass the graphical window information through a chain of SSH sessions (commonly called a tunnel) via a process known as X Forwarding.

In a new terminal window, run the command

ssh -XY -N -L 2022:<destination_node>:22 <cruzid>@hb.ucsc.edu &
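For example, using the hbcomp-007 node identified above and a hypothetical CruzID of jsmith, the command would look like:

ssh -XY -N -L 2022:hbcomp-007:22 jsmith@hb.ucsc.edu &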

Make sure to replace <destination_node> with the compute node where your job is running (the one we identified above) and <cruzid> with your CruzID. The -XY arguments tell SSH we want to do X Forwarding, while the -N argument tells SSH that this session is not for interactive use, but just for establishing the tunnel. The -L argument creates the tunnel itself: local port 2022 is forwarded to port 22 (the SSH port) on the destination node. If this worked, you should see output similar to:

[1] 9999

The 9999 in this case is the process ID of our SSH tunnel. We see this because the & at the end of the above command pushes the process into the background, so it can continue running without interaction. With our tunnel in place, we have a means of directly connecting to our assigned compute node, so we do that next:

ssh -XY -p 2022 localhost

This connects to the local port we opened before (2022), again forwarding our X information through it to the destination node. If everything worked, you should now have a command prompt that says

[<cruzid>@<destination_node> ~]$

where <cruzid> is your user name and <destination_node> is the compute node we wanted to tunnel to. We can verify this is working by running a simple X-window application called xclock. This should open a new GUI window showing a clock. The clock is generated entirely on our compute node, and the graphical output is pushed to our local system over SSH!
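As a quick sanity check, you can confirm that X forwarding made it through the tunnel and then launch the clock. With forwarding active, SSH typically sets DISPLAY to a value like localhost:10.0, and xclock is commonly (though not universally) installed with the X11 utilities:

echo $DISPLAY
xclock &

Adding the & keeps your prompt usable while the clock window stays open.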

From here, we can begin doing interactive work on our compute node via whatever applications you need to run: MATLAB, R, Python, etc.
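As an illustration only: many clusters expose these applications through environment modules, so assuming a MATLAB module is available on your cluster (the module name here is an assumption and may differ or not exist), launching its GUI over the forwarded display might look like:

module load matlab
matlab &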

Ending your tunnel session

Once you are done with your tunnel session, we need to do some cleanup. First, exit from the compute node by running the 'exit' command. This should return you to the command prompt on your local system.

Next, run the command fg. This will bring the tunnel process we backgrounded earlier to the foreground. Your command prompt should now show the text of the command we ran above:

ssh -XY -N -L 2022:<destination_node>:22 <cruzid>@hb.ucsc.edu

Once you see this text, press the key combination 'Control-C' on your keyboard. This should cancel the command and return you to your command prompt. With this, our tunnel is now closed.
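Alternatively, if you prefer not to bring the tunnel to the foreground, you can terminate it directly by the process ID we noted when it started (9999 in our example):

kill 9999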

Finally, if we are done with our node allocation as well, we can run the command

scancel 999999

where 999999 is the job number we were given when we first started our salloc job.
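To confirm the allocation is gone, you can list your remaining jobs; the cancelled job should no longer appear:

squeue -u <cruzid>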

With that, all processes should be exited and cleanup is done.
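For reference, here is the full workflow condensed into just the commands (substitute the placeholders with your own values, and run fg followed by Control-C to close the tunnel at the end):

salloc --partition=128x24 --time=01:00:00 --mem=500M --cpus-per-task=1 --ntasks=1
squeue -l -j <jobid>
ssh -XY -N -L 2022:<destination_node>:22 <cruzid>@hb.ucsc.edu &
ssh -XY -p 2022 localhost
xclock
exit
fg
scancel <jobid>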

UC Santa Cruz Research Computing