Category Archives: Maintenance and Upgrades

Complete list of upgrades and improvements for the Summer 2021 Hummingbird maintenance period.

Winter Maintenance Progress Report

We have another update on our progress!

Late last week, we worked directly with the providers of the storage file system (BeeGFS) to identify some bugs and make some improvements to their BeeGFS-specific data replication tool. This tool is used for moving or replicating data stored on BeeGFS-based systems. The new tool provided at least an order-of-magnitude improvement in our copy speeds, so we now anticipate being done with the data replication by early next week. There will be some loose ends to tie up before we can return the system to service, so our goal right now is to be back in full service by Wednesday, January 21, 2026 at 8am.

In order for us to concentrate our efforts on finishing, we will not be holding office hours again this Thursday. Office hours will resume on Thursday January 22, 2026.

We apologize for this extended outage, but everyone on the team, and especially the sponsors of the new system, felt that it was better to take our time, be careful to preserve your data (and metadata), and do this maneuver cautiously. We will continue to strive to finish the maintenance and get the systems back online as quickly as possible. Your patience is greatly appreciated.

 

No Hummingbird Office Hours Today – Maintenance Continues

Office Hours for January 8, 2026 are cancelled. Maintenance shutdown continues…

We wanted to give you an update on where we are with the data migration and the ongoing maintenance service outage. We have escalated a request to the storage system vendor as well as the Science Engagement team at the Energy Sciences Network (ESnet – part of the DOE Labs). One of our main goals is keeping your data safe, so rest assured – no data will be lost during this maneuver. That said, it’s always good practice to make a backup of any critical data, such as run results, outside of Hummingbird.

One lesson learned here is that size matters. This is where we ask for your help in the future. Please stick to our quotas (1TB per user in /home), curate your data (i.e. download only what you need and keep only the essential), share data where possible (via groups), and refrain from using HB’s storage system as long-term storage. When you’re done with data, please clean up after yourselves.
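For anyone who wants a quick check of their own usage, here is a minimal Python sketch that adds up file sizes under a directory and compares the total with the 1TB /home quota. It is not an official Hummingbird tool. The function names are made up for illustration, a terabyte is assumed to mean 1000^4 bytes, and the sum uses apparent file sizes, so it may differ from what the storage system itself reports.

```python
import os

# Assumption: the 1TB /home quota is counted in decimal units (1000**4 bytes).
QUOTA_BYTES = 1 * 1000**4


def directory_size(path):
    """Return the total apparent size in bytes of all files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                # lstat does not follow symlinks, so a link's target is not counted.
                total += os.lstat(full).st_size
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
    return total


def usage_report(path, quota=QUOTA_BYTES):
    """Return (bytes used, fraction of quota used) for path."""
    used = directory_size(path)
    return used, used / quota
```

Calling `usage_report(os.path.expanduser("~"))` returns the bytes used and the fraction of the quota consumed. On a large home directory the walk can take a while.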

We are cancelling today’s office hours, and the team will continue to work towards getting the system back online.

Updated Winter Maintenance Status

Hello Hummingbird User Community,

We are currently planning to restore the cluster to normal (but improved) operations no later than January 15, 2026. Please continue to monitor this website for updates.

The Hummingbird Cluster is currently down for maintenance while we move to a new storage system. We need to transfer nearly 750TB, and until all data transfers are complete, we are unable to resume cluster operations. This migration is an essential part of upgrading our computational capabilities with a new multi-PB storage service for our cluster, which will provide not only more storage, but also much faster file input/output once it is online.

These improvements are part of a $1.7M campus investment in shared HPC. Given the large inconsistency of the transfer speeds we are observing, we are unable to provide a concrete estimate for when we will be able to resume service. Our goal is January 15, 2026, but due to ongoing technical issues, this may change.
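To illustrate why inconsistent transfer speeds make the finish date hard to pin down, here is a small Python sketch of the arithmetic. The sustained rates in the loop are hypothetical values chosen only for illustration, not measurements from Hummingbird, and decimal units (1TB = 1000GB) are assumed.

```python
def transfer_days(total_tb, rate_gb_per_s):
    """Days needed to move total_tb terabytes at a sustained rate in GB/s."""
    seconds = (total_tb * 1000) / rate_gb_per_s  # TB -> GB, then divide by the rate
    return seconds / 86400  # seconds per day


# Hypothetical sustained rates; the real rates vary and are not given here.
for rate in (0.1, 0.5, 1.0, 5.0):
    print(f"{rate:>4} GB/s -> {transfer_days(750, rate):6.1f} days")
```

Under these assumed rates, the same 750TB takes anywhere from under two days to nearly three months, which is why a rate that swings widely gives no reliable completion date.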

We are doing everything we can to speed this process along as much as possible and will provide further information as we are able.

 

Winter Maintenance 2025

Hello Hummingbird user community, we will be taking an extended break on the cluster over the winter holidays.

We will be offline starting Sunday, December 21, 2025 at 8am and will resume service by (latest) Friday, January 2, 2026 at 8pm. During this time, we will be conducting a large-scale data migration to a newly redesigned storage backend that should remove the speed bottlenecks that we’ve been observing over the past few months on the cluster.

All cluster access will be shut off starting at 8am, so please ensure all jobs and file transfers are completed prior to this time. I will put the queues into DRAIN mode (no new jobs) the morning of Wednesday, December 17, 2025 at 8am. All jobs already submitted before this time will be able to run to completion, or will be canceled on Sunday morning; no new jobs will be allowed to start.
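If you are sizing a job to finish before the shutdown, the following Python sketch works out the longest time limit that still ends before Sunday, December 21, 2025 at 8am. The helper names and the 30-minute safety margin are assumptions for illustration, and times are treated as local cluster time with no time zone handling. The output uses the days-hours:minutes:seconds form that Slurm accepts for a job time limit.

```python
from datetime import datetime, timedelta

# Shutdown time taken from the announcement (assumed to be local cluster time).
SHUTDOWN = datetime(2025, 12, 21, 8, 0)


def max_walltime(now, shutdown=SHUTDOWN, margin=timedelta(minutes=30)):
    """Longest run time that still ends `margin` before the shutdown."""
    remaining = shutdown - now - margin
    return max(remaining, timedelta(0))


def as_slurm_time(delta):
    """Format a timedelta as D-HH:MM:SS."""
    total = int(delta.total_seconds())
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"
```

For example, `as_slurm_time(max_walltime(datetime.now()))` gives a value you could compare against the time limit you plan to request. A job that needs longer than that is a candidate for checkpointing or for waiting until after the maintenance.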

If you have any questions or concerns, please reach out on Slack or open a ticket by emailing hummingbird@ucsc.edu.

Summer Maintenance 2025

The Hummingbird cluster will be going down for our Summer maintenance from Sunday, August 24, 2025 at 9:00pm through Monday, September 1, 2025 at 9:00am. We will begin draining the cluster on Saturday, August 23, 2025 starting at 5:00pm: NO NEW JOBS will be allowed starting at that time. The cluster will be FULLY SHUT DOWN (storage included) on August 24.

The main focus behind this maintenance window will be shifting the Hummingbird cluster into a new location at our colocation facility – this new location will allow us room to expand like never before, giving us 8 racks dedicated to Hummingbird with 8 additional in the future, should we need them. Additionally, we will be performing normal patching and software updates to the cluster, most notably bringing us up to Alma 9.6, BeeGFS 8.1, and GNU compilers version 15. Since this is both a hardware shift and a software upgrade, we will need the additional time to ensure everything is working. That said, we will endeavor to restore service and get the cluster back online as quickly as possible, hopefully well before our scheduled end time.

Planning ahead will make this transition easier, so please let us know if you have any questions or concerns about completing your jobs or retrieving your results before the maintenance window begins. We will endeavor to do this upgrade as quickly and efficiently as possible. As always, we hope to have the cluster back by the end of the period, but check in on Slack for updates on timing.

Please note that Hummingbird Office Hours will be suspended on Thursday, August 21 and Thursday, August 28, 2025. They are tentatively scheduled to resume on September 4, 2025, contingent on the cluster status.

Winter Maintenance 2024

Hello Hummingbird user community, we will be offline for a few days in early January to perform our winter maintenance on the cluster.

We will be offline starting Thursday, January 2nd at 8am and will resume service by (latest) Sunday, January 5th at 8pm. Hopefully this will actually be a shorter window than scheduled, as we only need to do minor OS updates across the cluster (no big moves and shifts this time).

All cluster access will be shut off starting at 8am, so please ensure all jobs and file transfers are completed prior to this time. I will put the queues into DRAIN mode (no new jobs) the morning of Tuesday, December 31st at 8am (this will allow any remaining jobs 48 hours to complete).

If you have any questions or concerns, please reach out on Slack or open a ticket by emailing hummingbird@ucsc.edu.

Extended Summer Maintenance

Unfortunately, the Hummingbird maintenance window will be extended for another few days due to the complexity of the maintenance and the intervening holiday weekend. We made great progress, but there is still work to do. So far, we have performed the following major improvements:

  • External backup of all core operating system files (not user files)
  • Hummingbird has been upgraded to Alma Linux 9
  • InfiniBand and BeeGFS drivers have been updated
  • The newest version of OpenHPC (HPC management software) has been installed and configured
  • Most base packages for the login node have been installed

Remaining work:

  • Network changes pending verification
  • Node provisioning images are still under construction
  • Software modules reconstruction pending
  • Partition rebalancing / queue reconfigurations (we may possibly defer to a later time)
  • Verify cluster operations (after the above is completed)

While we expect the work to go smoothly, we cannot at this time give a definitive time and date for the restoration to full functionality. We will be posting regular updates to this email list as well as to the Hummingbird Slack channel and website (as appropriate).

Please feel free to contact us at hummingbird@ucsc.edu should you have any questions or concerns. We look forward to a new and improved Hummingbird soon!

Hummingbird Summer Maintenance 2024

The Hummingbird cluster will be going down for our Summer maintenance from Wednesday June 26 at 5:00pm through Monday July 8, 2024 at 9:00am. New job submissions will be restricted after 5pm on the first day of maintenance, but access for users who are retrieving results and copying data will remain available until Sunday June 30, 2024. Beginning on Monday July 1, 2024, access will be restricted for system upgrades until the cluster returns to service on or about July 8, 2024 at 8:00am.

Users are encouraged to retrieve results prior to the beginning of the maintenance window.

The main focus of this maintenance will be to upgrade the cluster’s operating system from CentOS 7 to Alma Linux 9. This upgrade is critical for security and continued support of the cluster, while also opening the door to additional features like the ability to run containerized workflows (e.g., Docker and Singularity). The CentOS 7 operating system reaches End-of-Life on June 30, 2024, at which point we will no longer be able to get security updates.

Planning ahead will make this transition easier, so please let us know if you have any questions or concerns about completing your jobs or retrieving your results before the maintenance window begins. We will endeavor to do this upgrade as quickly and efficiently as possible. As always, we hope to have the cluster back by the end of the period, but check in on Slack for updates on timing.

Here is a detailed view of the maintenance steps:

  1. Create external backup of all core operating system files (note – this is not client research data; this is ONLY system-critical files)
  2. Break the mirroring of the boot drive (this allows us to roll back to the previous state easily if needed)
  3. Target one of the boot disks from the mirror pool; format the disk and install the new OS on it (This will be Alma Linux 9.4, which has an end of active support in June 2027)
  4. Install InfiniBand drivers (These need to be custom built to get the optimal functionality)
  5. Install BeeGFS drivers (This is how we communicate with our file storage system)
  6. Install and configure OpenHPC supplemental ecosystem (This provides SLURM and all the software needed to run the cluster)
  7. Verify cluster operations
  8. Return cluster to service on or before July 8, 2024
  9. Supplemental work (see below) with additional testing and verification:
    • Small reconfigurations in some of our queues
    • Restructuring our module system including removing no longer used modules
    • Adding additional cluster maintenance software to ease and enrich the services we can provide (potentially allowing for us to provide an open science gateway for web-based job submission)

Winter Maintenance 2023

Heads up! Hummingbird Winter Maintenance will happen from 21 DEC 2023 through 04 JAN 2024. The system will be off-line so that we can migrate data over to the new high-speed, parallel access file servers. Users will need to have all jobs completed and results copied off by midnight on 20 DEC 2023. Jobs that are still running at that time will be terminated.

Since we are copying over home directories, please lend us a hand by taking a few minutes to curate your home directory. If there are old data, or data that you are not using, please delete it to reduce the overall size of the data we must copy. Be aware that if there are any results or data that you really need to keep safe, it’s best to copy them off of Hummingbird well before the maintenance window is to begin.
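As a starting point for curating a home directory, here is a minimal Python sketch that lists files that have not been modified in a given number of days, largest first. The function name and the 365-day default are assumptions for illustration. The sketch only lists candidates and never deletes anything, so review the results before removing any data.

```python
import os
import time


def stale_files(path, days=365, now=None):
    """Return (size, path) pairs for files under path older than `days`, largest first."""
    reference = time.time() if now is None else now
    cutoff = reference - days * 86400
    found = []
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                info = os.lstat(full)  # do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable
            if info.st_mtime < cutoff:
                found.append((info.st_size, full))
    found.sort(reverse=True)
    return found
```

For example, `stale_files(os.path.expanduser("~"), days=180)[:20]` would show the twenty largest files untouched for roughly six months.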

We plan to resume normal operations by 05 JAN 2024 (start of the new term), but because we have over 200TB of data to copy, and we cannot resume operations until the copy is completed, we may run over into the weekend.

If you are unsure whether you can complete your work by then; require assistance moving, copying, or deleting data; or need help formulating checkpoints so you can efficiently resume a job after the maintenance period ends, please contact hummingbird@ucsc.edu to open a ticket.
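For a sense of what a checkpoint can look like, here is a minimal Python sketch, assuming the job's state fits in a small JSON file. The function names, the state layout, and the stand-in "work" inside the loop are all hypothetical, and real applications often have their own checkpoint mechanisms that should be preferred. The state is written to a temporary file and then renamed into place, so an interrupted write does not leave a half-written checkpoint behind.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to path atomically (temporary file, then rename)."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic when tmp and path share a filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise


def load_checkpoint(path, default):
    """Return the saved state, or default if no checkpoint exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default


def run(path, total_steps, every=100):
    """Resume from the last checkpoint and run until total_steps is reached."""
    state = load_checkpoint(path, {"step": 0, "total": 0})
    while state["step"] < total_steps:
        state["total"] += state["step"]  # stand-in for the real work
        state["step"] += 1
        if state["step"] % every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state
```

A job built this way can be stopped before the maintenance window and resubmitted afterwards with the same checkpoint path. It picks up at the last saved step instead of starting over.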

Short Maintenance Window: Wednesday September 20, 2023 at 7:00am

Dear Hummingbird Users,


Please be advised that we are conducting a short maintenance cycle on Wednesday September 20, 2023 from 6:00am to 7:00am.

During this window, we will be rebooting the cluster login node. At that time, you will not be able to log in to the cluster. In-flight jobs will continue, but pending jobs may be disrupted. If you submitted a job before the maintenance window and it was in a pending status, please check to make sure the job properly launched after the maintenance window is announced to be closed. You can still log into hb-feeder to access files and move data, but please log in to that machine directly (not through the login node). Any “screen” or “tmux” (or similar) sessions should be closed manually before the maintenance window begins. Sessions left open will be terminated automatically on reboot.

Please feel free to email hummingbird@ucsc.edu if you have any questions or concerns regarding this maintenance window.