Gaussian 09

Introduction

Gaussian is a suite of computational chemistry programs used for electronic structure modelling. It is named after the type of orbitals used to speed up Hartree-Fock calculations: Gaussian-type orbitals rather than Slater-type orbitals. The software uses ab initio calculations to predict the energies, molecular structures, vibrational frequencies, and molecular properties of molecules and reactions in a variety of chemical environments. Key features include investigating molecules and reactions, predicting and interpreting spectra, and exploring diverse chemical problems.

The current version of the software installed on Hummingbird is Gaussian 09. The software runs in serial and in parallel. The Gaussian 09 documentation is currently unavailable (as of May 2022).

The Gaussian software has been installed on Hummingbird, but you must ask to be added to the group of allowed users before you can run it.

Parallelism with Gaussian

Please note that the version available on Hummingbird systems does not have the Linda parallelization component. Parallel performance is, however, available within a single node, using the %NProcShared parameter in the Gaussian input file. For example, to run on a 12-core node, add the following line to the top of your Gaussian input file:

%NProcShared=12
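In context, a minimal input file might look like the following sketch. The route section, title, charge/multiplicity, and geometry here are illustrative placeholders, not a recommended calculation:

```
%NProcShared=12
%Mem=8GB
# HF/6-31G(d) Opt

water geometry optimization (example)

0 1
O   0.000000   0.000000   0.117300
H   0.000000   0.757200  -0.469200
H   0.000000  -0.757200  -0.469200

```

Note that Link 0 commands (the lines beginning with %) must appear before the route section, and Gaussian input files must end with a blank line.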

Using Gaussian

Use the module command to manage access to software. To use the default version of Gaussian 09, type:

module load gaussian

To then make use of the application, run (for example):

g09 < test.com > test.out

Please run all your calculations from a sub-folder under your $SCRATCH directory; do not run calculations in your home directory.
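As a sketch, a per-job working directory can be set up like this ("myjob" is a placeholder name, and $SCRATCH is assumed to be set by the cluster environment):

```shell
# Create a per-job sub-folder under $SCRATCH and work from there.
# "myjob" is a placeholder; the /tmp fallback exists only to keep this
# sketch self-contained outside the cluster.
jobdir="${SCRATCH:-/tmp}/myjob"
mkdir -p "$jobdir"
cd "$jobdir"
# Copy your input file into this folder, then run, e.g.:
#   g09 < test.com > test.out
echo "running in: $jobdir"
```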

Using Gaussian with SLURM on Hummingbird

We HIGHLY recommend that you make a copy of the example Gaussian SLURM file as the starting point for running on the Hummingbird cluster. This file is located at /hb/software/scripts/g09.slurm
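The contents of that file are not reproduced here. As a rough sketch, a Gaussian SLURM script generally has the following shape; the job name, partition, resource values, and paths below are placeholder assumptions, so check the copy at /hb/software/scripts/g09.slurm for the actual settings:

```shell
#!/bin/bash
#SBATCH --job-name=g09_test      # placeholder job name
#SBATCH --partition=128x24       # placeholder; use the partition from the example file
#SBATCH --nodes=1                # Linda is unavailable, so a single node only
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=12       # should match %NProcShared in your input file
#SBATCH --output=g09_test.log

module load gaussian
cd $SCRATCH/myjob                # placeholder scratch sub-folder
g09 < test.com > test.out
```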

Access Restrictions

Gaussian is available to the general user community at UCSC subject to the License To Use Agreement. Note, however, that use is not enabled by default for all users; you must request access in order to run this application.

Notes on Memory and Storage

Some jobs, especially MP2 calculations, can consume large amounts of memory and disk storage. Instead of running these kinds of jobs in distributed-memory Linda-parallel mode, it may be better to use a shared-memory parallel approach. For larger systems, Gaussian 09 also allows a mixed-mode approach, using shared-memory parallelism within nodes and Linda only between nodes.

Using shared-memory parallel execution can substantially reduce disk usage (by roughly a factor of eight on an eight-core node), since tasks on the same node share one copy of the scratch file, whereas each Linda-parallel task creates its own copy of the scratch data. Savings of up to a factor of eight can be quite significant because the minimum disk required for MP2 frequencies scales as N^4 (where N is the number of basis functions).

For a one-node job on an eight-core node, use something like:

%Mem=16GB
%NProcShared=8

and for a multi-node job (for example, two nodes), use something like:

%Mem=16GB
%NProcShared=8
%NProcLinda=2

The %NProcLinda parameter should equal the number of nodes used for your job. The total number of processors used to run the g09 job is %NProcLinda × %NProcShared.
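As a quick sanity check, that arithmetic can be sketched as follows, using the two-node example above:

```shell
# Total processors = %NProcLinda (nodes) * %NProcShared (cores per node).
nproclinda=2
nprocshared=8
echo "total processors: $((nproclinda * nprocshared))"
# prints: total processors: 16
```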

For very large jobs, consider setting two Gaussian 09 parameters, %Mem and %MaxDisk, which control the amounts of memory and disk used, respectively, in order to obtain good general performance. For the types of calculations that obey %MaxDisk, disk usage will be kept below this value. See the Gaussian Efficiency Considerations web page for details. There are some example inputs in the directory $g09root/g09/tests/com.

When using multiple processors with shared memory, a good estimate of the memory required is the amount needed for a single-processor job times the number of cores used per node. In other words, %Mem represents the total memory requested on each node. For distributed-memory calculations using Linda, the amount of memory specified in %Mem should be equal to or greater than the value for a single-processor job.

When setting %Mem, remember that some memory will be used by the operating system. Gaussian also needs some memory on top of what you reserve for data with %Mem. For example, on Edison, of the 64 GB of memory on a node, about 61 GB is available to jobs. Gaussian will use a few GB more, so if you set %Mem much higher than about 55 GB, the job may fail because memory cannot be allocated.
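A rough sizing sketch using the figures above; the two overhead values are assumptions inferred from the text, not measured numbers:

```shell
# Start from the node's physical memory and subtract what the OS and
# Gaussian itself consume beyond the %Mem data allocation.
node_gb=64
os_gb=3        # assumed: 64 GB physical minus ~61 GB available to jobs
gaussian_gb=6  # assumed: the "few GB more" that Gaussian uses itself
echo "suggested %Mem ceiling: $((node_gb - os_gb - gaussian_gb)) GB"
# prints: suggested %Mem ceiling: 55 GB
```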

UC Santa Cruz Research Computing