National Computational Infrastructure
NCI National Facility
Newsletter September 2002

Summer School on High Performance Computing

The APAC National Facility is hosting a Summer School on High Performance Computing during the week December 2-6, 2002, at the Australian National University. The workshop will be of benefit to students who already have some experience in writing computer code and who are using or planning to use intensive computation in their research.

Applications close on Sept 16.

OzVis02 in Sydney

The second OzVis will be held in Sydney on Dec 3-4, 2002. This is a workshop held by the Australian Visualisation and Virtual Reality Interest Group to follow on from the first workshop held at ANU last year. A call for abstracts has gone out and further details on the workshop will follow.

IVEC invitation for visitors

Outstanding researchers, academics, scientists and technicians in the fields of high performance computing and visualisation are invited to express their interest in participating in a Visitor's Scheme conducted by the Western Australian Interactive Virtual Environments Centre (IVEC) based in Perth, Western Australia.

The Western Australian Interactive Virtual Environments Centre (IVEC) is a joint venture of Central TAFE, CSIRO's Division of Exploration and Mining, Curtin University of Technology (Curtin), and The University of Western Australia (UWA). IVEC is a Partner in the Australian Partnership for Advanced Computing (APAC) and a Foundation Member of the Centre for Networking Technologies for the Information Economy (CeNTIE).

As part of its participation in APAC's Education Program, IVEC has committed to conduct a Visitor's Scheme coordinated with the other APAC Partners. It is expected that up to three Visitors will be funded over the period October 2002 to November 2003. To launch the Scheme, IVEC is seeking Expressions of Interest (EOI) from prospective Visitors interested in participating.

Further details can be obtained from IVEC.

I/O using /short on the SC

Users should be aware that /short is a global filesystem shared by all 500 cpus of the system and accessible to compute nodes only over the network. As such, it is not suitable for "general purpose IO": in general, the interval between accesses to /short in your program should be many minutes. If you are doing IO more often than this, please use /jobfs instead. Copy the active files (preferably in one large tar file) to /jobfs at the start of the job and copy the results back to /short when the job has completed. A mixture of /short and /jobfs use is quite likely to be best; checkpoint files should always be written to /short.
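The staging pattern described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not a tool supplied by the National Facility: the function names `stage_in` and `stage_out` and the archive names are hypothetical, and the real location of your job's /jobfs directory should be taken from the user guide.

```python
import shutil
import tarfile
from pathlib import Path


def stage_in(archive, jobfs_dir):
    """Copy one tar archive to job-local disk and unpack it there."""
    archive, jobfs_dir = Path(archive), Path(jobfs_dir)
    jobfs_dir.mkdir(parents=True, exist_ok=True)
    local_copy = jobfs_dir / archive.name
    # A single large sequential copy from the shared filesystem.
    shutil.copy2(archive, local_copy)
    with tarfile.open(local_copy) as tar:
        tar.extractall(jobfs_dir)
    local_copy.unlink()


def stage_out(jobfs_dir, result_names, archive):
    """Pack the named result files into one archive and copy it back."""
    archive, jobfs_dir = Path(archive), Path(jobfs_dir)
    local_archive = jobfs_dir / archive.name
    with tarfile.open(local_archive, "w") as tar:
        for name in result_names:
            tar.add(jobfs_dir / name, arcname=name)
    # Again a single large sequential copy, this time back to shared disk.
    shutil.copy2(local_archive, archive)
    local_archive.unlink()
```

A job would call `stage_in` once at the start, do all of its frequent IO inside the job-local directory, and call `stage_out` once at the end, so that the shared filesystem sees only two large transfers.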

/short can sustain very high IO rates if accessed correctly. See the relevant user guide section for general details on SC filesystems as well as details of using /short and other filesystems for high performance IO. All these pages can be summarized by saying that the only filesystem suitable for small frequent IO is /jobfs while other filesystems require IO done in multi-megabyte chunks for good performance. Contact us at help@nf.apac.edu.au for advice on the best option for your job's IO.
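As a rough illustration of "IO done in multi-megabyte chunks", the sketch below lets Python's own file buffering collect many small records in memory and pass them to the filesystem in large blocks. The function name `write_records` and the 8 Mbyte buffer size are assumptions chosen for the example; the appropriate size for a given filesystem should be taken from the user guide.

```python
CHUNK_BYTES = 8 * 1024 * 1024  # assumed buffer size, not a documented value


def write_records(path, records, chunk_bytes=CHUNK_BYTES):
    """Write many small byte records through one large in-memory buffer.

    The buffering argument of open() sets the size of the write buffer, so
    data reaches the filesystem roughly chunk_bytes at a time instead of
    once per record.
    """
    total = 0
    with open(path, "wb", buffering=chunk_bytes) as out:
        for record in records:
            total += out.write(record)
    return total
```

The calling code can still produce records one at a time; only the pattern of accesses seen by the filesystem changes.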

New Nodes Installed

An additional 5 ES45 nodes (20 processors) have been added to the system, with another 2 to be added shortly. These additional nodes have 4 Gbytes of memory and will eventually bring the total number of cpus to 508.

APAC NF Courses

Staff of the APAC National Facility offer several courses to help users obtain the maximum benefit from their time allocation on the SC. These courses can be run at any of the APAC partner organisations whenever there is sufficient demand. Please contact us if you are interested in attending or hosting a course.

Software Updates

There have been a number of updates to software packages on the system. (The available software is listed on the software web page.) Remember to set the PBS software flag for your package to ensure that the package is available when your job runs on a node. The PBS software keyword flag is listed for each package on the appropriate software web page.

The HDF5 data format now includes Zlib and Fortran support.

The optimisation package CPLEX from ILOG has recently been installed. Details on using CPLEX are available on the software web page.

Abaqus: An older version (Abaqus-5.18-17) has been installed on the system to allow some old projects to function. We have noted that this version does not appear to work in parallel, and we therefore suggest people move to the latest version of Abaqus as soon as possible.

Gaussview will be released soon, and will allow Gaussian users to graphically view output files.

Gaussian: People with an interest in testing Gaussian with Linda support should register their interest by emailing help@nf.apac.edu.au.

NW-Chem: As part of the work with the Chemistry expertise program, HP/Compaq and ANU have been collaborating to make NW-Chem available on the SC. We believe that a working version will be available after the SC operating system upgrade in September. At this stage we plan to install two versions of NW-Chem (4.0.1 and 4.1). There are some issues with some methods in both of these versions. We therefore ask that people interested in using NW-Chem contact us at help@nf.apac.edu.au, and specify the type of calculation you may want to perform.

The default Molpro has now been updated to Molpro2002.3. Details on using Molpro are on the software web page.

Our trial license for Qchem has now expired. People with interest in using Qchem on a more permanent basis are encouraged to fill out the software request form.

CT&T Expertise Program

The APAC Computational Tools & Techniques Expertise program is being developed through the National Facility. Ben Evans will be visiting researchers from MAS and partner projects to discuss their requirements for tools and techniques, with the main focus being current or potentially large HPC projects. If you have an interest in discussing this further, please contact him at Ben.Evans@anusf.anu.edu.au.

Advance notice of downtime

Advance notice is given of an expected downtime in mid September to allow for a system upgrade. Further details will follow.

GrangeNet

The APAC National Facility will be connected to GrangeNet in the not too distant future. GrangeNet is a joint venture between APAC and AARNet (and commercial partners), funded by the Federal Government, to provide a high bandwidth service to HPC users across the eastern states and CSIRO. GrangeNet will be connected through the local AARNet Regional Network Organizations. Further details on GrangeNet and participating institutions can be found at http://www.grangenet.net

One of the GrangeNet activities is in the area of data grids. There are two staff co-located with the National Facility who are working on GrangeNet data grid demonstrator projects. They may be able to assist with projects that intend to manipulate and transfer large data sets. In the first instance, contact help@nf.apac.edu.au if you think your project may be a suitable candidate.

Users are reminded that (new) requests for more than 20 Gbytes of storage on the mass data storage system should be made by using the form at http://nf.apac.edu.au/accounts/forms/massdata.php

Job turnaround and batch usage

The following information is repeated from the previous newsletter.

Users often ask for an explanation of the time it takes for their jobs to run, as their experiences have changed as the SC has become more heavily used. Typically there are over 1000 cpus' worth of jobs on the system queued, running or suspended, and there are fewer than half that many physical cpus. With demand at roughly twice capacity, the average turnaround delay will be roughly equal to the requested runtime. In general, single cpu jobs will not queue for long, but will be suspended for parallel jobs. Parallel jobs may queue for some time until the requisite number of cpus is available. Overall, the priority regime of the queuing system aims to have no jobs delayed significantly more than others (in terms of percentage of the requested time).
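The arithmetic behind this estimate can be written out as a small calculation. This is a deliberately crude model assumed for illustration (every job is delayed in proportion to the oversubscription of the machine); it is not a description of how the queuing system actually schedules jobs, and the function name `expected_turnaround` is hypothetical.

```python
def expected_turnaround(requested_hours, demand_cpus, physical_cpus):
    """Return (delay, total) in hours under a simple proportional-sharing model.

    demand_cpus is the number of cpus' worth of jobs queued, running or
    suspended; physical_cpus is the number of cpus actually available.
    """
    load = demand_cpus / physical_cpus
    # With load 2.0, a job needs about twice its runtime to get through.
    delay = max(load - 1.0, 0.0) * requested_hours
    return delay, delay + requested_hours
```

With 1000 cpus' worth of jobs on 500 physical cpus, a job requesting 10 hours would see a delay of about 10 hours and complete about 20 hours after submission, which matches the rule of thumb given above.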

Because of the increasing demand, we have had occasion to suggest to users that they should ensure that parallel jobs are using cpus sufficiently well to justify the number requested. For example, a 4 cpu job with %cpu < 40 might be better run as a 2 cpu job. Note that IO can severely impact the efficiency of parallel programs. We will continue to promote efficient use of the system to ensure equal access for all users.
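One way to read the example above is as a rule of thumb: multiply the number of cpus requested by the reported utilisation to get the number of cpus the job is really keeping busy, then round up. The sketch below assumes that %cpu is the average utilisation across all requested cpus; both that interpretation and the function name `suggested_cpus` are assumptions made for illustration.

```python
import math


def suggested_cpus(requested_cpus, percent_cpu):
    """Estimate how many cpus a job keeps busy, rounded up to a whole cpu.

    percent_cpu is assumed to be the average utilisation (0-100) across
    all of the requested cpus.
    """
    busy = requested_cpus * percent_cpu / 100
    return max(1, math.ceil(busy))
```

For the case in the text, a 4 cpu job at 40% keeps about 1.6 cpus busy, so `suggested_cpus(4, 40)` gives 2.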

Batch job requests for all resources (memory, number of cpus, walltime and jobfs) should reflect the requirements of the job as closely as possible. Your job will not start until the requested resources are available, so an excessive request may delay your job start. While running, your job will have dedicated access to the resources requested, so an excessive request may unnecessarily delay other users' jobs. In particular, excessive memory and/or jobfs requests may result in your job tying up a node with large memory or disk whilst jobs in genuine need of these resources are left queued.

For parallel users, remember that any parallel job requesting more than 4 cpus must ask for a multiple of 4 cpus to facilitate scheduling of batch jobs.
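The rule above is easy to check before submitting a job. The helper below is a minimal sketch with a hypothetical name, `round_cpu_request`; it only encodes the rule as stated in this newsletter and does not consult the queuing system.

```python
def round_cpu_request(ncpus):
    """Round a cpu count up to the nearest request allowed by the rule.

    Requests of up to 4 cpus are left unchanged; larger requests are
    rounded up to the next multiple of 4.
    """
    if ncpus < 1:
        raise ValueError("a job must request at least one cpu")
    if ncpus <= 4:
        return ncpus
    return -(-ncpus // 4) * 4  # ceiling division by 4, then scale back up
```

For example, `round_cpu_request(6)` gives 8, while `round_cpu_request(3)` stays at 3.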

Email problems, suggestions, questions to