National Computational Infrastructure
NCI National Facility
Newsletter August 2005

Status of the SGI Altix Cluster
We apologize to users that the SGI Altix Cluster (AC) service has not been as reliable as both we and users would like. We believe the SGI engineers are getting close to resolving the remaining problems. All 1664 processors are now fully installed, so job throughput should improve.

There will be a major system downtime starting 8:30am on Friday 19th August extending over the weekend. This will complete the NUMALink interconnect. After this it will be possible to run parallel jobs of up to 1536 processors.

The login node and 128 CPUs will be available again later on Friday. The rest of the system will be back in production by Monday 22nd August, and hopefully earlier.

Grant allocations and charging
Users are reminded that allocations which were made in 'SC equivalent' hours for the second half of this year have been scaled downwards by a factor of 2 to partially reflect the increased speed per processor of the AC. Note that time used on the LC is now being charged at half the rate of time used on the AC.

The APAC board has decided that for the purposes of indicating the in-kind value of grants to bodies such as the ARC, a figure of $0.50 per processor hour on compute systems should be used.
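
The charging rules above reduce to simple arithmetic. The following C sketch is illustrative only: the function names are hypothetical, and it assumes that AC time is charged at one allocation hour per processor hour, with the other factors exactly as stated in this section.

```c
/* Illustrative sketch only: these helpers are hypothetical, not an APAC tool. */

/* Allocation after an 'SC equivalent' grant is scaled downwards by a factor of 2. */
double scaled_allocation(double sc_equivalent_hours)
{
    return sc_equivalent_hours / 2.0;
}

/* Hours charged against a grant, assuming AC time is charged at full rate
   and LC time at half the AC rate. */
double charged_hours(double ac_hours, double lc_hours)
{
    return ac_hours + lc_hours / 2.0;
}

/* In-kind dollar value at the board's figure of $0.50 per processor hour. */
double in_kind_value(double processor_hours)
{
    return 0.50 * processor_hours;
}
```

Under these assumptions, a grant of 10000 SC-equivalent hours becomes 5000 hours, and 100 hours on the AC plus 40 hours on the LC are charged as 120 hours.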

Passwords on APAC machines
When users are given details of their account on the APAC AC or LC they are given a password and asked to change it when they first log on. The new password should be at least 8 characters long and include a mixture of upper case letters, lower case letters, numerals and characters that are neither letters nor numerals. Please do not use recognisable words or names in your password, or consecutive symbols on the keyboard.

For security reasons we ask that passwords are not communicated in emails or recorded where they can be easily seen.
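
As a rough illustration of the length and character-mix rules above, the following C sketch checks a candidate password. The function name is hypothetical and this is not the check performed on the APAC systems; it also makes no attempt to detect recognisable words, names or keyboard sequences.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper. Returns 1 if pw has at least 8 characters and contains
   an upper case letter, a lower case letter, a numeral and a character that
   is neither a letter nor a numeral; returns 0 otherwise. */
int password_meets_basic_rules(const char *pw)
{
    int upper = 0, lower = 0, digit = 0, other = 0;
    size_t len = strlen(pw);

    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)pw[i];
        if (isupper(c))
            upper = 1;
        else if (islower(c))
            lower = 1;
        else if (isdigit(c))
            digit = 1;
        else
            other = 1;
    }
    return len >= 8 && upper && lower && digit && other;
}
```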

Restriction on number of processors
There is currently a restriction in place: parallel jobs requesting more than 8 CPUs must request a multiple of 8. This restriction is necessary for efficient scheduling of a mix of parallel jobs of various sizes and single processor jobs. It is intended to be a temporary constraint and will be lifted as soon as possible.
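
The rule can be sketched as a small C helper that rounds a desired CPU count up to the nearest permitted value. The function name is hypothetical and the helper is only an illustration of the rule as stated here, not part of the scheduler.

```c
/* Hypothetical helper: smallest permitted CPU count that is >= requested. */
int permitted_ncpus(int requested)
{
    if (requested <= 8)
        return requested;              /* jobs of up to 8 CPUs are not restricted */
    return ((requested + 7) / 8) * 8;  /* round up to the next multiple of 8 */
}
```

For example, a job that needs 12 CPUs would have to request 16.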
MPI on the AC
The Message Passing Interface, MPI, is implemented on the AC using SGI's Message Passing Toolkit, MPT. To use this effectively under the PBS queuing scheduler for batch jobs there is a locally written wrapper function. This provides alternative mechanisms for starting MPI jobs, for example, both mpirun and prun. At login to the AC, users have modules for the current MPT and the mpirun command loaded.
OpenMP on the AC
  • To ensure that OpenMP jobs are scheduled on the same shared memory segment you should always use -lncpus=N:N in your qsub request.
  • OpenMP codes should be timed using differing numbers of threads to find the number that gives efficient use. There are several reasons for this:
      • As the processors on the AC are faster than those of the SC, the cost of starting up the threads can outweigh the cost of the computations in the loop. Make sure that you do as much computation as possible in each loop to balance the start-up cost.
      • The shared memory programming style assumes uniform memory access across the entire memory address space. This is not true for NUMA architectures such as the Altix. To get good performance from OpenMP, the code must ensure that data accessed by each thread is local to the processor on which that thread is currently executing. As the Altix uses a first-touch policy for data allocation, you should ensure that data is initialised using the same distribution as it is going to have in later computations.
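
The first-touch advice can be sketched in C as follows. This is a minimal example under stated assumptions rather than a tuned code: the function name is hypothetical, the arrays are initialised in a parallel loop with the same static schedule as the later compute loop, and the program is assumed to be built with the compiler's OpenMP option enabled (without it the pragmas are ignored and the loops run serially).

```c
#include <stdlib.h>

/* Hypothetical example: dot product of two arrays of length n.
   Returns -1.0 if memory cannot be allocated. */
double first_touch_dot(size_t n)
{
    double *a = malloc(n * sizeof *a);
    double *b = malloc(n * sizeof *b);
    double sum = 0.0;
    long i;

    if (a == NULL || b == NULL) {
        free(a);
        free(b);
        return -1.0;
    }

    /* Initialise in parallel with a static schedule, so that each thread
       is the first to touch the part of the arrays it will work on later. */
    #pragma omp parallel for schedule(static)
    for (i = 0; i < (long)n; i++) {
        a[i] = 1.0;
        b[i] = 2.0;
    }

    /* Same loop bounds and schedule: each thread reads the elements
       that it initialised itself. */
    #pragma omp parallel for schedule(static) reduction(+:sum)
    for (i = 0; i < (long)n; i++)
        sum += a[i] * b[i];

    free(a);
    free(b);
    return sum;
}
```

Initialising the arrays in a serial loop instead would mean a single thread touches all of the data first, which is the situation the advice above is intended to avoid.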
Intel compilers on the AC
Several versions of the Intel compilers are installed on the AC as different updates of the Intel 7, 8 and 9 compilers. By default, users have version 8.1.029 of the Intel 8 Fortran compiler and version 8.1.033 of the Intel 8 C/C++ compiler loaded at login. This can be seen by typing module list. The full list of versions can be seen by typing module avail intel-fc or module avail intel-cc. There are also several versions of the Intel MKL library installed on the system. You can change to any other version using the module swap command. Some user code has proved to be sensitive to the version of the compiler used so you may need to experiment with different versions.
Floating Point Exceptions
One of the major practical differences users are noticing between the SC and the AC is that, by default, programs do not generate floating point exceptions (FPEs). For example, a program performing a divide-by-0 will generate an IEEE floating point Inf result and continue executing, possibly propagating the exceptional value throughout its solution. On the ia64 processor of the Altix, floating point operations on these exceptional values are handled in software, leading to slow execution as well as results that may be of dubious value.
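
To illustrate the propagation described above, the following C sketch divides without any check and then tests the result. The function names are hypothetical, and the behaviour shown assumes default IEEE 754 arithmetic with floating point traps disabled.

```c
#include <math.h>

/* Hypothetical example: divides without checking the denominator. With traps
   disabled, a zero denominator gives Inf (or NaN for 0/0) and execution continues. */
double unchecked_ratio(double numerator, double denominator)
{
    return numerator / denominator;
}

/* Returns 1 if x is an ordinary finite number, 0 if it is Inf or NaN. */
int is_usable(double x)
{
    return isfinite(x) ? 1 : 0;
}
```

An Inf produced this way does not stay isolated: subtracting it from itself, for instance, gives NaN, and every later result that depends on it is affected as well.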

Most users expect their programs to crash (generate a floating point exception) in these circumstances. Forcing this behaviour on the AC depends on which language you use:

  • Fortran: Use the ifort compiler option -fpe0. Please read the ifort man page to understand the various -fpe options. On the AC system we have changed the default compiler behaviour from -fpe3 to -fpe0, so be aware of this when porting code.

  • C/C++: Use the C99 floating point rounding and exception handling routines. See man fenv for details. As an example:
                     #define _GNU_SOURCE   // needed for icc but not icpc
                     #include <fenv.h>

                     ....

                     // trap division by zero and overflow instead of continuing
                     feenableexcept(FE_DIVBYZERO | FE_OVERFLOW);
                 
Debugging with Totalview
The Totalview license allows debugging of parallel jobs of up to 24 processors, but these processors must all be on a single host. Use the qsub request -lncpus=N:N if N is greater than 8. It is possible to debug programs using more processors than this; contact help@nf.apac.edu.au for details.
SIESTA on the AC
As part of the APAC Computational Tools & Techniques program, the computational chemistry package SIESTA (Spanish Initiative for Electronic Simulations with Thousands of Atoms) is now being supported on the AC. The SIESTA project is principally being supported via IVEC, assisted by the APAC-NF. For more details on SIESTA please see our software page http://nf.apac.edu.au/facilities/software. If you have an interest in using SIESTA please contact us.
Snark installation
Also under the APAC CT&T program, the geophysics framework Snark will soon be available on the AC. Snark is principally being supported through VPAC and assisted by the APAC-NF. For more details on Snark please see our software page http://nf.apac.edu.au/facilities/software. If you have an interest in using Snark please contact us.
Software update
New software is continually being added to the AC. Details on using software are given at http://nf.apac.edu.au/facilities/software/
Some packages have restricted access. These are marked accordingly, and further information is given on the software web page.
Users are informed of new installations or updates in the message of the day at login.
Requests for new packages or updates can be made using the software request form.
APAC05
Details of the APAC05 conference, "Empowering Research Communities", are given at www.apac.edu.au/apac05.
There are also workshops on Nimrod, the Access Grid, Grid Portals and the Globus Toolkit and a dedicated student forum.
APAC Courses
Staff of the APAC National Facility provide a range of training courses on using the National Facility machines and programming techniques. The full list is given on the training web page. These courses are currently being updated to be relevant to the Altix Cluster.

If you would like any of these courses given at your partner site please contact us and it can be arranged. As these are hands-on courses we ask that a representative from the site arranges for access to a computer teaching lab with web access and a method of obtaining secure shell logins to the AC.

Email problems, suggestions, questions to