
Getting started with Iridis

Introduction

This guide assumes a working familiarity with Linux and batch systems. If you have any comments or questions please email support@einfrastructuresouth.ac.uk.

Logging On

Your username will be sent to you by email. The preferred hostname to use to login is eisgate.soton.ac.uk. If you have trouble connecting, please let us know using the email address above. From this gateway you should be able to access the login nodes.

Making this second step automatic

This bit is slightly more involved, in that you will need to adjust your local configuration settings. There are a number of ways to set things up, but we recommend using netcat and OpenSSH's ProxyCommand option. Our example, suitably modified, will work on most UNIX-like operating systems, including Macs, provided that OpenSSH is being used. It will also work on Windows, but is quite tricky to get working unless you use Cygwin or similar to give you a UNIX-like environment.

To get it working, you will first need to have generated your SSH key pair, and for the public key to be available in your ~/.ssh/authorized_keys file (one key per line) on both eisgate and Iridis. Once you have done this, you can set up your local SSH configuration, usually in a file called ~/.ssh/config on your desktop machine, similar to the example configuration below (make sure you change the username and the paths to the SSH keys).
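If you do not already have a key pair, the following is a minimal sketch of generating one on your desktop; the filename matches the example configuration, but how the public key gets onto eisgate and Iridis depends on how your account was set up, so treat that step as illustrative.

# Generate an RSA key pair, protected by a passphrase, on your local machine.
# The private key stays on your desktop; the .pub file contains the single
# line that must be appended to ~/.ssh/authorized_keys on eisgate and Iridis.
ssh-keygen -t rsa -f ~/.ssh/iridis_rsa

# Display the public key so it can be added (one key per line) to the
# authorized_keys file on each remote machine.
cat ~/.ssh/iridis_rsa.pub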

## Example ~/.ssh/config

## Global settings
# These apply to all hosts, and are intended to help when the connection may be
# a bit flakey.  I'm using compression to keep the overall bandwidth down.
Host *
    Protocol 2
    Compression yes
    ServerAliveCountMax 5
    ServerAliveInterval 30

## Settings specific to Iridis and eisgate

# NB The SSH key being used must have its private part on this local
# machine. Since that is passphrase-protected, the passphrase will be asked for
# twice, as we are really logging in twice in very quick succession. If you use
# an agent, such as ssh-agent or gpg-agent, then you should not see this.  You
# could also use different SSH keys for each of the two machines if you wished
# to do so.

Host eisgate
    User eisst666
    HostName eisgate.soton.ac.uk
    PreferredAuthentications publickey
    IdentityFile ~/.ssh/iridis_rsa

Host iridis
    User eisst666
    HostName iridis3_c.soton.ac.uk
    PreferredAuthentications publickey
    IdentityFile ~/.ssh/iridis_rsa
    ProxyCommand ssh eisgate nc %h %p

The key thing here is the ProxyCommand setting at the very bottom. This means that when you

ssh iridis
your SSH client first opens an SSH connection to eisgate, where it runs netcat (nc), passing it the hostname and port that were originally given to SSH. This causes the SSH connection to Iridis to be tunnelled via the SSH connection to eisgate, turning eisgate into an almost transparent SSH proxy.

A nice side-effect of this type of set up is that commands such as scp and rsync work as you would expect, and you can effectively ignore the presence of eisgate from now on.
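For example (filenames and remote paths here are purely illustrative, using the placeholder username from the configuration above):

# Copy a local file to your home directory on Iridis; the hop via eisgate is
# handled automatically by the ProxyCommand in ~/.ssh/config.
scp results.tar.gz iridis:

# Synchronise a local directory into your /temp area on Iridis.
rsync -av my_data/ iridis:/temp/eisst666/my_data/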

To do the same on Windows without something like Cygwin, you will have to check whether or not your SSH client supports the ProxyCommand option. If you are using PuTTY, you should notice that it has a "proxy" tab, at the bottom of which is the "local command" box; this is where the same command as above can be placed. Transferring files is a little more involved because of the tunnel set-up, and requires extra software such as PuTTY's plink.exe. Further details on using plink with an SCP client are documented on various sites around the web.

Hosts

There are 3 Linux login nodes (each with two quad core 2.4 GHz Nehalem processors and 64 GB RAM), named iridis3_a.soton.ac.uk, iridis3_b.soton.ac.uk and iridis3_c.soton.ac.uk. These nodes are functionally equivalent so it does not matter which one you use. For security reasons only login via SSH is permitted, and this must be done using keys.

The cluster is generally referred to collectively as 'Iridis'.

Compute Nodes

There are 1008 compute nodes in total, with 924 of them available for use by members of the consortium. The compute nodes are diskless, with two 6-core Westmere processors (giving a total of 12 processor cores per node) and 22 GB of memory available to the user.

File store

The GPFS parallel file store is provided by two IBM DS4700s, each with 4 expansion trays filled with 1 TB disks, giving ~55 TB of usable space per DS4700 based on RAID5 4+P. Each DS4700 is connected to two IO nodes.

Each IO node has two quad-core 2.5 GHz processors and 8 GB of memory.

By default users have access to two folders named identically to their username, one under the /home partition and one under /temp. Users are allocated an initial quota of 100 GB in /home and 500 GB in /temp. The /temp directory is intended as additional temporary working space. An increase in the quotas on /home and /temp can be requested if a genuine need arises (but we would expect you to compress large files that are not currently needed).
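To see how much of this space you are currently using, a simple (if somewhat slow on large directory trees) check is the following; replace eisst666 with your own username:

# Report the total size of your /home and /temp directories.
du -sh /home/eisst666 /temp/eisst666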

Note that quotas on Iridis 3 may also be applied to the maximum number of files (actually the number of inodes, which is almost the same thing). The reason for this is that a major expense of providing file store is the cost and time spent in backups - and this is as much related to the number of files you have as the total size of the files. If you do need to generate and store a lot of files we would advise using tar to reduce directories that are not needed for a while to a single file (you can always untar them into /temp if you need temporary access to them).
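As a sketch, a directory full of files that is not needed for a while can be reduced to a single compressed archive and unpacked into /temp later (names and paths are illustrative):

# Replace a directory of many small files with one compressed archive.
tar -czf old_project.tar.gz old_project/ && rm -r old_project/

# Later, unpack it into your /temp area if temporary access is needed.
tar -xzf old_project.tar.gz -C /temp/eisst666/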

Network

IO and inter-node communication are via a fast InfiniBand network; management functions are controlled via a GigE network. The InfiniBand network is composed of groups of 32 nodes connected by DDR links to a 48-port QDR leaf switch. The leaf switches then have 4 trunked QDR connections to 4 QDR 48-port core switches, giving 4 redundant pathways for extra bandwidth and resilience.

Submitting and Running Jobs

All long-running, compute-intensive applications should be run on the compute nodes, not the login nodes, as running them on the login nodes degrades performance for all users and makes you quite unpopular. There are over 1000 compute nodes, which can be accessed via jobs submitted to the PBS/Torque resource manager. Most jobs will be run in batch mode, though interactive jobs are possible. The order in which jobs are run is determined by the Moab scheduler.

Note that processes using more than an hour of dedicated CPU time on the login nodes will be killed automatically. Use an interactive job on a compute node with a longer time limit if you want to run compute-intensive jobs without affecting other users. In a similar vein, please do not run any memory-intensive or parallel processes on the login nodes, as this can cause a major nuisance to other users if you hog most of the processor or memory resources. Again an interactive job can be used for such cases.

A special queue, called 'consort', has been created for the use of consortium members. Simple jobs can be submitted thus:

qsub -q consort my_script
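Interactive jobs, mentioned above, can be submitted to the same queue with the -I flag; the resource request below is purely illustrative (a full node on Iridis has 12 cores):

# Request an interactive session on one compute node for two hours.
qsub -I -q consort -l nodes=1:ppn=12,walltime=02:00:00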

Please consult the qsub manpage for detailed information on the flags it takes for jobs with more complicated requirements. These flags can also be placed within the job scripts themselves, preceded by the characters '#PBS'. For example, these lines placed within the my_script file:

#PBS -l walltime=30:00:00 
#PBS -q consort

is equivalent to

qsub -q consort -l walltime=30:00:00 my_script

If you specify both, then the resources requested on the command line take precedence.
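Putting these pieces together, a complete my_script might look like the following sketch; the module and program names are placeholders, and $PBS_O_WORKDIR is the standard PBS variable holding the directory the job was submitted from:

#!/bin/bash
#PBS -q consort
#PBS -l walltime=30:00:00
#PBS -l nodes=1:ppn=12

# Move to the directory the job was submitted from and set up the environment
# (see the Software section below for the module command).
cd $PBS_O_WORKDIR
module load amber/10.0/intel

# Run the application (placeholder command and output file).
./my_program > my_program.log 2>&1

The script is then submitted with a plain 'qsub my_script'.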

MPI

A number of different MPI implementations are available on Iridis. Please check the available software for the one that you wish to use. MPI on Iridis works in the standard way, with job scripts calling mpirun or mpiexec as appropriate.
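A sketch of an MPI job script following those conventions is shown below; the module name, process count and executable are placeholders, and you should check which MPI modules are actually installed (see the Software section below) and how they prefer to be launched:

#!/bin/bash
#PBS -q consort
#PBS -l walltime=02:00:00
#PBS -l nodes=2:ppn=12

cd $PBS_O_WORKDIR
# Load your chosen MPI implementation (placeholder module name).
module load openmpi

# Launch 24 MPI processes, one per core across the two requested nodes.
mpirun -np 24 ./my_mpi_program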

Software

We are using Environment Modules to provide us with a dynamic environment via the 'module' command. To see the entire list of installed software, issue:

module avail

For a sublist, you might wish to try something like “module avail <string>”, for example:

module avail amber

to see all installed versions of software starting with that string. Once you know which particular package you wish to work with, you can 'load' or 'add' it to your environment using “module load <package name>”, for example:

module load amber/10.0/intel

If at any time you need to see what you have added to your environment, you can issue the command:

module list

There are a great many other commands available, and 'module help' without any further arguments should give you a brief list of all of them.
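For instance, a package can be removed again, or the loaded modules cleared entirely, using the standard Environment Modules subcommands:

module unload amber/10.0/intel
module purge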

Compiling

If you are using MPI and have successfully used the module command to add your chosen MPI implementation to your current environment, then you can use one or more of the provided wrappers to compile your program. These wrappers should ensure that the MPI libraries are linked appropriately.

For C, use mpicc. Similarly mpif90, mpif77 and mpiCC are provided for Fortran 90/77 and C++ respectively.
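For instance, a C source file can be compiled and linked against the currently loaded MPI implementation as follows (file and output names are illustrative):

# The wrapper adds the MPI include and library flags automatically.
mpicc -O2 -o my_mpi_program my_mpi_program.c

# Fortran 90 equivalent.
mpif90 -O2 -o my_mpi_program my_mpi_program.f90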

Example Jobs for Applications

The directory /local/software/examples contains examples of batch jobs for some common applications. Each example contains a README file, a sample job file and associated input files plus sample output. You can copy them to your own file space, to experiment with, using the command 'copy_example'.

If you just use the copy_example command on its own it will list which examples are available. You can then use the command again to copy the desired example. For instance:

copy_example amber

will copy an AMBER example run to your own filestore, normally to ~/amber_example (where ~/ is shorthand for your 'home' directory). Change to this directory with:

cd ~/amber_example

The README file should tell you more about the example. To run this example as a batch job, use qsub to submit it:

qsub run_amber
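While the job is queued or running, its progress can be followed with the standard PBS/Torque qstat command (using the placeholder username from earlier):

qstat -u eisst666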

The output files produced by the job should be similar to those in the subdirectory sample_output, and adapting them to your own job should be reasonably straightforward.