=========================
 Running jobs on saguaro
=========================

Etiquette
=========

In general:

* Do not run simulations or any other CPU-intensive calculations on the
  head node for longer than about a minute. All computation must be done
  through the batch queuing system.
* Your disk space is limited (about 1 GB) so you need to copy data off to
  a local machine (laptop, iMac) and delete it on saguaro. You can use the
  /scratch directory (unlimited) but files there are deleted automatically
  after 30 days.
* Do not snoop around in other people's directories or copy files without
  express permission.
* The use of University computer equipment is only allowed to carry out
  work assigned to you by your instructor, line manager, or laboratory
  head. Private use is not allowed. The Computer, Internet, and Electronic
  Communications Policy (`ACD 125`_) of the University applies and must be
  complied with.

In particular for this course:

* Do not run jobs for longer than 24 h per run.
* Do not use more than 4 cores ("nodes") per run unless you have my
  explicit permission.
* Do not run more than 2 jobs concurrently unless you have my explicit
  permission.
* Only use the course billing account "phy598s113" for work related to the
  class (see below under `Queuing system`_).

(The course only has a limited number of CPU-hours available and these
rules help to prevent someone---possibly accidentally---wasting the whole
course's allocation of 10,000 CPU-h. By the way, one CPU-h costs $0.01,
i.e. a 4-core run for 24 h costs about $1.)

Log in
======

Log in to the head node of the saguaro cluster via `ssh`_::

   ssh -l ASURITE saguaro.fulton.asu.edu

You need to provide your ASU password to log in. After logging in you are
in your home directory, your own work space tied to your username (your
ASURITE id). Only you can write here.

Make a directory for today's practical in your home directory on saguaro::

   mkdir P12

.. Note:: Any *Linux* or *Mac OS X* system will have :program:`ssh` (and
   :program:`scp`) installed. However, *Windows* users will have to
   install an ssh client. The free `PuTTY ssh client`_ is highly
   recommended.

Copying files
=============

Use the `scp`_ command to transfer individual files (in the following,
*ASURITE* is a placeholder for your own ASURITE ID, which acts as your
user name on saguaro)::

   scp saguaro_gromacs.pbs ASURITE@saguaro.fulton.asu.edu:P12

or whole directories::

   scp -r Argon_input_files ASURITE@saguaro.fulton.asu.edu:P12

You can also use ``scp`` to copy results back::

   scp -r ASURITE@saguaro.fulton.asu.edu:P12/MD .

.. SeeAlso::

   * On saguaro you can also use `curl`_ to get files directly from URLs.
   * The `rsync`_ command is in many ways more comfortable than `scp`_ but
     it is also more complicated.

Software on saguaro
===================

Gromacs 4.5.5 on saguaro
------------------------

I compiled two versions of Gromacs on saguaro: one to run simulations (the
:ref:`MPI version <MPI-version>`), the other one to run short analyses
(the :ref:`serial version <serial-version>`). You have my explicit
permission to use these versions of Gromacs and to look around in my
Library directory.

.. _serial-version:

Serial version for quick analysis
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The serial version of Gromacs should only be used for trying out very
short runs or a quick analysis. If it takes longer than a minute then it
should be submitted as a job (but if you submit an analysis, make sure
that you only use a single core, i.e. ``#PBS -l nodes=1`` in your script).

To use the serial (i.e. non-parallel) version of Gromacs::

   . /home/obeckste/Library/Gromacs/versions/serial-4.5.5/bin/GMXRC

You can then run :program:`grompp`, :program:`g_msd`, etc.
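If an analysis does need to run as a job, you can wrap it in a single-core
queuing system script. The following is only a sketch that reuses the
pattern of the `Gromacs PBS script for saguaro`_ below; the
:program:`g_msd` call, the group selection ``0``, and the file names
``md.tpr``/``md.xtc`` are placeholders that you must replace with your own
analysis command and data::

   #!/bin/bash
   #PBS -N GMX_analysis
   #PBS -l nodes=1
   #PBS -l walltime=00:30:00
   #PBS -A phy598s113
   #PBS -j oe

   # the serial Gromacs version is sufficient for analysis tools
   . /home/obeckste/Library/Gromacs/versions/serial-4.5.5/bin/GMXRC

   # run in the directory from which the job was submitted
   cd $PBS_O_WORKDIR

   # example analysis (placeholder): MSD of index group 0 ("System")
   echo 0 | g_msd -f md.xtc -s md.tpr -o msd.xvg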
.. _MPI-version:

MPI version (for simulations)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I compiled a parallel (MPI-enabled) version of Gromacs 4.5.5 for saguaro.
To use it::

   module load openmpi/1.4.5-intel-12.1
   . /home/obeckste/Library/Gromacs/versions/4.5.5/bin/GMXRC

Note that this will *not work on the login node* because use of the MPI
libraries is restricted to compute nodes. However, it *will* work as part
of a `Gromacs PBS script for saguaro`_.

"MPI" stands for "`message passing interface`_" and is a protocol that
supports writing parallel code that can run on thousands of CPUs. We are
using the `Open MPI`_ implementation of the MPI library.

Modules
-------

The `module system`_ makes many different software packages available. The
`module`_ command can be used to learn which programs are available: see ::

   module avail

for a list. The ``module load`` command then loads the software into your
environment, e.g. ::

   module load openmpi/1.4.5-intel-12.1

and ::

   module list

shows what you have loaded.

* We need the MPI library and the Intel compilers for our version of
  Gromacs, hence ``module load openmpi/1.4.5-intel-12.1``.
* A2C2 staff compiled a version of Gromacs (``module load gromacs/4.5.4``)
  but it uses double precision arithmetic and is only half as fast as the
  version we are using.

Queuing system
==============

Instead of directly running a calculation, you write a small shell script
and hand this script over to a *batch queuing system*. Saguaro uses the
OpenPBS_ queuing system.

Workflow
--------

The typical workflow is

1. Prepare input files in a work directory. (If you will generate
   substantial amounts of data (>500 MB) then this should be done in a
   scratch directory under ``/scratch/ASURITE``.)
2. Adapt a queuing system script (see below for an example,
   ``saguaro_gromacs.pbs``).
3. Submit the job to the queuing system::

      qsub saguaro_gromacs.pbs

4. Monitor the status of your jobs::

      qstat

   A "Q" means that the job is waiting in the queue, "R" that it is
   running, and "C" that it is complete.
5. Once your job is complete, look at the output. If it failed, debug.
6. Copy a completed job back to your own disk space (laptop, iMac
   workstation).

Important queuing system commands
---------------------------------

:program:`qsub`
   submit a script to the queuing system, known as "submitting a job";
   when the job is launched successfully, the job id is printed

:program:`qstat`
   check the status of your job(s); shows a list of job ids and job names
   together with their status ("Q" for still waiting in the queue, "R" for
   running, "C" for complete)

:program:`qdel`
   terminate a running job: ``qdel JOB_ID`` (you will only be billed for
   the CPU-h the job has consumed so far)

Note on allocations and CPU-h
-----------------------------

The system keeps track of how many CPU-h are being used. They all come out
of the course's account, *phy598s113*. You **always have to provide the
account name when running a queuing system script**. You do this (see
below) either by providing the ``-A account`` flag to ``qsub``::

   qsub -A phy598s113 saguaro_gromacs.pbs

or (simpler!) by adding a line that automatically sets the flag::

   #PBS -A phy598s113

near the top of the script. In fact, you can add many additional ``qsub``
options to a script by starting a line with ``#PBS``. See ``man qsub`` on
saguaro for more options.
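Putting the pieces together, a typical session for one simulation might
look like the following sketch (the directory name and the job id
``123456`` are only examples; :program:`qsub` prints the actual id of your
job)::

   cd P12/MD                                # work directory with the .tpr file and the script
   qsub -A phy598s113 saguaro_gromacs.pbs   # prints the job id
   qstat                                    # Q = queued, R = running, C = complete
   qdel 123456                              # only if you have to stop the job early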
Gromacs PBS script for saguaro
------------------------------

You can use the following *queuing system script* to run our version of
Gromacs on saguaro::

   #!/bin/bash
   #PBS -N GMX_MD
   #PBS -l nodes=4
   #PBS -l walltime=00:10:00
   #PBS -A phy598s113
   #PBS -j oe
   #PBS -o md.$PBS_JOBID.out

   # host: saguaro
   # queuing system: PBS

   # max run time in hours (must match walltime above); 1 min = 0.0167 h
   WALL_HOURS=0.167

   DEFFNM=md
   TPR=$DEFFNM.tpr

   LIBDIR=/home/obeckste/Library

   cd $PBS_O_WORKDIR
   . $LIBDIR/Gromacs/versions/4.5.5/bin/GMXRC
   module load openmpi/1.4.5-intel-12.1

   MDRUN=$LIBDIR/Gromacs/versions/4.5.5/bin/mdrun_mpi

   # -noappend because apparently no file locking possible on Lustre
   # (/scratch)
   mpiexec $MDRUN -s $TPR -deffnm $DEFFNM -maxh $WALL_HOURS -cpi -noappend

You will have to change parameters, depending on how you want to use it:

* Give your job a name (instead of "GMX_MD") --- very useful when you run
  many jobs and need to check on them with ``qstat``.
* Adjust the run time of the job, both in the ``-l walltime=HH:MM:SS``
  line and in ``WALL_HOURS=hours`` (where ``hours`` is a decimal number).
  Your job will only run this long but it will shut down cleanly
  (``mdrun`` will stop itself after 0.99*hours).
* Modify the default filename variable ``DEFFNM`` and the filename of your
  TPR file (``TPR``).
* Note that your output files will look like ``md.part0001.xtc``: this is
  due to the ``-noappend`` flag for `mdrun`_, which we need for
  "continuation runs" (``-cpi`` flag), i.e. continuing a simulation
  seamlessly after it ran out of time. If you are confident that your
  simulation will complete in the allocated time then you may remove the
  ``-noappend`` flag.

PBS and accounts
----------------

Saguaro has OpenPBS_ installed. Note that there are many different queuing
systems that all implement slightly different versions of ``qsub`` and
friends, so you need to read the local man pages (``man qsub``).

Some other useful commands on saguaro: ::

   showq

shows the current queue (all waiting jobs), and ::

   showq -i

shows all jobs.

You can see how many hours the course still has left with the ::

   mybalance

command. If you see other projects then that means that you are also a
member of another research group with allocations on saguaro. In this
case, check which one is your default project (the one that gets billed if
you do not use the ``-A account`` flag)::

   mybalance -d

You can also see the default project with ::

   glsuser $USER

.. Links

.. _ssh: http://linux.die.net/man/1/ssh
.. _scp: http://linux.die.net/man/1/scp
.. _rsync: http://linux.die.net/man/1/rsync
.. _curl: http://linux.die.net/man/1/curl
.. _module system: http://modules.sourceforge.net/
.. _module: http://linux.die.net/man/1/module
.. _OpenPBS: http://www.pbsworks.com/
.. _mdrun: http://manual.gromacs.org/online/mdrun.html
.. _PuTTY ssh client: http://www.chiark.greenend.org.uk/~sgtatham/putty/
.. _ACD 125: http://www.asu.edu/aad/manuals/acd/acd125.html
.. _message passing interface: http://en.wikipedia.org/wiki/Message_Passing_Interface
.. _Open MPI: http://www.open-mpi.org/