The principle behind the batch queue system is that the user prepares a submission script for their job which identifies the resources they need; here we will worry only about cpu time. Vanguard then reads the submission script and attempts to locate a free cpu for you. If none are available it holds your job in a queue and starts it automatically once a cpu becomes free. Jobs should only need to be submitted once and will run as soon as the resources are available for them. Jobs started interactively on the cluster (e.g. you do an rsh albion then g03 myfile) will only run for 5 minutes before stopping. The batch management is via Torque (based on OpenPBS) and Maui software and is managed by vanguard. Please don't run jobs, or even processes such as molden, on vanguard itself as it is an extremely busy machine.
As of 1st Oct 2011 the batch system is running the queues below:
| Queue name | Length | Number of nodes in queue / size of memory per machine / cores per node | Default # of nodes / Default # cores per node | Max No of running jobs / user at once |
| parallel | 28 days | 51 / 4Gb / 4 | 1 / 1 | 16 |
| quad | 28 days | 30 / 8Gb / 4 | 1 / 4 | 8 |
| octa | 28 days | 8 / 16Gb / 8 | 1 / 8 | 4 |
| bigmem | 28 days | 8 / 16Gb / 8 | 1 / 8 | 4 |
| dellr610 | 28 days | 12 / 32Gb / 12 | 1 / 12 | 6 |
There is also an overall limit of 24 running jobs and up to 128 processors at any one time for a single user. Queued jobs accumulate a priority score based on how long they have been queued and how much cpu the user has used recently, so that the cluster is shared out equally between all users when under a high load. I.e. if you have not used the cluster a lot recently and your job has been queued as long as that of another user who has used lots of cpu recently, then your job should have a higher priority than theirs for starting next. When under a low load the cluster will attempt to run as many jobs as possible regardless of previous use by a user. All these numbers can be changed if they do not allow efficient use of the cluster. All jobs wanting to run on the cluster must be submitted to one of the above queues using a submission script, examples of which are given below. If all the nodes for a queue are in use then the cluster will automatically assign unused nodes from other queues to the 'full' queue. All input files and output files need to be referenced in the submission script to make them available for the job run and to return the results back to you at the end.
Note that for all queues except parallel the default number of processors is equal to the total on one node; for the parallel queue the default is 1 processor, allowing 4 jobs to run simultaneously on one node.
To submit a job:
% qsub job.in
(where job.in is the submission script). This will return a line giving the JOBID of the job, such as the 28.vanguard used in the examples below.
To check on it:
% qstat
and it'll return something like
28.vanguard gau_test jon 00:01:09 R long
....
....
The R means running, a Q would mean queued but not running. It also displays the walltime that the job has taken (here 1min 9sec); using qstat with other options may instead give the cputime (which is affected by the number of cpus) or a number that is very low because more than one node is in use.
% qstat -q
This will return the time limits for each queue, the number of jobs running and the number queued.
% qstat long
% qdel 28.vanguard
(or whatever your JOBID was) will delete the job from the queue.
% qstat -n
will return a list of jobs and their hosts; you can then rsh to that machine and find your running output files in a directory like name.number, where name is your username and number is your JOBID number.
% checkjob 28
(or whatever your JOBID was) will return the reason why your job is still queued and not running.
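The state letters described above can also be picked out of the qstat listing by a script. The following is only a sketch: the helper names count_jobs and job_state are made up for illustration, and it assumes the default six-column qstat layout used in the examples on this page (Job id, Name, User, Time Use, S, Queue).

```shell
#!/bin/sh
# Summarise default `qstat` output read from standard input.
# Assumed column layout: Job id, Name, User, Time Use, S (state), Queue

# count_jobs USER STATE: number of USER's jobs in the given state (R, Q, E)
count_jobs() {
    awk -v user="$1" -v state="$2" \
        '$3 == user && $5 == state { n++ } END { print n + 0 }'
}

# job_state JOBID: print the state letter of one job, or nothing if absent
job_state() {
    awk -v id="$1" '$1 == id { print $5 }'
}
```

For example, `qstat | count_jobs jon R` would print how many of jon's jobs are running, and `qstat | job_state 28.vanguard` would print R or Q for that job.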
rcp vanguard:$PBS_O_WORKDIR/gau_input.com .
To copy the output back at the end of the calculation:
rcp gau_input.log vanguard:$PBS_O_WORKDIR/.
It is recommended that all jobs are first submitted to the express queue, to check that the input files are correct, before being submitted to the longer queues.
Options for the batch queue submission scripts are normally listed at the top of the script; each starts on a new line with the characters #PBS, and any other lines beginning with # are assumed to be comments. There are two principal ways to send an input file to the cluster: embed it within the submission script (e.g. gaussian), or prepare it beforehand and have the submission script copy it to the cluster at the right time.
#PBS -q parallel
It is important to pick the right queue, because if all jobs are submitted to long then the cluster system cannot schedule the jobs efficiently.
#PBS -N gln248_free
The names and locations of these files can be changed using something like:
#PBS -o wat_mp2.out
#PBS -e wat_mp2.err
Which could be combined and simplified to
#PBS -j eo
#PBS -e wat_mp2.err
# copy the input file from the directory the job was submitted from
# on vanguard to the $WORKDIR (we assume user has already done cd $WORKDIR)
rcp vanguard:$PBS_O_WORKDIR/fred.in .
# copy the output file to the same directory that we submitted from
rcp fred.out vanguard:$PBS_O_WORKDIR/.
Replace
rcp vanguard:$PBS_O_WORKDIR/fred.in .
with
rcp juno:$PBS_O_WORKDIR/fred.in .
or
rcp jaguar:$PBS_O_WORKDIR/fred.in .
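Since the only thing that changes between these rcp lines is the host name, one option is to keep it in a variable. This is a sketch, not part of the cluster setup: SUBMIT_HOST, RCP, stage_in and stage_out are hypothetical names chosen here, and RCP exists only so that the copy command can be swapped for echo in a dry run.

```shell
#!/bin/sh
# Host holding the submission directory; set to juno or jaguar as needed.
SUBMIT_HOST=${SUBMIT_HOST:-vanguard}
# Copy command; rcp is what the scripts on this page use.
RCP=${RCP:-rcp}

# stage_in FILE...: copy input files from the submission directory to here
stage_in() {
    for f in "$@"; do
        $RCP "$SUBMIT_HOST:$PBS_O_WORKDIR/$f" . || return 1
    done
}

# stage_out FILE...: copy result files back to the submission directory
stage_out() {
    for f in "$@"; do
        $RCP "$f" "$SUBMIT_HOST:$PBS_O_WORKDIR/." || return 1
    done
}
```

Inside a submission script this would be used as `stage_in fred.in` before the run and `stage_out fred.out` after it.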
# Name of job
#PBS -N gau_test
# combine std out and err and send them back to the
# working directory
#PBS -j eo
#PBS -e gau_test.err
# Queue to use
#PBS -q parallel
#PBS -l nodes=1:ppn=4
# Gather some details about the host for reference later
echo Running on host `hostname`
echo Time is `date`
# move into the temp directory for this job
cd $WORKDIR
echo Working directory is $WORKDIR
# prepare the g03 input file
# this next line means copy everything below until we see
# EOF into a file called blyp.com
cat > blyp.com << EOF
% chk=blyp
% mem=3500mb
% nproc=4
#P blyp/6-31+g* SCF=DIRECT maxdisk=1000000

Water test case - v bad geometry

0 1
O
H,1,r21
H,1,r21,2,a312
Variables:
r21=0.99
a312=104.5

EOF
# The EOF above marks the end of the g03 input file
# Run the job
g03 blyp
# copy the output files back to the submission directory on vanguard
rcp blyp.log vanguard:$PBS_O_WORKDIR/.
This submits a job gau_test to queue parallel on the cluster and would return the output files and the checkpoint file to the vanguard directory where it was submitted from. The standard out and standard error of the jobs will also be returned to the directory the job was submitted from.
For parallel Gaussian add the lines
#PBS -l nodes=1:ppn=X
% nproc=X
where X is 4 for the parallel and quad queues, 8 for the octa and bigmem queues and 12 for the dellr610a queue, and then submit.
| Queue | %nproc | %mem |
| parallel | 4 | 3500mb |
| quad | 4 | 6500mb |
| octa/bigmem | 8 | 14000mb |
| dellr610 | 8 | 28000mb |
| dellr610a | 12 | 28000mb |
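The table above can be turned into a small lookup so that the %nproc and %mem lines always match the queue. This is a sketch only; gaussian_link0 is a hypothetical helper name, and the values are copied directly from the table.

```shell
#!/bin/sh
# gaussian_link0 QUEUE: print the %nproc and %mem lines for QUEUE,
# using the values from the table above.
gaussian_link0() {
    case "$1" in
        parallel)    nproc=4;  mem=3500mb ;;
        quad)        nproc=4;  mem=6500mb ;;
        octa|bigmem) nproc=8;  mem=14000mb ;;
        dellr610)    nproc=8;  mem=28000mb ;;
        dellr610a)   nproc=12; mem=28000mb ;;
        *) echo "unknown queue: $1" >&2; return 1 ;;
    esac
    printf '%%nproc=%s\n%%mem=%s\n' "$nproc" "$mem"
}
```

For example, `gaussian_link0 quad > blyp.com` followed by appending the rest of the input would start the file with the settings for the quad queue.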
Running a charmm job is a little more complex because of the number of input and output files. Note also that we actually run two charmm jobs from within one submission script; this is useful for running follow-on jobs with charmm or any other program:
# Name of job
#PBS -N dyn_test
# names of std out and std err to be sent back to this directory
#PBS -j eo
#PBS -e charmm_test.err
# Queue to use
#PBS -q parallel
#PBS -l nodes=1:ppn=1
# Gather some details about the host for reference later
echo Running on host `hostname`
echo Time is `date`
# move into the temp directory for this job
cd $WORKDIR
echo Working directory is $WORKDIR
# copy the input and parameter files over
# param files come from a parameter directory
# other files from the current working directory
rcp vanguard:params/top.inp .
rcp vanguard:params/par.inp .
rcp vanguard:$PBS_O_WORKDIR/min1.i .
rcp vanguard:$PBS_O_WORKDIR/dyn.i .
rcp vanguard:$PBS_O_WORKDIR/ige3.psf .
rcp vanguard:$PBS_O_WORKDIR/ige3a.crd .
# Run the job
export runc34=/home/.2/charmm34b2/exec/gnu/charmm34.pgi64
$runc34 < min1.i > min1.o
$runc34 < dyn.i > dyn.o
# copy the output files back to vanguard, where the submission took place
rcp min1.o vanguard:$PBS_O_WORKDIR/.
rcp min1.crd vanguard:$PBS_O_WORKDIR/.
rcp dyn.o vanguard:$PBS_O_WORKDIR/.
rcp dyn.crd vanguard:$PBS_O_WORKDIR/.
This submits a job called dyn_test to the parallel queue. When the job runs it moves into a working directory and copies files in from vanguard; charmm is then run, and at the end the output files are copied back to vanguard. We then clean up the scratch directory.
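When two runs are chained in one script like this, it can be useful to skip the second run if the first one failed. The sketch below shows one way to do that; run_step is a hypothetical helper, and it assumes the program returns a non-zero exit status on failure, which has not been checked for charmm here.

```shell
#!/bin/sh
# run_step PROGRAM INPUT OUTPUT: run one stage, reading INPUT on stdin and
# writing OUTPUT; report and return non-zero if the program fails.
run_step() {
    "$1" < "$2" > "$3" || { echo "stage $2 failed, stopping" >&2; return 1; }
}
```

In the script above the two run lines would become `run_step "$runc34" min1.i min1.o && run_step "$runc34" dyn.i dyn.o`, so that the dynamics only starts after a successful minimisation.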
For 8 cpu parallel charmm you must submit the job to the parallel queue and the script must be altered as in the example below.
# Name of job
#PBS -N test_solv
# names of std out and std err to be sent back to this directory, just use
# the names without the preceding path
#PBS -j eo
#PBS -e errors
# Queue to use
#PBS -q parallel
#Request 2 nodes with 4 processors per node i.e. total 8 cpu's
#PBS -l nodes=2:ppn=4
# This jobs working directory is set below
echo Running on host `hostname`
echo Time is `date`
cd $WORKDIR
echo Working directory is $WORKDIR
#copy some file
rcp vanguard:$PBS_O_WORKDIR/dimer.psf .
rcp vanguard:$PBS_O_WORKDIR/test9.crd .
rcp vanguard:$PBS_O_WORKDIR/test9.res .
rcp vanguard:$PBS_O_WORKDIR/test10.i .
rcp vanguard:$PBS_O_WORKDIR/top_all27_prot_na.rtf .
rcp vanguard:$PBS_O_WORKDIR/par_all27_prot_na.prm .
# Set up the parallel environment
export PATH=/usr/local/mpich2-1.2.0_pgi902/bin:$PATH
export charmm=/home/.2/charmm35b3/exec/gnu/charmm.pgi90.mpich1.20.qchem
cat $PBS_NODEFILE | sort | uniq > mpd.hosts
mpdcleanup -f mpd.hosts
mpdboot --rsh=/usr/bin/rsh -n `wc -l < mpd.hosts`
# Run the job
mpiexec -n `wc -l < $PBS_NODEFILE` $charmm < test10.i > test10.o
#release the nodes from the mpi system
mpdallexit
# copy the output files back to vanguard do not overwrite
# the starting crd/restart file
/bin/rm test9.crd test9.res
rcp *.o vanguard:$PBS_O_WORKDIR/.
rcp *.crd vanguard:$PBS_O_WORKDIR/.
rcp *.traj vanguard:$PBS_O_WORKDIR/.
rcp *.res vanguard:$PBS_O_WORKDIR/.
Hopefully this should be all that is needed. The cluster is set up so that machines on the same switches will be utilised for parallel runs as far as possible.
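The parallel script uses two different counts taken from $PBS_NODEFILE: the number of distinct machines (for mpdboot) and the total number of processors (for mpiexec). The sketch below separates the two; count_procs and count_nodes are hypothetical names, and it assumes the node file lists each machine once per processor granted, so that nodes=2:ppn=4 gives 8 lines.

```shell
#!/bin/sh
# count_procs NODEFILE: total processors granted (one line per processor)
count_procs() {
    echo $(( $(wc -l < "$1") ))
}

# count_nodes NODEFILE: number of distinct machines in the file
count_nodes() {
    echo $(( $(sort -u "$1" | wc -l) ))
}
```

With these, `mpdboot -n` would be given `count_nodes $PBS_NODEFILE` and `mpiexec -n` would be given `count_procs $PBS_NODEFILE`, which is what the sort, uniq and wc lines in the script compute.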
Parallel work using Amber9 can also be performed; for one cpu (serial) work it should be easy to see what changes are needed.
# Name of job
#PBS -N wt_cole7_equil
# combine std err and out and send them back to the submission dir
#PBS -j eo
#PBS -e equil.err
# Queue to use
#PBS -q parallel
#PBS -l nodes=2:ppn=4
# This jobs working directory is set below
echo Running on host `hostname`
echo Time is `date`
cd $WORKDIR
echo Working directory is $WORKDIR
#copy some file
rcp vanguard:$PBS_O_WORKDIR/wt1.res .
rcp vanguard:$PBS_O_WORKDIR/wt.top .
rcp vanguard:$PBS_O_WORKDIR/equil.in .
# clean nodes, prepare the nodes file and link the nodes together
cat $PBS_NODEFILE | sort | uniq > mpd.hosts
mpdcleanup -f mpd.hosts
mpdboot --rsh=/usr/bin/rsh -n `wc -l < mpd.hosts`
# Run a parallel mpirun job
mpiexec -n `wc -l < $PBS_NODEFILE` \
$AMBERHOME/exe.pgi64/sander.MPI -O -i equil.in \
-o equil.out -c wt1.res -r equil.res -x equil.traj \
-inf wt1.inf -p wt.top -ref wt1.res
#Clean up the nodes
mpdallexit
# copy the output files back to vanguard
rcp equil.out vanguard:$PBS_O_WORKDIR/.
rcp equil.res vanguard:$PBS_O_WORKDIR/.
rcp equil.traj vanguard:$PBS_O_WORKDIR/.
# the sander command above writes its -inf file as wt1.inf
rcp wt1.inf vanguard:$PBS_O_WORKDIR/.
Other types of job can also be submitted using similar scripts. A side-effect of the batch submission is that input files with $ in them can get scrambled; a way around this is to use something like % instead and then use sed to convert the % to $. This is not a problem for input files that are copied from vanguard.
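The scrambling happens because the shell expands $NAME inside an unquoted here-document. In a Bourne-type shell, an alternative to the % and sed approach is to quote the here-document delimiter, which leaves the text untouched. The sketch below illustrates this; write_gamess_header is a hypothetical name, the two input lines are shortened from the Gamess example on this page, and whether this behaves the same under the shell that the batch system uses for your job is an assumption to check.

```shell
#!/bin/sh
# write_gamess_header FILE: write lines containing literal $ characters.
# Quoting the delimiter ('EOF') stops the shell expanding $CONTRL, $END etc.
write_gamess_header() {
    cat > "$1" << 'EOF'
 $CONTRL SCFTYP=RHF RUNTYP=OPTIMIZE $END
 $SYSTEM MWORD=6 $END
EOF
}
```

Calling `write_gamess_header meth.inp` writes the two lines with their $ characters intact, so no sed step is needed afterwards.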
A Gamess example:
### Name of job
#PBS -N gamess_test
# combine the std out and std err and send to a file called errors in the
# submission directory
#PBS -j eo
#PBS -e errors
### Queue to use
#PBS -q parallel
#PBS -l nodes=1:ppn=1
### Gather some details about the host for reference later
echo Running on host `hostname`
echo Time is `date`
# move into the temp directory for this job
cd $WORKDIR
echo Working directory is $WORKDIR
# prepare the gamess input file
# Here we have to use % instead of $ because tcsh doesn't
# like it so we then use sed to convert it back
cat > tmp.inp << EOF
!
%CONTRL SCFTYP=RHF RUNTYP=OPTIMIZE COORD=ZMT NZVAR=0 %END
%SCF DIRSCF=.TRUE. %END
%SYSTEM MWORD=6 %END
%STATPT OPTTOL=1.0E-5 %END
%BASIS GBASIS=N31 NGAUSS=6 NDFUNC=1 %END
%DFT DFTTYP=SVWN %END
%SCF DIRSCF=.TRUE. %END
%GUESS GUESS=HUCKEL %END
%DATA
Methylene...1-A-1 state...RSVWN/6-31G*
Cnv 2
C
H 1 rCH
H 1 rCH 2 aHCH
rCH=1.09
aHCH=110.0
%END
EOF
sed s/\%/\$/g tmp.inp > meth.inp
# Run the job
/home/.2/gamess_2009r3.linux/rungms meth > meth.log
# copy the output files back to vanguard
rcp meth.log vanguard:$PBS_O_WORKDIR/.
For parallel gamess set the number of nodes you want using
#PBS -l nodes=2:ppn=4
#PBS -q parallel
then replace the rungms line with something like
/home/.2/gamess_2007/rungms.pbs meth 01 8 > meth.log
where the first number is 01 and the second number is nodes * ppn, e.g. here 2*4=8,
and gamess should run in parallel.
An Aces2 example:
### Name of job
#PBS -N aces2_test
# combine the std out and std err and send to a file called errors in the
# submission directory
#PBS -j eo
#PBS -e errors
### Queue to use
#PBS -q parallel
#PBS -l nodes=1:ppn=1
### Gather some details about the host for reference later
echo Running on host `hostname`
echo Time is `date`
# move into the temp directory for this job
cd $WORKDIR
echo Working directory is $WORKDIR
# prepare the aces2 input file
# NOTE you must use ZMAT as the input file here
cat > ZMAT << EOF
Water CC-LR/DZP at experimental equilibrium geometry
O
H 1 R
H 1 R 2 A
R=0.958
A=104.5
*ACES2(CALC=CCSD,BASIS=DZP,EXCITE=EOMCC)
%excite*
1
1
1 5 0 6 0 1.0
EOF
# Set up the ACES 2 environment
# For an AMD64 machine use source /home/.2/aces2/cshrc.amd64 instead
source /home/.2/aces2/cshrc
cp /home/.2/aces2/basis/GENBAS .
# Run the job
xaces2 > water.log
# copy the output files back to vanguard
rcp water.log vanguard:$PBS_O_WORKDIR/.
### Name of job
#PBS -N dock_test
# combine the std out and std err and send to a file called errors in the
# submission directory
#PBS -j eo
#PBS -e errors
### Queue to use
#PBS -q parallel
#PBS -l nodes=1:ppn=1
### Gather some details about the host for reference later
echo Running on host `hostname`
echo Time is `date`
# move into the temp directory for this job
cd $WORKDIR
echo Working directory is $WORKDIR
# prepare the docking directory rcp -r copies entire directories and their
# contents
rcp vanguard:$PBS_O_WORKDIR/INDOCK .
rcp vanguard:$PBS_O_WORKDIR/split_database_index .
rcp vanguard:$PBS_O_WORKDIR/dock52 .
rcp vanguard:$PBS_O_WORKDIR/vdw.parms.amb.mindock .
rcp -r vanguard:$PBS_O_WORKDIR/dist .
rcp -r vanguard:$PBS_O_WORKDIR/grids .
rcp -r vanguard:$PBS_O_WORKDIR/crds .
# Run the job
./dock52
# copy the output files back to vanguard
rcp OUTDOCK vanguard:$PBS_O_WORKDIR/.
rcp test.3 vanguard:$PBS_O_WORKDIR/.
rcp test.eel3 vanguard:$PBS_O_WORKDIR/.
The first script below runs a ZDock job. To follow it up with an RDock job, a script like the second one below can be used; it runs on one node with two cpus, with half the configs done on one cpu and half on the other.
# Name of job
#PBS -N a_mpi
# names of std out and std err to be sent back to this directory, just use
# the names without the preceding path
#PBS -j eo
#PBS -e a_mpi_err
# Queue to use
#PBS -q parallel
# This jobs working directory is set below
echo Running on host `hostname`
echo Time is `date`
cd $WORKDIR
echo Working directory is $WORKDIR
#set up the p4pg file
echo "$HOSTNAME 0 /home/.2/zdock-2.3/zdock.mpi" > pgfile
echo "$HOSTNAME 1 /home/.2/zdock-2.3/zdock.mpi" >> pgfile
#copy some file
rcp vanguard:$PBS_O_WORKDIR/xol_l_b.pdb .
rcp vanguard:$PBS_O_WORKDIR/ige_r_a.pdb .
# Run the job
export P4_RSHCOMMAND=rsh
time /usr/local/mpich-1.2.7p1/bin/mpirun -np 2 -p4pg pgfile \
/home/.2/zdock-2.3/zdock.mpi \
-R ige_r_a.pdb -L xol_l_b.pdb -o test_a.out
# copy the output files back to vanguard
rcp test_a.out vanguard:$PBS_O_WORKDIR/.
# Name of job
#PBS -N ige_xol_rdock
# names of std out and std err to be sent back to this directory, just use
# the names without the preceding path
#PBS -j eo
#PBS -e errors
# Queue to use
#PBS -q parallel
# This jobs working directory is set below
echo Running on host `hostname`
echo Time is `date`
cd $WORKDIR
echo Working directory is $WORKDIR
#copy some file
export ZDOCK_HOME=/home/.2/zdock-2.3
rcp vanguard:$PBS_O_WORKDIR/ige_xol.out .
rcp vanguard:$PBS_O_WORKDIR/xol_l_a.pdb .
rcp vanguard:$PBS_O_WORKDIR/ige_r_b.pdb .
rcp vanguard:$ZDOCK_HOME/rdock_jon.pl rdock.pl
rcp vanguard:$ZDOCK_HOME/pdb2crd .
rcp vanguard:$ZDOCK_HOME/deltaG .
rcp vanguard:$ZDOCK_HOME/create_lig .
rcp vanguard:$ZDOCK_HOME/BND.charmm .
rcp vanguard:$ZDOCK_HOME/RTF.charmm .
rcp vanguard:$ZDOCK_HOME/amino.rtf .
rcp vanguard:$ZDOCK_HOME/param.prm .
# Run the job, the wait allows for the situation where one background job
# ends before the other - we will wait for _both_ to finish
chmod u+x rdock.pl pdb2crd deltaG create_lig
./rdock.pl -d ./ -x 1 1000 -o out -i ige_xol_a1.out &
./rdock.pl -d ./ -x 1001 2000 -o out1 -i ige_xol_a2.out &
wait
# copy the output files back to vanguard
rcp ige_xol_a1.out vanguard:$PBS_O_WORKDIR/.
rcp ige_xol_a2.out vanguard:$PBS_O_WORKDIR/.
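The RDock script splits the configs 1 to 2000 by hand into two halves, starts each half in the background and then waits for both. The sketch below generalises that pattern; split_range and run_halves are hypothetical helpers, and the command passed in stands for whatever program takes a start and an end index.

```shell
#!/bin/sh
# split_range FIRST LAST: print two "start end" pairs covering FIRST..LAST
split_range() {
    first=$1; last=$2
    mid=$(( (first + last) / 2 ))
    echo "$first $mid"
    echo "$(( mid + 1 )) $last"
}

# run_halves COMMAND FIRST LAST: run COMMAND START END for each half in the
# background, then wait for both to finish before returning
run_halves() {
    cmd=$1
    split_range "$2" "$3" | {
        while read -r start end; do
            "$cmd" "$start" "$end" &
        done
        wait
    }
}
```

Here `split_range 1 2000` gives the same two ranges as the script, 1 to 1000 and 1001 to 2000. The `wait` is what makes the copy-back step safe: without it the script could start copying files while one half was still running.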
Note: if you use the dellr610 queue you can use the dedicated GFS2 iSCSI array for the common space; the QCSCRATCH line given after the script below shows the path.
### Name of job
#PBS -N qchem_test
#names of std out and std err to be sent back to this directory, just use
#the names without the preceding path
#PBS -j eo
#PBS -e errors
### Queue to use
#PBS -q quad
#Tell PBS we want 1 machine with 4 cores
#PBS -l nodes=1:ppn=4
###Get some useful info for debugging later on
echo Running on host `hostname`
echo Time is `date`
# QCHEM
export HOSTNAME=`hostname`
export QC=/home/.2/qchem3102
source $QC/bin/qchem.setup.sh
QCSCRATCH=/cluster/$HOSTNAME/${PBS_O_LOGNAME}.${PBS_JOBID}
QCLOCALSCR=$WORKDIR
mkdir -p $QCSCRATCH
cd $WORKDIR
cp /home/.2/qchem3102/samples/DFT_glutamine.in .
cat $PBS_NODEFILE | sort | uniq > mpd.hosts
mpdcleanup -f mpd.hosts
$QC/bin/mpi2/mpdboot --rsh=/usr/bin/rsh -n `wc -l < mpd.hosts`
qchem -pbs -np `wc -l < $PBS_NODEFILE` DFT_glutamine.in > DFT_glutamine.out
$QC/bin/mpi2/mpdallexit
rcp DFT_glutamine.out vanguard:$PBS_O_WORKDIR/.
/bin/rm -rf /cluster/$HOSTNAME/${PBS_O_LOGNAME}.${PBS_JOBID}
/bin/rm -rf /cluster/$HOSTNAME/${PBS_O_LOGNAME}.${PBS_JOBID}.*
QCSCRATCH=/clussan/iscsi1/${PBS_O_LOGNAME}.${PBS_JOBID}
Note: if you use a STREAM file as the initial input to Charmm, which then streams the actual charmm script, you must stream the actual charmm script from a unit number that is not 99 (the default), because qchem also uses unit 99 and this causes charmm to stop reading the streamed file after qchem has run. Something like the stream file listed after the script below should work in this case.
#PBS -V
#PBS -j oe
#PBS -o errors
#PBS -N cqtest2
#PBS -q quad@vanguard
#PBS -l nodes=1:ppn=4
echo "Time is `date`"
echo "Running on host(s) `cat $PBS_NODEFILE`"
echo "with likely master node `hostname`"
cd $WORKDIR
cat $PBS_NODEFILE | sort | uniq > mpd.hosts
# QCHEM
export HOSTNAME=`hostname`
export QC=/home/.2/qchem3102
source $QC/bin/qchem.setup.sh
QCSCRATCH=/cluster/$HOSTNAME/$PBS_O_LOGNAME.$PBS_JOBID.global
QCLOCALSCR=$WORKDIR
mkdir -p $QCSCRATCH
export PATH=$QC/bin/mpi2/:$PATH
# CHARMM+QCHEM
QCHEMINP=cq.inp
QCHEMEXE=qchem\ -pbs
QCHEMCNT=qchem.inp
QCHEMOUT=w2.qcout
CHARMMEXE=/home/.2/charmm35b3/exec/gnu/charmm.pgi90.mpich1.20.qchem
export QCHEMINP QCHEMEXE QCHEMCNT QCHEMOUT QCSCRATCH QCLOCALSCR CHARMMEXE PATH
rcp vanguard:$PBS_O_WORKDIR/$QCHEMCNT .
rcp vanguard:$PBS_O_WORKDIR/w2.inp .
# each charmm instance also starts "nproc" number of qchem instances
# hence the manual 2 below (and a PARA 2 in input file)
# gives 2 charmm x 2 qchem = 4 (quad)
cat $PBS_NODEFILE | sort | uniq > mpd.hosts
mpdcleanup -f mpd.hosts
$QC/bin/mpi2/mpdboot --rsh=/usr/bin/rsh -n `wc -l < mpd.hosts`
$QC/bin/mpi2/mpiexec -n 2 $CHARMMEXE < w2.inp > w2.out
$QC/bin/mpi2/mpdallexit
# mpiexec -n `wc -l < $PBS_NODEFILE` $CHARMMEXE < w2.inp > w2.out
rcp w2.out vanguard:$PBS_O_WORKDIR/.
rcp test.coor vanguard:$PBS_O_WORKDIR/.
rcp w2.qcout vanguard:$PBS_O_WORKDIR/.
/bin/rm -rf /cluster/$HOSTNAME/${PBS_O_LOGNAME}.${PBS_JOBID}
/bin/rm -rf /cluster/$HOSTNAME/${PBS_O_LOGNAME}.${PBS_JOBID}.*
* MPI does not allow rewind on stdin, thus loops in CHARMM will fail.
* Streaming the whole inputfile is a workaround (see e.g. parallel.doc).
* The default unit is 99 but this causes problems with qchem so here we use 77 instead.
*
open unit 77 read form name w2.inp
STREam unit 77
STOP
Due to the problems with the $ character in the submission scripts it is recommended that perl scripts are not embedded into the submission scripts; instead the perl script should be prepared beforehand and copied to the cluster using rcp, as in the Charmm example above.
### Name of job
#PBS -N perl_test
# combine the std out and std err and send to a file called errors in the
# submission directory
#PBS -j eo
#PBS -e errors
### Queue to use
#PBS -q parallel
#PBS -l nodes=1:ppn=1
### Gather some details about the host for reference later
echo Running on host `hostname`
echo Time is `date`
# move into the temp directory for this job
cd $WORKDIR
echo Working directory is $WORKDIR
# prepare the working directory
# remember to make the perl scripts executable using chmod
rcp vanguard:$PBS_O_WORKDIR/all_mut.pl .
rcp vanguard:$PBS_O_WORKDIR/extract.pl .
rcp vanguard:$PBS_O_WORKDIR/template.pdb .
chmod u+x all_mut.pl extract.pl
# run the perl script
./all_mut.pl
# gather the energies from the output files
./extract.pl > final_energies
# copy the wanted files back to vanguard
rcp final_energies vanguard:$PBS_O_WORKDIR/.
The script below uses Cartesian coordinates; for a zmatrix example (CH3CF3), see the geometry block listed after the script.
# Name of job
#PBS -N nw_test
# names of combined std out and err
#PBS -j eo
#PBS -e nw_run1.err
# Queue to use
#PBS -q parallel
#PBS -l nodes=1:ppn=4
# This jobs working directory is set below
echo Running on host `hostname`
echo Time is `date`
# get a unique directory name and move into it
cd $WORKDIR
echo Working directory is $WORKDIR
# prepare the nwchem input file
# this next line means copy everything below until we see
# EOF into a file called nw_blyp.nw
cat > nw_blyp.nw << EOF
start dft_test
charge -1
memory 1500 mb
geometry units angstroms
P 0.000000 0.000000 0.000000
O 0.000000 0.000000 1.498500
O 1.335749 0.000000 -0.959419
O -1.429242 0.228301 -0.793491
O 0.246763 1.801356 -0.145602
O -0.147288 -1.642459 -0.263059
C -2.148059 1.476757 -0.919444
C -0.590939 2.681214 0.605684
C -0.240092 -2.267327 -1.548966
C -2.070013 2.274822 0.384177
H 1.681536 0.894983 -1.060784
H -1.730066 2.046658 -1.740307
H -3.165862 1.194862 -1.157882
H -0.413637 3.685782 0.230216
H -0.337654 2.638306 1.657363
H -0.359894 -3.329649 -1.370195
H -1.098029 -1.891222 -2.096032
H 0.660549 -2.094186 -2.126827
H -2.715661 3.148885 0.328365
H -2.410151 1.643125 1.195446
end
print low
basis "cd basis"
C library "Ahlrichs Coulomb Fitting"
P library "Ahlrichs Coulomb Fitting"
H library "Ahlrichs Coulomb Fitting"
O library "Ahlrichs Coulomb Fitting"
end
basis "ao basis"
C library 6-31+g*
P library 6-31+g*
H library 6-31g
O library 6-31+g*
end
dft
XC becke88 lyp
grid ssf euler lebedev 75 11
end
scf
maxiter 30
end
driver
maxiter 30
end
task dft optimize
EOF
# The EOF above marks the end of the nwchem input file
#Prepare the nodes file for parallel nwchem runs
mpdallexit
cat $PBS_NODEFILE | sort | uniq > mpd.hosts
mpdboot --rsh=/usr/bin/rsh -n `wc -l < mpd.hosts`
export nwchem=/home/.2/nwchem5.1/bin/nwchem.mpich2
mpiexec -n `wc -l < $PBS_NODEFILE` $nwchem nw_blyp > nw_blyp.out
mpdallexit
# copy the output and restart required files back to vanguard
rcp nw_blyp.out vanguard:$PBS_O_WORKDIR/nw_run1.out
rcp dft_test.db vanguard:$PBS_O_WORKDIR/dft_test.db
rcp dft_test.movecs vanguard:$PBS_O_WORKDIR/dft_test.movecs
geometry
zmatrix
C
C 1 CC
H 1 CH1 2 HCH1
H 1 CH2 2 HCH2 3 TOR1
H 1 CH3 2 HCH3 3 -TOR2
F 2 CF1 1 CCF1 3 TOR3
F 2 CF2 1 CCF2 6 FCH1
F 2 CF3 1 CCF3 6 -FCH1
variables
CC 1.4888
CH1 1.0790
CH2 1.0789
CH3 1.0789
CF1 1.3667
CF2 1.3669
CF3 1.3669
constants
HCH1 104.28
HCH2 104.74
HCH3 104.7
CCF1 112.0713
CCF2 112.0341
CCF3 112.0340
TOR1 109.3996
TOR2 109.3997
TOR3 180.0000
FCH1 106.7846
end
end
| qsub | Submit a job to the batch queue system |
| qstat | Get the status of the batch queue system |
| qdel | Delete a job from a queue |
| pbsdsh | Distribute a task to the nodes of a PBS job |
| qalter | Changes the characteristics of a PBS job that is waiting to run |
| qmgr | Displays, adds, changes, or deletes PBS server, queue and node configuration information. General users can only display information about the PBS configuration |
| qmove | Moves PBS jobs between queues. |
| qmsg | Sends a message to a PBS job |
| qorder | Exchange the order of two PBS jobs in a queue. |
| qrerun | Rerun a PBS job |
| qselect | List PBS job identifiers for jobs meeting selection criteria |
| qsig | Send a signal to a PBS job |
| xpbs | An X-Windows interface for using PBS and monitoring PBS jobs |
| xpbsmon | An X-Windows interface for monitoring PBS batch nodes. |
| checkjob | Get information about specific job details with respect to the queue and policy. |
[jon@vanguard /tmp]-% qstat -q
server: vanguard.carmay.office
Queue Memory CPU Time Walltime Node Run Que Lm State
---------------- ------ -------- -------- ---- --- --- -- -----
parallel -- -- 168:00:0 -- 14 0 -- E R
bigmem -- -- 504:00:0 -- 6 0 -- E R
medium64 -- -- 336:00:0 -- 23 0 -- E R
long -- -- 504:00:0 -- 16 1 -- E R
We then submit a job to the express queue, then check it a few times to see it queued and also running; it usually takes about 10 seconds to go from Q to R (if cpus are available).
[jon@vanguard /tmp]-% qsub g03_run.in
557.vanguard
[jon@vanguard /tmp]-% qstat
Job id Name User Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
557.vanguard gau_test jon 0 Q express
[jon@vanguard /tmp]-% qstat
Job id Name User Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
557.vanguard gau_test jon 0 R express
Now we submit 7 jobs to see the fairshare working. At any one time on this queue we should only see 3 jobs running by the same person; since each job is the same we expect to see 3 run, then when they stop 3 more run, and finally the seventh job run. If the jobs were of different lengths then they would not go in batches of three. In fact we were lucky here to see the 7th job get an E, which means exiting, or in our case that it is copying the output files back to the file servers.
[jon@vanguard /tmp]-% qstat
Job id Name User Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
558.vanguard gau_test jon 0 Q express
559.vanguard gau_test jon 0 Q express
560.vanguard gau_test jon 0 Q express
561.vanguard gau_test jon 0 Q express
562.vanguard gau_test jon 0 Q express
563.vanguard gau_test jon 0 Q express
564.vanguard gau_test jon 0 Q express
[jon@vanguard /tmp]-% qstat
Job id Name User Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
558.vanguard gau_test jon 0 R express
559.vanguard gau_test jon 0 R express
560.vanguard gau_test jon 0 R express
561.vanguard gau_test jon 0 Q express
562.vanguard gau_test jon 0 Q express
563.vanguard gau_test jon 0 Q express
564.vanguard gau_test jon 0 Q express
[jon@vanguard /tmp]-% qstat -a
vanguard:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
561.vanguard jon express gau_test -- 1 -- -- 03:00 Q --
562.vanguard jon express gau_test -- 1 -- -- 03:00 Q --
563.vanguard jon express gau_test -- 1 -- -- 03:00 Q --
564.vanguard jon express gau_test -- 1 -- -- 03:00 Q --
[jon@vanguard /tmp]-% qstat
Job id Name User Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
564.vanguard gau_test jon 0 Q express
[jon@vanguard /tmp]-% qstat
Job id Name User Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
564.vanguard gau_test jon 0 E express
Right, last example for now: user jon1 submits 4 jobs to the express queue and user jon submits one. Here we see that the 4th jon1 job is held and that the user jon job runs in preference, to make the usage more fair.
[jon@vanguard /tmp]-% qstat
Job id Name User Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
582.vanguard gau_test jon1 00:02:24 R express
583.vanguard gau_test jon1 00:00:16 R express
584.vanguard gau_test jon1 00:00:40 R express
585.vanguard gau_test jon1 0 Q express
586.vanguard gau_test jon 00:00:20 R express
Last update: Sat Oct 1 10:18:38 CST 2011
Comments to: jon _at_ sinica.edu.tw