Message ID: 31     Entry time: Thu Dec 13 17:33:54 2018
Author: s prohira 
Subject: parallel jobs on ruby 
Project: Software 

On ruby, you get charged for the full node even if you aren't using all 20 cores, so it's a pain if you want to run a bunch of serial jobs. There is, however, a tool called the 'parallel command processor' (pcp), provided on ruby (https://www.osc.edu/resources/available_software/software_list/parallel_command_processor), that makes it very simple to pack many serial commands into a single job.

Essentially, you make a text file filled with commands, one command per line, and then you give it to the parallel command processor, which runs each line of your text file as an individual task inside your job. The nice thing about this is that you don't have to think about it. You just give it the file and go, and it hands the commands out to the cores on the node as they become free, so the whole node stays busy.
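To make the "one command per line" idea concrete, here is a minimal sketch of a command file and a quick serial dry run of it. The file names commandfile_demo and demo_output.txt and the echo commands are placeholders I made up for illustration. The loop is not how pcp itself schedules work; it is only a way to check that every line is a valid, self-contained command before you submit.

```shell
#!/bin/bash
# Write a tiny command file: each line is one independent command.
# (commandfile_demo and the echo commands are illustrative placeholders.)
cat > commandfile_demo <<'EOF'
echo "task 1"
echo "task 2"
echo "task 3"
EOF

# Serial dry run: execute each line on its own, the way a single worker would.
while IFS= read -r cmd; do
    bash -c "$cmd"
done < commandfile_demo > demo_output.txt

cat demo_output.txt
```

If every line runs cleanly by itself here, the same file should be safe to hand to the parallel command processor.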

Below I provide 2 examples: a very simple one to show you how it works, and a more complicated one. In both files, I build the command file inside a loop. You don't need to do this; you can make the file some other way if you choose to. Note that you can also do this from within an interactive job. More instructions are at the above link.
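As one example of making the command file "some other way", the sketch below builds it without an explicit loop, using the standard seq and sed tools. The command path is the same placeholder used in test.pbs, and the file name commandfile_seq is just an illustrative choice.

```shell
#!/bin/bash
# Placeholder command; substitute your own executable and arguments.
cmd="/path/to/your_command_to_run"

# seq prints 1..1000, one number per line; sed prefixes each line with the command.
# Using ">" (not ">>") overwrites any stale file from an earlier run.
seq 1 1000 | sed "s|^|${cmd} |" > commandfile_seq

# Quick sanity check before submitting: first and last lines of the file.
head -n 1 commandfile_seq
tail -n 1 commandfile_seq
```

Any method works as long as the result is a plain text file with exactly one complete command per line.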

test.pbs is just a minimal thing, for when you need to run the same command 1000 times with some value that is incremented each time (i.e. 1000 different tasks).

effvol.pbs is more involved, and shows some important steps if your job produces a lot of output, where you use $TMPDIR or the pbs workdir. (If you don't know what that is, you probably don't need to use it.) Each command in this file stores an output file in the $TMPDIR directory. This directory is accessed faster than the directories where you store your files, so your jobs run faster. At the end of the script, the output files I want to keep (here, the ones matching *summary.root) are copied to my home directory, because $TMPDIR is deleted when the job ends. This file also shows the sourcing of a particular bash profile for submitted jobs (if you need this; some programs behave differently in submitted jobs than they do on the login nodes on ruby).
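The write-to-scratch-then-copy-back pattern described above can be sketched on its own, outside of any batch job. Everything here is a stand-in chosen for illustration: the scratch subdirectory name, the results_demo destination, and the echo commands that play the role of the real program. Inside a real job, $TMPDIR is set and cleaned up by the batch system, so the fallback to /tmp and the manual cleanup are only there so the sketch runs anywhere.

```shell
#!/bin/bash
# Stand-in for the per-job scratch area (falls back to /tmp outside a batch job).
scratch="${TMPDIR:-/tmp}/pcp_demo.$$"
# Stand-in for the permanent location, e.g. somewhere under your home directory.
dest="./results_demo"
mkdir -p "$scratch" "$dest"

# Each "task" writes its output to fast scratch space, not to permanent storage.
for value in 1 2 3; do
    echo "result $value" > "$scratch/out$value.txt"
done

# Copy what you want to keep back before the scratch area disappears.
cp "$scratch"/out*.txt "$dest"/

# Manual cleanup for this demo only; the batch system removes $TMPDIR itself.
rm -rf "$scratch"
ls "$dest"
```

The key point is the ordering: the copy has to happen inside the job script, after all tasks finish and before the script exits.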

I recommend reading the above link for more information. The pcp is very useful on ruby!

Attachment 1: test.pbs  304 Bytes
#!/bin/bash

#PBS -A PCON0003
#PBS -l walltime=01:00:00
#PBS -l nodes=1:ppn=20


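# Build the command file: one command per line, one line per task.
# Start from an empty file so a rerun does not append to old commands.
: > commandfile
for value in {1..1000}
do
    line="/path/to/your_command_to_run $value (arg1) (arg2)..(argn)"
    echo "${line}" >> commandfile
done

# Hand the command file to the parallel command processor.
module load pcp
mpiexec parallel-command-processor commandfile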




Attachment 2: effvol.pbs  549 Bytes
#!/bin/bash

#PBS -A PCON0003
#PBS -N effvol
#PBS -l walltime=01:00:00
#PBS -l nodes=1:ppn=20
#PBS -o ./log/out
#PBS -e ./log/err

source /users/PCON0003/osu10643/.bash_batch

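# Work in the job's scratch directory, which is faster than home storage.
cd "$TMPDIR"

# Build the command file: each task writes its own output file into $TMPDIR.
touch effvolconf
for value in {1..1000}
do
    line="/users/PCON0003/osu10643/app/geant/app/nrt -m /users/PCON0003/osu10643/app/geant/app/nrt/effvol.mac -f \"$TMPDIR/effvol$value.root\""
    echo "${line}" >> effvolconf
done

# Hand the command file to the parallel command processor.
module load pcp
mpiexec parallel-command-processor effvolconf

# Copy the results home before $TMPDIR is deleted at the end of the job.
cp "$TMPDIR"/*summary.root /users/PCON0003/osu10643/doc/root/summary/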
