| Place to document instructions for how to do things |
Message ID: 32
Entry time: Mon Dec 17 21:16:31 2018
Author: Brian Clark
Subject: Run over many data files in parallel
To analyze data, we sometimes need to process many thousands of runs at once. To do this in parallel, we can submit one job for every run we want to process. This proceeds in several steps:
- We need to prepare an analysis program.
  - This is demo.cxx.
  - The program takes an input data file and an output location.
  - The program does some analysis on each event, and then writes the result of that analysis to an output file labeled by the same number as the input file.
- We need to prepare a job script for PBS.
  - This is "run.sh"; it is the set of instructions to be submitted to the cluster.
  - The instructions say to:
    - Source a shell environment.
    - Run the executable.
    - Move the output ROOT file to the output location.
  - Note that we tell the analysis program from the first step to write to the node-local $TMPDIR, and then move the result to our final output directory at the end. This is better for cluster performance, because the job touches the shared file system only once, when it finishes.
- We need to make a list of data files to run over.
  - We can do this on OSC by running
      ls -d -1 /fs/scratch/PAS0654/ara/10pct/RawData/A3/2013/sym_links/event*.root > run_list.txt
  - This places the full path of each ROOT file in that folder into a list called run_list.txt that we can loop over.
- Finally, we need a script that will submit all of the jobs to the cluster.
  - This is "submit_jobs.sh".
  - It loops over all the files in our run_list.txt and submits a run.sh job for each of them.
  - This is also where we define the $RUNDIR (where the code is to be executed) and the $OUTPUTDIR (where the output products are to be stored).
Once all of these output files have been generated, you can make plots and do further analysis by running over just the output files.
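The three job-script steps described above (source an environment, run the executable, move the output) might look roughly like the sketch below. It is not the original run.sh. The executable name `demo`, its argument order (`<input file> <output dir>`), and the setup file `env.sh` are all assumptions made for illustration; only $RUNDIR, $OUTPUTDIR, $FILE, and $TMPDIR come from the text.

```shell
#!/bin/bash
# Sketch of a run.sh-style job body. Assumptions, not taken from the real run.sh:
#   - the compiled executable is called "demo" and lives in $RUNDIR
#   - it is invoked as: demo <input file> <output directory>
#   - env.sh is a hypothetical environment setup script

run_one_file() {
    local rundir="$1" outputdir="$2" file="$3"
    local workdir

    # Private working directory under the node-local $TMPDIR
    # (falls back to /tmp when run by hand outside the batch system).
    workdir="$(mktemp -d "${TMPDIR:-/tmp}/job.XXXXXX")" || return 1

    # 1. Source a shell environment, if the (hypothetical) file exists.
    if [ -f "$rundir/env.sh" ]; then
        . "$rundir/env.sh"
    fi

    # 2. Run the executable, writing its output to node-local disk.
    "$rundir/demo" "$file" "$workdir" || return 1

    # 3. Move the output ROOT file(s) to the final output location.
    mkdir -p "$outputdir"
    mv "$workdir"/*.root "$outputdir"/ || return 1
    rmdir "$workdir"
}

# Inside a real job, qsub -v supplies RUNDIR, OUTPUTDIR and FILE.
if [ -n "${FILE:-}" ]; then
    run_one_file "$RUNDIR" "$OUTPUTDIR" "$FILE"
fi
```

Wrapping the steps in a function keeps the sketch easy to try by hand on a single file, for example `run_one_file "$RunDir" "$OutputDir" /path/to/one/event_file.root` (placeholder path), before submitting anything to the queue.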
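#!/bin/bash
# submit_jobs.sh: submit one run.sh job for every file listed in run_list.txt.

# Where should the outputs be stored?
OutputDir="/fs/scratch/PAS0654/shell_demo/outputs"
echo "[ Processed file output directory: $OutputDir ]"
export OutputDir

# Where is your executable compiled?
RunDir="/users/PAS0654/osu0673/A23_analysis/araROOT"
export RunDir

# Define the list of runs to execute on (one full file path per line).
readfile=run_list.txt

counter=0
# IFS= and -r keep each path exactly as it is written in the list.
while IFS= read -r line1
do
    # Pass the run directory, output directory, and input file to run.sh
    # as environment variables, and give each job a unique name.
    qsub -v RUNDIR="$RunDir",OUTPUTDIR="$OutputDir",FILE="$line1" -N "job_$counter" run.sh
    counter=$((counter+1))
done < "$readfile"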
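Because the loop above submits one job per line of the run list, it can be worth previewing what would be submitted before sending thousands of jobs to the queue. The following is a minimal sketch of such a dry run; the function name `preview_jobs` is hypothetical, and it only prints the qsub commands without calling qsub.

```shell
#!/bin/bash
# Dry run of the submission loop: print each qsub command instead of running it.
# preview_jobs is a hypothetical helper, not part of the original scripts.

preview_jobs() {
    local readfile="$1" rundir="$2" outputdir="$3"
    local counter=0 line

    while IFS= read -r line
    do
        # Skip blank lines so they do not turn into empty jobs.
        if [ -z "$line" ]; then
            continue
        fi
        echo "qsub -v RUNDIR=$rundir,OUTPUTDIR=$outputdir,FILE=$line -N job_$counter run.sh"
        counter=$((counter+1))
    done < "$readfile"

    # Summary goes to stderr so the command list on stdout stays clean.
    echo "[ $counter jobs would be submitted ]" >&2
}
```

For example, `preview_jobs run_list.txt "$RunDir" "$OutputDir" | head` shows the first few commands, which makes a wrong path or an empty run list easy to spot.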