<p>To analyze data, we sometimes need to process many thousands of runs at once. To do this in parallel, we can submit a separate job for each run we want to process. This will proceed in several steps:</p>
<ol>
<li>We need to prepare an analysis program.
<ol>
<li>This is demo.cxx.</li>
<li>The program will take an input data file and an output location.</li>
<li>The program will do some analysis on each event, and then write the result of that analysis to an output file labeled with the same number as the input file.</li>
</ol>
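<p>As a rough sketch, the program might be compiled and invoked like this (the two-argument interface follows the description above; the exact compile flags, argument order, and run number here are assumptions, not the real demo.cxx interface):</p>
<pre><code># Compile against ROOT (assumes a ROOT environment has been set up)
g++ demo.cxx -o demo $(root-config --cflags --glibs)

# Run on one input file (hypothetical run number), writing the result to $TMPDIR
./demo /fs/scratch/PAS0654/ara/10pct/RawData/A3/2013/sym_links/event1234.root $TMPDIR
</code></pre>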
</li>
<li>We need to prepare a job script for PBS.
<ol>
<li>This is "run.sh"; this is the set of instructions to be submitted to the cluster.</li>
<li>The instructions say to:
<ol>
<li>Source a shell environment</li>
<li>Run the executable</li>
<li>Move the output ROOT file to the output location.</li>
</ol>
</li>
<li>Note that we're telling the program we wrote in step 1 to write to the node-local <code>$TMPDIR</code>, and then moving the result to our final output directory at the end. This keeps heavy I/O off the shared filesystem while the job runs, which is better for cluster performance.</li>
</ol>
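<p>A minimal sketch of what run.sh might look like, assuming <code>$INPUTFILE</code>, <code>$RUNDIR</code>, and <code>$OUTPUTDIR</code> are passed in by the submit script with <code>qsub -v</code> (the job name, resource requests, and environment setup path are placeholders):</p>
<pre><code>#!/bin/bash
#PBS -N ara_analysis
#PBS -l walltime=01:00:00
#PBS -l nodes=1:ppn=1

# Source a shell environment (placeholder path)
source /path/to/setup_env.sh

# Run the executable, writing its output to the node-local $TMPDIR
cd $TMPDIR
$RUNDIR/demo $INPUTFILE $TMPDIR

# Move the output ROOT file to the final output location
mv $TMPDIR/*.root $OUTPUTDIR/
</code></pre>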
</li>
<li>We need to make a list of data files to run over.
<ol>
<li>We can do this on OSC by running <code>ls -d -1 /fs/scratch/PAS0654/ara/10pct/RawData/A3/2013/sym_links/event*.root > run_list.txt</code></li>
<li>This places the full path to the ROOT files in that folder into a list called run_list.txt that we can loop over.</li>
</ol>
</li>
<li>We need a script that will submit all of the jobs to the cluster.
<ol>
<li>This is "submit_jobs.sh".</li>
<li>This loops over all the files in our run_list.txt and submits a run.sh job for each of them.</li>
<li>This is also where we define the <code>$RUNDIR</code> (where the code is to be executed) and the <code>$OUTPUTDIR</code> (where the output products are to be stored).</li>
</ol>
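<p>A sketch of what submit_jobs.sh might look like; the <code>$RUNDIR</code> and <code>$OUTPUTDIR</code> paths are hypothetical examples and should point at your own code and output areas:</p>
<pre><code>#!/bin/bash
# Where the code is to be executed and where the output products are to be stored
RUNDIR=/users/PAS0654/username/analysis
OUTPUTDIR=/fs/scratch/PAS0654/username/output
mkdir -p $OUTPUTDIR

# Submit one run.sh job per input file in the run list
while read INPUTFILE; do
  qsub -v INPUTFILE=$INPUTFILE,RUNDIR=$RUNDIR,OUTPUTDIR=$OUTPUTDIR $RUNDIR/run.sh
done < run_list.txt
</code></pre>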
</li>
</ol>
<p>Once all of the jobs have finished and you've generated these output files, you can run over just the output files (rather than the raw data) to make plots and such.</p>