<p>We ran Valgrind on AraSim to find out where it was spending most of its time. We used Valgrind's Callgrind tool (a profiler) to produce an output file of the timing data and opened that file in QCachegrind to display it in a readable form. This showed that the function Param_RE_Tterm_approx was taking up about 75% of the time because it was being called hundreds of millions of times. More specifically, the standard math functions pow() and exp() were being called in this function on every call and took up about 44% and 20% of the time, respectively. They were called about the same number of times as Param_RE_Tterm_approx even though they appear only in the else block of an if-else statement, which means the else branch is taken on almost every call.</p>
<p> </p>
<p>JULY 31st Update:</p>
<p>I used the C++ standard library's &lt;future&gt; header to run iterations of a loop asynchronously, since they do not depend on each other's calculations. Each asynchronous call launches a separate thread. The difficulty is that launching a thread and collecting its result also takes time, so each thread has to do enough calculation to make up for that overhead. The purpose of the function GetVm_FarField_Tarray is to fill the arrays of doubles Earray and Tarray, and those are the only values this function modifies. Tarray is built with simple calculations, so I left it out of the threads, but building Earray calls the other function that takes up so much time.</p>
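<p>A minimal sketch of the one-thread-per-iteration pattern described above, assuming std::async with std::launch::async and batches of 4 tasks to match the 4-thread setup. This is not the code in signal.cc: expensive_term is a made-up stand-in for the Earray calculation, and fill_earray_async is a hypothetical name.</p>

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical stand-in for the expensive per-sample work. In AraSim the
// Earray values go through Param_RE_Tterm_approx; this formula is invented.
double expensive_term(double t) {
    return std::exp(-t * t) * std::pow(1.0 + t, 1.5);
}

// Iterations are independent, so each one can run in its own std::async task.
// Up to kThreads tasks are launched, then all are collected before the next batch.
std::vector<double> fill_earray_async(const std::vector<double>& tarray) {
    constexpr std::size_t kThreads = 4;  // assumed to match the 4-thread setup
    std::vector<double> earray(tarray.size());
    for (std::size_t start = 0; start < tarray.size(); start += kThreads) {
        const std::size_t stop = std::min(tarray.size(), start + kThreads);
        std::vector<std::future<double>> batch;
        for (std::size_t i = start; i < stop; ++i) {
            // std::launch::async runs this single iteration on a new thread.
            batch.push_back(std::async(std::launch::async, expensive_term, tarray[i]));
        }
        for (std::size_t i = start; i < stop; ++i) {
            earray[i] = batch[i - start].get();  // blocks until that iteration finishes
        }
    }
    return earray;
}
```

<p>Every std::async call here creates and later joins a thread to do one small calculation, which is exactly the overhead that has to be outweighed by the work inside each thread.</p>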
<p> </p>
<p>I edited the files signal.cc and signal.hh to work with 4 threads and ran 10,000 neutrinos on 4 cores. The CPU time was 1:48:41, but the real wall-clock time was only 0:54:40. The same 10,000-neutrino test with base AraSim normally takes roughly 1:35:00 to 1:45:00, so in wall-clock time this is about 1.7 to 1.9 times faster. The code should still work with fewer threads available; it would just run slower, probably close to the CPU time, which might be slightly slower than base AraSim. As soon as more threads are available it runs much faster, as this test shows. I still need to check the accuracy of these edits. The number of passed events is in a reasonable range, but I will investigate further by giving exactly the same neutrinos to both programs and making sure their outputs are identical.</p>
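<p>For the planned identical-output check, a comparison along these lines could be run on values taken from both programs' output for the same neutrinos. This is a hypothetical helper (outputs_identical is not part of AraSim), and it assumes the values have already been read into two std::vector&lt;double&gt;; extracting them from the AraSim output files is not shown.</p>

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper: returns true only if both runs produced exactly the
// same values, and prints where the first difference is otherwise.
bool outputs_identical(const std::vector<double>& base,
                       const std::vector<double>& threaded) {
    if (base.size() != threaded.size()) {
        std::printf("length differs: %zu vs %zu\n", base.size(), threaded.size());
        return false;
    }
    // Exact == comparison on purpose: identical arithmetic should give identical bits.
    auto diff = std::mismatch(base.begin(), base.end(), threaded.begin());
    if (diff.first != base.end()) {
        std::printf("first difference at index %td: %.17g vs %.17g\n",
                    diff.first - base.begin(), *diff.first, *diff.second);
        return false;
    }
    return true;
}
```

<p>Exact comparison is reasonable if the threaded code performs the same arithmetic in the same order for each element. If threading changes the order of a floating-point sum, small last-digit differences can appear and a tolerance would be needed instead. NaN never compares equal, so a NaN in both outputs would still be reported as a difference.</p>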
<p> </p>
<p>I also have some ideas to speed it up even more. Currently every iteration of the loop launches its own thread, but it would be faster to either:</p>
<p>A: Allow more threads to run at a time, and thus go faster. This would be an easy edit, but it only helps if more cores are available.</p>
<p>B: Have each of the current threads run more than one iteration. I have started experimenting with this but am having trouble making sure that everything involving pointers stays safe. The benefit is that threads would be launched and joined far less often, which should give a large speed increase without adding more threads.</p>
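<p>Ideas A and B could be combined roughly as follows: size the number of tasks from std::thread::hardware_concurrency() (A) and give each task a contiguous block of iterations (B). One way to keep the pointers safe is to allocate the output array once before any task starts and hand each task a non-overlapping index range, so no two threads ever write to the same element and no lock is needed. This is a sketch under those assumptions, with a made-up expensive_term and hypothetical function names, not a drop-in replacement for GetVm_FarField_Tarray.</p>

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <future>
#include <thread>
#include <vector>

// Hypothetical stand-in for the expensive per-sample work (invented formula).
double expensive_term(double t) {
    return std::exp(-t * t) * std::pow(1.0 + t, 1.5);
}

// Each task fills the block [begin, end) of the preallocated output.
// Blocks never overlap, so each element is written by exactly one thread.
void fill_block(const double* tarray, double* earray, std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i) earray[i] = expensive_term(tarray[i]);
}

std::vector<double> fill_earray_chunked(const std::vector<double>& tarray,
                                        unsigned n_threads = 0) {
    if (n_threads == 0) {
        // Idea A: use as many threads as the machine reports (0 means unknown).
        n_threads = std::max(1u, std::thread::hardware_concurrency());
    }
    const std::size_t n = tarray.size();
    std::vector<double> earray(n);  // allocated once; never resized while tasks run
    const std::size_t chunk = (n + n_threads - 1) / n_threads;

    std::vector<std::future<void>> tasks;
    for (std::size_t begin = 0; begin < n; begin += chunk) {
        const std::size_t end = std::min(n, begin + chunk);
        // Idea B: one task per block of iterations instead of one per iteration.
        tasks.push_back(std::async(std::launch::async, fill_block,
                                   tarray.data(), earray.data(), begin, end));
    }
    for (auto& task : tasks) task.get();  // wait for every block; rethrows exceptions
    return earray;
}
```

<p>With 4 threads, only 4 tasks are launched per call instead of one per iteration, so the thread start-up and join cost is paid far less often. On Linux this needs to be compiled with -pthread.</p>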
<p>All of these developments are recent; they came after I spent a while trying many different ways to speed AraSim up or run parts of it in parallel like this. So even if this version turns out not to be accurate or valid, I believe I could make it so with more time and work.</p>
<p>Attached is a zip file containing the two files I edited as well as an output file from my first test. I will continue to test and improve this, but I am unable to work on it again until August 19th, so I wanted to share my current status before I am unavailable.</p>