GENETIS
ID   Date   Author   Subject
  126   Tue Jan 19 19:00:30 2021   Alex M   Latest Run Details/Running on Slurm

OSC figured out our issue with running XF from an interactive job on Slurm: we were losing our X11 forwarding connection. The solution is to use sinteractive instead of srun to obtain an interactive job. Here's the syntax:

sinteractive -A PAS0654 -t <time> -N 1 -n 8 -p serial

The -p serial flag selects the partition to request. It's important to specify it; otherwise the job defaults to the debug partition, which has a 1-hour limit.

 

We're going to begin a new run, titled Machtay_2021_1_15_NPOP50_Asym. It uses the latest version of the GA Ryan has been working on (Latest_GA_Asym.cpp), with 6% reproduction, 72% crossover, and 22% mutation, and 80% roulette selection. With 50 individuals, this corresponds to passing the three numbers 3 36 8 to the GA: 3 individuals from reproduction (6% of 50), 36 from crossover (72% of 50), and a roulette share of 8 out of 10 (80%).
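As a sanity check on those numbers, here is a minimal C++ sketch that converts the operator percentages into the three GA arguments. It assumes, as this entry implies, that the arguments are the reproduction count, the crossover count, and the roulette share out of 10, in that order; the real calling format is documented at the top of Latest_GA_Asym.cpp, and the struct and function names here are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical container for the integer arguments passed to the GA.
struct GAArgs {
    int reproduction_no;  // individuals copied unchanged into the next generation
    int crossover_no;     // individuals produced by crossover
    int mutation_no;      // whatever is left over is produced by mutation
    int roulette_share;   // roulette selections per 10 (8 means 80% roulette)
};

// Converts fractions of the population into integer counts (hypothetical helper).
GAArgs ga_args(int npop, double reproduction_frac, double crossover_frac,
               double roulette_frac) {
    GAArgs a{};
    a.reproduction_no = static_cast<int>(std::lround(npop * reproduction_frac));
    a.crossover_no = static_cast<int>(std::lround(npop * crossover_frac));
    a.mutation_no = npop - a.reproduction_no - a.crossover_no;
    a.roulette_share = static_cast<int>(std::lround(10.0 * roulette_frac));
    assert(a.mutation_no >= 0);  // the fractions must not exceed the population
    return a;
}
```

With the settings above, ga_args(50, 0.06, 0.72, 0.80) gives 3, 36, and a roulette share of 8, leaving 11 individuals (22% of 50) for mutation.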

  125   Fri Dec 11 17:47:48 2020   Ethan Fahimi   Friday Updates
Ethan F Fixed the issue AREA was having with finding test_{ind}.txt. Next is fixing the problems with finding Veff, after which the project should be working.
  124   Mon Dec 7 22:51:08 2020   Alex M   Important Runs

Today I removed some of the run directories that had little or no data or weren't worth keeping. There are still a few that I think can be removed, but I'm keeping them until we reach a consensus that they can definitely go. Below are the names and descriptions of the runs that I think we should definitely preserve going forward. In general, the more data a run directory contains, the more important it is to keep.

Name | Description | Symmetry | NPOP | Generations | Roulette/Tourney | Crossover | Reproduction | Mutation | Penalty | Neutrinos
Machtay_20200824_Real_Run | First real run with significant amounts of data after the summer improvements. | Symmetric | 10 | 15 | 100% Roulette | 100% | 0% | 100% | Yes | 100k
Machtay_20200827_Asym_Length_Run | First asymmetric length run after summer improvements. | Asymmetric length | 10 | 17 | 100% Roulette | 100% | 0% | 100% | Yes | 100k
Machtay_20200831_Asym_Length_and_Angle | Asymmetric length and angle run after summer improvements. | Asymmetric length and angle | 10 | 42 | 100% Roulette | 100% | 0% | 100% | Yes | 150k
Machtay_20200911_Symmetric | Longer symmetric run with fewer neutrinos. | Symmetric | 10 | 35 | 100% Roulette | 100% | 0% | 100% | Yes | 30k
Machtay_20200914_Asymmetric_50_Individuals | Longer asymmetric run with fewer neutrinos. | Asymmetric (all dimensions) | 50 | 26 | 100% Roulette | 100% | 0% | 100% | Yes | 30k
Machtay_20201016_Symmetric_Improved_GA | First run using improvements to GA based on Ryan's paperclip/fast loop analysis. | Symmetric | 50 | 10 | 50/50 | 75% | 10% | 15% | Yes | 30k
Machtay_20201023_300K_Nus_50_Individuals | Started with all identical individuals to demonstrate evolution; replaced penalty with hard cutoff. Increased Nus for higher fitness score precision. | Asymmetric (all dimensions) | 50 | 25 | 50/50 | 75% | 10% | 15% | No | 300k

  123   Wed Dec 2 15:24:39 2020   Alex M   Guide for Loop Errors

Attached is a .txt file, which you can also find in the Evolutionary_Loop directory as Loop_Error_List.txt. It lists the errors we sometimes encounter in the loop, along with how to fix them if they come up while you're running. If you encounter an error that isn't on the list, let everyone know in the #genstudents channel on Slack, and update the file with the error message, when it was encountered (what state the loop was in), and the possible cause and solution if you know them.

Attachment 1: Loop_Error_list.txt
Below is a list of errors we may encounter in the loop as of 11/25/20:

Error: "Pre-while, pre-for"
Description: This error appears after AraSim has "completed." The loop hangs after printing "Pre-while" and then "Pre-for." These messages come from the fitness function: the loop is reporting that it is inside the fitness function, right before the two loops that run over the AraSim data. Hanging here indicates a problem running AraSim; specifically, at least one of the jobs for the *first* individual failed.

Potential causes:
This may be caused by an issue in generating the gain files used for AraSim. These gain files are placed in the AraSim directory under the names a_{num}.txt, where {num} represents the individual number. You can check a_1.txt in the AraSim directory and see if it's complete (if it isn't, you can usually tell just by opening the file and seeing that only two lines have been printed to it).
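The "only two lines" check can be automated. This C++ sketch is an illustration, not part of the loop; the two-line threshold comes from the description above, and the function names are hypothetical. A missing file is reported separately (count_lines returns -1) rather than flagged as incomplete.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Counts the lines in a text file; returns -1 if the file cannot be opened.
long count_lines(const std::string& path) {
    std::ifstream in(path);
    if (!in) return -1;
    long n = 0;
    std::string line;
    while (std::getline(in, line)) ++n;
    return n;
}

// Flags an existing gain file (e.g. a_1.txt) that has only a couple of lines,
// which suggests it was not fully written.
bool gain_file_looks_incomplete(const std::string& path, long max_bad_lines = 2) {
    const long n = count_lines(path);
    return n >= 0 && n <= max_bad_lines;
}
```

For example, gain_file_looks_incomplete("a_1.txt"), run from the AraSim directory, returns true for a file with only one or two lines.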

One way to look for the cause of this error is to check the job error files. Inside the {runname} directory are directories containing the error and output files from the AraSim jobs: .../{gen}_AraSim_Errors and .../{gen}_AraSim_Output, where {gen} is the generation number. One error message I've seen in the error files is:

"
terminate called after throwing an instance of 'std::out_of_range'
  what():  basic_string::substr
/var/spool/slurmd/job2345909/slurm_script: line 37: 171196 Aborted                 (core dumped) ./AraSim setup.txt $runNum outputs/ a_${num}.txt > $TMPDIR/AraOut_${gen}_${num}_${Seeds}.txt

"

This appeared in all of the error files for the first individual's jobs. 

Resolution:

Start by checking the error files. For the error message above, go to the AraSim directory and look for a_{num}.txt files. These files should have been removed, so if one is left over (e.g. a_1.txt), it is likely the culprit, especially if it is obviously incomplete; it probably wasn't moved correctly, most likely because of a permissions error. Remove the leftover a_{num}.txt file and restart from the AraSim job submissions. (Potential speed-up: add part of the error message to the self-correcting phrases in Part_D2_AraSeed.sh so that only those individuals are rerun.)

The issue may also originate in XF. If restarting from AraSim doesn't work, follow the instructions above but set the loop back to stage 2 to restart from the beginning of XF. Take notes on any differences you see so they can be added to this guide.

Permissions problems can also cause this. Every time the loop is handed off to someone else, run the OpenPermissions.sh script (passing {runName} as an argument); look in that script to see which files need open permissions. If the owner of the closed files isn't available to open them, you can remove the files and restart from the stage where they are created. This usually happens in AraSim: the AraOut files in Antenna_Performance_Metric should have their permissions fixed or be removed (for that generation only).

*****************************************************************************

Error: <Loop hangs while outputting dimensions and fitness scores>

Description: 
This is similar to the error above, except that instead of hanging on the first individual, it hangs on some later individual. 

Resolution: 
The instructions for resolving this are the same as for the error above. This error seems to be less common and is usually resolved by the self-correcting code in Part_D2. Regardless, if you encounter it, first follow the instructions below in case only one or a handful of AraSim jobs failed. If that doesn't work, step back to stage 5 to resubmit the AraSim jobs after clearing out any possibly offending files. If that still doesn't work, step back to XF. (If you're unsure whether stepping back to AraSim will fix the problem, you can go straight back to XF, which may save time.)

It's also possible that only a handful (or even just one) of the AraSim jobs failed, for example because permissions were opened after someone had already taken over running the loop. In this case you may be able to restart the loop without resubmitting all of the AraSim jobs or stepping back to XF. First, figure out which AraSim job failed: check the AraSim error and output files for that generation (in particular, check whether one is *missing*). You can tell which individual the loop is stuck on by counting how many sets of dimensions and fitness scores were printed before the loop started hanging. Then go to /Antenna_Performance_Metric (inside .../Evolutionary_Loop), list all of the AraOut files for that individual, and check whether any of them look incomplete (i.e. don't have an effective volume at the bottom).
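Both checks in the paragraph above can be sketched in C++. Two assumptions: the loop prints one set of dimensions and fitness scores per individual, in order, so the stuck individual is the next one; and the marker searched for near the bottom of an AraOut file is a placeholder, since the exact wording AraSim prints for the effective volume isn't recorded here. Function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// If k sets of dimensions/fitness scores were printed before the hang,
// individual k + 1 (1-based) is the one the loop is waiting on.
int stuck_individual(int sets_printed) { return sets_printed + 1; }

// Returns true if `marker` appears in one of the last `tail` lines of `text`.
// A complete AraOut file should end with its effective volume, so a missing
// marker near the bottom suggests the job did not finish.
bool tail_contains(const std::string& text, const std::string& marker,
                   std::size_t tail) {
    std::vector<std::string> lines;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) lines.push_back(line);
    const std::size_t start = lines.size() > tail ? lines.size() - tail : 0;
    for (std::size_t i = start; i < lines.size(); ++i)
        if (lines[i].find(marker) != std::string::npos) return true;
    return false;
}
```

In practice you would read each AraOut file for the stuck individual into a string and pass whatever text AraSim actually prints next to the effective volume as the marker.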

Once you've found the failed jobs, you can set up the loop to rerun only those jobs. First, go to the AraSim flags directory inside the RunName directory and populate the flag files like so:

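    gen=1     # example value: set to the current generation number
    npop=50   # example value: set to NPOP
    jobs=10   # example value: set to the number of AraSim jobs per individual
    for i in $(seq 1 "$npop")
    do
      for j in $(seq 1 "$jobs")
      do
        # each flag file holds the generation, individual, and job number, one per line
        echo "$gen" > "${i}_${j}.txt"
        echo "$i" >> "${i}_${j}.txt"
        echo "$j" >> "${i}_${j}.txt"
      done
    done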

This creates all of the flag files AraSim needs in order to move on. Remove the flag files corresponding to the failed AraSim jobs you identified. Next, go into the AraSim error file directory in the RunName directory (/<gen>_AraSim_Errors) and replace any text in the error files for the failed AraSim jobs with the phrase "segmentation violation" (spelling and capitalization matter!). This is one of the phrases used by the self-correcting part of the loop in Part_D2, and it tells the loop to resubmit the AraSim job for that individual.

Then return to .../Evolutionary_Loop and change the savestate in /savestates from 7 to 6. When you restart the loop, it will report that it is waiting for the AraSim jobs to finish. After 1 to 2 minutes it will notice that the error files contain "segmentation violation" and will resubmit only the AraSim jobs you marked as failed.
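The self-correcting check boils down to searching each error file for a trigger phrase. The real check lives in the Part_D2 shell script; this C++ sketch only illustrates the logic. The phrase list (the "segmentation violation" phrase above plus the basic_string::substr message from the first error, which the guide suggests adding) and the function names are assumptions.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Reads a whole file into a string; returns "" if it cannot be opened.
std::string read_file(const std::string& path) {
    std::ifstream in(path);
    if (!in) return "";
    std::ostringstream ss;
    ss << in.rdbuf();
    return ss.str();
}

// True if the error text contains any trigger phrase. Matching is exact and
// case-sensitive, which is why spelling and capitalization matter.
bool needs_resubmit(const std::string& error_text,
                    const std::vector<std::string>& phrases) {
    for (const std::string& p : phrases)
        if (error_text.find(p) != std::string::npos) return true;
    return false;
}
```

Under these assumptions, "Segmentation Violation" (wrong capitalization) would not trigger a resubmit, matching the warning above.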

*****************************************************************************

Error: "cannot connect to X11 forwarding" (or something to that effect)

Description: 

This usually occurs during XF, but it may occur during the display of plots. In the case of plots being unable to display, the loop should still be able to operate, though plots might not update. However, if this message occurs during XF the data for the gain patterns for the antennas won't be properly created. This can occur at the first opening of the XF GUI or on the second part (after the xfsolver jobs have run).

Potential causes: 

First, make sure you are logged in to OSC using <ssh -XY userID>. The -XY flags enable X11 forwarding, which the XF GUI needs in order to appear. Also remember to request X11 forwarding for your interactive job (using --x11). Your X11 forwarding connection can also be interrupted after a long time (I've seen the loop work for several generations over multiple interactive job submissions and then suddenly hit this error).

Resolution: 

My advice is to log out and back in to OSC each time your interactive job ends. This error is uncommon, but it's easy to miss. Once you've logged back in, set the savestate back to the part of XF where you got this error (2 or 3, depending on which part the error appeared in).

*****************************************************************************

Error:  

  122   Mon Nov 30 17:00:31 2020   Ethan Fahimi   Monday Updates
Alex M Kicked the loop back up. Helped Ethan and Parker with their projects during the working meeting.
Ryan Changed the tournament/roulette ratio, reproduction_no, and crossover_no into variables read in by the GA, to make running the algorithm easier and to prepare the code for some testing. The format for calling the code is written at the top of the cpp file.
Ethan F Worked with Alex M on further fixing AREA. The first generation works now, we are making minor fixes to get subsequent generations up and running.

  120   Mon Nov 23 18:02:40 2020   Ryan Debolt   Monday Updates
Alex M Kept working with Amy, Alex P, Julie, and Ben on the AraSim fix. We fixed our issue from last week but have a new one in stage 2. It looks like the issue has to do with resetting the values for V_forfft right before stage 2 (around line 963). Check here for the current version of Report.cc: /users/PAS0654/pattonalexo/EFieldProject/11_23_20
Ryan Fixed the segmentation faults in the GA (a simple, one-line fix) and worked with Kai to start a spreadsheet of results. The results suggest that 2 tournament to 8 roulette is the ideal selection ratio, and initial tests suggest that a small amount of reproduction is ideal, but more testing is needed to confirm this.
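For reference, here is a minimal C++ sketch of mixing the two selection methods at a 2 tournament : 8 roulette ratio. It is not the GA's code: the roulette share is applied per draw here (the GA may instead split a fixed number of selections), the tournament size k is a free parameter, and all names are hypothetical. Roulette selection assumes non-negative fitness scores with a positive total.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Roulette-wheel selection: index i is chosen with probability
// fitness[i] / sum(fitness). Requires non-negative scores with a positive sum.
int roulette_select(const std::vector<double>& fitness, std::mt19937& rng) {
    std::discrete_distribution<int> wheel(fitness.begin(), fitness.end());
    return wheel(rng);
}

// Tournament selection: draw k contestants uniformly (with replacement)
// and return the fittest one.
int tournament_select(const std::vector<double>& fitness, int k, std::mt19937& rng) {
    assert(!fitness.empty() && k >= 1);
    std::uniform_int_distribution<int> pick(0, static_cast<int>(fitness.size()) - 1);
    int best = pick(rng);
    for (int t = 1; t < k; ++t) {
        const int c = pick(rng);
        if (fitness[c] > fitness[best]) best = c;
    }
    return best;
}

// Chooses one parent: roulette with probability roulette_share / 10,
// tournament otherwise (roulette_share = 8 gives the 8R:2T mix).
int select_parent(const std::vector<double>& fitness, int roulette_share, int k,
                  std::mt19937& rng) {
    std::bernoulli_distribution use_roulette(roulette_share / 10.0);
    return use_roulette(rng) ? roulette_select(fitness, rng)
                             : tournament_select(fitness, k, rng);
}
```

Roulette rewards every individual in proportion to its score, while tournament pressure favors the current best; mixing them is one way to trade off exploration against exploitation, which fits the trend reported here.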
  119   Fri Nov 20 17:50:05 2020   Alex M   Working Meeting 11/20/20
Alex M We worked with Amy to track down the error we were getting with the AraSim fix. Alex P and I had gotten AraSim to compile, but running it gave the error shown at the bottom of this entry. We did a binary search for the source of the error and determined that it comes from the if statement around line 640 in Report.cc:

if ( event->Nu_Interaction[0].LQ > 0 && (fabs(viewangle-signal->CHANGLE_ICE)<=settings1->OFFCONE_LIMIT*RADDEG) ) {
Ryan Kept testing different ratios of tournament and roulette selection. A trend seems to be appearing: full roulette was the worst run overall, but the score rises up to a ratio of about 8R:2T, with a maximum score of 99.22% match, and then slowly falls from that point. Need to do more testing to verify the averages and to fix strange segmentation faults in crossover when running the 6T and 8T ratios.

AraSim EField project Error:
 


 *** Break *** segmentation violation

===========================================================
There was a crash.
This is the entire stack trace of all threads:
===========================================================
#0  0x00002b845aeb445c in waitpid () from /lib64/libc.so.6
#1  0x00002b845ae31f52 in do_system () from /lib64/libc.so.6
#2  0x00002b84562bebf4 in TUnixSystem::StackTrace() () from /cvmfs/ara.opensciencegrid.org/trunk/centos7/root_build/lib/libCore.so
#3  0x00002b84562c13ea in TUnixSystem::DispatchSignals(ESignals) () from /cvmfs/ara.opensciencegrid.org/trunk/centos7/root_build/lib/libCore.so
#4  <signal handler called>
#5  0x000000000044fd70 in void std::vector<double, std::allocator<double> >::emplace_back<double>(double&&) ()
#6  0x00000000004c3c0e in Report::Connect_Interaction_Detector(Event*, Detector*, RaySolver*, Signal*, IceModel*, Settings*, Trigger*, int) ()
#7  0x0000000000436eb3 in main ()
===========================================================


The lines below might hint at the cause of the crash.
You may get help by asking at the ROOT forum http://root.cern.ch/forum
Only if you are really convinced it is a bug in ROOT then please submit a
report at http://root.cern.ch/bugs Please post the ENTIRE stack trace
from above as an attachment in addition to anything else
that might help us fixing this issue.
===========================================================
#5  0x000000000044fd70 in void std::vector<double, std::allocator<double> >::emplace_back<double>(double&&) ()
#6  0x00000000004c3c0e in Report::Connect_Interaction_Detector(Event*, Detector*, RaySolver*, Signal*, IceModel*, Settings*, Trigger*, int) ()
#7  0x0000000000436eb3 in main ()
===========================================================

  118   Thu Nov 19 16:42:06 2020   Alex Patton   Daily Update 11/19/20

I hopped on with Alex M to catch him up on what has been done and changed within AraSim so far. I looked into using pointers to access arrays outside of their declared scope, and it seemed relatively easy to set up. I set up pointers corresponding to the arrays V_forfft, T_forfft, and vmmhz_filter and set them at the end of stage one so they can be used in stage two. I also made sure to delete the pointers at the end of stage 2 (very important). After this, and after finding the correct scope for the deletes, we compiled and didn't get an error in Report.cc. Once it compiled I signed off to study for an exam, and Alex M will now take over and run tests to make sure that, with the same seed, this gives the same results as a base copy of AraSim. After that, the next step is to implement a way to save all the variables from stage one and read them into stage 2.
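A minimal sketch of the same idea, keeping stage-one arrays alive for stage two, is below. It swaps the raw pointers and manual deletes described above for std::vector, which frees its memory automatically when its owner goes out of scope. The array length, the placeholder values, and the struct and function names are hypothetical; only the three array names come from Report.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kN = 8;  // illustrative length; the real arrays differ

// Owns copies of the stage-one arrays that stage two needs.
struct StageOneBuffers {
    std::vector<double> V_forfft;
    std::vector<double> T_forfft;
    std::vector<double> vmmhz_filter;
};

// Stage one: the arrays are local, so copy them out before the scope ends.
StageOneBuffers run_stage_one() {
    double V_forfft[kN], T_forfft[kN], vmmhz_filter[kN];
    for (std::size_t i = 0; i < kN; ++i) {  // placeholder values, not physics
        T_forfft[i] = 0.5 * static_cast<double>(i);
        V_forfft[i] = 2.0 * static_cast<double>(i);
        vmmhz_filter[i] = 1.0;
    }
    return {std::vector<double>(V_forfft, V_forfft + kN),
            std::vector<double>(T_forfft, T_forfft + kN),
            std::vector<double>(vmmhz_filter, vmmhz_filter + kN)};
}

// Stage two: reads the saved arrays; no delete is needed afterwards.
double run_stage_two(const StageOneBuffers& saved) {
    double sum = 0.0;
    for (std::size_t i = 0; i < saved.V_forfft.size(); ++i)
        sum += saved.V_forfft[i] * saved.vmmhz_filter[i];
    return sum;
}
```

The design trade-off: with raw pointers every exit path from stage two must reach the delete, whereas the vector-owning struct cleans up even if stage two returns early.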

  117   Fri Oct 30 17:30:16 2020   Ethan Fahimi   Daily Update 10/30
Name Progress Plans
Alex M    
Alex P    
Ryan

Attempted to make a new mutation function for the algorithm to address the concerns from Wednesday about hitting local maxima. Unfortunately, the idea was unsuccessful when I put it through testing. I would like some input from the experts before trying something else. Otherwise, the version I had earlier this week has still been very consistent at optimizing the runs, apart from sometimes hitting a local max at about 90/100.
Ben    
Ethan Moved AREA into my own directory with Alex's help and began fixing issues with it (small win: no more permissions issues!). Plans: continue fixing issues with AraSim on my user.
Parker    
Elliot    
Leo    
Evelyn    

 

  116   Mon Oct 26 17:59:10 2020   Ethan Fahimi   Daily Update 10/26
Name Progress Plans
Alex M    
Alex P    
Ryan The testing loop is now finished, with plotted results. The program reached optimal results very quickly, using 100% tournament selection with a cutoff on the outer radius. It used the asymmetric version of the algorithm, and the ideal bicone it was compared to was an individual picked arbitrarily from an actual symmetric run, one we knew stayed within the outer radius. All individuals from each generation are plotted in the attached graph. Since these bicones started out asymmetric, this shows that we can easily find symmetric solutions if they are indeed ideal.
Ben    
Ethan Tried to fix permission issues with AREA, learned about running the loop, and listened to Jorge's thoughts on our project. Plans: possibly move the AREA project onto the project space, which may solve the permissions issues; Alex is looking into it.
Parker    
Elliot    
Leo    
Evelyn    
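The entry doesn't spell out how the pseudo-fitness compares an individual to the ideal bicone. One plausible form, sketched here in C++ purely as an illustration, scores the percent match from the mean relative difference between the genes; the formula and names are assumptions, not the test loop's actual function.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical pseudo-fitness: percent match between a candidate's genes and
// a fixed "ideal" bicone, computed from the mean relative difference.
// Assumes every target gene is non-zero.
double pseudo_fitness(const std::vector<double>& genes,
                      const std::vector<double>& target) {
    assert(!target.empty() && genes.size() == target.size());
    double mean_rel_err = 0.0;
    for (std::size_t i = 0; i < genes.size(); ++i)
        mean_rel_err += std::fabs(genes[i] - target[i]) / std::fabs(target[i]);
    mean_rel_err /= static_cast<double>(genes.size());
    // 100 means a perfect match; large mismatches bottom out at 0.
    return 100.0 * std::max(0.0, 1.0 - mean_rel_err);
}
```

Because a score like this costs only a few arithmetic operations per individual, many generations can be run in seconds, which is what makes it useful for tuning the GA without XF and AraSim.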

 

Attachment 1: fitness.png
  115   Mon Oct 19 17:02:12 2020   Ethan Fahimi   Daily Update 10/19
Name Progress Plans
Alex M    
Alex P    
Ryan Alex P. and I put the outer radius constraint into our asymmetric version of the algorithm. I have also created the pseudo-fitness function so we can do optimization testing that bypasses the time-consuming parts of the run. Plans: to finish the pseudo tests, I only need to create a loop that runs through the generations and the plotting procedures.
Ben    
Ethan Tried to fix permission issues and worked on running the loop (invalid id). Plans: continue fixing the AREA permission issues with Ben's help.
Parker    
Elliot    
Leo    
Evelyn    

 

  Draft   Fri Oct 16 17:32:49 2020   Alex Patton   Daily Update 10/16/20
  Update Plans for Monday
Alex M

 

 
Alex P and Ryan Made a new, asymmetric version of our modified GA. We didn't include the option to start with symmetry; we just wanted to get it to a compilable state so that we can test that it functions properly in the loop. We also have a real run going with our previous symmetric GA edit. Plans: watch this run over the weekend and keep it going, then test how the asymmetric version works.
Eliot    
Ben    
Leo    
Evelyn    
Ethan Worked with Ben and Alex M on getting the AREA project working on Slurm. It works now, but there are some permissions issues we still need to fix; simply put, Ben can run it perfectly right now. Plans: find and address the permissions issues so anyone can run a job.
  113   Mon Oct 12 17:34:29 2020   Alex Patton   Daily Update 10/12/20
Name Update Plans for Week
Alex M

 

 
Alex P, Ryan, and Ben Finished writing the updated GA and got it to compile, then hit some runtime segmentation faults, which we fixed. Once it was running, we found that many of the generationDNA entries ended up as 0,0,0. We looked through our functions and confirmed that this only happens for individuals produced by crossover, not by mutation or reproduction. Plans: fix this problem in crossover and keep testing to make sure the GA works as intended; so far it generates genDNA files and has just this one bug. The bug affects every other individual, so we think we may have made a mistake in how crossover generates two individuals from the parents, which should hopefully be an easy fix.
Eliot    
Leo    
Evelyn    
Ethan .  
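For reference, here is a minimal C++ sketch of a crossover that produces two children from two parents. It uses uniform crossover (each gene swapped with probability 1/2), which may differ from the GA's scheme, and the names are hypothetical. One way to end up with all-zero individuals in this pattern is to size the second child's vector but never write to it; that is a guess, not a diagnosis of the GA. The sketch fills both children in the same loop.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

using Genome = std::vector<double>;  // one individual's dimensions

// Uniform crossover: for each gene, a coin flip decides which parent child A
// copies; child B always takes the other parent's gene, so both are filled.
std::pair<Genome, Genome> crossover(const Genome& p1, const Genome& p2,
                                    std::mt19937& rng) {
    assert(p1.size() == p2.size());
    std::bernoulli_distribution coin(0.5);
    Genome a(p1.size()), b(p1.size());  // zero-initialized until written below
    for (std::size_t i = 0; i < p1.size(); ++i) {
        if (coin(rng)) {
            a[i] = p1[i];
            b[i] = p2[i];
        } else {
            a[i] = p2[i];
            b[i] = p1[i];
        }
    }
    return {a, b};
}
```

A quick invariant for testing any two-child crossover of this kind: at every gene position, the two children together hold exactly the two parents' values.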
  112   Fri Oct 9 17:37:57 2020   Alex Patton   Daily Update 10/9/20
Name Update Plans for Monday
Alex M

 

 
Alex P, Ryan, and Ben Finished all the functions and worked on adding them to main. We set up their calls and commented out the code that hadn't been split into functions, with a comment explaining what it was and the date it was commented out. We had to define new variables for the limits of our mutation function, since mutation is now uniformly distributed within a range rather than based on standard deviations. We also created global variables for the maximum outer radius and minimum length to make them more accessible. Plans: finish implementing and defining variables so the functions run properly; hoping to have it ready to test on Monday.
Eliot    
Leo    
Evelyn    
Ethan .  
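The switch from standard deviations to a uniform range can be sketched as follows: each mutation adds an offset drawn uniformly from [-step_limit, step_limit) to one gene and clamps the result to that gene's limits (for example, a minimum length or the maximum outer radius). This is an illustration under those assumptions, not the GA's function; the struct, field, and function names are hypothetical, and std::clamp needs C++17.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical per-gene limits for the mutation step.
struct GeneBounds {
    double step_limit;  // largest change one mutation may make
    double min_value;   // e.g. a minimum length
    double max_value;   // e.g. the maximum outer radius
};

// Uniform mutation: pick one gene, shift it by an offset drawn uniformly from
// [-step_limit, step_limit), then clamp it to [min_value, max_value].
void mutate_uniform(std::vector<double>& genes, const std::vector<GeneBounds>& bounds,
                    std::mt19937& rng) {
    assert(!genes.empty() && bounds.size() == genes.size());
    std::uniform_int_distribution<std::size_t> which(0, genes.size() - 1);
    const std::size_t g = which(rng);
    const GeneBounds& b = bounds[g];
    std::uniform_real_distribution<double> offset(-b.step_limit, b.step_limit);
    genes[g] = std::clamp(genes[g] + offset(rng), b.min_value, b.max_value);
}
```

Compared with a Gaussian step, a uniform step has a hard maximum size, so the limits directly bound how far one mutation can move an antenna.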
  111   Mon Oct 5 21:06:29 2020   Everyone   Data Runs

Machtay_20200831_Asym_Length_and_Angle    10 individuals
Machtay_20200911_Symmetric                             10 individuals, fewer neutrinos
Machtay_20200914_Asymmetric_50_Individuals  50 individuals, fewer neutrinos
Machtay_20200929_Asymmetric_test_2               50 individuals, fewer neutrinos, broaden parameter range

 

  Draft   Mon Oct 5 17:34:32 2020   Leo Deer   Update Monday 10/5/2020
Name Update Plans for Monday
Alex M

 

 
Alex P   Plans: finish the mutation function and make sure it works properly. Our added functions all compile, but we haven't yet tested whether they work as intended. We also have some questions about handling the standard deviations in mutations, but first we want to get it working before applying more tweaks.
Eliot Worked on learning about 3D plotting in python. Made a simple surface plot that can be expanded upon for our purposes.  
Leo Fixed the average FScore plot so that it connects the mean values with a line instead of showing them as discrete points. Then I made Kai's suggested plot: I copied the current FScore plotting software and edited it so that it takes in 2 runs and plots the fitness scores of both, in red or green, along with the average fitness scores of both. I'll attach images of these 2 plots. For Friday I'm going to look over the other plotting suggestions, but I'll most likely start fixing the random separation in the RLTS plots.
Evelyn    
Ryan    
Ben    
Ethan    

 

Attachment 1: FixedFScore.png
Attachment 2: RedPlot.png
  109   Fri Oct 2 17:11:25 2020   Ethan Fahimi   Update Friday 10/2/2020
Name Update Plans for Monday
Alex M

 

 
Alex P    
Eliot    
Leo Today I started working on improving the plots, beginning with plotting the average fitness score for each generation on top of all the individuals' data points. I made the average fitness score a higher-contrast star so it stands out. The attached plot shows the first 26 generations of the "Machtay_20200914_Asymmetric_50_Individuals" run with the new averages. On Monday I'm going to try to create the plot of 2 runs overlaid on top of each other, as Kai suggested.
Evelyn    
Ryan Worked with Ben and Alex P. on making the Bicone function modular in our copy. We finished writing new roulette selection, tournament selection, and reproduction functions. Plans: finish writing the mutation and crossover functions. Once these are complete, use them in place of the existing, incorrect function and test to make sure the results don't contain errors.
Ben    
Ethan Updated the AREA code with Alex M's help to work with SLURM, and began a trial run to see if our updates work. Plans: make changes that let the job run faster, and investigate what is causing some individuals to have fitness scores beyond what should be physically possible.
Attachment 1: Screen_Shot_2020-10-02_at_5.39.33_PM.png
  108   Mon Sep 14 17:23:26 2020   Alex M   GENETIS Update 9/14/20

I just wanted to give an update on what I've been doing today for GENETIS. I recorded myself running the loop. It should be useful for onboarding new people, but I also went through a whole generation, which I think will help everyone understand the timescale of each part of the loop. I'm making a small edit (I have no clue how to edit videos haha), so once I'm done I'll try to send it via email and post it on Slack. I also started a long run Amy requested, using antennas that are asymmetric in length and opening angle, with 50 individuals per generation. Since it has so many individuals, it will take a long time to go through each generation. I also had to decrease the number of AraSim jobs (thus increasing the number of neutrinos per job), because otherwise we'd be submitting too many jobs and many of them would be blocked, holding us back. If anyone has problems with me using too many jobs, let me know and I can try to decrease the number of jobs per individual further (right now I have 10 jobs per individual with 3000 neutrinos per job, which gives us 500 AraSim jobs per generation).
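The job counts quoted above follow directly from the run settings (50 individuals, 10 jobs per individual, 3000 neutrinos per job); the per-individual total matches the 30k neutrinos listed for Machtay_20200914_Asymmetric_50_Individuals in the Important Runs table:

$$
N_{\text{jobs}} = 50 \times 10 = 500 \ \text{AraSim jobs per generation}, \qquad
N_{\nu} = 10 \times 3000 = 30{,}000 \ \text{neutrinos per individual}
$$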

  107   Wed Sep 9 17:28:54 2020   Alex M   Update 9/9/20
Name Update Plans for tomorrow
Alex M

I've been working on readying the loop for the transition to Slurm. OSC is switching from its current job management software (Torque/Moab) to Slurm, which uses some different commands but is functionally similar. Since the scripts submit jobs in several places, I've been changing those calls (keeping a backup copy of the files that we know work on Torque).

Right now I'm testing all of the changes I've made on the space OSC has set aside for playing with slurm. I have changes in all of the places I think they belong, but I need to make sure there aren't typos/different commands from what I've used. 

I'll keep testing the changes I've made for Slurm. I'll also try to organize some of the differences I've found between Slurm and Torque (for example, requesting an interactive job) so that everyone can see what to use when OSC makes the switch (Pitzer changes on the 22nd; Owens is in November, I think).
Alex P    
Eliot    
Leo    
Evelyn    
Ryan    
Ben    
Ethan    

 

ELOG V3.1.5-fc6679b