Updates and Results Talks and Posters Advice Ideas Important Figures Write-Ups Outreach How-To Funding Opportunities GENETIS
  Important Plots, Tables, and Measurements  ELOG logo
Message ID: 20     Entry time: Fri Jul 28 17:43:20 2017
Author: Abdullah Alhag 
Type: Analysis 
Category: Analysis 
Subject: GP algorithms 
Project:  

In this post, I will be pointing out the advantage and the disadvantage of the GP algorithms I came across, particular Eureqa and HeuristicLab.

Eureqa is by far the fastest genetic algorithm software I came across. It is over simplified and easy to use. It has some built-in fitness function and also with some playing with the function that is being solved for and some other feature, it is possible for one to write his/her own fitness function. Moreover, the software is available for free for academic use and for most platform. Other features come with the software is the ability to normalize the data in different ways and even handle outliers and missing data. The program support a large collection of functions including trig and more complex one.

One the other hand, HeuristicLab is much slower than Eureqa but still far faster than Karoo-GP. The latest version of the software was released a year ago, and the support for the software is fairly slow. It is only supported for Windows; however, there is plans to adopted to Linux systems. The software support way more feature than Eureqa or karoo and even different regression and classification algorithms. You could also get the function ready to use in many software such as MATLAB, Excel, Mathematica, and much more. Another cool feature is that it shows you a three of the function and the weight of each node (operation or operand), greener means the node has more weight, see attached. It should be noticed that the software has the tendency to grow large three which could be fixed by changing the default max three length and the max three depth. The software has a problem with the last update of windows 10, you will get the blue screen if you opened too many windows, so be careful.

In booth software you could change how much of the data set goes to training and how much goes to testing, be sure to shuffle the data in HeuristicLab as it will otherwise distribute the data as training and testing non-randomly. Booth software by default shows plot of the current function with the x axis being the data entries (row numbers) and y axis being the target values with a curve of the function estimate values.

Below is a simple run of both software on a fake data that Prof. Amy gave me, see out.txt for data.

For start, using eureqa I got a few functions with an average of mean of R^2 of 0.98 or more which is very good.

First function: frequency = (571953.335547372*y + 15786079*x^2*y^4 + 297065746*x*y^2*asinh(x))/factorial(7.44918899410674 + x + y)

The first has an R^2 of 0.99(1 means perfect fit) and mean absolute error of 2.98(0 means perfect fit, data dependent, not normalized), see plot1 for 1D plot of the function estimate values vs target values.

 

Second function: frequency = (3569.91823898791*x*y - 149.144996501988 - 100.216589664235*x^2)/(5.26462242203202^x + 7.09216771444399^y*x^(1.3193170267439*x))

The second has an R^2 of 0.994(1 means perfect fit) and mean absolute error of 3.2(0 means perfect fit, data dependent, not normalized), see plot2 for 1D plot of the function estimate values vs target values.

Also attached is 2D plot of the function against the data, the function plotted is the second function, but all are very similar, see Eq23.

 

Using HeuristicLab. The function below has an R^2 of 0.987, mean absolute error of 3.6 and normalized mean squared error of 0.012.

The function is: (((EXP((-1.3681014170483*'y')) * ((((-1.06504220396658) * (2.16142798579652*'x')) * (3.44831687407881*'y')) / ((((1.57186418519305*'y') + (2.15361749794796*'y')) / ((1.6912208581006*'y') / (EXP((1.80824695345446*'x')) * 16.3366330774664))) / ((((2.11818004168659*'x') * (1.10362178478116*'y')) - ((((-1.06504220396658) * (2.16142798579652*'x')) * (3.44831687407881*'y')) / ((((2.11818004168659*'x') + (2.15361749794796*'y')) / ((10.9740866421104 + (1.8106235953875*'y')) - (2.15361749794796*'y'))) / (((-7.8798958167) + (-6.76475761634751)) + ((2.87007061579651*'x') + (2.15361749794796*'y')))))) - (((((-8.85637334631747) * ((1.9238243855142*'y') - (1.01219957177297*'y'))) + (((-6.37085286789103) * 5.99856391145622) - ((-12.9565240969832) - 2.84841224458228))) - ((2.11818004168659*'x') * (1.10362178478116*'y'))) - (((2.11818004168659*'x') * ((((0.197306324191089*'y') + (0.255996267596584*'y')) - (2.16142798579652*'x')) - (((-1.06504220396658) * (2.16142798579652*'x')) / (12.2597910897177 / (1.25729305246107*'y'))))) - (EXP((-1.3681014170483*'y')) - ((((-6.29806655709512) * 6.39744830364858) / (12.2597910897177 / (0.728256926023423*'x'))) - (1.10362178478116*'y'))))))))) * 179.42788632856) + 2.24688690162535)

 

As you could see, HeuristicLab tend to generate function which are extremely large, this one has a depth of 15 and a length of 150.

 

See plot3 for 1D plot of the function estimate values vs target values, note that it is different from before because the data is shuffled,  and three.jpg for the three representation of the function showing the wight of each node, greener means has more weight.

Attachment 1: plot1.png  832 kB  Uploaded Fri Jul 28 18:48:49 2017  | Show | Hide all | Show all
Attachment 2: plot2.png  837 kB  Uploaded Fri Jul 28 18:49:03 2017  | Hide | Hide all | Show all
plot2.png
Attachment 3: plot3.png  2.353 MB  Uploaded Fri Jul 28 18:49:17 2017  | Show | Hide all | Show all
Attachment 4: Eq23-0.png  149 kB  Uploaded Fri Jul 28 18:49:31 2017  | Show | Hide all | Show all
Attachment 5: three.png  1.475 MB  Uploaded Fri Jul 28 18:49:46 2017  | Show | Hide all | Show all
ELOG V3.1.5-fc6679b