I am implementing a genetic algorithm to use as an optimisation algorithm to evolve robots. The robots have certain parameters (represented as floats) which can lie anywhere within a certain range defined for each parameter. My goal is to optimise these parameters to produce the fittest robot.
There are about 25 parameters to optimise over, all of which are >= 0. A generation consists of 32 robots and a typical evolutionary run may have 6000 generations, so that is 192000 robots in total.
One problem I was noticing was that many times, the same robot was being produced by the random mutation function and its fitness was re-evaluated. I want to try and minimise this from happening as the physics simulation engine has a large time expense.
It would be impractical (especially from a memory point of view) to remember all robot parameters that have ever been produced and use a brute force solution.
One solution I thought of was to use a hash function to and a large array of booleans, and everytime a mutation occurs, largeArray[hash(robotParameters)]
is checked to see if it is set. If it is, the mutation occurs again to produce a different set of parameters, and if it isn't, largeArray[hash(robotParameters)]
is then set.
What is a good hash function implementation that would work for me? Preferably, there would already be an implementation in Python or C++.
Asked By : texasflood
Answered By : Sumik
Hashing an array of floats
You need some hash function that:
- low computational cost
- low probability of generating the same hash value for two input vectors = evenly distributed hashes
If you language/library have some function for hashing float array, it should be ok.
For example, this function is what Java would do with an float array (it's not a direct copy of Java library):
int hashFloatArray(float[] arr) { //this is done in Arrays.hashCode() int h = 1; for (int i = 0; i < arr.length; i++){ int floatAsInt = Float.floatToIntBits(arr[i]); //in C: int floatAsInt = *(int*)(&arr[i]); h = 31 * h + floatAsInt; } //this is done in HashMap h ^= (h >>> 20) ^ (h >>> 12); return h ^ (h >>> 7) ^ (h >>> 4); }
Finding matching hashes
Storing hashes instead of whole float arrays reduces amount of memory needed. Space of hashes is smaller than the space of float arrays, so matching hashes never give 100% certainty that original objects (float arrays) where identical. The longer hashes the smaller is the change of hash collision (situation when two different inputs produce the same hash).
Array indexed with hashes (as OP proposed)
Indexing array with hashes means that the length of the array must be 2^k where k is length of key in bits. For example: Assuming single field of array occupies 1 byte (boolean variable usually occupy 1 byte). Using 30 bit key requires 2^30 * 1 bytes = 1GB of memory (O(2^k) space complexity). As mentioned before:
- shorter key = smaller array = higher chance of hash collision
- longer key = bigger array = lower chance of hash collision
Hash table (Hash set)
Hash table is a data structure that can be used to implement hash set. Main advantage of hash set is it's O(1) average time complexity combined with O(n) space complexity.
You can use hash set to store and effectively find:
- long hashes (64 bit?) of float arrays without wasting memory
- float arrays themselves (if you have enough memory)
Hash sets are available in standard libraries of almost all programming languages.
java.util.HashSet
in Javaunordered_set
in C++11set
in Python
Why comparing and hashing floats is a problem?
Let's take two float numbers: 1.2345678 and 1.2345679 The difference between them is small enough to have no influence on the quality of your Robot, but comparing 1.2345678 and 1.2345679 results in "not equal" and hashing them results in two different hashes. Introducing an "almost equal" term may be a good idea, but that can't be done with hashes.
Best Answer from StackOverflow
Question Source : http://cs.stackexchange.com/questions/37952
0 comments:
Post a Comment
Let us know your responses and feedback