World's most popular travel blog for travel bloggers.

[Solved]: Hash function floating point inputs for genetic algorithm

, , No Comments
Problem Detail: 

I am implementing a genetic algorithm to use as an optimisation algorithm to evolve robots. The robots have certain parameters (represented as floats) which can lie anywhere within a certain range defined for each parameter. My goal is to optimise these parameters to produce the fittest robot.

There are about 25 parameters to optimise over, all of which are >= 0. A generation consists of 32 robots and a typical evolutionary run may have 6000 generations, so that is 192000 robots in total.

One problem I was noticing was that many times, the same robot was being produced by the random mutation function and its fitness was re-evaluated. I want to try and minimise this from happening as the physics simulation engine has a large time expense.

It would be impractical (especially from a memory point of view) to remember all robot parameters that have ever been produced and use a brute force solution.

One solution I thought of was to use a hash function to and a large array of booleans, and everytime a mutation occurs, largeArray[hash(robotParameters)] is checked to see if it is set. If it is, the mutation occurs again to produce a different set of parameters, and if it isn't, largeArray[hash(robotParameters)] is then set.

What is a good hash function implementation that would work for me? Preferably, there would already be an implementation in Python or C++.

Asked By : texasflood

Answered By : Sumik

Hashing an array of floats

You need some hash function that:

  • low computational cost
  • low probability of generating the same hash value for two input vectors = evenly distributed hashes

If you language/library have some function for hashing float array, it should be ok.

For example, this function is what Java would do with an float array (it's not a direct copy of Java library):

int hashFloatArray(float[] arr) {     //this is done in Arrays.hashCode()     int h = 1;     for (int i = 0; i < arr.length; i++){         int floatAsInt = Float.floatToIntBits(arr[i]);          //in C: int floatAsInt = *(int*)(&arr[i]);         h = 31 * h + floatAsInt;     }     //this is done in HashMap     h ^= (h >>> 20) ^ (h >>> 12);     return h ^ (h >>> 7) ^ (h >>> 4); } 

Finding matching hashes

Storing hashes instead of whole float arrays reduces amount of memory needed. Space of hashes is smaller than the space of float arrays, so matching hashes never give 100% certainty that original objects (float arrays) where identical. The longer hashes the smaller is the change of hash collision (situation when two different inputs produce the same hash).

Array indexed with hashes (as OP proposed)

Indexing array with hashes means that the length of the array must be 2^k where k is length of key in bits. For example: Assuming single field of array occupies 1 byte (boolean variable usually occupy 1 byte). Using 30 bit key requires 2^30 * 1 bytes = 1GB of memory (O(2^k) space complexity). As mentioned before:

  • shorter key = smaller array = higher chance of hash collision
  • longer key = bigger array = lower chance of hash collision

Hash table (Hash set)

Hash table is a data structure that can be used to implement hash set. Main advantage of hash set is it's O(1) average time complexity combined with O(n) space complexity.

You can use hash set to store and effectively find:

  • long hashes (64 bit?) of float arrays without wasting memory
  • float arrays themselves (if you have enough memory)

Hash sets are available in standard libraries of almost all programming languages.

  • java.util.HashSet in Java
  • unordered_set in C++11
  • set in Python

Why comparing and hashing floats is a problem?

Let's take two float numbers: 1.2345678 and 1.2345679 The difference between them is small enough to have no influence on the quality of your Robot, but comparing 1.2345678 and 1.2345679 results in "not equal" and hashing them results in two different hashes. Introducing an "almost equal" term may be a good idea, but that can't be done with hashes.

Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/37952

0 comments:

Post a Comment

Let us know your responses and feedback