As input I have two sets of points in RN, typically for large N, for example N=40. Supose both sets have m elements:
S = s1 ... sm
T = t1 ... tm
Semantically both sets are equal, but due to noise (of whatever kind) on the R^N points, elements which should be semantically the same still have a distance larger than 0.
What I want to find is m tuples (si, tj) such that the sum of distance(si, tj) is minimized and such that sk and tk exactly occur once in the set of tuples for k=1...m. Basically (i,j) have to be chosen as towers on a chessboard which can not hit each other while minimizing the summed distances.
In other words, I want to find an onto one-one map between S and T which 'is sort of an identity map, but robust to noise'. We assume that the distance measure is a good indication of how similar elements are.
Basically I need to find a permutation of 1...N, and hence I think this problem is either NP-hard or NP-complete, since it 'feels' quite similar to the TSP; however I have not been able to rewrite the TSP problem to a subset of my problem here.
Is this problem realistically solvable for large N? Is there a name for this problem? What would be a feasible solution? Is there another criteria which might be better than the summed distances?
I thought of a greedy approach, let D be a matrix of the distances, dij = distance(si, tj).
T = {} while D is not empty: (i,j) = argmin-(i,j) dij append (i,j) to T set row i and column j to infinity.
This does not result in the optimal solution, but finds a solution. Would this be my best bet? Should I use simulated annealing, or is it overkill?
PS: from my point of view, this is just a minor problem in a bigger ML problem, however, I am very much interested in the CS background of it.
Asked By : Herbert
Answered By : D.W.
This is the problem of finding the maximum matching in a weighted bipartite graph. There are efficient algorithms that solve this problem in polynomial time.
Basically, you have a complete bipartite graph with $2m$ vertices. $S$ forms one set of vertices; $T$ forms the other set of vertices. For each $i,j$, create an edge from $s_i$ to $t_j$ whose weight is equal to $-d(s_i,t_j)$, the negation of the distance between $s_i$ and $t_j$.
Now, you want to find a matching whose weight is as large as possible. This corresponds to a set of pairs $(s_i,t_j)$ whose total distance is as small as possible. From there, you can use existing algorithms on this graph to find the maximum-weight matching.
Best Answer from StackOverflow
Question Source : http://cs.stackexchange.com/questions/28141
0 comments:
Post a Comment
Let us know your responses and feedback