I have to make a quick clustering program but the following formula is gibberish to me:
$\operatorname{Perf}(X,C) = \sum\limits_{i=1}^n\min\{||X_i-C_l||^2 \mid l = 1,...,K\}$
where $X$ is a set of multi-dimensional data and $C$ is a set of centroids for each data cluster.
This formula is a fitness function for an artificial bee colony clustering algorithm as a substitute for k-means clustering algorithm. It is described as a total within-cluster variance or the total mean-square quantization error (MSE).
Can anyone translate it to pseudo-code, normal human English, or at least enlighten me?
Asked By : helix
Answered By : edA-qa mort-ora-y
Just break it down into parts:
$ \{ f(l) \mid l = 1,...,K \} $
This is a simple set construction. The above would simply create a set with all the elements from 1 to K. In your case the f(l)
is the function:
$ ||X_i-C_l||^2 $
Given the ||
means the norm, these are vectors you are subtracting (rows of the X and C matrices). So subtract the vectors, take the norm, and square it. That produces a new set, of which you want to take the minimum.
$ \sum\limits_{i=1}^n $
This part is then just the sum of above min calculation for every index $i$ from $1$ to $n$.
Best Answer from StackOverflow
Question Source : http://cs.stackexchange.com/questions/2067
0 comments:
Post a Comment
Let us know your responses and feedback