I use a variation of a 5-cross median filter on image data on a small embedded system, i.e.
x x x x x
The algorithm is really simple: read 5 unsigned integer values, get the highest 2, do some calculations on those and write back the unsigned integer result.
What is nice is that the 5 integer input values are all in the range of 0-20. The calculated integer value are also in the 0-20 range!
Through profiling, I have figured out that getting the largest two numbers is the bottleneck so I want to speed this part up. What is the fastest way to perform this selection?
The current algorithm uses a 32 bit mask with 1 in the position given by the 5 numbers and a HW-supported CLZ function.
I should say that the CPU is a proprietary one, not available outside of my company. My compiler is GCC but tailor made for this CPU.
I have tried to figure out if I can use a lookup-table but I have failed to generate a key that I can use.
I have $21^5$ combinations for the input but order isn't important, i.e. [5,0,0,0,5]
is the same as [5,5,0,0,0]
.
It happens that the hash-function below produces a perfect hash without collisions!
def hash(x): h = 0 for i in x: h = 33*h+i return h
But the hash is huge and there is simply not enough memory to use that.
Is there a better algorithm that I can use? Is it possible to solve my problem using a lookup-table and generating a key?
Asked By : Fredrik Pihl
Answered By : Raphael
In my other answer I suggest that conditional jumps might be the main impediment to efficiency. As a consequence, sorting networks come to mind: they are data agnostic, that is the same sequence of comparisons is executed no matter the input, with only the swaps being conditional.
Of course, sorting may be too much work; we only need the biggest two numbers. Lucky for us, selection networks have also been studied. Knuth tells us that finding the two smallest numbers out of five² can be done with $\hat{U}_2(5) = 6$ comparisons [1, 5.3.4 ex 19] (and at most as many swaps).
The network he gives in the solutions (rewritten to zero-based arrays) is
$\qquad\displaystyle [0:4]\,[1:4]\,[0:3]\,[1:3]\,[0:2]\,[1:2]$
which implements -- after adjusting the direction of the comparisons -- in pseudocode as
def selMax2(a : int[]) a.swap(0,4) if a[0] < a[4] a.swap(1,4) if a[1] < a[4] a.swap(0,3) if a[0] < a[3] a.swap(1,3) if a[1] < a[3] a.swap(0,2) if a[0] < a[2] a.swap(1,2) if a[1] < a[2] return (a[0], a[1]) end
Now, naive implementations still have conditional jumps (across the swap code). Depending on your machine you can cirumvent them with conditional instructions, though. x86 seems to be its usual mudpit self; ARM looks more promising since apparently most operations are conditional in themselves. If I understand the instructions correctly, the first swap translates to this, assuming our array values have been loaded to registers R0
through R4
:
CMP R0,R4 MOVLT R5 = R0 MOVLT R0 = R4 MOVLT R4 = R6
Yes, yes, of course you can use XOR swapping with EOR.
I just hope your processor has this, or something similar. Of course, if you build the thing for this purpose, maybe you can get the network hard-wired on there?
This is probably (provably?) the best you can do in the classical realm, i.e. without making use of the limited domain and performing wicked intra-word magicks.
- Sorting and Searching by Donald E. Knuth; The Art of Computer Programming Vol. 3 (2nd ed, 1998)
- Note that this leaves the two selected elements unordered. Ordering them requires an extra comparison, that is $\hat{W}_2(5) = 7$ many in total [1, p234 Table 1].
Best Answer from StackOverflow
Question Source : http://cs.stackexchange.com/questions/39777
0 comments:
Post a Comment
Let us know your responses and feedback