World's most popular travel blog for travel bloggers.

# Shifting deletion algorithm for linear-probing hash table

, ,
Problem Detail:

I'm looking for the algorithm for deleting an element from a linear-probing hash table that does a delete-and-shift instead of just using tombstone elements.

The basic idea is quite simple - but I keep getting caught up in corner cases (notably, w.r.t. "wraparound", where an element hashes to a high slot but all the end slots are full so the index it is actually inserted at is low.)

Basic idea:

When you delete an element, keep track of the slot and start scanning forwards. If you hit an element that's null (read: not present), you're done. Otherwise, if you hit an element that hashes to a slot <= the slot kept track of, move that element to the kept-track-of slot, and keep track of the current slot instead.

Or in pseudocode:

``func del(item i):     slot s = get_slot_of(i)     table[s] = null     fixup(s) func fixup(slot s):     assert table[s] == null     slot t = s     while table[++t] != null:         if get_slot_of(table[t]) <= s:             table[s] = table[t]             table[t] = null             s = t ``

In simple cases, this works. But as soon as you include the modulo behavior, it falls flat on its face.

So how would I go about fixing this up such that it doesn't fall flat on its face in such cases? I looked around, but couldn't find this algorithm implemented anywhere - it seems like most places that use linear probing just use tombstone elements instead.

There are two main approaches to cover the "wraparound" case:

• "Triple condition", e. g. see `java.util.IdentityHashMap` source, or with more clarifying comments here
• Second approach is based on counting the distance to the empty slot and comparing it with the distance between the slot where the "candidate for shift" is observed and where it should have been inserted if the hash table is empty, which could be obtained (covering the wraparound case) by substracting the index of the slot where the key should have been isnerted from the index of the current slot in the table, modulo table size (binary `& (tableSize - 1)` if the table size is a power of two). Example imlementation of this approach could be found here.