
# [Solved]: Can expected "depth" of an element and expected "height" differ significantly?

Problem Detail:

When analysing treaps (or, equivalently, BSTs or Quicksort), it is not too hard to show that

$\qquad\displaystyle \mathbb{E}[d(k)] \in O(\log n)$

where $d(k)$ is the depth of the element with rank $k$ in the set of $n$ keys. Intuitively, this seems to imply that also

$\qquad\displaystyle \mathbb{E}[h(T)] \in O(\log n)$

where $h(T)$ is the height of treap $T$, since

$\qquad\displaystyle h(T) = \max_{k \in [1..n]} d(k)$.

Formally, however, there does not seem to be an (immediate) relationship. We even have

$\qquad\displaystyle \mathbb{E}[h(T)] \geq \max_{k \in [1..n]} \mathbb{E}[d(k)]$

by Jensen's inequality. Now, one can show expected logarithmic height via tail bounds, using more insight into the distribution of $d(k)$.

It is easy to construct examples of distributions that defeat the above intuition, namely extremely asymmetric, heavy-tailed distributions. The question is: can such distributions actually occur in the analysis of algorithms and data structures?

Are there examples of data structures $D$ (or algorithms) for which

$\qquad\displaystyle \mathbb{E}[h(D)] \in \omega(\max_{e \in D} \mathbb{E}[d(e)])$?

Nota bene:

• Of course, we have to interpret "depth" and "height" liberally if we consider structures that are not trees. Based on the posts Wandering Logic links to, "Expected average search time" (for $1/n \cdot \sum_{e \in D} \mathbb{E}[d(e)]$) and "expected maximum search time" (for $\mathbb{E}[h(D)]$) seem to be used.

• A related question on math.SE has yielded an interesting answer that may allow deriving useful bounds on $\mathbb{E}[h(D)]$ given suitable bounds on $\mathbb{E}[d(e)]$ and $\mathbb{V}[d(e)]$.

#### Answered By : Wandering Logic

In chained hash tables with uniform hashing a similar question arises. Given a table with $n$ elements hashed into $m$ buckets, the expected number of elements per bucket is $n/m$, but what is the expected length of the longest chain?

I found a Stack Overflow question about chained hash tables. The answer by btilly argues that for a fixed load factor $n/m$, the expected length of the longest chain is $\Theta(\log n/ \log\log n)$. (Since $n/m$ is fixed, $m$ does not appear in the bound.)

And the answer by templatetypedef references the following paper for a more detailed analysis:
Gaston H. Gonnet: Expected Length of the Longest Probe Sequence in Hash Code Searching.
Journal of the ACM, 28(2):289-304, 1981.
[Free techreport version at University of Waterloo]

Slightly more generally, I found a math.stackexchange question about the balls-and-bins problem. The answer by Yuval Filmus cites the following paper, which gives even tighter bounds: Martin Raab, Angelika Steger: Balls into Bins - A Simple and Tight Analysis. 2nd International Workshop on Randomization and Approximation Techniques in Computer Science, pp. 159-170, 1998.
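The balls-into-bins gap is easy to reproduce empirically. The sketch below (bin counts and trial numbers are arbitrary choices of mine) throws $n$ balls uniformly into $n$ bins: the expected load of any single bin is exactly $1$, while the empirical maximum load tracks $\ln n / \ln\ln n$:

```python
import math
import random
from collections import Counter

def max_load(n_balls, n_bins):
    """Throw n_balls uniformly into n_bins; return the fullest bin's load."""
    counts = Counter(random.randrange(n_bins) for _ in range(n_balls))
    return max(counts.values())

def mean_max_load(n, trials=200):
    """Monte Carlo estimate of the expected maximum load with n balls, n bins."""
    return sum(max_load(n, n) for _ in range(trials)) / trials

for n in (100, 1000, 10000):
    prediction = math.log(n) / math.log(math.log(n))
    print(n, mean_max_load(n), round(prediction, 2))
```

The estimates grow with $n$ even though the per-bin expectation stays constant, which is exactly the kind of $\omega$-separation the question asks about.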

Finally, there is apparently a branch of statistics called extreme value theory, culminating in the Fisher-Tippett-Gnedenko theorem, which, I think, gives expected maxima for a huge variety of distributions.
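As a toy illustration of how far the expectation of a maximum can drift from the maximum of expectations: for $n$ i.i.d. Exp(1) variables, each has mean $1$, yet the expected maximum is the harmonic number $H_n = \Theta(\log n)$. This is a standalone probability sketch, not a claim about any particular data structure; the sample sizes are my own choices:

```python
import random

def mean_of_max(n, trials=3000):
    """Monte Carlo estimate of E[max of n i.i.d. Exp(1) draws]."""
    total = 0.0
    for _ in range(trials):
        total += max(random.expovariate(1.0) for _ in range(n))
    return total / trials

n = 1000
# Exact value: E[max] = H_n = 1 + 1/2 + ... + 1/n, by the memorylessness
# of the exponential (the gaps between order statistics are Exp(k)).
harmonic = sum(1 / k for k in range(1, n + 1))
print(mean_of_max(n), harmonic)
```

Here $\max_e \mathbb{E}[d(e)] = 1 \in O(1)$ while $\mathbb{E}[\max_e d(e)] = H_n \in \Theta(\log n)$, so a data structure whose depths behaved like independent exponentials would answer the question affirmatively.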