World's most popular travel blog for travel bloggers.

# Sum of size of distinct set of descendants $d$ distance from a node $u$, over all $u$ and $d$ is $\mathcal{O}(n\sqrt{n})$

, ,
Problem Detail:

Let's consider a rooted tree $T$ of $n$ nodes. For any node $u$ of the tree, define $L(u,d)$ to be the list of descendants of $u$ that are distance $d$ away from $u$. Let $|L(u,d)|$ denote the number of nodes that are present in the list $L(u,d)$.

Prove that the sum of $|L(u,d)|$ over all distinct lists $L(u,d)$ is bounded by $\mathcal{O}(n\sqrt{n})$.

# My work

Consider all $L(u,d)$ such that the left most node on the level $Level(u) + d$ is some node $v$. The pairs $u, d$ for all such $L(u,d)$ must be distinct and the sum of all $d_i$ will correspond to the number of nodes $x$ in the tree with $Level(x) \le Level(u) + d$.

This is because if some sequence of nodes $v_1, v_2, \dots v_k$ corresponds to the descendants of some node $u$ at a distance $d$ and the sequence of nodes $v_1, v_2, \dots v_{k'}$ where $k' > k$ corresponds to the descendants of some node $u'$ at a distance $d+1$, then there must also exist a node $u''$ such that $L(u'', d) = v_{k+1}, v_{k+2}, \dots v_{k'}$. This would also mean that $u''$ is not in the subtree of $u$ and thus there are at least $d$ distinct nodes in the subtree of $u''$ upto a distance $d$ from $u''$.

If the distinct distances are $d_1, d_2, \dots d_k$ then, $n \ge \sum_{i}d_i \ge \sum_{i=1}^{k}i \ge \frac{k(k+1)}{2}$. =

$\implies k \le \sqrt{n}$

After this I tried to show that there can be only $\mathcal{O}(\sqrt{n})$ distinct lists $L(u,d)$ so that I can then trivially obtain the upper-boud of $n\sqrt{n}$ but I could not make any more useful observations.

This link claims that such an upper bound does exist but has not provided the proof.

Any ideas how we might proceed to prove this?

###### Answered By : Yuval Filmus

Let us start by giving an alternative formula for $\sum_{u,d} |L(u,d)|$. For a node $x$, let $\pi^{(d)}(x)$ be its $d$'th parent, and let $h(x)$ be its height. Then $$\sum_{u,d} |L(u,d)| = \sum_x |\{ L(\pi^{(d)}(x),d) : 0 \leq d \leq h(x)\}|.$$ It suffices to show that $|\{ L(\pi^{(d)}(x),d) : 0 \leq d \leq h(x)\}| = O(\sqrt{n})$ for all nodes $x$.

Let $y = \pi^{(d-1)}(x)$ and $\pi(y)$ be $y$'s parent. If $L(\pi(y),d) \neq L(y,d-1)$ then $y$ must have a sibling $z$ with a leaf at depth $d$. We can associate the $d$ vertices on the path leading to the leaf (including $z$) to $\pi(y)$; note that vertices associated to different parents of $x$ are distinct.

For each element in $\{L(\pi^{(d)}(x),d) : 0 \leq d \leq h(x)\}$ choose a representative $\pi^{(d)}(x)$ with the maximal $d$, and let $I$ be the set of all such $d$'s. In particular, if $\pi^{(d)}(x)$ is a representative then $L(\pi^{(d)}(x),d) \neq L(\pi^{(d-1)}(x),d)$, and so $\pi^{(d)}(x)$ has $d$ vertices associated to it. We deduce that $$\sum_{d \in I} d \leq n.$$ In particular, $I$ can contain at most $\sqrt{n}$ numbers larger than $\sqrt{n}$, and so $|I| = O(\sqrt{n})$.