Suppose one wanted to build a tree at random. Let the first insertion at step $i = 1$ be the root node. From here, nodes are inserted into the tree at random one at a time. How would one go about calculating the expected number of nodes $E(d)$ at depth $d$ after $i$ insertions?
For example, at $i = 1$ it's just the root node, so $E(0) = 1$ and all other depths have zero nodes. At $i = 2$, $E(0) = 1$ and $E(1) = 1$, as the inserted node has to be at depth 1 below the root. At $i = 3$, the next node may attach either to the root node or to the existing node at depth 1, so the tree may look like either:
      *
     / \
    *   *
Or:
        *
       /
      *
     /
    *
Depending on what happened at $i = 3$, at $i = 4$ there's either a $2/3$ chance of the new node attaching at depth 2 and a $1/3$ chance of attaching at depth 1, or an even $1/3$ probability of the new node attaching to the root, to the node at depth 1, or to the node at depth 2 (landing at depth 1, 2, or 3 respectively). Keeping random insertion in mind (not a binary or balanced tree in any way), how would one go about calculating the expected number of nodes at depth $d$ after $i$ insertions?
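For intuition, the process described above (each new node picks its parent uniformly among all existing nodes, i.e. a random recursive tree) is straightforward to simulate. The sketch below, with function names of my own choosing, estimates $E(d)$ by averaging depth counts over many random trees.

```python
import random
from collections import Counter

def random_tree_depths(n, rng):
    """Build one random tree with n nodes: node 0 is the root; each
    later node attaches to a parent chosen uniformly among all
    existing nodes. Returns the depth of every node."""
    depths = [0]                      # the root sits at depth 0
    for i in range(1, n):
        parent = rng.randrange(i)     # uniform over the i existing nodes
        depths.append(depths[parent] + 1)
    return depths

def estimate_expected_counts(n, trials=2000, seed=0):
    """Estimate E(d), the expected number of nodes at each depth d
    after n insertions, by Monte Carlo averaging."""
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(trials):
        totals.update(random_tree_depths(n, rng))
    return {d: count / trials for d, count in sorted(totals.items())}
```

For $n = 3$ this reproduces $E(0) = 1$ exactly, and $E(1) \approx 1.5$, $E(2) \approx 0.5$, matching the two equally likely shapes sketched above.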
Asked By : Bryce Thomas
Answered By : Yuval Filmus
Let $X_{d,t}$ denote the expected number of nodes at depth $d$ at time $t$, and let $p_{d,t}$ be the probability that the node added at time $t$ lands at depth $d$. Our indexing is such that $p_{0,0} = X_{0,t} = 1$. Clearly $$ X_{d,t} = \sum_{s=0}^t p_{d,s}. $$ Suppose that at time $t-1$, the number of nodes at depth $d-1$ is $X$. Since the node inserted at time $t$ picks its parent uniformly among the $t$ existing nodes, the probability that it lands at depth $d$ is $X/t$. Taking expectations over the first $t-1$ steps, $\mathbb{E}[X] = X_{d-1,t-1}$, and so $$ p_{d,t} = \frac{X_{d-1,t-1}}{t}. $$ Therefore $$ X_{d,t} = \sum_{s=1}^t \frac{X_{d-1,s-1}}{s}. $$ What does this look like? Let's use generating functions: $$ P_d(z) = \sum_{t=0}^\infty X_{d,t} z^t = \sum_{t=0}^\infty \sum_{s=1}^t \frac{X_{d-1,s-1}}{s} z^t = \sum_{s=1}^\infty \frac{X_{d-1,s-1}}{s} \sum_{t=s}^\infty z^t = \sum_{s=1}^\infty \frac{X_{d-1,s-1}}{s} \frac{z^s}{1-z}. $$ Notice that $$ \int_0^z P_{d-1}(w)\, dw = \sum_{s=1}^\infty X_{d-1,s-1} \int_0^z w^{s-1}\, dw = \sum_{s=1}^\infty X_{d-1,s-1} \frac{z^s}{s}. $$ Therefore $$ P_d(z) = \frac{1}{1-z} \int_0^z P_{d-1}(w)\, dw, \quad P_0(z) = \frac{1}{1-z}. $$ We can calculate $$ \begin{align*} P_0(z) &= \frac{1}{1-z} \\ P_1(z) &= \frac{-\log (1-z)}{1-z} \\ P_2(z) &= \frac{1}{2} \frac{\log^2(1-z)}{1-z} \\ P_3(z) &= \frac{1}{6} \frac{-\log^3(1-z)}{1-z} \end{align*} $$ A pattern becomes apparent, and it is easy to prove by induction: $$ P_d(z) = \frac{1}{d!} \frac{\log^d (1-z)^{-1}}{1-z}. 
$$ We can pack all this data into a bivariate generating function: $$ P(z,w) = \sum_{d,t \geq 0} X_{d,t} z^t w^d = \frac{\exp \left( w\log (1-z)^{-1} \right)}{1-z} = (1-z)^{-w-1}.$$ In principle, we can compute $X_{d,t}$ asymptotically from the generating functions, but it is easier to do this directly: $$ \begin{align*} X_{0,t} &= 1 \\ X_{1,t} &= \sum_{s=1}^t \frac{1}{s} = \log t + O(1) \\ X_{2,t} &= \sum_{s=1}^t \frac{\log s + O(1)}{s} = \int_1^t \frac{\log s + O(1)}{s} ds + O\left(\frac{\log t}{t}\right) = \frac{\log^2 t}{2} + O(\log t) \\ X_{3,t} &= \sum_{s=1}^t \frac{\log^2 s + O(\log s)}{2s} = \int_1^t \frac{\log^2 s + O(\log s)}{2s} ds + O\left(\frac{\log^2 t}{t}\right) = \frac{\log^3 t}{6} + O(\log^2 t) \end{align*} $$ Generalizing, we get $$X_{d,t} = \frac{1}{d!} \log^d t + O(\log^{d-1} t).$$ With more work, it is possible to obtain lower-order terms as well; indeed, $X_{d,t}$ has an expansion as a polynomial in $\log t$ followed by a Laurent series.
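The recurrence $X_{0,t} = 1$, $X_{d,t} = \sum_{s=1}^t X_{d-1,s-1}/s$ is also easy to evaluate numerically by dynamic programming; the sketch below (function and variable names are my own) tabulates $X_{d,t}$ exactly.

```python
def expected_profile(d_max, t_max):
    """X[d][t] = expected number of nodes at depth d at time t,
    computed from X_{0,t} = 1 and the recurrence
    X_{d,t} = sum_{s=1}^{t} X_{d-1,s-1} / s."""
    X = [[1.0] * (t_max + 1)]             # depth 0: always exactly the root
    for d in range(1, d_max + 1):
        row = [0.0]                       # X_{d,0} = 0 for d >= 1
        for t in range(1, t_max + 1):
            row.append(row[t - 1] + X[d - 1][t - 1] / t)
        X.append(row)
    return X
```

As a sanity check, the depth counts at time $t$ sum to the total number of nodes $t+1$, and at $t = 2$ (three nodes) one gets $X_{1,2} = 3/2$ and $X_{2,2} = 1/2$, matching the hand computation in the question.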
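As a quick check on the leading term, for $d = 1$ the exact value is the harmonic number $X_{1,t} = H_t$, and the $O(1)$ term in $H_t = \log t + O(1)$ is in fact the Euler–Mascheroni constant $\gamma \approx 0.5772$:

```python
from math import log

t = 100_000
H = sum(1.0 / s for s in range(1, t + 1))   # X_{1,t} = H_t exactly
print(H - log(t))                           # ~ 0.5772, the constant gamma
```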
Question Source : http://cs.stackexchange.com/questions/22580