World's most popular travel blog for travel bloggers.

[Solved]: Efficient compression of unlabeled trees

, , No Comments
Problem Detail: 

Consider unlabeled, rooted binary trees. We can compress such trees: whenever there are pointers to subtrees $T$ and $T'$ with $T = T'$ (interpreting $=$ as structural equality), we store (w.l.o.g.) $T$ and replace all pointers to $T'$ with pointers to $T$. See uli's answer for an example.

Give an algorithm that takes a tree in the above sense as input and computes the (minimal) number of nodes that remain after compression. The algorithm should run in time $\cal{O}(n\log n)$ (in the uniform cost model) with $n$ the number of nodes in the input.

This has been an exam question and I have not been able to come up with a nice solution, nor have I seen one.

Asked By : Raphael

Answered By : Alex ten Brink

Yes, you can perform this compression in $O(n \log n)$ time, but it is not easy :) We first make some observations and then present the algorithm. We assume the tree is initially not compressed - this is not really needed but makes analysis easier.

Firstly, we characterize 'structural equality' inductively. Let $T$ and $T'$ be two (sub)trees. If $T$ and $T'$ are both the null trees (having no vertices at all), they are structurally equivalent. If $T$ and $T'$ are both not null trees, then they are structurally equivalent iff their left children are structurally equivalent and their right children are structurally equivalent. 'Structural equivalence' is the minimal fixed point over these definitions.

For example, any two leaf nodes are structurally equivalent, as they both have the null trees as both their children, which are structurally equivalent.

As it is rather annoying to say 'their left children are structurally equivalent and so are their right children', we will often say 'their children are structurally equivalent' and intend the same. Also note we sometimes say 'this vertex' when we mean 'the subtree rooted at this vertex'.

The above definition immediately gives us a hint how to perform the compression: if we know the structural equivalence of all subtrees with depth at most $d$, then we can easily compute the structural equivalence of the subtrees with depth $d+1$. We do have to do this computation in a smart way to avoid a $O(n^2)$ running time.

The algorithm will assign identifiers to every vertex during its execution. An identifier is a number in the set $\{ 1, 2, 3, \dots, n \}$. Identifiers are unique and never change: we therefore assume we set some (global) variable to 1 at the start of the algorithm, and every time we assign an identifier to some vertex, we assign the current value of that variable to the vertex and increment the value of that variable.

We first transform the input tree into (at most $n$) lists containing vertices of equal depth, together with a pointer to their parent. This is easily done in $O(n)$ time.

We first compress all the leaves (we can find these leaves in the list with vertices of depth 0) into a single vertex. We assign this vertex an identifier. Compression of two vertices is done by redirecting the parent of either vertex to point to the other vertex instead.

We make two observations: firstly, any vertex has children of strictly smaller depth, and secondly, if we have performed compression on all vertices of depth smaller than $d$ (and have given them identifiers), then two vertices of depth $d$ are structurally equivalent and can be compressed iff the identifiers of their children coincide. This last observation follows from the following argument: two vertices are structurally equivalent iff their children are structurally equivalent, and after compression this means their pointers are pointing to the same children, which in turn means the identifiers of their children are equal.

We iterate through all the lists with nodes of equal depth from small depth to large depth. For every level we create a list of integer pairs, where every pair corresponds to the identifiers of the children of some vertex on that level. We have that two vertices in that level are structurally equivalent iff their corresponding integer pairs are equal. Using lexicographic ordering, we can sort these and obtain the sets of integer pairs that are equal. We compress these sets to single vertices as above and give them identifiers.

The above observations prove that this approach works and results in the compressed tree. The total running time is $O(n)$ plus the time needed to sort the lists we create. As the total number of integer pairs we create is $n$, this gives us that the total running time is $O(n \log n)$, as required. Counting how many nodes we have left at the end of the procedure is trivial (just look at how many identifiers we have handed out).

Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/168

0 comments:

Post a Comment

Let us know your responses and feedback