# Definition and properties of support

Problem Detail:

Support measures the fraction of transactions that contain a particular subset of items. The notions of support and correlation may not necessarily agree with each other. This is because item pairs with high support may be poorly correlated while those that are highly correlated may have very low support. For instance, suppose we have an item pair {A, B}, where supp(A) = supp(B) = 0.8 and supp(A, B) = 0.64. Both items are uncorrelated because supp(A, B) = supp(A)supp(B). In contrast, an item pair {A, B} with supp(A) = supp(B) = supp(A, B) = 0.001 is perfectly correlated despite its low support.

I don't quite follow the example about support. Could anyone give a proof of $supp(A)supp(B) = supp(A, B) \iff A \text{ and } B \text{ are uncorrelated}$ ? I don't even quite understand what $supp(A, B)$ means. Is there a book or paper on support vs correlation?

First, the definition is clear: support is exactly "the fraction of transactions that contain a particular subset of items." This is a data-mining term, not a statistics term.

$supp(A)$ is the fraction of transactions that contain item $A$. $supp(B)$ is the fraction of transactions that contain item $B$. $supp(A,B)$ is the fraction of transactions that contain the subset $\{A,B\}$.
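To make the three quantities concrete, here is a minimal sketch that counts them on a toy data set. The five transactions and the helper name `support` are made up for illustration; they do not come from the book.

```python
from fractions import Fraction

def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    # A transaction counts only if it contains the whole subset.
    hits = sum(1 for t in transactions if items <= set(t))
    return Fraction(hits, len(transactions))

# Hypothetical toy data: each set is one transaction.
transactions = [
    {"A", "B"},
    {"A"},
    {"B", "C"},
    {"A", "B", "C"},
    {"C"},
]

supp_a = support(transactions, {"A"})        # A appears in 3 of 5 transactions
supp_b = support(transactions, {"B"})        # B appears in 3 of 5 transactions
supp_ab = support(transactions, {"A", "B"})  # A and B together in 2 of 5
```

Note that $supp(A,B)$ counts transactions containing both items, so it can never exceed $supp(A)$ or $supp(B)$.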

Now, your question really seems to be about how the authors expound on this, which is admittedly a little unclear. Read as a statistic, support answers the question, "What is the probability that a randomly chosen transaction contains this subset of items?" That is, you can treat $supp(A)=\mathbf{P}(A)$, where $A$ now also stands for the event that a random transaction contains item $A$, and likewise $supp(A,B)=\mathbf{P}(A\cap B)$.

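Then you can see that $$supp(A)supp(B) = supp(A, B) \iff A \text{ and } B \text{ are uncorrelated}$$ comes from the definition of statistical independence, $$\mathbf{P}(A\cap B) = \mathbf{P}(A)\mathbf{P}(B) \iff A \perp B.$$ Strictly speaking, independence and zero correlation are different notions in general, but for two 0/1 indicator variables they coincide: the covariance of the indicators of $A$ and $B$ is $$\mathbf{P}(A\cap B) - \mathbf{P}(A)\mathbf{P}(B),$$ which is zero exactly when the events are independent. In the second example, $supp(A) = supp(B) = supp(A, B)$ means that every transaction containing one item also contains the other, so the two indicators are identical and their correlation is 1, however rare the items are.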
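To check the two examples from the quoted passage numerically, here is a small sketch that computes the phi coefficient (the Pearson correlation of the two 0/1 indicators) from the three support values. The function name `phi_from_support` is made up for this illustration, and exact fractions are used as inputs so that the product $supp(A)supp(B)$ is not affected by floating-point rounding.

```python
from fractions import Fraction
import math

def phi_from_support(supp_a, supp_b, supp_ab):
    """Pearson correlation of the indicators of A and B, from supports."""
    # Covariance of two 0/1 indicators: P(A and B) - P(A) * P(B).
    cov = supp_ab - supp_a * supp_b
    # Variance of a 0/1 indicator with mean p is p * (1 - p).
    var_a = supp_a * (1 - supp_a)
    var_b = supp_b * (1 - supp_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined if an item is in all or no transactions")
    return float(cov) / math.sqrt(float(var_a * var_b))

# High support, no correlation: supp(A) = supp(B) = 0.8, supp(A, B) = 0.64.
phi_high_support = phi_from_support(Fraction(8, 10), Fraction(8, 10), Fraction(64, 100))

# Low support, perfect correlation: all three supports equal 0.001.
phi_low_support = phi_from_support(Fraction(1, 1000), Fraction(1, 1000), Fraction(1, 1000))
```

The first value comes out as zero and the second as (up to rounding) one, which matches the passage: support tells you how common an itemset is, while correlation tells you how strongly the items co-occur relative to chance.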