So I'm reading "Introduction to Machine Learning" 2nd edition, by Bishop, et. all. On page 27 they discuss the Vapnik-Chervonenkis Dimension which is,
"The maximum number of points that can be shattered by H [the hypothesis class] is called the Vapnik-Chervonenkis (VC) Dimension of H, is denoted VC(H) and measures the capacity of H."
Whereas "shatters" indicates a hypothesis $h \in H$ for a set of N data points such that it separates the positive examples from the negative. In such an example it is said that "H shatters N points".
So far I think I understand this. However, the authors lose me with the following:
"For example, four points on a line cannot be shattered by rectangles."
There must be some concept here I'm not fully understanding, because I cannot understand why this is the case. Can anyone explain this to me?
Asked By : BrotherJack
Answered By : JeffE
The definition of "a set $P$ can be shattered by rectangles" is that for every subset of $P$, there is a rectangle that contains precisely that subset and excludes the rest of $P$. Equivalently, every labeling of the points as positive and negative is consistent with at least one hypothesis in $H$.
Now consider four points $p,q,r,s$ along a line in the plane. Since there is no rectangle that contains $p$ and $r$ but excludes $q$ and $s$, these four points cannot be shattered by rectangles.
Best Answer from StackOverflow
Question Source : http://cs.stackexchange.com/questions/1917
0 comments:
Post a Comment
Let us know your responses and feedback