Interesting Cases, Support Vectors, and Ape Art

What makes an observation interesting? Through the example of devious quizzes
that ask you to distinguish ape art from modern art, we will investigate the fundamental
idea of support vector machines: an SVM is a classifier specified in terms of weights
assigned to interesting observations. This is different from most regression models in
statistics, which are specified in terms of weights assigned to variables or interactions.

Building predictive models from data is a frequent pursuit of
statisticians and even more frequent for machine learners and data
miners. The main property of a predictive model is that we do not
care much about what the model is like: we primarily care about the
ability to predict the desired property of a case (instance). On the
other hand, for most statistical applications of regression, the
actual structure of the model is of primary interest.

Predictive models are not just an object of rigorous analysis. We
have predictive models in our heads. For example, we may believe
that we can distinguish good art from bad art. Mikhail Simkin has
been entertaining the public with devious quizzes. A recent
example is "An artist or an ape?", where one has to classify a picture based on
whether it was painted by an abstractionist or by an ape. Another
quiz is "Sokal & Bricmont or Lenin?", where you have to decide if a quote is from
Fashionable Nonsense or from Lenin. There are also tests that check
your ability to discern famous painters, authors and musicians from
less famous ones. The primary message of the quizzes is that the
boundaries between categories are often vague. If you are interested
in how well test takers perform, Mikhail did an analysis of the "True art
or fake?" quiz.

These quizzes bring us to another notion, which has rocked the
machine learning community over the past decade: the notion of the support vector
machine (http://www.kernel-machines.org/tutorial.html). The most visible
originator of the methodology is Vladimir Vapnik
(http://yann.lecun.com/ex/fun/index.html#allyourbayes). There are also close
links to the methodology of Gaussian processes (http://www.gaussianprocess.org/), and
the work of Grace Wahba.

An SVM is nothing but a hyperplane in some space defined by the
features. The hyperplane separates the cases of one class (ape
pictures) from the cases of another class (painter pictures). Since
there can be many hyperplanes that separate one from the other,
the optimal one is thought to be equidistant from the best ape
picture and the worst painter picture. Using the 'kernel trick'
(http://en.wikipedia.org/wiki/Kernel_trick)
we can conjure another space where individual dimensions may
correspond to interactions of features, polynomial terms, or even
individual instances.
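
To make the kernel trick a little more concrete, here is a minimal sketch in Python. It assumes 2-D inputs and a homogeneous degree-2 polynomial kernel; the names `poly2_kernel` and `phi` are made up for this illustration. The point is only that the kernel value, computed in the original space, matches an inner product in a space whose dimensions are polynomial terms and interactions of the features.

```python
import math


def dot(u, v):
    """Ordinary inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel: k(x, z) = (x . z)^2."""
    return dot(x, z) ** 2


def phi(x):
    """Explicit feature map for 2-D inputs (illustrative assumption).

    The three dimensions are the squares of the features and their
    interaction, scaled so that phi(x) . phi(z) equals poly2_kernel(x, z).
    """
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


x = (1.0, 2.0)
z = (3.0, -1.0)

implicit = poly2_kernel(x, z)   # computed in the original 2-D space
explicit = dot(phi(x), phi(z))  # computed in the expanded 3-D space

print(math.isclose(implicit, explicit))
```

The classifier never needs `phi` itself: every computation can go through the kernel, which is what makes very large feature spaces affordable.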

[Figure: svm.png (http://andrewgelman.com/movabletype/archives/svm.png/svm.png)]

In the above image, we can see the separating green hyperplane
halfway between the blue and red points. Some of the points are
marked with yellow dots: those points are sufficient to define the
position of the hyperplane. Also, they are the ones that constrain
the position of the hyperplane. And this is the key idea of support
vector machines: the model is not parameterized in terms of the
weights assigned to features but in terms of weights
associated with each case.
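
As a rough illustration of a model expressed through weights on cases, here is a small Python sketch. Note that it is a kernel perceptron, not an SVM solver: it is far simpler to write down, and it shares only the form of the prediction, a weighted sum of kernel values between the new case and the stored cases. The toy coordinates, the labels (-1 for ape, +1 for artist) and all function names are assumptions made for the example.

```python
def linear_kernel(x, z):
    """Linear kernel with a constant term, so the hyperplane can have an offset."""
    return sum(a * b for a, b in zip(x, z)) + 1.0


def decision(x, cases, labels, alphas, kernel):
    """Score a new case as a weighted sum over the stored cases."""
    return sum(a * y * kernel(c, x) for a, y, c in zip(alphas, labels, cases))


def train_kernel_perceptron(cases, labels, kernel, max_epochs=100):
    """Kernel perceptron: raise the weight of every case that is misclassified.

    This is not the SVM optimisation; it only yields a model in the same
    case-weighted form.
    """
    alphas = [0.0] * len(cases)
    for _ in range(max_epochs):
        mistakes = 0
        for i, (x, y) in enumerate(zip(cases, labels)):
            if y * decision(x, cases, labels, alphas, kernel) <= 0:
                alphas[i] += 1.0
                mistakes += 1
        if mistakes == 0:
            break
    return alphas


# Hypothetical 2-D features for four ape pictures and four artist pictures.
cases = [(0, 0), (1, 1), (2, 1), (1, 2), (4, 4), (5, 4), (4, 5), (6, 6)]
labels = [-1, -1, -1, -1, 1, 1, 1, 1]

alphas = train_kernel_perceptron(cases, labels, linear_kernel)
for case, label, alpha in zip(cases, labels, alphas):
    print(case, label, alpha)
```

Cases that were never misclassified during training keep a weight of zero and play no role in later predictions. An SVM arrives at its case weights differently, by maximising the margin, but the fitted model has the same shape.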

The heavily-weighted cases, the support vectors, are also
interesting to look at, because of pure human
curiosity. An objective of experimental design would be to do
experiments that would result in new support vectors; otherwise the
experiments would not be interesting. This flavor of experimental
design is referred to as 'active learning' in the machine learning
community. The support vectors are the cases that seem the trickiest
to predict. My guess is that Mikhail intentionally selects such
cases for his quizzes to make them fun.
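
One common active learning heuristic, sometimes called uncertainty sampling, is to ask for the label of the case that lies closest to the current hyperplane, since that case is the most likely to become a new support vector. The Python sketch below assumes a hyperplane given by a weight vector `w` and offset `b`; those values and the pool of unlabeled cases are invented for the illustration, and other selection rules exist.

```python
import math


def distance_to_hyperplane(x, w, b):
    """Euclidean distance from x to the hyperplane w . x + b = 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(score) / norm


def most_informative(pool, w, b):
    """Pick the unlabeled case nearest the boundary (uncertainty sampling)."""
    return min(pool, key=lambda x: distance_to_hyperplane(x, w, b))


# Hypothetical current hyperplane and pool of unlabeled pictures.
w = (1.0, 2.0)
b = -7.0
pool = [(0.0, 0.0), (6.0, 6.0), (3.0, 2.5), (1.0, 1.0)]

print(most_informative(pool, w, b))
```

A quiz author who wants tricky questions could use the same rule in reverse: show people the pictures on which a trained classifier is least sure.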

6 Comments

  1. rif says:

    The property of learning a function parametrized by weights on examples is a general property of optimizing functionals which are the sum of an empirical loss functional and the norm in a Reproducing Kernel Hilbert Space. The SVM is one example, but there are many others. The business about the examples closest to the boundary, and actually the whole geometric conception of SVMs in terms of distance to the hyperplane ("margin"), is a bit of a red herring. In practice, SVMs are nearly always used in a context where "errors" are allowed but penalized, and in this case, all points which are errors are also support vectors. In this framework, the nice geometric notion of the support vectors being objects closest to the boundary is lost: there's still a boundary, but there can be errors which are arbitrarily close to it, and the margin is not well-defined. I prefer to think of the SVM as arising from a particular choice of loss function (the hinge loss) in a functional optimization problem.

    Loved the ape quiz.

  2. Aleks says:

    The complexity of the "statistical learning" approach is staggering and comparable to the scope of the statistical school. But a statistician would use the likelihood function instead of an empirical loss functional, the prior instead of a regularizer, probability of having generated the data instead of hinge loss. These tools are analogous, and differences may be irrelevant. RKHS is very nice, but it takes one or two lectures to explain it properly.

    However, expressing the model in terms of weights assigned to cases is something that one doesn't see too often in statistics, and would be interesting to see more often.

  3. rif says:

    I agree completely with you. Have you considered presenting something like Gaussian process regression? It's fairly simple, it's the same equations as regularized least squares "under the covers", but it has a nice Bayesian interpretation, it gives you a confidence interval on your outputs, AND the function you learn is expressed in terms of weights assigned to the observations, just like an SVM.

  4. pnprice says:

    I'm going to ignore the serious statistical issues here and just focus on the fun stuff. I'm pleased to say that I got 100% correct on the Artist or Ape quiz, and got 83% correct on the True Art or Fake quiz. I'm ashamed to admit that I got exactly 50% on the Sokal and Bricmont or Lenin quiz.

    But as far as telling us anything, two of these quizzes really don't.

    (1) Artist or Ape is uninformative because I'm pretty sure the author of the quiz chose the Artist pictures that look _most_ like an ape might have drawn them, and chose the Ape pictures to look _most_ like an artist might have drawn them. This doesn't really tell us anything about whether modern artists draw like apes draw.

    (2) True Art or Fake is uninformative because the "Fake" art was generated by someone who had seen the True Art and was deliberately trying to compose something that looked similar. The "skill" in drawing a Mondrian isn't in drawing the lines and coloring in the squares, it's conceiving of the idea of making a pattern like that out of lines and squares. I happen to not like Mondrian, but I give the guy some credit: until he came along, nobody was making (or at least selling or showing) art like that. Sure, NOW you can copy him and make something that looks pretty similar, but you're still copying him. If I stand exactly where Ansel Adams stood to take his famous Half Dome picture, and expose the film at the same time of day and develop and print it the same way, I'll have something that is nearly indistinguishable from an Ansel Adams picture, but I will not have demonstrated that Adams was as talentless as I am.

    Sokal and Bricmont or Lenin seems more fair, though. It suffers somewhat from the same shortcoming as (1) above — these are the S-B quotes that are most like Lenin quotes and vice versa — but given that S-B is just one book about one subject, the fact that there are _any_ non-trivial S-B quotes that sound just like non-trivial Lenin quotes is already noteworthy.

    If I were to try to put all of these thoughts into a scholarly statistical context, I would say that there is a strong selection bias in the quizzes. Using these results to say that artists paint like apes, or that modern artists are no more talented than non-artists, is like comparing the temperature of the warmest winter days and the coolest summer days and saying that winter temperatures are about the same as summer.

    All of that said, the quizzes are a ton o' fun and I'll be looking for more.

  5. The bias asserted by rif in the previous comment does not exist.

    Sokal & Bricmont's views on philosophy of science are identical to those of Lenin.

    Not all abstract paintings look like a work of an ape: many contain geometric figures, which apes are unable to draw. However, the paintings within the branch of abstract art, called "Abstract Expressionism", all look like they were produced by an ape.

    Rif's comments about imitating Mondrian are irrelevant as I did not imitate him (or any other artist). Instead I presented the paintings created using The new method in Abstract Art, invented by myself.

    Regarding warmest winter and coolest summer days: in some places indeed there is no difference between the seasons of the year. In San Francisco you don't swim in summer and don't ski in winter.
    One can say that there is no winter or summer in San Francisco (just like in the San Francisco Museum of Modern Art there is no art).

  6. Abstract says:

    Pretty interesting stuff, the use of dots inspired me to make a painting