
I don’t know, I am on the fence about whether Frank Plumpton Ramsey’s sister did justice to Frank’s life. I can say, being related to the family, that I was shocked that Frank was engaged in an open marriage. I guess as a sister, it’s hard to get into the steamy stuff. But our family, like probably nearly all families, has its share of iconoclasts and eccentrics.

Like I still smile when a hunk walks past me on the patio. I couldn’t resist that one. Seriously, my patio is inundated with hunks. LOLLLLLLL

Someone reading this will get the joke. I am a huge tease and joker.

https://www.ices.on.ca/Publications/Journal-Articles/1998/January/A-comparison-of-statistical-learning-methods-on-the-GUSTO-database

(I was a bystander in that work and predicted the result.)

I came across approximation theory as a side effect of an interest in smart Monte Carlo and numerical integration, and I remember being blown away that it was never part of any of my statistics courses. I think it should be required for a stats degree.

1) I don’t think most NNs in practice use quadratic activation functions, so S-W is definitely important. The network itself will be of much higher degree (doubling on every layer), so the polynomial basis they’re using for regression is of a very high degree (see the tables).
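The degree-doubling claim is easy to check numerically. Here is a minimal sketch (my own illustration, not from the paper) that composes a quadratic activation with itself, one composition per layer, and tracks the resulting polynomial degree with numpy:

```python
import numpy as np
from numpy.polynomial import Polynomial

# Hypothetical quadratic "activation": a(x) = x + x^2.
act = Polynomial([0.0, 1.0, 1.0])

net = act  # one-layer network in a single input
for depth in range(1, 5):
    print(f"depth {depth}: polynomial degree {net.degree()}")
    net = act(net)  # feed the output through another quadratic layer
# prints degrees 2, 4, 8, 16: the degree doubles at every layer
```

So a depth-L network with quadratic activations is a degree-2^L polynomial in its input, which is the sense in which the paper's quadratic fits and deep networks end up in comparable (very high degree) function classes.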

2) Chebfun work typically assumes functions are (piecewise) analytic. In that case I’m skeptical of people pulling out S-W, because it’s the wrong tool. My PhD was pretty heavy on approximation theory, so I am aware of the literature. My reading of the paper was that it was being applied only to the activation function, not to the response surface, so smoothness may be sensible.

This is not quite how I reacted to the paper. First, the authors are mostly using _quadratic_ polynomials and arguing that using NNs is like using high-degree polynomials. See the last slide of

http://heather.cs.ucdavis.edu/polygrail.pdf

I guess limiting the degree is a form of regularization, but I agree they should also be regularizing the coefficients for any given degree polynomial.
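To make that concrete, here is a small sketch (my own, not from the paper) contrasting the two knobs: the degree is fixed at a high value, and a ridge penalty on the coefficients supplies the second layer of regularization. The data and degree are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 60)
y = np.tanh(3.0 * x) + 0.1 * rng.standard_normal(x.size)

# Degree-14 polynomial basis: limiting this degree is one form of regularization.
X = np.vander(x, 15, increasing=True)

def ridge_fit(lam):
    # Penalized least squares: minimize ||y - X b||^2 + lam * ||b||^2.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

b_ols = ridge_fit(0.0)     # degree limited, coefficients unregularized
b_ridge = ridge_fit(1e-2)  # degree limited AND coefficients shrunk
print("coefficient norm, unregularized:", np.linalg.norm(b_ols))
print("coefficient norm, ridge:        ", np.linalg.norm(b_ridge))
```

Even at a fixed degree, the unpenalized fit inflates the coefficients to chase noise; the ridge fit keeps them small, which is the coefficient-level regularization being asked for above.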

Second, whenever someone whips out a criticism of whipping out the Stone-Weierstrass theorem (Andrew did also in an email to me and Bob), I whip out my go-to criticism of the criticism of the Stone-Weierstrass theorem:

http://www.chebfun.org/ATAP/atap-first6chapters.pdf (last paragraph of chapter 6)

“The trouble with this line of research is that for almost all the functions encountered in practice, Chebyshev interpolation works beautifully! Weierstrass’s theorem has encouraged mathematicians over the years to give too much of their attention to pathological functions at the edge of discontinuity, leading to the bizarre and unfortunate situation where many books on numerical analysis caution their readers that interpolation may fail without mentioning that for functions with a little bit of smoothness, it succeeds outstandingly.”

The end of https://www.boost.org/doc/libs/1_67_0/libs/math/doc/html/math_toolkit/sf_poly/chebyshev.html references this book when discussing its chebyshev_transform function, which takes a function on an interval (a, b) and outputs a near-minimax polynomial approximation to it. It worked well for me on an integral that could otherwise only be done with the integrate_1d() function in Stan that Ben Bales is close to merging. It would be interesting to see someone plug in their NN activation function and see how low the degree of its near-minimax approximation can be.
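For anyone without the C++ setup, a version of that experiment can be sketched in numpy (my own illustration; `chebyshev_transform` itself is the Boost function): interpolate a smooth activation such as tanh at Chebyshev points, which is near-minimax for smooth functions, and watch the max error fall geometrically with the degree, exactly as the Trefethen quote promises:

```python
import numpy as np
from numpy.polynomial.chebyshev import Chebyshev

f = np.tanh                        # a smooth (analytic) activation function
xs = np.linspace(-3.0, 3.0, 2001)  # grid for measuring the sup-norm error

errs = {}
for deg in (4, 8, 16, 32):
    # Interpolate f at Chebyshev points on [-3, 3].
    p = Chebyshev.interpolate(f, deg, domain=[-3.0, 3.0])
    errs[deg] = np.max(np.abs(p(xs) - f(xs)))
    print(f"degree {deg:2d}: max error {errs[deg]:.2e}")
```

The error shrinks by orders of magnitude each time the degree doubles, so a quite modest-degree polynomial already approximates tanh to well beyond the accuracy any trained network could exploit.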

Thanks. This blog can always use a bit more literary criticism.
