
I don’t know, I am on the fence about whether Frank Plumpton Ramsey’s sister did justice to Frank’s life. I can say, being related to the family, that I was shocked that Frank was engaged in an open marriage. I guess as a sister, it’s hard to get into the steamy stuff. But our family, like probably nearly all families, has its share of iconoclasts and eccentrics.

Like I still smile when a hunk walks past me on the patio. I couldn’t resist that one. Seriously, my patio is inundated with hunks. LOLLLLLLL

Someone reading this will get the joke. I am a huge tease and joker.

https://www.ices.on.ca/Publications/Journal-Articles/1998/January/A-comparison-of-statistical-learning-methods-on-the-GUSTO-database

(I was a bystander in that work and predicted the result.)

I came across approximation theory as a side effect of an interest in smart Monte Carlo and numerical integration, and I remember being blown away that it was never part of any of my statistics courses. I think it should be required for a stats degree.

1) I don’t think most NNs in practice use quadratic activation functions, so S-W is definitely important. The network itself will be of much higher degree (doubling on every layer), so the polynomial basis they’re using for regression is of a very high degree (see the tables).
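The degree-doubling claim is easy to check numerically. Here is a minimal sketch (my own illustration, not from the paper) that composes a quadratic activation with itself, one composition per layer, and tracks the resulting polynomial degree with numpy:

```python
import numpy as np
from numpy.polynomial import Polynomial

# Hypothetical quadratic "activation": a(x) = x + x^2.
act = Polynomial([0.0, 1.0, 1.0])

net = act  # one-layer network in a single input
for depth in range(1, 5):
    print(f"depth {depth}: polynomial degree {net.degree()}")
    net = act(net)  # feed the output through another quadratic layer
# prints degrees 2, 4, 8, 16: the degree doubles at every layer
```

So a depth-L network with quadratic activations is a degree-2^L polynomial in its input, which is the sense in which the paper's quadratic fits and deep networks end up in comparable (very high degree) function classes.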

2) Chebfun work typically assumes functions are (piecewise) analytic. In that case I’m skeptical of people pulling out S-W, because it’s the wrong tool. My PhD was pretty heavy on approximation theory, so I am aware of the literature. My reading of the paper was that it was being applied only to the activation function, not to the response surface, so smoothness may be sensible.

This is not quite how I reacted to the paper. First, the authors are mostly using _quadratic_ polynomials and arguing that using NNs is like using high-degree polynomials. See the last slide of

http://heather.cs.ucdavis.edu/polygrail.pdf

I guess limiting the degree is a form of regularization, but I agree they should also be regularizing the coefficients for any given degree polynomial.
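To make that concrete, here is a small sketch (my own, not from the paper) contrasting the two knobs: the degree is fixed at a high value, and a ridge penalty on the coefficients supplies the second layer of regularization. The data and degree are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 60)
y = np.tanh(3.0 * x) + 0.1 * rng.standard_normal(x.size)

# Degree-14 polynomial basis: limiting this degree is one form of regularization.
X = np.vander(x, 15, increasing=True)

def ridge_fit(lam):
    # Penalized least squares: minimize ||y - X b||^2 + lam * ||b||^2.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

b_ols = ridge_fit(0.0)     # degree limited, coefficients unregularized
b_ridge = ridge_fit(1e-2)  # degree limited AND coefficients shrunk
print("coefficient norm, unregularized:", np.linalg.norm(b_ols))
print("coefficient norm, ridge:        ", np.linalg.norm(b_ridge))
```

Even at a fixed degree, the unpenalized fit inflates the coefficients to chase noise; the ridge fit keeps them small, which is the coefficient-level regularization being asked for above.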

Second, whenever someone whips out a criticism of whipping out the Stone-Weierstrass theorem (Andrew did also in an email to me and Bob), I whip out my go-to criticism of the criticism of the Stone-Weierstrass theorem:

http://www.chebfun.org/ATAP/atap-first6chapters.pdf (last paragraph of chapter 6)

“The trouble with this line of research is that for almost all the functions encountered in practice, Chebyshev interpolation works beautifully! Weierstrass’s theorem has encouraged mathematicians over the years to give too much of their attention to pathological functions at the edge of discontinuity, leading to the bizarre and unfortunate situation where many books on numerical analysis caution their readers that interpolation may fail without mentioning that for functions with a little bit of smoothness, it succeeds outstandingly.”

The end of https://www.boost.org/doc/libs/1_67_0/libs/math/doc/html/math_toolkit/sf_poly/chebyshev.html references this book when discussing its chebyshev_transform function, which takes a function on an interval (a, b) and outputs a near-minimax polynomial approximation to it. It worked well for me on an integral that could otherwise only be done with the integrate_1d() function in Stan that Ben Bales is close to merging. It would be interesting to see someone plug in their NN activation function and see how low the degree of its near-minimax approximation can be.
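For anyone without the C++ setup, a version of that experiment can be sketched in numpy (my own illustration; `chebyshev_transform` itself is the Boost function): interpolate a smooth activation such as tanh at Chebyshev points, which is near-minimax for smooth functions, and watch the max error fall geometrically with the degree, exactly as the Trefethen quote promises:

```python
import numpy as np
from numpy.polynomial.chebyshev import Chebyshev

f = np.tanh                        # a smooth (analytic) activation function
xs = np.linspace(-3.0, 3.0, 2001)  # grid for measuring the sup-norm error

errs = {}
for deg in (4, 8, 16, 32):
    # Interpolate f at Chebyshev points on [-3, 3].
    p = Chebyshev.interpolate(f, deg, domain=[-3.0, 3.0])
    errs[deg] = np.max(np.abs(p(xs) - f(xs)))
    print(f"degree {deg:2d}: max error {errs[deg]:.2e}")
```

The error shrinks by orders of magnitude each time the degree doubles, so a quite modest-degree polynomial already approximates tanh to well beyond the accuracy any trained network could exploit.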

Thanks. This blog can always use a bit more literary criticism.
