https://www.youtube.com/watch?v=U756vlwn8Lg

Step outside and into the black stretch limousine, don’t ask how we know where you are. You are now the official Chief Marketing Officer for this operation.

]]>Note that you need to go to something that is x.0. Because everyone has gone from 2.0 to 4.0 at the moment, we need to show how far we are ahead by incrementing to 6.0.

I mean I certainly don’t expect the ML approach to be automatically figuring out new universal laws any time soon, we would be losing that ability by completely relying on it. But compare it to what is actually going on in these fields with “weak theory” where that isn’t going to be happening anyway.

They are playing around with a linear model until they get a statistically significant coefficient for variable A (“A is linked to X”). Then someone publishes another study where this happens for variable B (“A and B are linked to X”), then it’s discovered that variable C interacts with variables A and B, etc.
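This stepwise pattern is easy to simulate: scan enough pure-noise variables against an outcome and some will clear the usual significance threshold by chance alone. A minimal sketch (the sample sizes and the ~0.05 cutoff are illustrative assumptions, not from the discussion):

```python
import numpy as np

rng = np.random.default_rng(42)
n, p = 100, 50                       # 100 subjects, 50 pure-noise variables

X = rng.normal(size=(n, p))          # candidate predictors A, B, C, ...
y = rng.normal(size=n)               # outcome, independent of all of them

def slope_t(x, y):
    """t-statistic for the slope of a simple linear regression y ~ x."""
    xc, yc = x - x.mean(), y - y.mean()
    b = (xc * yc).sum() / (xc ** 2).sum()
    resid = yc - b * xc
    se = np.sqrt((resid ** 2).sum() / (len(x) - 2) / (xc ** 2).sum())
    return b / se

t_stats = np.array([slope_t(X[:, j], y) for j in range(p)])
n_significant = int((np.abs(t_stats) > 1.98).sum())   # |t| cutoff ~ p < 0.05
print(n_significant, "of", p, "noise variables look 'linked' to the outcome")
```

Run this with different seeds and on average a few percent of the noise variables will look “linked to X” every time, which is all the stepwise publication chain above needs to keep going.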

]]>when modeling a person’s health you rarely collect data on their choice of sock color

I’m sure there is some correlation between health and sock color. For example, is it as easy to find diabetic socks in the same colors as normal socks? If not, that may give you information about diabetes status. If you also have a diabetes diagnosis as a feature, then whether or not a person with diabetes has goofy colored socks could indicate whether they are wearing the specialized socks.

That’s just one way, and I’m sure there are others. Unless we’re running out of space/time, I’d say throw it in if it’s available.

]]>I also still suspect you mean something different than usual when talking about a “model” in this context.

]]>The assumptions underlying Machine learning are at least that the data collected is relevant to the real world question of interest.

]]>I would agree that it is trivial for a human that truly understands the real causal process to generate the correct types of variable features of the world to reasonably replicate it, but not for a machine given a bunch of unstructured data with almost no guidance.

I meant that automatically generating features that are sums and products of others is a pretty basic feature generation technique. If you are at that point you are already well past “a bunch of unstructured data”. So I’m not sure what you meant by additive or multiplicative processes (this made me think of normal vs lognormal distributions, which I didn’t really see the relevance of).
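For concreteness, that basic technique takes only a few lines; this sketch uses plain numpy (scikit-learn’s `PolynomialFeatures` offers a similar transform for products):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))        # three base features

# Append pairwise sums and products of the base features.
cols = [X]
for i, j in combinations(range(X.shape[1]), 2):
    cols.append((X[:, i] + X[:, j])[:, None])   # additive combination
    cols.append((X[:, i] * X[:, j])[:, None])   # multiplicative combination
X_aug = np.hstack(cols)
print(X_aug.shape)   # (200, 9): 3 base + 3 sums + 3 products
```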

just pick the 1 of 1000 models generated that best fit the data

Actually, you usually don’t want to pick just one, and trying out only 1000 candidate models would be a very small search:

https://en.wikipedia.org/wiki/Ensemble_learning
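To make the ensemble point concrete, here is a toy sketch: instead of keeping the single best of many fitted models, average their predictions. The base learners here are bootstrap polynomial fits purely for illustration; real ensembles use trees, nets, etc.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy regression task: y = sin(x) + noise.
x = rng.uniform(-3, 3, size=300)
y = np.sin(x) + rng.normal(scale=0.3, size=300)

def fit_one(seed):
    """Fit one base learner (a degree-5 polynomial) on a bootstrap sample."""
    r = np.random.default_rng(seed)
    idx = r.integers(0, len(x), size=len(x))
    return np.polyfit(x[idx], y[idx], deg=5)

coefs = [fit_one(s) for s in range(25)]
x_test = np.linspace(-3, 3, 100)
preds = np.array([np.polyval(c, x_test) for c in coefs])

# Instead of keeping the single best model, average all 25 of them.
ensemble_pred = preds.mean(axis=0)
print(ensemble_pred.shape)
```

Averaging tends to cancel the variance that any single flexible fit picks up from its particular resample, which is the basic intuition behind bagging-style ensembles.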

I know that sounds like nitpicking, but there are a number of details in what you say that are kind of “off”, indicating you don’t really “get it”. It sounded like you had a bad experience once because you weren’t familiar with the pitfalls of data leakage and also for some reason tried not to incorporate any theoretical understanding of the features into your pipeline (ie via feature generation/selection/transformation).

Anyway, I continue to stand by my original definition:

]]>Machine learning is techniques like classification and regression trees (CARTs) or neural networks, where you make as few assumptions about the data as possible and attempt to optimize your out-of-sample predictive skill/accuracy without caring much about your model parameters or functional form.

I would agree that it is trivial for a human that truly understands the real causal process to generate the correct types of variable features of the world to reasonably replicate it, but not for a machine given a bunch of unstructured data with almost no guidance.

This is sort of my point with the hype around machine learning, AI and GUI interfaces that we are told can do machine learning for us, as if the thinking about a problem can be essentially turned over to the machine and thinking critically about the problem is no longer necessary: just pick the 1 of 1000 models generated that best fit the data. Now design an intervention to exploit this finding, never mind that it could as easily be associative as causal.

That said, the organization and structuring of linear or multiplicative data comes with a boatload of assumptions, often ignored, and iterative methods of solving equations have been around for a very long time; the only difference in 2018 is the amount of computing storage and processing power that can be brought to the problem by the average modeler.

Perhaps there will be a time when the entire process can be turned over to the machine and it will spit out the correct solution without need for guidance or defense of assumptions and I will thus be pedantically satisfied with the term “machine learning”, but it is currently one of those terms to which I have a negative reaction given how it is often presented.

]]>]]>I’ve worked on projects where predictive models were optimized with every variable that increased the classification rate on the hold out sample. Over time this model did less well than one that did worse initially, but was consistent with a reasonable causal theory and performed far more consistently over time without need for tweaking.

This sounds like info was being leaked from the hold-out set into the model. Methods like gradient boosting work so well that if you just check the performance on a hold-out set multiple times using different hyperparameters, features, etc (ie, “tweaking”), then you will overfit to that hold-out set. You do not need to explicitly feed the data into the model for it to be trained/influenced by it. The simple model was just less flexible, so it could not be overfit as easily.

]]>A machine learning method that would impress me is one that could extract the true causal process from data with a known causal process such that a linear model would be generated for a linear causal process and a multiplicative model would be generated for a known multiplicative causal process, etc.

I’m not really sure what you are looking for because it almost sounds trivial. Wouldn’t you just want to include features that are linear or multiplicative combinations of the base features?

A small project that may give some intuition is training a neural network to “add” a sequence of numbers.

To start with, generate a bunch of integer sequences (equal length will be easiest; you can pad with zeros…) to use as the features, and then set the sum of each sequence as the target. You should see that a simple network can “learn” to give correct answers within the domain of the training examples, but when it fails it will be obvious that whatever it learned was not actually the rules of addition. There are then variations on how you can transform the input or choose an architecture that can improve on this. I think when I did this I got it from a blog post somewhere, but I am not finding it now, sorry.
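Since the original blog post is lost, here is a hypothetical minimal sketch of just the data setup in numpy. One wrinkle worth noting: a purely linear layer fit by least squares solves addition exactly (every weight comes out 1), so it generalizes to any numbers; the instructive failure the exercise is after only appears with a saturating nonlinear network tested outside its training range.

```python
import numpy as np

rng = np.random.default_rng(3)

# Data: fixed-length integer sequences as features, their sum as target.
seq_len, n = 6, 5000
X = rng.integers(0, 10, size=(n, seq_len)).astype(float)
y = X.sum(axis=1)

# A single linear layer solves this exactly: least squares recovers a
# weight of 1 for every position, so it generalizes to any inputs.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(w, 6))            # ~[1, 1, 1, 1, 1, 1]

# Sanity check far outside the training range of 0..9 -- still exact:
x_big = np.array([100., 2000., 3., 40., 500., 6.])
print(float(x_big @ w))          # ~2649
```

Swapping the least-squares fit for a small tanh or sigmoid network trained only on digits 0–9 is where the interesting breakage shows up.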

P.S. It’s a finite universe and someone has to tend the boilers so “Everybody gets 100 meter yachts and a winter berth in Monaco!!!” isn’t a valid answer.

]]>Let’s consider a couple of examples, beginning with yours, which I agree is the least controversial, though it is one I would be more likely to categorize as code automation rather than machine learning. It is certainly not uncommon to have two words for the same thing, or even overlapping concepts that create a unique category. Perhaps machine learning is aptly described as the iterative processing of solutions to equations, but I had assumed there was more to it.

Now let’s consider algorithms ranging from basic k-means clustering to gradient boosting methods. In the examples I’ve seen, the gradient boosting methods clearly outperformed the k-means by substantial margins on hold-out samples. The results were not even close. What I have yet to see is the replication of this performance on future data when the models and parameters are not grounded in features that are consistent with a reasonable causal theory. I’ve worked on projects where predictive models were optimized with every variable that increased the classification rate on the hold out sample. Over time this model did less well than one that did worse initially, but was consistent with a reasonable causal theory and performed far more consistently over time without need for tweaking.
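The hold-out-optimization failure described here is easy to reproduce with pure noise: repeatedly scoring candidate “models” on the same hold-out set makes the winner look good there, while fresh data reverts to chance. A minimal illustrative sketch (nothing here comes from the actual project; the single-feature threshold classifiers are a deliberately silly stand-in):

```python
import numpy as np

rng = np.random.default_rng(7)

# Pure noise: no feature actually predicts the binary label.
n_train, n_holdout, p = 200, 100, 200
X_ho = rng.normal(size=(n_holdout, p))
y_ho = rng.integers(0, 2, n_holdout)

def accuracy(j, X, y):
    """Classification rate of the rule 'predict 1 when feature j > 0'."""
    return np.mean((X[:, j] > 0).astype(int) == y)

# "Tweaking": try many candidate models and keep whichever scores best
# on the SAME hold-out set every time.
best_j = max(range(p), key=lambda j: accuracy(j, X_ho, y_ho))
print("hold-out accuracy of selected model:", accuracy(best_j, X_ho, y_ho))

# Fresh data exposes the illusion -- accuracy falls back toward 0.5.
X_new = rng.normal(size=(1000, p))
y_new = rng.integers(0, 2, 1000)
print("accuracy on truly new data:", accuracy(best_j, X_new, y_new))
```

The selected model never saw the hold-out labels during fitting, yet its hold-out score is inflated purely because the hold-out set was used to choose it, which is the leakage point made above.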

A machine learning method that would impress me is one that could extract the true causal process from data with a known causal process such that a linear model would be generated for a linear causal process and a multiplicative model would be generated for a known multiplicative causal process, etc. These examples may already exist and I would love a couple of links to these examples if they do.
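Short of genuine causal discovery, the simplest version of this request does exist: fit the same data both as an additive model (linear in levels) and a multiplicative one (linear in logs) and let fit error pick between them. A toy sketch under assumed simulated data, not a link to an existing system:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(1, 10, size=500)

# A known multiplicative process: y = 2 * x^3 * lognormal noise.
y = 2.0 * x ** 3 * np.exp(rng.normal(scale=0.1, size=500))

def rss(pred, obs):
    return float(((pred - obs) ** 2).sum())

# Candidate 1: additive/linear model, y = a + b*x.
b_lin = np.polyfit(x, y, deg=1)
rss_lin = rss(np.polyval(b_lin, x), y)

# Candidate 2: multiplicative model, i.e. linear in log-log space.
b_log = np.polyfit(np.log(x), np.log(y), deg=1)
rss_log = rss(np.exp(np.polyval(b_log, np.log(x))), y)

print("linear RSS:", round(rss_lin), " multiplicative RSS:", round(rss_log))
print("recovered exponent:", round(b_log[0], 2))   # ~3.0
```

This only adjudicates between forms a human already proposed, which is arguably the commenter’s point: the machine is not generating the candidate causal structures itself.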

]]>For our customers, they care more about the results than the inference algorithm + statistical model we use (or in ML terms, “algorithms”). In smaller data settings, we’ve observed problems with estimating parameters using the traditional inference algorithms, which have simplifying assumptions built in.

]]>Ok, I’ll bite. What can you then do with statistical information produced when you have no theoretical underpinning by which to interpret the results?

I didn’t realize I was saying something controversial.

You are just looking for the model to perform well (make good predictions, correct classifications, good decisions, whatever) on new data; that’s all you care about.

In my experience, when people argue they have no theory relative to the interpretation of a machine learning algorithm, they usually do, or they simply don’t want to have to defend it relative to the assumptions of a model, or they really don’t understand the theory on which their classification algorithm does in fact rely to be usefully applied. Thus, the analogy above.

You seem to be mixing up two steps, but I’m not sure. You train a machine learning algorithm (1) that generates a model (2) that takes features and gives output. Ie, features -> model -> output. The details of the model that gets generated are not really of interest. Of course people know, at least in a general sense, the steps used by the machine learning algorithm to come up with the model.

I would say if someone wrote software that would automatically try out various Stan models and choose the “best” in some way, then that would be machine learning.
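A hedged sketch of what such an automatic model-chooser might look like, with plain polynomial fits standing in for the hypothetical “various Stan models” (nothing here is an existing tool), scored by cross-validated held-out error:

```python
import numpy as np

rng = np.random.default_rng(9)
x = rng.uniform(0, 5, size=400)
y = 1.5 * x ** 2 + rng.normal(scale=2.0, size=400)

def cv_error(degree, k=5):
    """Mean held-out squared error of a polynomial fit of this degree."""
    idx = rng.permutation(len(x))
    folds = np.array_split(idx, k)
    errs = []
    for f in folds:
        train = np.setdiff1d(idx, f)
        coef = np.polyfit(x[train], y[train], degree)
        errs.append(np.mean((np.polyval(coef, x[f]) - y[f]) ** 2))
    return float(np.mean(errs))

# "Try out various models and choose the best in some way."
scores = {d: cv_error(d) for d in range(1, 7)}
best = min(scores, key=scores.get)
print("chosen degree:", best)
```

The loop itself is the “machine learning” part under this definition: the candidate models can be anything with a fit-and-score interface, and the human contribution is reduced to specifying the candidate set and the scoring rule.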

]]>In my experience, when people argue they have no theory relative to the interpretation of a machine learning algorithm, they usually do, or they simply don’t want to have to defend it relative to the assumptions of a model, or they really don’t understand the theory on which their classification algorithm does in fact rely to be usefully applied. Thus, the analogy above.

]]>Machine learning is techniques like classification and regression trees (CARTs) or neural networks, where you make as few assumptions about the data as possible and attempt to optimize your out-of-sample predictive skill/accuracy without caring much about your model parameters or functional form.

]]>;-)

]]>Most of the people doing “machine learning” are doing very different stuff than Stan models (or that’s my impression).

]]>I think the reason to put the focus on machine learning is not that it is a good description of Bayesian inference in drug development models (let alone equivalent). The reason is that machine learning is “sexy”. But at least they are not doing Bayesian inference on the blockchain!

]]>So with my ignorance of machine learning, I was wondering if machine learning was code for Bayesian multilevel modeling or if there was something more going on. Likely there is something more, but I was curious, which was why I commented and asked what I was missing.

]]>“This type of software is generally referred to as “clinical decision support” (“CDS”) software. “CDS software” is loosely defined as an application that analyzes data to help health care providers make clinical decisions. Artificial intelligence, such as machine learning algorithms, may fall into this category when used for a health purpose. Per the Cures Act, whether CDS software is excluded from FDA’s jurisdiction depends on, among other things, the ability of the health care professional using the software to independently review the basis for a clinical recommendation.” https://www.lexology.com/library/detail.aspx?g=cf6351a3-a944-4468-9214-d141a689955e

]]>
how “machine learning” is different from Bayesian inference

…

“machine learning is a synonym for Bayesian [multi-level] models”

I’d be interested to know where this idea came from.

]]>My guess is that Daniel and crew will be doing (approximate?) Bayesian inference and will be careful with calibration, so they won’t fall into the standard ML trap described in the blog post.

]]>As someone in a field that is starting to approach advanced statistical methods (Bayesian inference, machine learning, agent-based modeling), I have gotten into some … discussions … about how “machine learning” is different from Bayesian inference (my intuition is that in most applications they are the same process, but people justify it from different backgrounds). Seeing this pitched as “statistical machine learning” via Stan models sounds like Daniel’s taking the same approach (that I’ll strongly word as something like “machine learning is a synonym for Bayesian [multi-level] models” – note I’m the one saying this, not Daniel).

Am I missing a significant part of the argument?
