Generable: They’re building software for pharma, with Stan inside.

Daniel Lee writes:

We’ve just launched our new website.

Generable is where precision medicine meets statistical machine learning.

We are building a state-of-the-art platform to make individual, patient-level predictions for safety and efficacy of treatments. We’re able to do this by building Bayesian models with Stan. We currently have pilots with AstraZeneca, Sanofi, and University of Marseille. We’re particularly interested in small clinical trials, like in rare diseases or combination therapies. If anyone is interested, they can reach Daniel at [email protected]

I’ve been collaborating with Daniel for many years and I’m glad to hear that he and his colleagues are doing this work. It’s my impression that in many applied fields, pharmacometrics included, there’s a big need for systems that allow users to construct open-ended models, using prior information and hierarchical models to regularize inferences and thus allow the integration of multiple relevant data sources in making predictions. As Daniel implies in his note above, Bayesian tools are particularly relevant where data are sparse.

37 thoughts on “Generable: They’re building software for pharma, with Stan inside.”

  1. Sounds like a great project!

    As someone in a field that is starting to approach advanced statistical methods (Bayesian inference, machine learning, agent-based modeling), I have gotten into some … discussions … about how “machine learning” is different from Bayesian inference (my intuition is that in most applications they are the same process, just justified by people from different backgrounds). Seeing this pitched as “statistical machine learning” via Stan models sounds like Daniel’s taking the same approach (which I’ll strongly word as something like “machine learning is a synonym for Bayesian [multi-level] models” – note I’m the one saying this, not Daniel).

    Am I missing a significant part of the argument?

    • how “machine learning” is different from Bayesian inference

      “machine learning is a synonym for Bayesian [multi-level] models”

      I’d be interested to know where this idea came from.

      • Bon mot aside, I specifically asked the question here because the released statement mentions that “Generable is where precision medicine meets statistical machine learning” and then goes on to say “We’re able to do this by building Bayesian models with Stan.”

        So with my ignorance of machine learning, I was wondering if machine learning was code for Bayesian multilevel modeling or if there was something more going on. Likely there is something more, but I was curious, which was why I commented and asked what I was missing.

        • According to Wikipedia, “Machine learning is a field of computer science that uses statistical techniques to give computer systems the ability to ‘learn’ (i.e., progressively improve performance on a specific task) with data, without being explicitly programmed.”

          I think the reason to put the focus on machine learning is not that it is a good description of Bayesian inference in drug development models (let alone equivalent). The reason is that machine learning is “sexy”. But at least they are not doing Bayesian inference on the blockchain!

        • Our hyperconverged statistical blockchain solutions will accelerate your disease target discovery process through the power of applied AI, by harnessing the wasted computing power of all our Ride-Sharing customers’ cell phones and enabling our customers to get reduced-cost rides by joining the gig economy as nodes in our distributed computing platform!

        • Accelerating is clearly not enough; it needs to be a disruptive paradigm shift. So perhaps “Our hyperconverged statistical blockchain solutions are a disruptive paradigm shift for your disease target discovery process and virtual drug development through the power of applied AI. We catapult into the digital economy 6.0 by harnessing the wasted computing power of all our Ride-Sharing customers’ cell phones and enabling our customers to get reduced-cost rides by joining the gig economy as nodes in our distributed computing platform!”
          Note that you need to go to something that is x.0. Because everyone has gone from 2.0 to 4.0 at the moment, we need to show how far ahead we are by incrementing to 6.0.

        • I would say that Machine Learning is the new hotness and so they are trying to squeeze Stan under that umbrella to get into the backstage entourage.

          Most of the people doing “machine learning” are doing very different stuff than Stan models (or that’s my impression).

        • I was just trying to get some context to give me an idea of where to start. I personally would not even consider Stan, or any method where you specify the form of a model, to be machine learning. I guess that is ok though.

          Machine learning is techniques like classification and regression trees (CARTs) or neural networks, where you make as few assumptions about the data as possible and attempt to optimize your out-of-sample predictive skill/accuracy without caring much about your model parameters or functional form.
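
          For concreteness, here is a rough sketch of that mindset in Python (a toy example, assuming scikit-learn; the data is made up): fit a tree with no stated functional form and judge it only by out-of-sample error.

            import numpy as np
            from sklearn.metrics import mean_squared_error
            from sklearn.model_selection import train_test_split
            from sklearn.tree import DecisionTreeRegressor

            rng = np.random.default_rng(0)
            X = rng.normal(size=(500, 5))  # made-up features
            y = np.sin(X[:, 0]) + X[:, 1] * X[:, 2] + rng.normal(scale=0.1, size=500)

            X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

            # No distributional or functional-form assumptions are stated;
            # the only yardstick is predictive error on held-out data.
            tree = DecisionTreeRegressor(max_depth=5).fit(X_tr, y_tr)
            print("held-out MSE:", mean_squared_error(y_te, tree.predict(X_te)))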

        • Ok, I’ll bite. What can you then do with statistical information produced when you have no theoretical underpinning by which to interpret the results? Would it be analogous to trying to increase people’s weight by making them taller?

          In my experience, when people argue they have no theory relative to the interpretation of a machine learning algorithm, they usually do, or they simply don’t want to have to defend it relative to the assumptions of a model, or they really don’t understand the theory on which their classification algorithm does in fact rely to be usefully applied. Thus, the analogy above.

        • Ok, I’ll bite. What can you then do with statistical information produced when you have no theoretical underpinning by which to interpret the results?

          I didn’t realize I was saying something controversial.

          You are just looking for the model to perform well (make good predictions, correct classifications, good decisions, whatever) on new data, that’s all you care about.

          In my experience, when people argue they have no theory relative to the interpretation of a machine learning algorithm, they usually do, or they simply don’t want to have to defend it relative to the assumptions of a model, or they really don’t understand the theory on which their classification algorithm does in fact rely to be usefully applied. Thus, the analogy above.

          You seem to be mixing up two steps, but I’m not sure. You train a machine learning algorithm (1) to generate a model (2) that takes features and gives output, i.e., features -> model -> output. The details of the model that gets generated are not really of interest. Of course people know, at least in a general sense, the steps used by the machine learning algorithm to come up with the model.

          I would say if someone wrote software that would automatically try out various Stan models and choose the “best” in some way, then that would be machine learning.
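
          Something like this hypothetical sketch, say with cmdstanpy and arviz (model_a.stan and model_b.stan are placeholders, each assumed to emit a log_lik vector in generated quantities, and the data dictionary is made up):

            import arviz as az
            from cmdstanpy import CmdStanModel

            # Hypothetical candidate model files; each would need a generated
            # quantities block producing log_lik for the comparison below.
            candidates = ["model_a.stan", "model_b.stan"]
            data = {"N": 100, "y": [0.1] * 100}  # placeholder data

            fits = {}
            for path in candidates:
                fit = CmdStanModel(stan_file=path).sample(data=data)
                fits[path] = az.from_cmdstanpy(posterior=fit, log_likelihood="log_lik")

            # Rank candidates by estimated out-of-sample fit (PSIS-LOO)
            # and "automatically" pick the top row.
            print(az.compare(fits))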

        • I apologize if my response was not directed at your comment but was instead continuing an argument I’ve had with others on this topic. In these discussions, I typically find it difficult to draw solid-line distinctions between machine learning and model development/testing, and I often find the justifications for ignoring distributional, sampling, and measurement assumptions to be less than convincing.

          Let’s consider a couple of examples, beginning with yours, which I agree is the least controversial but which I would be more likely to categorize as code automation rather than machine learning; then again, it is certainly not uncommon to have two words for the same thing, or overlapping concepts that create a unique category. Perhaps machine learning is aptly described as the iterative processing of solutions to equations, but I had assumed there was more to it.

          Now let’s consider clustering algorithms, from basic k-means to gradient boosting methods. In the examples I’ve seen, the gradient boosting methods clearly outperformed the k-means by substantial margins on hold-out samples. The results were not even close. What I have yet to see is the replication of this performance on future data when the models and parameters are not grounded in features that are consistent with a reasonable causal theory. I’ve worked on projects where predictive models were optimized with every variable that increased the classification rate on the hold-out sample. Over time this model did less well than one that did worse initially, but was consistent with a reasonable causal theory and performed far more consistently over time without need for tweaking.

          A machine learning method that would impress me is one that could extract the true causal process from data with a known causal process such that a linear model would be generated for a linear causal process and a multiplicative model would be generated for a known multiplicative causal process, etc. These examples may already exist and I would love a couple of links to these examples if they do.

        • @Uncertain Archaeologist, for the time being, the models are indeed multilevel models with full Bayesian inference implemented in Stan. We’re planning to do some more in the future… I’ll just leave it at that for now.

          Our customers care more about the results than about the inference algorithm + statistical model we use (or, in ML terms, the “algorithms”). In smaller-data settings, we’ve observed problems with estimating parameters using the traditional inference algorithms, which have simplifying assumptions built in.

      • So far we have not used anything other than HMC, but we are not allergic to using approximations when warranted and when we have a way to evaluate the quality of the approximations. Our main focus is on specifying good generative models and making them usable by non-specialists; the choice of an inference algorithm is somewhat context dependent.

    • Zad – were you thinking of this aspect?

      “This type of software is generally referred to as “clinical decision support” (“CDS”) software. “CDS software” is loosely defined as an application that analyzes data to help health care providers make clinical decisions. Artificial intelligence, such as machine learning algorithms, may fall into this category when used for a health purpose. Per the Cures Act, whether CDS software is excluded from FDA’s jurisdiction depends on, among other things, the ability of the health care professional using the software to independently review the basis for a clinical recommendation.” https://www.lexology.com/library/detail.aspx?g=cf6351a3-a944-4468-9214-d141a689955e

  2. Just to give the debate some context: what would you do to reduce “false negatives” if your loved one were in an urn of cancer patients, a potential silver bullet were available, and you had to choose (your loved one having been drawn) whether he/she ought to get the treatment? https://www.nature.com/articles/d41586-018-03862-6

    P.S. It’s a finite universe and someone has to tend the boilers, so “Everybody gets 100-meter yachts and a winter berth in Monaco!!!” isn’t a valid answer.

    • These decisions should be made based on Bayesian decision theory, with at least an attempt at a predictive model and a utility function. It’s not easy, but it will result in far better outcomes on average than the kind of thing usually done, which is some set of obscure rules.
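
      As a toy sketch of what I mean (numbers entirely made up, Python/numpy): take posterior draws for the outcome probability under each action, apply a utility function, and pick the action with the highest expected utility.

        import numpy as np

        rng = np.random.default_rng(1)
        n_draws = 4000

        # Made-up posterior draws for the probability of a good outcome
        # under each action; in practice these come from a fitted model.
        p_good = {"treat": rng.beta(8, 4, n_draws),
                  "no_treat": rng.beta(4, 8, n_draws)}

        # Crude utility: value of a good outcome minus a treatment burden.
        utility = {"treat": 100 * p_good["treat"] - 10,
                   "no_treat": 100 * p_good["no_treat"]}

        for action, u in utility.items():
            print(action, "expected utility:", round(u.mean(), 1))
        print("decision:", max(utility, key=lambda a: utility[a].mean()))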

  3. Continued from here.

    I’ve worked on projects where predictive models were optimized with every variable that increased the classification rate on the hold-out sample. Over time this model did less well than one that did worse initially, but was consistent with a reasonable causal theory and performed far more consistently over time without need for tweaking.

    This sounds like info was being leaked from the hold-out set into the model. Methods like gradient boosting work so well that if you just check the performance on a hold-out set multiple times using different hyperparameters, features, etc. (i.e., “tweaking”), then you will overfit to that hold-out set. You do not need to explicitly feed the data into the model for it to be trained/influenced by it. The simple model was just less flexible, so it could not be overfit as easily.
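
    You can see this with a quick simulation (a Python sketch, assuming scikit-learn; everything here is pure noise): repeatedly “tweak” feature subsets against the same hold-out set, and the winning hold-out score looks great while truly fresh data gives chance-level accuracy.

      import numpy as np
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import train_test_split

      rng = np.random.default_rng(0)
      X = rng.normal(size=(200, 50))    # noise features
      y = rng.integers(0, 2, size=200)  # labels independent of X
      X_tr, X_ho, y_tr, y_ho = train_test_split(X, y, random_state=0)

      # "Tweak" by trying many random feature subsets, keeping whichever
      # scores best on the same hold-out set every time.
      best_score, best_cols = 0.0, None
      for _ in range(200):
          cols = rng.choice(50, size=5, replace=False)
          model = LogisticRegression().fit(X_tr[:, cols], y_tr)
          score = model.score(X_ho[:, cols], y_ho)
          if score > best_score:
              best_score, best_cols = score, cols

      X_new = rng.normal(size=(200, 50))  # truly fresh data
      y_new = rng.integers(0, 2, size=200)
      final = LogisticRegression().fit(X_tr[:, best_cols], y_tr)
      print("best hold-out accuracy:", best_score)  # inflated by tweaking
      print("fresh-data accuracy:", final.score(X_new[:, best_cols], y_new))  # ~0.5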

    A machine learning method that would impress me is one that could extract the true causal process from data with a known causal process such that a linear model would be generated for a linear causal process and a multiplicative model would be generated for a known multiplicative causal process, etc.

    I’m not really sure what you are looking for because it almost sounds trivial. Wouldn’t you just want to include features that are linear or multiplicative combinations of the base features?
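
    E.g., something like scikit-learn’s PolynomialFeatures (a toy sketch with made-up numbers), which mechanically adds the products and powers of the base features so a linear model can pick up multiplicative structure:

      import numpy as np
      from sklearn.preprocessing import PolynomialFeatures

      X = np.array([[2.0, 3.0],
                    [1.0, 4.0]])

      # Degree-2 expansion appends squares and pairwise products:
      # columns become x1, x2, x1^2, x1*x2, x2^2.
      poly = PolynomialFeatures(degree=2, include_bias=False)
      print(poly.fit_transform(X))
      print(poly.get_feature_names_out())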

    A small project that may give some intuition is training a neural network to “add” a sequence of numbers.

    To start with, generate a bunch of integer sequences (equal length will be easiest; you can pad with zeros…) to use as the features, and then set the sum of each sequence as the target. You should see that a simple network can “learn” to give correct answers within the domain of the training examples, but when it fails it will be obvious that whatever it learned was not actually the rules of addition. There are then variations of how you can transform the input or choose an architecture to improve on this. I thought I had gotten this from a blog post somewhere, but I can’t find it now, sorry.
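
    A minimal version of the exercise (a sketch, using scikit-learn’s MLPRegressor for convenience) might look like this:

      import numpy as np
      from sklearn.neural_network import MLPRegressor

      rng = np.random.default_rng(0)

      # Sequences of 5 digits in 0-9; the target is their sum.
      X_train = rng.integers(0, 10, size=(5000, 5))
      y_train = X_train.sum(axis=1)

      net = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000)
      net.fit(X_train, y_train)

      # Within the training domain the net looks like it "learned" addition...
      print(net.predict([[1, 2, 3, 4, 5]]))       # close to 15

      # ...but outside the training range the answers typically drift,
      # showing it did not learn the rules of addition themselves.
      print(net.predict([[90, 80, 70, 60, 50]]))  # usually far from 350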

    • Anoneuoid:

      I would agree that it is trivial for a human that truly understands the real causal process to generate the correct types of variable features of the world to reasonably replicate it, but not for a machine given a bunch of unstructured data with almost no guidance. This is sort of my point about the hype around machine learning, AI, and GUI interfaces that we are told can do machine learning for us, as if thinking about a problem can essentially be turned over to the machine and thinking critically about the problem is no longer necessary: just pick the 1 of 1000 models generated that best fit the data. Now design an intervention to exploit this finding, never mind that it could as easily be associative as causal. That said, the organization and structuring of linear or multiplicative data comes with a boatload of assumptions that are often ignored, and iterative methods of solving equations have been around for a very long time; the only difference in 2018 is the amount of computing storage and processing power that the average modeler can bring to the problem.

      Perhaps there will be a time when the entire process can be turned over to the machine and it will spit out the correct solution without need for guidance or defense of assumptions, and I will thus be pedantically satisfied with the term “machine learning”; but it is currently one of those terms to which I have a negative reaction, given how it is often presented.

      • I would agree that it is trivial for a human that truly understands the real causal process to generate the correct types of variable features of the world to reasonably replicate it, but not for a machine given a bunch of unstructured data with almost no guidance.

        I meant that automatically generating features that are sums and products of others is a pretty basic feature generation technique. If you are at that point you are already well past “a bunch of unstructured data”. So I’m not sure what you meant by additive or multiplicative processes (this made me think of normal vs lognormal distributions, which I didn’t really see the relevance of).

        just pick the 1 of 1000 models generated that best fit the data

        Actually, you usually don’t want to just pick one, and trying out only 1000 possible models would be a really low number:
        https://en.wikipedia.org/wiki/Ensemble_learning
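
        E.g., a quick scikit-learn sketch (toy data): average a few different model families instead of crowning a single winner.

          import numpy as np
          from sklearn.ensemble import GradientBoostingRegressor, VotingRegressor
          from sklearn.linear_model import Ridge
          from sklearn.model_selection import cross_val_score
          from sklearn.tree import DecisionTreeRegressor

          rng = np.random.default_rng(0)
          X = rng.normal(size=(300, 5))
          y = X[:, 0] * X[:, 1] + np.sin(X[:, 2]) + rng.normal(scale=0.1, size=300)

          # Average the predictions of several models rather than pick one.
          ensemble = VotingRegressor([("ridge", Ridge()),
                                      ("tree", DecisionTreeRegressor(max_depth=4)),
                                      ("gbm", GradientBoostingRegressor())])
          print("CV R^2:", cross_val_score(ensemble, X, y).mean())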

        I know that sounds like nitpicking, but there are a number of details in what you say that are kind of “off”, indicating you don’t really “get it”. It sounded like you had a bad experience once because you weren’t familiar with the pitfalls of data leakage and also, for some reason, tried not to incorporate any theoretical understanding of the features into your pipeline (i.e., via feature generation/selection/transformation).

        Anyway, I continue to stand by my original definition:

        Machine learning is techniques like classification and regression trees (CARTs) or neural networks, where you make as few assumptions about the data as possible and attempt to optimize your out-of-sample predictive skill/accuracy without caring much about your model parameters or functional form.

        • I am saying that the assumptions implicit in the data, data structures, and models themselves are often ignored by their proponents. Just as you seem to be doing in your explanations.

        • If you’re referring to the fact that when modeling a person’s health you rarely collect data on their choice of sock color but often collect data on, say, red blood cell count, then I agree with you. The assumption “the data I’m feeding you is relevant to the causal process” is actually an important and unrecognized assumption, as is the assumption “the outcome measures I am feeding you are actually relevant to the decision making / point of my research”, which is a big problem for much of the psych research criticized on this blog. For example, power-posing, if anything, apparently affects only the reported “feelings of power”, not the originally claimed testosterone levels; and how relevant is “reporting feelings of power on a psych survey” to actual daily outcomes of interest, like career trajectory, earnings, etc.?

          The assumptions underlying machine learning include, at minimum, that the data collected is relevant to the real-world question of interest.

        • Yes, agreed. And there is no aspect of reality that can be observed and recorded that is context free and therefore assumption free, hence the persistence of the unmeasured causal variable problem. In my experience, strong proponents of machine learning seem to assume away the conditions for the emergence of the current data without presenting a strong argument that this is justified. If we ignore those conditions we are making untested and unexamined assumptions about the data generating process.

        • when modeling a person’s health you rarely collect data on their choice of sock color

          I’m sure there is some correlation between health and sock color. For example, is it as easy to find diabetic socks in the same colors as normal socks? If not, that may give you info about diabetes status. If you also have diabetes diagnosis as a feature, then whether or not the person with diabetes has goofy-colored socks could indicate whether they are wearing the specialized socks.

          That’s just one way; I’m sure there are others. Unless we’re running out of space/time, I’d say throw it in if it’s available.

        • Well, the entire goal is to be able to ignore all the assumptions about the relationship between input and output. Once again, by talking about ignoring assumptions as a bad thing you seem to miss the point. Dealing with all that is an expensive, annoying, and time consuming task so whatever can be automated should be. However, if including additional info improves performance at a reasonable cost that should definitely be done.

          I also still suspect you mean something different than usual when talking about a “model” in this context.

        • And what I am saying is that you are missing the point if you think you can ignore those assumptions without significant costs as well.

        • Ok, I don’t see how this makes my original definition incorrect. Anyway though, we can use that as a starting point. So far your evidence for this seems to be “that one time we overfit to a holdout dataset”.

          I mean, I certainly don’t expect the ML approach to be automatically figuring out new universal laws any time soon; we would be losing that ability by completely relying on it. But compare it to what is actually going on in these fields with “weak theory”, where that isn’t going to be happening anyway.

          They are playing around with a linear model until they get a statistically significant coefficient for variable A (“A is linked to X”). Then someone publishes another study where this happens for variable B (“A and B are linked to X”), then it’s discovered that variable C interacts with variables A and B, etc.

        • I should also say that I am certain that most of what is getting published about applying “AI” to an area like medicine is people just overfitting their CV results (pretty much the same pitfall you described coming across). It is the new p-hacking.
