How smartly.io productized Bayesian revenue estimation with Stan

Markus Ojala writes:

Bayesian modeling is becoming mainstream in many application areas. Applying it needs still a lot of knowledge about distributions and modeling techniques but the recent development in probabilistic programming languages have made it much more tractable. Stan is a promising language that suits single analysis cases well. With the improvements in approximation methods, it can scale to production level if care is taken in defining and validating the model. The model described here is the basis for the model we are running in production with various additional improvements.

He begins with some background:

Online advertisers are moving to optimizing total revenue on ad spend instead of just pumping up the amount of conversions or clicks. Maximizing revenue is tricky as there is huge random variation in the revenue amounts brought in by individual users. If this isn’t taken into account, it’s easy to react to the wrong signals and waste money on less successful ad campaigns. Luckily, Bayesian inference allows us to make justified decisions on a granular level by modeling the variation in the observed data.

Probabilistic programming languages, like Stan, make Bayesian inference easy. . . .

Sure, we know all that. But then comes the new part, at least it’s new to me:

In this blog post, we describe our experiences in getting Stan running in production.

Ojala discusses the “Use case: Maximizing the revenue on ad spend” and provides lots of helpful detail—not just the Stan code itself, but background on how they set up the model, and intermediate steps such as the first try which didn’t work because the model was insufficiently constrained and they needed to add prior information. As Ojala puts it:

What’s nice about Stan is that our model definition turns almost line-by-line into the final code. However, getting this model to fit by Stan is hard, as we haven’t specified any limits for the variables, or given sensible priors.

His solution is multilevel modeling:

The issue with the first approach is that the ad set estimates would be based on just the data from the individual ad sets. In this case, one random large $1,000 purchase can affect the mean estimate of a single ad set radically if there are only tens of conversion events (which is a common case). As such large revenue events could have happened also in other ad sets, we can get better estimates by sharing information between the ad sets.

With multilevel modeling, we can implement a partial pooling approach to share information. . . .

It’s BDA come to life! (But for some reason this paper is labeled, “Tauman, Chapter 6.”)
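
To make the partial-pooling idea concrete, here is a minimal Python sketch. It is not Ojala's Stan model: it assumes normally distributed revenue, a fixed within-ad-set standard deviation `sigma` and a fixed between-ad-set standard deviation `tau` (both hypothetical inputs), and it plugs in the grand mean rather than estimating it. A full multilevel model would infer all of these from the data.

```python
from statistics import mean


def partial_pool(groups, sigma, tau):
    """Shrink each group's mean toward the grand mean.

    groups: dict mapping an ad set name to a list of revenue observations
    sigma:  assumed within-group standard deviation (hypothetical, fixed)
    tau:    assumed between-group standard deviation (hypothetical, fixed)
    """
    all_obs = [x for obs in groups.values() for x in obs]
    grand_mean = mean(all_obs)
    estimates = {}
    for name, obs in groups.items():
        n = len(obs)
        # Precision weight: more observations means more trust in the
        # group's own mean and less shrinkage toward the grand mean.
        weight = (n / sigma**2) / (n / sigma**2 + 1 / tau**2)
        estimates[name] = weight * mean(obs) + (1 - weight) * grand_mean
    return estimates


# Illustrative data only: one small ad set with a single large purchase,
# and one larger ad set with ordinary purchases.
example = {
    "small_ad_set": [10.0, 20.0, 1000.0],
    "large_ad_set": [15.0] * 40,
}
for name, estimate in partial_pool(example, sigma=100.0, tau=10.0).items():
    print(name, round(estimate, 2))
```

The small ad set's estimate lands between its raw mean and the grand mean, so a single large purchase moves it much less than it would under a no-pooling estimate.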

He continues with model development and posterior predictive checking! I’m lovin it.

Also this excellent point:

After you get comfortable in writing models, Stan is an expressive language that takes away the need to write custom optimization or sampling code to fit your model. It allows you to modify the model and add complexity easily.

Now let’s get to the challenges:

The model fitting can easily crash if the learning diverges. In most cases that can be fixed by adding sensible limits and informative priors for the variables and possibly adding a custom initialization for the parameters. Also non-centered parametrization is needed for hierarchical models.

These are must-haves for running the model in production. You want the model to fit 100% of the cases and not just 90% which would be fine in interactive mode. However, finding the issues with the model is hard.

What to do?

The best is to start with really simple model and add stuff step-by-step. Also running the model against various data sets and producing posterior plots automatically helps in identifying the issues early.
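
The non-centered parametrization mentioned among the challenges above can be illustrated outside Stan. In the centered form, each group effect is drawn directly as Normal(mu, tau). In the non-centered form, the sampler works with a standard-normal variable `z` and the group effect is computed as `mu + tau * z`, so the scale of the sampled variable no longer depends on `tau`. The Python sketch below only shows that transform; the function names are hypothetical and this is not the code from Ojala's post.

```python
import random


def to_theta(mu, tau, z):
    """Non-centered transform: map standard-normal z to group effects."""
    return [mu + tau * z_j for z_j in z]


def to_z(mu, tau, theta):
    """Inverse transform: recover the standardized values from group effects."""
    return [(t - mu) / tau for t in theta]


# Draw standardized effects with a fixed seed, then shift and scale them.
rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in range(5)]
theta = to_theta(mu=2.0, tau=0.5, z=z)
print(theta)
```

In a Stan program the same idea typically means declaring `z` as the parameter with a standard-normal prior and defining the group effect as a transformed parameter.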

And some details:

We are using the PyStan Python interface that wraps the compilation and calling of the code. To avoid recompiling the models always, we precompile them and pickle them . . .
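
The precompile-and-pickle step might look something like the following Python sketch. The details are assumptions, not Smartly.io's code: the helper name `load_or_compile`, the cache layout, and the choice to key the cache on a hash of the model source are all hypothetical. The compile step is passed in as a function so the sketch runs without PyStan installed; with PyStan 2 it could be `lambda code: pystan.StanModel(model_code=code)`.

```python
import hashlib
import pickle
from pathlib import Path


def load_or_compile(model_code, compile_fn, cache_dir):
    """Return a compiled model, reusing a pickled copy when one exists.

    model_code: Stan program source as a string
    compile_fn: callable that compiles the source (hypothetical hook)
    cache_dir:  directory holding the pickled models
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Key the cache on the source text so an edited model is recompiled.
    digest = hashlib.sha256(model_code.encode("utf-8")).hexdigest()
    cache_path = cache_dir / f"model-{digest}.pkl"
    if cache_path.exists():
        # Only unpickle files you wrote yourself; pickle is not safe
        # for untrusted input.
        with cache_path.open("rb") as f:
            return pickle.load(f)
    model = compile_fn(model_code)
    with cache_path.open("wb") as f:
        pickle.dump(model, f)
    return model
```

Hashing the source avoids serving a stale binary after the model changes, at the cost of leaving old cache files behind to clean up.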

For scheduling, we use Celery which is a nice distributed task queue. . . .

We are now running the revenue model for thousands of different accounts every night with varying amount of campaigns, ad sets and revenue observations. The longest run takes couple minutes. Most of our customers still use the conversion optimization but are transitioning to use the revenue optimization feature. Overall, about one million euros of advertising spend on daily level is managed with our Predictive Budget Allocation. In future, we see that Stan or some other probabilistic programming language plays a big role in the optimization features of Smartly.io.

That’s awesome. We used the BSD license for Stan so it could be free and open source and anyone could use it inside their software, however they like. This sort of thing is exactly what we were hoping to see.

33 thoughts on “How smartly.io productized Bayesian revenue estimation with Stan”

  1. > We used the BSD license for Stan so it could be free and open source and anyone could use it inside their software, however they like.
    An important clarification: While Stan is BSD, PyStan and RStan are GPL3, which is a more restrictive open source license. If you need the more permissive BSD license (because CopyLeft can be annoying or infeasible in lots of cases), CmdStan is the tool for you!

    • We’re sort of stuck with RStan because R itself is copylefted (GPLv3).

      To quote from Allen Riddell on issue #382 of PyStan’s GitHub repo, “PyStan 3 will be ISC licensed. PyStan 2 is derived from GPL’d RStan code. It therefore uses the GPL.” The good news is that PyStan 3 is well underway and should be out soon (for some value of “soon”).

      We’re continuing to roll out new Stan 3 interfaces and features into Stan 2.x. Stan 3 will happen when we’re done and we’re ready to simplify by removing all the deprecated functionality—the major version number will change because it will break backward compatibility.

  2. One can be overwhelmed in a blog written in a foreign language:
    “productized”
    “For scheduling, we use Celery which is a nice distributed task queue. . . .”
    “copylefted”
    “deprecated functionality”

    • productize (v.) turn something into a product (this one’s just basic English, which lets you verb a noun)

      distributed task queue (n.) when you run things in parallel, something needs to manage the logistics of the different jobs. Jobs are usually managed with a kind of queue that holds jobs and then performs processor allocation, failover, load balancing, etc. A distributed queue is one that is itself implemented on multiple processors.

      copyleft (n.) licenses like GPL that require any derived products (the definition of which is fairly involved) which are distributed (ditto) to be released under a compatible open-source license (like GPL or BSD).

      deprecated functionality (n.) functionality that has been replaced and is awaiting decommissioning in a future release; it’s generally polite to deprecate a function with warnings on how to replace it before releasing a backward-compatibility breaking release.

      backward compatible (adj.) programs that you wrote in the last version still work in this version.

      • Andrew and Bob Carpenter:

        Of course I looked up the terms before I replied, but the only one which is not linguistically ugly is “Celery.” And, I hope my granddaughter doesn’t hang out with people who “verb” a noun such as “copyleft.” Trump, for all his linguistic failings and excessive employment of exclamation marks, approximately sounds as if English is his Mameloshn.

        • Paul:

          Sure, and the Queen of England might find our American dialect uncouth. There’s no easy answer here. If the author of the above-linked piece had written in a way that sounded clean to you, it could well have sounded labored and awkward to its intended audience.

        • +1 to that—see the Wikipedia article for more grammatical detail. The ones we grow up with seem fine, but new ones seem jarring sometimes. Nowadays, nobody can conjugate their way out of the overloaded “run” in “run for office” (verb) and “a run for office” (noun). And don’t get me started on gerunds, which turn the whole thing around by letting you noun a verb.

          This topic’s near and dear to my heart—the semantics component of my Ph.D. thesis was on a Davidsonian event semantics for verbs that could account for gerunds and their arguments (e.g., “Robin was running in the marathon” vs. “Robin’s running of the marathon”).

  3. I made a Stan model as one component of a project my company deployed on-premise at the Bank of England. We ended up writing a little CmdStan wrapper in python to kick off the analysis and consume the results.

    • Corey: I understand that some companies might want to distribute software which includes Stan code without revealing the modifications and what analysis is made, but are you allowed to tell why Bank of England wants to distribute software which includes Stan code without telling how the analysis is made? (two things: 1) sharing the code is required only if you distribute GPL licensed software, ie it’s not required if you don’t distribute, 2) why the analysis by Bank of England is secret?)

      • Aki, not sure if this applies here, but banks have lots of security concerns and often wouldn’t want their data shipped off site for analysis, so if they contract with a third party it can make sense for the third party to deliver a packaged “box” that runs on a sandbox network on site, consumes data and spits out results.

      • The company I work for, MindBridge Analytics, is the one distributing the software; we distributed it to the Bank of England. Ideally we would have used PyStan but GPL-licensed stuff is off limits. Daniel has the right idea about the security concerns except that he understates them in this case: the Bank of England is a central bank and acts as the regulator for a lot of financial activity in the U.K., and our product is intended to help them in that role, so it eats detailed economy-wide financial information.

        • Corey, thanks for the additional information. I know cases where I understand that GPL is off limits and I would like to learn new cases where it is off limits. It seems I’m still missing something in this case, which might be something so obvious for you that you think I’m annoying asking these questions. GPL doesn’t affect data confidentiality. If the Stan part needs to be non-GPL, for me it means you have modified Stan code and you don’t want to share that modified code with Bank of England. If security is important I would assume that they would like to see the code. What am I missing here?

        • I’m once again not directly involved, but my guess would be that distributing code to a bank with a GPL license means that they can now take this code and turn it into their own product: modify and sell to other banks, etc. I suspect that’s more the issue than code confidentiality, as you say they may well want to see the code for security reasons, but seeing the code and being able to *redistribute it as their own service* are maybe 2 different issues.

        • I just do what my boss tells me. My understanding is that if we distribute software that includes a GPL component then the strong copyleft in the GPL requires us to make the source code of the entire product publicly available, which we don’t want to do. If PyStan had a weak copyleft license or a permissive license like Apache or MIT then it wouldn’t “infect” the rest of the distributed work.

          “One is only required to adhere to the terms of the GPL if one wishes to exercise rights normally restricted by copyright law, such as redistribution. Conversely, if one distributes copies of the work without abiding by the terms of the GPL (for instance, by keeping the source code secret), he or she can be sued by the original author under copyright law.”

        • Not a lawyer, but have spent a fair amount of time trying to understand copyright law as it applies to these kinds of projects. My understanding is that if PyStan is under the GPL and you *modify PyStan* or *derive* a work from PyStan, then you will have to distribute the code of your *PyStan* modifications or derivative works. If PyStan works by compiling your code and linking it into the Python interpreter, then once it’s linked in it’s probably “a derived work” though I don’t think the legality of this has been clearly and unambiguously determined in court.

          However, since Stan code itself is usable in other ways, it’s not clear that the Stan code comes under GPL. For example you send a “graph maker” software, it takes as input a Stan code, and a set of graph description instructions, and it compiles and loads the Stan code, runs the code, and generates graph outputs based on the draws created by the Stan code. The Stan code itself is a totally modular component, completely separate from the “graph maker” software, and distributed as a text file. You’d need to distribute the graph maker software under GPL, but not the Stan code or the “graph description files” that the graph maker takes as input.

          If on the other hand, you pre-compile the Stan code and link it in to the graph maker, and distribute this prepackaged non-modular thing… then at this point the whole shebang is GPL, since the Stan code isn’t a separate pluggable/replaceable component.

          Again, I can’t stress enough that I’m not a lawyer or expert in this area, just someone who’s spent some time on figuring this stuff out.

        • My understanding is that if you distribute GPL software as a separate program which is not linked to your own software but you are just calling it, then GPL is *not spreading* to your software. Based on what I have read and not knowing specifics of your case, for me it seems so easy to do that separation, that I haven’t thought this as a problem for PyStan unless PyStan itself is modified. Although now as you say “I just do what my boss tells me”, I realized that it can be much easier to use CmdStan instead of 1) finding out what is and what is not allowed in case of GPL, 2) taking care that during the rest of software lifetime there is no such changes that would affect how GPL software is used, and 3) convincing everyone in the decision process that 1+2 is possible.

        • It’s more like: I’m not getting paid to do any of that, and other people *are* getting paid to know what’s safe and unsafe in the world of code re-use. It wasn’t even my job to write the CmdStan wrapper in python; another dev took that on.

        • You can distribute your .stan file standalone under any license you want. Or you could choose to not distribute it at all, but I think a lot of people would be disappointed that a *central bank* would not make its models publicly available (even if it has to keep some data that goes into the model confidential).

        • > that a *central bank* would not make its models publicly available

          My guess is this is WAY above Corey’s pay grade ;-) Specifically this seems like a national political issue, as is for example the way in which the US Federal Reserve sets its interest rate targets and its metrics for inflation and poverty measures and reserve requirements and etc etc.

  4. “We used the BSD license for Stan so it could be free and open source and anyone could use it inside their software, however they like.” For the use described, I don’t understand why the GPL license would be a problem. You can build a commercial service using GPL software. That’s within the bounds of the license. However, if you made changes and published a new, improved binary version of Stan then the GPL would compel you to release your improvements in source form as well. So the fact that Stan is BSD licensed seems to make no difference in the situation described here.

    The “advantage” of BSD licensing over GPL is that commercial companies can take BSD licensed code, improve it, and publish binaries *without* releasing source code. That doesn’t seem like an advantage to me. I’d rather see Stan protected by a copyleft license. If there really is a good case for BSD licensing Stan then whoever wants can appeal to the Stan copyright holders and request an exemption. So it’s easy to go from GPL to BSD, but as it is now, Stan is already BSD so the cat is out of the bag.

    The Affero version of the GPL (https://www.gnu.org/licenses/agpl-3.0.en.html) would be a different story though. The Affero GPL goes beyond the GPL and requires that commercial *services* provide complete source code.
