Polynomials can approximate the function in [a,b], but if we know that the process itself has an upper and lower bound, f(q) < H and f(q) > L, then polynomials are guaranteed to be an arbitrarily bad approximation outside [a,b], because any nonconstant polynomial grows without bound for large x.
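
A minimal sketch of this point in Python with NumPy. The logistic function is only a stand-in for a bounded process (L = 0, H = 1), and the interval, polynomial degree, and evaluation points are arbitrary choices for illustration, not anything taken from the discussion:

```python
import numpy as np

def bounded_process(x):
    # Stand-in for the true process: bounded below by L = 0 and above by H = 1.
    return 1.0 / (1.0 + np.exp(-x))

a, b = -2.0, 2.0
x_obs = np.linspace(a, b, 41)
y_obs = bounded_process(x_obs)

# Least-squares polynomial of degree 5, fitted using data from [a, b] only.
poly = np.polynomial.Polynomial.fit(x_obs, y_obs, deg=5)

# Worst-case error at the observed points inside [a, b].
inside_error = np.max(np.abs(poly(x_obs) - y_obs))

# Error at points increasingly far outside [a, b].
x_far = np.array([10.0, 20.0, 40.0])
outside_error = np.abs(poly(x_far) - bounded_process(x_far))

print(f"max error inside [a, b]: {inside_error:.2e}")
for x, err in zip(x_far, outside_error):
    print(f"error at x = {x:g}: {err:.3g}")
```

Inside the interval the fit is close. Outside it, the fitted values leave the range [L, H] and the error keeps growing with distance, because the highest-degree term eventually dominates.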

Yes, given typical regularity assumptions, close to the interval [a,b] we can often do OK. This applies in lots of cases (e.g., estimating total sales next week from daily data for the last 5 years), but the regularity assumptions are themselves SUBSTANTIVE. There are lots of problem areas where strong smoothness assumptions don’t apply. For example, take a feedback control system for a gas turbine engine. How rapidly can the temperature increase? Well, if your feedback control system opens the throttle to max, maybe it can explode in 7 milliseconds? I don’t know. Can we tell the difference between a function that oscillates wildly up and down and a function that is nearly constant but measured with a lot of measurement error? Only via substantive assumptions about the underlying process we’re modeling.

We’d like to say something about f(b-epsilon) and f(b+epsilon) but don’t have a data point there. Intuitively, it seems approximation error bounds for the interpolation should be better because we can approach b-epsilon from both b and x’, but b+epsilon only from b. If we assume nothing about f we can’t say anything, but any smoothness condition probably buys us lower approx error for b-epsilon. Not sure whether that can be made precise.

Still, it seems extrapolation shouldn’t be much more error prone than interpolation within a distance equal to the typical distance between points over which you interpolate.
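
One way this could be made precise, under assumptions that are not in the comment itself: take linear interpolation p_1 through the two nearest points x’ and b, write h = b - x’ for their spacing, and assume f is twice continuously differentiable with |f''| <= M on [x’, b+epsilon]. The standard interpolation error formula then applies on both sides of b, with some xi in that interval:

```latex
f(x) - p_1(x) = \frac{f''(\xi)}{2}\,(x - x')(x - b)
\quad\Longrightarrow\quad
\begin{aligned}
|f(b-\epsilon) - p_1(b-\epsilon)| &\le \frac{M}{2}\,\epsilon\,(h - \epsilon), \\
|f(b+\epsilon) - p_1(b+\epsilon)| &\le \frac{M}{2}\,\epsilon\,(h + \epsilon).
\end{aligned}
```

Under these assumptions the extrapolation bound is larger by a factor of (h+epsilon)/(h-epsilon), which is close to 1 when epsilon is small relative to h. At epsilon = h the extrapolation bound reaches M h^2, eight times the worst-case interpolation bound of M h^2 / 8. These are bounds rather than actual errors, and they say nothing once the smoothness assumption fails.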

Now, suppose you have a real-world problem where you’ve measured f with errors on the interval [a,b]. The data tell you approximately where your function should be within the interval (i.e., some measure of the errors between f(x) and the data d(x) shouldn’t be too big).

However, what happens for x values outside [a,b]?

The ONLY way to solve this problem is to have substantive knowledge about how f(x) should behave as it goes outside the interval [a,b]. So for example, if we’re talking about the concentration of a typical drug, f(x) should go to zero as x goes to infinity and your body excretes or metabolizes the drug. The range of rates is determined by substantive information about biochemistry and metabolism. If we’re talking about sales, we know at least that they shouldn’t go to infinity in finite time, because there is no such thing as infinite sales. We may know things like historical rates of change, and be able to apply some information about how fast things could change, etc. It’s SUBSTANTIVE knowledge that tells you how your function should behave outside the measured range.

The truth is, in many cases it just doesn’t matter what you use to interpolate through a data-rich region; lots of things work well. But it’s absolutely impossible for a pure function approximation technique to determine how best to predict outside the range of the data.

If you have no clue about the underlying dynamics, and you just run a huge cross-validation race between parameterizations, you are screwed, unless the data set is very large (which here it isn’t) and you know for a fact (which you don’t) that the underlying data-generating process (DGP) is stable and properly sampled by the available data.

“looks like the choice was completely arbitrary. They would say: ‘now we will fit a curve to the data using multivariate adaptive regression splines.’ But nowhere is it explained why they used such a method instead of, let’s say, kernel regression or Fourier analysis or a neural network.”

Yes — explanation of why the method was used is so often missing; in many cases, probably because the author didn’t have a very good reason for the choice. This is a sad situation — authors need to think about why they are using a particular method (and several comments above have given advice on what needs to go into the choice) AND they need to explain why they made that choice. Without the reasoning and without the explanation of the reasoning, the result is scientifically just a house of cards.

If you can’t figure out some substantive information to use that would be different in those different circumstances, you should move on to another field other than mathematical modeling and statistics.

I like your suggestions but let me emphasize that, although you say “not get too far away from the data,” it would be more accurate to say “not get too far away from the data and your prior information.” The data are just a bunch of numbers. Issues such as seasonality, similarity across products, population growth, etc. . . . these all come from prior knowledge.

I’m not just being picky here; this is important because we spend lots of time trying to automate our procedures, either formally with algorithms or informally in textbooks. And when you look at formalizations of statistics, they tend to minimize prior information. Instead you’ll see models chosen entirely from features internal to the data such as sample size, data type (binary, counts, continuous), censoring, etc.

1. Seasonality. Understand it. Get rid of it; in particular, that will make step 2 easier.

2. Trend: do you have any? Is it similar across products? Does it change a couple of times in the backdata?

2a. If there’s a trend, what seems to drive it? Is it population growth, number of diesel vehicles, or …? This isn’t so much a statistical search as a conceptual one, validated by statistics.

3. Start simple. See what you get with Holt’s model, with cross-validation. See Hyndman’s textbook for examples (and his R forecasting package).

Any way you slice it, you are assuming some model of the past can be extended into the future. That’s why skepticism about a forecast is always warranted.
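
As a rough illustration of step 3, here is a hand-rolled version of Holt’s linear trend method with a rolling-origin check, in Python with NumPy. The synthetic series, the smoothing-parameter grid, and the function names are all made up for this sketch, and the series is assumed to be already deseasonalized as in step 1. In practice a tested implementation, such as the forecasting package mentioned above, would be the sensible choice:

```python
import numpy as np

def holt_forecast(y, alpha, beta, horizon):
    """Holt's linear trend method: forecasts for 1..horizon steps past the end of y."""
    level, trend = y[0], y[1] - y[0]  # simple initial values
    for obs in y[1:]:
        prev_level = level
        level = alpha * obs + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + trend * np.arange(1, horizon + 1)

def rolling_origin_mae(y, alpha, beta, horizon, min_train):
    """Mean absolute forecast error over successive forecast origins."""
    errors = []
    for end in range(min_train, len(y) - horizon + 1):
        forecast = holt_forecast(y[:end], alpha, beta, horizon)
        errors.append(np.abs(forecast - y[end:end + horizon]))
    return float(np.mean(errors))

# Hypothetical deseasonalized weekly sales: linear trend plus noise.
rng = np.random.default_rng(0)
weeks = np.arange(80)
sales = 100.0 + 2.0 * weeks + rng.normal(0.0, 5.0, size=weeks.size)

# Small, arbitrary grid of smoothing parameters to compare.
grid = [(a, b) for a in (0.2, 0.5, 0.8) for b in (0.1, 0.3, 0.5)]
scores = {ab: rolling_origin_mae(sales, ab[0], ab[1], horizon=4, min_train=20)
          for ab in grid}
best_alpha, best_beta = min(scores, key=scores.get)
print(f"best alpha={best_alpha}, beta={best_beta}, "
      f"MAE={scores[(best_alpha, best_beta)]:.2f}")
```

Each forecast here uses only data before its origin, so the score reflects forecasting rather than in-sample fit. It still assumes the future resembles the recent past, which is exactly the caveat above.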

NY resolution: read more documentation.

stan_gamm4(sales ~ s(time_trend, by = product), data = yourdata, random = ~(1 | product))

To benchmark your other attempts, perhaps start with this simple model using the rstanarm package and see how you go:

# period_of_year is a placeholder name for a week-of-year or month-of-year column
model_fit <- stan_lmer(sales ~ time_trend + (1 + time_trend | product) + (1 + time_trend | period_of_year), data = yourdata)

Where “time_trend” might be some simple transform of a linear trend (say, a Box-Cox transform or a logistic curve; some visualisation of your data might help) to capture basic non-linearities. As Kenneth mentioned, you might want to include any exogenous factors that you think affect sales, or things like advertising spend.

The issue with this model is that your random effects mightn’t be exchangeable. Ask the question: would knowing the value of the time trend (or the exogenous factors, or ad spend) provide information about the week/month of the year? If so, you’ll need to make a correction a la Andrew’s paper with Bafumi.

When you say “demand for a product,” you mean (1) people spend (2) money on this (3) stuff. That suggests you should be thinking about (1) demographics, (2) economics, and (3) whatever is special about your product(s). Any one of those has to be a better X than the number on the calendar.

For example, demographics: Do kids buy this? Do people with kids buy it? Single? Elderly? All of those are highly predictable, and you must know who is buying the stuff.

Economics: Do you buy more of these if you’re feeling optimistic about your future income, or fewer?

The product: If you have one, are you more or less likely to buy another one? Do you consume them, or do you keep them, and for how long? Does somebody advertise this stuff?

The moral here is the same as carpentry: Think first, pick up the tool afterwards.

OTOH, there may be good reasons to think that seasonal variations will be similar next year to this year. So making seasonal corrections would likely be helpful.

Demand for a product can be affected by things out of your control, like fashion or weather. If you can find some good correlates like that, you can track them as sentinels to warn that your extrapolations may start going off course.

Check out the SuperLearner literature, e.g. https://cran.r-project.org/web/packages/SuperLearner/index.html

(I’m not endorsing it–haven’t studied it enough to–but this is exactly what they are advocating)

There is plenty of theoretical literature that can be used rather than “curve fitting.” If these are comparable to many consumer durable products, Bass diffusion models are a good place to begin. If they are new products and are still on the rapid diffusion part of the cycle, then a good fitting curve will soon perform very badly, since the slope is likely to flatten pretty soon. On the other hand, if there appears to be a strong seasonal pattern with some stable trend, then more traditional short-term forecasting models would seem appropriate (decompose the series into seasonality and trend and fit some model to the remaining random fluctuations). That brings up another important question – what time frame do you want the forecast to cover?
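
For reference, the Bass model’s cumulative adoption curve can be sketched in a few lines of Python with NumPy. The parameter values below (innovation coefficient p, imitation coefficient q, market size m) are illustrative assumptions, not estimates for any real product. The sketch also shows the failure described above: a straight line fitted to the early, fast-growing part of the curve keeps climbing after adoption has flattened out.

```python
import numpy as np

def bass_cumulative_fraction(t, p, q):
    """Fraction of the eventual market that has adopted by time t (Bass model)."""
    decay = np.exp(-(p + q) * t)
    return (1.0 - decay) / (1.0 + (q / p) * decay)

# Illustrative, assumed parameters: not fitted to any data.
p, q, m = 0.03, 0.38, 10_000

# Cumulative sales over the early periods, while growth is still accelerating.
t_early = np.arange(0, 7)
sales_early = m * bass_cumulative_fraction(t_early, p, q)

# A straight line fitted to the early data by least squares.
slope, intercept = np.polyfit(t_early, sales_early, 1)

# Compare the two at a later time.
t_late = 20
line_at_late = slope * t_late + intercept
bass_at_late = m * bass_cumulative_fraction(t_late, p, q)

print(f"linear extrapolation at t={t_late}: {line_at_late:.0f}")
print(f"Bass curve at t={t_late}: {bass_at_late:.0f} (market size {m})")
```

With these made-up parameters the linear extrapolation ends up above the total market size m, while the Bass curve saturates just below it.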

Thinking about all this makes me believe the initial question was posed wrongly. There is no simple answer to the question “what method to use?” At some point the question will become technical, as in “what curve fits best and how do I measure that?” But long before you get to that point are a number of (in my mind) more important questions about the nature of what you are forecasting, how stable the underlying dynamics are (e.g., is a disruptive technology just over the horizon?), what the purpose of the forecast is, etc. etc. Without worrying about these important issues there can be no simple answer to the question of what curve to fit to the data.
