Died in the Wool

Garrett M. writes:

I’m an analyst at an investment management firm. I read your blog daily to improve my understanding of statistics, as it’s central to the work I do.

I had two (hopefully straightforward) questions related to time series analysis that I was hoping I could get your thoughts on:

First, much of the work I do involves “backtesting” investment strategies, where I simulate the performance of an investment portfolio using historical data on returns. The primary summary statistics I generate from this sort of analysis are mean return (both arithmetic and geometric) and standard deviation (called “volatility” in my industry). Basically the idea is to select strategies that are likely to generate high returns given the amount of volatility they experience.

However, historical market data are very noisy, with stock portfolios generating an average monthly return of around 0.8% with a monthly standard deviation of around 4%. Even samples containing 300 months of data then have standard errors of about 0.2% (4%/sqrt(300)).
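To put a number on that noise, here is a minimal simulation (an illustrative sketch only, using the round figures above and assuming iid normal monthly returns): the sample mean of a 300-month history moves around by roughly 0.2 percentage points from one hypothetical history to the next.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n_months, n_sims = 0.008, 0.04, 300, 10_000

# Simulate many 300-month return histories and look at how much
# the sample mean varies from history to history.
sample_means = rng.normal(mu, sigma, size=(n_sims, n_months)).mean(axis=1)

print(f"theoretical SE: {sigma / np.sqrt(n_months):.4%}")         # ~0.23%
print(f"simulated SD of sample means: {sample_means.std():.4%}")  # ~0.23%
```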

My first question is, suppose I have two time series. One has a mean return of 0.8% and the second has a mean return of 1.1%, both with a standard error of 0.4%. Assuming the future will look like the past, is it reasonable to expect the second series to have a higher future mean than the first out of sample, given that its sample mean is 0.3% greater? The answer might be obvious to you, but I commonly see researchers make this sort of determination, when it appears to me that the data are too noisy to draw any conclusion about series whose means are within a couple of standard errors of each other (ignoring for now any additional issues with multiple comparisons).
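For reference, here is the arithmetic behind "too noisy" (a back-of-the-envelope sketch with the numbers above; it treats the two series as independent, which for two stock portfolios is itself a questionable assumption):

```python
import numpy as np
from scipy import stats

mean_a, mean_b, se = 0.008, 0.011, 0.004

# Standard error of the difference between two independent sample means
se_diff = np.sqrt(se**2 + se**2)             # ~0.57%
z = (mean_b - mean_a) / se_diff              # ~0.53 standard errors

# If the two true means were equal, how often would a gap of at least
# 0.3% show up in-sample just from sampling noise?
p_gap_by_chance = 2 * stats.norm.sf(abs(z))  # ~0.60

print(f"observed gap {mean_b - mean_a:.2%}, SE of the gap {se_diff:.2%}")
print(f"z = {z:.2f}, two-sided p = {p_gap_by_chance:.2f}")
```

So the observed 0.3% gap is about half a standard error of the gap, which is the sense in which the in-sample ordering could easily be noise.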

My second question involves forecasting standard deviation. There are many models and products used by traders to determine the future volatility of a portfolio. The way I have tested these products has been to record the percentage of the time future returns (so out of sample) fall within one, two, or three standard deviations, as forecasted by the model. If future returns fall within those buckets around 68%/95%/99% of the time, I conclude that the model adequately predicts future volatility. Does this method make sense?
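For concreteness, the coverage check described above might look something like this (a sketch; forecast_sd stands in for whatever one-step-ahead volatility forecasts the model produces, aligned with the realized out-of-sample returns, and the 68/95/99.7 targets assume a normal forecast distribution):

```python
import numpy as np

def coverage_check(realized, forecast_mean, forecast_sd):
    """Fraction of out-of-sample returns within 1, 2, 3 forecast SDs."""
    z = (realized - forecast_mean) / forecast_sd   # standardized forecast errors
    for k, nominal in zip((1, 2, 3), (0.683, 0.954, 0.997)):
        print(f"within {k} sd: {np.mean(np.abs(z) <= k):.1%}  (nominal {nominal:.1%})")

# Toy usage with simulated data standing in for real forecasts and returns
rng = np.random.default_rng(1)
forecast_sd = rng.uniform(0.02, 0.06, size=240)   # hypothetical monthly vol forecasts
realized = rng.normal(0.008, forecast_sd)         # "realized" out-of-sample returns
coverage_check(realized, 0.008, forecast_sd)
```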

My reply:

In answer to your first question, I think you need a model of the population of these time series. You can get different answers from different models. If your model is that each series is a linear trend plus noise, then you’d expect (but not be certain) that the second series will have a higher future return than the first. But there are other models where you’d expect the second series to have a lower future return. I’d want to set up a model allowing all sorts of curves and trends, then fit the model to past data to estimate a population distribution of those curves. But I expect that the way you’ll make real progress is to have predictors—I guess they’d be at the level of the individual stock, maybe varying over time—so that your answer will depend on the values of these predictors, not just the time series themselves.
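To illustrate the "different answers from different models" point with the simplest possible population model (a sketch, not a recommendation: each series' true mean return is assumed to be drawn from a normal population, and both the population mean mu_pop and its spread tau are made-up values here), the expected future gap between the two series depends heavily on how spread out you believe the population of true means to be:

```python
import numpy as np

def shrunk_mean(sample_mean, se, mu_pop, tau):
    """Posterior mean of a series' true mean under a normal-normal model."""
    precision = 1 / se**2 + 1 / tau**2
    return (sample_mean / se**2 + mu_pop / tau**2) / precision

se = 0.004  # standard error of each sample mean, as in the question
for tau in (0.001, 0.003, 0.010):  # assumed spread of true means across strategies
    a = shrunk_mean(0.008, se, mu_pop=0.008, tau=tau)
    b = shrunk_mean(0.011, se, mu_pop=0.008, tau=tau)
    print(f"tau = {tau:.1%}: A -> {a:.2%}, B -> {b:.2%}, expected future gap {b - a:.2%}")
```

Under this particular model the second series never comes out worse, just less impressive; a model with, say, mean reversion in performance could reverse the ordering, which is the sense in which the answer is model-dependent.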

In answer to your second question, yes, sure, you can check the calibration of your model using interval coverage. This should work fine if you have lots of calibration data. If your sample size becomes smaller, you might want to do something using quantiles, as described in this paper with Cook and Rubin, as this will make use of the calibration data more efficiently.
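One quantile-style check in that spirit (a sketch, not the exact procedure in the paper; it assumes the volatility model issues a full normal forecast distribution each period): compute where each realized return falls in its forecast distribution and test those values for uniformity, which uses each observation's exact position rather than only which of three buckets it landed in.

```python
import numpy as np
from scipy import stats

def quantile_calibration(realized, forecast_mean, forecast_sd):
    """PIT-style check: forecast quantiles of realized returns should be ~Uniform(0,1)."""
    q = stats.norm.cdf(realized, loc=forecast_mean, scale=forecast_sd)
    return q, stats.kstest(q, "uniform")   # KS test of departure from uniformity

# Toy usage with simulated data standing in for real forecasts and returns
rng = np.random.default_rng(2)
forecast_sd = rng.uniform(0.02, 0.06, size=120)
realized = rng.normal(0.008, forecast_sd)
q, ks = quantile_calibration(realized, 0.008, forecast_sd)
print(f"KS statistic {ks.statistic:.3f}, p-value {ks.pvalue:.2f}")
```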

15 thoughts on “Died in the Wool”

  1. The question makes me wonder whether there are some models being assumed – and ones that may not be warranted. The 1, 2, and 3 standard deviation bands sound like normal distributions may be assumed for the returns. That is probably unwise, given that the lack of fatter tails virtually rules out “black swans” and other rare events. More importantly, there are readily available tools to estimate more fully what the returns distribution might look like. The question sounds like only summary statistics (like the mean and standard deviation) are being estimated rather than the whole distribution. I think it makes more sense at least to fit distributions to the historical data before resorting to summary measures.
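One way to act on that suggestion (a sketch; the synthetic returns series below just stands in for whatever historical data would actually be used): fit both a normal and a Student-t to the returns and compare how much probability each puts on a large loss.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Synthetic heavy-tailed monthly returns standing in for historical data
returns = 0.008 + 0.03 * rng.standard_t(df=4, size=600)

norm_params = stats.norm.fit(returns)   # (loc, scale)
t_params = stats.t.fit(returns)         # (df, loc, scale)

loss = returns.mean() - 3 * returns.std()   # a "3-sigma" monthly loss
print(f"P(return < 3-sigma loss), fitted normal   : {stats.norm.cdf(loss, *norm_params):.3%}")
print(f"P(return < 3-sigma loss), fitted Student-t: {stats.t.cdf(loss, *t_params):.3%}")
```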

    • “There are no populations in finance”

      I think that’s true in several senses, but it’s so far from the finance worldview that it’s really hard to convey to people who are working in a stochastic-calculus or iid-returns worldview.

      In particular, the finance industry has that whole “historical performance is not a guarantee of future performance” marketing shtick… but then goes on to treat different times as if they were random draws from a common pool of events. Although there is some merit to learning how things behave from past events, at any given time there is always the possibility that the financial behavior of the world will change dramatically… A politician takes office and suddenly pushes totally different policies, a natural disaster takes place and knocks out sources of fuel or power plants or manufacturing facilities, a broad scandal erupts in financial accounting practices affecting hundreds of companies… there’s no reason why the future needs to work like the past.

  2. The traditional Black-Scholes style framework was developed assuming random walk/lognormal stock prices. If you have a drift plus volatility geometric Brownian motion, the volatility term will typically swamp the drift term over short intervals (if the parameters are calibrated to historical returns); a rough numerical sketch of this appears after this comment. It’s pretty well acknowledged that stock prices have much larger jumps than would be predicted by the lognormal, and that there’s some autocorrelation in the returns (“bull” and “bear” markets for example).

    Backtesting is usually based on modern portfolio theory/mean variance optimization. This framework ignores higher moments (3rd and above) entirely and assumes that volatility alone captures the risk. Yet in financial data you will routinely see movements that are statistically nearly impossible. And these aren’t even “black swans” because we’ve observed this over and over. The “optimal” aggregate portfolio further requires estimating linear correlations between different assets, but the dependence isn’t stable or well explained by linear correlation.
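To put rough numbers on the drift-versus-volatility point above (a sketch using the 0.8% monthly mean and 4% monthly volatility from the original question, treating them as the drift and volatility of log returns under a random walk): the expected drift grows linearly with the horizon while a one-sigma move grows with its square root, so volatility dominates until the horizon reaches roughly (sigma/mu)^2 = 25 months.

```python
import numpy as np

mu, sigma = 0.008, 0.04   # monthly drift and volatility, from the original question

for months in (1, 3, 12, 25, 60):
    drift = mu * months               # expected cumulative return over the horizon
    band = sigma * np.sqrt(months)    # one-sigma move over the same horizon
    print(f"{months:3d} months: drift {drift:6.1%} vs one-sigma move {band:6.1%}")
```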

      • The normal distribution is very thin tailed. Six standard deviations out is something like a one in a billion chance. But in finance things that should happen, say, once a century based on historical volatility show up far more frequently.

        A dramatic example would be the 1987 crash, where the Dow dropped over 20% in one day. How many sigmas was that? There were many examples in 2008 as well, which also demonstrated the limitations of correlation matrices: the supposed “diversification benefit” based on low historical correlations proved illusory.
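For concreteness (a sketch; the roughly 1% daily standard deviation below is an assumed ballpark figure for illustration, not a number from the thread): under a normal model a six-sigma move is indeed about a one-in-a-billion event, and a 20% one-day drop against 1% daily volatility is a 20-sigma move with an essentially zero normal probability.

```python
from scipy import stats

# One-sided normal tail probability at six standard deviations
print(f"P(Z < -6) = {stats.norm.sf(6):.2e}")   # ~9.9e-10, about one in a billion

daily_vol = 0.01   # assumed ~1% daily volatility (ballpark, for illustration only)
crash = -0.20      # "over 20%" one-day drop, as in the 1987 example
z = crash / daily_vol
print(f"z = {z:.0f}, P(Z < {z:.0f}) = {stats.norm.sf(abs(z)):.2e}")   # ~2.7e-89
```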

        • That doesn’t mean these events are ‘statistically nearly impossible’, it just means the statistical distribution is not iid normal.

        • They are nearly impossible in “this framework [which] ignores higher moments (3rd and above) entirely and assumes that volatility alone captures the risk”, which was the sentence preceding the “Yet in financial data…”. Of course, the statement doesn’t make any sense if it’s not conditional on some model.
