
Curve fitting on the web

We once collected the following data from a certain chemical process:

[Figure: optical_density.png]

The curve looks smooth and could be governed by some meaningful physical law. But what would be a good model? There are probably quite a few physical laws that would fit the observed data very well. Wouldn’t it be nice if a piece of software examined a large number of known physical laws and checked each of them against this data? ZunZun.com is such a piece of software, and it runs directly from the browser. After plugging my data in, ZunZun gave me a ranked list of functions that fit it, and the best ranked was the “Gaussian peak with offset” (y = a * exp(-0.5 * (x-b)^2 / c^2) + d):

[Figure: DepDataVsIndepData1243490136505.png]

Number two was “Sigmoid with offset” (y = a / (1.0 + exp(-(x-b)/c)) + d).
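
For readers who want to experiment offline, here is a minimal sketch of the two model functions above in plain Python, with a sum-of-squared-residuals helper for comparing how well a given parameter set matches a data set. This is not ZunZun's code: the function names, the parameter values, and the synthetic data are assumptions made up for illustration, and the measurements from this post are not reproduced. Finding the best parameters still needs an optimizer (for example, `scipy.optimize.curve_fit`), which is left out here.

```python
import math


def gaussian_peak_with_offset(x, a, b, c, d):
    """y = a * exp(-0.5 * (x - b)^2 / c^2) + d"""
    return a * math.exp(-0.5 * (x - b) ** 2 / c ** 2) + d


def sigmoid_with_offset(x, a, b, c, d):
    """y = a / (1 + exp(-(x - b) / c)) + d"""
    return a / (1.0 + math.exp(-(x - b) / c)) + d


def sum_squared_residuals(model, params, xs, ys):
    """Sum of squared differences between the data and the model's predictions."""
    return sum((y - model(x, *params)) ** 2 for x, y in zip(xs, ys))


# Synthetic illustration data (NOT the measurements from the post):
# noise-free samples of a Gaussian peak with made-up parameters.
xs = [0.5 * i for i in range(21)]
illustrative_params = (2.0, 5.0, 1.5, 0.3)  # hypothetical a, b, c, d
ys = [gaussian_peak_with_offset(x, *illustrative_params) for x in xs]

# The generating model reproduces its own data exactly; the sigmoid,
# evaluated with the same (unfitted) parameters, does not.
sse_gauss = sum_squared_residuals(gaussian_peak_with_offset, illustrative_params, xs, ys)
sse_sigmoid = sum_squared_residuals(sigmoid_with_offset, illustrative_params, xs, ys)
print(sse_gauss, sse_sigmoid)
```

A smaller sum of squared residuals means a closer fit, but on its own it says nothing about whether the extra parameters of a more complex model are justified.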

In all, ZunZun may help you find a good nonlinear model when all you have is data.

8 Comments

  1. I'm glad you found my hobby web site useful. I'm currently working on adding fit statistics for the fitted parameter estimations. Any suggestions or requests for the site?

    James

  2. Pascal PERNOT says:

    Nice fit, but from a chemical point of view, this will not be very helpful, in the sense that the fitting function is completely unrelated to chemical laws…

  3. bob says:

    Do they return AIC, BIC, DIC, or CV error to help decide which models are good fits and which are overfitting? Is an N-th order polynomial (where N = number of points) an option?

  4. Aleks says:

    James, excellent job! The single best suggestion is actually Bob's: AIC (fast) or cross-validation (less fast) would be a very useful addition, as I've had some problems when complex polynomials were fitted to a small data set. AIC penalizes the number of parameters explicitly, and CV does so implicitly through the held-out error, so one needs more data to compensate for the complexity. If you do cross-validation, I would recommend performing several replications to increase stability.

    Pascal, well, fitting a simple function might help us identify the underlying laws more easily than a mass of noisy data points would. But I agree that a functional dependency is not yet a law.

  5. OneEyedMan says:

    Do you consider it a problem that with a big enough library of functions, you can describe anything, essentially overfitting your data?

    If I recall correctly, fully describing a function without knowing it explicitly requires estimating its infinite derivatives, just as doing so for a probability distribution requires infinite moments.

    Is this just another moment in model making where you have to manage the reasonableness of the complexity of the model against the quality of the fit?

  6. Aleks says:

    OneEyedMan, a Bayesian would assign priors to different functions. Some functions would be a priori more likely than others. AIC, for example, is analogous to a prior in the sense that it favors models with fewer parameters.

    Sometimes, however, you need 'metadata' – the knowledge about the nature of your process, beyond the measurements themselves.

  7. Currently adding the fit statistics from http://scipy.org/Cookbook/OLS, which include AIC.

    As soon as I finish the covariance matrix for nonlinear functions I'll add this work to the site (http://zunzun.com) and the BSD-licensed source code download (http://sf.net/projects/pythonequations).

    I expect to be done this week.

    James

  8. Wrapping up fit statistics now.
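
The AIC comparison suggested in comments 3, 4 and 6 can be sketched as follows. This is only an illustration, not the fit statistics ZunZun computes: it assumes least-squares fits with independent Gaussian errors, for which AIC is, up to an additive constant, n * ln(RSS / n) + 2k, where RSS is the residual sum of squares, n the number of data points and k the number of fitted parameters. The function name and the RSS values below are hypothetical.

```python
import math


def aic_least_squares(rss, n_points, n_params):
    """AIC for a least-squares fit, up to an additive constant.

    Assumes independent Gaussian errors: AIC = n * ln(RSS / n) + 2k.
    """
    if rss <= 0 or n_points <= 0:
        raise ValueError("rss and n_points must be positive")
    return n_points * math.log(rss / n_points) + 2 * n_params


# Hypothetical numbers: a 4-parameter model and a 10-parameter polynomial
# fitted to the same 20 points. The polynomial fits slightly better.
n = 20
aic_simple = aic_least_squares(rss=0.8, n_points=n, n_params=4)
aic_complex = aic_least_squares(rss=0.7, n_points=n, n_params=10)

# Lower AIC is better. The small gain in fit does not pay for six
# extra parameters, so the simpler model is preferred here.
print(aic_simple < aic_complex)
```

With small data sets, such as the one in this post, the corrected variant AICc is often recommended instead, since plain AIC tends to under-penalize complex models when n is not much larger than k.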