
best algorithm EVER !!!!!!!!


Someone writes:

On the website you find a lot of material for Optimal (or “optimizing”) Data Analysis (ODA) which is described as:

In the Optimal (or “optimizing”) Data Analysis (ODA) statistical paradigm, an optimization algorithm is first utilized to identify the model that explicitly maximizes predictive accuracy for the sample, and then the resulting optimal performance is evaluated in the context of an application-specific exact statistical architecture. Discovered in 1990, the first and most basic ODA model was a distribution-free machine learning algorithm used to make maximum accuracy classifications of observations into one of two categories (pass or fail) on the basis of their score on an ordered attribute (test score). When the first book on ODA was written in 2004 a cornucopia of indisputable evidence had already amassed demonstrating that statistical models identified by ODA were more flexible, transparent, intuitive, accurate, parsimonious, and generalizable than competing models instead identified using an unintegrated menagerie of legacy statistical methods. Understanding of ODA methodology skyrocketed over the next decade, and 2014 produced the development of novometric theory – the conceptual analogue of quantum mechanics for the statistical analysis of classical data. Maximizing Predictive Accuracy was written as a means of organizing and making sense of all that has so-far been learned about ODA, through November of 2015.
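The most basic model described in that quote (a maximum-accuracy cutpoint on one ordered attribute for a two-category outcome) can be pictured as an exhaustive search over every possible cutpoint. What follows is a minimal sketch under stated assumptions, not ODA LLC's software: it maximizes raw training accuracy, it does not reproduce ODA's own objective or its exact p-values, and the function name and data are hypothetical.

```python
from typing import List, Tuple


def best_cutpoint(scores: List[float], labels: List[int]) -> Tuple[float, int, float]:
    """Exhaustive maximum-accuracy cutpoint for a binary class and one ordered attribute.

    Returns (cutpoint, direction, training_accuracy), where direction = 1 means
    "predict class 1 when score > cutpoint" and direction = -1 means
    "predict class 1 when score <= cutpoint".
    """
    distinct = sorted(set(scores))
    # Candidate cutpoints: one below the minimum (the "predict one class for everyone"
    # rules) plus the midpoint between every pair of adjacent distinct scores.
    candidates = [distinct[0] - 1.0]
    candidates += [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]

    best = (candidates[0], 1, -1.0)
    for cut in candidates:
        for direction in (1, -1):
            correct = 0
            for s, y in zip(scores, labels):
                above = s > cut
                pred = int(above) if direction == 1 else int(not above)
                correct += pred == y
            accuracy = correct / len(scores)
            if accuracy > best[2]:  # strict ">" keeps the first of tied solutions
                best = (cut, direction, accuracy)
    return best


if __name__ == "__main__":
    # Hypothetical test scores with pass (1) / fail (0) outcomes.
    scores = [55, 60, 62, 70, 75, 80, 85, 90]
    passed = [0, 0, 0, 1, 0, 1, 1, 1]
    print(best_cutpoint(scores, passed))  # (66.0, 1, 0.875)
```

Because every candidate cutpoint is tried, the training accuracy returned is the maximum achievable for this rule family; how well that number generalizes is a separate question.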

I found a paper in which a comparison of several machine learning algorithms reveals that a classification tree analysis based on ODA approach delivers best classification results (compared to binary regression, random forest, SVM, etc.)

So far, based on given information, it sounds pretty appealing – do you see any pitfalls? – would you recommend it for using in data analysis when I want to achieve accurate predictions?

My reply: I have no idea. It seems like a lot of hype to me: “discovered . . . cornucopia . . . menagerie . . . skyrocketed . . . novometric theory . . . conceptual analogue of quantum mechanics.”

But, hey, something can be hyped and still be useful, so who knows? I’ll leave it for others to make their judgments on this one.


  1. Anoneuoid says:

    I didn’t read the entire thing, but this is about ML classification techniques, and there is no mention of xgboost. It is probably not superior.

    • ODA replaces ML techniques. And GLM techniques. And legacy parametric methods.

      And ODA does things that no other statistical paradigm can do. This has been shown over and again. In hundreds of articles from dozens of labs on many continents. Over decades!

      ODA is by definition superior, if superior is the objective function. This is math (operations research) with an exact statistical cloak. There is no room for uninformed comments in math. And no true scientist makes uninformed comments about something as important as a new statistical paradigm! A real scientist walks the walk, reads and compares, works…

      • Anoneuoid says:

        It would be great if you created a report on where it beats xgboost on a binary classification task. Then send the code, etc to Andrew so we can discuss the outcome here.

        • There are hundreds of examples of ODA used with binary data. Compare whatever you wish however you wish. If you want to pay for the analysis, we are a statistical consulting company.

          • Mike says:

            I am also interested in a comparison between xgboost and ODA.

            Please work on such a report. If ODA does better than xgboost, it would be a great advertisement for ODA. Please go for it. It’s your product, so it’s your job (not someone else’s job).

        • Mike says:

          I am also interested in such a comparison. Please work on such a report. Then we can have a more concrete, fact-based discussion. If ODA is better than xgboost, it would be great advertising for ODA. It should be done by you, not by others (it’s your product).

          • “It should be done by you, not by others (it’s your question)”

            a. Select an example involving binary data that was analyzed by ODA.

            b. Construct the data set.

            c. Do whatever you wish. The ODA analysis was already done.

            END OF THREAD

            • Darn it!

              Mike, I should explain:

              I don’t use my personal finite time doing anything other than ODA. In gestalt I am interested in finding the model that presents the best combination of predictive accuracy (normed against chance) and parsimony, as indexed by the D statistic. However, models of different complexity may be appropriate based upon statistical power (an exclusion criterion) and theoretical clarity or pragmatic significance (inclusion criteria). I know that the best any present model can do is explicitly identified, so there is no guesswork. That is,

              1. If accuracy is defined as in the ODA paradigm, then in training analysis ODA will find the best model.

              2. If accuracy is defined as in the ODA paradigm, and (as in novometrics) if one is only interested in validity performance, then CTA software allows the operator to set either of two criteria: (a) find the best model that has identical training and jackknife (or any other validity criterion) performance; or (b) find the best model that has highest jackknife (or whatever) performance with experimentwise (or whatever) p<0.05 (or whatever). The software allows operator control of many constraints, there are ODA articles on this, and of course the book synthesizes the matter…
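              “Predictive accuracy (normed against chance)” refers, as I understand the ODA literature, to effect strength for sensitivity (ESS): the mean of the per-class sensitivities, rescaled so that 0 is chance and 100 is perfect classification. The sketch below assumes that definition; it is not taken from the CTA software, and the D statistic (which additionally accounts for model complexity) is not reproduced.

```python
from typing import List


def effect_strength_for_sensitivity(actual: List[int], predicted: List[int]) -> float:
    """ESS: mean per-class sensitivity rescaled so chance = 0 and perfect = 100.

    Assumed definition: ESS = 100 * (mean_PAC - 100/C) / (100 - 100/C), where
    mean_PAC is the mean percentage correctly classified within each of the
    C classes. Requires at least two classes in `actual`.
    """
    classes = sorted(set(actual))
    sensitivities = []
    for k in classes:
        preds_for_k = [p for a, p in zip(actual, predicted) if a == k]
        hits = sum(1 for p in preds_for_k if p == k)
        sensitivities.append(100.0 * hits / len(preds_for_k))
    mean_pac = sum(sensitivities) / len(classes)
    chance = 100.0 / len(classes)
    return 100.0 * (mean_pac - chance) / (100.0 - chance)


# Predicting the majority class for everyone: 75% raw accuracy, but ESS = 0.
print(effect_strength_for_sensitivity([0, 0, 0, 1], [0, 0, 0, 0]))  # 0.0
```

The usage line shows why norming matters: a rule that simply predicts the majority class can look accurate in raw terms while doing no better than chance.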

              I look in books or articles for data sets. When I find a data set, sometimes it is analyzed using XYZ method. So, I summarize the findings reported in the article using XYZ, and then re-analyze the data using ODA. If you have such a data set, we certainly could talk about a collaborative paper–I do this for fun, and to learn more about my trade, other methods, applied results. Right now I am a bit swamped–why I must return to work.

              If I find a data set that was not analyzed, I only use ODA to take a look.

              I can't do everything. I know, I tried, I failed…

  2. Torquemada in Training says:

    The British popular science magazine “New Scientist”, for the amusement and edification of its readers, keeps an eye out for the modern buzzwords snake oil salesmen depend on to dazzle potential customers. “Quantum” is the go-to adjective these days.

    • Andrew says:


      I won’t believe anything until it’s appeared in PPNAS with p less than .05.

    • Chuck Norris is so tough that he counted to infinity. Twice!

      While I enjoy humor, it actually has to be funny. With respect to new analytic paradigms, scientists whom I work with prefer the not-so-popular activity of reading the literature. It is tough, and usually not amusing–but it is enlightening! Reading is cheap–working the math is the way I taught my PhD students, back in the day.

      I don’t know a single scientist that pays attention to “popular – amusing – magazines” to guide their analytic thinking.

  3. “Whatcha talkin’ bout, ODA?”

    Often, when I read things I don’t understand, I still grasp that something is being said. In this case I do not.

  4. Jake says:

    The first post on compares ODA against something called “Yule’s Q” which after a bit of searching turns out to be a measure of association for 2×2 tables, with no indication of why you’d use that instead of a good old chi-squared. It’s also not clear what the author actually did; the derived second table is a flipped copy of the first one.
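    For reference, Yule’s Q for a 2×2 table with first row (a, b) and second row (c, d) is (ad − bc)/(ad + bc), a number between −1 and 1. The sketch below computes it next to Pearson’s chi-squared (1 df, no continuity correction) for a hypothetical table, and shows that swapping the table’s rows flips the sign of Q while leaving chi-squared unchanged.

```python
def yules_q(a: float, b: float, c: float, d: float) -> float:
    """Yule's Q for the 2x2 table [[a, b], [c, d]]."""
    return (a * d - b * c) / (a * d + b * c)


def pearson_chi2(a: float, b: float, c: float, d: float) -> float:
    """Pearson chi-squared (1 df, no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


table = (20, 10, 5, 15)    # hypothetical counts: row 1 = (a, b), row 2 = (c, d)
flipped = (5, 15, 20, 10)  # the same table with its rows swapped
print(yules_q(*table), yules_q(*flipped))            # Q changes sign
print(pearson_chi2(*table), pearson_chi2(*flipped))  # chi-squared is unchanged
```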

    I also noticed in their analysis of the lamb birth data the author concludes that “the following model emerged: if lambs born in 1952=zero or one, then predict that lambs born in 1953=zero or one; otherwise predict lambs born in 1953=two.” Good job buddy.

  5. Ben Bolker says:

    From :

    As of January 13, 2015, and until further notice, only standard mail communications will initially be available to new individuals who wish to communicate with ODA LLC. This is due to the accelerating use of the internet for spam pollution. For all matters, please mail a letter to: […] Please include your eMail address in the letter. Once a legitimate letter has been received, an eMail address that may be used for all subsequent communications with ODA LLC will be eMailed to you.

    Crazy. (Must cut down on their new business a lot?)

    • Ed Freeman says:

      Can’t they use ODA to build a spam filter?

      • Andrew says:


        Indeed, just as Daryl Bem used precognition to decide which journal to send that famous paper of his.

        • Pleasant memories! I was an invited speaker (a junior undergraduate, IIRC) at a conference at which Daryl and Sandra were keynote speakers. IIRC, they were at Cornell? I ate dinner with them. I didn’t study attribution, so I really had/have no idea what Daryl did. But, I had a great conversation with Sandra–I was the dude who showed that her androgyny theory was perfectly represented as a 90 degree proper rotation of the normatively standardized instrumentality and expressiveness dimensions (and the Spence four-fold typology was an unreliable surrogate). That is, the principal components solution for the two variables (assuming r=0 as Sandra hypothesizes, or that r>0 otherwise). Solved that one using raw scores by accident, and found that for such a two-dimensional solution the value of the first eigenvalue is [1 + abs(r)]/2. The first *exact* statistical model (albeit of a hybrid measurement/individual-differences theory) in my career. Good ole times. I still use PCA, but not the kind one can get in any software package. Rob and I used it to create new weather factors (versus national weather service solutions) and predict precipitation and temperature anomalies. We identify the factors using linear programming, thereby avoiding paradoxical confounding. The right tool for the right job. Emphasis on both “rights”…
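          A generic PCA fact that may help when reading the eigenvalue remark: for two standardized variables with correlation r, the correlation matrix [[1, r], [r, 1]] has eigenvalues 1 + |r| and 1 − |r|, so the first principal component accounts for a share (1 + |r|)/2 of the total variance (the trace, 2). The check below reads the comment’s [1 + abs(r)]/2 as that proportion; it is an illustration only, not a reconstruction of the analysis described.

```python
import math


def eigen_2x2_correlation(r: float) -> tuple:
    """Eigenvalues of the correlation matrix [[1, r], [r, 1]], largest first.

    Characteristic polynomial: lambda**2 - 2*lambda + (1 - r**2) = 0,
    so lambda = 1 +/- |r|.
    """
    half_trace = 1.0
    det = 1.0 - r * r
    root = math.sqrt(max(0.0, half_trace * half_trace - det))  # equals |r|
    return half_trace + root, half_trace - root


for r in (0.0, 0.4, -0.4):
    first, second = eigen_2x2_correlation(r)
    share = first / (first + second)  # proportion of total variance, (1 + |r|) / 2
    print(f"r = {r:+.1f}: eigenvalues {first:.2f}, {second:.2f}; first PC share {share:.2f}")
```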

    • Yeah, Ben, it is crazy. :-) But, crazy also was two mad scientists working 74 combined man-years on a new statistical paradigm, not getting paid. Starting the first truly free academic journal was pretty crazy (the “free access” journals that came later make authors pay).

      We spent our lives actually discovering something completely unique that addressed the problems of the old methods (e.g., see my “Reading and Understanding” books at APA with Larry Grimm, or hundreds of journal articles in indexed journals). Selling stuff is new to us–we have no tricks, no theories about selling, just experience in solving problems that others can’t solve. Our books have hundreds of examples of that, but it is math.

      But, there is a brand new “buy now” button on the web page! It is the first of many! I hypothesize that the more buy buttons, the happier consumers will be! Consumers! :-) I see the difference: in the old days we were interested in colleagues and money was not in focus–we were all go, no show. Today one must entice consumers, avoid actual conversation (set up eMail boxes and have people in other countries support them), and make as much money as possible. Today it seems only the latter is taken seriously! So who’s crazy, Ben?

  6. Shravan says:

    Also, what is academic social psychology (one of the members has that as his area)? Is there any other kind?

    • jrc says:


      I deserve no credit for this, because I could not have made it up if I wanted to. I just wanted to see what she was doing post-Harvard and crack a joke about it. But she is still at Harvard. Lucky for comedy though, she is still Amy Cuddy. Currently the top post on

      “When it comes to power posing, why stop with humans? Some people are using power posing ideas to help animals. One of the most unusual e‑mails I received was from a horse trainer named Kathy, who had been working for years on a project that ‘encourages horses to find intrinsically motivating behaviors as a means to both physical and mental rehab.’ “

      ….now wait for it:

      “In a way, this is the most convincing anecdotal evidence of all”

    • Yes, there is another kind.

      A matrix is still used by professors in large academic programs to characterize the department, plan course rotations and classroom sizes, schedule research resources, identify recruiting needs, and so forth.

      In psychology, clinical, industrial/organizational, and academic are common rows describing professional foci. The columns are substantive foci such as community, social, learning, developmental, physiology, quantitative, educational, cross-cultural, health, military, sports, and other such areas (e.g., see the Divisional structure of the APA).

      This matrix helps highly-qualified graduate students find excellent (fit for them) “think tanks” (and helps think tanks find qualified students), obtain (award) fellowships, and so forth. Few programs today have the heterogeneity of past decades, and in many (perhaps most) there aren’t enough students or faculty in many programs to saturate even a small matrix.

  7. Shravan says:

    BTW, they have a software called MegaODA that can handle up to 3 *million* data points. Andrew, check it out. The software’s only 199 USD and you might learn something.

    • Jonathan (another one) says:

      Shravan, we need your best Dr. Evil finger in the corner of your mouth here.

    • Three million does not sound like a lot, I know. But, in reality, three million is a lot! In many classical phenomena N of this size are sufficient to detect “ecologically” rare phenomena—accidents, errors, diseases, interactions, tornadoes, etc. More complicated models, and/or analysis of even rarer phenomena require the most powerful computer—the brain of the analyst.

      We ran our first-ever *large* experimental MultiODA on a CRAY-2 (NCSA, Urbana). Exponential in N, the problem had a binary class (dependent) measure and three ordered attributes (independent variables) for N=39 (thirty nine), and it red-lighted the CPU forcing a cold boot.

      Years later we were able to solve MultiODA problems for uniform random data involving five attributes and N=1,000,000 in several CPU seconds using an IBM3060-400VF supercomputer (UI, Chicago).

      Today we get better nonlinear answers to problems involving four attributes and N=3,000,000 in CPU seconds using a 64-bit PC.

  8. anon says:

    “the conceptual analogue of quantum mechanics for the statistical analysis of classical data.”

    This could be a big leap forward in the analysis of data, a quantum leap even.

    • Regardless of whether or not this is a serious comment, this comment is EXACTLY what EVERY true scientist should imagine, hope, investigate. How many times a century is an entirely new PARADIGM discovered?

      • Corey says:

        I love this rhetorical question! Let’s see… Neyman-Pearson hypothesis testing and let’s put confidence intervals in there as well, let’s put likelihood and derived concepts — Fisher’s maximum likelihood, Wedderburn’s quasi-likelihood, Owen’s empirical likelihood, Nelder’s h-likelihood — in one bucket, let’s put all the variations on Bayesian foundations — de Finetti, Jeffreys, Savage, Cox’n’Jaynes, Wallace’s minimum message length — in one bucket and let’s stick maximum entropy methods in there as well, Wald’s statistical decision theory and derived concepts, Robbins’s empirical Bayes approach, Rissanen’s minimum description length approach, Valiant’s probably approximately correct (PAC) learning, I don’t know if Benjamini’s false discovery rate approach qualifies as entirely new but it’s pretty damn original and gave rise to a large variety of novel methodologies so I’m counting it, Davies’s model-as-data-approximation approach, and my newest entrant, ODA — wouldn’t want to step on any toes! So that’s nine times a century, if that century is the 20th.

        • Andrew says:


          In all seriousness, I would put Bayesian data analysis (as expressed in our book) as a paradigm that is distinct from all the paradigms you listed just there, and at least as important as most.

          • Corey says:

            The judges will allow it! And may I just say that I wouldn’t have counted Robbins’s empirical Bayes had I not learned from a thing you wrote that I can’t locate just now that what Robbins had in mind was a lot deeper than just type II maximum likelihood.

        • Corey says:

          I make no claim that this list is exhaustive. Nine times is just a lower bound — but now that I count again I see ten entries in that list. Counting was never my strong suit…

        • We have different conceptualizations of a comprehensive paradigm, but your list is really cool. And funny! :-)

          Legacy had its chance, and it failed.

          Air, water, land, food, medicine, finance, peace, life quality–everything necessary for modern life is stressed.

          The Zeitgeist is change, a search for NEW directions. Including a search for predictive accuracy in conjunction with increasing accountability for errors.

          May the most accurate models win…

          • Shravan says:

            Fig 2 in this paper is absolutely spectacular!

            • I agree. It’s a tour de force. By itself it could provide the basis for a TED talk.

              I like how it shows that once you’re dissatisfied, you’re dissatisfied, period. No degrees. On the other hand, the things that *lead* to dissatisfaction are broken down into subtle subgroups. A distinction is made between “very poor” and “poor and better” waiting times and between “fair or worse” and “good or very good” courtesy. Why those particular breakdowns? There must be wisdom behind them.

              Next we come to the math. The percentages (44.2%, 39%, and 16.8%) add up to 100%, which suggests that they represent proportions of the whole, not of the subgroups. There are 285 patients in all (95+41+149). Of these, 41 had “very poor” waiting time. Within that category, 39% of the whole patient group–that is, 111.15 of the 285–reported dissatisfaction. This means that of the 41 patients with “very poor” waiting time, 111.15 were dissatisfied. Mysterious multiplicity! Maybe some of them were bearing twins and triplets while waiting.

              Or maybe the percentages are of the subgroups, not of the whole, and it’s just a coincidence that they add up to exactly 100%. In that case, 15.99 of the 41 patients with “very poor” waiting time are dissatisfied, in contrast with 25.032 of the 149 with “poor or better” waiting time and 41.99 of the 95 patients with “fair or worse” courtesy.
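              The two readings of the percentages can be checked in a few lines. The endpoint sizes and percentages are the ones quoted in this comment (labels paraphrased); under the percent-of-everyone reading, the “very poor” group would need more dissatisfied patients than it has members.

```python
# Endpoint sizes and percentages as quoted in the comment (labels paraphrased).
ENDPOINTS = [
    ("fair or worse courtesy", 95, 44.2),
    ("very poor waiting time", 41, 39.0),
    ("poor or better waiting time", 149, 16.8),
]


def readings(endpoints):
    """Return the total N and, per endpoint, the implied number of dissatisfied
    patients under two readings: percent of all patients vs. percent of the endpoint."""
    total = sum(n for _, n, _ in endpoints)
    rows = []
    for name, n, pct in endpoints:
        of_whole = pct / 100 * total  # reading 1: share of all patients
        of_group = pct / 100 * n      # reading 2: share of this endpoint only
        rows.append((name, n, of_whole, of_group))
    return total, rows


total, rows = readings(ENDPOINTS)
for name, n, of_whole, of_group in rows:
    print(f"{name} (n={n}): {of_whole:.2f} if % of all {total}, {of_group:.2f} if % of endpoint")
```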

              But is the message here that if you’re dissatisfied with *any* part of the ER experience, you’re dissatisfied, period, and it doesn’t matter how great or how small your dissatisfaction? This would have to be elucidated in the TED talk.

              In the meantime, the p-values are impressive.

              • Your prior comments were good examples of flippant disregard.

                Here you are guilty of specifically attacking something clearly without having the required knowledge or experience in what you are talking about, and making incorrect assumptions that lead to an incorrect conclusion. These are two of the main problems named as enemies of science. This would be very easy to demonstrate to laymen, for example using a football example.

                THIS thread IS worth a TED talk. The figures would be tables: on the left the ignorant attack, on the right the truth. The subsections would be misbehavior categories. This thread is an automatic data machine, and properly disseminated may actually make a difference! Keep it coming!

                Youngsters–what kind of scientist do you want to be: educated and open-minded, or ignorant and closed-minded? Why do you youngsters think this is happening?

                Selecting people you believe are qualified reviewers may increase the quality of your reviews, and ultimately of your paper. If a qualified reviewer with good intentions disagrees with your paper, then listen! My first cross-cultural paper was reviewed by Harry Triandis, the great man in the field. He was very instructive, told me what books to read. He reviewed the revised paper, now a confirmatory study, and published it without revision. I and my colleagues learned a lot, and went on to publish lots of great cross-cultural research.

                The point is not to get an easy review, but to get a professional, competent review. People who hate and fear new methods, and know nothing about them but diss them anyway, are an impediment to progress… Washington isn’t the only swamp making life irrational…

            • This material is used to inform a panel of laymen (approved/picked by lawyer of the legacy company/statisticians) who made up the jury.

              Defense lawyers, as usual in such cases, parade their experts, who pontificate their formulas and espouse their self-ratings, and of course, lie.

              BUT, the jurors were not fooled. Really simple, crystal clear, completely obvious examples revealed the lies and harmful malpractice of the statisticians. When jurors *understand* they can cross-generalize.

              Among the top causes of mortality in the US is taking a prescribed medication. ALL safety analysis is *mandated* to be conducted by regression models. More and more really simple demonstrations are coming on-line that demonstrate that regression models are not at all accurate (here is a little article on regression, logistic is no better–there are many examples in indexed journals as well as in ODA journals, type logistic in the search box on the journal home page).


              Every late-night TV program is funded in part by teams of lawyers looking for people to sue companies that produce dangerous drugs, and the statisticians that gave the drug the green light.

              What could go wrong?

  9. Keith O’Rourke says:

    Not without impact in medicine and psych

    My guess is that it is not that hard to make an academic career in statistics and avoid the statistical profession altogether …

    (In the land of the blind, the one eyed man is king?)

  10. Christian Hennig says:

    Here the first introductory chapter gives something of an overview:
    I don’t know more than this but from this it smells like overfitting, probably something like leave-one-out CV is used (it’s not mentioned there, just my guess) and the results optimized so that the resulting prediction error will be optimistic. This issue is not treated in the Introduction although, to be fair, one may guess that the authors have something to say about this (hype or not, there may be some intelligence in it).

    Also the class of models about which the optimization runs is not mentioned but would be interesting. It is mentioned that data are considered as either qualitative or ordered and methods are invariant against monotonic transformations, so probably it’s discrete models that cut ordered variables in a probably somewhat restricted way respecting the order and qualitative models in arbitrary ways.
    Sounds a bit like trying to find the optimal classification tree in the set of all possible trees. Oh, what to do about overfitting?
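    The selection-bias concern can be illustrated with a small simulation (a generic sketch, not a test of ODA’s software): when a single cutpoint is chosen to maximize training accuracy on scores that are pure noise, the reported training accuracy lands above the 50% chance level. With balanced classes and distinct scores it is guaranteed to be at least (n/2 + 1)/n, which is why an honest estimate needs held-out or cross-validated data. The function names, sample size, and seed are hypothetical.

```python
import random
from typing import Sequence


def best_training_accuracy(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Highest training accuracy over all single-cutpoint rules, in either direction.

    Assumes distinct scores, so every split of the sorted data is a valid cutpoint.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ys = [labels[i] for i in order]
    n = len(ys)
    total_ones = sum(ys)
    best = 0.0
    ones_below = 0
    for k in range(n + 1):  # k = number of observations below the cut
        if k > 0:
            ones_below += ys[k - 1]
        zeros_below = k - ones_below
        ones_above = total_ones - ones_below
        correct = zeros_below + ones_above  # rule: predict 1 above the cut
        best = max(best, correct / n, (n - correct) / n)  # n - correct: reversed rule
    return best


def noise_experiment(n: int = 40, reps: int = 200, seed: int = 1) -> float:
    """Mean optimized training accuracy when scores carry no information about labels."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        labels = [0] * (n // 2) + [1] * (n // 2)  # balanced classes: chance = 50%
        rng.shuffle(labels)
        scores = [rng.random() for _ in range(n)]  # pure noise
        total += best_training_accuracy(scores, labels)
    return total / reps


print(f"mean optimized training accuracy on noise: {noise_experiment():.3f}")
```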

    Haha, if I wasn’t so lazy I could use some KNOWLEDGE for my posting but as it’s just a blog comment, I make it all up from the introduction that you can read, too.

    • Christian Hennig says:

      Probably I should have written “selection bias” instead of “overfitting”.

      • anon says:

        Dear Mr Yarnold,

        please leave a comment on CV procedure and issue of selection bias/ overfitting.

        • Over-fitting is one of the MAIN topics in the book that was attacked by ignorant jesters. It is a prime motivation for maximum-accuracy analysis. Read all about it…

          • Mike says:

            Is maximum-accuracy analysis leading to overfitting? (“best model” vs “generalizable model”)
            Please describe validation procedures for ODA/ CTA.

            We cannot read a whole book for understanding validation procedures of your method.

            • I appreciate your interest and love and respect your concern–it is perfect motivation to learn ODA. And, I understand that it seems like reading yet another entire book (that has almost no formulas) may be a daunting task. Especially if the book covers the same old crap covered in all the other books you ever read on the subject, and makes the same untenable assumptions, repeats the same methods and reaps the same deficiencies… But, dude–you are 300 pages away from the promised land! :-)

              Mike, the entire book is about correct fitting, the entire paradigm! I can’t re-write the book here. Perhaps a brief response will satisfy your request, I hope so. :-)

              The final Axiom of novometrics mandates replication/validation in order to estimate predictive accuracy–training results are not used as estimates. The most common validation methods are various jackknife, K-fold, Monte Carlo, bootstrap, hold-out, and multi-sample methods (AFAIK, only ODA software performs many of these methods for ALL statistical analyses). The novometric D statistic norms model quality as a function of accuracy and parsimony (I cited an article on this in another response in this thread–IMO it may address all of your concern in two pages–Theoretical aspects of the D statistic).

              These are described and used throughout the book. These validation methods are also discussed in a forest of other books and a sea of other articles. It is easiest to read the book, it covers all the bases.

              Training is for practice–validation is for real…
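              Of the validation methods listed, the jackknife (leave-one-out) is the simplest to sketch: refit the model n times, each time holding out one observation, and score only the held-out predictions. The example applies it to a hypothetical single-cutpoint, maximum-accuracy rule; it is a generic illustration, not how the CTA software implements jackknife validation.

```python
from typing import List, Tuple


def predict(cut: float, direction: int, score: float) -> int:
    """direction = 1: class 1 above the cut; direction = -1: class 1 at or below it."""
    above = score > cut
    return int(above) if direction == 1 else int(not above)


def fit_cutpoint(scores: List[float], labels: List[int]) -> Tuple[float, int, float]:
    """Maximum-training-accuracy cutpoint; returns (cut, direction, accuracy)."""
    distinct = sorted(set(scores))
    candidates = [distinct[0] - 1.0] + [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    best = (candidates[0], 1, -1.0)
    for cut in candidates:
        for direction in (1, -1):
            hits = sum(predict(cut, direction, s) == y for s, y in zip(scores, labels))
            accuracy = hits / len(scores)
            if accuracy > best[2]:
                best = (cut, direction, accuracy)
    return best


def loo_accuracy(scores: List[float], labels: List[int]) -> float:
    """Leave-one-out accuracy: each prediction comes from a model fit without that case."""
    hits = 0
    for i in range(len(scores)):
        cut, direction, _ = fit_cutpoint(scores[:i] + scores[i + 1:], labels[:i] + labels[i + 1:])
        hits += predict(cut, direction, scores[i]) == labels[i]
    return hits / len(scores)


scores = [55, 60, 62, 70, 75, 80, 85, 90]  # hypothetical data
labels = [0, 0, 0, 1, 0, 1, 1, 1]
print(fit_cutpoint(scores, labels)[2], loo_accuracy(scores, labels))  # 0.875 0.75
```

On this small example the training accuracy (87.5%) is higher than the leave-one-out estimate (75%), which is the gap validation is meant to expose.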

      • Wrong book! Math can’t be smelled, it must be read and then worked. One can’t know what one hasn’t learned or discovered–in speaking otherwise one can only expose their own prior failures and/or well-learned biases. :-)

        Is it constructive, funny, smart — or what — for anyone to describe perfectly — wait for it — the state of current affairs — wait some more — and the antithesis of the new paradigm — wait a bit longer — and they have it c-o-m-p-l-e-t-e-l-y backwards?

        In other words, saying the perfect anti-truth? And being completely unaware of reality? Is there a word to describe that type of behavior? Is the word “scientist”?

        Blog comments, like wine, expose the truth of one’s inner being…

        • Christian Hennig says:

          Paul Yarnold: Can you please recommend an introduction of length, say, 20 pages, in a standard journal, that should enable the reader to get the main ideas and why they work?
          I see why you like to refer to the book but you’ve got to understand that for an outsider much of what you write looks like a sales pitch and that one wants some more solid and condensed information (than given in the freely available Introduction of the book) before falling for it. Despite your claims there are many good ideas out there worthwhile to know and understand, and I’m not going to read full books on all of them.

          • Start with the articles published before the first book, read the first book, then the articles after the second book. Select any and start reading. I recommend read them all! Or find one that interests you, and start there. This is what professional researchers do when they find a new procedure.

            If you read everything, and then make sense of it, and add 250 other articles, fill-in the holes, and correct the mistakes, then you have the book! The PDF version of the book is priced at less than most text books in stats. If it is too much, your library will certainly order it if requested. There is ONE such book on modern methods (the reason the book was written), and an endless number of other books all on the same topics. Selecting my book to not read is like winning the “I lose” lottery with one white ball (the book) and the rest all being not-white balls (all other books).

            If reading my book or scouring the library is just too much, the ODA page has a list of hundreds of articles in the publications tab that use ODA. There is a special issue out now with a book review, six articles, and a critique. These articles introduce elementary topics and compare alternative approaches.

            I did the hard work and wrote the book. If that is not enough, that is OK, everyone knows thanks to Ben that Rob and I are crazy, and selling is not our primary mission–we didn’t think about it until after the book was out! Not selling to everyone won’t hurt our mission, we haven’t been paid all along.

            It’s not a job, it’s an adventure!

            • Christian Hennig says:

              It’s all math, eh? I went through about six articles now (nicely enough your website has a few) without having yet encountered a single formula defining what exactly is done.

              • The formulas that you seek make assumptions–the root of that word is ass. In the Marines they told us assumption means “make an ass out of you and me”.

                Operations research models make no assumptions–just constraints. Models explicitly get the best possible answer. Operations research got man to the moon. Statistics has been used to test whether the world is round, p<.05 (the AP article often credited with starting the revolution in psychology–JESP recently slammed the door on the old assumption-laden pap).

                There are plenty of formulas, but they are not needed. What is needed is accuracy, reproducibility, and parsimony. These are CONCEPTS, not assumption-laden formulas hiding behind absurdly untenable assumptions.

                The purpose of science is NOT to be blinded by formulas. But, if you need formulas to feel secure (in the book, formulas are replaced with words), take a look at my paper with Lowell Carmony. The formulas in that proof still make me laugh–and then cry.

                We use the math to make conceptually clear models, expressed in their natural units. Nothing to hide, just to seek…

              • Sorry, Christian, I forgot to recommend a free article (the article with Lowell is likely expensive).

                Check this out for a standard optimal linear model:

                THAT is operations research. One size fits nothing. Perfect size for each customer (data set)… The proof with Lowell took this to an entirely new level–it is an existence proof of the theoretical distribution of optimal values, blah, blah…

              • Christian Hennig says:

                Paul: Thanks for this. This makes things actually much clearer. It looks a bit like the Support Vector Machine without the kernel trick and it only ever discusses what I’d call the training error without making statements about generalisation, but fair enough, at least it makes clear enough how the idea actually works.

                “The formulas that you seek make assumptions” – actually I was looking for formulas that clarify what is done because formulas are a mode of communication minimising (when done right) the chance of misunderstandings. Whatever you do makes implicit assumptions and if you don’t give the formulas, they are just not visible but still there. In your paper for example you impose linear separation between classes. To me this counts as an assumption (obviously the method can be applied to classes that are not linearly separated but then the result will not be optimal compared against methods that allow for more flexible boundaries) – fair enough you actually realise this and devote a paragraph to adding nonlinear terms.
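                The point about linear separation can be made concrete with the classic XOR pattern: no rule of the form w·x > t classifies all four points correctly, but adding the product term x1·x2 as an extra attribute makes a perfect linear rule possible. This brute-force search over a small integer weight grid is only an illustration using a hypothetical helper, not the optimization model in the linked paper.

```python
from itertools import product

# XOR-style data: class 1 when exactly one of the two features is 1.
POINTS = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def best_linear_hits(data, weight_grid, thresholds) -> int:
    """Max number of points correctly classified by a rule 'predict 1 if w . x > t'."""
    dim = len(data[0][0])
    best = 0
    for w in product(weight_grid, repeat=dim):
        for t in thresholds:
            hits = sum(
                int(sum(wi * xi for wi, xi in zip(w, x)) > t) == y for x, y in data
            )
            best = max(best, hits)
    return best


GRID = range(-2, 3)                           # integer weights -2..2
THRESHOLDS = [k + 0.5 for k in range(-5, 5)]  # half-integers from -4.5 to 4.5

LINEAR = POINTS
WITH_PRODUCT = [((x1, x2, x1 * x2), y) for (x1, x2), y in POINTS]

print(best_linear_hits(LINEAR, GRID, THRESHOLDS))        # 3
print(best_linear_hits(WITH_PRODUCT, GRID, THRESHOLDS))  # 4
```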

            • Darn it, I forgot to say… We only use — formulas — when we have to. Same with pseudo-code. We actually *want* people to understand! My first books were called “Reading and Understanding…” That is the point, in my view, of professional academics.

              However, once we establish a basic formulation, we avoid further formulas to the greatest extent possible.

              Here is another free paper–took Rob and me a half-year of work. Had to download weather data from hundreds of satellites. We use an operations research model, described only in terms of constraints (since that is what it is really all about–ask any astronautical engineer), to create new non-confounded weather factors. This article illustrates paradoxical confounding (Simpson’s) for regression and PCA, and ODA methods eliminate the confound and do a MUCH better job.

              So, you say MIP45. It seems pretty complex. The following paper makes it seem obvious, and the only way to fly…


              • My pleasure, Christian–the existence proof is whacked complex (sorry, our comments crossed in cyberspace).

                However, it is now known that MultiODA models are inferior to nonlinear (CTA) models.

                If the model is linear, CTA will find it. If the model is non-linear, MultiODA will miss it, or identify a paradoxical confound.

                I want to find small (two-attribute) linear models and use them as attributes in CTA nodes. Never been done. I am confident that, at least in engineering and chemistry, a few processes are actually linear within constraints (e.g., in giant mixing vats, for pharmies). The constraints are a problem in engineering–just ask frozen O-rings.

                MultiODA models with binary coefficients (1, 0, -1) are more accurate and more parsimonious than logistic regression models using three times the number of attributes (Statistics in Medicine), and they routinely find better models than logistic regression, as well as powerful models for applications in which logistic regression finds nothing (there are lots; I recall a good one in the Journal of General Internal Medicine).

                I do want to study binary MultiODA for data compression, for example, for deep-space pulse transmissions. Decoding time back on Earth or on other spacecraft is not the issue; the issue is battery life of the transmitter.

                CTA blows away MultiODA in terms of ESS–accuracy normed for chance. Or gets the same answer if MultiODA lucks out.
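ESS ("accuracy normed for chance") can be sketched as below, assuming the usual ODA definition: take the mean sensitivity across the C classes (mean PAC, in percent) and rescale it so that chance performance (100/C) maps to 0 and perfect classification maps to 100. The function name and the confusion-matrix layout (rows = actual class, columns = predicted class) are assumptions made for this illustration, not taken from the ODA software.

```python
from typing import List, Sequence


def effect_strength_for_sensitivity(confusion: Sequence[Sequence[int]]) -> float:
    """ESS in percent for a C-class confusion matrix (illustrative sketch).

    Assumed layout: confusion[i][j] counts observations whose actual class
    is i and whose predicted class is j. 0 = chance-level mean sensitivity,
    100 = perfect classification; negative values are worse than chance.
    """
    num_classes = len(confusion)
    # Sensitivity (percent correctly classified) within each actual class.
    sensitivities: List[float] = [
        100.0 * row[i] / sum(row) for i, row in enumerate(confusion)
    ]
    mean_pac = sum(sensitivities) / num_classes
    chance = 100.0 / num_classes
    # Rescale so chance maps to 0 and perfect classification maps to 100.
    return 100.0 * (mean_pac - chance) / (100.0 - chance)


# Toy 2x2 example: 40 of 50 class-0 and 30 of 50 class-1 observations correct.
cm = [[40, 10],
      [20, 30]]
print(effect_strength_for_sensitivity(cm))  # 40.0
```

With two classes this rescaling works out to 100 × (sensitivity + specificity − 1), i.e., Youden's J expressed as a percentage, which is why ESS does not reward a model for simply predicting the majority class.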

                Novometrics is the pot of gold at the end of the rainbow. Take two variables, x (binary) and y (ordered). What is the association between them? Do you use a t-test? Friedman ANOVA?

                For each test how many models do you get? One t value, one F value.

                In novometrics (Latin for New Measurement), for a given sample, there may be zero models. Or one model. Or, more than one model–like in the Particle in a Box model in QM. In QM the size of the box is indexed by distance, in novometrics the size of the box is indexed by N.

                Here is a recent free paper that describes the novometric D statistic, and related concepts. It is short, and IIRC has one really simple formula. It may be interesting.

                There are lots of others, they are all over the place. The book fixes that–it is why I wrote it, it was not a fun thing to do. I hate writing books, it takes forever… I have six in the LOC now, two are classified.

  11. Dzhaughn says:

    Unless their algorithm is emergent from the entanglement of qubits in anti de Sitter space, I don’t have time to look into it.

  12. Shravan says:

    This post and the comments are starting to feel like something from an Ionesco play.

    • Andrew says:


      Yeah, at first I thought the whole thing was funny but now it just seems sad. It reminds me of this story.

      The interesting thing here to me is not that someone with a Ph.D. has a webpage with his very own perpetual motion machine, but that outsiders (like the person who sent me that original email) can be taken in by it. To professionals like us, all those buzzwords on that website are a warning that we’re in the presence of a statistics or machine-learning analogue to Daryl Bem, but to someone without our training and experience, it looks like it could be the real thing.

      Then again, the editors of the Journal of Personality and Social Psychology weren’t able to see through Daryl Bem’s paper.

      • Rats! I’ve been discovered! Had a good run though, I sure fooled everyone at Northwestern, being the youngest full prof of medicine–that was slick. And my impact factor of 1200 tells a story about my devious tactics–how many reviewers and editors are gullible. And the societies that elected me a fellow–all were drunken fools. Now that I am revealed, perhaps your sadness will abate?

        The first statistician that started their own journal was named Pearson, and his journal was called Biometrika. I am the latest. Opinions expressed here tell me that much more still needs to be done.

        The unmatched training and professionalism of this panel notwithstanding, it is inevitable that some will arise who pave the way to the future. Fortunately, this future belongs to the young, and they are watching and forming their own ideas. They have a future to conquer, and an Earth to save. They know the old ways are rotten, and they search for new directions. NEW directions.

        • Andrew says:


          I don’t think the people I saw in that panel, years ago, were devious or slick; I don’t think Daryl Bem is devious or slick; and I don’t think you’re devious or slick. I think you’re all doing your best.

        • Ian Fellows says:


          Yes, but I think Pearson was not the primary and almost sole contributor to Biometrika.

          In terms of communicating your ideas, yes, I think there is a lot of work to be done. This blog is read by some of the most sophisticated thinkers in statistics, AI and machine learning. What does it tell you that we are not able to parse out the claimed revolutionary, paradigm-shifting brilliance of your work? It could be that we simply lack the vision to see beyond our own preconceptions, or perhaps, like Galois, your communication style obscures the work. The third option is that the work is not revolutionary and that you have fooled yourself into thinking it is. After all, the easiest person to fool is yourself.

          One of these options is right. I’d be willing to put money on one of them, but am open to being persuaded that I’m wrong.

          • Ian,

            Wow, thank you, what a great thing to tell a dude… Exceptionally motivational, thank you.

            It tells me that likely I am not yet being copied in three giant countries famous for hacking (I admire the countries, but fear the hackers), although every single thing I post is immediately downloaded from these places–automatically (too fast for humans).

            In the beginning I wrote everything out and put in the code, so everyone interested could follow. Then I started getting bothered by people who wanted to make money. There are at least two Google pages of companies advertising “optimal data analysis” that are all shams.

            So then I started to mix it up, referencing complex stuff. Omitting the code if already previously given, mixing-up the topics, firing in all directions, even leaving out crucial details deferred until subsequent papers. Lately I like to publish in bursts, hopefully faster than can be translated properly–forcing multiple transliterations.

            The book finally made sense of it. It is the only efficient way to learn what is known. The latest book covers through novometrics with binary outcomes. I decided to write it before I died, so that people would know what was happening. I had so many of my dearest friends die…

            Still, I wanted to keep novometrics for non-binary class variables (new Axiom 2) under wraps while I investigated it. But my boss got me a BIG DATA file–single-point time series, drug use (doses) by patient every month, subject attribute variables. Darn it! Now I have to start parsing single-point ordered series using ODA and CTA! So, Ariel and I, and Charley and I, put out the first little papers. But someone has to know what the PA is–computers are not able to put 1 and 1 together. That means reading some papers in indexed journals, and some ODA eJournal papers. Then I decided to attack as many data sets as I could find, to learn more about novometrics. These latest articles are primarily written for people who already know what is happening. The book synthesizes what I consider to be important. Everything in the ODA journal published this year will make most sense after the book is mastered. Some of the stuff needed to understand what is going on is *only* in the book.

            I almost have enough material now to write a second book (300+ pp), maybe next year. However, this year I and some buds have been hitting the indexed journals. I think we have maybe a dozen published this year, as many under review, and as many in progress. And I have my first commercial system rolling out (an AI diary powered by ODA) and hope to have the best data set ever collected for studying drug ADRs and efficacy in temporal single-case multivariable data (see the “analysis of raw data induces” paper in ODA; typing that part of the title into the search box will find the article). For these indexed journals, we have to “flesh out” everything, which is whatever, what can you do…

            Rob and I are planning to build the informeter soon, I am drafting the specs and sample manual analyses. We will offer that as a subscription, and keep all software in-house and off-line. Then things will be a little better, right now it is complete chaos…

            • Ian, I am not after Pearson. I am after Mowrer, at 51K pages in 70 years (2 pages a day). Mostly descriptive. In books! By myself it will be too hard, so–collaborations on grants and indexed journals! Hey, it’s fun, and otherwise I get bored easily, and I don’t give a damn if I get there or miss by 25,000 pages–it is the journey that I love. The challenge, the opportunity, discovery, celebration, good night’s sleep, happiness… Makes life delicious…

              • One more thing, very important. Fooling myself. Note that the first ODA journal was published in 2010. The next in 2013. Hundred-hour weeks for two straight years spent trying to defeat novometrics. I was so happy when I thought I did, twice, but both times I made a computing error. I couldn’t handle the failure to defeat after the 4th year–totally out of data, so out it came. Theories are scary to me because they can never be proven, only supported!

            • Core dump, two annoying memory traces:

              1. For these indexed journals, we need to “flesh-out” everything–which is agonizingly boring and difficult to restate in a myriad of ways, but what can you do…

              2. Gunny said: ASSUME = make an ASS out of U and ME

              Thank you ALL for your time, wit, interest, and participation. And, ultimately, for being pretty cool.

              Until we meet again!

              • Christian Hennig says:

                “1. For these indexed journals, we need to “flesh-out” everything–which is agonizingly boring and difficult to restate in a myriad of ways, but what can you do…”
                But this is exactly what you need to make your work reproducible and transparent, as it should be in science.

              • Yo, Christian et al.,

                Yes, it would be awesome if there was one efficient resource that would do the hard work of synthesizing the literature, and presenting it in a straightforward manner designed to be well understood. That would be GREAT!

                I remember I posted this comment to Ian et al: “The book finally made sense of it. It is the only efficient way to learn what is known. The latest book covers through novometrics with binary outcomes. I decided to write it before I died, so that people would know what was happening. I had so many of my dearest friends die…”

                THEREFORE, to anyone actually interested–read all about it, the PDF is cheaper than a night at the movies! The only book of its kind!

                If one doesn’t have the impression that this is something worth investigating now, I can understand: few lead, some follow, many never get into the action.

                However, bear in mind that everything is moving forward faster, in more directions, and the best is surely yet to come.

                If the cost is out-of-budget, one might ask the reference librarian to submit a purchase request.

                Or, one is free to read every ODA article–in expensive indexed journals and in the free ODA eJournal, and all the citations–and identify things that need to be resolved, and resolve them (and correct things that are no longer state-of-the-art)…

                Or, nothing of the kind! :-)

              • PS: Sigh, Christian, it occurs to me that you missed the point and said something ignorant, again.

                STATED FACT

                All indexed articles require the identical information–definitions of class variable, attribute, sensitivity, confusion matrix, ESS, permutation p, jackknife analysis–to be repeated in every article (a teacher’s manual is under preparation, to help teach college courses using the book).

                THE POINT IS

                It is extremely difficult to say the same thing over and over, each time perfectly, each time differently.

                Try it for whatever you use–describe everything involved in the procedure perfectly, twice, differently–then do it 200 times. It is boring, and it is difficult.

                THE POINT IS NOT

                That fewer free ODA articles are fleshed-out presently. :-)

                Whoever may wish to read fleshed-out articles can obtain copies of ODA articles in all the indexed journals: from a library or publisher, not from me. There are an amazing number; only a tiny fraction have “ODA” in the title. Rob and I named the paradigm after we discovered it–at the beginning of the collapse of the field due to student drain.

                Begin with the earliest and work forward, including cited manuscripts–in all the indexed journals.

                THE POINT IS

                Whoever wants the most efficient resource, pony up a few bucks.

                Gunny also said (paraphrase): Invest or get off the pot

  13. Andrew, you are not doing your best.

  14. Shravan says:

    I’m getting into the swing of this ganja-induced stream-of-consciousness thing. “If Fisher had lived in the era of “apps”, maximum likelihood estimation might have made him a billionaire.” First line, chapter 7 of Computer-Age Statistical Inference.

    • It seems logical that making a better widget would sell, but not in science–following the MO, obeying intellectual clone reviewers who ubiquitously use only one method, groupthink–are increasingly cited as enemies of science.

      However, the book suggests that *using* the methods in applied settings–such as modeling (customer) satisfaction, investing, insurance, finance, prospecting, blah, blah–is big business.

      The analysis *is* fun. Anyone with talent, passion and opportunity (data and time) will love this–the entire empirical world (classical data) is a turkey shoot. New knowledge extracted from old data is a routine finding. As one astute French statistician said in RG: “This means everything has to be redone”. Yep. Every analysis is exciting–like deep-sea fishing, one never knows what is at the end of the line. Usually it’s a small one: most ordinal variables lack theoretical motivation, measurement precision, and thus predictive kinetics. How else will we know what we measure well, and where we strike out? ODA–novometric–makes this crystal clear. Conceptual clarity feels good… Till you accidentally stumble into a hostile crowd; then it is no fun at all…

  15. Shravan says:

    In related news, these days I am looking into the potential of the Fock space to explain Bayesian inference as a special case of frequentist reasoning.

  16. Angus Reynolds, Bronze Swiming Certificate. says:

    I found this excerpt from the American Psychological Association review of the 2004 textbook with an example of how ODA would work.
    To test the hypothesis that one’s manuscripts with fewer pages are more likely to be published…
    “To conduct an optimal data analysis, the ODA software would begin by arranging all of the manuscripts (i.e., observations) along a continuum formed by page length, with each manuscript represented by a 0 or 1 depending on its publication status. ODA would then examine all possible cutpoints along the continuum (i.e., midpoints between two successive observations that have different values on the class variable) and would separately evaluate the classification performance achieved across all observations, using each cutpoint that conforms to the directional hypothesis (i.e., for which the lower score on the page-length continuum is associated with acceptance and the higher score on the page-length continuum is associated with rejection). The final ODA model would consist of the cutpoint that matches the directional hypothesis and produces the greatest overall percentage of accurate predictions across both categories of the class variable. For example, the optimal model might be, “If page length ≤ 25.5, then predict the manuscript is accepted for publication; otherwise, predict the manuscript is rejected.” This particular model would be considered optimal because no other cutpoint consistent with the directional hypothesis could achieve a greater overall percentage of classification accuracy with these data.”
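The search described in that excerpt can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ODA software: class 1 = accepted and class 0 = rejected, only the directional rule from the excerpt is evaluated, candidates are scored by overall percentage correct (as the excerpt says), and tied page lengths are handled by grouping them, so a gap between two distinct scores is skipped only when both sides hold the same single class. The function name and the toy page lengths are made up for the example.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple


def directional_uni_cutpoint(
    scores: List[float], classes: List[int]
) -> Tuple[Optional[float], float]:
    """Exhaustive directional cutpoint search (hypothetical helper).

    Rule evaluated: score <= cutpoint -> predict 1 (accepted), else predict 0.
    Returns (best cutpoint, overall percent correct), or (None, 0.0) when no
    candidate cutpoint exists.
    """
    # Group class labels by distinct score so tied scores are treated together.
    labels_at: Dict[float, Set[int]] = defaultdict(set)
    for s, c in zip(scores, classes):
        labels_at[s].add(c)
    distinct = sorted(labels_at)

    best_cut: Optional[float] = None
    best_correct = -1
    for lo, hi in zip(distinct, distinct[1:]):
        # Assumed reading of "successive observations with different class
        # values": skip the gap only if both sides hold the same single class.
        if len(labels_at[lo]) == 1 and labels_at[lo] == labels_at[hi]:
            continue
        cut = (lo + hi) / 2
        correct = sum((s <= cut) == (c == 1) for s, c in zip(scores, classes))
        if correct > best_correct:  # strict '>': the lowest cutpoint wins ties
            best_cut, best_correct = cut, correct

    if best_cut is None:
        return None, 0.0
    return best_cut, 100.0 * best_correct / len(scores)


# Made-up page lengths; 1 = accepted, 0 = rejected.
pages = [12, 18, 22, 25, 26, 30, 34, 40]
accepted = [1, 0, 1, 1, 0, 0, 1, 0]
print(directional_uni_cutpoint(pages, accepted))  # (25.5, 75.0)
```

Recomputing accuracy from scratch for every candidate keeps the sketch short at O(n²) cost; a running count over the sorted scores would make it linear after sorting.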

    This reminds me of one of my lecturers mentioning that one of the previous professors at the Uni had come up with some completely different way of doing data analysis, but it never really took off and there wasn’t the computing power for it back then. I’m beginning to think there must be countless examples of such efforts…

  17. Sokal and Sneath wanted to use ODA structures to construct biological trees, but failing to find the exact distribution they used binomial. These early models have proven inadequate. Paul Meehl and colleagues wanted to use an ODA structure to construct taxons, but alas, no theoretical distribution of optima. I recently used novometrics to revisit MMPI and MMPI-2 data, results are in the ODA eJournal. There were others, IIRC mentioned in the first book, and in a recent ODA eJournal article…

    Clearly computers are needed to elucidate the exact distribution for non-directional analysis–but all the computers in the world couldn’t solve the problem for even a moderate N:

    Yarnold, P.R., & Soltysik, R.C. (1991). Theoretical distributions of optima for univariate discrimination of random data. Decision Sciences, 22, 739-752.

    However, for directional hypotheses there is a closed-form solution:

    Soltysik, R.C., & Yarnold, P.R. (1994). Univariable optimal discriminant analysis: One-tailed hypotheses. Educational and Psychological Measurement, 54, 646-653.

    Carmony, L., Yarnold, P.R., & Naeymi-Rad, F. (1998). One-tailed Type I error rates for balanced two-category UniODA with a random ordered attribute. Annals of Operations Research, 74, 223-238.
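A plain binomial test fails here because the cutpoint is chosen to maximize accuracy, so even on pure noise the optimum sits above 50%. A generic Monte Carlo permutation sketch makes this visible: shuffle the class labels, re-optimize the cutpoint in both directions (the non-directional case), and count how often the shuffled optimum matches or beats the observed one. This is a standard resampling technique swapped in for illustration, not the exact distributions derived in the papers cited above; the function names, the (hits + 1)/(n_perm + 1) estimator, and the toy sizes are assumptions.

```python
import random
from typing import List, Sequence, Tuple


def best_accuracy(scores: Sequence[float], classes: Sequence[int]) -> float:
    """Best overall proportion correct over every cutpoint and both directions."""
    pairs = sorted(zip(scores, classes))
    n = len(pairs)
    total_ones = sum(c for _, c in pairs)
    best = 0.0
    ones_below = 0  # class-1 observations at or below the current gap
    for k in range(n - 1):
        ones_below += pairs[k][1]
        if pairs[k][0] == pairs[k + 1][0]:
            continue  # a cutpoint cannot separate tied scores
        below = k + 1
        # Rule A: score <= cut -> 1, otherwise 0 (ones below + zeros above).
        correct_a = ones_below + (n - below) - (total_ones - ones_below)
        # Rule B (opposite direction) is right exactly where rule A is wrong.
        correct_b = n - correct_a
        best = max(best, correct_a / n, correct_b / n)
    return best


def monte_carlo_p(
    scores: Sequence[float], classes: Sequence[int], n_perm: int = 2000, seed: int = 0
) -> Tuple[float, float]:
    """Observed optimum and a Monte Carlo permutation p-value for it."""
    rng = random.Random(seed)
    observed = best_accuracy(scores, classes)
    shuffled: List[int] = list(classes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if best_accuracy(scores, shuffled) >= observed:
            hits += 1
    # Adding 1 to both counts keeps the estimate away from an impossible p = 0.
    return observed, (hits + 1) / (n_perm + 1)


# Pure noise: 30 balanced observations whose scores are unrelated to class.
rng = random.Random(1)
null_optima = [
    best_accuracy([rng.random() for _ in range(30)], [i % 2 for i in range(30)])
    for _ in range(1000)
]
# Mean optimum on noise; with balanced classes it is necessarily above 0.5.
print(sum(null_optima) / len(null_optima))
```

Comparing an observed optimum against `null_optima` (or using `monte_carlo_p`) rather than against 50% is the point the comment makes: the reference distribution has to be that of the optimum, not of a single prespecified cutpoint.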

    • At the time we discovered the open form solution, the field was finally established! Hundreds upon hundreds of articles and algorithms were being constructed, because computers were becoming faster. The first PC was on the market–the field was starting to explode, we (the community) began to hold conferences! Then “greed is good” (marketing) and “dot com” (hackers) became the zeitgeist, most systems engineering (engineering colleges) and quantitative methods (business colleges) programs lost so many students that the programs were dissolved–faculty and remaining students scattered about into non-fitting programs. Only a few of the early leaders stuck with ODA. Rob and I never left, there is no other quantitative perspective that we find so captivating, it was the purpose of our lives. The youngsters today have forgotten the math that got mankind to the moon without computers, and have resorted to using pre-ODA methods that have problems which motivated the rise of ODA in the first place. Those who forget history are doomed to repeat it–indeed!

  18. Corey says:

    Geez, these zealots who’re overly fixated on their one true paradigm sure are tiresome. These guys just need to read Cox’s theorem!

    …what? Is there some problem here?

  19. Corey, I have a question regarding your brilliant, seminal statement:

    How do you know? What have you read? Tell me, what do you know about novometric theory?

    Absolutely nothing, but who cares, right! It doesn’t matter! Why bother to find out…

    Corey has the answer, everyone stop working!

    HINT#1: If ALL you have to offer is the past, don’t bother–EVERYONE KNOWS. :-)

    HINT#2: NEVER speak about something you don’t understand.

    HINT#3: REALIZE that if you don’t know what something is, then you can’t understand it.

    • Anoneuoid says:

      We all have limited time and resources. I suggested an idea for a heuristic as the first post in this thread. It should take less effort on your part than making all these posts. You chose to side step it, which is fine I guess. However, if you want people to pay attention to the method you are advocating, implementing my suggestion would be the best way to do it.

    • Corey says:

      There shall, in that time, be rumors of things going astray, and there shall be a great confusion as to where things really are, and nobody will really know where lieth those little things with the sort of raffia work base that has an attachment. At this time, a friend shall lose his friend’s hammer and the young shall not know where lieth the things possessed by their fathers that their fathers put there only just the night before, about eight o’clock.

  20. Dear Anoneuoid,

    My last post to Mike explained my criteria for donating pro-bono work. It is possible that you meet my criteria. If you remain interested, send a write-up of your research hypothesis, methods, and results to me via RG. This thread is exhausted.

    I appreciate your suggestion about how to get people to “pay attention to the method you are advocating”. As of a minute ago, a total of 332 people in 50 countries have read 1,123 ODA papers since Monday night. The people reading the posts and THEN reading ODA articles ARE the people that I want to reach, while the people making the posts and NOT reading ODA articles provide an invaluable opportunity to defeat baseless objections…

    Legacy wishes to legacy statistics fans!

    • Anoneuoid says:

      >”donating pro-bono work”

      Just that you are treating this as such an undertaking makes me question your algo.

      If you upload a dataset and tell us results that you already have, I will plug it into xgboost and report back… it should take a couple of minutes (maybe a bit longer depending on what format the data is in, etc). BTW, if you are familiar with R or Python it should take you no more than a couple hours to get xgboost going.

      • Dear Mr anonymous,

        I reject your counter-offer. I hope what I offered you initially is clear this time.

        a) You can select from many data sets already analyzed

        I have already published many, many data sets in the ODA journal, so that they can be re-analyzed. These are free to anyone in the Universe. If you want one of mine, please select and use it–the ODA analysis is already published.

        If you want me to select, use the article comparing scores on MMPI taxons for many different samples. Or, the data on inter-rater reliability of plant health. I recall that I found the results of the analyses interesting.

        b) When I donate time to do work that I do to make a living, I prefer to use new data sets that I haven’t already used and made available to everyone.

        If you want this to happen, you can send me a new data set and your analysis (intro, methods, results).

        c) Please contact me via RG; this forum is inefficient. I can’t send you an RG message–you have no real name.

        Dear Silent Youngsters:

        Some of the posts on this public thread are empirical evidence for what John Tierney writes is “The Real War on Science”.

        However, there are additional biases and tactics exposed in this thread that John didn’t write about yet. Can you detect and name them?

        These little exchanges are qualitative data that are easily content-coded into ordinal scales. The reliability paper that I mention above discusses how to assess inter-rater reliability of the codings.

        This is extremely important. Clearly, being a reviewer for an ODA paper implies the reviewer should know what ODA is, or at least find out a little bit. Obviously, apparently many vocal hot-shot legacy statisticians dismiss new work without knowing anything about it!

        So, a roster of all people who published papers using optimal (and legacy) methods–who are thus potentially proven-qualified reviewers, and who wish to be on a list of prospective reviewers, will soon be one click away from every editor on the planet!

  21. Dear Christian,

    Your patient insistence on seeing some math, and your rapid and astute evaluation of a new-to-you mathematical model (and request for yet more details), is clear evidence of sincere analytic drive and talent, and of strength of character. All go, no show…

    It occurred to me that, if you wish to collaborate on a comparison of ODA and other methods, perhaps you would be interested in crafting a follow-up to a recent article:

    If interested, please contact me via RG message. It would be an honor to be your wing-man…

  22. anon says:

    Having read through the entirety of the comments, I now feel the urge to reread a classic from Martin Gardner.
