Discreteland and Continuousland

Roy Mendelssohn points me to this paper by Jianqing Fan, Qi-Man Shao, and Wen-Xin Zhou, “Are Discoveries Spurious? Distributions of Maximum Spurious Correlations and Their Applications.” I never know what to think about these things because I don’t work in a discrete world in which there are zero effects (see our earlier discussion of the “bet on sparsity principle”), but I thought I’d pass it along in case any of you are interested.

24 thoughts on “Discreteland and Continuousland”

  1. So does this mean that my SEO ranking will go way up? I do have a question that I have never been able to wrap my head around. I know you generally advocate fitting as large a model as possible and letting the hierarchical model decide what to include, and also not assigning zero probabilities. Where I have a problem is: how is this much different than allowing 1000 variables in a regression and seeing what sticks, which can lead to spurious relationships? Also, many estimators end up as linear sums of the data, even things like smoothing splines, and when some variables are not set to zero you end up with a lot of added noise. This is particularly true with space-time data. Keeping locations that are essentially noise out of the analysis produces much better results, which is equivalent to forcing their effects to be zero.

    Thanks for any insight.

    • Because in the 1000-variable case, the estimates are independent. In the hierarchical-model case, there’s a model of the distribution of effects. Including a lot of effects that are essentially zero will inherently shrink each individual estimate toward zero; not so if each effect is estimated in a so-called “unbiased” manner.
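
      Here’s a minimal sketch of the difference (Python/NumPy; the counts, noise scale, and names are made up for illustration): simulate many true effects that are mostly zero, observe each with noise, and compare the raw “unbiased” estimates with partially pooled ones that share a scale and shrink toward zero.

      import numpy as np

      rng = np.random.default_rng(0)

      J, sigma = 1000, 1.0                        # 1000 "variables", known noise scale
      theta = np.where(rng.random(J) < 0.95,      # most true effects are zero...
                       0.0, rng.normal(0, 2, J))  # ...a few are real
      y = theta + rng.normal(0, sigma, J)         # one noisy "unbiased" estimate each

      # crude empirical-Bayes stand-in for the hierarchical model:
      # estimate the population scale, then shrink every estimate toward zero
      tau2 = max(y.var() - sigma**2, 0.0)
      shrunk = y * tau2 / (tau2 + sigma**2)

      print("rmse, unbiased estimates:", np.sqrt(np.mean((y - theta) ** 2)))
      print("rmse, shrunken estimates:", np.sqrt(np.mean((shrunk - theta) ** 2)))
      print("largest |unbiased| estimate among true zeros:", np.abs(y[theta == 0]).max())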

  2. I agree. The default assumption should be that everything is correlated with everything else, so the mere presence of a relationship is uninteresting: it offers no additional information. Once again, Paul Meehl already dealt with this in his area of expertise (a quick sample-size calculation illustrating the point follows the quote):

    “These armchair considerations are borne out by the finding that in psychological and sociological investigations involving very large numbers of subjects, it is regularly found that almost all correlations or differences between means are statistically significant. See, for example, the papers by Bakan [1] and Nunnally [8]. Data currently being analyzed by Dr. David Lykken and myself, derived from a huge sample of over 55,000 Minnesota high school seniors, reveal statistically significant relationships in 91% of pairwise associations among a congeries of 45 miscellaneous variables such as sex, birth order, religious preference, number of siblings, vocational choice, club membership, college choice, mother’s education, dancing, interest in woodworking, liking for school, and the like.”
    http://cerco.ups-tlse.fr/pdf0609/Meehl_1967.pdf
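
    To see why a sample that size makes nearly everything “significant”: with n observations, a correlation r clears the 5% bar roughly when |r| > 1.96/sqrt(n). A quick check (Python; this is just the usual large-sample approximation):

    import math

    n = 55_000                       # roughly Meehl and Lykken's sample size
    r_cutoff = 1.96 / math.sqrt(n)   # approximate two-sided 5% cutoff for a correlation
    print(f"|r| > {r_cutoff:.4f} is 'statistically significant' at p < 0.05")
    # a correlation that small explains about 0.007% of the variance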

    • The DAG of causal relationships in nature is sparse but connected, so dependence still propagates everywhere. This is why I think “bet on sparsity” as a guiding principle only makes sense in the context of prediction, and even for prediction the principle has problems.
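
      A toy illustration (Python/NumPy; the chain structure and coefficient are arbitrary): a causal chain x1 -> x2 -> … -> x10 is about as sparse as a connected DAG gets, yet every pair of variables ends up correlated.

      import numpy as np

      rng = np.random.default_rng(1)

      n, p = 100_000, 10
      x = np.empty((n, p))
      x[:, 0] = rng.normal(size=n)
      for j in range(1, p):                     # sparse DAG: each node has a single parent
          x[:, j] = 0.7 * x[:, j - 1] + rng.normal(size=n)

      corr = np.corrcoef(x, rowvar=False)
      print("smallest |correlation| among all pairs:",
            np.abs(corr[np.triu_indices(p, k=1)]).min())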

      • I am not very familiar with thinking in terms of DAGs. I have a few (probably naive) concerns with what I understand you to be claiming.

        1) Nature can be described by a Directed *Acyclic* Graph. This seems like an extremely strong claim to me. Why can’t A and B simultaneously cause each other? As I said, I am probably misunderstanding this. How are feedbacks represented by a DAG in this context?

        2) “Newton’s law of universal gravitation states that any two bodies in the universe attract each other with a force that is directly proportional to the product of their masses and inversely proportional to the square of the distance between them” (https://en.wikipedia.org/wiki/Newton%27s_law_of_universal_gravitation). This means that everything in the universe really is affecting everything else, however minutely. So I do not think the universal DAG is truly sparse; rather, most edges are extremely “thin” (small, but nonzero, influences).
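
        In symbols (just restating the quoted law), the edges never actually vanish, they only thin out with distance:

        F = \frac{G\, m_1 m_2}{r^2}, \qquad F \to 0 \ \text{as}\ r \to \infty, \quad \text{but}\ F > 0 \ \text{for every finite } r.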

        • 1) is taken care of, I think, by properly considering time series. For example, in a system of differential equations for the gravitation of two bodies, it seems like everything causes everything else… but in fact position and mass NOW cause acceleration NOW, acceleration NOW causes velocity to change NEXT, and velocity NOW causes position to change NEXT. Keeping the time series in mind helps to sort this out; otherwise it looks like position causes acceleration, acceleration causes velocity, and velocity causes position, around in a cycle.
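
          A tiny discretized version of the two-body example (Python/NumPy; step size, masses, and initial conditions are arbitrary) makes that ordering explicit: positions and masses NOW give accelerations NOW, and those feed the NEXT step’s velocities and positions.

          import numpy as np

          G, dt = 1.0, 0.001
          m = np.array([1.0, 3.0])                   # two masses
          pos = np.array([[1.0, 0.0], [-1.0, 0.0]])  # positions NOW
          vel = np.array([[0.0, 0.8], [0.0, -0.2]])  # velocities NOW

          for _ in range(10_000):
              r = pos[1] - pos[0]                               # separation NOW
              f = G * m[0] * m[1] * r / np.linalg.norm(r) ** 3  # force on body 0 NOW
              acc = np.array([f / m[0], -f / m[1]])             # accelerations NOW
              new_vel = vel + dt * acc    # acceleration NOW changes velocity for NEXT
              new_pos = pos + dt * vel    # velocity NOW changes position for NEXT
              vel, pos = new_vel, new_pos

          print(pos)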

        • There was some discussion in a Newtonian mechanics class I took as to whether forces could depend on the acceleration. It turns out that this can be a useful model for some feedback control systems, but we have to understand it as forces now being caused by acceleration an epsilon of time prior to now. In other words, we can have a system where accelerometers measure acceleration and then exert forces, but the acceleration they’re measuring is always slightly “in the past”.

        • It’s true that breaking things down by time is often sufficient to address concern (1), but I think physical laws can be considered examples of instantaneous causation that still pose problems, for example the relationship between the pressure, volume, and temperature of a gas. Dawid wrote a paper called ‘Beware of the DAG!’ (sorry, too lazy to find the link) in which he proposed an alternative mechanism for handling causality that he said doesn’t get tripped up by cyclic instantaneous causal relationships.

        • Sure, applied work with DAGs needs care, and there’s plenty of sloppy epidemiology work out there hidden under fancy path analyses.

          I’m only using the DAG as a conceptual model for conditionality-induced correlation structure. From that perspective, cycles can be part of the explanation of how sparse causality induces dense correlations. If you’re an economist you might just label it ‘endogeneity’ and call it a day.

        • The PV = NkT equation (the ideal gas law) is explicitly an *equilibrium* equation; in other words, all of time is collapsed down to a point at infinity. This isn’t instantaneous causation, it’s a simplistic model for the dynamics that ignores everything but the endpoint. Still, it has value in many analyses because “infinity” in time for a bunch of molecules can still be much faster than your eye can blink.

        • I see what you are saying. When I think in terms of simulations I come to the same conclusion: if the value of A affects the value of B, it *must* come before it. But I am not convinced this isn’t an artifact of the way we are modelling the phenomenon.

          Does it come down to the age-old question of whether time is discrete or continuous? Or somehow both, like how light is supposedly a wave and a particle. Is it possible for an event to occur *at* time t? Or is that always shorthand for *within* time interval t?

        • Personally, I think that this is a bit of a confused understanding of mechanics that is symptomatic of a general desire for discrete-style thinking: this, then that, then this, then… Variational principles, symmetry, and invariance provide, to me, a better conception of causality, and they are much more ‘continuous’ concepts.

        • But continuity is itself false thinking. Particles are discrete; thanks to uncertainty principles, individual measurements have reasonably well-defined lower bounds on accuracy; and the variational principles of classical mechanics seem to be inducible by averaging over a nonstandard number of discrete Feynman paths.

          I take a very warm view of the Internal Set Theory version of nonstandard analysis, so your mileage may vary, but it’s a principled view, not a confused one.

        • One of those principles is that taking the nonstandard-analysis viewpoint, which essentially views “continuous” models as indexes for classes of discrete models whose discretization is finer than some unspecified grid, produces less confusion about the meaning of the model. It makes causality explicit in the present question, and it also resolves lots of conundrums about things like crack propagation, where purely mathematical singularities can confuse physical intuition. For example, I reject the idea that any mathematical difficulty arising from continuous probability distributions is a difficulty of interest to statistics: all measurements are discrete, and some measurements are dense enough to be usefully modeled as continuous, provided you don’t let the continuous math drive the bus.

        • Daniel and hjk – interesting debate, and I am biased toward Daniel’s position, but I don’t think anyone knows or ever will know.

          We don’t directly perceive reality but rather just represent it to ourselves – we can’t step outside these representations to see how close they are to reality, but we are continually reminded they are wrong when we mispredict how reality is.

          So if reality is finely discrete, it can be well represented as continuous (and this may be very convenient, with little disadvantage, unless we misinterpret implications of the representation as risk-free predictions of reality).

          If reality is continuous, it can be well represented as finely enough discrete (although this is maybe unnecessarily awkward, and also risky if we misinterpret implications of that representation as risk-free predictions of reality).

          Even if our representation happens to match reality with respect to continuous/discrete, it is still a disadvantage if we misinterpret implications of that representation as risk-free predictions of reality, as the representation will be wrong in other ways.

        • hjk, in the IST version of nonstandard analysis we can show that every standard continuous function f(x) is the standardization of some nonstandard s-continuous function. We simply take a grid of nonstandard intervals of size dx, an infinitesimal, and create a function which is constant between x and x+dx and equal to f(x) over that interval (or, if you like, equal to f(x+dx/2) or f(x+dx); in each case the standardization is the same). So, in some sense, the standard continuous functions are a kind of equivalence class for a large class of discrete but s-continuous functions, and anything predicted by a standard continuous theory is also predicted by each member of that equivalence class. There is no way within standard mathematics to tell whether there is a difference between the standard continuous stuff and the nonstandard stuff; we can only see that difference when working inside the nonstandard apparatus.

          In other words, nonstandard discrete structures are NOT the same as discrete structures within standard mathematics, so arguing within standard mathematics that things are not standard and discrete does not argue against the idea that they are (usefully modeled by) nonstandard-discrete concepts.
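
          Writing the construction down (my notation; st denotes the standardization / standard part): for a positive infinitesimal dx, define

          f_{dx}(x) = f\big( dx \, \lfloor x / dx \rfloor \big), \qquad 0 < dx \simeq 0 ,

          so f_{dx} is constant on each grid interval [k\,dx, (k+1)\,dx). For standard x we have dx\,\lfloor x/dx \rfloor \simeq x, hence f_{dx}(x) \simeq f(x) by continuity of the standard f, and therefore \operatorname{st}(f_{dx}) = f.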

        • AFAIK nothing stops a directed graph from having both directions be active, e.g. a->b and b->a; the main point of being directed is that this is not the same as the undirected edge a-b.

        • Well, I usually think about the Acyclic part as being like in git: you can’t change history (meaning time is in the picture, as you said above).

    • C. S. Peirce made a similar point – the presence of an irregularity [replacing dependence] is uninteresting, but regularity [replacing independence] is interesting.
