Information flows both ways (Martian conspiracy theory edition)

A topic that arises from time to time in Bayesian statistics is the desire of analysts to propagate information in one direction, with no backwash, as it were. But the logic of Bayesian inference doesn’t work that way. If A and B are two uncertain statements, and A tells you something about B, then learning more about B also tells you something about A (except in some special cases of conditional independence). Here’s an example from applied statistics in drug development.
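As a minimal numerical sketch of the general point (the joint distribution below is invented purely for illustration, nothing to do with the drug-development example):

```python
import numpy as np

# Invented joint distribution over two binary statements A and B.
# Rows index A in (false, true); columns index B in (false, true).
joint = np.array([[0.40, 0.10],
                  [0.15, 0.35]])  # P(A, B); entries sum to 1

# Learning B tells you about A...
p_A = joint.sum(axis=1)[1]                     # P(A)          = 0.50
p_A_given_B = joint[1, 1] / joint[:, 1].sum()  # P(A | B true) ~ 0.78

# ...and learning A tells you about B.
p_B = joint.sum(axis=0)[1]                     # P(B)          = 0.45
p_B_given_A = joint[1, 1] / joint[1, :].sum()  # P(B | A true) = 0.70

print(p_A, p_A_given_B, p_B, p_B_given_A)
```

The only way to shut off the backward flow is for the joint to factorize as P(A)P(B), which is exactly the special case of (conditional) independence noted above.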

The same principle applies in other areas of life. For example, a couple years ago I discussed how journal reputation is a two-way street: prestigious outlets such as the Lancet or PNAS lend their reputations to the articles they publish—but when it turns out they’re publishing low-quality work, the negative reputation rebounds on the journal. It has to be that way.

And Mark Palko has discussed a related phenomenon in politics, in which groups that use disinformation to their political advantage can get overwhelmed by it. Here’s Palko:

If you want to have a functional institution that makes extensive use of internal misinformation, you have to make sure things move in the right direction.

With misinformation systems as with plumbing, when the flow starts going the wrong way, the results are seldom pretty.

As Palko put it more recently, “contradictory beliefs often mask similar, or at least compatible, personalities. It is remarkably difficult to reconcile flat earth and alien invasion theories, but adherents of both can frequently find common ground in their sense of isolation and, more to the point, persecution by society in general and the scientific and academic establishment in particular.”

I’ve felt the same way about various goofy social priming theories that we’ve discussed over the years. For example, it’s remarkably difficult to reconcile the theories that votes and political attitudes are determined by shark attacks and college football games and ovulation and subliminal smiley faces and chance encounters with strangers on the street. It’s the piranha problem. But these theories, inconsistent with each other as they are, all feel the same in that they’re based on an attitude that voters are capricious and easily manipulated.

The thing that the researchers who favor these theories don’t realize is that if you can be consistently manipulated 100 different ways, then you can’t be consistently manipulated at all—because in any given setting, the manipulator has no control over the 99 other possible manipulations. All these theories kind of feel like they’re allies of each other, but they’re actually competing. In that way it’s similar to the idea that the flat earth and alien invasion theories feel like allies—and I wouldn’t be surprised if supporters of one of these theories have warm feelings about the other one—even though either of these theories would pretty much rule out the other.

24 thoughts on “Information flows both ways (Martian conspiracy theory edition)”

  1. Something confuses me about this – “if you can be consistently manipulated 100 different ways, then you can’t be consistently manipulated at all.” I get the point about inability to consistently manipulate voters, but that does not mean they cannot be manipulated in any of the 100 different ways at any given time. Your post makes it sound like you are rejecting the notion that voters can be easily manipulated because of the inability to do it consistently. But isn’t it entirely possible that they are easily manipulated in many of the 100 different ways – but that it is not easy to do it consistently? My admittedly unscientific view of current events is that voters are easily manipulated in many of those ways (e.g., by reading or hearing a headline story devoid of facts or reasoning). Systematic or consistent manipulation is harder – although the Russians and the NRA (and perhaps others) appear to be able to do so.

    • Dale:

      I don’t deny that people (including me!) can be manipulated. The key word in my sentence is “consistently.”

      And “consistently” is relevant because consistent effects are what are detectable in experiments, and consistent effects are what are claimed by researchers.

      For example, the claim that subliminal smiley faces have large effects on immigration attitudes is interesting to the extent that it’s consistent—it’s not so interesting to say that one time there were these people in an experiment whose attitudes on immigration happened to be correlated to their exposure to a smiley-face stimulus. And an effect that is not consistent cannot be detected by an arbitrary lab experiment or survey.

      I do believe that some interventions have consistent effects. For example, in a low-information election, it makes sense that if you do a lot of advertising, that will affect people’s votes. It’s not that nothing consistent works, it’s that there can’t be 100 different things out there with large and consistent effects.
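      To illustrate with a toy simulation (all numbers invented): an effect that is large within any one context but flips sign across contexts averages away when an experiment pools over contexts it can’t distinguish, which is why only consistent effects are detectable.

      ```python
      import numpy as np

      rng = np.random.default_rng(2)

      # A hypothetical effect that is large in any one context but flips
      # sign across contexts: +0.5 in half the settings, -0.5 in the rest.
      n_contexts, n_per = 40, 50
      signs = np.repeat([-0.5, 0.5], n_contexts // 2)  # true per-context effects
      data = signs[:, None] + rng.normal(0, 1, size=(n_contexts, n_per))

      # An arbitrary lab experiment pools across contexts and sees ~0:
      pooled = data.mean()
      se = data.std() / np.sqrt(data.size)
      print(f"pooled estimate: {pooled:.3f} +/- {se:.3f}")  # consistent with zero
      ```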

      • I’m following most of this, until the end. The final claim is that “there can’t be 100 different things out there with large and consistent effects.”

        This isn’t true. In physics you can easily have 100 objects exerting gravity and changing the trajectory of a particle. These can even mostly offset so that you can reliably detect a single effect, as long as you can figure out what effect you are measuring. But that requires a reliable model of the other factors.

        The problem comes in when a theory is used to test a single effect, but 100 other factors which are all very uncertain contributions (even if they are large and consistent) also exist. Because you don’t have a predictive model for the system as a whole, you’re not really isolating the question you want to isolate in your experiment. And in that case, even after observing noisy data you can’t conclude anything about the one factor you’re looking at, because the variation could be coming from elsewhere. And this is *exactly* the point that Feynman makes about a critical problem in experimental psychology in this excellent essay from 1974 that predicts the replication crisis – http://calteches.library.caltech.edu/51/2/CargoCult.htm

        “I explained to her that it was necessary first to repeat in her laboratory the experiment of the other person—to do it under condition X to see if she could also get result A—and then change to Y and see if A changed. Then she would know that the real difference was the thing she thought she had under control… She was very delighted with this new idea, and went to her professor. And his reply was, no, you cannot do that, because the experiment has already been done and you would be wasting time. This was in about 1935 or so, and it seems to have been the general policy then to not try to repeat psychological experiments, but only to change the conditions and see what happens…

        [He now discusses an experiment he found on rats in mazes that does what he suggested.] Now, from a scientific standpoint, that is an A-Number-1 experiment. That is the experiment that makes rat-running experiments sensible, because it uncovers the clues that the rat is really using—not what you think it’s using. And that is the experiment that tells exactly what conditions you have to use in order to be careful and control everything in an experiment with rat-running. [But no-one did this again.]”

        • >I’m following most of this, until the end. The final claim is that “there can’t be 100 different things out there with large and consistent effects.”

          >This isn’t true. In physics you can easily have 100 objects exerting gravity and changing the trajectory of a particle.

          I think the idea is that there are 100 minor effects that are more or less independent. No one of them can consistently overpower the others, though once in a while that might happen. IOW: because statistics!

        • I think part of the point is also that most people are not actually manipulable in this way. These theories argue that 1-3% of voters change their minds and vote for a different candidate based on the intervention. We know that the vast majority of Democrats (or Democrat-leaners) will vote for the Democratic candidate, and vice versa for the Republicans (see previous posts/papers of Andrew’s). So not only do these theories compete with each other, they compete with each other over a small slice of the electorate.

          So when these theories also argue for an effect size of 1-3% of voters, and only 5-10% of the electorate is manipulable in this way, then two possibilities present themselves. One, these interventions target/affect different voters in the 5-10%. In this case, effect sizes are still overstated in order for each of these theories to have a slice of the pie. Two, these interventions target/affect the same voters. In this case, effect sizes are still overstated because these interventions should work against each other because they are competing. For example, if a voter watches their favorite college team win but then reads about a shark attack, these two effects should counteract each other to an extent.

          Regardless, these theories compete because they present specific interventions to explain how the electorate is manipulable without reconciling their presented effect sizes and without discussing how multiple effects should affect each other. This is on top of the many other problems Andrew’s discussed in the past, such as the lack of within-person experimental design and the garden of forking paths.

          I also find it amazing that the opposite argument is never made, i.e. “Most voters cannot be manipulated into voting for the other candidate.” I’m certainly not going to argue that voters are all well-informed on all issues, but these papers still essentially argue that most voters do not change their vote preference capriciously due to exogenous events.
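          To put hypothetical numbers on the accounting problem above (none of these figures come from any study; they are purely illustrative):

          ```python
          # Hypothetical bookkeeping: many interventions, each claiming to move
          # 1-3% of the electorate, versus a persuadable slice of only 5-10%.
          n_interventions = 20     # assumed number of published effects
          claimed_effect = 0.02    # assumed per-intervention share of voters moved
          persuadable = 0.07       # assumed share of genuinely movable voters

          total_claimed = n_interventions * claimed_effect   # 0.40
          print(f"total claimed movement: {total_claimed:.0%}")
          print(f"persuadable share:      {persuadable:.0%}")
          # The claims add up to ~6x the movable slice, so the effects must
          # either overlap (and be overstated) or collide and partially cancel.
          print(f"overstatement factor:   {total_claimed / persuadable:.1f}x")
          ```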

        • This is not an area I know much about, so please take this as a request for evidence. “These theories argue that 1-3% of voters change their minds and vote for a different candidate based on the intervention.” Are these theories or evidence? I can buy the fact that few people change their minds once they have made a choice (system 1 thinking is hard to overcome) and I can also buy the fact that many people have party-line voting behavior. But I’m not sure that means they cannot be easily swayed – especially early on. I could readily believe that news headlines or tweets (depressing though it is) can exert a large influence in forming people’s opinions. Then, these opinions might become very hard to change. My question is how much evidence do we have about people’s political tastes and the ease of influencing them? I suspect the 1-3% figure comes from somewhere but it is hard for me to square that with what I seem to observe on a day-to-day basis.

        • “there can’t be 100 different things out there with large and consistent effects”

          What is more true is that if there are 100 essentially random things out there with large, consistent effects, then, due to the nature of high-dimensional random variables, the net effect is guaranteed to be nearly constant. So none of them can really be used to manipulate anyone: the one thing you’re controlling is at most around 1/100 of the net effect, and the 99 other random effects of similar size are guaranteed to dominate.

          It’s only when there are 99 other effects that are small and one effect that is large (or similar situation) where you can control the outcome by controlling the one large effect.

          The basic idea is called “concentration of measure” and it relies on

          1) Most effects are similar sized
          2) Most effects vary essentially randomly
          3) There are not dominant correlations between variables
          4) There are a large number of effects
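          A small simulation of this (the scales are made up; only the relative sizes matter):

          ```python
          import numpy as np

          rng = np.random.default_rng(0)

          # 100 similar-sized, essentially random, uncorrelated effects (1-4 above).
          n_effects, n_trials = 100, 10_000
          effect_sd = 1.0

          # You control effect #1, pushing it hard (+2 sd); the other 99 vary freely.
          controlled = 2.0 * effect_sd
          others = rng.normal(0.0, effect_sd, size=(n_trials, n_effects - 1)).sum(axis=1)
          net = controlled + others

          print(f"shift from your manipulation: {controlled:.1f}")
          print(f"sd of the other 99 combined:  {others.std():.1f}")  # ~sqrt(99) ~ 9.9
          # The one effect you control moves the net outcome by about a fifth
          # of the noise from the 99 effects you don't control.
          ```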

        • Intuitively, the alternative is that there are a large number of intertwined causal effects with some distribution having a (non-intervention) mean near some fixed value. (It means that there WOULD BE large complex correlations, violating assumption 3.)

          In that case, when you manipulate any of these variables, you (accidentally) intervene in the causal structure in a way you’re not paying attention to, and find an effect due to the change. Slight variations will find different effects, and none of them are particularly predictive outside of the experiment because you don’t understand the system, and aren’t correctly identifying what the intervention did.

          (I think that’s exactly what Feynman saw and complained about.)

        • When there are correlations, the “effective dimensionality” reduces. For example, if you throw a ball, you induce strong correlations in the trajectories of all the molecules. You can essentially model the ball as 12-dimensional: a 3-vector of position, a 3-vector of translational velocity, a 3-vector of rotational position, and a 3-vector of rotational velocity. Even though all the molecules still have thermal motion that adds some random oscillations to each individual molecule, the dominant effects are the 12 I mention.

          It seems likely that political attitudes are like this: there are basically a few dominant dimensions, perhaps things like historical facts about your life that affect your experiences and hence attitudes, and current facts about the economic conditions you are in, and certain dominant themes in the media that address issues that are of current political attention (I hesitate to say “importance” since a lot of really important stuff isn’t even discussed). And then there’s all the subliminal smiley faces, and whether your daughter gave you a big hug before going to school this morning, and which TV show you watched last night, and the color of your socks… and those are all basically like the thermal noise in the molecules. They can’t really dramatically change political attitudes, but they can induce some noise some of the time, especially under certain isolated manipulations in a psych lab. Just like shining a bright light on a ball will make one side hotter than the other… you can do some stuff… but for the most part it’s noise chasing.

          The sad fact seems to be that all the attention in psych goes to chasing the “thermal noise” and everyone’s ignoring the big picture, stuff like “what life experiences during childhood affect people’s political attitudes at age 30?” That’s because answering the big question requires running a 30-year-long individual-person time-series experiment and doing the hard work, while fiddling with the noise only requires getting a few undergrads to sit in a chair and push buttons for an hour.

        • Daniel,

          you’re hitting on a topic I’ve been meaning to write up in a real post.

          Most marketing comes down to either tiebreakers or brand building. Either you are trying to get consumers to pick one completely interchangeable product over another, or you are trying to build long-term default preferences (think Kellogg’s spending billions of dollars convincing consumers to pay a substantial premium over store-brand cereals). The first can be done on the fly but can only overcome the most trivial of advantages of one product over another. The second requires years of sustained advertising and promotion.

          Much of the popular coverage of behavioral economics recently has centered on a third category, the Jedi mind trick, where big changes in behavior and beliefs are supposedly produced by trivial, one-time interventions. I tend to be highly skeptical of the third category.

      • The key word is indeed ‘consistently’, and I think you are (unfairly) equivocating its meaning.

        The type of consistency _needed_ for an effect to be detectable, and presumably what is being implicitly claimed, seems to have a ceteris paribus element: it talks about (when causal) the effect of an intervention in a population otherwise as it exists. Yes, it needs to apply to a population substantially wider than the actual experiment or survey performed (else, not interesting). But it doesn’t need to apply to any population we can imagine (e.g. any specially selected subpopulations, or populations that have had other chosen interventions).

        Suppose the shark attack research seemed unimpeachable and the effect was large. It would not be in the tiniest bit inconsistent to later find that bear attacks, lion attacks, and so on and so on, had a similar effect. To the contrary, these later results would bolster the shark attack finding.

    • The point, I think, is that if your theory doesn’t constrain your prior belief about what the outcome will be, it’s not a predictive theory. In this case, there are so many variables that any result can be explained by the theory, so it’s not useful. And that same fact ALSO means your posterior after seeing a result doesn’t materially change your model, because it could be caused by any of those 100 variables.

      This point is made more broadly / less technically in a post by Eliezer Yudkowsky about “beliefs” being meaningless if they don’t constrain expectation – http://lesswrong.com/lw/i3/making_beliefs_pay_rent_in_anticipated_experiences/

  2. This makes perfect sense as a consequence of how reasoning needs to work, as explained by Jaynes. It’s an important point, and in the absence of a better source, I’ll probably use it as a reference when trying to explain this point, but I’m unsure it’s fully intelligible to the wider public. Eliezer Yudkowsky explains part of this when he talks about absence of evidence being evidence of absence, and that explanation requires very little background (http://lesswrong.com/lw/ii/conservation_of_expected_evidence/). I’m unsure if I’ve seen any explanation of the fact that information propagation isn’t unidirectional that’s really accessible to people who don’t understand the math – this post is as close as I think I’ve seen.

    Does anyone else have a suggested explanation that’s clearer to people who don’t know Bayesian statistics?

  3. Sort of a side issue, but why did BUGS have the cut operator?

    I once had a model that had a poorly conditioned component — it made BUGS mix very slowly and occasionally barf. (This was well before the advent of Stan; BUGS was the only game in town for generic-ish Bayesian inference.) My attempt to use the cut operator didn’t improve anything — possibly I wasn’t using it correctly. The problematic component was bivariate, so I split it off, did a kind of local Bayesian analysis in Matlab using a subset of the data that was (highly) relevant only to that component, and used the posterior mean as a plug-in estimate in the BUGS specification of the model, preventing information flow from the rest of the data back to the component in question. That information flow was not very informative, which was both the cause of the numerical instability and the justification for the approximation. This solved the numerical issues but made the doctrinaire Bayesian in me grumpy.

    I guess if there’s a message in the above anecdote, it’s that some information flows can be neglected in the presence of other more informative, erm, information.

    • I can think of one reason why you might want a cut-type operator, and that’s to use an unbiased estimate to plug into a model that investigates data where the estimate would otherwise be biased.

      For example, suppose you have some data on the speed of light, or some other similar kind of “constant” aspect of the world. And then, you have some experiment in which you have measurement problems that induce bias in your measurements. If you let the speed of light be a parameter in your model, you’ll have trouble identifying whether the speed of light is low, or your clock is slow, or the length measurements are biased, or whatever…

      So you fit the speed of light to a bunch of data where the measurements are accurate, get a tight posterior distribution, and then you plug the output into your second model and investigate your situation without the problem of identifiability or bias.

      Now, could you just do one big model with all the data? Yes, but it may be computationally prohibitive, and particularly if you can re-use the result of your calibration experiments over and over, it could be useful to just run that calibration experiment, find a tight posterior for your unknown, and then use that tight posterior as a prior for lots of other analysis, over and over.
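      A minimal sketch of that two-stage workflow, using a conjugate normal-normal model so no sampler is needed (the instrument-bias setup and every number here are invented for illustration):

      ```python
      import numpy as np

      rng = np.random.default_rng(1)

      # --- Stage 1: calibration ---
      # Measure a standard whose true value is known, to learn the instrument's
      # additive bias. With known noise sd, the normal-normal update is closed form.
      true_standard, noise_sd, true_bias = 10.0, 0.5, 0.3
      calib = true_standard + true_bias + rng.normal(0, noise_sd, size=50)

      prior_mu, prior_sd = 0.0, 10.0   # vague prior on the bias
      errors = calib - true_standard   # direct measurements of the bias
      post_prec = 1 / prior_sd**2 + len(errors) / noise_sd**2
      post_sd = post_prec ** -0.5
      post_mu = (prior_mu / prior_sd**2 + errors.sum() / noise_sd**2) / post_prec

      # --- Stage 2: plug the calibration posterior in as a fixed prior ---
      # New data on an unknown theta, taken with the same (biased) instrument.
      # The bias posterior flows forward, but the new data never updates it:
      # exactly the one-way behavior the cut operator is meant to provide.
      new_data = 3.7 + true_bias + rng.normal(0, noise_sd, size=20)
      theta_hat = new_data.mean() - post_mu
      theta_sd = np.sqrt(noise_sd**2 / len(new_data) + post_sd**2)
      print(f"theta: {theta_hat:.2f} +/- {theta_sd:.2f}")  # recovers ~3.7
      ```

      The price is the one the post describes: information that a full joint model would send backward into the bias estimate is deliberately thrown away.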

      • I guess in some sense, this is generally a good idea when you have the opportunity to calibrate a measurement system first with a bunch of calibration data, and then use the calibration results over and over as if they were well established facts about your instrument/method thereby making the analysis of individual experiments simpler and clearer.

    • I guess the reason for the cut operator is that people do not “understand the logic of Bayesian inference”, as Andrew Gelman so nicely puts it.

      Here are some links to articles by various people who don’t get it.

      Modularization in Bayesian analysis
      https://projecteuclid.org/euclid.ba/1340370392
      (M.J. Bayarri, J. Berger, F. Liu)

      Cuts in Bayesian graphical models
      http://statmath.wu.ac.at/research/talks/resources/Plummer_WU_2015.pdf
      (M. Plummer)

      Joining and splitting models with Markov melding
      https://arxiv.org/abs/1607.06779
      (R. Goudie, A. M. Presanis, D. Lunn, D. De Angelis, L. Wernisch)

      Statistical learning in models made of modules
      https://arxiv.org/abs/1708.08719
      (I’m on this one as well as L. Murray, C. Holmes, C. Robert)

      • From Bayarri et al.:

        Within Bayesian analysis, there is increasing use of modifications to posterior distributions that do not strictly flow from Bayes theorem… We are not questioning Bayesian analysis here; if one is comfortable with all the modeling and prior assessments that go into an analysis, and if it is possible to carry out the ensuing Bayesian computation, then certainly we would not argue for altering the posterior distribution. Uncertainties in modeling and practical computational realities, however, may suggest certain types of modifications of the posterior, and our goal is to try to understand which modifications are reasonable. Note that one does not have the coherency of Bayesian analysis as an automatic support for the modified analysis, so supplementary justifications are often needed.

        Sounds like they get it just fine.

  4. It might be useful to think about the underlying premise you advance, that people believe voters are easily manipulated. Or that people in general are. When I was 8, I could take a watch off a person while walking next to them. (I didn’t keep them.) It was easy once you understood how little attention people pay to most things. After a few, the bigger test was keeping my concentration, because the natural tendency is to see how little attention you can pay and still do it. I had a strong aversion to getting caught and stopped myself because the skill, while cool, wasn’t as fun as other stuff.

    The priming idea plays off this, but it’s woeful as a model because anyone who has ever manipulated attention knows that’s not how it works: you can pretty easily distract in order to gain an advantage, but that advantage is over them in the game you’re playing on them, not in something fundamental in their heads or behavior. You may even be much better at manipulating them – think a great boxer discovering and then taking advantage of an opponent’s physical and mental weaknesses – but that doesn’t mean you do more than influence them in the game you’re in. Like my friend no longer had a watch until I handed it back.

    So, it seems pretty clear that if you’re going to state that political decisions are so easily manipulated that they’re like having your watch lifted as you walk, then you need some model which connects a decision made in your head – one that may form over not only months but over your entire conscious life, taking in elections and issues and how your family and your community and culture respond – to not paying attention as I tap the end of the strap so the loop pops up. (After that, the only hardish part is getting it off the wrist elegantly. Interestingly, the best way is to involve the others around you, because shifting the context removes that extra bit of attention to what you’re doing, so no one notices the watch is gone until it’s been in your pocket. Most times it was a total surprise that the watch was missing when I gave it back.)

    I have a model for watch lifting. It’s pretty much the same as when you’re duped in cards or duped in anything in the moment, where the key words are “in the moment.” You can watch super-intelligent software engineers be taken advantage of by a con-artist magician hired for an event, because they enter the con artist’s world for that moment.

    So what would the model be for affecting a vote? Obviously, Facebook must rank high, because advertising in general is highly effective and we’re all slaves to buy any product that has money put behind it, right? The first question would be: what does research on the effect of advertisements show? Except that pretty much says there isn’t much effect: this campaign may work, but then it may work because they’re actually telling you more often about a feature you want, or they’ve improved the product or process and they’re letting you know. So we have some variable in a model that links manipulation to result when the manipulation is applied in the moment, like a pickpocket only needs to jar you momentarily to dip you, and we can vary that as the context expands over time. Here’s one place I think they go mentally wrong: yes, an election takes place when you vote, which is a moment, but that doesn’t mean manipulation has the same variable as manipulation in the moment. I wouldn’t normally advance something so stupidly obvious, but I’m fairly convinced they need to be treated that way.

    There are also significant cultural effects. Like Sears’ top executives were called The Tall Men because it seemed you really needed to be tall to get to the top ranks. Women, minorities, short people, fat people all feel those cultural effects: it’s harder to be viewed as the best candidate, etc., if you don’t fit the cultural expectation, whether that expectation is rational or not.

    It’s a version, to me, of super huge testicles: if we elected based on super huge testicles, that would be like a peacock’s tail, and it sounds silly, looks beautiful if you’re into peacock tails (and/or super huge testicles), but it’s also not necessarily the choice the external world might impose. This is getting into the interplay of selection and issues in the area of choice, but I mean fairly generally that you develop a method – call it culture – and that seems to work, so it continues, and that’s great, and you have super huge testicles and tall executives and giant tails until that makes no sense and selection imposes another ordering. I can’t see a hint that any of these people understand basic issues like selection and choice outside of statistical sampling methods (if that).

    Obviously, again, Facebook ads are clearly the absolute best way to change minds because a) Facebook is about friends and family and b) ads and news from friends and family are most trusted … even though I can’t think of anyone who trusts what their Uncle says – and he doesn’t drink! – or, heaven forbid, what their parents think. I phrased it that way because it’s a simple statement of groupings where they’re positing ‘power’ into a group relation as though it’s imposing an ordering, one that results in and even changes a decision about which you receive a gazillion data points, and this is a fairly classic selection problem whose solutions tend to be chaotic as heck because random is too stable a solution to last. If that isn’t clear, I mean actual random is tough to find and hard to maintain, so all you get in those selection problems is distributions, which even if they’re treated as classically normal have tails. (And as finance shows, even if you try to make a distribution of tail events so you can treat them as normal, you push the tail out another step. We actually do this all day long in our lives, and you’d think people could see how chaotic selection becomes just one step out. Try walking on the sidewalk.)

    Mathematical and logical garbage.

    • Your watch-taking example suggests a different interpretation. Rather than whether such influences are effective in the moment vs. over time, the key may be whether the victim really cares about what is going on. My belief (again, not based on any research) is that people care so little about their votes and who/what they are voting for that simple tricks can influence them – and not just for the moment. Such simple tricks are less likely to work for something they really care about – such as their favorite sports team. But when they have little investment (and little real understanding), they may be quite manipulable.
