A few words on a few words on Twitter’s 280 experiment.

Gur Huberman points us to this post by Joshua Gans, “A few words on Twitter’s 280 experiment.” I hate twitter but I took a look anyway, and I’m glad I did, as Gans makes some good points and some bad points, and it’s all interesting.

Gans starts with some intriguing background:

Twitter have decided to run an experiment. They are giving random users twice the character limit — 280 rather than 140 characters. Their motivation was their observation that in Japanese, Korean and Chinese 140 characters conveys a lot more information and so people tend to tweet more often. Here is their full statement.

The instructive graph is this:

[Graph from Twitter’s announcement comparing character counts of Japanese and English tweets.]

The conclusion drawn is that Japanese tweeters do not hit their character limit as much as English tweeters. They also claim they see more people tweeting in the less constrained languages. Their conclusion is that not having as tight a character limit makes expression easier and so you get more of it.

Interesting.  Gans continues:

What Twitter have just told us is that the world gave them a natural experiment and they liked what they saw. . . . What was Twitter’s reaction to this? To do an experiment. In other words, they are worried that the natural experiment isn’t telling them enough. Since it is about as clean a natural experiment as you are likely to get in society, we can only speculate what they are missing. Are they concerned that this is something cultural? (They had three cultures do this so that is strange). Moreover, many of those users must also speak English so one has to imagine something could be learned from that.

I’m not quite sure what he means by a “culture,” but this generally seems like a useful direction to explore.  One thing, though:  Gans seems to think it’s a big mystery why Twitter would want to do an experiment rather than just draw inferences from observational data.  But an experiment here is much different from the relevant observational data.  In the observational data, the U.S. condition is unchanged; in the experiment, the U.S. condition is changed.  That’s a big deal!  We’re talking about two different comparisons:

observational:  U.S. with a 140 character limit vs. Japan with a 140 character limit.

experimental:  U.S. with a 140 character limit vs. U.S. with a 280 character limit.

These comparisons are a lot different!  It doesn’t matter how “clean” the observational comparison is (which I think Gans somewhat misleadingly calls a “natural experiment”); these are two different comparisons.
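
To see how different these comparisons can be, here is a minimal simulation sketch in Python. Every number in it (the country baselines, the effect of the higher limit) is invented for illustration; the point is only that the country-to-country comparison mixes a country effect with any limit effect, while randomizing the limit within one country isolates the limit effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical quantities, for illustration only.
baseline = {"US": 3.0, "Japan": 5.0}  # made-up baseline tweets/week per country
limit_effect = 0.4                    # made-up effect of a 280- vs. 140-character limit

def weekly_tweets(country, gets_280, size):
    """Simulated tweets per week for users in `country`, with or without the 280 limit."""
    return rng.poisson(baseline[country] + limit_effect * gets_280, size)

# Observational comparison: Japan vs. U.S., both nominally at 140 characters.
# The difference mostly reflects the country baselines, not the limit.
obs_diff = weekly_tweets("Japan", 0, n).mean() - weekly_tweets("US", 0, n).mean()

# Experimental comparison: U.S. users randomized to 280 vs. 140.
exp_diff = weekly_tweets("US", 1, n).mean() - weekly_tweets("US", 0, n).mean()

print(f"observational (Japan - US):      {obs_diff:+.2f}")
print(f"experimental (280 - 140, in US): {exp_diff:+.2f}   true limit effect = {limit_effect}")
```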

Gans continues:

My point is: the new experiment must be testing a hypothesis. But what is that hypothesis?

Huh?  There’s no requirement at all that an experiment “must be testing a hypothesis.”  An experiment is a way to gather data.  You can use experimental data to test hypotheses, or to estimate parameters, or to make predictions, or to make decisions.  All these can be useful.  But none of them is necessary.  In particular, I’m guessing that Twitter wants to make decisions (also to get some publicity, goodwill, etc.).  No need for there to be any testing of a hypothesis.
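
As a sketch of what “using experimental data to make a decision, with no hypothesis test in sight” might look like, here is a hypothetical calculation (not anything Twitter actually does; the engagement numbers, dollar values, and decision rule are all invented):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical experimental data: some engagement measure per user
# under the 140- and 280-character conditions.
control = rng.normal(10.0, 4.0, size=5_000)   # invented engagement numbers
treated = rng.normal(10.3, 4.0, size=5_000)

# Estimation: the lift and a rough standard error, no p-value required.
lift = treated.mean() - control.mean()
se = np.sqrt(treated.var(ddof=1) / treated.size + control.var(ddof=1) / control.size)

# Decision: roll out if the estimated business value of the lift exceeds an
# assumed cost of changing the product. Both dollar figures are made up.
value_per_unit_lift = 1_000_000.0
rollout_cost = 200_000.0
expected_gain = value_per_unit_lift * lift - rollout_cost

decision = "switch to 280" if expected_gain > 0 else "stay at 140"
print(f"estimated lift {lift:.3f} (se {se:.3f}) -> {decision}")
```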

Gans does have some interesting thoughts on the specifics:

The obvious way [to do an experiment] would be to announce, say, a three month trial across the whole of English speaking twitter and observe changes. That would replicate the natural experiment to a degree. Or, alternatively, you might pick a language with a small number of users and conduct the experiment there. . . .

That is not what Twitter did. They decided to randomise across a subset of English users — giving them 280 characters — and leaving the rest out. That strikes me as a bad idea because those random people are not contained. They mix with the 140 people. . . .

Why is this a terrible idea? Because it is not an experiment that tests what Twitter was likely missing from the information they gained already. Instead, it is an experiment that tests the hypothesis — what if we gave some people twice the limit and threw all of them together with those without? The likelihood that Twitter learns anything with confidence to move to a 280 limit from everyone is very low from this.

All this seems odd to me.  Gans’s concern is spillover, and that’s a real concern, but any design has issues.  His proposed three-month trial has no spillover but is confounded with time trends.  If it’s not one thing it’s another.  My point is that I don’t think it’s right to say that a design is “terrible” just because there’s spillover, any more than you should say that the design is terrible if it is confounded with time, any more than you should describe an observational comparison which is confounded with country as if it is “as clean as you are likely to get.”
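
To illustrate the tradeoff rather than declare a winner, here is a toy simulation in which every effect size is invented: the within-network randomization misses spillover and so understates what a full rollout would do, while the before/after rollout captures spillover but is confounded with a time trend.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

direct_effect = 0.5   # hypothetical direct effect of having 280 characters yourself
spillover = 0.2       # hypothetical effect of 280-character tweets appearing in your feed
time_trend = 0.3      # hypothetical drift in engagement over a three-month trial

def engagement(has_280, share_280_in_feed, t, size=n):
    mu = 5.0 + time_trend * t + direct_effect * has_280 + spillover * share_280_in_feed
    return rng.normal(mu, 2.0, size)

full_rollout_effect = direct_effect + spillover  # what "280 for everyone" would change

# Design A: randomize half the users at one point in time.
# Both groups see feeds that are ~half 280-character tweets, so the contrast
# picks up only the direct effect and misses the spillover term.
est_a = engagement(1, 0.5, t=0).mean() - engagement(0, 0.5, t=0).mean()

# Design B: give everyone 280 for three months and compare before vs. after.
# The contrast includes spillover but also absorbs the time trend.
est_b = engagement(1, 1.0, t=1).mean() - engagement(0, 0.0, t=0).mean()

print(f"effect of a full rollout:         {full_rollout_effect:.2f}")
print(f"Design A (randomized) estimate:   {est_a:.2f}")
print(f"Design B (before/after) estimate: {est_b:.2f}")
```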

Yes, identify the problems in the data and consider what assumptions are necessary to learn anything despite those problems. No, don’t be so sure that what people are doing is a bad idea. Remember that Twitter has goals beyond testing hypotheses—indeed I’d guess that Twitter isn’t interested in hypothesis testing at all!  It’s a business decision and Twitter has lots of business goals. Just to start, see this comment from Abhishek on the post in question.

Finally, Gans writes:

What we should be complaining about is why they are running such an awful experiment and how they came to such a ludicrous decision on that.

Huh?  We should be complaining because a company is suboptimally allocating resources?  I don’t get it.  We can laugh at them, but why complain?

P.S.  Yes, I recognize the meta-argument, that if I think Gans has no reason to complain that Twitter did an experiment that’s different from the experiment he would’ve preferred, then, similarly, I have no reason to complain that Gans wrote a blog post different from the post that I would’ve preferred.  Fair enough.

What I’m really saying here is that I disagree with much of what Gans writes.  Or, to be more precise, I like Gans’s big picture—he’s looking at a data analysis (the above graph) and thinking of it as an observational study, and he’s looking at a policy change (the 280-character rule) and thinking of it as an experiment—but I think he’s getting stuck in the weeds, not fully recognizing the complexity of the situation and thinking that there’s some near-ideal experiment and hypothesis out there.

I appreciate that Gans is stepping back, taking a real-world business decision that’s in the news and trying to evaluate it from first principles. We certainly shouldn’t assume that any decision made by Twitter, say, is automatically a wise choice, nor should we assume that change is bad.  It’s a good idea to look at a policy change and consider what can be learned from it.  (For more on this point, see Section 4 of this review.)  I’d just like to step back a few paces further and place this data gathering in the context of various goals of Twitter and its users.

So I thank Gans for getting this discussion started, and I thank Huberman for passing it over to us.

P.P.S.  I wrote this post in Sep 2017 and it’s scheduled to appear in Apr 2018, at which time, who knows, tweets might be 1000 characters long.  I still prefer blogs.

20 thoughts on “A few words on a few words on Twitter’s 280 experiment.”

  1. Blablabla, but let’s just try the experiment of turning off Twitter entirely, and see if violent and anti-social behavior is reduced worldwide… and then if it is, keep it off, and if it isn’t… keep it off anyway ;-)

    Also I’d like to point out that the “reason” to do this experiment is to get better inference on what would happen if you moved to 280 characters… but you’d only do this if moving to 280 characters has some real risk to it… and it doesn’t. Only 9% of English tweets hit the limit as it is… it’s not like you expect that if you turn on 280 characters you’re suddenly going to double the growth rate of your data storage or something. Just turn on the 280 characters, heck, just turn on 1500 characters… it’s basically risk free.

    • For the first point, you’re obviously joking but that would be a terrible experiment, because it would be confounded with time trends. Isn’t that one of the points of the post?

      For the second point, it’s not risk free at all. Twitter’s appeal is short messages, not reading stories. Going to higher character limits could reduce user engagement. Just because only 9% are hitting the limit (which is actually pretty high, for hitting the limit exactly) doesn’t imply messages would stay under 140 characters if the limit were increased to 280. I would guess the average length would go up to 250 or so, similar to what it is now.

      • Depends on whose utility function you’re using. Risk free for me certainly ;-)

        I’ll agree, though, with your point that forcing people to say small, trite, and often divisive things may be selling a lot more ads than if they let people say things with nuance and substance.

        • I think what you’re looking for is something called blogs. Most of which I would never see if it wasn’t for short and concise Tweets mentioning them. Following Data Science influencers can be enormously helpful for people new to the profession and experienced as well.

        • Twitter may have its uses; I admit to thinking of it as an overall net negative on the world, though. It really does promote a lot of snark and divisiveness, as does Facebook as far as I can see.

          Most likely, if you didn’t have Twitter, you’d find blogs by some other means, like perhaps google searching on certain topics, or following a few main ones and then they will link with substantive commentary to other ones, etc. The idea that if Twitter disappeared people would be unable to find useful data science articles is ludicrous.

        • In fact, it seems like what Twitter (and many other current public venues for discussion) does do is emphasize quantity over quality, and so we are swamped with crap, kinda like the modern scientific journal.

  2. I’d like to point out, in response to Daniel’s comment, that moving to 280 characters IS a risk. It may not be visible in the short term, but over time it might start to be obvious.

    Imagine if in your Twitter feed everyone starts to compose tweets longer than 140 characters. These tweets occupy more real estate compared to those under 140 characters. Now people have to read longer thoughts than they are typically used to. Now let’s introduce a time variable. Within a given time frame (say 10 minutes), you would have scrolled the same (or a similar) amount in terms of pixels as before, but the number of tweets consumed would be much lower (probably half?!) than before. This is a risk to the business when their model is inserting Sponsored content within the feed. Now, I don’t know how much the Product Managers at Twitter thought through this scenario. Maybe these are some of the metrics they want to learn by mixing 280-character users with 140-character users and measuring mean scroll lengths of readers whose feeds were exposed to various proportions of 140-character to 280-character tweets.
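
    A rough back-of-envelope version of that scenario, with entirely invented numbers for tweet height, scrolling, and ad spacing; it only illustrates the mechanism the comment describes (taller tweets mean fewer tweets, and fewer sponsored slots, per scrolling session):

    ```python
    # Every number here is invented; this just works through the mechanism above.

    def tweets_per_session(avg_tweet_px, scroll_px_per_session=20_000):
        """How many tweets fit into a fixed amount of scrolling."""
        return scroll_px_per_session / avg_tweet_px

    def sponsored_per_session(tweets_seen, ads_every_n_tweets=8):
        """Sponsored slots inserted roughly every N organic tweets."""
        return tweets_seen / ads_every_n_tweets

    for label, avg_px in [("mostly 140-character feed", 150), ("mostly 280-character feed", 230)]:
        seen = tweets_per_session(avg_px)
        ads = sponsored_per_session(seen)
        print(f"{label}: ~{seen:.0f} tweets and ~{ads:.0f} sponsored slots per session")
    ```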

  3. Back when I was a kid, we were taught that Science works like this: (1) come up with a hypothesis, (2) perform an experiment to test the hypothesis, (3) accept or reject the hypothesis. As I matured and eventually became a scientist myself I realized that that’s not how it works at all…or, rather, it’s an incomplete and distorted picture. For one thing, most of the important bits are buried in step 1, come up with a hypothesis. Sure, fine, but how? Sure, the experimental confirmation of the Theory of General Relativity is important, indeed indispensable, but coming up with the theory in the first place was by far the hardest part. And this was a lot more than ‘I hypothesize that spacetime is distorted near the sun’; Einstein came up with a fully developed theory with exact mathematical predictions.

    Maybe that’s an extreme example. At any rate I 100% agree with Andrew’s point that you don’t need a specific hypothesis in order to justify an experiment. The only hypothesis you need is ‘my experimental manipulation may change something I care about.’

    I don’t know who Joshua Gans is, but I suspect he may have learned the BS about hypothesis-experiment-reject being ‘the way’ to do science. Twitter presumably thinks their business might be better in some way if they give users more characters. That’s the only hypothesis they need. They don’t even need to decide in advance what they mean by ‘better’: people tweet more? People read more tweets? People spend more time reading tweets? People say on surveys that they prefer reading longer tweets? Actually, presumably what they really care about is whether people click more on ads, or see more ads, or whether the change would allow them to charge more for ads (does Twitter use ads? I don’t use it and have no idea where their money comes from).

    Or, if we want to recast it, we can put it in terms Gans might like: Twitter’s hypothesis is that people’s tweet-writing and -reading behavior will change if people can write longer tweets. They’re testing that. Science!

    • Or, if we want to recast it, we can put it in terms Gans might like: Twitter’s hypothesis is that people’s tweet-writing and -reading behavior will change if people can write longer tweets. They’re testing that. Science!

      That’s not science… it’s a waste of time. I can tell you right now that the behavior will change.

      • Is there a name for the logical fallacy at play here? It is like constantly testing whether 2 + 2 = 4.

        If I put 2 apples next to two other apples do I get 4 apples?
        What if I do the same with stones?
        What if it is trees?
        How about people (this one is very socially relevant with public health implications)?

        Does addition also work if we do 3 + 3 = 6, or if adding two different numbers like 8 + 5 = 13? How high is this valid? Does it work for thousands, or even millions of items? The evidence regarding this is not yet published. We’ll need at least $30 billion a year for the foreseeable future to run all the tests to fill this gap.

        At some point you are supposed to stop.

      • Of course the behavior will change! I’m being sarcastic!

        Testing a hypothesis does not make something scientifically useful, and doing an experiment without having a specific hypothesis is perfectly acceptable science.

    • Phil:
      > most of the important bits are buried in step 1, come up with a hypothesis
      You might or might not find this interesting:

      “Einstein was deeply puzzled by the success of natural science, and thought that we would never be able to explain it. He came to this conclusion on the ground that we cannot extract the basic laws of physics from experience using induction or deduction, and he took this to mean that they cannot be arrived at in a logical manner at all. In this paper I use Charles Peirce’s logic of abduction, a third mode of reasoning different from deduction and induction, and show that it can be used to explain how laws in physics are arrived at, thereby addressing Einstein’s puzzle about the incomprehensible comprehensibility of the universe.”

      Charles Sanders Peirce and the Abduction of Einstein: On the Comprehensibility of the World https://arxiv.org/abs/1610.00132

        • I don’t get the Frank Ramsey connection here but there is this – https://en.wikipedia.org/wiki/Charles_Santiago_Sanders_Peirce

          “There is no well-documented explanation of why Peirce adopted the middle name “Santiago” (Spanish for Saint James) but speculations and beliefs of contemporaries and scholars focused on his gratitude to his old friend William James and more recently on Peirce’s second wife Juliette (of unknown but possibly Spanish Gypsy heritage).”

        • Ahhh it was a joke.

          BTW, I purchased the Charles S. Peirce biography by Joseph Brent. The only reason I took an interest in Peirce was that Robert Nozick, while we were having coffee in Harvard Square, mentioned that I engaged in abduction as distinct from induction and deduction. At the time I had little understanding of what he meant. That was in the early ’70s. Nozick was quite an interesting conversationalist. I think he said, ‘start a query with something interesting.’ I got along with Nozick well as he was eclectic. He suggested I seek a degree in philosophy of science. He challenged me, which is when I put in some good effort.

  4. I guess it should not be just about writing tweets, but also about reading tweets. Imagine the situation where writers jump at the opportunity to write longer texts, but then nobody reads them anymore.

    And as I understand the Internet, readers are more important than writers.
