
New Twitter research

Drew Conway writes:

I [Drew] came across some new statistical research that may be of interest for your blog. Straight methods criticism is a bit outside my focus, but I am an avid Twitter user, and after reading a new report from Harvard Business on the service I noticed several methodological holes that I think are worth noting.

I would be very curious to get your take, as the research has been burning up Twitter all day and no one seems to be taking pause to ask any questions.

Here are my questions:

1. How does the fact that 80% of users follow or are followed by one or more test the capacity of a user base to understand the service? Do we have some expectation about the probability of an occurrence of a tie, and if so, why?

2. A large portion of Twitter users keep their gender identification ambiguous. To what extent does this alter the authors' conclusions, given that they appear to have made no attempt to correct for it?

3. As the article points out, all well-developed online social media services have a contribution pattern that roughly follows a power-law or exponential distribution. Does the fact that Twitter is at the extreme bounds of these distributions only point to the fact that it is still settling into an equilibrium state?

I think Drew will be disappointed to hear that I have no comments at all on this study! I just don’t know enough about Twitter to even start to think about it. But I’m sure the rest of you have thoughts on this…

5 Comments

  1. The article is fairly vague on methods so it would be difficult to even answer your questions, Drew, but they are good ones. I noticed that one reader commented on how useful the research is, but I disagree based on what I read in the article. The conclusions seem to be drawn rather hastily without properly ruling out uncontrolled-for variables.

    The fact that 80% have one or more followers is likely due more to the use of applications such as Twollo. Within 30 minutes of creating my account, I had 7 followers and I wasn't following anyone yet. They were all businesses or social media experts looking for people to follow. The authors can't verify whether a user knows his/her followers – and you don't need to understand the service to have someone jump on your wagon.

    They did try to explain how they controlled for gender, but names alone are not enough to make any sweeping conclusions. They didn't take into account differences between the reasons that people use Twitter vs. other social networking sites that could alter gender use patterns.

    Finally, perhaps Twitter is still looking for an equilibrium, but perhaps it will never reach one. It is unique as social networking goes and may not be subject to the same 'behavioral' patterns. But you make a good point: it is still relatively new and perhaps it will just take time for contributions to become more widely distributed.

    Which brings me back to my original statement that I don't find this study, as it is presented in the article in question, to be useful.

  2. David Lockhart says:

    I don't know a whole lot about Twitter, but I know a thing or two about social network analysis.

    I think there's some potentially interesting stuff here. There are some major flaws that keep this from being something that you'll find in a peer-reviewed sociology journal (or at least they *should*). But I suspect several of the authors' main claims have some truth to them.

    One thing that stuck out to me was that the median number of tweets in their sample was 1 and more than 25% had 0. This is important since the aim of the research seems to be the use of and interaction on Twitter itself, rather than using Twitter to study real-life friendships. I think it would have been more interesting to define criteria for "active users" and study that population.

    I suspect that the similarity of twitter to Wikipedia may reflect professional use of both. Normal people are going to pale in comparison to someone who works for a PR agency handling the Wikipedia page or tweet feed of one or more companies. I think few other social network sites receive significant business use – maybe Facebook and MySpace but that's probably about it. I suspect it would be more enlightening to exclude organization accounts. I think the gender analyses probably implicitly did, but I suspect that the tweet volume analysis did not (which means that there were substantial differences between the populations contributing to the analyses reported without clarifying that fact, which to me is a pretty big no-no.)

    Re: Drew's questions

    1. I'm not an active Twitter user but I understand it to have a fair amount of spam. Some other social network sites do as well and others do not. Unlike most social network sites that I'm familiar with, following someone does *not* require confirmation or permission from the account being followed. In my opinion the 80% figure and its difference from other sites says more about this policy difference on the sites than anything about the users' understanding of them. In fact, it may even reflect on a specific lack of understanding: that anything they put on twitter is potentially viewable by anyone who goes looking, not just the friends who currently follow them.

    I agree with what K. Ross says on this point.

    It could also simply be a result of the fact that Twitter has a minimal or nonexistent profile. I think many people create accounts on these things on a lark, then get bogged down in answering bunches of profile questions and decide this isn't worth their time.

    2. I would be stunned if they had done something to address missing gender info. They almost certainly did a complete case analysis. The implicit assumptions of their analysis are different for different claims. Comparing the number of followers for men vs. women assumes that gender is missing at random conditional on number of followers. In other words, it assumes that undeclared men (who hide their gender) have the same distribution of number of followers as declared men (who are publicly men); similarly, declared and undeclared women must have similar distributions of followers. It sounds like the authors' interpretation of their data may not hold that assumption to be true, although there is also some suggestion that they believe these differences to be due to tweet content rather than declared gender.
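A small simulation can make the complete case concern concrete. The sketch below is not based on the study's data: the lognormal follower counts, the threshold of 20 followers, and the declaration probabilities are all made-up assumptions, and it looks at a single gender group only. It compares the median follower count among accounts that declare gender when declaring is unrelated to followers versus when popular accounts declare less often.

```python
import random
import statistics


def complete_case_median(followers, declare_prob, rng):
    """Median follower count among accounts whose gender is observed."""
    observed = [f for f in followers if rng.random() < declare_prob(f)]
    return statistics.median(observed)


def run_simulation(n=20000, seed=0):
    rng = random.Random(seed)
    # Hypothetical follower counts for one gender group (heavy right tail).
    followers = [rng.lognormvariate(3.0, 1.0) for _ in range(n)]
    true_median = statistics.median(followers)

    # Case 1: declaring gender is unrelated to follower count.
    unrelated = complete_case_median(followers, lambda f: 0.6, rng)

    # Case 2 (assumed mechanism): accounts with 20+ followers declare less often.
    dependent = complete_case_median(
        followers, lambda f: 0.9 if f < 20 else 0.4, rng
    )
    return true_median, unrelated, dependent


if __name__ == "__main__":
    true_median, unrelated, dependent = run_simulation()
    print(f"all accounts:             {true_median:.1f}")
    print(f"declared, unrelated case: {unrelated:.1f}")
    print(f"declared, dependent case: {dependent:.1f}")
```

Under these assumptions the declared-only median stays close to the full-population median in the first case and falls well below it in the second, which is the kind of distortion a complete case comparison of men and women could inherit.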

    The claims about the mixing patterns (men more likely to follow men, women more likely to follow men, men more likely to be followed by men) require even more assumptions:
    -Declared men must follow the same distribution of declared men, undeclared men, declared women and undeclared women as do undeclared men. Similarly, the distributions for declared and undeclared women must also match.
    -Men must follow declared men at the same rate that they follow undeclared men and declared women at the same rate they follow undeclared women. So must women.
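To illustrate the second assumption in the list above, here is a toy calculation with entirely hypothetical counts (none of these numbers come from the article). It computes the share of follows by male accounts that go to men, first using every target and then using only targets with a declared gender, as a complete case analysis would.

```python
from collections import Counter

# Hypothetical follows made by male accounts.
# Key: (target gender, target declared gender?) -> number of follows.
follows_by_men = Counter({
    ("M", True): 40,
    ("M", False): 10,
    ("F", True): 20,
    ("F", False): 30,
})


def share_to_men(follows, declared_only):
    """Fraction of follows whose target is male, optionally dropping undeclared targets."""
    total = 0
    to_men = 0
    for (gender, declared), count in follows.items():
        if declared_only and not declared:
            continue
        total += count
        if gender == "M":
            to_men += count
    return to_men / total


true_share = share_to_men(follows_by_men, declared_only=False)     # 50 / 100
observed_share = share_to_men(follows_by_men, declared_only=True)  # 40 / 60

if __name__ == "__main__":
    print(f"true share of follows going to men: {true_share:.2f}")
    print(f"share seen among declared targets:  {observed_share:.2f}")
```

In this made-up example men split their follows evenly between men and women, but because declared men are followed more often than undeclared men, the declared-only data would suggest that about two thirds of their follows go to men.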

    It is very unlikely that all these assumptions are true. Off the top of my head, I think these assumptions are testable. They are clearly disconfirmable. In fact, I'll go out on a limb and make a prediction that would disconfirm them: undeclared accounts have more followers than declared accounts of either gender because a large proportion of them are organizations with massive numbers of followers.

    Krista Gile and Mark Handcock have a forthcoming article in the Annals of Applied Statistics on the impact of missing data in network analysis. An earlier version of the paper is here, although it focuses on missingness of the links between individuals rather than on missingness of a characteristic of individuals in a perfectly measured network:

    http://www.csss.washington.edu/Papers/wp75.pdf

    3. I dispute the claim that all mature networks follow a power law distribution. For one thing, it depends on exactly what relationship we are assessing. Consider this recent article from The Economist, which says that on Facebook although there is great variability in how many "friends" a user has, there is much less variability in how many people they actually interact with on the site:

    http://www.economist.com/science/displaystory.cfm

    I also suspect that different networks have different characteristics. I am not willing to take the authors' word that we see the same patterns in the networks of the career oriented site LinkedIn as on the sex partner site AdultFriendFinder. I found it curious that the authors didn't even say what other network sites they were comparing it to.

  3. Drew Conway says:

    You raise some great points Krisit, thank you for the response. I think the main thing that troubled me about the research, and something I did not mention in my questions, is that it seems to have been done by people who are not Twitter users and were therefore unaware of some of the technical and social norms that exist on the platform. Without this perspective they were not aware of the various issues that they should have controlled for.

  4. szarka says:

    See

    http://imstat.org/aoas/AOAS221.pdf?confirm=42d4d0

    for a more recent version of Gile & Handcock's paper.

  5. Sid says:

    @David Lockhart

    >"I dispute the claim that all mature networks follow a power law distribution. For one thing, it depends on exactly what relationship we are assessing. "

    Yes, I agree that it does depend on the type of relationships we are assessing. While other social networks like FB are about connecting to your whole network and mapping it online, Twitter in my opinion is different. Twitter is inherently about short conversations (no pictures, albums or distractions).

    I think people on Twitter would rather follow people they care about (or celebs) and not many others, thus avoiding the unnecessary noise found in other networks. This makes Twitter more of a broadcast network. So I'm guessing its network would have a shorter diameter and fewer than 6 degrees of separation.

    I would probably want to review the network structure of all common social networks for different purposes. We are most likely to see differences between microblogging services and others.