Archive of entries posted by

New Multiple Imputation R Package “mi” (beta release)

We recently uploaded to CRAN the multiple imputation package “mi”, which we have been developing. The aim of mi is to make multiple imputation transparent and easy to use. To that end, it has a few characteristics that we believe are valuable. 1. Graphical diagnostics of imputation models and convergence of the imputation process. […]

Venn Diagram Challenge Summary 1.5

A few people have pointed us to more Venn Diagram Challenge diagrams in response to the Venn Diagram Challenge Summary 1:

Venn Diagram Challenge Summary 1

The Venn Diagram Challenge, which started with this entry, has spurred exciting discussions at Junk Charts,, and at Perceptual Edge. So I thought I would do my best to put them together in one piece.
The outcomes people created can be divided into two groups. The first group dealt with the problem of expressing the “3-way Venn diagram of percentage with different base frequency”. The second group went a little deeper to figure out a better way to express graphically what the paper is trying to say. Our ultimate goal is the second one; however, the first problem is itself an interesting challenge, so I will deal with them separately. (The second group will be dealt with in the Venn Diagram Challenge Summary 2, which should come shortly after this article.)

Venn diagram converted into a table:
(For background you can look at the previous posts: the original entry, the one on Antony Unwin’s mosaic chart, and the one on Stack Lee’s bar chart.)

2D space of presidential election candidates, polarization and wedging

Masanao: Aleks and I ran a PCA on the 2008 Presidential Election Candidates on the Issues data, plotted the first two principal component scores against each other, and got this nice result:
The horizontal axis is the 1st principal component score; it represents the degree to which a candidate supports Iraq War and Homeland Security (Guantanamo), and opposes Iraq War Withdrawal, Universal Healthcare, and Abortion Rights. The vertical axis is the 2nd principal component score, which represents the degree to which a candidate supports Iraq War Withdrawal and Energy & Oil (ANWR Drilling), and opposes Death Penalty, Iran Sanctions, and Iran Military Action as Option.

The first principal component is the dividing axis for the Democrats and the Republicans. When we reorder the loadings according to the 1st component we get the following:
So for the first principal component, Republicans generally support red variables and the Democrats the blue colored variables. Ron Paul appears to be the only candidate that does not deviate much from the middle.
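
To make the terms "scores" and "loadings" concrete, here is a minimal sketch of a PCA on a candidates-by-issues matrix. It is not the code behind the plots in this post. The candidate names, issue names, and the +1/-1 values are invented for illustration, and the SVD-based `pca` helper is an assumption about how one might compute the components.

```python
import numpy as np

# Hypothetical toy data: rows are candidates, columns are issues.
# +1 = supports, -1 = opposes. All values are invented for illustration.
candidates = ["cand_a", "cand_b", "cand_c", "cand_d"]
issues = ["issue_1", "issue_2", "issue_3"]
X = np.array([
    [ 1.0,  1.0, -1.0],
    [ 1.0,  1.0,  1.0],
    [-1.0, -1.0,  1.0],
    [-1.0, -1.0, -1.0],
])

def pca(data, n_components=2):
    """Return (scores, loadings, explained variance ratio) via SVD."""
    centered = data - data.mean(axis=0)           # center each issue column
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]   # one row per candidate
    loadings = vt[:n_components].T                     # one row per issue
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, loadings, explained[:n_components]

scores, loadings, explained = pca(X)

# Reorder the issues by their loading on the 1st component,
# which is what the loading plots in this post do.
order = np.argsort(loadings[:, 0])
for j in order:
    print(f"{issues[j]:>8}  PC1 loading = {loadings[j, 0]: .3f}")

# The 2D plot would put scores[:, 0] on the horizontal axis
# and scores[:, 1] on the vertical axis.
for name, (pc1, pc2) in zip(candidates, scores):
    print(f"{name}: PC1 = {pc1: .3f}, PC2 = {pc2: .3f}")
```

Note that the sign of each component is arbitrary, so flipping an axis does not change the interpretation.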

The second principal component is a little more difficult to interpret. Here most of the candidates are clustered around the middle except for candidate Ron Paul who supports Iraq withdrawal, Energy & Oil (ANWR Drilling), Immigration (Border Fence) but does not support other issues.
Here are the loadings ordered by the 2nd component:

Aleks: With the exception of Paul, there is a lot of polarization on the first component. To some extent the polarization is a consequence of the data expressing candidates’ opinions as binary supports/opposes. When a candidate did not express an opinion, we assumed that the opinion is unknown (so we use imputation), rather than treating it as the candidate refusing to take a position on the issue. As for polarization itself, Delia Baldassarri and Andrew have suggested that it’s the parties that are creating polarization, not the general public.
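
As a rough illustration of treating a missing opinion as unknown rather than as a refusal, here is a minimal sketch. The post does not say which imputation method was used, so the column-mean fill below is a deliberately simple stand-in, and the data and the `impute_column_means` helper are hypothetical.

```python
from statistics import mean

# Hypothetical encoding: 1.0 = supports, -1.0 = opposes, None = no stated opinion.
positions = [
    [ 1.0, None, -1.0],   # invented candidate 1
    [ 1.0,  1.0, None],   # invented candidate 2
    [-1.0, -1.0, -1.0],   # invented candidate 3
]

def impute_column_means(rows):
    """Fill each missing entry with the mean of the observed values for that issue.

    This is a simple stand-in, not the method used for the analysis above.
    """
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # If nobody stated an opinion on an issue, fall back to the neutral 0.0.
        col_means.append(mean(observed) if observed else 0.0)
    return [
        [col_means[j] if value is None else value for j, value in enumerate(row)]
        for row in rows
    ]

completed = impute_column_means(positions)
for row in completed:
    print(row)
```

A refusal, by contrast, would have to be coded as its own category instead of being filled in.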

In fact, I think polarization is a runaway consequence of political wedging: in the spirit of Caesar’s divide et impera, one party wants to insert a particular issue to split the opposing party. This gives rise to the endless debates on rights of homosexuals, biblical literalism, gun toting, weed smoking, stem cells and abortion rights: these debates are counter-productive (especially at the federal level), but the real federal-level problems of special interest influence, level of interventionism, economy, health care get glossed over. It just saddens me that the candidates are classified primarily by a bunch of wedge issues. A politician needs a wedge issue just as much as a soldier needs a new gun: it’s good for him, but once both sides come up with guns, the soldier loses. In the end, it’s better for all politicians to get rid of wedge issues every now and then by refusing to take a stance on a wedge issue. In summary, it would be refreshing if the candidates jointly decided not to take positions on these runaway wedge issues, on which people will continue to disagree, and delegated them to the state level, while focusing on the important stuff.

Masanao: Although the candidates’ opinions in the spreadsheet are probably not their final ones, it’s interesting to see the current political environment. If there were similar data on the general public, it would be interesting to overlay the two to see which candidates are more representative of the public.

Details of methodology:

Hans Rosling 2007

We had this entry almost a year ago. This year Hans Rosling gives yet another talk, titled “New insights on poverty and life around the world”. The talk is great, and the ending is quite shocking.

In a follow-up to his now-legendary TED2006 presentation, Hans Rosling demonstrates how developing countries are pulling themselves out of poverty. He shows us the next generation of his Trendalyzer software — which analyzes and displays data in amazingly accessible ways, allowing people to see patterns previously hidden behind mountains of stats. (Ten days later, he announced a deal with Google to acquire the software.) He also demos Dollar Street, a program that lets you peer in the windows of typical families worldwide living at different income levels. Be sure to watch straight through to the (literally) jaw-dropping finale.

Overview of Missing Data Methods

We came across an interesting paper on missing data by Nicholas J. Horton and Ken P. Kleinman. It compares statistical methods and related software for fitting incomplete data regression models.


Here is the abstract:

Missing data are a recurring problem that can cause bias or lead to inefficient analyses. Statistical methods to address missingness have been actively pursued in recent years, including imputation, likelihood, and weighting approaches. Each approach is more complicated when there are many patterns of missing values, or when both categorical and continuous random variables are involved. Implementations of routines to incorporate observations with incomplete variables in regression models are now widely available. We review these routines in the context of a motivating example from a large health services research dataset. While there are still limitations to the current implementations, and additional efforts are required of the analyst, it is feasible to incorporate partially observed values, and these methods should be used in practice.
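
To give a feel for why the abstract says missing data "can cause bias", and for what a weighting approach does about it, here is a tiny sketch with invented numbers. It is not taken from the paper. It assumes that whether a value is missing depends only on an observed group label, and the helper names are hypothetical.

```python
# Hypothetical records: (group, outcome). None marks a missing outcome.
# Group "b" has higher outcomes and loses half of them.
records = [
    ("a", 10.0), ("a", 10.0), ("a", 10.0), ("a", 10.0),
    ("b", 20.0), ("b", 20.0), ("b", None), ("b", None),
]

def complete_case_mean(rows):
    """Average only the observed outcomes, ignoring why others are missing."""
    observed = [y for _, y in rows if y is not None]
    return sum(observed) / len(observed)

def weighted_mean(rows):
    """Inverse probability weighting by group.

    Each observed outcome is weighted by 1 / (share of its group that was
    observed), so under-observed groups count proportionally more.
    """
    totals, seen = {}, {}
    for group, y in rows:
        totals[group] = totals.get(group, 0) + 1
        if y is not None:
            seen[group] = seen.get(group, 0) + 1
    numerator = denominator = 0.0
    for group, y in rows:
        if y is not None:
            weight = totals[group] / seen[group]
            numerator += weight * y
            denominator += weight
    return numerator / denominator

cc = complete_case_mean(records)   # pulled toward group "a"
ipw = weighted_mean(records)       # restores each group's share of the sample
print(f"complete-case mean: {cc:.2f}, weighted mean: {ipw:.2f}")
```

In this toy setup the complete-case mean is about 13.33, while the weighted mean is 15, which is what the average would be if the missing group "b" outcomes resembled the observed ones.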

Interaction in information software

I found this interesting article on information software and interaction.
Here is the abstract:

The ubiquity of frustrating, unhelpful software interfaces has motivated decades of research into “Human-Computer Interaction.” In this paper, I suggest that the long-standing focus on “interaction” may be misguided. For a majority subset of software, called “information software,” I argue that interactivity is actually a curse for users and a crutch for designers, and users’ goals can be better satisfied through other means.

Information software design can be seen as the design of context-sensitive information graphics. I demonstrate the crucial role of information graphic design, and present three approaches to context-sensitivity, of which interactivity is the last resort. After discussing the cultural changes necessary for these design ideas to take root, I address their implementation. I outline a tool which may allow designers to create data-dependent graphics with no engineering assistance, and also outline a platform which may allow an unprecedented level of implicit context-sharing between independent programs. I conclude by asserting that the principles of information software design will become critical as technology improves.

Color of Flags

I’m not a pie chart person. But here is an example where I don’t mind the use (I found it here):

Using a list of countries generated by The World Factbook database, flags of countries fetched from Wikipedia (as of 26th May 2007) are analysed by a custom made python script to calculate the proportions of colours on each of them. That is then translated on to a piechart using another python script. The proportions of colours on all unique flags are used to finally generate a piechart of proportions of colours for all the flags combined. (note: Colours making up less than 1% may not appear)

It’s pretty, it’s about proportions, it’s not trying to show a precise numeric result, and the data-to-ink/pixel ratio is not a problem in this case, yet it carries some information that you would have a hard time seeing in a table (such as Tunisia having slightly more white than Turkey).
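
The two-step calculation described in the quote can be sketched roughly as follows. This is not the original author's script. Flags are represented here as plain lists of colour names instead of real images, the two flags are invented, and averaging the per-flag proportions to get the combined chart is an assumption about how the combination was done.

```python
from collections import Counter

def colour_proportions(pixels):
    """Share of each colour among the pixels of a single flag."""
    counts = Counter(pixels)
    total = sum(counts.values())
    return {colour: n / total for colour, n in counts.items()}

def combined_proportions(flags, min_share=0.01):
    """Average the per-flag proportions so that every flag counts equally.

    Colours below min_share are dropped, mirroring the note that colours
    making up less than 1% may not appear.
    """
    sums = Counter()
    for pixels in flags.values():
        for colour, share in colour_proportions(pixels).items():
            sums[colour] += share
    averaged = {colour: total / len(flags) for colour, total in sums.items()}
    return {colour: share for colour, share in averaged.items() if share >= min_share}

# Hypothetical flags, each given as a flat list of pixel colours.
flags = {
    "flag_x": ["red"] * 6 + ["white"] * 2,
    "flag_y": ["red"] * 4 + ["white"] * 4,
}

per_flag = {name: colour_proportions(pixels) for name, pixels in flags.items()}
overall = combined_proportions(flags)
print(per_flag)
print(overall)
```

With real images, the pixel lists would come from an image library and similar shades would need to be grouped into named colours first.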

Election & Public Opinion by PIIM

Here is an interactive visualization of Election & Public Opinion by PIIM. It’s an interactive display of Red / Blue states. The election data go all the way back to 1789, the first presidential election. This application will familiarize you with the voting process of the United States. Explore how public opinion and “creative democracy” has such […]

Mirror, mirror on the wall..

In Snow White it was the magical mirror that answered the question “who’s the fairest of them all?” Now Australian researchers have created software to answer this question. They extracted 13 features and used C4.5 as the classification method (more features below). (Details can be found in: Assessing facial beauty through proportion analysis by image processing and supervised learning)

With that in hand, it may be natural to wonder who’s the most beautiful of them all. A shocking answer may be found in research done at the universities of Regensburg and Rostock in Germany, where they ran a large research project on ‘facial attractiveness’.

A remarkable result of our research project is that faces which have been rated as highly attractive do not exist in reality. This became particularly obvious when test subjects (independently of their sex!) favored women with facial shapes of about 14 year old girls. There is no such woman existing in reality! They are artificial products – results of modern computer technology.

Thus, sad as it may be, your ideal beauty may not be in this world. So going back to the good old Snow White, if the magical mirror were asked the question today, it might answer: “You’re the fairest where you are, but in the virtual world, well let’s not go into that..”

Can Music Tell Your Personality?

I read this entry on a study of the correlation between music and personality. A series of 6 studies investigated lay beliefs about music, the structure underlying music preferences, and the links between music preferences and personality. The data indicated that people consider music an important aspect of their lives and listening to music an activity […]

1200+ examples of information visualization at PIIM

A friend of mine introduced this site to me. It’s a database of information graphics that the Parsons Institute for Information Mapping (PIIM) is building. They are accepting submissions, so if you have an interesting graphical display, take a shot at being in the “most comprehensive, manually annotated (and taxonomically classified) information graphics database in the world” […]