In a single day, New York City gained two data analysis/statistics/machine learning organizations: Microsoft Research New York City, with John Langford (machine learning), Duncan Watts (networks), and Dave Pennock (algorithmic economics), and an eBay technology center focusing on data, led by Chris Dixon, the co-founder of the recommendation engine company Hunch, which has recently been acquired [...]
Adrien Friggeri has a lovely visualization of US senators’ movement between clusters: You have to click the image and play with it to appreciate it. The methodology isn’t yet published – but I can see how this could be very illuminating. The dynamic clustering aspect hasn’t been researched much – one of the notable pieces [...]
Factual collects data on a variety of topics, organizes them, and allows easy access. If you ever wanted to do a histogram of calorie content in Starbucks coffees or plot warnings with a live feed of earthquake data – your life should be a bit simpler now. Also see DataMarket, InfoChimps, and a few older [...]
Steve Hsu summarizes the research of economic historian Greg Clark and Neil Cummins: Using rare surnames we track the socio-economic status of descendants of a sample of English rich and poor in 1800, until 2011. We measure social status through wealth, education, occupation, and age at death. Our method allows unbiased estimates of mobility rates. [...]
Robert Zubrin writes in “How Much Is an Astronaut’s Life Worth?” (Reason, Feb 2012): …policy analyst John D. Graham and his colleagues at the Harvard Center for Risk Analysis found in 1997 that the median cost for lifesaving expenditures and regulations by the U.S. government in the health care, residential, transportation, and occupational areas ranges [...]
I stumbled across a chart that is, in my opinion, the best way to express a comparison of quantities through time: It compares the new PC companies, such as Apple, to traditional PC companies like IBM and Compaq, but on the same scale. If you’d like to see how iPads and other novelties compare, see here. [...]
From CNN: In July, KDNuggets.com, an online news site focused on data mining and analytics software, ran an unusual listing in its jobs section: “We are looking for Predictive Modeling/Data Mining Scientists and Analysts, at both the senior and junior level, to join our department through November 2012 at our Chicago Headquarters,” read the ad. “We [...]
This is a mini research note, not deserving of a paper, but perhaps useful to others. It reinvents what has already appeared on this blog. Let’s say we have a line chart with numbers between 152.134 and 210.823, with the mean of 183.463. How should we label the chart with about 3 ticks? Perhaps 152.132, [...]
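The excerpt above is cut off before it gives an answer, so the following is only a minimal sketch of one common heuristic, not necessarily the method the post goes on to describe. It assumes tick labels should be round multiples of 1, 2, or 5 times a power of ten; the function name `nice_ticks` and its `target` parameter are hypothetical.

```python
import math

def nice_ticks(lo, hi, target=3):
    """Return round tick positions inside [lo, hi], aiming for about `target` ticks.

    Assumption: "round" means a multiple of 1, 2, or 5 times a power of ten.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    raw_step = (hi - lo) / target                      # ideal, but ugly, spacing
    magnitude = 10 ** math.floor(math.log10(raw_step))  # power of ten just below it
    fraction = raw_step / magnitude                    # falls in [1, 10)
    # Snap the spacing to the closest "nice" multiplier.
    nice = min((1, 2, 5, 10), key=lambda c: abs(c - fraction))
    step = nice * magnitude
    first = math.ceil(lo / step)    # first multiple of step that is >= lo
    last = math.floor(hi / step)    # last multiple of step that is <= hi
    # Rounding hides floating-point noise when step is fractional.
    return [round(k * step, 10) for k in range(first, last + 1)]

ticks = nice_ticks(152.134, 210.823, target=3)
print(ticks)  # [160, 180, 200]
```

For the range in the example, the ideal spacing of about 19.6 snaps to 20, which gives labels that are far easier to read than the raw minimum, mean, and maximum.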
Joan Ginther has won the Texas lottery four times. First, she won $5.4 million, then a decade later, she won $2 million, then two years later $3 million and in the summer of 2010, she hit a $10 million jackpot. The odds of this have been calculated at one in eighteen septillion and luck like this could only [...]
This is Many Bills, a visualization of US bills by IBM: I learned about it a few days ago from Irene Ros at Foo Camp. It definitely looks better than my own analysis of US Senate bills.
I always thought traffic for a particular day and time could be easily predicted from historic data with regression. Google Maps now has this feature: It would be good to actually include season, holiday and similar information: the predictions would be better. I wonder if one can find this data easily, or if [...]
With all this data floating around, there are some interesting analyses one can do. I came across “The Association of Tree Pollen Concentration Peaks and Allergy Medication Sales in New York City: 2003-2008” by Perry Sheffield. There they correlate pollen counts with anti-allergy medicine sales – and indeed find that two days after high pollen [...]
WeatherSpark: prediction and observation quantiles, historic data, multiple predictors, zoomable, draggable, colorful, wonderful: Via Jure Cuhalev.
At GetTheData, you can ask and answer data-related questions. Here’s a preview: I’m not sure a Q&A site is the best way to do this. My pipe dream is to create a taxonomy of variables and instances, and collect spreadsheets annotated this way. Imagine doing a search of type: “give me datasets, where an [...]
Andrew has pointed to Jonathan Livengood’s analysis of the correlation between poverty and PISA results, whereby schools with poorer students get poorer test results. I’d have written a comment, but then I couldn’t have inserted a chart. Andrew points out that a causal analysis is needed. This reminds me of an intervention that has been [...]
In the spirit of Gapminder, the Washington Post created an interactive scatterplot viewer that uses the alpha channel to tell apart overlapping fat dots better than the sorting by circle size that Gapminder uses: Good news: the rate of fattening of the USA appears to be slowing down. Maybe because of high gas prices? But what’s happening with Oceania?
Emanuel Derman and Paul Wilmott wonder how to get their fellow modelers to give up their fantasy of perfection. In a Business Week article they proposed, not entirely in jest, a model makers’ Hippocratic Oath: I will remember that I didn’t make the world and that it doesn’t satisfy my equations. Though I will use [...]
The R language is definitely going mainstream:
I Paid a Bribe by Janaagraha, a Bangalore based not-for-profit, harnesses the collective energy of citizens and asks them to report on the nature, number, pattern, types, location, frequency and values of corruption activities. These reports would be used to argue for improving governance systems and procedures, tightening law enforcement and regulation and thereby reduce [...]
Word count stats from the Google Books database prove that Bayesianism is expanding faster than the universe. An n-gram is a tuple of n words.
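To make the definition concrete, here is a minimal sketch that extracts n-grams from a sentence and counts them. It assumes plain whitespace tokenization, which is far simpler than whatever the Google Books pipeline does; the function name `ngrams` and the sample sentence are made up for illustration.

```python
from collections import Counter

def ngrams(text, n):
    """Return every run of n consecutive words in `text` as a tuple."""
    words = text.split()  # assumption: whitespace tokenization is good enough
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

sample = "the prior is the prior"  # made-up sentence, not real corpus data
bigrams = ngrams(sample, 2)
print(bigrams)
# [('the', 'prior'), ('prior', 'is'), ('is', 'the'), ('the', 'prior')]

counts = Counter(bigrams)
print(counts[("the", "prior")])  # 2
```

A 1-gram is then just a single word, which is what word count charts of a term like "Bayesian" are built on.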
ScienceDaily has posted an article titled Apes Unwilling to Gamble When Odds Are Uncertain: The apes readily distinguished between the different probabilities of winning: they gambled a lot when there was a 100 percent chance, less when there was a 50 percent chance, and only rarely when there was no chance. In some trials, however, [...]
From Discover: Razib Khan asks: But follow the gradient from El Paso to the Illinois-Missouri border. The differences are small across state lines, but the consistent differences along the borders really don’t make sense. Are there state-level policies or regulations causing this? Or, are there state-level differences in measurement? This weird pattern shows up in other [...]
Posted at MediaBistro: The Harvard Sports Analysis Collective are the group that tackles problems such as “Who wrote this column: Bill Simmons, Rick Reilly, or Kevin Whitlock?” and “Should a football team give up free touchdowns?” It’s all fun and games, until the students land jobs with major teams. According to the Harvard Crimson, sophomore [...]
Visual Economics shows statistics on average food consumption in America: My brief feedback is that water is confounded with these results. They should have subtracted water content from the weight of all dietary items, as it inflates the proportion of milk, vegetable and fruit items that contain more water. They did that for soda (which [...]
Journalism in the age of data is a video report including interviews with many visualization people. It’s also a great example of how citations and further information can appear alongside the video – showing us the future of video content online.