Archive of entries posted by

Measuring Beauty

I’ve come across a paper that was using “beauty” as one of the predictors. To measure beauty, the authors used Anaface.com. I don’t trust metrics without trying them on a gold standard first. So I tested how well Anaface does on something that the art world considers one of the gold standards of beauty – […]

New New York data research organizations

In a single day, New York City obtained two data analysis/statistics/machine learning organizations. The first is Microsoft Research New York City, with John Langford (machine learning), Duncan Watts (networks), and Dave Pennock (algorithmic economics). The second is an eBay technology center focusing on data, led by Chris Dixon, the co-founder of the recommendation engine company Hunch, which has recently been acquired […]

Agreement Groups in US Senate and Dynamic Clustering

Adrien Friggeri has a lovely visualization of US Senators’ movement between clusters: You have to click the image and play with it to appreciate it. The methodology isn’t yet published – but I can see how this could be very illuminating. The dynamic clustering aspect hasn’t been researched much – one of the notable pieces […]

Factual – a new place to find data

Factual collects data on a variety of topics, organizes them, and allows easy access. If you ever wanted to do a histogram of calorie content in Starbucks coffees or plot warnings with a live feed of earthquake data – your life should be a bit simpler now. Also see DataMarket, InfoChimps, and a few older […]

Rare name analysis and wealth convergence

Steve Hsu summarizes the research of economic historian Greg Clark and Neil Cummins: Using rare surnames we track the socio-economic status of descendants of a sample of English rich and poor in 1800, until 2011. We measure social status through wealth, education, occupation, and age at death. Our method allows unbiased estimates of mobility rates. […]

Statistical Murder

Robert Zubrin writes in “How Much Is an Astronaut’s Life Worth?” (Reason, Feb 2012): …policy analyst John D. Graham and his colleagues at the Harvard Center for Risk Analysis found in 1997 that the median cost for lifesaving expenditures and regulations by the U.S. government in the health care, residential, transportation, and occupational areas ranges […]

Beautiful Line Charts

I stumbled across a chart that is, in my opinion, the best way to express a comparison of quantities through time: It compares the new PC companies, such as Apple, to traditional PC companies like IBM and Compaq, but on the same scale. If you’d like to see how iPads and other novelties compare, see here. […]

Data mining efforts for Obama’s campaign

From CNN: In July, KDNuggets.com, an online newsite focused on data mining and analytics software, ran an unusual listing in its jobs section: “We are looking for Predictive Modeling/Data Mining Scientists and Analysts, at both the senior and junior level, to join our department through November 2012 at our Chicago Headquarters,” read the ad. “We […]

DBQQ rounding for labeling charts and communicating tolerances

This is a mini research note, not deserving of a paper, but perhaps useful to others. It reinvents what has already appeared on this blog. Let’s say we have a line chart with numbers between 152.134 and 210.823, with a mean of 183.463. How should we label the chart with about 3 ticks? Perhaps 152.132, […]

Luck or knowledge?

Joan Ginther has won the Texas lottery four times. First, she won $5.4 million, then a decade later, she won $2 million, then two years later $3 million, and in the summer of 2010, she hit a $10 million jackpot. The odds of this has been calculated at one in eighteen septillion and luck like this could only […]