If you leave your datasets sitting out on the counter, they get moldy

I received the following by email:

I had a look at the dataset on speed dating you put online, and I found some big inconsistencies. Since a lot of people are using it, I hope this can help to fix them (or, hopefully, I made a mistake in interpreting the dataset).

Here are the problems I found.

1. Field dec is not consistent at all (boolean for a big chunk of the dataset, in the range 1-10 later). Should this be the field of the decision, and dec_o the decision of the partner? Should dec and match be the same thing? I tried to use match instead of dec, but then I get the following problem:

2. I tried to see if matches are consistent (if my partner decided yes, it should mean that in his record I see a match): if I look at the record with iid x and pid y, dec_o=1 should mean that in the record with iid y and pid x I should see a match (in match or dec). This is not true in general. So dec_o is not consistent with the matches.

3. Same thing for like and attr_o (or attr and attr_o)
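
For readers who want to run this kind of check on their own copy of the data, here is a minimal sketch in Python with pandas. It assumes the column names mentioned in the email (iid, pid, dec, dec_o) and assumes dec has been loaded as 0/1; the function name and the toy rows are made up for illustration and are not taken from the actual file.

```python
import pandas as pd


def find_inconsistent_pairs(df):
    """Return rows whose dec_o disagrees with the partner's own dec.

    For the record (iid=x, pid=y), the partner's record is (iid=y, pid=x).
    Rows with no partner record at all are flagged too.
    """
    # Swap iid and pid so each partner record lines up with the original row.
    mirror = df[["iid", "pid", "dec"]].rename(
        columns={"iid": "pid", "pid": "iid", "dec": "dec_mirror"}
    )
    merged = df.merge(mirror, on=["iid", "pid"], how="left")
    mismatch = merged["dec_mirror"].isna() | (merged["dec_o"] != merged["dec_mirror"])
    return merged.loc[mismatch, ["iid", "pid", "dec_o", "dec_mirror"]]


# Hypothetical toy data: pair (1, 2) is consistent, pair (3, 4) is not,
# because record (3, 4) claims dec_o=1 while record (4, 3) has dec=0.
toy = pd.DataFrame(
    {
        "iid": [1, 2, 3, 4],
        "pid": [2, 1, 4, 3],
        "dec": [1, 0, 1, 0],
        "dec_o": [0, 1, 1, 1],
    }
)
bad = find_inconsistent_pairs(toy)
print(bad)
```

The same mirror-and-merge idea would work for the pairs in point 3 by swapping in the relevant columns. Before running it on the real file, it is worth looking at the distinct values of dec first, since point 1 suggests that column is not 0/1 throughout.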

I sent this to Ray Fisman, the source of the data, who replied:

Saurabh Bhargava used the underlying files and has posted data in a replication file for a study in the Review of Economics and Statistics.

I’m glad somebody put those data in the freezer.
