Privacy vs knowledge and the nature of insurance

Wired reports a great new opportunity to make money online: suing internet companies for revealing user data.

An in-the-closet lesbian mother is suing Netflix for privacy invasion, alleging the movie rental company made it possible for her to be outed when it disclosed insufficiently anonymous information about nearly half-a-million customers as part of its $1 million contest to improve its recommendation system.

I’m not sure whether the litigators have read this particular section of the Netflix prize rules:

To prevent certain inferences being drawn about the Netflix customer base, some of the rating data for some customers in the training and qualifying sets have been deliberately perturbed in one or more of the following ways: deleting ratings; inserting alternative ratings and dates; and modifying rating dates.

So yes, you can match a set of ratings to a particular person, but how will you know that the match is real and not a random coincidence? Half a million rating traces give plenty of opportunity for a false positive match. Netflix learned from AOL’s data release disaster, which resulted in a few people getting fired.
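To make the false positive argument concrete, here is a rough back-of-the-envelope sketch. It assumes each trace in the dataset has the same small, independent chance of matching a target's known ratings purely by coincidence. Both function names and the chance value are hypothetical, chosen only for illustration; nothing here comes from the Netflix data.

```python
import math


def expected_false_matches(n_traces: int, p_chance: float) -> float:
    """Expected number of traces that match the target purely by coincidence."""
    return n_traces * p_chance


def prob_at_least_one_false_match(n_traces: int, p_chance: float) -> float:
    """Probability that at least one unrelated trace matches by coincidence.

    Assumes every trace matches independently with the same probability
    p_chance (a simplifying assumption, not a property of real rating data).
    """
    if not 0.0 <= p_chance <= 1.0:
        raise ValueError("p_chance must be between 0 and 1")
    if p_chance == 1.0:
        return 1.0
    # 1 - (1 - p)^n, computed via log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(n_traces * math.log1p(-p_chance))


if __name__ == "__main__":
    n = 500_000   # roughly the number of customers in the release
    p = 1e-6      # hypothetical one-in-a-million coincidence rate
    print(expected_false_matches(n, p))
    print(prob_at_least_one_false_match(n, p))
```

Under these assumptions, even a one-in-a-million coincidence rate gives roughly a 39% chance that some unrelated customer matches. The real rate depends on how many ratings and dates the attacker knows, so the number is only a reminder that a match by itself is not proof.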

But this theme is important. Many internet companies provide free services in return for the ability to use their users’ data for profit. Andrew Parker looked at which companies profit from user data. Usually the data is not given away, only used to make other people’s lives easier. Say you bookmark a particular page: others won’t see that you did, but they will see that some people found the page worth saving, so it can be listed higher in search results.

A more problematic area is medicine. Wired reports that there is a market for medical records, and that the anonymization meant to protect them isn’t very robust.

Making medical data public would allow massive advances in medicine. For example, the Personal Genome Project seeks to analyze a number of volunteers in great detail (see, for example, Steven Pinker’s medical record). If a few million people did that, we’d know much more about diseases, their risk factors, the effectiveness of drugs and diet, and the effects of the genome.

One-sided disclosure worries many people: their insurance rates might go up, or they might not get a job. It would help if everyone disclosed: nobody feels comfortable being naked when others wear swimsuits.

But we should also ask ourselves as a society: what is insurance? Is insurance a protection against uncontrollable risk, or is it an instrument of equality? Is the genome our destiny or an uncontrollable risk?

Previous posts on this topic: EU data protection guidelines, Privacy vs Transparency.

One Comment

  1. aram harrow says:

    To publish summary statistics without compromising individual privacy, the best known approach is to use differential privacy.
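As a rough illustration of the idea in the comment above, here is a minimal sketch of the Laplace mechanism, one standard way to achieve differential privacy for a counting query. The function names, the toy records, and the epsilon value are assumptions made for this example. Naive floating-point noise like this is known to have subtle weaknesses, so treat it as a teaching sketch rather than something to deploy.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise.

    The difference of two independent Exponential(1) draws follows a
    standard Laplace distribution, which is then scaled.
    """
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Return a noisy count of the records that satisfy the predicate.

    Adding or removing one person changes a count by at most 1
    (sensitivity 1), so Laplace noise with scale 1/epsilon gives
    epsilon-differential privacy for this single query.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random()
    # Hypothetical toy data: each record is (user_id, has_condition).
    records = [(i, i % 7 == 0) for i in range(1000)]
    print(private_count(records, lambda r: r[1], epsilon=0.5, rng=rng))
```

A smaller epsilon means more noise and stronger privacy. Each additional query on the same data spends more of the privacy budget, so the guarantee above covers only one released count.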