Here’s the problem with anonymized data: if it were truly anonymized, it wouldn’t be useful to anyone for anything. With enough data about a person (say, their age, gender, and zip code), it’s not hard to narrow down who that person is. That’s the idea behind a class-action lawsuit against Netflix over the customer data it released to the public as part of the Netflix Prize project, a contest to help create better movie recommendations. One plaintiff, a closeted lesbian, alleges that the data available about her could reveal her identity.
So it wasn’t surprising that just weeks after the contest began, two University of Texas researchers, Arvind Narayanan and Vitaly Shmatikov, identified several Netflix users by comparing their “anonymous” reviews in the Netflix data to ones posted on the Internet Movie Database website. The matches exposed those users’ political leanings and sexual orientation.
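To make the idea of matching “anonymous” reviews against public ones concrete, here is a minimal sketch of that kind of linkage. It is not the researchers’ actual method. The data, the user names, the `link_records` function, and the `min_overlap` threshold are all hypothetical, made up for illustration.

```python
# Toy data, entirely hypothetical: each record is a set of (movie, rating) pairs.
anonymous_ratings = {
    "user_1041": {("Brokeback Mountain", 5), ("Fahrenheit 9/11", 4), ("Cars", 3)},
    "user_2290": {("Cars", 5), ("Shrek", 4), ("Finding Nemo", 4)},
}
public_reviews = {
    "jane_d": {("Brokeback Mountain", 5), ("Fahrenheit 9/11", 4)},
    "moviebuff77": {("Shrek", 4), ("Finding Nemo", 4), ("Cars", 5)},
}


def link_records(anonymous, public, min_overlap=2):
    """Pair each public reviewer with the anonymous record sharing the most ratings.

    A pairing is reported only if at least `min_overlap` (movie, rating)
    pairs are identical in both records.
    """
    matches = {}
    for public_name, reviews in public.items():
        best_id, best_overlap = None, 0
        for anon_id, ratings in anonymous.items():
            overlap = len(reviews & ratings)  # count of identical (movie, rating) pairs
            if overlap > best_overlap:
                best_id, best_overlap = anon_id, overlap
        if best_overlap >= min_overlap:
            matches[public_name] = best_id
    return matches


print(link_records(anonymous_ratings, public_reviews))
```

Even in this toy version, a couple of matching ratings are enough to tie a public name to an “anonymous” record, and with it every other movie that record contains.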
The complaint calls that the Brokeback Mountain factor, arguing that marketers will suck up the data, combine it with other data sets, and start pigeonholing people into marketing categories based on assumptions about the movies they rated.
You could just not post movie reviews online. Still, if you’re a Netflix customer, are you comfortable with potentially identifying data floating out there? It seems that there are larger and scarier privacy fish to fry in our everyday lives.
The suit asks for at least $2,500 in damages for each of two million Netflix customers, or at least $5 billion in total.