Researchers Claim To Reverse Netflix's Anonymization
Researchers from the Department of Computer Sciences at the University of Texas at Austin say they can reverse the anonymization of Netflix's data (which was released to the public as part of a contest to see if someone could design a better rating system) by comparing it to only a few ratings on IMDb. The result? Specific users can be identified and linked to their (ostensibly) private ratings.
"Releasing the data and just removing the names does nothing for privacy," Shmatikov told SecurityFocus. "If you know their name and a few records, then you can identify that person in the other (private) database."
Scary, scary, scary, scary, scary.
While Netflix's dataset did not include names, instead using an anonymous identifier for each user, the collection of movie ratings -- combined with a public database of ratings -- is enough to identify the people, the researchers argued in a paper published soon after Netflix released the data, but which only recently came to light. Narayanan and Shmatikov demonstrated the danger by using public reviews published by a "few dozen" people in the Internet Movie Database (IMDb) to identify movie ratings of two of the users in Netflix's data.
Exposing movie ratings that the reviewer thought were private could expose significant details about the person. For example, the researchers found that one of the people had strong -- ostensibly private -- opinions about some liberal and gay-themed films and also had ratings for some religious films.
More generally, the research demonstrated that information that a person believes to be benign could be used to identify them in other private databases.
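To make the cross-referencing idea concrete, here is a minimal Python sketch of a linkage attack on toy data. It is not the researchers' algorithm: the scoring rule, the `min_gap` threshold, the function names, the user IDs, and the movie titles are all invented for illustration. It only shows the general shape of the approach: score each anonymized record against a handful of publicly known ratings, and accept a match only if one record clearly stands out.

```python
def similarity(public, anon):
    """Count movies in both records whose ratings differ by at most one star."""
    return sum(
        1
        for movie, stars in public.items()
        if movie in anon and abs(anon[movie] - stars) <= 1
    )


def best_match(public, anonymized, min_gap=2):
    """Return the anonymized ID that best fits the public ratings.

    Returns None unless the best candidate beats the runner-up by at
    least `min_gap` matching movies (a hypothetical threshold).
    """
    scores = sorted(
        ((similarity(public, record), uid) for uid, record in anonymized.items()),
        reverse=True,
    )
    if not scores:
        return None
    top_score, top_uid = scores[0]
    runner_up = scores[1][0] if len(scores) > 1 else 0
    return top_uid if top_score - runner_up >= min_gap else None


# Toy "anonymized" dataset: names removed, ratings intact (all values made up).
anonymized = {
    "user_1001": {"Movie A": 5, "Movie B": 1, "Movie C": 4, "Movie D": 2},
    "user_1002": {"Movie A": 3, "Movie E": 5, "Movie F": 4},
    "user_1003": {"Movie B": 5, "Movie C": 1, "Movie G": 3},
}

# A few ratings the same person posted publicly under their real name.
public_profile = {"Movie A": 5, "Movie B": 2, "Movie C": 4}

match = best_match(public_profile, anonymized)
print(match)
```

In this toy example the three public ratings single out one record, and that record also contains a rating for "Movie D" that was never posted publicly. That extra rating is the kind of information the article describes as ostensibly private.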
From the research paper:
Does privacy of Netflix ratings matter? The privacy question is not "Does the average Netflix subscriber care about the privacy of his movie viewing history?," but "Are there any Netflix subscribers whose privacy can be compromised by analyzing the Netflix Prize dataset?" The answer to the latter question is, undoubtedly, yes. As shown by our experiments with cross-correlating non-anonymous records from the Internet Movie Database with anonymized Netflix records (see below), it is possible to learn sensitive non-public information about a person's political or even sexual preferences. We assert that even if the vast majority of Netflix subscribers did not care about the privacy of their movie ratings (which is not obvious by any means), our analysis would still indicate serious privacy issues with the Netflix Prize dataset.
Researchers reverse Netflix anonymization [SecurityFocus] (Thanks, Scott!)
How To Break Anonymity Of Netflix Prize Dataset [ARXIV]
(Photo:Maulleigh)
Comments:
It's not like the AOL searches because this involves multiple databases.
The reality of today's world is that you'll end up on multiple databases at once. Partial information on multiple databases can lead to a full suite of information if properly pieced together.
The only solution I know of is anonymization (such as k-anonymity). Netflix *claimed* to anonymize their data but it's becoming evident now that all they did was remove some entries from each user.
Their rating/recommendation system is crap anyway. I say I really like "Attack the Gas Station!" and they recommend Ong-Bak. So, because I like a contemporary Korean comedy, I should like a Thai period action flick? Hell, I mark that I hate Saw, then mark that I love Hostel and they recommend Saw 2. Crap, crap, crap...
Yeah, Netflix is the only place I rate movies because it's the only place where rating movies benefits me. And I really honestly don't care what people think about my movie ratings.
I run a blog where I rant and rave about what I think about the movies I've seen. It's pretty much all out there.
I guess I can see how some people would be upset, but did Netflix really promise to keep the movie ratings secret?
I'm of the opinion that it's all a big deal unless these companies obtain explicit permission from the users to release that data.
@EncephelanetRepairHelperGuy: I don't think so, but "Hi!" right back atcha :)
Privacy policies exist for a reason. Netflix is in direct violation of their own privacy policy (and possibly certain state privacy policies) by pulling this stunt.
"Disclosure of Personal Information -
Except as otherwise disclosed to you, we will not sell, rent or disclose your personal information to third parties without notifying you of our intent to share the personal information in advance and giving you an opportunity to prevent your personal information from being shared."
@Joe_Bagadonuts: I did, and I'm right outside your door.
Really, people, does it matter if people know what movies you like? Chances are you would tell people anyway, unless of course it's "How to be a Skinhead" or "I Love Big Jugged Grandmas 7: The Golden Years" or "I'm Gay as Can be, But Don't Tell My Pastor."
I think some of you are missing the point. It doesn't matter how seemingly benign the data is; it DOES matter that these companies are promising their customers anonymity and control over who sees their data and then flagrantly violating their own privacy policies. Not cool. And possibly sue-worthy.
Just because a user rates movies at Netflix, that doesn't mean they gave Netflix permission to fling their data across cyberspace for some stupid contest. How do we know that someone can't piece together enough info to commit identity theft? If the anonymous data was that easy to crack, that doesn't bode well for the rest of Netflix's security practices.
@Imaginary_Friend: Perhaps agreeing to rate said movie changes things as far as privacy is concerned, kind of like peeing in an alley and getting mad that someone looks at your junk. You don't HAVE to rate them.
@vanilla-fro: Or "How to be a Skinhead who loves Big Jugged Grandmas while being Gay as Can be without Telling My Pastor 7: The Golden Years"
I have the director's cut of that one.
PS, Best movie ever: Terror in Year Zero.
Sounds like the AOL search records situation again, albeit not as horrid.