Researchers Claim To Reverse Netflix's Anonymization

Researchers from the Department of Computer Sciences at the University of Texas at Austin say they can reverse the anonymization of Netflix’s data (which was released to the public as part of a contest to see if someone could design a better rating system) by comparing it to only a few ratings on IMDb. The result? Specific users can be identified and linked to their (ostensibly) private ratings.

“Releasing the data and just removing the names does nothing for privacy,” Shmatikov told SecurityFocus. “If you know their name and a few records, then you can identify that person in the other (private) database.”

While Netflix’s dataset did not include names, instead using an anonymous identifier for each user, the collection of movie ratings — combined with a public database of ratings — is enough to identify the people, the researchers argued in a paper published soon after Netflix released the data, but which only recently came to light. Narayanan and Shmatikov demonstrated the danger by using public reviews published by a “few dozen” people in the Internet Movie Database (IMDb) to identify movie ratings of two of the users in Netflix’s data.

Exposing movie ratings that the reviewer thought were private could expose significant details about the person. For example, the researchers found that one of the people had strong — ostensibly private — opinions about some liberal and gay-themed films and also had ratings for some religious films.

More generally, the research demonstrated that information that a person believes to be benign could be used to identify them in other private databases.

Scary, scary, scary, scary, scary.

From the research paper:

Does privacy of Netflix ratings matter? The privacy question is not “Does the average Netflix subscriber care about the privacy of his movie viewing history?,” but “Are there any Netflix subscribers whose privacy can be compromised by analyzing the Netflix Prize dataset?” The answer to the latter question is, undoubtedly, yes. As shown by our experiments with cross-correlating non-anonymous records from the Internet Movie Database with anonymized Netflix records (see below), it is possible to learn sensitive non-public information about a person’s political or even sexual preferences. We assert that even if the vast majority of Netflix subscribers did not care about the privacy of their movie ratings (which is not obvious by any means), our analysis would still indicate serious privacy issues with the Netflix Prize dataset.
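
To make the cross-correlation idea concrete, here is a rough sketch in Python. It is not the researchers' actual algorithm, just a toy illustration: the function names, the user IDs, the movie titles, and the `tolerance` and `min_gap` parameters are all made up for this example. It counts how many of a person's public ratings line up with each anonymous record, and only declares a match when one record clearly beats the rest.

```python
# Toy linkage sketch. Every name and data point here is hypothetical;
# this is NOT the method from the paper, only an illustration of the idea.

def match_score(public_ratings, anon_ratings, tolerance=1):
    """Count movies that both records rate within `tolerance` stars of each other."""
    return sum(
        1
        for movie, stars in public_ratings.items()
        if movie in anon_ratings and abs(anon_ratings[movie] - stars) <= tolerance
    )


def link_profile(public_ratings, anon_db, min_gap=2):
    """Return the best-matching anonymous ID, or None if no record stands out."""
    ranked = sorted(
        (
            (match_score(public_ratings, ratings), anon_id)
            for anon_id, ratings in anon_db.items()
        ),
        reverse=True,
    )
    if not ranked:
        return None
    best_score, best_id = ranked[0]
    runner_up = ranked[1][0] if len(ranked) > 1 else 0
    # Require a clear lead over the second-best record to avoid false matches.
    return best_id if best_score - runner_up >= min_gap else None


# "Anonymized" records: the names are gone, but the ratings are intact.
anon_db = {
    "user_1041": {"Movie A": 5, "Movie B": 1, "Movie C": 4, "Movie D": 2, "Movie E": 5},
    "user_2207": {"Movie A": 3, "Movie C": 2, "Movie F": 4},
    "user_3310": {"Movie B": 5, "Movie D": 5, "Movie G": 1},
}

# A few ratings the same person posted publicly under their real name.
public_profile = {"Movie A": 5, "Movie B": 2, "Movie C": 4}

linked = link_profile(public_profile, anon_db)
if linked is not None:
    # Ratings that were never posted publicly but are now tied to a name.
    exposed = {m: s for m, s in anon_db[linked].items() if m not in public_profile}
    print(linked, exposed)
```

In this toy data, three public ratings are enough to single out one record, and the ratings for "Movie D" and "Movie E" that were never posted publicly come along with it. A single public rating, by contrast, would not clear the `min_gap` threshold here, so the sketch would decline to name anyone.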

Researchers reverse Netflix anonymization [SecurityFocus] (Thanks, Scott!)
How To Break Anonymity Of Netflix Prize Dataset [ARXIV]
(Photo:Maulleigh)

Comments

  1. Adam Rock says:

    Sounds like the AOL search records situation again, albeit not as horrid.

  2. SexCpotatoes says:

    Well, some of the ranked movies could reveal certain proclivities one might have. And some movies are so bad that the people who would review them as 5-stars should be hunted down and prevented from breeding.

  3. UpsetPanda says:

    I suppose the obvious thing is that if you don’t rate movies on IMDb or Netflix, or you rate on one and not the other, you’re safe?

  4. Roadkill says:

    It’s not like the AOL searches because this involves multiple databases.

    The reality of today’s world is that you’ll end up on multiple databases at once. Partial information on multiple databases can lead to a full suite of information if properly pieced together.

    The only solution I know of is anonymization (such as k-anonymity). Netflix *claimed* to anonymize their data but it’s becoming evident now that all they did was remove some entries from each user.

  5. smitty1123 says:

    Their rating/recommendation system is crap anyway. I say I really like “Attack the Gas Station!” and they recommend Ong-Bak. So, because I like a contemporary Korean comedy, I should like a Thai period action flick? Hell, I mark that I hate Saw, then mark that I love Hostel and they recommend Saw 2. Crap, crap, crap…

  6. Imaginary_Friend says:

    Additionally, Amazon owns IMDb, so they could really go to town building a very interesting dossier about their customers with this information.

  7. Mr. Gunn says:

    This is why OpenID and microID matter. Store your own data on your own computer/server, then you won’t be floating around in dozens of external databases.

    I know, we’re years out from being able to do this.

  8. SexCpotatoes says:

    Time for the “do not track” list, like the ‘do not call’ list only it’s for the internet and nobody is allowed to track or data-mine you ever. (yeah right!)

  9. Michael Belisle says:

    I like the comparison to grocery shopping. In case you’re wondering, my last shopping cart contained a gallon of 2% H-E-B milk, a package of 6 H-E-B grade AA eggs, strawberries (imported from Mexico), and a box of Kellogg’s Raisin Bran.

  10. EncephelanetRepairHelperGuy says:

    @Imaginary_Friend: Hi, have we met?

  11. Adam Rock says:

    I heard that the marketing databases that exist for supermarkets, for example, can know exactly what people buy within a 2-3 house radius. This way they stock what you need and price accordingly.

  12. Mary says:

    Yeah, Netflix is the only place I rate movies because it’s the only place where rating movies benefits me. And I really honestly don’t care what people think about my movie ratings.

    I run a blog where I rant and rave about what I think about the movies I’ve seen. It’s pretty much all out there.

    I guess I can see how some people would be upset, but did Netflix really promise to keep the movie ratings secret?

  13. LeJerk says:

    Unless it’s porn or Nazi propaganda, who cares?

  14. overbysara says:

    wait.. is this really that big a deal? the facebook beacon was a privacy intrusion. but someone finding out I really liked High Fidelity…? I dunno…

  15. Imaginary_Friend says:

    I’m of the opinion that it’s all a big deal unless these companies obtain explicit permission from the users to release that data.

    @EncephelanetRepairHelperGuy: I don’t think so, but “Hi!” right back atcha :)

  16. czarandy says:

    If you rate movies on IMDB/Netflix I’m not sure you have a reasonable expectation of privacy (for the ratings themselves).

  17. Imaginary_Friend says:

    Privacy policies exist for a reason. Netflix is in direct violation of their own privacy policy (and possibly certain state privacy policies) by pulling this stunt.

    “Disclosure of Personal Information -
    Except as otherwise disclosed to you, we will not sell, rent or disclose your personal information to third parties without notifying you of our intent to share the personal information in advance and giving you an opportunity to prevent your personal information from being shared.”

    • fantomesq says:

      It’s hardly as clear-cut as you make it sound – based on this policy:
      1) Had they previously disclosed that they would make anonymized data available to contractors to improve their service? Disclosure is remarkably easy – notice on the website, maybe on a clickthru?
      2) Does a court consider the anonymized data to still be ‘personal information’?
      3) Did they notify you of the release and you missed it (see #1) – constructive notification would likely be sufficient.
      4) Did they provide a means of opting out and you failed to do so?

      and this is a big one….
      5) How do you prove conclusively that the data they released was in fact yours and not somebody else’s?

      Let’s add in:
      6) What specific injury have you suffered?
      7) What effect do the non-disclosure agreements that the contractors likely agreed to have on this?

      It’s a very uphill case.

  18. Joe_Bagadonuts says:

    Uh oh, I hope no one has found out I rated “Gigli” 5-stars…

  19. vanilla-fro says:

    @Joe_Bagadonuts: I did, and I’m right outside your door.

    Really people, does it matter if people know what movie you like? chances are you would tell people anyway, unless of course it’s “How to be a Skinhead” or “I Love Big Jugged Grandmas 7: The Golden Years” or “I’m Gay as Can be, But Don’t Tell My Pastor”

  20. Imaginary_Friend says:

    I think some of you are missing the point. It doesn’t matter how seemingly benign the data is; it DOES matter that these companies are promising their customers anonymity and control over who sees their data and then flagrantly violating their own privacy policies. Not cool. And possibly sue-worthy.

    Just because a user rates movies at Netflix, that doesn’t mean they gave Netflix permission to fling their data across cyberspace for some stupid contest. How do we know that someone can’t piece together enough info to commit identity theft? If the anonymous data was that easy to crack, that doesn’t bode well for the rest of Netflix’s security practices.

  21. vanilla-fro says:

    @Imaginary_Friend: Perhaps agreeing to rate said movie changes things as far as privacy is concerned, kind of like peeing in an alley and getting mad that someone looks at your junk. You don’t HAVE to rate them.

  22. Andrew says:

    @LeJerk: Or Nazi Porn.

  23. Andrew says:

    @vanilla-fro: Or “How to be a Skinhead who loves Big Jugged Grandmas while being Gay as Can be without Telling My Pastor 7: The Golden Years”

    I have the director’s cut of that one.

    PS, Best movie ever: Terror in Year Zero.