7.2. Finding the Cheaters

In this lesson you are going to do a simplified version of the analysis outlined in The Tennis Racket article. I have prepared an anonymized data file for you that contains a numeric identifier instead of a name, along with the starting odds and the ending odds of a number of tennis matches. Your goal is to identify the cheaters. You can get the anonymized data from puzzle_anon.csv.
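One possible way to start is to measure how far the odds moved in each match and then look for identifiers that show up repeatedly among the big movers. The sketch below is only a rough heuristic, not the method used in the article. The column names `player_id`, `start_odds`, and `end_odds`, the thresholds `min_shift` and `min_matches`, and the toy rows are all assumptions made for illustration; check the header of puzzle_anon.csv and adjust them before trusting any result.

```python
import pandas as pd


def flag_suspicious(matches, min_shift=0.5, min_matches=2):
    """Return ids that appear in at least `min_matches` matches whose
    odds moved by at least `min_shift` relative to the starting odds.

    Column names and both thresholds are hypothetical.
    """
    # Relative odds movement for every match.
    shift = (matches["end_odds"] - matches["start_odds"]).abs() / matches["start_odds"]
    # Keep only the matches with a large movement.
    suspicious = matches[shift >= min_shift]
    # Count how often each id appears among those matches.
    counts = suspicious.groupby("player_id").size()
    return sorted(counts[counts >= min_matches].index)


# Made-up toy rows, not taken from puzzle_anon.csv.
toy = pd.DataFrame({
    "player_id": ["a", "a", "a", "b", "b", "c"],
    "start_odds": [2.0, 1.8, 2.2, 1.5, 1.6, 3.0],
    "end_odds": [4.0, 3.6, 2.3, 1.55, 1.5, 3.1],
})
flagged = flag_suspicious(toy)
print(flagged)
```

With the real file you would replace `toy` with something like `pd.read_csv("puzzle_anon.csv", dtype={"player_id": str})`, again assuming that column name. Reading the identifiers as strings keeps the long numeric ids from being converted to floats.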

    Q-1: Check the ids of all of the cheaters here.

  • 7002589994262270000
  • Cheater number 1
  • 2416068425895370000
  • Cheater number 2
  • 1547483661413490000
  • Cheater number 3
  • 6228119144908420000
  • Cheater number 4
  • 1718561694846000000
  • Cheater number 5
  • 4643766977283540000
  • You accuse an innocent person
  • 1693568023468290000
  • You accuse an innocent person
  • All are cheaters
  • No, not everyone is a cheater
  • All are honest
  • No, not everyone is honest

Now that you have identified the cheaters, can you match them with their real names? Here is a dataset that contains their names: puzzle_real.csv
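If both files record the same matches, the odds themselves can act as a fingerprint that links an identifier to a name. The sketch below joins two tables on their shared odds columns. It assumes that both files have `start_odds` and `end_odds` columns, that the anonymized file has `player_id`, and that the real file has `name`. None of these names is confirmed by the lesson, and the toy rows are invented.

```python
import pandas as pd


def link_ids_to_names(anon, real, keys=("start_odds", "end_odds")):
    """Map anonymized ids to names by joining on shared odds columns.

    All column names here are hypothetical.
    """
    keys = list(keys)
    # Drop key combinations that occur more than once in a table,
    # because they cannot identify a single match.
    anon_unique = anon.drop_duplicates(subset=keys, keep=False)
    real_unique = real.drop_duplicates(subset=keys, keep=False)
    # An inner join keeps only matches that appear in both tables.
    linked = anon_unique.merge(real_unique, on=keys, how="inner")
    pairs = linked[["player_id", "name"]].drop_duplicates()
    return dict(zip(pairs["player_id"], pairs["name"]))


# Made-up toy rows, not taken from either puzzle file.
anon_toy = pd.DataFrame({
    "player_id": ["id1", "id1", "id2"],
    "start_odds": [2.0, 1.8, 1.5],
    "end_odds": [4.0, 3.6, 1.55],
})
real_toy = pd.DataFrame({
    "name": ["alpha", "beta", "alpha"],
    "start_odds": [1.8, 1.5, 2.0],
    "end_odds": [3.6, 1.55, 4.0],
})
mapping = link_ids_to_names(anon_toy, real_toy)
print(mapping)
```

Joining on exact floating-point values only works when both files store the odds identically. If they were rounded differently, you may need to round both sides to the same number of decimals first.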

    Q-2: Match the numeric identifiers from the first part of the project with the real names. Please keep going with your analysis.
  • bob
  • 7002589994262270000
  • jane
  • 2416068425895370000
  • john
  • 1547483661413490000
  • sally
  • 6228119144908420000
  • sue
  • 1718561694846000000
  • don
  • 4643766977283540000
  • hill
  • 1693568023468290000

Q-3: What are three main points that you take away from the Tennis Racket article?

Q-4: What are three main points that you take away from the unmasking article? What ethical considerations are important to you when considering de-anonymizing some other data set?
