5.6. Dealing with Multiple DataFrames

Forget about budget or runtime as criteria for selecting a movie; let’s look at popular opinion instead. Our dataset has two relevant columns: vote_average and vote_count.

Let’s create a variable called df_high_rated that only contains movies that have received more than 20 votes, and whose average score is greater than 8.

import pandas as pd
df = pd.read_csv("https://media.githubusercontent.com/media/RunestoneInteractive/httlads/master/Data/movies_metadata.csv").dropna(axis=1, how='all')

# First keep movies with more than 20 votes, then keep those averaging above 8.
df_highly_voted = df[df.vote_count > 20]
df_high_rated = df_highly_voted[df_highly_voted.vote_average > 8]
df_high_rated[['title', 'vote_average', 'vote_count']].head()
                      title  vote_average  vote_count
46                    Se7en           8.1      5915.0
49       The Usual Suspects           8.1      3334.0
109             Taxi Driver           8.1      2632.0
256               Star Wars           8.1      6778.0
289  Leon: The Professional           8.2      4293.0
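The two filtering steps above can also be written as a single boolean mask. The following is a minimal sketch on a small made-up table (the `toy` DataFrame and its values are assumptions for illustration, not rows from the movie dataset):

```python
import pandas as pd

# Hypothetical toy data standing in for the movie dataset.
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "vote_average": [8.5, 9.0, 7.9, 8.2],
    "vote_count": [100, 5, 3000, 21],
})

# Each comparison yields a boolean Series; & combines them element-wise.
# The parentheses are required because & binds more tightly than >.
mask = (toy.vote_count > 20) & (toy.vote_average > 8)
toy_high_rated = toy[mask]

print(toy_high_rated[["title", "vote_average", "vote_count"]])
```

Both forms should select the same rows; the chained version keeps the intermediate df_highly_voted around, which can be handy if you want to reuse it.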

Here we have some high-quality movies, at least according to some people.

But what about my opinion?

Here are my favorite movies and the scores I gave them. Create a DataFrame called compare_votes that uses the title as its index and has vote_average and my_vote as its columns. Keep only the movies that are both my favorites and popular favorites (the high-rated movies above).

Hint: You’ll need to create two Series, one for my ratings and one that maps titles to vote_average.

my_votes = {
    "Star Wars": 9,
    "Paris is Burning": 8,
    "Dead Poets Society": 7,
    "The Empire Strikes Back": 9.5,
    "The Shining": 8,
    "Return of the Jedi": 8,
    "1941": 8,
    "Forrest Gump": 7.5,
}

There should be only 6 movies remaining.
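If the hint about two Series is unclear, here is one possible approach sketched on made-up data. The titles, scores, and the names `popular`, `my_toy_votes`, and `toy_compare` are all hypothetical; the real exercise uses my_votes and df_high_rated instead.

```python
import pandas as pd

# Hypothetical toy data; titles and scores are made up for illustration.
popular = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "vote_average": [8.3, 8.1, 8.6],
})
my_toy_votes = {"Movie A": 9, "Movie C": 7, "Movie D": 8}

# Series 1: personal ratings, indexed by title (dict keys become the index).
my_vote = pd.Series(my_toy_votes, name="my_vote")

# Series 2: popular ratings, also indexed by title.
vote_average = popular.set_index("title")["vote_average"]

# join="inner" keeps only the titles present in both Series.
toy_compare = pd.concat([vote_average, my_vote], axis=1, join="inner")

print(toy_compare)
```

Because both Series share the title as their index, pandas aligns them by title rather than by position. This sketch assumes each title appears only once; repeated titles would likely need to be handled before combining.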

Now add a column to compare_votes that measures the percentage difference between the popular rating and my rating for each movie. You’ll need to take the difference between the vote_average and my_vote and divide it by my_vote.
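As a rough illustration of that formula, the sketch below applies it to a two-row made-up table. The column name `pct_diff` and the values are assumptions; pick whatever column name you like in your own solution.

```python
import pandas as pd

# Hypothetical toy table with the same two columns as compare_votes.
toy_compare = pd.DataFrame(
    {"vote_average": [8.0, 6.0], "my_vote": [10.0, 5.0]},
    index=["Movie A", "Movie B"],
)

# (popular rating - my rating) / my rating, computed row by row.
# The result is a fraction; multiply by 100 to express it as a percentage.
toy_compare["pct_diff"] = (
    toy_compare.vote_average - toy_compare.my_vote
) / toy_compare.my_vote

print(toy_compare)
```

A negative value means the popular rating is lower than mine; a positive value means it is higher.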

compare_votes

Q-3: Make up 3 questions you would like to answer about this movie data using the techniques you have learned in this lesson and write them in the box.

Q-4: Summarize the answers to your questions here.

Lesson Feedback

    During this lesson I was primarily in my...
  • 1. Comfort Zone
  • 2. Learning Zone
  • 3. Panic Zone
    Completing this lesson took...
  • 1. Very little time
  • 2. A reasonable amount of time
  • 3. More time than is reasonable
    Based on my own interests and needs, the things taught in this lesson...
  • 1. Don't seem worth learning
  • 2. May be worth learning
  • 3. Are definitely worth learning
    For me to master the things taught in this lesson feels...
  • 1. Definitely within reach
  • 2. Within reach if I try my hardest
  • 3. Out of reach no matter how hard I try