5.6. Dealing with Multiple DataFrames¶
Forget about budget or runtimes as criteria for selecting a movie, let’s take a
look at popular opinion. Our dataset has two relevant columns: vote_average
and vote_count
.
Let’s create a variable called df_high_rated
that only contains movies that
have received more than 20 votes, and whose average score is greater than 8.
Here we have some high-quality movies, at least according to some people.
But what about my opinion?
Here are my favorite movies and their relative scores. Create a DataFrame called
compare_votes
that contains the title as an index and both the
vote_average
and my_vote
as its columns. Also, only keep the movies that
are both my favorites and popular favorites.
Hint: You’ll need to create two Series, one for my ratings and one that maps titles to vote_average.
There should be only 6 movies remaining.
Now add a column to compare_votes
that measures the percentage difference
between the popular rating and my rating for each movie. You’ll need to take the
difference between the vote_average
and my_vote
and divide it by
my_vote
.
compare_votes
Q-3: Make up 3 questions you would like to answer about this movie data using the techniques you have learned in this lesson and write them in the box.
Q-4: Summarize the answers to your questions here.
Lesson Feedback
-
During this lesson I was primarily in my...
- 1. Comfort Zone
- 2. Learning Zone
- 3. Panic Zone
-
Completing this lesson took...
- 1. Very little time
- 2. A reasonable amount of time
- 3. More time than is reasonable
-
Based on my own interests and needs, the things taught in this lesson...
- 1. Don't seem worth learning
- 2. May be worth learning
- 3. Are definitely worth learning
-
For me to master the things taught in this lesson feels...
- 1. Definitely within reach
- 2. Within reach if I try my hardest
- 3. Out of reach no matter how hard I try