This hybrid model creates a personalized list of movie recommendations. It builds a person's movie taste profile and generates lists of similar movies, using collaborative filtering and content filtering. What makes this recommendation system unique is its third model: a graph filter, implemented in Neo4j. The graph filter removes a movie from the recommendations if it doesn't meet a network criterion.
Data for this model was sourced from Kaggle's The Movies Dataset, which contains information about 45,000 movies and 25,000,000+ movie ratings.
The hybrid is built from three models:
- Content Filter
- Overview: A Content Filter determines which movies are similar to each other. For example, There Will Be Blood is similar to No Country for Old Men because they have similar plot themes, release years, and budgets. If someone likes There Will Be Blood, the Content Filter recommends No Country for Old Men. In contrast, a movie like Girls Just Wanna Have Fun would not be recommended in the same context.
- Implementation: Scikit-Learn Python API
- Code: Modeling/content_filtering.py
- Collaborative Filter
- Overview: A Collaborative Filter estimates a personal rating for a movie from the ratings of other people who have seen it. I used a collaborative filter based on Singular Value Decomposition. Intuitively, the personal rating is estimated from the ratings of people whose taste is most similar to yours and who have already seen the movie being considered.
- Implementation: Surprise Python API
- Code: Modeling/collaborative_filtering.py
- Graph Filter
- Overview: I built a graph of all 45,000 films, connected by the 500,000+ actors in the database. Before creating a recommendation, I choose the user's three top-rated films. Then, once I have a set of recommendations from the Collaborative Filter and the Content Filter (CF_recs1, CF_recs2), I apply the Graph Filter:
- Verify that CF_recs1 is connected on the graph to the top-3 films by a degree of two.
- Verify that CF_recs2 is connected on the graph to the top-3 films by a degree of two.
- Films that are connected on the graph are recommended.
- Implementation: Python Neo4j API and the Neo4j Cypher graph query language
- Code: Modeling/graphing.py
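
To make the Content Filter idea concrete, here is a minimal sketch of similarity scoring between movies. It is a dependency-free stand-in, not the code in Modeling/content_filtering.py (which uses scikit-learn). The use of cosine similarity, the film keys, the four-feature layout, and the helper names `cosine_similarity` and `most_similar` are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def most_similar(title, features, top_n=2):
    """Rank every other film by similarity to `title`, most similar first."""
    target = features[title]
    scored = [
        (cosine_similarity(target, vector), other)
        for other, vector in features.items()
        if other != title
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored[:top_n]]


# Hypothetical toy features: [drama, comedy, scaled release year, scaled budget].
features = {
    "drama_a": [1.0, 0.0, 0.8, 0.4],
    "drama_b": [0.9, 0.0, 0.8, 0.5],
    "teen_comedy": [0.0, 1.0, 0.2, 0.1],
}

similar_to_drama_a = most_similar("drama_a", features, top_n=1)
```

With these toy vectors, the two dramas score far closer to each other than either does to the comedy, which mirrors the There Will Be Blood example above.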
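
The Collaborative Filter can be sketched in the same spirit. The SVD model in Surprise is, as far as I know, a matrix factorization with user and item biases trained by stochastic gradient descent, so the sketch below hand-rolls that idea in plain Python instead of calling the library. It is not the code in Modeling/collaborative_filtering.py; the user names, film keys, ratings, hyperparameters, the 0.5 to 5.0 rating scale, and the function names `train_mf`, `predict`, and `rmse` are all assumptions.

```python
import math
import random


def train_mf(ratings, n_factors=2, n_epochs=200, lr=0.01, reg=0.02, seed=0):
    """Fit global mean + biases + latent factors with SGD on (user, item, rating) triples."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    bu = {u: 0.0 for u in users}
    bi = {i: 0.0 for i in items}
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}
    for _ in range(n_epochs):
        for u, i, r in ratings:
            dot = sum(pf * qf for pf, qf in zip(p[u], q[i]))
            err = r - (mu + bu[u] + bi[i] + dot)
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(n_factors):
                pf, qf = p[u][f], q[i][f]
                p[u][f] += lr * (err * qf - reg * pf)
                q[i][f] += lr * (err * pf - reg * qf)
    return mu, bu, bi, p, q


def predict(model, user, item, low=0.5, high=5.0):
    """Estimated rating, clipped to the assumed rating scale."""
    mu, bu, bi, p, q = model
    est = mu + bu.get(user, 0.0) + bi.get(item, 0.0)
    if user in p and item in q:
        est += sum(pf * qf for pf, qf in zip(p[user], q[item]))
    return min(high, max(low, est))


def rmse(model, ratings):
    """Root mean squared error of the model on the given ratings."""
    squared = [(r - predict(model, u, i)) ** 2 for u, i, r in ratings]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical toy ratings; "bob" has not rated "film_b".
ratings = [
    ("ann", "film_a", 5.0), ("ann", "film_b", 4.5), ("ann", "film_c", 1.0),
    ("bob", "film_a", 4.5), ("bob", "film_c", 1.5),
    ("cat", "film_a", 1.0), ("cat", "film_b", 1.5), ("cat", "film_c", 5.0),
]

model = train_mf(ratings)
mean_only = (model[0], {}, {}, {}, {})  # baseline that always predicts the global mean
bob_estimate = predict(model, "bob", "film_b")
```

Here `bob_estimate` plays the role of the personal rating: a score for a film the user has not rated, learned from how users with overlapping ratings scored it.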
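
The Graph Filter steps above can be sketched with an in-memory graph instead of Neo4j. This is not the code in Modeling/graphing.py, and it rests on two assumptions about the criterion: one "degree" means two films share at least one actor, and a candidate passes if it is within two degrees of any one of the top films. The film and actor names and the functions `build_film_graph`, `films_within`, and `graph_filter` are hypothetical.

```python
from collections import deque


def build_film_graph(cast_by_film):
    """Link two films whenever they share at least one actor."""
    films_by_actor = {}
    for film, cast in cast_by_film.items():
        for actor in cast:
            films_by_actor.setdefault(actor, set()).add(film)
    neighbors = {film: set() for film in cast_by_film}
    for films in films_by_actor.values():
        for film in films:
            neighbors[film] |= films - {film}
    return neighbors


def films_within(neighbors, start, max_degree=2):
    """Breadth-first search: films reachable from `start` in at most `max_degree` hops."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        film = queue.popleft()
        if depth[film] == max_degree:
            continue  # do not expand past the degree limit
        for nxt in neighbors.get(film, ()):
            if nxt not in depth:
                depth[nxt] = depth[film] + 1
                queue.append(nxt)
    del depth[start]
    return set(depth)


def graph_filter(candidates, top_films, neighbors, max_degree=2):
    """Keep candidates connected to at least one top film (assumption: any, not all)."""
    reachable = set()
    for film in top_films:
        reachable |= films_within(neighbors, film, max_degree)
    return [c for c in candidates if c in reachable]


# Hypothetical toy cast lists: A-B, B-C, and C-D share an actor; E is isolated.
cast_by_film = {
    "Film A": {"actor_1", "actor_2"},
    "Film B": {"actor_2", "actor_3"},
    "Film C": {"actor_3", "actor_4"},
    "Film D": {"actor_4", "actor_5"},
    "Film E": {"actor_9"},
}

neighbors = build_film_graph(cast_by_film)
recommended = graph_filter(["Film C", "Film D", "Film E"], ["Film A"], neighbors)
```

In this toy graph only Film C survives: it is two hops from Film A, while Film D is three hops away and Film E shares no actor with any other film. The same function would be applied once to CF_recs1 and once to CF_recs2.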
For a more in-depth explanation of the features of this hybrid movie recommender, you can watch the presentation I delivered. Skip to the 11:45 mark.
Model Architecture Diagrams:
