Movie Recommendation
There are several ways to approach recommendation problems, such as recommending a list of movies or a list of related products. In this case, you will predict what rating (1-5) a user will give to a particular movie and recommend that movie if the predicted rating is higher than a defined threshold (the higher the rating, the more likely the user is to like the movie).
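As a minimal sketch of this thresholding step (plain Python, for illustration only; the 3.5 threshold, the function name, and the sample predictions are assumptions and do not come from this tutorial):

```python
def should_recommend(predicted_rating: float, threshold: float = 3.5) -> bool:
    """Recommend only when the predicted rating (1-5 scale) exceeds the threshold."""
    return predicted_rating > threshold


# Hypothetical predicted ratings for a single user.
predictions = {"movie_a": 4.2, "movie_b": 2.9, "movie_c": 3.5}

# Keep only the movies whose predicted rating is strictly above the threshold.
recommended = [
    movie for movie, rating in predictions.items() if should_recommend(rating)
]
```

Because the comparison is strict, a movie predicted at exactly the threshold is not recommended; whether the boundary should be inclusive is a design choice.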
The recommendation ratings data is split into Train and Test datasets. The Train data is used to fit your model. The Test data is used to make predictions with your trained model and evaluate model performance. It's common to have an 80/20 split with Train and Test data.
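An 80/20 split can be sketched as follows. This is a plain Python illustration, not the way the tutorial's datasets were produced; shuffling before splitting, the fixed seed, and the synthetic rating tuples are all assumptions.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split them into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Synthetic (user, movie, rating) tuples, made up for this example.
ratings = [(user, movie, 3.0) for user in range(10) for movie in range(10)]
train, test = train_test_split(ratings)
```

The model is fitted on `train` only; `test` is held back so that the evaluation reflects ratings the model has not seen.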
In this case, you should eliminate the timestamp column as a Feature because the timestamp does not really affect how a user rates a given movie and thus would not contribute to making a more accurate prediction.
MovieRating specifies an input data class. The LoadColumn attribute specifies which columns (by column index) in the dataset should be loaded. The userId and movieId columns are your Features (the inputs you will give the model to predict the Label), and the rating column is the Label that you will predict (the output of the model).
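The idea of loading only the userId, movieId, and rating columns and ignoring the timestamp can be pictured with a small Python analogue. This is not the ML.NET MovieRating class or its LoadColumn attribute; the column order, the sample rows, and the `load_ratings` helper are assumptions made up for illustration.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class MovieRating:
    user_id: int    # Feature
    movie_id: int   # Feature
    label: float    # Label: the rating to predict


# Made-up sample rows; the timestamp column is present in the data but never loaded.
SAMPLE_CSV = """userId,movieId,rating,timestamp
1,10,4.0,1000000001
1,20,2.5,1000000002
2,10,5.0,1000000003
"""


def load_ratings(text):
    """Parse CSV text into MovieRating records, dropping the timestamp column."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        MovieRating(int(row["userId"]), int(row["movieId"]), float(row["rating"]))
        for row in reader
    ]


ratings = load_ratings(SAMPLE_CSV)
```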
Since userId and movieId represent users and movie titles, not real values, you use the MapValueToKey() method to transform each userId and each movieId into a numeric key type Feature column (a format accepted by recommendation algorithms) and add them as new dataset columns.
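Conceptually, a value-to-key mapping assigns each distinct identifier a small contiguous integer. The Python sketch below illustrates that idea only; it is not how MapValueToKey() is implemented, and the 0-based keys, the function names, and the sample ids are assumptions.

```python
def fit_key_map(values):
    """Assign each distinct value a contiguous integer key, in order of first appearance."""
    key_map = {}
    for value in values:
        if value not in key_map:
            key_map[value] = len(key_map)
    return key_map


def encode(values, key_map):
    """Replace every raw value with its integer key."""
    return [key_map[value] for value in values]


# Hypothetical raw user ids as they might appear in a ratings file.
user_ids = [610, 12, 610, 7]
user_key_map = fit_key_map(user_ids)
user_ids_encoded = encode(user_ids, user_key_map)
```

The encoded column is kept alongside the original one, so the raw ids remain available for display while the algorithm works with the keys.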
The MatrixFactorizationTrainer is your recommendation training algorithm. Matrix Factorization is a common approach to recommendation when you have data on how users have rated products in the past, which is the case for the datasets in this tutorial. There are other recommendation algorithms for when you have different data available (see the Other recommendation algorithms section below to learn more).
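To make the idea behind matrix factorization concrete, here is a minimal pure-Python sketch that learns user and item factor vectors with stochastic gradient descent. It is a generic textbook-style illustration, not the algorithm or API used by MatrixFactorizationTrainer; the rank, learning rate, regularization, epoch count, and toy ratings are all assumptions.

```python
import math
import random


def predict(P, Q, user, item):
    """Predicted rating = dot product of the user and item factor vectors."""
    return sum(pu * qi for pu, qi in zip(P[user], Q[item]))


def train_mf(ratings, n_users, n_items, rank=4, epochs=500, lr=0.02, reg=0.02, seed=0):
    """Fit factor matrices P (users) and Q (items) by SGD on squared error."""
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for user, item, rating in ratings:
            err = rating - predict(P, Q, user, item)
            for k in range(rank):
                pu, qi = P[user][k], Q[item][k]
                # Gradient step with L2 regularization on both factors.
                P[user][k] += lr * (err * qi - reg * pu)
                Q[item][k] += lr * (err * pu - reg * qi)
    return P, Q


def rmse(P, Q, ratings):
    """Root mean squared error of the predictions over a list of ratings."""
    total = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(total / len(ratings))


# Toy (user_key, item_key, rating) triples with already-encoded keys.
toy_ratings = [
    (0, 0, 5.0), (0, 1, 3.0), (0, 2, 1.0),
    (1, 0, 4.0), (1, 2, 1.0),
    (2, 1, 2.0), (2, 2, 5.0),
    (3, 0, 1.0), (3, 1, 4.0), (3, 2, 5.0),
]
P, Q = train_mf(toy_ratings, n_users=4, n_items=3)
train_error = rmse(P, Q, toy_ratings)
```

A rating that was never observed, such as user 1 on item 1, can then be estimated with `predict(P, Q, 1, 1)`. In practice the error should be measured on held-out Test data rather than on the training ratings used here.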
While this is a good start, in reality you might want to add other attributes or Features (for example, age, gender, geo-location, etc.) if they are included in the dataset. Adding more relevant Features can help improve the performance of your recommendation model.
One common problem in collaborative filtering is the cold start problem, which is when you have a new user with no previous data to draw inferences from. This problem is often solved by asking new users to create a profile and, for instance, rate movies they have seen in the past. While this method puts some burden on the user, it provides some starting data for new users with no rating history.
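The onboarding idea can be sketched as a simple check on how much rating history a user has. This is a hypothetical illustration in Python; the `RatingStore` class, the three-rating minimum, and the sample users are assumptions, not part of the tutorial.

```python
from collections import defaultdict


class RatingStore:
    """Keeps ratings per user and flags users with too little history."""

    def __init__(self):
        self.by_user = defaultdict(dict)

    def add(self, user, movie, rating):
        self.by_user[user][movie] = rating

    def is_cold_start(self, user, min_ratings=3):
        """A user is 'cold' until they have at least min_ratings ratings."""
        return len(self.by_user.get(user, {})) < min_ratings


store = RatingStore()
for movie, rating in [("movie_a", 4.0), ("movie_b", 3.0), ("movie_c", 5.0)]:
    store.add("existing_user", movie, rating)

cold_before = store.is_cold_start("new_user")

# Onboarding: the new user rates a few movies they have already seen.
for movie, rating in [("movie_a", 5.0), ("movie_b", 2.0), ("movie_c", 4.5)]:
    store.add("new_user", movie, rating)

cold_after = store.is_cold_start("new_user")
```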
Skate Kitchen (2018) is a coming-of-age movie about Camille (Rachelle Vinberg), who finds her place in a group of girls who skateboard in New York City. We follow Camille as she explores dating, her relationship with her mom, and finding herself at the ripe age of eighteen. The backdrop of skateboarding culture in NYC sets Skate Kitchen apart from other films about growing up. Unlike most skateboarding movies, the film highlights female skateboarders rather than male ones, and it focuses on the nuances of female friendships instead of pigeonholing teen girls as only romantic leads.
In this context, video recommender systems play an important role in helping users of online streaming services, as well as of social networks, cope with this rapidly increasing volume of videos and provide them with personalized experiences. Nevertheless, the growing availability of digital videos has not been fully accompanied by comfort in their accessibility via video recommender systems. The causes of this problem are twofold: (i) the type of recommendation models in service today, which are heavily dependent on usage data (in particular, implicit or explicit preference feedback) and/or metadata (e.g., genre and cast associated with the videos) (cf. Sect. 1.1), and (ii) the nature of video data, which are information intensive when compared to other media types, such as music or images (cf. Sect. 1.2). In the following, we analyze each of these dimensions. Throughout this paper, we will use a number of abbreviations, which, for convenience, are summarized in Table 1.
To date, collaborative filtering (CF) methods (Koren and Bell 2015) lie at the core of most real-world movie recommendation engines, due to their state-of-the-art accuracy (McFee et al. 2012; Yuan et al. 2016). In most video-streaming services, however, new movies and TV series are continuously added. CF models are not capable of providing meaningful recommendations when items in the catalogue contain few interactions, a problem commonly known as the cold start (CS) problem. The most severe case of CS is when new items are added that lack any interactions, technically known as the new item CS problem. In such a situation, CF models are completely unable to make predictions. As such, these new items are not recommended, go unnoticed by a large part of the user community, and remain unrated, creating a vicious circle in which a set of items in the RS is left out of the vote/recommendation process (Bobadilla et al. 2012). Being able to provide high-quality recommendations for cold items has several advantages. Firstly, it will increase the novelty of the recommendations, which is a highly desirable property and inherent in the user-centric and business-centric goals of RS, i.e., the discovery of new content and the increase of revenues (Aggarwal 2016b; Liu et al. 2014). Secondly, providing good new movie recommendations will allow enough interactions/feedback to be collected in a brief amount of time, enabling effective CF recommendation. Despite previous efforts, the new item CS problem remains far from being solved in the general case, and most existing approaches suffer from it (Bobadilla et al. 2012; Zhou et al. 2011; Zhang et al. 2011).
Many approaches have been proposed to address the new item CS issue, mainly based on hybrid CF and CBF models (Lika et al. 2014; Cella et al. 2017; Sharma et al. 2015; Ferrari Dacrema et al. 2018). Most recent work relies on machine learning to combine content and collaborative data. We focus on feature weighting rather than on other types of hybrids (e.g., joint matrix factorization) because we aim to build a hybridization strategy that can be easily applied to a CBF model. For instance, the authors in Gantner et al. (2010) proposed a method to map item features into the item embeddings learned in a matrix factorization algorithm, while the authors in Schein et al. (2002) defined a probabilistic model trained via expectation maximization. Another example is Sharma et al. (2015), where the authors proposed a feature weighting model that learns feature weights by optimizing the ranking of the recommendations over the user interactions for warm items.
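To illustrate how feature weights can change a content-based similarity, consider the following Python sketch of a weighted cosine similarity. It does not reproduce any of the cited methods and does not learn the weights; the feature layout, the item vectors, and the weight values are hypothetical, and the weights are assumed to be non-negative.

```python
import math


def weighted_cosine(a, b, weights):
    """Cosine similarity in which each feature dimension is scaled by a weight."""
    dot = sum(w * x * y for w, x, y in zip(weights, a, b))
    norm_a = math.sqrt(sum(w * x * x for w, x in zip(weights, a)))
    norm_b = math.sqrt(sum(w * y * y for w, y in zip(weights, b)))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an item with no weighted features is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical binary features: [genre_drama, genre_comedy, actor_x]
item_a = [1, 0, 1]
item_b = [1, 0, 0]
item_c = [0, 1, 1]

uniform = [1.0, 1.0, 1.0]
actor_heavy = [0.1, 0.1, 2.0]  # assumed weights that emphasize the shared actor

sim_ab_uniform = weighted_cosine(item_a, item_b, uniform)
sim_ac_uniform = weighted_cosine(item_a, item_c, uniform)
sim_ab_weighted = weighted_cosine(item_a, item_b, actor_heavy)
sim_ac_weighted = weighted_cosine(item_a, item_c, actor_heavy)
```

With uniform weights, item A is closer to item B (shared genre); with the actor-heavy weights, it is closer to item C (shared actor). A feature weighting model would choose such weights from interaction data on warm items rather than by hand.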
In this paper, we specifically address the above-mentioned shortcomings of purely metadata-based MRS by proposing a practical solution for the new item CS challenge that exploits the movie genome. We set out to answer the following research questions:
RQ1 Can exploiting the movie genome, which describes rich item information as a whole, provide better recommendation quality in CS scenarios than traditional approaches that use editorial metadata such as genre and cast?
The remainder of this article is structured as follows. Section 2 positions our work in the context of the state of the art and highlights its novel contributions. Section 3 introduces the proposed general content-based recommendation framework. Sections 4 and 5 report on the experimental validation, namely the experimental setup and parameter tuning, offline experimentation, and a user study in a web survey, respectively. Section 6 concludes the article in the context of the research questions and discusses limitations and future perspectives.
One main contribution of this work is the introduction of a solution for the new item CS problem in the multimodal movie domain. In this section, we therefore review the existing, state-of-the-art approaches in content-based multimedia recommender systems (Sect. 2.1) and feature weighting for CS recommender systems (Sect. 2.2) and position our contribution (Sect. 2.3).
A multimedia recommendation system is a system that recommends a particular media type, such as audio, image, video, and/or text, to the users (Deldjoo et al. 2018e, f). We therefore organize the state-of-the-art CB-MMRS based on the target media type, namely: (i) audio recommendation, (ii) image recommendation, and (iii) video recommendation. In the following subsections, we describe each of these systems.
The most common example of audio recommendation is music recommendation (Schedl et al. 2018; Vall et al. 2019). Over the past several years, a wealth of approaches, including CF, CBF, context-aware recommenders, and hybrid methods, have been proposed to address this task. An overview of popular approaches can be found in Schedl et al. (2015, 2018). Perhaps more than in other MM domains, CB recommenders have attracted substantial interest from researchers in the music domain, not least due to their superior performance in CS scenarios.
Recent work has proposed deep learning-based CB approaches. For instance, the authors in van den Oord et al. (2013) use a deep convolutional neural network (CNN) trained on audio features, more precisely, on the log-scaled Mel spectrograms extracted from 3-second snippets of the audio, resulting in a latent factor representation for each song. The authors evaluate their approach for tag prediction and music recommendation using the Million Song Dataset (Bertin-Mahieux et al. 2011). In tenfold cross-validation experiments using 50-dimensional latent factors, they show that the CNN outperforms both metric learning to rank and a multilayer perceptron trained on bag-of-words representations of vector-quantized Mel frequency cepstral coefficients (MFCC) (Logan 2000a) in both tasks.
In our approach, we follow a strategy in between these two extremes (i.e., fully automated feature learning by deep learning and pure manual expert annotations). The proposed movie genome uses well-established, state-of-the-art audio descriptors that are semantically more meaningful than deep learned features, but at the same time do not require a massive number of human annotators.