This paper studies recommendation systems, focusing on movie recommendation, a core component of modern streaming platforms with extensive film libraries. It identifies a significant limitation in existing approaches: they treat user inputs as uniform, even though individual users perceive movies differently, influenced by factors such as genre, story, director, and cast. To address this, the authors introduce two novel metrics, TextLike_score (TL_score) and GenreLike_score (GL_score). These scores feed into their Cross-Attention-based model, which is reported to outperform current state-of-the-art recommendation systems by accounting for these nuanced user preferences.
The approach is evaluated on two diverse datasets: MovieLens-100K (ML-100K) and MFVCD-7K. Notably, the authors use multi-modal data, including audio, video, and textual information, to calculate the introduced scores. Their experimental results indicate that the Cross-Attention-based multi-modal recommendation system, incorporating the Meta_score, effectively captures user preferences, making it a compelling candidate for real-time movie recommendation.
The paper emphasises the importance of understanding user preferences in the digital age, particularly for streaming services. With the proliferation of personal viewing devices and the rise of over-the-top (OTT) platforms, demand for tailored movie recommendations continues to grow. Traditional recommendation systems have focused on predicting user-movie ratings from embeddings derived from text, audio, or video data, while ignoring the nuances of user preferences for genres, directors, cast, and storylines.
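To make the idea of cross-attention between modalities more concrete, the following is a minimal sketch of generic scaled dot-product cross-attention, in which embeddings from one modality (here, text) attend over embeddings from another (here, video). It is not the authors' architecture: the function names, the toy embeddings, and the absence of learned projection matrices are all assumptions made for illustration, and the TL_score and GL_score computations are not reproduced here.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Generic scaled dot-product cross-attention (illustrative only).

    queries: list of d-dimensional vectors from one modality.
    keys:    list of d-dimensional vectors from the other modality.
    values:  list of vectors aligned one-to-one with keys.
    Returns (outputs, weights): one fused vector and one attention
    distribution per query.
    """
    scale = math.sqrt(len(keys[0]))
    out_dim = len(values[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        w = softmax(scores)
        # Weighted sum of the value vectors.
        fused = [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(out_dim)]
        outputs.append(fused)
        weights.append(w)
    return outputs, weights


# Hypothetical toy embeddings: 2 text vectors attend over 3 video vectors.
text_emb = [[1.0, 0.0], [0.0, 1.0]]
video_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused, attn = cross_attention(text_emb, video_emb, video_emb)
```

Each row of `attn` is a probability distribution over the video vectors, so a text vector places more weight on the video vectors it is most similar to. A practical model would typically add learned query, key, and value projections and multiple attention heads.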