Reinforcement recommendation reasoning through knowledge graphs for explanation path quality

Knowledge graph-based recommender systems naturally produce explainable recommendations by showing the reasoning paths in the knowledge graph (KG) that were followed to select the recommended items. One can define metrics that assess the quality of these explanation paths in terms of recency, popularity, and diversity. Combining in- and post-processing approaches to optimize for both recommendation quality and reasoning path quality yields effective recommendations and higher reasoning path quality.

In a study published in the Knowledge-Based Systems journal (Elsevier) and conducted with Giacomo Balloccu, Gianni Fenu, and Mirko Marras, we extend our SIGIR 2022 study. In that previous study, we defined the explanation quality metrics and optimized the recommendation lists according to them via post-processing approaches. In this journal paper, we present a novel approach that combines in- and post-processing strategies.

Since we described the explanation quality metrics in a previous post, we focus here only on our new approach.

A preprint of this study is also available online.

Explanation property optimization

We optimize a recommendation model for reasoning path metrics. To this end, we propose two classes of approaches. The first class (in-processing) includes approaches that embed reasoning path quality properties in the internal model learning process. The second class (post-processing) covers approaches that re-arrange the recommended lists (and the explanations) returned by the original recommendation model, which is optimized only for recommendation quality.

In-processing optimization. Our goal is to generate product recommendations accompanied by reasoning paths, considering both recommendation quality and reasoning path quality. We model the problem behind this task as a Markov Decision Process (MDP), which first generates candidate paths between users and products based on a certain similarity measure, and then samples among the candidate paths. To solve this problem, we adopted a reinforcement learning (RL) strategy.

Post-processing optimization. In contrast to in-processing optimization, post-processing approaches re-rank both the products to recommend and their corresponding reasoning paths, to optimize certain reasoning path quality metrics. The input of this step consists of the recommended products and their selected reasoning paths, as originally returned by any pre-trained model that solves the KGRE-Rec task. For the implementation, we capitalized on a maximum marginal relevance approach, with the reasoning path metric(s) as support metric(s).
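
To make the re-ranking idea more concrete, here is a minimal sketch of a greedy, maximum-marginal-relevance-style re-ranker. It is an illustration under stated assumptions, not the implementation from the paper: the Candidate fields, the alpha trade-off weight, and the toy support metric (which rewards a reasoning path type not yet present in the list) are all hypothetical choices made for this example.

```python
from typing import Callable, List, NamedTuple, Sequence


class Candidate(NamedTuple):
    product: str
    relevance: float  # score from the pre-trained model (assumed to lie in [0, 1])
    path_type: str    # hypothetical label for the type of the reasoning path


def path_type_novelty(cand: Candidate, selected: Sequence[Candidate]) -> float:
    """Toy support metric (an assumption): 1.0 if the candidate's path type
    does not yet appear among the selected items, 0.0 otherwise."""
    return 0.0 if any(s.path_type == cand.path_type for s in selected) else 1.0


def rerank(
    candidates: Sequence[Candidate],
    k: int,
    alpha: float,
    support: Callable[[Candidate, Sequence[Candidate]], float] = path_type_novelty,
) -> List[Candidate]:
    """Greedily build a top-k list, trading off relevance against the support metric.

    alpha = 1.0 keeps the original relevance order; lower values give more
    weight to the reasoning path property.
    """
    selected: List[Candidate] = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        best = max(
            remaining,
            key=lambda c: alpha * c.relevance + (1 - alpha) * support(c, selected),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    pool = [
        Candidate("A", 0.9, "watched"),
        Candidate("B", 0.8, "watched"),
        Candidate("C", 0.6, "starred_by"),
    ]
    print([c.product for c in rerank(pool, k=2, alpha=0.5)])
    print([c.product for c in rerank(pool, k=2, alpha=1.0)])
```

With alpha set to 0.5, the second slot goes to a product explained by a different path type even though its relevance is lower; with alpha set to 1.0, the list falls back to pure relevance order. Any other reasoning path metric could be plugged in through the support argument.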

Experimental evaluation

We conducted experiments on three data sets, namely MovieLens1M (ML1M), LastFM-1B (LASTFM), and Amazon-Cellphones (CELL). The considered baselines included factorization models (BPR, FM, NFM), three knowledge-aware models based on regularization (CKE, CFKG, KGAT), and one knowledge-aware model based on reasoning paths (PGPR). Our optimization approaches could be applied only to models based on reasoning paths (i.e., PGPR).

Our paper contains the detailed results. In this post, we summarize the main outcomes emerging from our results:

Optimizing for reasoning path quality through our approaches led to state-of-the-art NDCG. The measured NDCG was equal to, or at most two points lower than, that of non-(path-)explainable baselines on all data sets. Accounting for user-level reasoning path quality often does not lead to a loss in recommendation utility, and any loss observed is negligible.
Compared to PGPR, our in- and post-processing optimization approaches showed a substantially higher reasoning path quality based on the proposed metrics, on all data sets. Higher gains were observed for path type diversity (PTD) than for the other properties. There were also gains on reasoning path metrics that were not directly optimized, highlighting a positive interdependence across metrics whose extent depends on the domain.
The combination of in- and post-processing not only generally achieved the highest reasoning path quality scores (explanations linked to more recent user interactions, more popular shared entities, and more diverse reasoning path types) but also showed the highest diversity of linking interactions and shared entities, as well as the lowest concentration of path types.
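
For readers less familiar with NDCG, the utility metric referred to in the first outcome, the following is a minimal sketch of how it is commonly computed for a single user with binary relevance. The function names and the binary-relevance setup are assumptions made for illustration; the exact evaluation protocol is described in the paper.

```python
import math
from typing import Sequence


def dcg_at_k(hits: Sequence[int], k: int) -> float:
    """Discounted cumulative gain: a hit at 0-based rank r contributes 1 / log2(r + 2)."""
    return sum(hit / math.log2(rank + 2) for rank, hit in enumerate(hits[:k]))


def ndcg_at_k(hits: Sequence[int], num_relevant: int, k: int) -> float:
    """NDCG@k for one user.

    hits[i] is 1 if the item recommended at position i is relevant, else 0.
    num_relevant is the number of relevant (held-out) items for that user;
    the ideal ranking places min(num_relevant, k) hits at the top.
    """
    ideal = dcg_at_k([1] * min(num_relevant, k), k)
    return dcg_at_k(hits, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # One relevant item, recommended at the second position of a top-2 list.
    print(ndcg_at_k([0, 1], num_relevant=1, k=2))
```

The logarithmic discount means that a relevant product placed lower in the list contributes less, which is why re-ranking for reasoning path quality can, in principle, cost some NDCG.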