
Counterfactual Graph Augmentation for Consumer Unfairness Mitigation in Recommender Systems

It is possible to effectively address consumer unfairness in recommender systems by using counterfactual explanations to augment the user-item interaction graph. This approach not only leads to fairer outcomes across different demographic groups but also maintains or improves the overall utility of the recommendations.

In a study with Francesco Fabbri, Gianni Fenu, Mirko Marras, and Giacomo Medda, published in the proceedings of ACM CIKM ’23, we introduce a method for mitigating consumer unfairness in recommender systems by employing counterfactual graph augmentation to adjust user-item interactions.

Methodology

We represent the interactions between users and items as a bipartite graph, composed of user and item nodes connected by edges that signify interactions. Our goal is to lead Graph Neural Networks (GNNs) to produce fairer recommendations. To this end, we employ counterfactual reasoning techniques to generate an augmented version of the user-item interaction graph, such that the utility estimates across consumer groups become fair when the GNN operates on the augmented graph.
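As a concrete illustration, here is a minimal sketch of how such a bipartite interaction graph can be represented as a sparse adjacency matrix; the toy interaction list and dimensions are purely illustrative.

```python
import numpy as np
import scipy.sparse as sp

# Toy interaction data: (user, item) pairs observed in the training set.
# The indices are illustrative; in practice they come from the dataset.
interactions = [(0, 0), (0, 2), (1, 1), (2, 0), (2, 3), (3, 2)]
n_users, n_items = 4, 5

# Bipartite user-item adjacency: rows are users, columns are items,
# a 1 marks an observed interaction (an edge of the bipartite graph).
rows, cols = zip(*interactions)
A = sp.csr_matrix((np.ones(len(interactions)), (rows, cols)),
                  shape=(n_users, n_items))

# GNN implementations usually work on the full (symmetric) graph over all nodes:
# users occupy indices [0, n_users), items [n_users, n_users + n_items).
full_adj = sp.bmat([[None, A], [A.T, None]]).tocsr()
print(full_adj.shape)  # (9, 9)
```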

We propose a novel augmentation mechanism for bipartite graphs to modify their topology and improve fairness.

Augmentation Mechanism. Our mechanism modifies the topology of bipartite graphs. Unlike other methods that perturb the adjacency matrix, our approach augments the graph with edges drawn from a predefined candidate set, driven by a parameter vector: an augmented adjacency matrix is generated by adding a candidate edge whenever the corresponding condition on that vector is met.
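Below is a minimal sketch of this kind of augmentation step, assuming a dense binary adjacency matrix, an explicit candidate edge list, and a sigmoid threshold as the adding condition; the function name and the threshold are illustrative assumptions, not the exact implementation used in the paper.

```python
import numpy as np

def augment_adjacency(A, p, candidates, threshold=0.5):
    """Add candidate edges to a binary user-item adjacency matrix.

    A          : (n_users, n_items) binary adjacency of observed interactions.
    p          : real-valued parameter vector, one entry per candidate edge.
    candidates : list of (user, item) pairs that may be added (not in A).
    threshold  : an edge is added when sigmoid(p) crosses this value.
    """
    probs = 1.0 / (1.0 + np.exp(-p))          # map parameters to [0, 1]
    A_aug = A.copy()
    added = []
    for (u, i), pr in zip(candidates, probs):
        if pr >= threshold and A[u, i] == 0:  # only add edges, never remove them
            A_aug[u, i] = 1
            added.append((u, i))
    return A_aug, added

# Toy usage
A = np.array([[1, 0, 1, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 1]])
candidates = [(0, 1), (1, 3), (2, 0)]
p = np.array([2.0, -1.0, 0.3])                # learnable in the real method
A_aug, added = augment_adjacency(A, p, candidates)
print(added)  # [(0, 1), (2, 0)]
```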

Augmented Graph Generation. We extend the standard GNN implementation to include our augmentation mechanism. The generation process optimizes a parameter vector from which a matrix of altered linking probabilities is derived, and iteratively augments the graph until a fairness requirement is met; the resulting set of added edges constitutes a counterfactual explanation of the prior unfairness.
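The following sketch shows one way such an iterative generation loop could look, assuming a differentiable relaxation in which sigmoid-transformed parameters act as soft weights for the candidate edges; the stand-in "GNN", the utility proxy, and the stopping criterion are all simplifications for illustration, not the paper's implementation.

```python
import torch

torch.manual_seed(0)
n_users, n_items = 6, 8
A = (torch.rand(n_users, n_items) < 0.3).float()       # observed interactions
group = torch.tensor([0, 0, 0, 1, 1, 1])                # demographic group per user

# Stand-in "GNN": a single propagation step that produces user-item scores.
item_emb = torch.randn(n_items, 16)

def scores(adj):
    user_emb = adj @ item_emb / (adj.sum(1, keepdim=True) + 1e-8)
    return user_emb @ item_emb.T                         # (n_users, n_items)

def group_utility(s, g):
    # Crude differentiable utility proxy: mean score of each group's observed items.
    mask = group == g
    return (s[mask] * A[mask]).sum() / (A[mask].sum() + 1e-8)

p = torch.zeros(n_users, n_items, requires_grad=True)   # one parameter per candidate edge
optimizer = torch.optim.Adam([p], lr=0.05)

for step in range(200):
    soft_edges = torch.sigmoid(p) * (1 - A)              # only missing edges can be added
    s = scores(A + soft_edges)                           # GNN runs on the augmented graph
    gap = (group_utility(s, 0) - group_utility(s, 1)).abs()   # unfairness estimate
    distance = soft_edges.sum() / (1 - A).sum()          # stay close to the original graph
    loss = gap + 0.1 * distance
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if gap.item() < 1e-3:                                # illustrative fairness requirement
        break

# The discretized added edges act as the counterfactual explanation.
added_edges = ((torch.sigmoid(p) >= 0.5) & (A == 0)).nonzero()
print(len(added_edges), "edges added after", step + 1, "steps")
```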

Loss Function Optimization. Our approach uses a two-term loss function, operationalized over demographic groups. The function includes a term that quantifies unfairness and a term that controls the distance between the original and the augmented graph. The two terms are balanced through a scaling factor that emphasizes the importance of the mitigation task.
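A minimal sketch of such a two-term objective follows, with the scaling factor exposed as `lam`; the specific unfairness and distance formulations are illustrative assumptions rather than the exact terms used in the paper.

```python
import torch

def two_term_loss(group_utilities, A, A_aug, lam=0.1):
    """Illustrative two-term objective: an unfairness term plus a term that
    controls the distance between the original and augmented graph,
    balanced by the scaling factor `lam`.

    group_utilities : dict mapping group id -> differentiable utility estimate
    A, A_aug        : original and augmented adjacency matrices
    """
    g0, g1 = list(group_utilities)
    fairness_term = (group_utilities[g0] - group_utilities[g1]).abs()
    distance_term = (A_aug - A).abs().sum() / A.numel()   # fraction of edges added
    return fairness_term + lam * distance_term

# Toy usage
utilities = {"F": torch.tensor(0.31), "M": torch.tensor(0.42)}
A = torch.zeros(3, 4)
A[0, 1] = 1
A_aug = A.clone()
A_aug[2, 3] = 1
print(two_term_loss(utilities, A, A_aug))   # tensor(0.1183)
```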

Sampling Policies

Our approach is rooted in the belief that leveraging counterfactual reasoning and graph augmentation can significantly mitigate consumer unfairness in recommender systems while maintaining, and in some cases enhancing, recommendation utility. Given the vast number of candidate edges that could be added, we implement several sampling policies to narrow down the selection, each focusing on different characteristics of the user and item nodes (a sketch of two of these policies follows the list):

  1. BM (Base): This is the base algorithm, where no specific sampling is applied. It serves as a reference point to evaluate the impact of other, more targeted sampling policies.
  2. ZN (Zero NDCG): This policy targets users who have no relevant items in their top-N recommendation lists. Essentially, it selects users for whom the system’s current recommendations are least effective or relevant.
  3. LD (Low Degree): Under this policy, we focus on user nodes with the fewest interactions in the training set. This means selecting users based on the ‘degree’ of their activity, prioritizing those with fewer interactions.
  4. SP (Sparse): Here, we consider the ‘density’ of a user’s interactions, defined as the average popularity of the items they have interacted with. This policy selects users who interact mostly with niche or less popular items, indicating a pattern of sparsity in their interaction history.
  5. FR (Furthest): This approach involves selecting user nodes that are the furthest from the advantaged group within the graph’s topology. It’s based on the average lengths of the shortest paths between disadvantaged and advantaged user nodes, focusing on those who are most ‘distant’ in the graph structure.
  6. IP (Item Preference): This policy is grounded in the preference patterns of the disadvantaged group. It selects items that are most preferred by the disadvantaged user group, aiming to add interactions that align with these users’ preferences.
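As referenced above, here is a minimal sketch of how two of these policies (LD and SP) could rank candidate users; the function names, the data structures, and the selection of the top k users are illustrative assumptions.

```python
import numpy as np

def low_degree_users(A, k):
    """LD: pick the k users with the fewest training interactions."""
    degrees = A.sum(axis=1)
    return np.argsort(degrees)[:k]

def sparse_users(A, k):
    """SP: pick the k users whose interacted items are, on average, least popular."""
    item_popularity = A.sum(axis=0)                      # interactions per item
    with np.errstate(invalid="ignore", divide="ignore"):
        avg_pop = (A * item_popularity).sum(axis=1) / A.sum(axis=1)
    avg_pop = np.nan_to_num(avg_pop, nan=np.inf)         # users with no history go last
    return np.argsort(avg_pop)[:k]

# Toy usage: rows are (disadvantaged) users, columns are items.
A = np.array([[1, 1, 1, 0],
              [1, 0, 0, 0],
              [0, 0, 1, 1]], dtype=float)
print(low_degree_users(A, 2))   # [1 2]
print(sparse_users(A, 2))       # [2 0]
```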

Evaluation

Our analysis focused on two primary research questions (RQs):

  • RQ1: Do the edges selected by our method positively impact the recommendation fairness on the perturbation set?
  • RQ2: To what extent is unfairness mitigated by our method in comparison with state-of-the-art procedures?

We conducted our experiments using two public datasets: MovieLens 1M (ML-1M) and Last.FM 1K (LFM-1K). These datasets helped us examine the effectiveness of our method in different contexts. In setting up our experiments, we split the user interactions into training, validation, and test sets, using the validation set for both model selection and as the perturbation set for our method. We employed several GNN-based models (GCMC, LightGCN, NGCF) to address the top-N recommendation task, optimizing their hyper-parameters under a grid search strategy.
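For illustration, here is a minimal sketch of a grid-search loop for model selection on the validation split; the hyper-parameter names, grid values, and the placeholder evaluation function are assumptions, not the grids actually used in the experiments.

```python
import random
from itertools import product

random.seed(0)

# Illustrative hyper-parameter grid (placeholder values).
grid = {
    "learning_rate": [1e-3, 1e-2],
    "embedding_size": [64, 128],
    "n_layers": [1, 2, 3],
}

def validation_ndcg(config):
    """Placeholder: train a GNN recommender with `config` on the training
    split and return its NDCG on the validation split."""
    return random.random()

best_config = max(
    (dict(zip(grid, values)) for values in product(*grid.values())),
    key=validation_ndcg,
)
print(best_config)
```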

RQ1: Edge Augmentation Analysis

We evaluated the mitigation performance of the various policies using the relative difference in Normalized Discounted Cumulative Gain (NDCG) before and after applying each policy (a minimal sketch of this computation follows the list below). Here, we report the main insights coming from our analysis:

  • Effectiveness of Individual Policies. While individual policies like ZN (Zero NDCG) and IP (Item Preference) showed varied results, their combination (ZN+IP) was particularly effective. This policy added interactions for disadvantaged user nodes, significantly reducing the gap in NDCG between gender groups.
  • Policy Performance Variations. The effectiveness of policies varied across different models and demographic groups, highlighting the complexity and multifaceted nature of unfairness in recommender systems. For instance, ZN+IP was the only policy that consistently mitigated unfairness across gender groups on ML-1M for both GCMC and NGCF models.
  • Characteristics of Added Edges. The features of the nodes involved in the added edges under each policy provided insights into the causes of the original unfairness. For example, the ZN policy targets users with no relevant items in their top-N recommendation lists, and adding interactions for these users can significantly improve their recommendation utility.
  • Dataset Level Unfairness. Some policies systematically excelled more than others under the same settings, suggesting that unfairness might originate at the dataset level. Conversely, the variable performance of some policies across different models indicated that unfairness could also be model-dependent.
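As mentioned above, the RQ1 comparison relies on NDCG measured before and after augmentation. The sketch below shows a binary-relevance NDCG@k and the relative difference between the two measurements; the recommendation lists and relevant-item sets are toy data, and the relative-difference formula is one plausible operationalization.

```python
import numpy as np

def ndcg_at_k(recommended, relevant, k=10):
    """Binary-relevance NDCG@k for a single user."""
    gains = np.array([1.0 if item in relevant else 0.0 for item in recommended[:k]])
    discounts = 1.0 / np.log2(np.arange(2, gains.size + 2))
    dcg = (gains * discounts).sum()
    ideal = discounts[: min(len(relevant), k)].sum()
    return dcg / ideal if ideal > 0 else 0.0

# Toy before/after comparison for a single policy.
relevant = {3, 7, 21}
recs_before = [5, 9, 3, 14, 2, 7, 8, 1, 0, 4]
recs_after = [3, 7, 5, 9, 14, 2, 8, 1, 0, 4]    # augmentation pushes relevant items up
ndcg_before = ndcg_at_k(recs_before, relevant)
ndcg_after = ndcg_at_k(recs_after, relevant)
relative_diff = (ndcg_after - ndcg_before) / ndcg_before   # relative difference
print(round(ndcg_before, 3), round(ndcg_after, 3), round(relative_diff, 3))
```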


RQ2: Mitigation Procedures Comparison

We compared our method’s performance with various state-of-the-art algorithms to assess both the fairness and the utility of the recommendations. We used two key metrics: Normalized Discounted Cumulative Gain (NDCG) and utility disparity (ΔNDCG). NDCG measures the effectiveness of the recommendations, while ΔNDCG quantifies unfairness as the disparity in utility between user groups (a minimal sketch of this computation follows the list below). Here, we report the main insights coming from our analysis:

  • Impact on Unfairness. Our findings revealed that our method was generally more effective in reducing unfairness compared to other algorithms. Specifically, we noted significant reductions in ΔNDCG, indicating successful mitigation of unfairness across different demographic groups and datasets.
  • Policies Effectiveness. The study highlighted the effectiveness of specific policies like ZN+IP (Zero NDCG + Item Preference), which reported the lowest ΔNDCG on the perturbation set, thereby demonstrating a substantial impact in mitigating unfairness.
  • Impact on Recommendation Utility. We observed that our approach not only mitigated unfairness effectively but also maintained or enhanced the overall utility of the recommendations. This was evidenced by our algorithm’s consistent performance in keeping NDCG values stable or improved, unlike some other methods that decreased utility in several settings.
  • Data- and Model-Dependent Unfairness. Consistently with the RQ1 analysis, some policies systematically excelled more than others under the same settings, suggesting that unfairness might originate at the dataset level, while the variable performance of some policies across models indicated that unfairness can also be model-dependent.
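As referenced above, here is a minimal sketch of how the utility disparity ΔNDCG between two demographic groups could be computed; the per-user NDCG values and group labels are toy data, and the absolute difference of group means is one plausible reading of the metric.

```python
import numpy as np

def delta_ndcg(ndcg_per_user, groups):
    """Utility disparity: absolute difference between the mean NDCG of the
    two demographic groups (an illustrative reading of ΔNDCG)."""
    ndcg_per_user = np.asarray(ndcg_per_user, dtype=float)
    groups = np.asarray(groups)
    means = [ndcg_per_user[groups == g].mean() for g in np.unique(groups)]
    return abs(means[0] - means[1])

# Toy usage: per-user NDCG values and their demographic group labels.
ndcg = [0.42, 0.38, 0.51, 0.22, 0.19, 0.30]
gender = ["M", "M", "M", "F", "F", "F"]
print(delta_ndcg(ndcg, gender))   # ~0.20
```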

Concluding remarks and future directions

  • Our approach has demonstrated greater reliability in mitigating consumer unfairness compared to state-of-the-art (SOTA) algorithms. This effectiveness is attributed to the augmentation of the user-item interaction graph using counterfactual explanations.
  • The method has been shown not only to reduce unfairness but also to potentially increase the overall recommendation utility, making it a promising tool for developing fairer recommender systems.
  • A key observation was that confining the algorithm to the disadvantaged user nodes who received no relevant items in their recommendation list led to a positive effect on their utility.
  • We aim to explore new sampling policies and objective functions to further enhance our method’s effectiveness.
  • Our future work will also include applying our method to other models beyond GNNs, expanding its applicability and potential for mitigating unfairness in a wider range of recommender systems.