
MOReGIn: Multi-Objective Recommendation at the Global and Individual Levels

It is possible to provide effective recommendations while simultaneously optimizing beyond-accuracy perspectives for the individual users (e.g., genre calibration) and, globally, for the entire system (e.g., provider fairness).

In a study with Elizabeth Gómez, David Contreras, and Maria Salamó, published in the proceedings of ECIR 2024, we present a model designed to meet both global and individual goals in recommender systems.

Multi-Objective Recommender Systems (MORSs) are designed to fulfill multiple, often conflicting, goals beyond accuracy. Global objectives are goals met for the system as a whole, like provider fairness, whereas individual objectives tailor recommendations to each user’s needs, like genre calibration. Existing MORSs typically focus on one of these two levels, but not both simultaneously, and when global and individual objectives coexist they struggle to meet both effectively. This gap led to the development of MOReGIn.

The MOReGIn Approach

The MOReGIn algorithm adjusts recommendations according to the representation of each provider group, defined by the providers’ continent, alongside each user’s preferences for specific genres. It operates in four main steps:

  • Steps 1 and 2: These initial steps compute two quantities from the training set: Rc (the representation of each demographic group, here the providers’ continent) and Pug (the propensity of each user to rate items of a given genre).
  • Step 3: This step processes the items that the recommender system predicts as relevant for a user and creates a ‘bucket list’, named joinBucket. The list has one bucket per continent-genre pair and stores the predicted items accordingly. Each item in a bucket is associated with its genre(s) and continent(s), and each bucket carries the attributes Rc and Pug.
  • Step 4: The final step is a three-phase re-ranking based on the bucket lists generated in the previous step. In Phase 1, items are selected starting from the least represented continents and moving towards the most represented. An item is selected only if three conditions hold: (1) the proportion of items for its continent in the recommendation list does not exceed the continent’s representation (Rc), (2) the proportion of items of its genre does not exceed the user’s genre propensity (Pug), and (3) the number of items recommended so far is lower than a predefined top-k value. Phase 2 relaxes condition 2 on genre proportion. Phase 3 relaxes the conditions further and selects the items with the greatest relevance to the user to complete the top-k recommendations.
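The steps above can be sketched in Python as follows. This is a minimal illustration rather than the published implementation, and it rests on several assumptions that the description leaves open: each item has exactly one genre and one continent, Rc is the share of catalogue items per continent, Pug is the share of a user's training interactions per genre, proportions are measured against k, and the final list is returned in selection order. All function and variable names are hypothetical.

```python
from collections import Counter, defaultdict

def continent_representation(item_continent):
    """Rc (assumed definition): share of catalogue items per continent."""
    counts = Counter(item_continent.values())
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}

def genre_propensity(user_history, item_genre):
    """Pug for one user (assumed definition): share of training interactions per genre."""
    counts = Counter(item_genre[i] for i in user_history)
    total = sum(counts.values())
    return {g: n / total for g, n in counts.items()}

def rerank(candidates, item_continent, item_genre, r_c, p_ug, k):
    """Re-rank one user's candidates, given as (item, predicted_score) pairs."""
    by_score = sorted(candidates, key=lambda pair: -pair[1])

    # Step 3: one bucket per (continent, genre) pair, most relevant items first.
    buckets = defaultdict(list)
    for item, _ in by_score:
        buckets[(item_continent[item], item_genre[item])].append(item)

    selected, chosen = [], set()
    per_continent, per_genre = Counter(), Counter()

    def take(item):
        selected.append(item)
        chosen.add(item)
        per_continent[item_continent[item]] += 1
        per_genre[item_genre[item]] += 1

    eps = 1e-9  # tolerance for floating-point comparisons
    continents = sorted(r_c, key=r_c.get)  # least represented first

    # Phase 1 enforces the genre cap (condition 2); Phase 2 relaxes it.
    for enforce_genre in (True, False):
        for c in continents:
            for (bucket_c, g), items in buckets.items():
                if bucket_c != c:
                    continue
                for item in items:
                    if len(selected) >= k:  # condition 3
                        break
                    if item in chosen:
                        continue
                    if per_continent[c] + 1 > r_c[c] * k + eps:  # condition 1
                        break
                    if enforce_genre and per_genre[g] + 1 > p_ug.get(g, 0.0) * k + eps:
                        break
                    take(item)

    # Phase 3: fill any remaining slots purely by predicted relevance.
    for item, _ in by_score:
        if len(selected) >= k:
            break
        if item not in chosen:
            take(item)
    return selected
```

With k = 4, a catalogue that is 25% European, and a user who splits evenly between two genres, this sketch allows at most one European item and two items per genre in Phase 1, then lets Phases 2 and 3 fill whatever slots remain.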

Evaluation

We applied our approach to the output of well-known Collaborative Filtering algorithms, namely, ItemKNN, UserKNN, BPRMF, SVDpp, and NeuMF. The results obtained by MOReGIn were compared with those of the original recommendation algorithm (denoted as OR) and against two baselines: a greedy calibration algorithm (CL) and a provider fairness algorithm (PF).

The Elliot framework was used to generate recommendations, and the dataset was split into 80% for training and 20% for testing, considering a temporal split of the data. For each user, the top-1000 recommendations were generated to be re-ranked through the MOReGIn algorithm.
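To make the splitting protocol concrete, here is a small, framework-independent sketch of an 80/20 temporal hold-out. It does not use Elliot's API. It also assumes the split is applied per user (each user's oldest 80% of interactions go to training), which the description above does not specify. The function name and the tuple layout are hypothetical.

```python
from collections import defaultdict

def temporal_split(interactions, train_ratio=0.8):
    """Split (user, item, timestamp) tuples into train and test by time, per user."""
    by_user = defaultdict(list)
    for user, item, timestamp in interactions:
        by_user[user].append((timestamp, item))

    train, test = [], []
    for user, events in by_user.items():
        events.sort(key=lambda event: event[0])  # oldest interaction first
        cut = int(len(events) * train_ratio)
        train.extend((user, item, ts) for ts, item in events[:cut])
        test.extend((user, item, ts) for ts, item in events[cut:])
    return train, test
```

A global split, where a single timestamp threshold separates training from test for all users, would be an equally plausible reading and only requires sorting the full interaction list once instead of grouping by user.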

Here are the main outcomes emerging from our results:

  • Mitigation of Disparities. MOReGIn was assessed for its effectiveness in mitigating disparities in the Movie and Song domains regarding provider fairness and calibration. The results indicated that MOReGIn performed better in reducing disparities compared to the baseline approaches. In terms of miscalibration, MOReGIn consistently achieved lower values, indicating better calibration in recommendations.
  • Recommendation Accuracy. The study evaluated the accuracy of the different approaches using the NDCG metric. MOReGIn generally achieved better NDCG values than the PF model and was on par with or better than the OR models in most cases. The results suggest that, while making recommendations fairer and better calibrated might slightly impact recommendation quality, MOReGIn compensates for this with less biased recommendations.
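For readers unfamiliar with NDCG, the following sketch computes NDCG@k for a single user with binary relevance, using the standard logarithmic discount. It illustrates the metric only and is not the evaluation code used in the study. The function and parameter names are hypothetical.

```python
import math

def ndcg_at_k(ranked_items, relevant_items, k):
    """NDCG@k with binary relevance: a hit at position p contributes 1 / log2(p + 1)."""
    relevant = set(relevant_items)
    # enumerate() is 0-based, so position p = rank + 1 and the discount is log2(rank + 2).
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked_items[:k])
        if item in relevant
    )
    # Ideal DCG: all relevant items (up to k) placed at the top of the list.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```

Because the discount grows with position, a re-ranking that moves a relevant item lower in the list to satisfy a fairness or calibration constraint reduces NDCG, which is why accuracy is reported alongside the disparity measures.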

Conclusion

In our study, we introduced and evaluated MOReGIn, a novel approach that addresses the dual objectives of individual genre calibration and global provider fairness in recommender systems. Our findings indicate that, by bucketing recommended items by both genre and geographic origin, MOReGIn delivers calibrated and provider-fair recommendations, reducing disparities more than the baselines while keeping NDCG on par with or better than the original models in most cases. Going forward, our research opens new avenues for refining recommendation list generation techniques and broadening the scope of fairness to include consumer perspectives.