Bias characterization, assessment, and mitigation in location-based recommender systems

Location-based recommender systems (LBRSs) provide suggestions for Points of Interest (POIs) in location-based social networks. However, these suggestions can exhibit different forms of bias, associated with polarized interactions between users and POIs. Post-processing and hybrid mitigation approaches can help alleviate the impact of those biases.

In a study published in the Data Mining and Knowledge Discovery journal (Springer) and conducted with Pablo Sánchez and Alejandro Bellogín, we focus on four forms of polarization, namely venue popularity, category popularity, venue exposure, and geographical distance. We assess whether they exist and characterize them across different families of recommendation algorithms, using a realistic (temporally aware) offline evaluation methodology. We also propose two automatic approaches to mitigate those biases.

Polarization characterization

Given the peculiarities of the POI recommendation problem compared with traditional recommendation, it is important to identify which forms of polarization occur in this domain. We explain how to measure polarization towards popular venues and categories, in terms of venue exposure, and with respect to the geographical distance between the user and the recommended venues.

Here, we report the definitions of the forms of polarization we consider and refer the reader to the paper for their mathematical formulation.

Definition 1 (Venue Popularity Polarization) The polarization of a recommendation model rec towards popular venues is the probability that a more popular venue is ranked higher than a less popular one, when considering the top-n items recommended to a user.
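
To make Definition 1 concrete, the following is a minimal sketch of one way to estimate it for a single top-n list: the fraction of venue pairs in which the more popular venue is ranked higher. The function name, the popularity dictionary, the handling of ties, and the 0.5 fallback are assumptions made for illustration; the paper's exact formulation may differ.

```python
from itertools import combinations


def popularity_concordance(ranked_venues, popularity):
    """Fraction of venue pairs where the more popular venue is ranked higher.

    ranked_venues: top-n venue ids, best first.
    popularity: dict venue id -> number of training check-ins (assumed input).
    Pairs with equal popularity are skipped (an assumption of this sketch).
    """
    concordant = 0
    comparable = 0
    # combinations() keeps list order, so `higher` is always ranked above `lower`.
    for higher, lower in combinations(ranked_venues, 2):
        pop_higher = popularity.get(higher, 0)
        pop_lower = popularity.get(lower, 0)
        if pop_higher == pop_lower:
            continue
        comparable += 1
        if pop_higher > pop_lower:
            concordant += 1
    # With no comparable pairs, return 0.5 (no evidence either way; an assumption).
    return concordant / comparable if comparable else 0.5


popularity = {"station": 120, "cafe": 40, "museum": 15, "park": 15}
print(popularity_concordance(["station", "cafe", "park", "museum"], popularity))
print(popularity_concordance(["museum", "station", "cafe"], popularity))
```

A value close to 1 indicates a list ordered almost entirely by popularity, while a value close to 0.5 indicates no systematic preference for popular venues.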

Definition 2 (Category Popularity Polarization) The polarization of a recommendation model rec towards popular categories is the likelihood of recommending venues belonging to categories associated with the highest number of user interactions.
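
As an illustration of Definition 2, the sketch below computes the share of recommended venues that belong to the k categories with the most training check-ins. The function name, the input layout (a list of (user, venue) pairs and a venue-to-category dictionary), and the parameter k are hypothetical choices for this example, not the paper's formulation.

```python
from collections import Counter


def top_category_share(recommended, venue_category, train_checkins, k=1):
    """Share of recommended venues that fall in the k most checked-in categories.

    recommended: list of recommended venue ids.
    venue_category: dict venue id -> category label (assumed input).
    train_checkins: iterable of (user, venue) pairs from the training split.
    """
    # Category popularity is counted in check-ins, not in number of venues.
    category_counts = Counter(venue_category[venue] for _, venue in train_checkins)
    top_categories = {category for category, _ in category_counts.most_common(k)}
    if not recommended:
        return 0.0
    hits = sum(1 for venue in recommended if venue_category.get(venue) in top_categories)
    return hits / len(recommended)


venue_category = {
    "v1": "Travel & Transport",
    "v2": "Food",
    "v3": "Food",
    "v4": "Travel & Transport",
}
train_checkins = [("u1", "v1"), ("u2", "v1"), ("u3", "v1"), ("u1", "v2"), ("u2", "v3")]
print(top_category_share(["v1", "v2", "v4", "v3"], venue_category, train_checkins, k=1))
```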

Definition 3 (Venue Exposure Polarization) The polarization of a recommendation model rec in terms of exposure is the likelihood of the model to suggest a venue proportionally to the number of times the users will consider that venue in the future.
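
One possible way to operationalize Definition 3 is to compare how often each venue is recommended with how often it is actually visited in the test period. The sketch below uses the total variation distance between the two normalized distributions; this distance is chosen here purely for illustration and is not taken from the paper, and all names are hypothetical.

```python
from collections import Counter


def exposure_gap(recommendation_lists, future_checkins):
    """Total variation distance between recommended and future venue exposure.

    recommendation_lists: list of top-n lists, one per user.
    future_checkins: list of venue ids visited in the test period.
    Returns 0.0 when exposure is exactly proportional to future visits,
    and 1.0 when the two sets of venues are disjoint.
    """
    rec_counts = Counter(venue for recs in recommendation_lists for venue in recs)
    future_counts = Counter(future_checkins)
    rec_total = sum(rec_counts.values())
    future_total = sum(future_counts.values())
    if rec_total == 0 or future_total == 0:
        raise ValueError("both inputs must contain at least one venue")
    venues = set(rec_counts) | set(future_counts)
    # Counter returns 0 for venues missing on one side.
    return 0.5 * sum(
        abs(rec_counts[v] / rec_total - future_counts[v] / future_total)
        for v in venues
    )


print(exposure_gap([["a", "b"], ["a", "c"]], ["a", "b", "b", "c"]))
```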

Definition 4 (Geographical Distance Polarization) The polarization of a recommendation model rec towards geographical distance is the likelihood of the model to suggest a venue that is close to/far from the current position of the user.
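
For Definition 4, a simple proxy is the average distance between the user's position and the recommended venues. The sketch below uses the haversine formula for great-circle distance; how the "current position" is obtained (for example, from the last check-in) is left open and is an assumption of this example, as are the function names and the coordinate dictionary.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def mean_distance_km(user_position, recommended, venue_coords):
    """Average distance from the user's position to the recommended venues.

    user_position: (lat, lon) tuple; how it is chosen is an assumption.
    venue_coords: dict venue id -> (lat, lon).
    """
    if not recommended:
        raise ValueError("recommended list is empty")
    distances = [
        haversine_km(user_position[0], user_position[1], *venue_coords[venue])
        for venue in recommended
    ]
    return sum(distances) / len(distances)


venue_coords = {"north": (1.0, 0.0), "east": (0.0, 1.0)}
print(mean_distance_km((0.0, 0.0), ["north", "east"], venue_coords))
```

Low averages suggest a model that favors nearby venues, and high averages suggest one that favors distant venues.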

Polarization assessment

We performed experiments on the Foursquare global check-in dataset. Specifically, we selected the check-ins from the cities of Tokyo, New York, and London from this dataset.

To analyze and characterize the biases that may exist in the Foursquare dataset, we considered several state-of-the-art algorithms, including Non-Personalized (Rnd, Pop), Collaborative-filtering (UB, IB, HKV), Temporal/Sequential (TD, MC, FPMC, Fossil), Geographical (KDE), and Point-of-Interest (FMFMGM, GeoBPR, IRenMF, PGN) models.

Impact on accuracy metrics. The Pop recommender is one of the most accurate models in all cities, even though the TD model in Tokyo and the GeoBPR and PGN models in London obtain slightly better values than Pop. This could be due to several causes, including (i) the high sparsity found in the datasets, (ii) the test set that only contains new interactions (and hence popular venues are safe recommendations), and (iii) the temporal evaluation methodology, as there could be users in the test set that do not appear in the training subset (for whom, again, popular venues can be very useful recommendations).

With respect to the POI algorithms, we observe that, in terms of accuracy, their performance is very similar to that of other classical approaches, such as UB or BPR. This may be due to the large number of parameters and hyper-parameters these models have, which sometimes makes it difficult to find a configuration that performs well.

Measuring recommendation polarization. Most of the recommenders suffer from a strong popularity bias, which shows how difficult it is to find models that perform well on all metrics. Among all the evaluated recommenders, we therefore consider IB and PGN to be of particular interest: even though they are not as accurate as Pop, they obtain competitive results on other metrics such as novelty, diversity, and item exposure, as a direct consequence of suffering less from the popularity bias.

We observe that the popularity of a category is not always associated with the number of POIs that share that category. More specifically, category 7 (Travel & Transport) concentrates the largest number of check-ins in the city of Tokyo, while category 3 (Food) is the second most popular category. However, since the Food category covers a large number of different venues, recommenders with a strong item popularity bias (such as Pop) recommend almost no POIs from it, because its individual venues are not globally popular.

Polarization mitigation

We aim to combine several algorithms to create models that obtain decent levels of accuracy while overcoming the analyzed polarization measurements: popularity, exposure, and geographical distance. In order to do so, we propose two different but complementary approaches to mitigate the aforementioned biases:

  • We create hybrid recommenders by combining several models; we apply simple schemes that assign a different weight to each of the combined recommendation algorithms;
  • We use a re-ranker approach in the xQuAD framework, where we combine the results of two recommenders. Specifically, the second one is used to re-rank the results from the first one.
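
The two ideas above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the min-max normalization, the function names, and the toy scores are hypothetical, and `rerank_top_n` is a deliberately simplified stand-in that reorders the first recommender's candidates by the second recommender's scores rather than implementing the xQuAD objective.

```python
def min_max_normalize(scores):
    """Rescale scores to [0, 1] so that different recommenders are comparable."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {venue: 1.0 for venue in scores}
    return {venue: (score - low) / (high - low) for venue, score in scores.items()}


def weighted_hybrid(scores_a, scores_b, weight_a, n=10):
    """Top-n venues by weight_a * A + (1 - weight_a) * B on normalized scores."""
    norm_a = min_max_normalize(scores_a)
    norm_b = min_max_normalize(scores_b)
    combined = {
        venue: weight_a * norm_a.get(venue, 0.0) + (1 - weight_a) * norm_b.get(venue, 0.0)
        for venue in set(norm_a) | set(norm_b)
    }
    # Sort by descending score; break ties by venue id for determinism.
    return sorted(combined, key=lambda v: (-combined[v], v))[:n]


def rerank_top_n(scores_first, scores_second, n=10):
    """Take the first recommender's top-n and reorder it with the second's scores.

    Simplified stand-in for a re-ranker; venues unknown to the second
    recommender are placed last.
    """
    candidates = sorted(scores_first, key=lambda v: (-scores_first[v], v))[:n]
    return sorted(candidates, key=lambda v: (-scores_second.get(v, float("-inf")), v))


pop_scores = {"a": 100, "b": 50, "c": 10}            # toy popularity scores
ib_scores = {"a": 0.1, "b": 0.2, "c": 0.9, "d": 0.5}  # toy item-based scores
print(weighted_hybrid(ib_scores, pop_scores, weight_a=0.8, n=3))
print(rerank_top_n(pop_scores, ib_scores, n=2))
```

Varying `weight_a` moves the hybrid between the two base recommenders, which is the kind of trade-off the weights reported in the results refer to.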

When analyzing these results, we notice some interesting outcomes:

  • In New York, we observe that the best recommender in terms of accuracy is still the pure Pop model. However, when using the hybrid with IB at a weight of 0.5, we reduce the popularity bias while improving the exposure values by almost half.
  • Regarding the geographical polarization, we observe that in the case of New York we are able to reduce this bias when using a weight of 0.8 with the IB approach in the hybrid model or when using the re-ranker. However, the reduction of the bias in these metrics is still far from the values reported in the Skyline.
  • The results for the Tokyo dataset, shown in Table 6, confirm a very interesting case where the most accurate combined configuration outperforms the best individual recommender. Here, the best-performing configuration is PGN with the IB re-ranker. Although this is a promising result, we observe that in this case the re-ranker obtains lower values in terms of novelty and diversity while suffering from a larger popularity bias (but a lower category bias).
  • Overall, these examples confirm that it is possible to find configurations that improve on the original recommenders, either with higher accuracy at similar polarization values, or with reduced polarization at comparable accuracy.