
Reputation (in)dependence in ranking systems: demographics influence over output disparities

Your reputation on the Web depends not only on your behavior but also on your sensitive attributes. Concretely, belonging to a minority demographic group affects your reputation and how your preferences are valued in online ranking systems.

In a recent SIGIR 2020 paper with Guilherme Ramos, we considered reputation-based ranking systems, a class of algorithms that rank items based on the reputation of the users. Reputation scores are usually assigned based on users’ behavior and how they interact with items. In this study, we observed how demographic attributes influence reputation scores.

Concretely, we split users into demographic groups based on age and gender, and defined a disparate reputation score, which measures the difference between the average reputation scores assigned to two user groups. On the widely used MovieLens-1M dataset, the figures on the right show that smaller demographic groups receive, on average, lower reputation scores.
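To make the disparate reputation score concrete, here is a minimal Python sketch. It assumes the score is the signed difference between the two group means; the exact formulation is in the paper, and the function name, toy reputations, and group labels below are hypothetical and not taken from MovieLens-1M.

```python
from statistics import mean


def disparate_reputation(reputations, groups, group_a, group_b):
    """Mean reputation of group_a minus mean reputation of group_b.

    Assumption: a signed difference of means; the paper may normalize
    or take an absolute value instead.
    """
    rep_a = [r for r, g in zip(reputations, groups) if g == group_a]
    rep_b = [r for r, g in zip(reputations, groups) if g == group_b]
    if not rep_a or not rep_b:
        raise ValueError("both groups need at least one user")
    return mean(rep_a) - mean(rep_b)


# Toy data (hypothetical): three users in the larger group, two in the smaller.
reputations = [0.9, 0.8, 0.7, 0.6, 0.5]
genders = ["M", "M", "M", "F", "F"]
gap = disparate_reputation(reputations, genders, "M", "F")
```

A positive value means the first group receives, on average, a higher reputation than the second; a value near 0 indicates no disparity in the means.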

Introducing reputation independence

We want to prevent users’ sensitive attributes from systematically affecting the ranking system. To do so, we design a strategy that mitigates the bias associated with a given sensitive attribute in the reputations of each group of users sharing a value of that attribute. As a result, the computation of reputation becomes independent of the sensitive attribute (reputation independence).

Given a reputation-based ranking system that updates each item’s ranking as an average of its ratings weighted by the users’ reputations, we propose to harmonize the reputations inside each group defined by a specific attribute, so that all groups have a similar reputation distribution. Concretely, with our approach (whose technical details can be found in the paper), we make the reputation distributions of all classes of an attribute follow a common probability distribution, ensuring that the reputations of each class are “statistically indistinguishable.”
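The two ingredients above can be illustrated with a short Python sketch. The first function computes a reputation-weighted average rating per item, as described. The second is only a stand-in for the harmonization step: it uses simple quantile mapping onto the pooled reputation distribution, which is an assumption made for illustration and not necessarily the procedure from the paper. All names and the toy data are hypothetical.

```python
from collections import defaultdict


def item_scores(ratings, reputation):
    """Reputation-weighted mean rating per item.

    ratings: iterable of (user, item, rating) triples.
    reputation: dict mapping user -> non-negative reputation weight.
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for user, item, rating in ratings:
        w = reputation[user]
        num[item] += w * rating
        den[item] += w
    return {item: num[item] / den[item] for item in num if den[item] > 0}


def harmonize(reputation, group_of):
    """Map every group's reputations onto the pooled distribution.

    Illustrative assumption: a user at quantile q within their own group
    receives the value at quantile q of the pooled reputations, so the
    within-group ordering is preserved while groups share one distribution.
    """
    pooled = sorted(reputation.values())
    n = len(pooled)
    members = defaultdict(list)
    for user in reputation:
        members[group_of[user]].append(user)
    harmonized = {}
    for users in members.values():
        users.sort(key=lambda u: reputation[u])
        m = len(users)
        for i, user in enumerate(users):
            q = (i + 0.5) / m  # mid-rank quantile, strictly inside (0, 1)
            harmonized[user] = pooled[min(n - 1, int(q * n))]
    return harmonized


# Toy data (hypothetical): a majority group of four and a minority group of two.
reputation = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.6, "e": 0.3, "f": 0.2}
group_of = {"a": "maj", "b": "maj", "c": "maj", "d": "maj", "e": "min", "f": "min"}
ratings = [("a", "x", 5), ("e", "x", 1)]

before = item_scores(ratings, reputation)
after = item_scores(ratings, harmonize(reputation, group_of))
```

In this toy example, the minority user’s rating carries more weight in the item score after harmonization, because their reputation is no longer systematically lower than that of the majority group.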

The figures on the left show the results of our mitigation approach. Notice that the average reputation scores no longer differ across groups, showing that our method can introduce reputation independence. Mann-Whitney tests (results not reported in this blog post) confirm this: they do not reject the null hypothesis that the median difference in reputation scores between groups is 0.
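For readers who want to run this kind of check on their own reputation scores, the following is a minimal pure-Python sketch of a two-sided Mann-Whitney U test. It relies on the normal approximation with a tie correction and no continuity correction, so it is only reasonable for moderately large groups; an established statistics library is preferable in practice. The function name and the significance level mentioned afterwards are assumptions, not details from the paper.

```python
from math import sqrt
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Returns (U statistic of sample x, approximate p-value).
    Ties receive average ranks and the variance is tie-corrected.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg_rank
        t = j - i
        tie_term += t**3 - t
        i = j
    rank_sum_x = sum(r for r, (_, label) in zip(ranks, combined) if label == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:  # all observations identical: no evidence of a difference
        return u_x, 1.0
    z = (u_x - mu) / sqrt(var)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_x, p_value


# Toy check (hypothetical data): two groups with identical reputations.
u_same, p_same = mann_whitney_u([0.2, 0.5, 0.8], [0.2, 0.5, 0.8])
```

A p-value above the chosen significance level (0.05 is a common choice) means the test does not reject the null hypothesis, which is the outcome reported above for the mitigated reputations.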

Take-home messages

Reputation-based ranking systems aim to rank items so that the preferences of the community as a whole are reflected in how items are sorted. Therefore, it is vital to compute effective formulations of user reputation to weigh individual preferences.

In this work, we introduce a measure of disparate reputation to analyze whether user reputation is affected by users’ sensitive attributes. To counter this effect, we propose a novel approach that ensures reputation independence from sensitive user attributes. Experiments on real data, considering different demographic attributes, showed that disparate reputation occurs and that our mitigation can introduce reputation independence from sensitive attributes.