Fair performance-based user recommendation in eCoaching systems

When sportspeople are ranked so that a coach can assist those most in need, users of different genders might be affected by disparate exposure, meaning that users in the minority group are systematically ranked in lower positions. Re-ranking can help mitigate these disparities without affecting recommendation quality.

In a study published in the User Modeling and User-Adapted Interaction journal (Springer) and conducted with Salvatore Carta, Walid Iguider, Fabrizio Mulas, and Paolo Pilloni, we consider an eCoaching platform for runners. Our goal is to provide a coach with a list of users ranked according to the support they need. Moreover, we want to guarantee fair exposure in the ranking, so that users of different groups have equal opportunities to be supported. We provide fairness measures that allow us to assess the exposure of users of different groups in the ranking, and we propose a re-ranking algorithm to guarantee fair exposure.

After the athlete chooses a coach and specifies their objectives and current physical skills, the coach receives the athlete’s data, creates a tailored workout plan, and sends it to the athlete’s app. (See points 1 and 2 in the figure.)

When the athlete receives the workout plan, the virtual personal trainer functionality of the mobile app guides them to complete the workout correctly, and the mobile app records training data. (See points 3 and 4 in the figure.)

At the end of the workout, the coach receives training statistics, remotely monitors the athlete’s performance, modifies the workout if needed, and motivates the athlete through the internal messaging system. (See point 5 in the figure.)

Fair user recommendation

The user recommendation process is divided into two main steps:

Performance-based ranking: we rank the users based on the performance in the last workout, contextualized to their recent behavior.
Fair re-ranking: we assess how fair the ranking algorithm is in terms of user exposure and provide a re-ranking algorithm for the cases in which users of a given gender are affected by disparate exposure.
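
To make the idea of disparate exposure more concrete, here is a minimal sketch of the second step. It is not the formulation or the algorithm from our paper: the logarithmic position discount, the greedy selection rule, and all names (position_weight, exposure_share, fair_rerank, group_of) are assumptions chosen for illustration. The sketch measures the share of exposure each group receives and then rebuilds the list so that each group's exposure tracks its share of the user population, while keeping the original order within each group.

```python
import math
from collections import defaultdict, deque


def position_weight(rank):
    """Assumed exposure of a 1-based rank: logarithmic discount, as in DCG."""
    return 1.0 / math.log2(rank + 1)


def exposure_share(ranking, group_of):
    """Fraction of the total exposure that each group receives in a ranking."""
    totals = defaultdict(float)
    for rank, user in enumerate(ranking, start=1):
        totals[group_of[user]] += position_weight(rank)
    total = sum(totals.values())
    return {group: value / total for group, value in totals.items()}


def fair_rerank(ranking, group_of):
    """Greedy re-ranking (illustrative only, not the paper's algorithm).

    At each position, take the next-best user from the group whose
    accumulated exposure is furthest below its population-share target.
    The original order inside each group is preserved.
    """
    queues = defaultdict(deque)
    for user in ranking:
        queues[group_of[user]].append(user)

    # Target: a group's exposure share should match its share of users.
    target = {group: len(q) / len(ranking) for group, q in queues.items()}
    exposure = {group: 0.0 for group in queues}

    reranked = []
    cumulative = 0.0
    for rank in range(1, len(ranking) + 1):
        weight = position_weight(rank)
        cumulative += weight
        candidates = [group for group, q in queues.items() if q]
        # Deficit: exposure the group should have by now minus what it has.
        best = max(candidates,
                   key=lambda g: target[g] * cumulative - exposure[g])
        reranked.append(queues[best].popleft())
        exposure[best] += weight
    return reranked


# Hypothetical example: four users of group "M" ranked above two of group "F".
ranking = ["m1", "m2", "m3", "m4", "f1", "f2"]
group_of = {"m1": "M", "m2": "M", "m3": "M", "m4": "M", "f1": "F", "f2": "F"}

print(exposure_share(ranking, group_of))
reranked = fair_rerank(ranking, group_of)
print(reranked)
print(exposure_share(reranked, group_of))
```

In this toy example, group "F" makes up a third of the users but sits at the bottom of the list, so its exposure share is well below a third. After re-ranking, its share moves closer to that target. A real system would also need to check how much the re-ranking changes the quality of the recommendations, which this sketch does not do.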

You can refer to our paper for the details of our approach and for our formulations of disparate exposure.

Experimental results

While our paper contains the detailed results, here we report a summary of the main outcomes:

Classifiers comparison. The ratings that the coaches use to assess workout quality are on an ordinal scale, so, conceptually, an ordinal classifier should suit this task better. However, the multi-class classifiers outperform the ordinal ones. Hence, we conjecture that coaches might have a more schematic way of evaluating workouts, which is better captured by multi-class approaches.
Ablation study. Regardless of the users’ characteristics and how a workout is composed, workout quality depends above all on how closely the runners stick to their workout objectives and how much effort they put in during workouts. Besides adherence to the goals set by the coach, the period of the year in which the workouts are planned can also influence the runners’ performance; we conjecture that this reflects good weather positively influencing workout quality.
Ranking under fairness constraints. The gender that is discriminated against when assessing fairness coincides with the gender of the minority group. This aligns our work with what is usually observed in the fairness literature, where the demographic group that is the minority in the training data is the one discriminated against.