
Equality of Learning Opportunity via Individual Fairness in Personalized Recommendations

Formalizing the learning opportunities that online course recommendations should offer can lead to a definition of what fairness means for a platform. A post-processing approach that balances personalization and equality of recommended opportunities can yield recommendations that are both effective and fair.

In a study published in the International Journal of Artificial Intelligence in Education (Springer), conducted with Mirko Marras, Guilherme Ramos, and Gianni Fenu, we envision a scenario wherein the educational platform should guarantee that a set of learning principles is met for all learners, to a certain degree, when generating recommendations according to learners' interests. Under this scenario, we characterize the recommendations proposed to learners on a real-world online course platform as a function of seven principles derived from the knowledge and curriculum literature.

The results of our study motivated us to devise a novel post-processing approach that balances equality and personalization in recommendations. The core assumption is that equality can be enhanced by balancing desirable properties of course recommendations, and that this can be achieved by re-ranking the courses originally suggested (and optimized for personalization) by the recommender system, so that the lists recommended to learners meet desirable principles of course recommendations equally across learners.

Modeling recommended learning opportunity through principles

Capturing, formalizing, and operationalizing notions of equality can shape our understanding of the extent to which the educational offerings available to learners provide them with equal opportunities, and of how recommender systems influence the normal course of educational business. To this end, defining the variables to be equalized is a natural prerequisite. We consider the following seven principles, whose mathematical formulation is provided in the paper.

  • Definition 1 (Familiarity) Familiarity is defined as whether the learner is familiar with the recommended content, as measured by whether the relative frequency of the course categories in a recommended set is proportional to that in the courses the learner took (see the sketch after this list).
  • Definition 2 (Validity) Validity is defined as whether the course is likely to be up-to-date rather than obsolete, as measured by when its content was last updated. A course is assumed to be more valid if it has been updated recently.
  • Definition 3 (Learnability) Learnability is defined as whether the recommended courses present an opportunity that is coherent with the learner's ability, as measured by whether the set of courses varies in instructional level.
  • Definition 4 (Variety) Variety is defined as whether the recommendation takes into account that learners are different and learn in different ways based on their interests and ability, as measured by the degree to which the recommended courses present a mix of different asset types.
  • Definition 5 (Quality) Quality is defined as the perceived appreciation of the recommended resources by the learners, as measured by the ratings that the learners assign to resources after interacting with them.
  • Definition 6 (Manageability) Manageability is defined as whether the online classes are large or small, as measured by the number of learners enrolled in the recommended courses, with small classes considered more manageable.
  • Definition 7 (Affordability) Affordability is defined as the cost of accessing the recommended opportunities, as measured by the enrolment fees of the suggested courses, with less expensive courses having higher affordability value.
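
To make the principle scores concrete, here is a minimal sketch of how one of them, Familiarity, could be computed. It assumes category counts are turned into relative frequencies and scores are bounded in [0, 1]; the exact formulation is in the paper, and all names here are illustrative.

```python
from collections import Counter

def familiarity(recommended_categories, history_categories):
    """Hypothetical familiarity score in [0, 1]: 1 means the category
    distribution of the recommended courses matches that of the courses
    the learner already took."""
    rec, hist = Counter(recommended_categories), Counter(history_categories)
    rec_total, hist_total = sum(rec.values()), sum(hist.values())
    # Complement of the total variation distance between the two
    # relative-frequency distributions.
    distance = sum(abs(rec[c] / rec_total - hist[c] / hist_total)
                   for c in set(rec) | set(hist)) / 2.0
    return 1.0 - distance

# A learner who mostly took programming courses.
print(familiarity(["programming", "design"],
                  ["programming", "programming", "math"]))  # ≈ 0.5
```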

Equality of Recommended Learning Opportunity

To formalize the equality of recommended learning opportunities, we first need to define how well the list recommended to each learner meets the principles targeted by the educational platform. In this paper, we propose to operationalize the concept of consistency across principles as the similarity between (i) the degree to which each principle is met in the recommended list and (ii) the degree of importance the educational platform assigns to that principle. The higher the similarity, the higher the extent to which the principles are met. We operationalize this metric locally on each ranked list, so that it can be optimized on a pre-computed recommendation list through a post-processing function.

To assess the extent to which the principle goals targeted by the educational platform are met, we compare the vector of target degrees set by the platform with the vector of degrees achieved in the recommended list, measuring the distance between the two. We define the notion of Consistency between (i) the target principles and (ii) the extent to which the principles are achieved in recommendations as the complement of the Manhattan distance, a symmetric and bounded distance measure. The higher the distance, the lower the consistency score for the target principles.
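
A minimal sketch of this consistency measure, assuming both the platform's target vector and the achieved vector contain per-principle scores in [0, 1], and that the Manhattan distance is averaged over the principles to keep the result bounded; the notation is illustrative, not the paper's.

```python
def consistency(target, achieved):
    """Hypothetical consistency in [0, 1]: complement of the averaged
    Manhattan distance between the platform's target degree for each
    principle and the degree achieved in the recommended list."""
    distance = sum(abs(t - a) for t, a in zip(target, achieved)) / len(target)
    return 1.0 - distance

# Target degrees for the seven principles vs. degrees achieved in a list.
target = [0.8, 0.9, 0.7, 0.6, 0.9, 0.5, 0.7]
achieved = [0.7, 0.8, 0.6, 0.5, 0.8, 0.4, 0.6]
print(consistency(target, achieved))  # ≈ 0.9
```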

Exploratory analysis

To illustrate the trade-off between learners' interests and the considered principles, and to further emphasize the value of our analytical modeling, we characterize the learning opportunities proposed by ten algorithms to learners of a real-world educational dataset as a function of the proposed principles. To the best of our knowledge, COCO is the largest educational dataset with all the attributes required to model the proposed principles and with enough data to assess performance in a statistically meaningful way.

We considered ten methods and investigated the recommendations they generated. Two of them are baseline recommenders, and the other eight are state-of-the-art algorithms. These algorithms are:

  • Non-personalized: Random and TopPopular.
  • Neighbor-based: UserKNN and ItemKNN.
  • Matrix factorization-based: GMF and NeuMF.
  • Graph-based: P3-Alpha and RP3-Beta.
  • Content-based: ItemKNN-CB.
  • Hybrid: CoupledCF.

Real-world observations

We characterize how the proposed principles were met in the lists of courses suggested by the algorithms considered. Here’s a summary of the main outcomes:

  • Recommenders that embed content metadata ensure higher equality across learners. When the recommender uses only user-item interactions, equality is reduced. This holds regardless of the algorithm's subfamily.
  • Recommenders with high consistency lead to higher equality of recommended learning opportunities. This property is stronger for neural collaborative, content-based, and hybrid recommenders.
  • Quality, validity, and manageability are guaranteed to a high extent by the different recommenders, regardless of the family. Familiarity, affordability, learnability, and variety show low absolute values and substantial deviations across algorithms, independently of the algorithm's subfamily.
  • Familiarity, learnability, and affordability are the principles that most influence the overall consistency. This effect is stronger for content-based and hybrid recommenders.
  • Learners who interacted with courses aligned with the principles are likely to receive recommendations that meet those principles. Learners who are similar in terms of the consistency of the courses they took are likely to receive similar treatment in terms of future consistency.

Optimizing for equality of learning opportunities

Given the observations made so far, we conjecture that re-ranking each list of recommendations to maximize the considered principles will lead to higher consistency and, consequently, to higher equality.

For each learner, our goal is to determine an optimal set of k courses to recommend, so that the principles pursued by the platform are met while accuracy is preserved (i.e., the extent to which the recommended items are among those included in the test set for that learner, meaning that the recommender system predicts the learner's future interests well). To this end, we capitalize on a maximum marginal relevance approach.

For each position in the ranking and for each candidate course, we compute a weighted sum of (i) the relevance of that course for the learner and (ii) the consistency that the learner's recommended list would achieve if that course were included.
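
A greedy sketch of this re-ranking step, assuming relevance scores in [0, 1] and a consistency function like the one sketched earlier; lambda_ plays the role of the trade-off λ discussed in the evaluation below. This illustrates the maximum-marginal-relevance idea rather than reproducing the paper's exact procedure.

```python
def rerank(candidates, relevance, consistency_if_added, k, lambda_=0.5):
    """Greedily build a top-k list. At each position, pick the course that
    maximizes a weighted sum of (i) its relevance for the learner and
    (ii) the consistency the list would reach if the course were added."""
    selected, remaining = [], set(candidates)
    while len(selected) < k and remaining:
        best = max(remaining,
                   key=lambda c: lambda_ * relevance[c]
                       + (1 - lambda_) * consistency_if_added(selected, c))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy usage: a consistency stub that favors free courses, standing in for
# the full principle-based measure (here only affordability matters).
rel = {"c1": 0.9, "c2": 0.8, "c3": 0.4}
free = {"c1": False, "c2": True, "c3": True}
gain = lambda selected, c: 1.0 if free[c] else 0.0
print(rerank(["c1", "c2", "c3"], rel, gain, k=2))  # ['c2', 'c3']
```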

Evaluation scenario and experimental results

We ran experiments to assess (i) the influence of our procedure and the weight-based strategy on accuracy, consistency, and equality, and (ii) the relation between the loss in accuracy and the gain in consistency and equality when applying our procedure. To this end, we envisioned three strategies for principle weight assignment (sketched in code after the list):

  • Glob assigns the same weight to all principles, for all users. This strategy does not account for the level of consistency that the list recommended to a given user has already achieved, and treats all principles equally.
  • User assigns to each principle a weight proportional to the consistency gap for that principle with respect to the platform's target, computed during the exploratory analysis. The consistency gap for a principle is obtained by averaging the individual consistency gaps across users.
  • Pers, given a user, assigns the weight for a principle by considering only that user's (individual) consistency gap for that principle. Thus, different weights are used across the user population.
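
A sketch of how the three strategies could assign per-principle weights, assuming a matrix of non-negative per-user consistency gaps (target degree minus achieved degree) computed during the exploratory analysis; function and variable names are illustrative, not the paper's.

```python
import numpy as np

def glob_weights(gaps):
    """Glob: identical weight for every principle and every user."""
    n_users, n_principles = gaps.shape
    return np.full((n_users, n_principles), 1.0 / n_principles)

def user_weights(gaps):
    """User: one shared weight vector, proportional to each principle's
    consistency gap averaged across all users."""
    avg = gaps.mean(axis=0)
    return np.tile(avg / avg.sum(), (gaps.shape[0], 1))

def pers_weights(gaps):
    """Pers: each user's weights are proportional to their own gaps."""
    return gaps / gaps.sum(axis=1, keepdims=True)

# gaps[u, p] = target minus achieved degree for user u, principle p
# (two users and three principles for brevity; assumed strictly positive).
gaps = np.array([[0.4, 0.1, 0.5],
                 [0.2, 0.2, 0.6]])
print(pers_weights(gaps))  # each row sums to 1
```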

Here are the main outcomes emerging from our evaluation:

  • The considered weight assignment strategies do not differ in terms of accuracy loss. However, User and Pers lead to higher consistency and equality values than Glob at the same λ (which expresses the trade-off between accuracy and learning opportunity consistency). This means that higher equality of recommended learning opportunities can be achieved by considering the consistency gaps experienced by the individual learner for each principle.
  • Controlling learning opportunities results in higher consistency for all principles, except for quality and manageability, which maintain stable consistency scores. Quality may decrease in some cases with collaborative filtering.

Conclusions

Based on the results, we can conclude that:

  1. Recommendation algorithms tend to produce ranked lists with low equality of recommended learning opportunities across learners, especially when the algorithm uses only user-item interactions as training data.
  2. Under our definition of the targeted principles, equality of quality, validity, and manageability is guaranteed by recommenders. Familiarity, affordability, learnability, and variety exhibit strong deviations across algorithms.
  3. Optimizing recommendations for consistency with respect to a set of principles leads to higher equality of recommended learning opportunities. This effect is especially remarkable when learner-specific weights are adopted.
  4. Controlling learning opportunities results in higher familiarity, variety, and affordability, while maintaining stable values for the other principles. However, quality may experience small losses after applying our procedure.
  5. The impact of our approach on accuracy and consistency depends on the density of the relevance score distribution produced by the original recommendation algorithm: the denser the distribution, the larger the drop in accuracy.