Algorithmic bias · Education · Explainability · Recommender systems

Can Path-Based Explainable Recommendation Methods based on Knowledge Graphs Generalize for Personalized Education?

In personalized education platforms, explainable recommendation is often pursued by transferring knowledge-graph path reasoning methods from other domains. Yet differences in educational data and evaluation practices can make these transfers misaligned, leaving it unclear which methods remain reliable and why. As this study demonstrates, knowledge-graph reasoning can enable transparent, structure-aware personalization in this setting by producing recommendation paths that double as explanations under a consistent representation.

Course marketplaces and online learning platforms expose learners to large catalogs with heterogeneous content, prerequisites, and learning goals. In this setting, recommendation is not only about ranking relevant courses; it is also about supporting decisions that learners may experience as high-stakes, effortful, and sequential. Here, explanations can make a suggestion actionable: a learner can judge whether a proposed course fits their background, whether it builds on prior learning, and whether it aligns with an intended trajectory.

Knowledge graphs are useful because they encode educational structure explicitly: courses relate to concepts, instructors, institutions, and prerequisite relations. When a recommender system can reason over these relations, it can potentially justify recommendations through concrete relational chains, rather than opaque similarity scores. The open question is whether the reasoning mechanisms that generate such chains in domains like movies or e-commerce actually transfer to education, where interaction patterns and decision rationales can differ substantially.

In a study conducted in cooperation with Neda Afreen, Giacomo Balloccu, Gianni Fenu, Francesca Maridina Malloci, Mirko Marras, and Andrea Giovanni Martis, and published in the Proceedings of ACM UMAP 2025, we present a unified investigation of how representative path-based explainable recommendation methods over knowledge graphs generalize to course recommendation.

Our approach

We study generalizability by treating education as the target domain and bringing multiple reasoning paradigms into a single, controlled comparison. Conceptually, the work is a benchmarking and analysis contribution: we align several educational datasets to a common knowledge-graph view, instantiate multiple families of path-based reasoning methods, and evaluate them under one protocol that captures ranking quality, beyond-accuracy properties, and explanation-relevant characteristics of the generated paths.

This framing lets us examine which paradigms are robust across datasets with different sparsity regimes, which ones trade accuracy for novelty or diversity, and which ones generate explanation paths that are plausible, varied, and grounded in learner history.

Methodology

A key methodological choice is to reduce avoidable sources of variation so that observed differences can be attributed to reasoning paradigms rather than to representation mismatches or evaluation artifacts.

First, we standardize the educational knowledge graph representation to control what “reasoning over a KG” means across datasets. The problem is that educational datasets differ in what they describe (providers, instructors, topics, resources), so the same reasoning algorithm may effectively receive different “worlds” to reason over. We introduce a shared ontology-driven abstraction, centered on learners, courses, and educationally meaningful entities and relations, so that reasoning methods operate over comparable structural primitives. This matters because path-based explanation quality is inseparable from what kinds of entities and relations can appear in a path.
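As a rough illustration of this alignment step, a shared schema can be enforced by filtering raw dataset triples to a fixed set of typed relations. The entity types, relation names, and the `Triple`/`align_to_schema` helpers below are hypothetical, chosen for illustration rather than taken from the paper's actual ontology:

```python
from dataclasses import dataclass

# Hypothetical shared schema: illustrative entity types and relations,
# not the paper's actual ontology.
RELATIONS = {
    "enrolled_in": ("learner", "course"),
    "teaches": ("instructor", "course"),
    "covers": ("course", "concept"),
    "offered_by": ("course", "institution"),
}

@dataclass(frozen=True)
class Triple:
    head: str       # typed entity id, e.g. "course:42"
    relation: str
    tail: str

def entity_type(entity_id: str) -> str:
    """Entity ids carry their type as a prefix, e.g. 'course:42' -> 'course'."""
    return entity_id.split(":", 1)[0]

def align_to_schema(raw_triples):
    """Keep only triples whose relation and endpoint types match the shared schema."""
    aligned = []
    for t in raw_triples:
        signature = RELATIONS.get(t.relation)
        if signature and (entity_type(t.head), entity_type(t.tail)) == signature:
            aligned.append(t)
    return aligned
```

The point of such a filter is that every dataset-specific graph is reduced to the same structural primitives before any reasoning method sees it, so paths produced on different datasets are built from comparable building blocks.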

Second, we impose a temporal view of learner interactions to respect educational sequentiality. Course enrollments and learning actions are naturally ordered, and explanations often appeal to what a learner did before. We therefore treat time as a conceptual constraint: model learning and evaluation are organized so that the system is assessed on predicting future interactions from past ones. This matters because path reasoning can otherwise exploit information that would not be available at recommendation time, producing explanations that look coherent offline but do not reflect realistic decision support.
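A minimal sketch of such a temporal protocol follows; the `temporal_split` helper and its data layout are assumptions for illustration, not the study's exact implementation:

```python
def temporal_split(interactions, holdout_ratio=0.2):
    """Per-learner temporal split: order interactions by timestamp and hold
    out the most recent fraction for testing, so the model is always
    evaluated on predicting the future from the past.

    `interactions` maps learner id -> list of (timestamp, course_id) pairs.
    """
    train, test = {}, {}
    for learner, events in interactions.items():
        ordered = sorted(events)  # tuples sort by timestamp first
        cut = max(1, int(len(ordered) * (1 - holdout_ratio)))
        train[learner] = ordered[:cut]
        test[learner] = ordered[cut:]
    return train, test
```

A random split would instead mix future enrollments into training, which is exactly the leakage that makes offline path explanations look more coherent than they would be at recommendation time.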

Third, we use meta-path abstractions to make “explanation types” comparable across paradigms. Path-based methods can generate many chains, but without a shared notion of what kinds of chains are meaningful, it becomes hard to compare explanation diversity or interpretability across models. We introduce a set of learner-to-course path templates that represent recurring explanatory rationales (for instance, paths that connect a recommended course to subjects, institutions, or other educational entities). This matters because it provides a common lens for analyzing whether a model’s explanations rely on a narrow rationale versus covering multiple educationally relevant rationales.
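One simple way to realize this comparison is to classify each concrete explanation path by the type signature of its entities. The templates and labels below are hypothetical examples of such learner-to-course rationales, not the paper's actual template set:

```python
# Hypothetical meta-path templates: each maps the entity-type signature
# of a learner-to-course path to a human-readable rationale.
META_PATHS = {
    ("learner", "course", "concept", "course"): "via shared subject",
    ("learner", "course", "instructor", "course"): "via same instructor",
    ("learner", "course", "institution", "course"): "via same institution",
}

def classify_path(path_entities):
    """Map a concrete path (list of typed entity ids like 'course:7')
    to its meta-path label, or 'other' if it matches no template."""
    signature = tuple(e.split(":", 1)[0] for e in path_entities)
    return META_PATHS.get(signature, "other")
```

Counting the distribution of labels across a model's generated paths then directly measures whether its explanations lean on a single rationale or cover several educationally relevant ones.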

Fourth, we evaluate along three conceptual axes (namely, utility, beyond-utility, and path-based explainability) to explore trade-offs. Accuracy alone cannot capture whether a system is expanding the catalog surface, promoting variety, or grounding explanations in recent learner activity. By jointly measuring these dimensions, we can identify cases where a method improves ranking but collapses novelty, or where it produces diverse explanations but weak retrieval performance. This matters for education because “what is best” depends on whether the platform prioritizes efficient matching, exploration, or transparent guidance.
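To make two of these axes concrete, here is a toy sketch of one metric each for utility and beyond-utility; the function names and the binary-relevance simplification are ours, not the study's exact protocol:

```python
import math

def ndcg_at_k(recommended, relevant, k=10):
    """Utility axis: binary-relevance NDCG@k for one learner's ranking."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(recommended[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal else 0.0

def catalog_coverage(all_recommendations, catalog):
    """Beyond-utility axis: fraction of the catalog surfaced across all
    learners' top-k lists."""
    exposed = set().union(*all_recommendations) if all_recommendations else set()
    return len(exposed & set(catalog)) / len(catalog)
```

Measuring such metrics jointly, rather than NDCG alone, is what reveals the trade-off cases described above, e.g. a method that ranks well while collapsing the exposed catalog.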

Findings and insights

Across datasets, we observe that generative path reasoning (especially when constrained to remain faithful to the knowledge graph) tends to be the most robust option for ranking utility and for exposing a wider portion of the catalog. Conceptually, this suggests that modeling paths as structured sequences can align well with educational graphs when the generated chains remain grounded in valid relations. At the same time, the study highlights a critical failure mode: an unconstrained generative approach can produce paths that look syntactically plausible but do not correspond to actual graph connections, undermining explanation faithfulness even when ranking performance may appear competitive. This draws a clear line between “explanations as fluent text” and “explanations as verifiable relational evidence.”
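The line between fluent and verifiable paths can be operationalized by checking every hop of a generated path against the actual graph. The path encoding and helper name below are assumptions for illustration:

```python
def path_is_faithful(path, kg_edges):
    """Return True iff every consecutive hop in an explanation path
    corresponds to a real edge in the knowledge graph (either direction).

    `path` alternates entities and relations: [e0, r0, e1, r1, e2, ...].
    `kg_edges` is a set of (head, relation, tail) triples.
    """
    for i in range(0, len(path) - 2, 2):
        head, rel, tail = path[i], path[i + 1], path[i + 2]
        if (head, rel, tail) not in kg_edges and (tail, rel, head) not in kg_edges:
            return False  # hallucinated hop: syntactically plausible, not in the graph
    return True
```

A check of this kind is what separates "explanations as fluent text" from "explanations as verifiable relational evidence": an unconstrained generator can emit paths that fail it while still scoring well on ranking metrics.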

Beyond-utility behavior is more nuanced and reveals a persistent tension. Traditional collaborative filtering baselines remain strong on dimensions associated with surprise and novelty, particularly under sparse data. The conceptual implication is that in educational settings with limited interaction density, simpler preference-based signals can still be effective at pushing learners beyond their immediate neighborhood, while path reasoning may become conservative, staying close to well-supported relational regions of the graph.

Within path-based explainability, different paradigms express different explanatory styles. Reinforcement-learning-based reasoning stands out in how strongly it links recommendations to recent learner actions and in how broadly it varies the entities used to justify suggestions. This points to a mechanism-level insight: exploration-oriented traversal procedures may naturally yield explanation paths that feel anchored in user history and support exploratory browsing. Neuro-symbolic reasoning, in contrast, tends to rely more on popular, high-connectivity entities, which can yield familiar explanations but may bias the explanatory narrative toward well-trodden parts of the knowledge graph.

Finally, the study suggests that sparsity is not just a performance nuisance but a regime-defining factor: as data becomes less sparse, the gap between paradigms narrows. This indicates that some “generalization failures” attributed to education may instead be regime mismatches, where a paradigm’s advantages require sufficient interaction evidence and graph connectivity to manifest.

Conclusions

This work contributes an evidence-based view of what transfers (and what breaks) when path-based explainable recommendation over knowledge graphs is applied to personalized education. By aligning multiple educational datasets to a common knowledge-graph representation and evaluating multiple reasoning paradigms under a shared protocol, we move the discussion from isolated claims toward a comparative understanding of trade-offs between ranking utility, catalog exposure, and explanation characteristics.

Several concrete research directions emerge. Improving educational knowledge graphs, both in completeness and in the alignment between relations and learning rationales, should directly affect the headroom of reasoning-based recommenders. Faithful generative reasoning appears promising, but it motivates stronger mechanisms for ensuring that generated explanation paths remain verifiable evidence rather than unconstrained sequences. Finally, offline explainability metrics are only a proxy for whether explanations help learners make better choices; integrating user-facing evaluations that test comprehension, trust, and decision quality would be a natural extension, especially in scenarios where prerequisites and learning trajectories are central to the recommendation rationale.