In interactive recommendation settings, optimizing primarily for estimated relevance often leads to recommendation lists that over-emphasize familiar and popular items, which can reduce discovery and undermine longer-term value. Individual-level multi-objective control can enable recommendation lists that better reflect heterogeneous user goals, by translating explicit preference signals into objective trade-offs that the recommender is designed to respect.
Recommender systems are increasingly evaluated not only by whether users select items in the moment, but also by whether the system supports broader notions of quality, such as helping users discover novel items, encounter a wider range of content, or explore beyond their established profile. These “beyond-accuracy” perspectives matter because they shape what users are exposed to and, over time, what their preferences can become.
A practical obstacle is that beyond-accuracy objectives are not universally desirable in the same way for everyone. Some users appreciate novelty and exploration, while others prefer stable, popular, or tightly profile-aligned suggestions. This makes a purely system-wide trade-off hard to justify, and it motivates interfaces that let users express how they want recommendations to be shaped.
The resulting question is not only whether multi-objective recommenders can produce diverse outcomes, but whether they can do so in a way that is interpretable, controllable, and consistent with what users explicitly ask for.
In a study conducted in cooperation with Patrik Dokoupil and Ladislav Peška and published in Information Processing and Management, we introduce an evaluation framework and empirical analysis for controllable, individual-level multi-objective recommender systems. The paper targets a specific gap: even when recommender systems expose “control knobs” for objectives like diversity or novelty, it is often unclear whether (i) users actually use these controls, (ii) the system meaningfully changes its outputs accordingly, and (iii) the resulting trade-offs improve anything beyond short-term selection.
Rather than treating multi-objective optimization as an offline ranking exercise, we position controllability and user-centric evaluation as first-class concerns, and we examine how different algorithmic choices affect both recommendation outcomes and the system’s faithfulness to user-stated preferences.
High-level solution overview
Our core idea is to treat user control as an explicit, measurable alignment target for multi-objective recommendation. Users specify their propensities toward multiple objectives (e.g., relevance, diversity, novelty, exploration), and the recommender is designed to produce lists whose objective “composition” reflects those propensities.
To study this, we compare a strong accuracy-oriented baseline with multiple variants of proportionality-preserving multi-objective recommenders. These variants differ along two conceptual dimensions: how they interpret the user’s propensity values (importance weights vs. target levels) and how they construct recommendation lists (simpler incremental strategies vs. more complex global optimization).
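To illustrate the simpler end of the construction spectrum, here is a minimal greedy sketch. It is our own illustration, not the paper’s algorithm: the `marginal_gain` callback and the weighted-sum aggregation are assumptions. Items are added one at a time, each chosen to maximize a propensity-weighted combination of the marginal gains it would bring to each objective.

```python
def greedy_incremental_list(candidates, propensities, marginal_gain, k=10):
    """Build a top-k list incrementally: at each step, pick the candidate whose
    propensity-weighted sum of per-objective marginal gains is largest.

    `marginal_gain(objective, item, selected)` is a hypothetical callback that
    returns how much adding `item` to the partial list `selected` would improve
    `objective` (relevance, diversity, novelty, ...)."""
    selected = []
    remaining = set(candidates)
    while remaining and len(selected) < k:
        best = max(
            remaining,
            key=lambda item: sum(
                weight * marginal_gain(objective, item, selected)
                for objective, weight in propensities.items()
            ),
        )
        selected.append(best)
        remaining.remove(best)
    return selected
```

The global (e.g., evolutionary) variants instead search over whole lists at once; part of what the study probes is whether that extra machinery actually buys better proportionality.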
Our approach
Making “user control” operational via propensities
The first mechanism is a representation of user intent that is simple enough to elicit but rich enough to drive multi-objective trade-offs. We use per-objective propensity values as the user-facing abstraction: users can increase or decrease the desired emphasis on relevance and several beyond-accuracy criteria, and they can revise these settings across iterations as they observe the effect on recommendations.
Framed this way, controllability becomes meaningful only if the signal is (i) understandable to users and (ii) actionable for algorithms. Propensities serve as the bridge between an interface-level interaction and an algorithm-level optimization goal.
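To make the abstraction concrete, the following minimal sketch (our own illustration, not the paper’s implementation; the objective names and the normalization step are assumptions) shows how per-objective propensities could be stored, revised between iterations, and converted into proportions an algorithm can act on.

```python
from dataclasses import dataclass, field

# Hypothetical objective names; the actual objective set is study-specific.
OBJECTIVES = ("relevance", "diversity", "novelty", "exploration")

@dataclass
class PropensityProfile:
    """User-facing propensities, one value in [0, 1] per objective."""
    values: dict = field(default_factory=lambda: {o: 0.5 for o in OBJECTIVES})

    def adjust(self, objective: str, delta: float) -> None:
        """Apply a user slider move, keeping the value inside [0, 1]."""
        v = self.values[objective] + delta
        self.values[objective] = min(1.0, max(0.0, v))

    def as_proportions(self) -> dict:
        """Normalize propensities into shares that sum to 1 (algorithm-facing view)."""
        total = sum(self.values.values()) or 1.0
        return {o: v / total for o, v in self.values.items()}

# Example: a user asks for noticeably more novelty and a bit less relevance.
profile = PropensityProfile()
profile.adjust("novelty", +0.3)
profile.adjust("relevance", -0.2)
print(profile.as_proportions())
```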
Aligning lists through results-level proportionality
The second mechanism is the notion of proportionality at the result level: instead of merely improving multiple objectives “as much as possible,” we aim for recommendation lists whose achieved objective values reflect the relative prominence the user requested.
Conceptually, this reframes multi-objective recommendation from “maximize a basket of metrics” to “shape the outcome according to a declared preference profile.” It also exposes a central challenge: different objectives are not naturally commensurate, so proportionality requires a normalization view of what it means for an objective to be “high” or “low” in a given domain and interaction context.
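One way to make this concrete is sketched below. It is an illustration under an assumed min-max normalization and an assumed mean-absolute-deviation measure, not the paper’s exact metric: achieved objective values are first normalized within context-specific bounds, converted into shares, and then compared against the proportions the user requested.

```python
import numpy as np

def objective_shares(achieved: dict, bounds: dict) -> dict:
    """Min-max normalize each achieved objective value within context-specific
    bounds, then convert the normalized values into shares that sum to 1."""
    normalized = {}
    for name, value in achieved.items():
        lo, hi = bounds[name]
        normalized[name] = (value - lo) / (hi - lo) if hi > lo else 0.0
    total = sum(normalized.values()) or 1.0
    return {name: v / total for name, v in normalized.items()}

def proportionality_error(requested: dict, achieved: dict, bounds: dict) -> float:
    """Mean absolute deviation between requested and achieved objective shares
    (lower means the list composition better mirrors the declared profile)."""
    shares = objective_shares(achieved, bounds)
    return float(np.mean([abs(requested[n] - shares[n]) for n in requested]))

# Hypothetical numbers: raw objective values live on different scales,
# so the bounds express what "low" and "high" mean in this domain/context.
requested = {"relevance": 0.4, "diversity": 0.3, "novelty": 0.3}
achieved  = {"relevance": 0.82, "diversity": 0.35, "novelty": 0.10}
bounds    = {"relevance": (0.0, 1.0), "diversity": (0.0, 0.7), "novelty": (0.0, 0.5)}
print(proportionality_error(requested, achieved, bounds))
```

The normalization bounds carry most of the modeling burden here: they encode what “high diversity” or “low novelty” means for a given catalog and interaction context, which is exactly the commensurability problem noted above.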
Two interpretations of propensity signals: importance vs. target levels
A third mechanism is explicitly modeling that users may interpret the same control differently. We study two semantics.
One semantics treats propensities as importance weights: higher values mean “push harder” on that objective relative to others. The other semantics treats propensities as desired levels: users may want “some diversity, but not too much,” which turns the objective from a maximization direction into a calibration-like target.
This distinction matters because it creates a real design fork. Systems optimized under one interpretation can be poorly behaved under the other, even if both claim to be “respecting user preferences.”
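A minimal sketch of this fork, under assumed scoring forms (a weighted sum versus a squared-deviation penalty; the paper’s actual objective functions may differ), shows how the same slider settings lead to different optimization targets:

```python
def importance_score(objective_values: dict, propensities: dict) -> float:
    """'Importance' semantics: propensities act as relative weights,
    so a higher propensity always pushes the objective higher."""
    return sum(propensities[o] * objective_values[o] for o in propensities)

def target_level_score(objective_values: dict, propensities: dict) -> float:
    """'Target level' semantics: propensities are desired levels, and the list
    is penalized for deviating from them in either direction (calibration-like)."""
    return -sum((objective_values[o] - propensities[o]) ** 2 for o in propensities)

# The same list and the same slider settings can rank very differently under the
# two interpretations: "diversity = 0.3" means "weight diversity modestly" in one
# reading and "aim for a moderately diverse list" in the other.
values = {"relevance": 0.9, "diversity": 0.8, "novelty": 0.1}
props  = {"relevance": 0.6, "diversity": 0.3, "novelty": 0.3}
print(importance_score(values, props), target_level_score(values, props))
```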
Expanding the objective space with “two-sided” preferences
A fourth mechanism is allowing users to express not only “more novelty/exploration/diversity,” but also preferences in the opposite direction, toward popularity, uniformity, and exploitation (i.e., staying closer to the known profile). This is important because user control is incomplete if it only permits intensifying a predefined notion of “good.”
By making inverse objectives explicit, we can observe whether users actually steer toward stability and familiarity when given the option, and how such choices interact with short-term engagement and longer-term indicators.
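One simple way to realize such two-sided controls (a sketch with assumed objective pairs and a signed slider; the paper’s exact inverse definitions may differ) is to let a single signed control choose between an objective and its inverse:

```python
# Hypothetical pairing of each "discovery" objective with its inverse direction.
# A signed control in [-1, 1] selects the direction and its strength: positive
# values push toward novelty/diversity/exploration, negative values toward
# popularity/uniformity/exploitation, and 0 leaves the axis neutral.
INVERSE = {
    "novelty": "popularity",
    "diversity": "uniformity",
    "exploration": "exploitation",
}

def signed_control_to_propensities(controls: dict) -> dict:
    """Map signed slider values onto non-negative propensities over the
    expanded objective set (an objective or its inverse, never both)."""
    propensities = {}
    for objective, signed in controls.items():
        if signed >= 0:
            propensities[objective] = signed
            propensities[INVERSE[objective]] = 0.0
        else:
            propensities[objective] = 0.0
            propensities[INVERSE[objective]] = -signed
    return propensities

# A user who wants familiar, popular items close to their established profile:
print(signed_control_to_propensities(
    {"novelty": -0.7, "diversity": 0.2, "exploration": -0.5}))
```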
Findings and insights
A consistent qualitative pattern is that a relevance-focused baseline tends to support stronger short-term consumption behavior, while controllable multi-objective variants tend to improve beyond-accuracy characteristics of the produced lists. In other words, multi-objective control exposes a trade-off between what drives immediate selection and what expands the space of exposure.
At the same time, the study suggests that beyond-accuracy improvements are not merely cosmetic. Multi-objective variants often produce recommendation trajectories that are more serendipitous and broaden user profiles in terms of covered content categories, indicating that “discovery-oriented” changes can manifest in user-chosen items rather than only in list-level statistics. The results support a view where discovery signals can contribute positively to overall experience, even when they do not maximize immediate selection.
A notable methodological insight is that increased algorithmic complexity does not automatically translate into better controllability. More computationally demanding evolutionary variants do not yield clearer benefits over simpler incremental strategies in how proportionally the system reflects user propensities. This is consequential for deployable controllable recommenders, where responsiveness and predictability are often more valuable than marginal optimization gains.
The paper also highlights that “beyond-accuracy” objectives can collapse into each other empirically. Diversity, novelty, and exploration are strongly intertwined in practice (especially when derived from similar behavioral signals), meaning that adjusting one dimension can unintentionally move others. This makes interface design and objective selection a core research problem: controllability depends not only on optimization, but on whether the objective dimensions are behaviorally separable enough to support meaningful user intent.
Finally, users do use the control interface, often experimenting early and then converging toward stable settings. Yet their final settings can differ substantially from propensities estimated from initial behavior, suggesting that preference elicitation from history and preference articulation through control can diverge in systematic ways. This is a key warning for systems that assume they can infer “what users want” about beyond-accuracy trade-offs without letting users correct the system.
Conclusions
This work contributes a user-centric lens on controllable multi-objective recommendation: we can evaluate not only what lists look like under multiple objectives, but whether the system’s behavior is consistent with user-declared trade-offs and whether these trade-offs influence longer-horizon indicators such as serendipity and profile broadening. The results emphasize that controllability is not a UI add-on; it is an alignment requirement that interacts with objective definitions, optimization choices, and the behavioral entanglement among beyond-accuracy signals.
Several research directions follow naturally. One is to better model the semantics of user controls, including when users treat sliders as “importance” versus “target levels,” and how interfaces can make that interpretation unambiguous. Another is to design objective sets whose dimensions are less behaviorally redundant, so that control becomes meaningfully expressive rather than indirectly correlated. A third direction is to move from short controlled sessions to longitudinal settings, where the promised benefits of discovery-oriented trade-offs can be validated as sustained engagement outcomes rather than proxies. Finally, richer control modalities (potentially including natural language preference statements) raise the prospect of making controllable multi-objective recommendation both less burdensome and more faithful to what users actually mean when they ask for “something different, but still for me.”