Personalized Ranking for Recommender Systems

Personalized Ranking

Most real-world recommender data is implicit: clicks, watches, purchases. There are no explicit ratings, and the unobserved (user, item) pairs are a mix of “didn’t like it” and “haven’t seen it yet”. Fitting a 0/1 target with MSE is therefore a poor fit, because it labels every unobserved pair as a known zero.

Better framing: personalized ranking — given an observed positive (user, i), the model should rank i above sampled unobserved items. Treating every unobserved pair as a literal negative target is usually misaligned with ranking because exposure is missing-not-at-random.

Two pairwise losses for this:

BPR (Bayesian Personalized Ranking, Rendle et al., 2009): log-sigmoid of the score margin, -\log \sigma(\hat r_{ui} - \hat r_{uj}) for sampled negatives j.
Hinge: max-margin variant, \max(0, m - (\hat r_{ui} - \hat r_{uj})) with margin m.

Both turn implicit feedback into pairwise comparisons; the model learns to put positives above negatives.

Training triples

For each user u, let I_u^+ be observed positives (clicked, watched, bought) and sample negatives j \notin I_u^+ from the item catalog. Training examples are triples:

D = \{(u,i,j): i\in I_u^+,\, j\notin I_u^+\}.

The model never needs an absolute rating target. It only needs the score gap

\Delta_{uij} = \hat r_{ui} - \hat r_{uj}.

Large positive gaps mean the positive item outranks the sampled negative. The sampled negative is a training contrast, not proof that the user would dislike the item.
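As a rough illustration of how the triples in D might be built, the sketch below draws negatives uniformly at random by rejection sampling. The function name `sample_triples`, the dictionary layout of `positives`, and the assumption that item ids run from 0 to num_items - 1 are all hypothetical choices for this example, and uniform sampling is only one of several possible samplers.

```python
import random


def sample_triples(positives, num_items, negatives_per_positive=1, seed=0):
    """Build (u, i, j) triples with j drawn uniformly from items outside I_u^+.

    positives: dict mapping user id -> set of observed item ids (I_u^+).
    num_items: catalog size; item ids are assumed to be 0..num_items-1.
    """
    rng = random.Random(seed)
    triples = []
    for u, pos_items in positives.items():
        if len(pos_items) >= num_items:
            continue  # this user has no unobserved item left to sample
        for i in sorted(pos_items):
            for _ in range(negatives_per_positive):
                j = rng.randrange(num_items)
                while j in pos_items:  # rejection sampling: redraw observed items
                    j = rng.randrange(num_items)
                triples.append((u, i, j))
    return triples


# Toy data: user 0 interacted with items 1 and 3, user 1 with item 0.
positives = {0: {1, 3}, 1: {0}}
triples = sample_triples(positives, num_items=5, negatives_per_positive=2)
```

Each positive yields `negatives_per_positive` triples, so the toy data above produces six. Rejection sampling is cheap as long as each user has interacted with only a small fraction of the catalog.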

BPR loss

For each positive (u, i), sample negatives j; the loss is the negative log-sigmoid of the score margin:

from mxnet import gluon, np, npx
npx.set_np()

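class BPRLoss(gluon.loss.Loss):
    def __init__(self, weight=None, batch_axis=0, **kwargs):
        # Forward the constructor arguments instead of hard-coding them.
        super(BPRLoss, self).__init__(weight=weight, batch_axis=batch_axis,
                                      **kwargs)

    def forward(self, positive, negative):
        # Score margin between each positive and its sampled negative.
        distances = positive - negative
        # Negative log-sigmoid of the margin, summed over axis 0.
        loss = - np.sum(np.log(npx.sigmoid(distances)), 0, keepdims=True)
        return loss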
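The class above computes log(sigmoid(...)) directly, which can underflow when a negative outscores its positive by a wide margin. A possible way around this, sketched here in plain Python rather than MXNet, uses the identity -\log \sigma(\Delta) = \max(-\Delta, 0) + \log(1 + e^{-|\Delta|}). The function name `bpr_loss` and the list-based inputs are assumptions made for illustration only.

```python
import math


def bpr_loss(pos_scores, neg_scores):
    """Sum of -log(sigmoid(pos - neg)) over pairs, in a numerically stable form."""
    total = 0.0
    for p, n in zip(pos_scores, neg_scores):
        delta = p - n
        # -log(sigmoid(delta)) = softplus(-delta)
        #                      = max(-delta, 0) + log1p(exp(-|delta|))
        total += max(-delta, 0.0) + math.log1p(math.exp(-abs(delta)))
    return total


# A zero margin costs log(2); a badly violated margin costs roughly |delta|.
loss_tie = bpr_loss([0.0], [0.0])
loss_bad = bpr_loss([0.0], [1000.0])
```

With a margin of -1000, sigmoid underflows to zero in double precision and the naive logarithm fails, while the rewritten form stays finite.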

Hinge loss

Hard-margin alternative — equivalent to a max-margin classifier over score differences:

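class HingeLossbRec(gluon.loss.Loss):
    def __init__(self, weight=None, batch_axis=0, **kwargs):
        # Forward the constructor arguments instead of hard-coding them.
        super(HingeLossbRec, self).__init__(weight=weight,
                                            batch_axis=batch_axis, **kwargs)

    def forward(self, positive, negative, margin=1):
        # Score margin between each positive and its sampled negative.
        distances = positive - negative
        # Pairs whose margin is at least `margin` contribute zero.
        loss = np.sum(np.maximum(- distances + margin, 0))
        return loss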
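To make the margin behavior concrete, here is a small worked example in plain Python. The helper name `hinge_loss` and the toy scores are hypothetical; the formula is the same \max(0, m - \Delta) used in the class above.

```python
def hinge_loss(pos_scores, neg_scores, margin=1.0):
    """Sum of max(0, margin - (pos - neg)) over pairs."""
    return sum(max(0.0, margin - (p - n))
               for p, n in zip(pos_scores, neg_scores))


# Three pairs with score gaps 1.5, 0.5 and -0.5 (margin m = 1).
pos = [2.0, 0.5, 0.25]
neg = [0.5, 0.0, 0.75]
per_pair = [max(0.0, 1.0 - (p - n)) for p, n in zip(pos, neg)]
total = hinge_loss(pos, neg)
```

The first pair already clears the margin and contributes nothing; the second is ranked correctly but by too little, and the third is ranked wrongly, so both are penalized.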

BPR vs hinge

Both losses reward positive margins, but their gradients behave differently:

\ell_\textrm{BPR}(\Delta) = -\log \sigma(\Delta), \qquad \ell_\textrm{hinge}(\Delta) = \max(0, m-\Delta).

BPR keeps a smooth, nonzero gradient for every sampled pair.
Hinge stops updating once the margin is satisfied.
The most important implementation choice is often the negative sampler, not the algebraic form of the loss.
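The difference in gradient behavior can be checked numerically. The sketch below evaluates the derivative of each loss with respect to the score gap \Delta: for BPR this is -\sigma(-\Delta), and for hinge a subgradient is -1 when \Delta < m and 0 otherwise. The function names are hypothetical, and the choice of subgradient at \Delta = m is a convention assumed here.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bpr_grad(delta):
    # d/d(delta) of -log(sigmoid(delta)) is -(1 - sigmoid(delta)) = -sigmoid(-delta).
    return -sigmoid(-delta)


def hinge_grad(delta, margin=1.0):
    # Subgradient of max(0, margin - delta); taken as 0 at delta == margin.
    return -1.0 if delta < margin else 0.0


for delta in (-2.0, 0.0, 0.5, 1.0, 3.0):
    print(f"delta={delta:+.1f}  bpr={bpr_grad(delta):+.4f}  hinge={hinge_grad(delta):+.1f}")
```

The BPR gradient shrinks smoothly as the gap grows but stays negative, whereas the hinge gradient is constant until the margin is met and then drops to zero.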

Recap

Personalized ranking turns implicit feedback into a pairwise comparison task.
BPR: log-sigmoid of the (positive - negative) score margin. Soft, differentiable, the most-used choice.
Hinge: max-margin loss that ignores pairs already separated by the margin; sometimes better with very imbalanced data.
Negative sampling is what makes either loss tractable on large item catalogs.