Personalized Ranking for Recommender Systems

Personalized Ranking

Most real-world recommender data is implicit: clicks, watches, purchases. There are no explicit ratings, and the unobserved (user, item) pairs are a mix of “didn’t like it” and “haven’t seen it yet”. Regressing with MSE onto a 0/1 target treats every unobserved pair as a confirmed dislike, which that mix does not justify.

A better framing is personalized ranking: given an observed positive (u, i), the model should rank i above sampled unobserved items. Treating every unobserved pair as a literal negative target is usually misaligned with ranking because exposure is missing-not-at-random.

Two pairwise losses for this:

  • BPR (Bayesian Personalized Ranking, Rendle et al. 2009): log-sigmoid of the score margin, -\log \sigma(\hat r_{ui} - \hat r_{uj}) for sampled negatives j.
  • Hinge: max-margin variant, \max(0, m - (\hat r_{ui} - \hat r_{uj})).

Both turn implicit feedback into pairwise comparisons; the model learns to put positives above negatives.

Training triples

For each user u, let I_u^+ be observed positives (clicked, watched, bought) and sample negatives j \notin I_u^+ from the item catalog. Training examples are triples:

D = \{(u,i,j): i\in I_u^+,\, j\notin I_u^+\}.

The model never needs an absolute rating target. It only needs the score gap

\Delta_{uij} = \hat r_{ui} - \hat r_{uj}.

Large positive gaps mean the positive item outranks the sampled negative. The sampled negative is a training contrast, not proof that the user would dislike the item.
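
The construction of D can be sketched in a few lines. This is a minimal illustration, not a production sampler: it assumes items are integer ids 0..num_items-1, draws negatives uniformly at random, and uses a made-up toy interaction log. The function name `sample_triples` and its parameters are hypothetical.

```python
import random


def sample_triples(positives, num_items, negatives_per_positive=1, seed=0):
    """Build (u, i, j) triples with uniformly sampled unobserved items j.

    positives: dict mapping user id -> set of observed item ids (I_u^+).
    num_items: catalog size; items are assumed to be ids 0..num_items-1.
    """
    rng = random.Random(seed)
    triples = []
    for u, pos_items in positives.items():
        if len(pos_items) >= num_items:
            continue  # user has interacted with everything; no negative exists
        for i in sorted(pos_items):
            for _ in range(negatives_per_positive):
                # Rejection sampling: redraw until j falls outside I_u^+.
                j = rng.randrange(num_items)
                while j in pos_items:
                    j = rng.randrange(num_items)
                triples.append((u, i, j))
    return triples


# Toy interaction log (hypothetical data).
positives = {0: {1, 3}, 1: {0}, 2: {2, 3, 4}}
triples = sample_triples(positives, num_items=6, negatives_per_positive=2)
```

Uniform sampling is the simplest choice, not necessarily the best one. Resampling negatives every epoch, rather than fixing them once, exposes each positive to more contrasts over the course of training.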

BPR loss

For each positive (u, i) and its sampled negatives j, the loss is the negative log-sigmoid of the score margin, summed over the batch:

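import torch
import torch.nn.functional as F
from torch import nn


class BPRLoss(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, positive, negative):
        # Score margin between each positive and its sampled negative.
        distances = positive - negative
        # logsigmoid avoids log(0) when sigmoid underflows for very negative margins.
        loss = -torch.sum(F.logsigmoid(distances), dim=0, keepdim=True)
        return loss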
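
To show how such a loss plugs into training, here is a minimal sketch under stated assumptions. The scorer is a plain dot product of user and item embeddings, which is one common choice and not something this text prescribes; `DotProductScorer`, the toy batch, and the optimizer settings are all hypothetical. The loss is written as a standalone function using `F.logsigmoid`, the numerically stable form of log(sigmoid(x)).

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)


class DotProductScorer(nn.Module):
    """Hypothetical scorer: r_ui is the dot product of user and item embeddings."""

    def __init__(self, num_users, num_items, dim=8):
        super().__init__()
        self.user_emb = nn.Embedding(num_users, dim)
        self.item_emb = nn.Embedding(num_items, dim)

    def forward(self, users, items):
        return (self.user_emb(users) * self.item_emb(items)).sum(dim=-1)


def bpr_loss(pos_scores, neg_scores):
    # Negative log-sigmoid of the score margin, summed over the batch.
    return -F.logsigmoid(pos_scores - neg_scores).sum()


# A fixed toy batch of (u, i, j) triples (made-up ids).
users = torch.tensor([0, 0, 1, 2])
pos_items = torch.tensor([1, 3, 0, 2])
neg_items = torch.tensor([4, 5, 2, 0])

model = DotProductScorer(num_users=3, num_items=6)
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)

losses = []
for step in range(50):
    optimizer.zero_grad()
    loss = bpr_loss(model(users, pos_items), model(users, neg_items))
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

The same user indices are passed to both scoring calls, so each positive and its negative are compared under the same user embedding. In a real pipeline the batch would come from a sampler that draws fresh negatives rather than a fixed tensor.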

Hinge loss

Max-margin alternative: the hinge loss familiar from SVM-style classifiers, applied to score differences. Pairs whose margin already reaches m contribute zero loss:

class HingeLossbRec(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, positive, negative, margin=1):
        # Score margin between each positive and its sampled negative.
        distances = positive - negative
        # Penalize only pairs whose margin falls short of `margin`.
        loss = torch.sum(torch.clamp(margin - distances, min=0))
        return loss

BPR vs hinge

Both losses reward positive margins, but their gradients behave differently:

\ell_\textrm{BPR}(\Delta) = -\log \sigma(\Delta), \qquad \ell_\textrm{hinge}(\Delta) = \max(0, m-\Delta).

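  • BPR keeps a smooth, nonzero gradient for every sampled pair; its magnitude \sigma(-\Delta) shrinks as the margin grows but never reaches zero.
  • Hinge stops updating a pair once its margin is satisfied (\Delta \ge m), so only violating pairs drive learning.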
  • The most important implementation choice is often the negative sampler, not the algebraic form of the loss.
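
The gradient difference can be checked numerically. Differentiating the two losses with respect to \Delta gives -\sigma(-\Delta) for BPR and, for hinge, -1 while \Delta < m and 0 otherwise. The sketch below evaluates both at a few margins; the helper names are hypothetical, and taking the hinge subgradient at exactly \Delta = m to be 0 is a convention chosen here.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bpr_grad(delta):
    # d/dΔ of -log σ(Δ) is -σ(-Δ): always negative, shrinking as Δ grows.
    return -sigmoid(-delta)


def hinge_grad(delta, margin=1.0):
    # Subgradient of max(0, m - Δ): -1 while the margin is violated, else 0.
    return -1.0 if delta < margin else 0.0


for delta in (-2.0, 0.0, 0.5, 1.0, 3.0):
    print(f"Δ={delta:+.1f}  BPR grad={bpr_grad(delta):+.4f}  "
          f"hinge grad={hinge_grad(delta):+.1f}")
```

For a badly misranked pair (very negative \Delta) both gradients approach -1, so the two losses push about equally hard. They differ on pairs that are already ranked correctly: BPR keeps nudging them apart with diminishing force, while hinge ignores them entirely.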

Recap

  • Personalized ranking turns implicit feedback into a pairwise comparison task.
  • BPR: negative log-sigmoid of the (positive - negative) score margin. Smooth and differentiable everywhere, and a common default.
  • Hinge: hard margin; sometimes better with very imbalanced data.
  • Negative sampling is what makes either loss tractable on large item catalogs, and the choice of sampler often matters more than the choice of loss.