
Explaining Negative Sampling in Recommender Systems

By Jerry Wang and Jiachen Huang

As practitioners of Recommender Systems (RecSys) in different industries and fields, we often encounter situations where explicit feedback is rare. Imagine a scenario in which the only information available is product transaction history: which products a customer purchased and when they purchased them. What is missing is information on products the customer viewed but rejected. In this implicit feedback dataset, we have positive feedback only (purchases), and the non-purchases are a mix of unknowns (products customers haven’t seen) and true negative feedback (products customers have seen but didn’t like). This poses a challenge when building a recommender model that predicts a customer’s propensity to purchase other products.

The two most common ways of addressing this implicit feedback challenge are building models on positive samples only and applying a weighting function so that non-purchases can be included. Typical algorithms for the positive-samples-only approach include Neighborhood Methods (User-based CF, Item-based CF), while Matrix Factorization Methods (ALS, eALS) are often used when applying a weighting function to all non-purchases.

The sophistication and performance of the above methods are quite limited, especially in the era of Deep Learning RecSys. An alternative approach called Negative Sampling is becoming increasingly popular in implicit feedback scenarios. Because this approach arranges the data in the same format as the input to typical supervised learning models, it paves the way for the use of a wider range of powerful supervised learning models, from Factorization Machines to Tree-based Ensemble Models to Deep Learning.


Multiple methodologies exist for implementing negative sampling, but the fundamental principle remains consistent: for every true positive instance, negative samples are derived by using a specific algorithm to select instances the user hasn’t interacted with. Two types of negative sampling, random and popularity-based, can be particularly useful in addressing common tasks.

Random negative sampling (RNS) is a straightforward approach in which the algorithm uniformly selects instances from the entire product pool to serve as negative samples. This randomness, while seemingly unstructured, allows the model to learn a broad base of negative instances, increasing the model’s robustness.

These negative samples are beneficial for retrieval/candidate generation models where the objective is to find a large pool of relevant items without undue concern for precision. These samples are sometimes called Easy Negatives.
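A minimal sketch of random negative sampling follows, assuming the purchase history is a list of (user, item) pairs and the catalogue fits in memory. The function name `random_negative_samples`, the `n_neg` parameter, and the toy data are hypothetical and used only for illustration.

```python
import random


def random_negative_samples(positives, all_items, n_neg, seed=0):
    """Return (user, item, label) rows: each positive plus up to n_neg uniform negatives."""
    rng = random.Random(seed)

    # Items each user has interacted with; these are never drawn as negatives.
    seen = {}
    for user, item in positives:
        seen.setdefault(user, set()).add(item)

    rows = []
    for user, item in positives:
        rows.append((user, item, 1))
        # Candidate pool: every catalogue item the user has not interacted with.
        candidates = [i for i in all_items if i not in seen[user]]
        k = min(n_neg, len(candidates))
        # Uniform draw without replacement for this positive.
        for neg in rng.sample(candidates, k):
            rows.append((user, neg, 0))
    return rows


# Toy data (hypothetical).
purchases = [("u1", "A"), ("u1", "B"), ("u2", "C")]
catalogue = ["A", "B", "C", "D", "E", "F"]
rows = random_negative_samples(purchases, catalogue, n_neg=2)
```

In this sketch the same negative can be drawn for two different positives of the same user. Whether to deduplicate is a design choice that depends on the downstream model.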

Popularity-based negative sampling (PNS), by contrast, incorporates item-interaction frequency when selecting negatives. Compared to RNS, the purpose of popularity-based sampling is to better utilize the information contained in the non-purchase data. For instance, a very popular product that appears in a user’s non-purchase set is highly likely to be a product the user doesn’t like. Conversely, a product that everyone dislikes and doesn’t purchase provides little information to the model. By putting more weight on popular items, the models can more accurately distinguish between purchases and non-purchases in their consideration set, which is the objective of ranking models. These samples are sometimes referred to as Hard Negatives.
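The idea can be sketched in a few lines, assuming popularity is measured simply as the number of purchases per item and that negatives are drawn with probability proportional to that count. The function name `popularity_negative_samples` and the toy data are hypothetical. Other weightings (for example, a dampened count) are equally plausible.

```python
import random
from collections import Counter


def popularity_negative_samples(positives, n_neg, seed=0):
    """Return (user, item, label) rows with negatives weighted by purchase count."""
    rng = random.Random(seed)

    # Popularity = number of purchases per item (an assumed, simple definition).
    popularity = Counter(item for _, item in positives)

    seen = {}
    for user, item in positives:
        seen.setdefault(user, set()).add(item)

    rows = []
    for user, item in positives:
        rows.append((user, item, 1))
        # Only items purchased by someone, and not by this user, are candidates.
        candidates = [i for i in popularity if i not in seen[user]]
        if not candidates:
            continue
        weights = [popularity[i] for i in candidates]
        # random.choices draws WITH replacement, proportionally to the weights.
        for neg in rng.choices(candidates, weights=weights, k=n_neg):
            rows.append((user, neg, 0))
    return rows


# Toy data (hypothetical): item "A" is the most popular.
purchases = [("u1", "A"), ("u2", "A"), ("u3", "A"), ("u2", "B"), ("u4", "C")]
rows = popularity_negative_samples(purchases, n_neg=3)
```

Items that nobody purchased have zero weight here and are never selected, which matches the observation that such items carry little information for the model.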

Popularity-driven algorithms typically require weighted sampling, which usually demands array-like input held in memory and is therefore impractical when the data exceeds main memory capacity or is processed with distributed-computation engines like Spark. However, efficient weighted-sampling algorithms exist that adapt to sequential, parallel, and distributed scenarios.

By incorporating such an algorithm, we can implement PNS with a few lines of code and take full advantage of a distributed-computing engine.
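One technique that fits this description is weighted reservoir sampling with random keys (Efraimidis and Spirakis): each item receives the key u^(1/w), where u is a uniform random number and w is the item's weight, and the k items with the largest keys form the sample. The sketch below assumes that approach. The function name `weighted_reservoir_sample` and the toy data are hypothetical.

```python
import heapq
import random


def weighted_reservoir_sample(stream, k, seed=0):
    """One pass over (item, weight) pairs; returns up to k items without replacement."""
    rng = random.Random(seed)
    heap = []  # min-heap of (key, item) holding the k largest keys seen so far
    for item, weight in stream:
        if weight <= 0:
            continue  # zero-weight items can never be selected
        # Each key depends only on the item's own weight and a fresh random number.
        key = rng.random() ** (1.0 / weight)
        if len(heap) < k:
            heapq.heappush(heap, (key, item))
        elif key > heap[0][0]:
            heapq.heapreplace(heap, (key, item))
    return [item for _, item in heap]


# Toy data (hypothetical): (item, popularity) pairs read as a stream.
item_popularity = [("A", 5.0), ("B", 1.0), ("C", 3.0), ("D", 0.0)]
sample = weighted_reservoir_sample(item_popularity, k=2)
```

Because each key is computed independently per item, the same idea carries over to a distributed engine: compute the key as a column, then keep the top k rows per group.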

Through negative samples, it becomes feasible to provide the algorithm with positive and negative instances, and then build a classification algorithm on top of it. This approach helps the model identify not only the user’s preferences but also their dislikes, which are equally important in generating relevant recommendations.

Case study

We successfully employed negative sampling for recommender models when working with a major player in the travel and tourism industry. The model we helped the company build recommends itineraries to customers and is deployed into multiple marketing channels. Our challenge was that, due to the nature of booking behavior and the available tracking data, we could obtain historical records of successful bookings but no records of itineraries that customers had seen or considered, which is typical of implicit feedback scenarios.

There was still, however, rich data tracked at the customer level on demographics, spend behavior and channel interaction. By combining first- and third-party data, we constructed more than 250 features as a 360-degree view of the customers. Given that ALS matrix factorization doesn’t accept customer-level features, we opted instead for the negative sampling approach.

Using popularity-based negative sampling for ranking purposes, we sampled negative instances from itineraries that the target customer didn’t book—but that other customers from the same region and in the same week did book. This created a good proxy for itineraries the customer saw and/or considered but didn’t purchase.
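A rough sketch of this context-restricted sampling is shown below. It is an illustration under stated assumptions, not the production implementation: bookings are assumed to be (customer, itinerary, region, week) tuples, popularity is assumed to be the booking count within the same region and week, and the function name `contextual_negatives` and the toy data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def contextual_negatives(bookings, n_neg, seed=0):
    """Return rows (customer, itinerary, region, week, label) with in-context negatives."""
    rng = random.Random(seed)

    # Booking counts per (region, week) context, and itineraries booked per customer.
    context_counts = defaultdict(Counter)
    booked_by = defaultdict(set)
    for customer, itinerary, region, week in bookings:
        context_counts[(region, week)][itinerary] += 1
        booked_by[customer].add(itinerary)

    rows = []
    for customer, itinerary, region, week in bookings:
        rows.append((customer, itinerary, region, week, 1))
        counts = context_counts[(region, week)]
        # Candidates: booked in the same region and week, never booked by this customer.
        candidates = [i for i in counts if i not in booked_by[customer]]
        if not candidates:
            continue  # no proxy negatives exist for this context
        weights = [counts[i] for i in candidates]
        # Draw with replacement, proportionally to in-context booking counts.
        for neg in rng.choices(candidates, weights=weights, k=n_neg):
            rows.append((customer, neg, region, week, 0))
    return rows


# Toy data (hypothetical).
bookings = [
    ("c1", "Rome", "EU", 1),
    ("c2", "Paris", "EU", 1),
    ("c3", "Paris", "EU", 1),
    ("c4", "Tokyo", "APAC", 1),
]
rows = contextual_negatives(bookings, n_neg=2)
```

A booking with no other itineraries in its region and week (customer `c4` in the toy data) receives no negatives in this sketch. How to handle such cases, for example by widening the context, is left open.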

For each actual booking record, we used PNS to create 20 negative samples. With this augmented dataset, we built a LightGBM model to predict the likelihood of each customer booking each itinerary, which achieved both high offline AUC and success in online A/B tests. We also tested a deep learning-based recommender on the same dataset and achieved even higher offline AUC, which suggests that the resulting negative samples contained rich, learnable information that could be mined further using more complex algorithms.

As the field of Recommender Systems continues to evolve, negative sampling techniques are increasingly being proposed as a core part of new recommender algorithms. With the recent rapid advancement in GenAI, RecSys has seen a paradigm shift to pre-training/fine-tuning/prompting, which may no longer require techniques like negative sampling. But until the industry fully tests this new paradigm, RecSys practitioners will continue to use current recommendation methodologies such as negative sampling.

For those seeking a more comprehensive understanding of negative sampling and more advanced implementations, we recommend delving into the academic papers referenced (here) and (here). This comprehensive overview of LLM-powered Recommender Systems provides additional information on this rapidly evolving field.
