
Enhancing MongoDB® Atlas Vector Search by Boosting Effective Scores

Overview

In this article, we explore how to use MongoDB’s aggregation pipeline to add preference-based boosting to vector search. Using the embedded_movies collection from the mFlix sample dataset as an example, we show how to increase the effective recommendation score by 10% for movies with a preferred genre (such as "Adventure"), as opposed to requiring that genre. The results are then sorted by this boosted score, so personalized preferences are slightly prioritized in the recommendations.

Qarbine is a MongoDB AI Applications Program (MAAP) partner and uses MongoDB internally to manage the analytic suite’s component catalog. The suite also integrates with a wide range of AI services. The techniques in this article are being researched for a future release.

Scenario

You provide a movie recommendation service that lets users find movies by describing what they want in natural language. You chose MongoDB’s Developer Data Platform™ because it includes powerful built-in vector search capabilities. By embedding movie plots as vectors, the application can match user queries to the most relevant films, even when the input is general or open-ended.

To enhance the recommendations, you want to take user preferences into account, such as a liking for adventure movies or PG-rated films. These are preferences, not strict requirements. This approach ensures users see suggestions that closely match their interests while still offering a broad, personalized selection of movies.

MongoDB offers robust querying capabilities, but within the aggregation pipeline, you cannot directly modify the internal vector search score (retrieved via { $meta: "vectorSearchScore" }). However, you can use aggregation pipeline stages to create a custom scoring mechanism and then sort results by this new field, effectively implementing your own score boosting logic.
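Before expressing it as pipeline stages, the idea can be sketched in plain JavaScript. This is illustrative only: `boostAndRank`, the `score` field, and the input objects are hypothetical stand-ins for the documents `$vectorSearch` returns. In the real pipeline, the same multiply-then-sort logic runs inside `$addFields` and `$sort`.

```javascript
// Illustrative sketch: `boostAndRank` and the `score` field are hypothetical.
// It mirrors the $addFields (multiply) + $sort (descending) stages.
function boostAndRank(results, preferredGenre, factor = 1.1) {
  return results
    .map((doc) => {
      // Treat a missing genres array as "preferred genre not present".
      const matches = (doc.genres ?? []).includes(preferredGenre);
      return { ...doc, boostedScore: doc.score * (matches ? factor : 1) };
    })
    // Highest boosted score first, like { $sort: { boostedScore: -1 } }.
    .sort((a, b) => b.boostedScore - a.boostedScore);
}

// Raw scores taken from the sample results later in this article.
const ranked = boostAndRank(
  [
    { title: "Nightbreed", genres: ["Fantasy", "Horror"], score: 0.7514783143997192 },
    { title: "Godzilla", genres: ["Action", "Adventure", "Sci-Fi"], score: 0.7414208650588989 },
  ],
  "Adventure"
);
console.log(ranked.map((d) => d.title)); // [ 'Godzilla', 'Nightbreed' ]
```

Nightbreed has the higher raw score, but the 10% boost moves Godzilla ahead of it.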

Data Background

In MongoDB, the sample_mflix database contains information about movies and movie theaters. The embedded_movies collection contains a curated selection of movies from the original movies collection, focusing on the Western, Action, and Fantasy genres. Each document in embedded_movies includes an added plot_embedding field, generated with OpenAI’s text-embedding-ada-002 model. These embeddings enable semantic search when used with MongoDB Atlas Vector Search. The table below lists the field names in this collection.

Field Name | Field Name (continued) | Field Name (continued)
_id        | plot                   | genres
runtime    | cast                   | num_mflix_comments
poster     | title                  | fullplot
languages  | released               | directors
writers    | awards                 | lastupdated
year       | imdb                   | countries
type       | tomatoes               | plot_embedding

More details on the embedded_movies collection can be found at
https://www.mongodb.com/docs/atlas/sample-data/sample-mflix/#sample_mflix.embedded_movies

Query Alternatives and Tradeoffs

There are multiple approaches to prioritize documents with matching genre elements in a MongoDB Atlas Vector Search query, effectively boosting their scores in the results. While Atlas Vector Search primarily ranks results by the vector similarity between the query embedding and document embeddings, you can use the filter option in the $vectorSearch stage to restrict results to only those with a specified genre. However, this acts as a filter rather than a score booster—it narrows the candidate set but does not directly influence the similarity score.

If you wish to boost the score for documents with a matching genre, you cannot directly modify the similarity score within the $vectorSearch stage. Instead, you can post-process the results using aggregation pipeline stages to apply a custom score boost or reorder results based on your own criteria.

1. Pre-filter by Genre in $vectorSearch

The following query only returns movies that have the desired genre (e.g., "Adventure") in their genres array:

db.embedded_movies.aggregate([
  {
    "$vectorSearch": {
      "index": "vector_index",
      "path": "plot_embedding",
      "queryVector": [ /* your query vector here */ ],
      "filter": { "genres": "Adventure" },
      "numCandidates": 100,
      "limit": 10
    }
  },
  {
    $project: {
      vectorSearchScore: { $meta: "vectorSearchScore" },
      title: 1,
      // genres: 1,
      // plot: 1,
      _id: 0
    }
  }
])

A sample result is shown below.

Row | Title                           | Genres                        | Vector Search Score
1   | Godzilla                        | Action, Adventure, Sci-Fi     | 0.7414208650588989
2   | Forest Warrior                  | Action, Adventure, Comedy     | 0.7403745651245117
3   | Heavy Metal                     | Animation, Adventure, Fantasy | 0.7403595447540283
4   | Forest Warrior                  | Action, Adventure, Comedy     | 0.7403473854064941
5   | Clash of the Titans             | Adventure, Family, Fantasy    | 0.7400811314582825
6   | Tales from Earthsea             | Animation, Adventure, Fantasy | 0.737678587436676
7   | Sinbad and the Eye of the Tiger | Action, Adventure, Drama      | 0.7307860851287842
8   | The Brothers Grimm              | Action, Adventure, Comedy     | 0.7293913960456848
9   | Vampire Effect                  | Action, Adventure, Comedy     | 0.7279759049415588
10  | Batman                          | Action, Adventure             | 0.7276626229286194
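The hard requirement can also be widened to several genres. A sketch under assumptions: `buildPrefilterPipeline` is a hypothetical helper, and it switches the filter to the `$in` operator so a document qualifies if its genres array contains any of the listed values. It still relies on genres being indexed with "type": "filter", as in the index definition used in this article.

```javascript
// Hypothetical helper: builds the pre-filter pipeline for one or more genres.
function buildPrefilterPipeline(queryVector, genres, { numCandidates = 100, limit = 10 } = {}) {
  return [
    {
      $vectorSearch: {
        index: "vector_index",
        path: "plot_embedding",
        queryVector,
        // Hard requirement: only documents whose genres array contains
        // at least one listed value are considered.
        filter: { genres: { $in: genres } },
        numCandidates,
        limit,
      },
    },
    {
      $project: {
        _id: 0,
        title: 1,
        genres: 1,
        vectorSearchScore: { $meta: "vectorSearchScore" },
      },
    },
  ];
}

// In mongosh, after loading a query vector:
// db.embedded_movies.aggregate(buildPrefilterPipeline(myQueryVector, ["Adventure", "Fantasy"]))
```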

2. Combine Vector Search with Aggregation to Boost Matching Genres

The following query retrieves relevant movies but boosts the score of those with the preferred (rather than required) genre:

db.embedded_movies.aggregate(
  [
    {
      "$vectorSearch": {
        "index": "vector_index",
        "path": "plot_embedding",
        "queryVector": [ /* your query vector here */ ],
        "numCandidates": 150,
        "limit": 20
      }
    },
    {
      $addFields: {
        boostedScore: {
          $multiply: [
            { $meta: "vectorSearchScore" },
            {
              $cond: [
                { $in: ["Adventure", "$genres"] },
                1.1, // boost if "Adventure" is present
                1    // no boost
              ]
            }
          ]
        }
      }
    },
    {
      $project: {
        vectorSearchScore: { $meta: "vectorSearchScore" },
        boostedScore: 1,
        title: 1,
        // genres: 1,
        // plot: 1,
        _id: 0
      }
    },
    {
      "$sort": { "boostedScore": -1 }
    },
    {
      "$limit": 10
    }
  ]
)

Movies no longer need "Adventure" in their genres array to be candidates for the answer set. This pipeline gives movies with "Adventure" in their genres array a 10% score boost, then sorts and limits the results by the boosted score. You can replace "Adventure" with any other genre or tweak the condition as needed.

Below is a sample boosted result.

Row | Title                           | Vector Search Score | Boosted Score      | Boosted
1   | Godzilla                        | 0.7414208650588989  | 0.8155629515647889 | Yes
2   | Forest Warrior                  | 0.7403745651245117  | 0.8144120216369629 | Yes
3   | Heavy Metal                     | 0.7403595447540283  | 0.8143954992294312 | Yes
4   | Forest Warrior (duplicate)      | 0.7403473854064941  | 0.8143821239471436 | Yes
5   | Clash of the Titans             | 0.7400811314582825  | 0.8140892446041108 | Yes
6   | Tales from Earthsea             | 0.737678587436676   | 0.8114464461803437 | Yes
7   | Sinbad and the Eye of the Tiger | 0.7307860851287842  | 0.8038646936416627 | Yes
8   | The Brothers Grimm              | 0.7293913960456848  | 0.8023305356502534 | Yes
9   | Nightbreed                      | 0.7514783143997192  | 0.7514783143997192 | No
10  | Troll                           | 0.7401143908500671  | 0.7401143908500671 | No

With the new criteria approach, several additional movies (such as Nightbreed and Troll, which have genres of ["Fantasy", "Horror"]) became eligible candidates for the result set. These films were previously excluded from the first answer set, which required "Adventure" as a mandatory genre.

Under the new, preference-based query, Nightbreed has the highest raw vector score (0.7515) and would rank first on vector score alone. However, the 10% boost lifts all eight Adventure movies above it. This approach broadens the candidate pool and delivers results that reflect a preference for "Adventure" rather than a strict requirement.
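The reordering follows directly from the scores in the table. Writing s for a raw vector search score, an Adventure movie outranks Nightbreed whenever its boosted score exceeds Nightbreed's unboosted score:

```latex
1.1\, s_{\mathrm{Adv}} > s_{\mathrm{Nightbreed}} = 0.7515
\;\Longrightarrow\;
s_{\mathrm{Adv}} > \frac{0.7515}{1.1} \approx 0.6832
```

For example, The Brothers Grimm has a raw score of 0.7294, and 1.1 × 0.7294 ≈ 0.8023, which clears 0.7515. The threshold only applies to documents that $vectorSearch actually returned; the boost cannot bring in a movie outside its limit of 20 candidates.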

Summary Table

Approach                    | Description                                                                                                                                | Effect on Results
Pre-filter in $vectorSearch | Searches only documents with a matching genre. Requires a "type": "filter" entry for the field path in the vector search index definition. | Only documents with a matching genre are returned.
Boost with aggregation      | Adds a custom score boost for matching genres.                                                                                             | All high-scoring candidates can be returned, but those with a matching genre rank higher.
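The boost pipeline above hard-codes "Adventure" and the 1.1 factor. A minimal sketch of a parameterized version follows; `buildBoostPipeline` and its option names are hypothetical. It also wraps `$genres` in `$ifNull`, because the aggregation `$in` operator raises an error when its second argument is not an array, which matters if you reuse the pipeline on data where some documents lack a genres field.

```javascript
// Hypothetical helper: vector search, boost one preferred genre, re-sort, trim.
function buildBoostPipeline(queryVector, {
  genre = "Adventure",
  factor = 1.1,         // 1.1 means a 10% boost
  numCandidates = 150,
  candidateLimit = 20,  // documents $vectorSearch returns before boosting
  topN = 10,            // documents kept after re-sorting
} = {}) {
  return [
    {
      $vectorSearch: {
        index: "vector_index",
        path: "plot_embedding",
        queryVector,
        numCandidates,
        limit: candidateLimit,
      },
    },
    {
      $addFields: {
        vectorSearchScore: { $meta: "vectorSearchScore" },
        boostedScore: {
          $multiply: [
            { $meta: "vectorSearchScore" },
            // $ifNull turns a missing genres field into an empty array.
            { $cond: [{ $in: [genre, { $ifNull: ["$genres", []] }] }, factor, 1] },
          ],
        },
      },
    },
    { $sort: { boostedScore: -1 } },
    { $limit: topN },
    { $project: { _id: 0, title: 1, genres: 1, vectorSearchScore: 1, boostedScore: 1 } },
  ];
}

// In mongosh:
// db.embedded_movies.aggregate(buildBoostPipeline(myQueryVector, { genre: "Fantasy" }))
```

Only the candidateLimit documents returned by $vectorSearch can be reordered, so candidateLimit should be larger than topN for the boost to have any effect on which movies appear.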

Other Options

While genre is a practical field for filtering or boosting results in the movies dataset, you can apply the same logic to any relevant field to suit your application’s needs. For instance, you might boost movies with high IMDb ratings (using the imdb.rating field) or filter results by official movie ratings such as "PG" or "R". Below is a $cond expression, used in place of the genre condition in the boost pipeline, that boosts the score for movies with an IMDb rating of 8 or higher:

$cond: [
  { $gte: ["$imdb.rating", 8] },
  1.1, // boost by 10% (or any factor you prefer)
  1    // no boost
]
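Several preferences can be combined by multiplying one factor per preference. A sketch under assumptions: `buildMultiPreferenceBoost` and its option names are hypothetical, and the `$isNumber` check is a defensive guard in case some documents store imdb.rating as a non-numeric value. In MongoDB's comparison order, strings sort after numbers, so an unguarded `$gte` would treat a string rating as greater than 8.

```javascript
// Hypothetical helper: one multiplier per preference, multiplied together.
function buildMultiPreferenceBoost({
  genre = "Adventure",
  genreFactor = 1.1,
  minRating = 8,
  ratingFactor = 1.1,
} = {}) {
  return {
    $multiply: [
      { $meta: "vectorSearchScore" },
      // Genre preference; $ifNull guards against a missing genres field.
      { $cond: [{ $in: [genre, { $ifNull: ["$genres", []] }] }, genreFactor, 1] },
      // Rating preference; only numeric ratings are compared.
      {
        $cond: [
          { $and: [{ $isNumber: "$imdb.rating" }, { $gte: ["$imdb.rating", minRating] }] },
          ratingFactor,
          1,
        ],
      },
    ],
  };
}

// Use it as the boostedScore expression in the boost pipeline:
// { $addFields: { boostedScore: buildMultiPreferenceBoost() } }
```

A movie that satisfies both preferences receives a combined multiplier of 1.1 × 1.1, or about 1.21.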

Try it Out Yourself

Prerequisites

  • A MongoDB Atlas cluster and the MongoDB Shell (mongosh).
  • An OpenAI API key (optional), needed only if you want to generate your own query vectors. The appendix contains two ready-to-use query vectors and a utility for generating an embedding.

Load the Sample Data

Review the MongoDB documentation subsections at

https://www.mongodb.com/docs/atlas/sample-data/sample-mflix/#std-label-sample-mflix
https://www.mongodb.com/docs/atlas/sample-data/sample-mflix/#sample_mflix.embedded_movies

See the steps at the following link to load the mFlix data set with embeddings.
https://www.mongodb.com/docs/atlas/sample-data/

The result in the Atlas UI should be

  

Create the Search Index

After the dataset is loaded, select

  

Choose the tab highlighted below

  

Click

  

Choose the right hand option as shown below.

  

Leave the default index name, vector_index, for this exercise. If an index with that name already exists, use a different name and update the "index": "vector_index" setting in the sample queries to match.

  

For the

  

Choose

  

Choose the right hand option as shown below.

  

Click

  

Enter the index definition:

{
  "fields": [
    {
      "type": "vector",
      "path": "plot_embedding",
      "numDimensions": 1536,
      "similarity": "euclidean"
    },
    {
      "type": "filter",
      "path": "genres"
    }
  ]
}

Because the pre-filter query filters on genres, the second entry, with "type": "filter" for the genres path, is required; without it, that query fails.

Click

  

Review the information and then click

  

Wait for the index to become ready to use
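If you prefer the shell to the Atlas UI steps above, the same index can be created from mongosh after running use sample_mflix. This is a sketch that assumes a mongosh release whose createSearchIndex() accepts the index type ("vectorSearch") as its second argument; the typeof db guard exists only so the file can also be loaded in plain Node.js to inspect the definition. The index still builds asynchronously, so wait until it is ready before querying.

```javascript
// Same definition entered in the Atlas UI above.
const vectorIndexDefinition = {
  fields: [
    // 1536 dimensions: the size of text-embedding-ada-002 vectors.
    { type: "vector", path: "plot_embedding", numDimensions: 1536, similarity: "euclidean" },
    // Required by the pre-filter query's "filter": { "genres": ... } clause.
    { type: "filter", path: "genres" },
  ],
};

// `db` exists only inside mongosh, so this call is skipped elsewhere.
if (typeof db !== "undefined") {
  db.embedded_movies.createSearchIndex("vector_index", "vectorSearch", vectorIndexDefinition);
  // Build progress can be checked with db.embedded_movies.getSearchIndexes("vector_index").
}
```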

  

Query Your Data

Connect to your Atlas cluster with the MongoDB Shell (mongosh).

Navigate to the sample database.

use sample_mflix

Review a sample document by running

db.embedded_movies.findOne( { } )

The appendix provides precomputed query vectors that you can save as local files. In the MongoDB shell, load one of them by running

load('scaryEmbedding.js')

This defines the variable myQueryVector, which you can then use in your queries in the shell.
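A mismatched or partially loaded vector is an easy mistake, so a quick check before querying can help. A minimal sketch: checkQueryVector is a hypothetical helper, and its default of 1536 comes from the numDimensions value in the index definition above.

```javascript
// Hypothetical helper: verify a query vector before passing it to $vectorSearch.
function checkQueryVector(vector, expectedDims = 1536) {
  if (!Array.isArray(vector)) {
    throw new Error("query vector is not an array; was the file loaded?");
  }
  if (vector.length !== expectedDims) {
    throw new Error(`expected ${expectedDims} dimensions, got ${vector.length}`);
  }
  if (!vector.every((x) => typeof x === "number" && Number.isFinite(x))) {
    throw new Error("query vector contains non-numeric values");
  }
  return true;
}

// In mongosh, after load('scaryEmbedding.js'):
// checkQueryVector(myQueryVector)
```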

In the vector queries above, replace

      "queryVector": [ /* your query vector here */ ],

with a reference to the variable:

      "queryVector": myQueryVector,

For example,

db.embedded_movies.aggregate(
  [
    {
      "$vectorSearch": {
        "index": "vector_index",
        "path": "plot_embedding",
        "queryVector": myQueryVector,
        "numCandidates": 150,
        "limit": 20
      }
    },
    {
      $addFields: {
        boostedScore: {
          $multiply: [
            { $meta: "vectorSearchScore" },
            {
              $cond: [
                { $in: ["Adventure", "$genres"] },
                1.1, // boost if "Adventure" is present
                1    // no boost
              ]
            }
          ]
        }
      }
    },
    {
      $project: {
        vectorSearchScore: { $meta: "vectorSearchScore" },
        boostedScore: 1,
        title: 1,
        // genres: 1,
        // plot: 1,
        _id: 0
      }
    },
    {
      "$sort": { "boostedScore": -1 }
    },
    {
      "$limit": 10
    }
  ]
)

The sample result for this query is identical to the boosted sample result shown in the previous section. Nightbreed and Troll, both with genres of [ 'Fantasy', 'Horror' ], become candidates even though they were not in the first answer set, which required a genres element of "Adventure". Nightbreed has the highest raw vector score, yet the eight movies with "Adventure" as a genres value are boosted above it. The result resembles the first answer set, with additional candidates because "Adventure" is a preference rather than a requirement.

OpenAI and Voyage AI Embeddings

The sample dataset’s embeddings were generated using OpenAI’s text-embedding-ada-002 model, so query vectors for this dataset must come from the same model. Voyage AI is another embedding service that you could use in your own application; MongoDB acquired the company to integrate its technology into the Developer Data Platform. The Voyage AI interface is quite similar to OpenAI’s. Here is a simple OpenAI cURL request.

curl https://api.openai.com/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "input": "Your text string goes here",
    "model": "text-embedding-ada-002"
  }'

For more details see https://platform.openai.com/docs/api-reference/embeddings

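Apart from the base URL, the API key, and the model name, the request is essentially the same. Here is a corresponding Voyage AI cURL request.

curl https://api.voyageai.com/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $VOYAGE_API_KEY" \
  -d '{
    "input": "Your text string goes here",
    "model": "voyage-3-large"
  }'

For more details see https://docs.voyageai.com/docs/embeddings. Voyage AI vectors come from a different model (and voyage-3-large returns 1024 dimensions by default), so they cannot be used to query the 1536-dimension plot_embedding index in this sample dataset. Sample output for either API is shown below.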

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.002, -0.009, 0.078, ...]
    }
  ],
  "model": "the_model_name"
}
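Because both services return the vector under data[0].embedding, one small function can turn either response into a file that load() accepts in mongosh. A sketch: toLoadableScript is a hypothetical name, and myQueryVector matches the variable name the appendix files define.

```javascript
// Hypothetical helper: convert an embeddings API response (shape shown above)
// into a script that defines myQueryVector when loaded in mongosh.
function toLoadableScript(response) {
  const first = response?.data?.[0];
  if (!first || !Array.isArray(first.embedding)) {
    throw new Error("response has no data[0].embedding array");
  }
  return `myQueryVector = ${JSON.stringify(first.embedding)};\n`;
}

const sample = {
  object: "list",
  data: [{ object: "embedding", index: 0, embedding: [0.002, -0.009, 0.078] }],
  model: "the_model_name",
};
console.log(toLoadableScript(sample)); // myQueryVector = [0.002,-0.009,0.078];
```

The assignment is left undeclared on purpose, matching the appendix files, so the variable is visible in the shell after load().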

Next Steps

This article demonstrated that user-driven criteria in MongoDB vector search can be treated as preferences rather than strict requirements. In the MongoDB shell, experiment with different search parameters and conditions and observe their effects on the results. Also try different boost percentages to see how they change the ranking.

To deepen your expertise, explore the free MongoDB University courses on vector search at https://learn.mongodb.com/learning-paths/atlas-vector-search. These courses will help you optimize your indexing strategies and unlock advanced vector search capabilities.

We welcome feedback at support@qarbine.com.

References

When applying vector search to your applications, make sure your vector index is created on the correct embedding field (e.g., plot_embedding) and that the model used to create the vectors matches the one used later to obtain the query vector.

About mFlix Movies with Embeddings

About Atlas Vector Search

Creating Vector Search Indexes

About Qarbine

Qarbine’s integration with MongoDB supports direct, native queries and preserves the flexibility of the document structure, enabling advanced analytics, vector search, and seamless collaboration across teams. This approach allows Qarbine to deliver publication-quality drill-down analysis and actionable insights, making it a strong fit for modern AI and data-driven applications.

Appendix

scaryEmbedding.js

This JavaScript file sets the embedding for 'scary movies with mythical villians'.
Download the file from here.

funnyEmbedding.js

This JavaScript file sets the embedding for 'funny cartoons with family themes'.
Download the file from here.

Vector Utility

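This is a simple Node.js utility for obtaining a query vector. It uses the axios package (npm install axios) and reads your key from the OPENAI_API_KEY environment variable.

const axios = require('axios');

// Read the key from the environment instead of hard-coding it.
const apiKey = process.env.OPENAI_API_KEY;

async function fetchEmbedding(text) {
  // Use text-embedding-ada-002 so the query vector matches plot_embedding.
  const response = await axios.post(
    'https://api.openai.com/v1/embeddings',
    { input: [text], model: 'text-embedding-ada-002' },
    {
      headers: {
        'Content-Type': 'application/json',
        Authorization: 'Bearer ' + apiKey,
      },
    }
  );
  const data = response.data.data[0];
  // Print a line that defines myQueryVector when loaded in mongosh.
  console.log('myQueryVector = ' + JSON.stringify(data.embedding) + ';');
}

const inputText = 'funny cartoons with family themes';
fetchEmbedding(inputText).catch((err) => {
  console.error(err.message); // errors go to stderr, not the redirected file
  process.exit(1);
});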

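You can redirect the output into a file and then load that file in the MongoDB shell. For example, if you saved the utility as embed.js, run node embed.js > funnyEmbedding.js and then run load('funnyEmbedding.js') in mongosh.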