Enhancing MongoDB® Atlas Vector Search by Boosting Effective Scores
Overview
In this article, we explore how to use MongoDB’s aggregation pipeline to achieve preference-based retrieval boosting alongside vector searching. Using the embedding_movies mFlix sample dataset as an example, we demonstrate increasing the effective movie recommendation score by 10% for documents with a preferred genre (such as "Adventure"). This is in contrast to having a genre requirement of “Adventure”. The result set is then sorted by this boosted score, ensuring that personalized preferences are slightly prioritized in the recommendations.
Qarbine is a MongoDB AI applications program (MAAP) partner and uses MongoDB internally to manage the analytic suite’s component catalog. The suite also integrates with a wide range of AI services. This article’s techniques are being researched as part of a future release.
Scenario
You provide a movie recommendation service which allows users to find movies by describing what they want in natural language. You chose MongoDB’s Developer Data Platform™ because it includes powerful built-in vector search capabilities. By embedding movie plots as vectors, the application can match user queries to the most relevant films, even when the input is general or open-ended.
To enhance the recommendations you want to consider user preferences such as for adventure movies or PG-rated films. These are not strict requirements though. This approach ensures users see suggestions that closely match their interests while still offering a broad and personalized selection of movies.
MongoDB offers robust querying capabilities, but within the aggregation pipeline, you cannot directly modify the internal vector search score (retrieved via { $meta: "vectorSearchScore" }). However, you can use aggregation pipeline stages to create a custom scoring mechanism and then sort results by this new field, effectively implementing your own score boosting logic.
Data Background
In MongoDB the sample_mflix database has information about movies and movie theaters. The embedded_movies collection contains a curated selection of movies from the original movies collection, focusing specifically on the Western, Action, and Fantasy genres. Each document in embedded_movies includes a new plot_embedding field, which is generated using OpenAI’s text-embedding-ada-002 model. These embeddings enable powerful semantic search capabilities when used with MongoDB Atlas Vector Search. Below is a table of the field names in this collection.
Field Name | Field Name (continued) | Field Name (continued) |
---|---|---|
_id | plot | genres |
runtime | cast | num_mflix_comments |
poster | title | fullplot |
languages | released | directors |
writers | awards | lastupdated |
year | imdb | countries |
type | tomatoes | plot_embedding |
More details on the embedded_movies collection can be found at
https://www.mongodb.com/docs/atlas/sample-data/sample-mflix/#sample_mflix.embedded_movies
Query Alternatives and Tradeoffs
There are multiple approaches to prioritize documents with matching genre elements in a MongoDB Atlas Vector Search query, effectively boosting their scores in the results. While Atlas Vector Search primarily ranks results by the vector similarity between the query embedding and document embeddings, you can use the filter option in the $vectorSearch stage to restrict results to only those with a specified genre. However, this acts as a filter rather than a score booster—it narrows the candidate set but does not directly influence the similarity score.
If you wish to boost the score for documents with a matching genre, you cannot directly modify the similarity score within the $vectorSearch stage. Instead, you can post-process the results using aggregation pipeline stages to apply a custom score boost or reorder results based on your own criteria.
1. Pre-filter by Genre in $vectorSearch
The following query only returns movies that have the desired genre (e.g., "Adventure") in their genres array:
db.embedded_movies.aggregate([
{
"$vectorSearch": {
"index": "vector_index",
"path": "plot_embedding",
"queryVector": [ /* your query vector here */ ],
"filter": { "genres": "Adventure" },
"numCandidates": 100,
"limit": 10
}
},
{
$project: {
vectorSearchScore: { $meta: "vectorSearchScore" },
title: 1,
// genres: 1,
// plot: 1,
_id: 0
}
}
]
)
A sample result is shown below.
Row | Title | Genres | Vector Search Score |
---|---|---|---|
1 | Godzilla | Action, Adventure, Sci-Fi | 0.7414208650588989 |
2 | Forest Warrior | Action, Adventure, Comedy | 0.7403745651245117 |
3 | Heavy Metal | Animation, Adventure, Fantasy | 0.7403595447540283 |
4 | Forest Warrior | Action, Adventure, Comedy | 0.7403473854064941 |
5 | Clash of the Titans | Adventure, Family, Fantasy | 0.7400811314582825 |
6 | Tales from Earthsea | Animation, Adventure, Fantasy | 0.737678587436676 |
7 | Sinbad and the Eye of the Tiger | Action, Adventure, Drama | 0.7307860851287842 |
8 | The Brothers Grimm | Action, Adventure, Comedy | 0.7293913960456848 |
9 | Vampire Effect | Action, Adventure, Comedy | 0.7279759049415588 |
10 | Batman | Action, Adventure | 0.7276626229286194 |
2. Combine Vector Search with Aggregation to Boost Matching Genres
The following query retrieves relevant movies, but gives a score boost to those with the preferred (vs. required) genre:
db.embedded_movies.aggregate(
[
{
"$vectorSearch": {
"index": "vector_index",
"path": "plot_embedding",
"queryVector": [ /* your query vector here */ ],
"numCandidates": 150,
"limit": 20
}
},
{
$addFields: {
boostedScore: {
$multiply: [
{ $meta: "vectorSearchScore" },
{
$cond: [
{ $in: ["Adventure", "$genres"] },
1.1, // boost if "Adventure" is present
1 // no boost
]
}
]
}
}
},
{
$project: {
vectorSearchScore: { $meta: "vectorSearchScore" },
boostedScore: 1,
title: 1,
// genres: 1,
// plot: 1,
_id: 0
}
},
{
"$sort": { "boostedScore": -1 }
},
{
"$limit": 10
}
]
)
Movies no longer must have "Adventure" in their genres array to be a candidate in the answer set. This pipeline gives movies with "Adventure" in their genres array a boosted score, then sorts and limits the results based on the boosted score. You can replace "Adventure" with any other genre or tweak the condition as needed.
Below is a sample boosted result.
Row | Title | Vector Search Score | Boosted Score | Boosted |
---|---|---|---|---|
1 | Godzilla | 0.7414208650588989 | 0.8155629515647889 | Yes |
2 | Forest Warrior | 0.7403745651245117 | 0.8144120216369629 | Yes |
3 | Heavy Metal | 0.7403595447540283 | 0.8143954992294312 | Yes |
4 | Forest Warrior (duplicate) | 0.7403473854064941 | 0.8143821239471436 | Yes |
5 | Clash of the Titans | 0.7400811314582825 | 0.8140892446041108 | Yes |
6 | Tales from Earthsea | 0.737678587436676 | 0.8114464461803437 | Yes |
7 | Sinbad and the Eye of the Tiger | 0.7307860851287842 | 0.8038646936416627 | Yes |
8 | The Brothers Grimm | 0.7293913960456848 | 0.8023305356502534 | Yes |
9 | Nightbreed | 0.7514783143997192 | 0.7514783143997192 | No |
10 | Troll | 0.7401143908500671 | 0.7401143908500671 | No |
With the new criteria approach, several additional movies (such as Nightbreed and Troll, which have genres of ["Fantasy", "Horror"]) became eligible candidates for the result set. These films were previously excluded from the first answer set, which required "Adventure" as a mandatory genre.
Under the new, preference-based query, Nightbreed would typically appear at the top of the list based on its vector score alone. However, because the updated query boosts scores for movies with "Adventure" in their genres, eight movies moved higher in the rankings. This approach broadens the candidate pool and delivers results that reflect a preference for "Adventure," rather than making it a strict requirement.
Summary Table
Approach | Description | Effect on Results |
---|---|---|
Pre-filter in $vectorSearch | Only search documents with matching type. Requires "type": "filter in the vector search index definition for the document field path. | Only matching genres documents returned. |
Boost with aggregation | Add custom score boost to matching genres. | All high scoring documents returned, but matching genres ranked higher |
Other Options
While genre is a practical field for filtering or boosting results in the movies dataset, you can apply the same logic to any relevant field to suit your application’s needs. For instance, you might provide a preference-based score boost for movies with high IMDb ratings (using the imdb.rating field) or filter results by official movie ratings such as "PG" or "R". Below is an example of how to boost the score for movies with an IMDB rating of 8 or higher:
$cond: [
{ $gte: ["$imdb.rating", 8] },
1.1, // Boost by 10% (or any factor you prefer)
1 // No boost
]
Try it Out Yourself
Prerequisites
- The minimum requirement is a MongoDB Atlas Subscription. The free tier is fine and you can easily sign up at https://www.mongodb.com/cloud/atlas/register
- An OpenAI API key is optional if you want to generate your own query vectors. The appendix contains 2 ready to go query vector values and also a utility to generate an embedding.
Load the Sample Data
Review the MongoDB documentation subsections at
https://www.mongodb.com/docs/atlas/sample-data/sample-mflix/#sample_mflix.embedded_movies
See the steps at the following link to load the mFlix data set with embeddings.
https://www.mongodb.com/docs/atlas/sample-data/
The result in the Atlas UI should be
Create the Search Index
After the dataset is loaded, select
Choose the tab highlighted below
Click
Choose the right hand option as shown below.
Leave the default name for this exercise. If you already have such a name then use a different one and adjust the query "index": "vector_index" accordingly in the sample queries.
For the
Choose
Choose the right hand option as shown below.
Click
Enter the index definition:
{
"fields": [
{
"type": "vector",
"path": "plot_embedding",
"numDimensions": 1536,
"similarity": "euclidean"
},
{ "type": "filter",
"path": "genres"
}
]
}
Since our example vector search filters on genres the “type” portion of the definition is required.
Click
Review the information and then click
Wait for the index to become ready to use
Query Your Data
Open up the MongoDB shell to your Atlas instance.
Navigate to the sample database.
use sample_mflix
Review a sample document by running
db.embedded_movies.findOne( { } )
Several precomputed search vectors are in the appendix which can be stored in local files. In the MongoDB shell, load one of the query vectors by running
This defines the variable “myQueryVector” and it is available for your queries in the shell.
In the vector queries above replace
"queryVector": [ /* your query vector here */ ],
with the variable reference via
"queryVector": myQueryVector,
For example,
db.embedded_movies.aggregate(
[
{
"$vectorSearch": {
"index": "vector_index",
"path": "plot_embedding",
"queryVector": myQueryVector,
"numCandidates": 150,
"limit": 20
}
},
{
$addFields: {
boostedScore: {
$multiply: [
{ $meta: "vectorSearchScore" },
{
$cond: [
{ $in: ["Adventure", "$genres"] },
1.1, // boost if "Adventure" is present
1 // no boost
]
}
]
}
}
},
{
$project: {
vectorSearchScore: { $meta: "vectorSearchScore" },
boostedScore: 1,
title: 1,
// genres: 1,
// plot: 1,
_id: 0
}
},
{
"$sort": { "boostedScore": -1 }
},
{
"$limit": 10
}
]
)
Below is a sample result.
Row | Title | Vector Search Score | Boosted Score | Boosted |
---|---|---|---|---|
1 | Godzilla | 0.7414208650588989 | 0.8155629515647889 | Yes |
2 | Forest Warrior | 0.7403745651245117 | 0.8144120216369629 | Yes |
3 | Heavy Metal | 0.7403595447540283 | 0.8143954992294312 | Yes |
4 | Forest Warrior (duplicate) | 0.7403473854064941 | 0.8143821239471436 | Yes |
5 | Clash of the Titans | 0.7400811314582825 | 0.8140892446041108 | Yes |
6 | Tales from Earthsea | 0.737678587436676 | 0.8114464461803437 | Yes |
7 | Sinbad and the Eye of the Tiger | 0.7307860851287842 | 0.8038646936416627 | Yes |
8 | The Brothers Grimm | 0.7293913960456848 | 0.8023305356502534 | Yes |
9 | Nightbreed | 0.7514783143997192 | 0.7514783143997192 | No |
10 | Troll | 0.7401143908500671 | 0.7401143908500671 | No |
As noted in the previous section, with the new criteria approach several other movies became candidates. Both Nightbreed and Troll have genres of [ 'Fantasy', 'Horror' ] and were not in the first answer set which required a genres element of “Adventure”. With the newly structured query the vector score for Nighbreed would normally put it at the top of the list. Instead, there were 8 movies that got boosted up in the answer set because they had “Adventure” as a genres value.The result is similar to the first, but with additional candidates because of the “preference” for “Adventure” rather than the “requirement” of “Adventure”.
Open AI and Voyage AI Embeddings
The sample dataset’s embeddings were generated using Open AI’s text-embedding-ada-002 model. Voyage AI is another embedding service that you could use in your own application. The company was purchased by MongoDB for future technology integration into itsDeveloper Data Platform. The Voyage AI interface is quite similar to Open AI’s. Here is a simple Open AI cURL request.
curl https://api.openai.com/v1/embeddings \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"input": "Your text string goes here",
"model": "text-embedding-3-small"
}'
For more details see https://platform.openai.com/docs/api-reference/embeddings
In a general sense, only the base URL changes. Here is a corresponding Voyage AI cURL request.
curl https://api.voyageai.com/v1/embeddings \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $VOYAGE_API_KEY" \
-d '{
"input": "Your text string goes here",
"model": "voyage-3-large",
}’
For more details see https://docs.voyageai.com/docs/embeddings. Sample output for either API is shown below.
{
"object": "list",
"data": [
{
"object": "embedding",
"index": 0,
"embedding": [0.002, -0.009, 0.078, ...]
}
],
"model": "the_model_name"
}
Next Steps
This article demonstrated that user driven criteria in MongoDB vector search can also be treated as preferences rather than strict requirements. In the MongoDB shell experiment with adjusting your search parameters and conditions and view the affects on the results. Also see how boosting percentages also affect the results.
To deepen your expertise, explore the free MongoDB University courses on vector search at https://learn.mongodb.com/learning-paths/atlas-vector-search. These courses will help you optimize your indexing strategies and unlock advanced vector search capabilities.
We welcome feedback to support@qarbine.com.
References
When applying vector search to your applications, make sure your vector index is created on the correct embedding field (e.g., plot_embedding) and that the model used to create the vectors matches the one used later to obtain the query vector.
About mFlix Movies with Embeddings
About Atlas Vector Search
Creating Vector Search Indexes
About Qarbine
Qarbine’s integration with MongoDB supports direct, native queries and preserves the flexibility of the document structure, enabling advanced analytics, vector search, and seamless collaboration across teams. This approach allows Qarbine to deliver publication-quality drill-down analysis and actionable insights, making it a strong fit for modern AI and data-driven applications.
Appendix
scaryEmbedding.js
The JavaScript set the embedding for 'scary movies with mythical villians'.
Download the file from here.
funnyEmbedding.js
The JavaScript sets the embedding for 'funny cartoons with family themes'.
Download the file from here.
Vector Utility
This is a simple node.js utility to obtain a vector query value.
const axios = require('axios');
const apiKey = 'YOUR_OPEN_AI_API_KEYi';
async function fetchEmbedding(text) {
const response = await axios.post('https://api.openai.com/v1/embeddings',
{ input: [text],
model: 'text-embedding-ada-002',
},
{ headers: {
'Content-Type': 'application/json',
Authorization: 'Bearer ' + apiKey,
},
}
);
const data = response.data.data[0];
console.log('queryVector = ', JSON.stringify(data.embedding) );
}
const inputText = 'funny cartoons with family themes';
fetchEmbedding(inputText);
You can pipe the output into a file and then load that file in the MongoDB shell.