Enhancing MongoDB® Atlas Vector Search by Boosting Effective Scores

Overview

In this article, we explore how to use MongoDB’s aggregation pipeline to add preference-based score boosting to vector search. Using the embedded_movies collection from the mFlix sample dataset, we increase the effective recommendation score by 10% for movies with a preferred genre (such as "Adventure"), rather than requiring that genre. The result set is then sorted by this boosted score, so personal preferences are slightly prioritized in the recommendations.

Qarbine is a MongoDB AI Applications Program (MAAP) partner and uses MongoDB internally to manage its analytic suite’s component catalog. The suite also integrates with a wide range of AI services. The techniques in this article are being researched for a future release.

Scenario

You provide a movie recommendation service which allows users to find movies by describing what they want in natural language. You chose MongoDB’s Developer Data Platform™ because it includes powerful built-in vector search capabilities. By embedding movie plots as vectors, the application can match user queries to the most relevant films, even when the input is general or open-ended.

To enhance the recommendations, you also want to consider user preferences, such as a liking for adventure movies or PG-rated films. These are preferences, not strict requirements: matching movies should rank higher, but non-matching movies can still appear. This approach ensures users see suggestions that closely match their interests while still offering a broad, personalized selection of movies.

MongoDB offers robust querying capabilities, but within the aggregation pipeline, you cannot directly modify the internal vector search score (retrieved via { $meta: "vectorSearchScore" }). However, you can use aggregation pipeline stages to create a custom scoring mechanism and then sort results by this new field, effectively implementing your own score boosting logic.
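As a plain JavaScript illustration (not code that runs inside MongoDB), the effective score described here is the raw vector score multiplied by a boost factor when the preferred value is present. The helper name boostedScore and its defaults ("Adventure", 1.1) are assumptions chosen to match this article's example.

```javascript
// Plain-JavaScript mirror of the pipeline expression
//   { $multiply: [ { $meta: "vectorSearchScore" },
//                  { $cond: [ { $in: [genre, "$genres"] }, factor, 1 ] } ] }
// boostedScore is a hypothetical helper for illustration only.
function boostedScore(vectorScore, genres, preferredGenre = "Adventure", factor = 1.1) {
  // Treat a missing genres field as an empty array so the check never throws.
  const hasPreferred = (genres ?? []).includes(preferredGenre);
  return vectorScore * (hasPreferred ? factor : 1);
}

// Godzilla has "Adventure", so its score is boosted (about 0.8156).
console.log(boostedScore(0.7414208650588989, ["Action", "Adventure", "Sci-Fi"]));
// Nightbreed does not, so its score is multiplied by 1 and stays the same.
console.log(boostedScore(0.7514783143997192, ["Fantasy", "Horror"]));
```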

Data Background

In MongoDB the sample_mflix database has information about movies and movie theaters. The embedded_movies collection contains a curated selection of movies from the original movies collection, focusing specifically on the Western, Action, and Fantasy genres. Each document in embedded_movies includes a new plot_embedding field, which is generated using OpenAI’s text-embedding-ada-002 model. These embeddings enable powerful semantic search capabilities when used with MongoDB Atlas Vector Search. Below is a table of the field names in this collection.

Field Name	Field Name (continued)	Field Name (continued)
_id	plot	genres
runtime	cast	num_mflix_comments
poster	title	fullplot
languages	released	directors
writers	awards	lastupdated
year	imdb	countries
type	tomatoes	plot_embedding

More details on the embedded_movies collection can be found at
https://www.mongodb.com/docs/atlas/sample-data/sample-mflix/#sample_mflix.embedded_movies

Query Alternatives and Tradeoffs

There are multiple approaches to prioritize documents with matching genre elements in a MongoDB Atlas Vector Search query, effectively boosting their scores in the results. While Atlas Vector Search primarily ranks results by the vector similarity between the query embedding and document embeddings, you can use the filter option in the $vectorSearch stage to restrict results to only those with a specified genre. However, this acts as a filter rather than a score booster—it narrows the candidate set but does not directly influence the similarity score.

If you wish to boost the score for documents with a matching genre, you cannot directly modify the similarity score within the $vectorSearch stage. Instead, you can post-process the results using aggregation pipeline stages to apply a custom score boost or reorder results based on your own criteria.

1. Pre-filter by Genre in $vectorSearch

The following query only returns movies that have the desired genre (e.g., "Adventure") in their genres array:

db.embedded_movies.aggregate([
  {
    "$vectorSearch": {
      "index": "vector_index",
      "path": "plot_embedding",
      "queryVector": [ /* your query vector here */ ],
      "filter": { "genres": "Adventure" },  // hard requirement: only "Adventure" movies are candidates
      "numCandidates": 100,
      "limit": 10
    }
  },
  {
    $project: {
      vectorSearchScore: { $meta: "vectorSearchScore" },
      title: 1,
      genres: 1,   // included so the output shows the genres column below
      // plot: 1,
      _id: 0
    }
  }
])

A sample result is shown below.

Row	Title	Genres	Vector Search Score
1	Godzilla	Action, Adventure, Sci-Fi	0.7414208650588989
2	Forest Warrior	Action, Adventure, Comedy	0.7403745651245117
3	Heavy Metal	Animation, Adventure, Fantasy	0.7403595447540283
4	Forest Warrior	Action, Adventure, Comedy	0.7403473854064941
5	Clash of the Titans	Adventure, Family, Fantasy	0.7400811314582825
6	Tales from Earthsea	Animation, Adventure, Fantasy	0.737678587436676
7	Sinbad and the Eye of the Tiger	Action, Adventure, Drama	0.7307860851287842
8	The Brothers Grimm	Action, Adventure, Comedy	0.7293913960456848
9	Vampire Effect	Action, Adventure, Comedy	0.7279759049415588
10	Batman	Action, Adventure	0.7276626229286194
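If you issue this query from code, it can help to build the pipeline with the required genre as a parameter. The sketch below is a hypothetical helper (buildPrefilterPipeline is not a MongoDB API); it assumes the vector_index name and plot_embedding path used throughout this article and returns the same two stages as the pre-filter query above. Because it is plain JavaScript, it can also be pasted into mongosh.

```javascript
// Hypothetical helper that assembles the pre-filter pipeline.
// Assumes the "vector_index" index and the plot_embedding path from this article.
function buildPrefilterPipeline(queryVector, requiredGenre, { numCandidates = 100, limit = 10 } = {}) {
  return [
    {
      $vectorSearch: {
        index: "vector_index",
        path: "plot_embedding",
        queryVector,
        // Hard requirement: documents without requiredGenre are never candidates.
        // genres must be indexed with "type": "filter" for this to work.
        filter: { genres: requiredGenre },
        numCandidates,
        limit
      }
    },
    {
      $project: {
        _id: 0,
        title: 1,
        genres: 1,
        vectorSearchScore: { $meta: "vectorSearchScore" }
      }
    }
  ];
}

// Usage in mongosh, once a query vector is loaded:
// db.embedded_movies.aggregate(buildPrefilterPipeline(myQueryVector, "Adventure"));
```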

2. Combine Vector Search with Aggregation to Boost Matching Genres

The following query retrieves relevant movies, but gives a score boost to those with the preferred (vs. required) genre:

db.embedded_movies.aggregate([
  {
    "$vectorSearch": {
      "index": "vector_index",
      "path": "plot_embedding",
      "queryVector": [ /* your query vector here */ ],
      "numCandidates": 150,
      "limit": 20
    }
  },
  {
    $addFields: {
      boostedScore: {
        $multiply: [
          { $meta: "vectorSearchScore" },
          {
            $cond: [
              { $in: ["Adventure", "$genres"] },
              1.1, // boost if "Adventure" is present
              1    // no boost
            ]
          }
        ]
      }
    }
  },
  {
    $project: {
      vectorSearchScore: { $meta: "vectorSearchScore" },
      boostedScore: 1,
      title: 1,
      // genres: 1,
      // plot: 1,
      _id: 0
    }
  },
  { "$sort": { "boostedScore": -1 } },
  { "$limit": 10 }
])

Movies no longer need "Adventure" in their genres array to be candidates for the answer set. This pipeline gives movies with "Adventure" in their genres array a boosted score, then sorts and limits the results by that boosted score. You can replace "Adventure" with any other genre or adjust the condition as needed.
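The same idea can be parameterized so that the preferred genre and the boost factor are arguments. The sketch below is a hypothetical buildBoostPipeline helper, assuming the index and field names from this article. It also wraps $genres in $ifNull, an addition not in the query above, so that a document without a genres field simply gets no boost instead of causing an $in error.

```javascript
// Hypothetical builder for the preference-boosting pipeline.
// preferredGenre and factor are parameters so you can swap "Adventure" or change the 10% boost.
function buildBoostPipeline(queryVector, preferredGenre, {
  factor = 1.1,         // 1.1 = +10% effective score for preferred documents
  numCandidates = 150,  // nearest-neighbor candidates considered by $vectorSearch
  searchLimit = 20,     // documents passed on to the re-ranking stages
  finalLimit = 10       // documents returned after re-ranking
} = {}) {
  return [
    { $vectorSearch: { index: "vector_index", path: "plot_embedding", queryVector, numCandidates, limit: searchLimit } },
    {
      $addFields: {
        boostedScore: {
          $multiply: [
            { $meta: "vectorSearchScore" },
            // $ifNull supplies an empty array when genres is missing or null.
            { $cond: [{ $in: [preferredGenre, { $ifNull: ["$genres", []] }] }, factor, 1] }
          ]
        }
      }
    },
    { $project: { _id: 0, title: 1, boostedScore: 1, vectorSearchScore: { $meta: "vectorSearchScore" } } },
    { $sort: { boostedScore: -1 } },
    { $limit: finalLimit }
  ];
}

// Usage in mongosh: db.embedded_movies.aggregate(buildBoostPipeline(myQueryVector, "Adventure"));
```

One design consequence worth noting: the $sort and $limit stages can only re-rank the searchLimit documents that $vectorSearch returns, so a boost never brings in a movie outside that pool. Raising searchLimit (and numCandidates) widens the pool that preferences can act on.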

Below is a sample boosted result.

Row	Title	Vector Search Score	Boosted Score	Boosted
1	Godzilla	0.7414208650588989	0.8155629515647889	Yes
2	Forest Warrior	0.7403745651245117	0.8144120216369629	Yes
3	Heavy Metal	0.7403595447540283	0.8143954992294312	Yes
4	Forest Warrior (duplicate)	0.7403473854064941	0.8143821239471436	Yes
5	Clash of the Titans	0.7400811314582825	0.8140892446041108	Yes
6	Tales from Earthsea	0.737678587436676	0.8114464461803437	Yes
7	Sinbad and the Eye of the Tiger	0.7307860851287842	0.8038646936416627	Yes
8	The Brothers Grimm	0.7293913960456848	0.8023305356502534	Yes
9	Nightbreed	0.7514783143997192	0.7514783143997192	No
10	Troll	0.7401143908500671	0.7401143908500671	No

With the new criteria approach, several additional movies (such as Nightbreed and Troll, which have genres of ["Fantasy", "Horror"]) became eligible candidates for the result set. These films were previously excluded from the first answer set, which required "Adventure" as a mandatory genre.

Under the new, preference-based query, Nightbreed would typically appear at the top of the list based on its vector score alone. However, because the updated query boosts scores for movies with "Adventure" in their genres, eight movies moved higher in the rankings. This approach broadens the candidate pool and delivers results that reflect a preference for "Adventure," rather than making it a strict requirement.
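To check this reasoning without a cluster, the following plain JavaScript sketch re-ranks the ten rows from the sample boosted result three ways: as a hard filter, by raw score, and with the 10% preference. The scores are copied from that table and the adventure flags follow its Boosted column. Because the sketch sees only these ten rows, its filtered list has eight entries, whereas the real pre-filter query fills its limit of 10 from all "Adventure" movies.

```javascript
// Rows from the sample boosted result; adventure marks movies whose genres include "Adventure".
const candidates = [
  { title: "Godzilla", score: 0.7414208650588989, adventure: true },
  { title: "Forest Warrior", score: 0.7403745651245117, adventure: true },
  { title: "Heavy Metal", score: 0.7403595447540283, adventure: true },
  { title: "Forest Warrior (duplicate)", score: 0.7403473854064941, adventure: true },
  { title: "Clash of the Titans", score: 0.7400811314582825, adventure: true },
  { title: "Tales from Earthsea", score: 0.737678587436676, adventure: true },
  { title: "Sinbad and the Eye of the Tiger", score: 0.7307860851287842, adventure: true },
  { title: "The Brothers Grimm", score: 0.7293913960456848, adventure: true },
  { title: "Nightbreed", score: 0.7514783143997192, adventure: false },
  { title: "Troll", score: 0.7401143908500671, adventure: false }
];

// Rank titles by descending effective score, where scoreFn computes that score per row.
function rank(rows, scoreFn) {
  return rows
    .map((row) => ({ title: row.title, effective: scoreFn(row) }))
    .sort((a, b) => b.effective - a.effective)
    .map((row) => row.title);
}

// Requirement: drop non-Adventure rows, then rank by raw score (like the filter option).
const required = rank(candidates.filter((r) => r.adventure), (r) => r.score);
// No preference: raw vector score only.
const unboosted = rank(candidates, (r) => r.score);
// Preference: multiply by 1.1 when "Adventure" is present (like the $cond boost).
const preferred = rank(candidates, (r) => r.score * (r.adventure ? 1.1 : 1));

console.log("filter :", required.join(", "));
console.log("raw    :", unboosted.join(", "));
console.log("boosted:", preferred.join(", "));
```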

Summary Table

Approach	Description	Effect on Results
Pre-filter in $vectorSearch	Only searches documents with a matching genre. Requires a "type": "filter" entry for the genres path in the vector search index definition.	Only documents with a matching genre are returned.
Boost with aggregation	Adds a custom score boost to documents with a matching genre.	All high-scoring documents are returned, with matching genres ranked higher.

Other Options

While genre is a practical field for filtering or boosting results in the movies dataset, you can apply the same logic to any field relevant to your application. For instance, you might boost movies with high IMDb ratings (using the imdb.rating field) or filter results by official movie ratings such as "PG" or "R". Below is an example $cond expression that boosts the score for movies with an IMDb rating of 8 or higher:

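$cond: [
    {
      $and: [
        { $isNumber: "$imdb.rating" },  // guard: any string value would compare greater than 8
        { $gte: ["$imdb.rating", 8] }
      ]
    },
    1.1, // Boost by 10% (or any factor you prefer)
    1    // No boost
  ]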
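To combine several soft preferences (for example a preferred genre and a high IMDb rating), one option is to multiply one $cond factor per preference, since $multiply accepts any number of arguments. The helper buildMultiBoost and the 1.05 weight are hypothetical choices for illustration, not values from this article.

```javascript
// Hypothetical helper: builds a boostedScore expression from several soft preferences.
// Each preference is { condition, factor }; the factors of all matching preferences multiply together.
function buildMultiBoost(preferences) {
  return {
    $multiply: [
      { $meta: "vectorSearchScore" },
      ...preferences.map(({ condition, factor }) => ({ $cond: [condition, factor, 1] }))
    ]
  };
}

const preferences = [
  { condition: { $in: ["Adventure", { $ifNull: ["$genres", []] }] }, factor: 1.1 },
  {
    condition: { $and: [{ $isNumber: "$imdb.rating" }, { $gte: ["$imdb.rating", 8] }] },
    factor: 1.05 // illustrative, smaller weight for the rating preference
  }
];

// In mongosh, use it in place of the boostedScore expression:
// { $addFields: { boostedScore: buildMultiBoost(preferences) } }
console.log(JSON.stringify(buildMultiBoost(preferences), null, 2));
```

A movie matching both preferences would receive 1.1 × 1.05 = 1.155 times its raw score, so the weights should be chosen with that compounding in mind.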

Try it Out Yourself

Prerequisites

The minimum requirement is a MongoDB Atlas account. The free tier is sufficient, and you can sign up at https://www.mongodb.com/cloud/atlas/register

An OpenAI API key is needed only if you want to generate your own query vectors. The appendix contains two ready-to-use query vectors and a utility to generate an embedding.

Load the Sample Data

Review the MongoDB documentation subsections at

https://www.mongodb.com/docs/atlas/sample-data/sample-mflix/#std-label-sample-mflix
https://www.mongodb.com/docs/atlas/sample-data/sample-mflix/#sample_mflix.embedded_movies

See the steps at the following link to load the mFlix data set with embeddings.
https://www.mongodb.com/docs/atlas/sample-data/

Once loading finishes, the sample_mflix database, including the embedded_movies collection, is listed in the Atlas UI.

Create the Search Index

After the dataset is loaded, select

Choose the tab highlighted below

Click

Choose the right hand option as shown below.

Leave the default index name (vector_index) for this exercise. If an index with that name already exists, use a different name and adjust "index": "vector_index" in the sample queries accordingly.

For the

Choose

Choose the right hand option as shown below.

Click

Enter the index definition:

{
  "fields": [
    {
      "type": "vector",
      "path": "plot_embedding",
      "numDimensions": 1536,
      "similarity": "euclidean"
    },
    {
      "type": "filter",
      "path": "genres"
    }
  ]
}

Because the first example query filters on genres, the second entry in the definition ("type": "filter" for the genres path) is required; the vector entry alone does not allow filtering on genres.

Click

Review the information and then click

Wait for the index to become ready to use
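The rule behind the "type": "filter" entry can be expressed as a quick pre-flight check: every field referenced in a $vectorSearch filter should be declared as a filter path in the index definition. missingFilterPaths below is a hypothetical helper, not part of any MongoDB API, and it only understands plain field conditions combined with $and or $or.

```javascript
// Collect the field paths referenced by a $vectorSearch filter document.
function filterPaths(filter) {
  const paths = new Set();
  for (const [key, value] of Object.entries(filter)) {
    if (key === "$and" || key === "$or") {
      // Logical operators hold an array of sub-filters; collect their paths recursively.
      for (const sub of value) filterPaths(sub).forEach((p) => paths.add(p));
    } else if (!key.startsWith("$")) {
      paths.add(key);
    }
  }
  return paths;
}

// Return filter paths that the index definition does not declare with "type": "filter".
function missingFilterPaths(indexDefinition, filter) {
  const declared = new Set(
    indexDefinition.fields.filter((f) => f.type === "filter").map((f) => f.path)
  );
  return [...filterPaths(filter)].filter((p) => !declared.has(p));
}

// The index definition used in this article.
const indexDefinition = {
  fields: [
    { type: "vector", path: "plot_embedding", numDimensions: 1536, similarity: "euclidean" },
    { type: "filter", path: "genres" }
  ]
};

console.log(missingFilterPaths(indexDefinition, { genres: "Adventure" })); // []
console.log(missingFilterPaths(indexDefinition, { $and: [{ genres: "Adventure" }, { year: { $gte: 2000 } }] })); // [ 'year' ]
```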

Query Your Data

Connect to your Atlas cluster with the MongoDB Shell (mongosh).

Navigate to the sample database.

use sample_mflix

Review a sample document by running

db.embedded_movies.findOne( { } )

The appendix provides two precomputed query vectors that you can save as local files. In the MongoDB shell, load one of them by running

load('scaryEmbedding.js')

This defines the variable “myQueryVector” and it is available for your queries in the shell.
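Before using the loaded vector, a quick sanity check can catch a wrong or truncated file. This hypothetical checkQueryVector function (runnable in mongosh or Node.js) assumes the 1536 dimensions set by numDimensions in the index definition. It cannot detect a vector of the right length produced by a different model, so the model still has to match text-embedding-ada-002.

```javascript
// Hypothetical sanity check for a loaded query vector before running $vectorSearch.
// 1536 matches numDimensions in this article's index definition.
function checkQueryVector(vector, expectedDims = 1536) {
  if (!Array.isArray(vector)) {
    throw new TypeError("query vector must be an array of numbers");
  }
  if (vector.length !== expectedDims) {
    throw new RangeError(`expected ${expectedDims} dimensions, got ${vector.length}`);
  }
  if (!vector.every((x) => Number.isFinite(x))) {
    throw new TypeError("query vector contains non-numeric values");
  }
  return true;
}

// In mongosh, after load('scaryEmbedding.js'):
// checkQueryVector(myQueryVector);
```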

In the vector queries above replace

      "queryVector": [ /* your query vector here */ ],

with a reference to the variable

      "queryVector": myQueryVector,

For example,

db.embedded_movies.aggregate([
  {
    "$vectorSearch": {
      "index": "vector_index",
      "path": "plot_embedding",
      "queryVector": myQueryVector,
      "numCandidates": 150,
      "limit": 20
    }
  },
  {
    $addFields: {
      boostedScore: {
        $multiply: [
          { $meta: "vectorSearchScore" },
          {
            $cond: [
              { $in: ["Adventure", "$genres"] },
              1.1, // boost if "Adventure" is present
              1    // no boost
            ]
          }
        ]
      }
    }
  },
  {
    $project: {
      vectorSearchScore: { $meta: "vectorSearchScore" },
      boostedScore: 1,
      title: 1,
      // genres: 1,
      // plot: 1,
      _id: 0
    }
  },
  { "$sort": { "boostedScore": -1 } },
  { "$limit": 10 }
])

Below is a sample result.

Row	Title	Vector Search Score	Boosted Score	Boosted
1	Godzilla	0.7414208650588989	0.8155629515647889	Yes
2	Forest Warrior	0.7403745651245117	0.8144120216369629	Yes
3	Heavy Metal	0.7403595447540283	0.8143954992294312	Yes
4	Forest Warrior (duplicate)	0.7403473854064941	0.8143821239471436	Yes
5	Clash of the Titans	0.7400811314582825	0.8140892446041108	Yes
6	Tales from Earthsea	0.737678587436676	0.8114464461803437	Yes
7	Sinbad and the Eye of the Tiger	0.7307860851287842	0.8038646936416627	Yes
8	The Brothers Grimm	0.7293913960456848	0.8023305356502534	Yes
9	Nightbreed	0.7514783143997192	0.7514783143997192	No
10	Troll	0.7401143908500671	0.7401143908500671	No

As noted in the previous section, the preference-based query makes several additional movies candidates. Both Nightbreed and Troll have genres of [ 'Fantasy', 'Horror' ] and were not in the first answer set, which required a genres element of “Adventure”. Based on its vector score alone, Nightbreed would be at the top of the list. Instead, eight movies were boosted above it because they have “Adventure” as a genres value. The result is similar to the first query’s, but with additional candidates, because “Adventure” is now a preference rather than a requirement.

OpenAI and Voyage AI Embeddings

The sample dataset’s embeddings were generated using OpenAI’s text-embedding-ada-002 model. Voyage AI is another embedding service that you could use in your own application; MongoDB acquired the company and plans to integrate its technology into the Developer Data Platform. The Voyage AI interface is quite similar to OpenAI’s. Here is a simple OpenAI cURL request.

curl https://api.openai.com/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "input": "Your text string goes here",
    "model": "text-embedding-ada-002"
  }'

For more details see https://platform.openai.com/docs/api-reference/embeddings

In general, only the base URL, the API key, and the model name change. Here is a corresponding Voyage AI cURL request.

curl https://api.voyageai.com/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $VOYAGE_API_KEY" \
  -d '{
    "input": "Your text string goes here",
    "model": "voyage-3-large"
  }'

For more details, see https://docs.voyageai.com/docs/embeddings. Abridged sample output for either API is shown below.

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.002, -0.009, 0.078, ...]
    }
  ],
  "model": "the_model_name"
}
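Since both services accept the same request shape and return the embedding at data[0].embedding, a single client can switch providers. The sketch below is a hypothetical Node.js 18+ helper using the built-in fetch; the provider table, function names, and environment variable names are assumptions. Note that querying plot_embedding requires text-embedding-ada-002 vectors; a Voyage AI vector would need its own embedded field and vector index.

```javascript
// Hypothetical provider table: only the URL and the key's environment variable differ.
const PROVIDERS = {
  openai: { url: "https://api.openai.com/v1/embeddings", keyVar: "OPENAI_API_KEY" },
  voyage: { url: "https://api.voyageai.com/v1/embeddings", keyVar: "VOYAGE_API_KEY" }
};

// Build the fetch() arguments for either provider; the JSON body has the same shape for both.
function buildEmbeddingRequest(provider, text, model, apiKey) {
  const { url } = PROVIDERS[provider];
  return {
    url,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
      body: JSON.stringify({ input: text, model })
    }
  };
}

// Both APIs return { object: "list", data: [{ embedding: [...] }], ... }.
function extractEmbedding(responseJson) {
  return responseJson.data[0].embedding;
}

// Performs the network call; not invoked here.
async function embed(provider, text, model) {
  const apiKey = process.env[PROVIDERS[provider].keyVar];
  const { url, init } = buildEmbeddingRequest(provider, text, model, apiKey);
  const response = await fetch(url, init);
  if (!response.ok) throw new Error(`${provider} request failed: HTTP ${response.status}`);
  return extractEmbedding(await response.json());
}

// Example:
// embed("openai", "funny cartoons with family themes", "text-embedding-ada-002").then((v) => console.log(v.length));
```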

Next Steps

This article demonstrated that user-driven criteria in MongoDB vector search can be treated as preferences rather than strict requirements. In the MongoDB shell, experiment with different search parameters and conditions and observe their effect on the results. Also try different boost percentages to see how they change the ranking.
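As a starting point for experimenting with boost percentages, the following plain JavaScript sketch computes, for the scores in the sample boosted result, the smallest factor that lifts each "Adventure" movie above Nightbreed. The numbers apply only to that sample query vector and to these rows; in a live query, other candidates in the 20-document pool could also shift.

```javascript
// Raw vector scores from the sample boosted result.
const nightbreedScore = 0.7514783143997192;
const adventureScores = {
  "Godzilla": 0.7414208650588989,
  "Forest Warrior": 0.7403745651245117,
  "Heavy Metal": 0.7403595447540283,
  "Forest Warrior (duplicate)": 0.7403473854064941,
  "Clash of the Titans": 0.7400811314582825,
  "Tales from Earthsea": 0.737678587436676,
  "Sinbad and the Eye of the Tiger": 0.7307860851287842,
  "The Brothers Grimm": 0.7293913960456848
};

// preferredScore * factor > rivalScore holds exactly when factor > rivalScore / preferredScore.
function breakEvenFactor(preferredScore, rivalScore) {
  return rivalScore / preferredScore;
}

for (const [title, score] of Object.entries(adventureScores)) {
  const pct = (breakEvenFactor(score, nightbreedScore) - 1) * 100;
  console.log(`${title}: boost must exceed ${pct.toFixed(2)}%`);
}

// The lowest-scoring movie needs the largest factor; exceeding it lifts all eight above Nightbreed.
const requiredFactor = Math.max(
  ...Object.values(adventureScores).map((s) => breakEvenFactor(s, nightbreedScore))
);
console.log(`factor needed for all eight: ${requiredFactor.toFixed(4)}`);
```

In this sample, The Brothers Grimm needs a boost of just over 3%, so any larger boost places all eight "Adventure" movies ahead of Nightbreed, and the 10% boost does so with room to spare. A 2% boost, by contrast, would leave Sinbad and the Eye of the Tiger and The Brothers Grimm below it.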

To deepen your expertise, explore the free MongoDB University courses on vector search at https://learn.mongodb.com/learning-paths/atlas-vector-search. These courses will help you optimize your indexing strategies and unlock advanced vector search capabilities.

We welcome feedback at support@qarbine.com.

References

When applying vector search to your applications, make sure your vector index is created on the correct embedding field (e.g., plot_embedding) and that the model used to create the vectors matches the one used later to obtain the query vector.

About mFlix Movies with Embeddings

https://huggingface.co/datasets/MongoDB/embedded_movies

About Atlas Vector Search

https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-overview/

Creating Vector Search Indexes

https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-type/#create-an-atlas-vector-search-index

About Qarbine

Qarbine’s integration with MongoDB supports direct, native queries and preserves the flexibility of the document structure, enabling advanced analytics, vector search, and seamless collaboration across teams. This approach allows Qarbine to deliver publication-quality drill-down analysis and actionable insights, making it a strong fit for modern AI and data-driven applications.

Appendix

scaryEmbedding.js

This JavaScript file sets the embedding for 'scary movies with mythical villains'.
Download the file from here.

funnyEmbedding.js

This JavaScript file sets the embedding for 'funny cartoons with family themes'.

Download the file from here.

Vector Utility

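This is a simple Node.js utility that obtains a query vector from OpenAI. It requires the axios package (npm install axios) and reads the API key from the OPENAI_API_KEY environment variable.

// Requires: npm install axios
const axios = require('axios');

// Read the key from the environment instead of hard-coding it in the script.
const apiKey = process.env.OPENAI_API_KEY;

async function fetchEmbedding(text) {
  const response = await axios.post(
    'https://api.openai.com/v1/embeddings',
    // Use the same model that produced plot_embedding in the sample data.
    { input: [text], model: 'text-embedding-ada-002' },
    {
      headers: {
        'Content-Type': 'application/json',
        Authorization: 'Bearer ' + apiKey,
      },
    }
  );
  const data = response.data.data[0];
  // Emit a statement that defines myQueryVector when the output file is load()ed in mongosh.
  console.log('myQueryVector = ' + JSON.stringify(data.embedding) + ';');
}

const inputText = 'funny cartoons with family themes';
fetchEmbedding(inputText).catch((err) => {
  // Errors go to stderr so they do not end up in a piped output file.
  console.error('Embedding request failed:', err.message);
  process.exitCode = 1;
});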

You can pipe the output into a file and then load that file in the MongoDB shell.
