
Enhancing Couchbase® Vector Search by Boosting Effective Scores

Overview

In this article, we explore how to use Couchbase SQL++ query features to add preference-based boosting to vector search results. Using a movies collection with embeddings, we demonstrate increasing the effective recommendation score by 10% for movies rated “PG13”. This contrasts with requiring a rating of “PG13”, which would filter out every other movie. The result set is then sorted by the boosted score, so that the user's preferences are slightly prioritized in the recommendations.

Scenario

You provide a movie recommendation service which allows users to find movies by describing what they want in natural language. By embedding movie plots as vectors, the application can match user queries to the most relevant films, even when the input is general or open-ended.

To enhance the recommendations, you want to take user preferences into account, such as a preference for movies rated PG13. The approach described below ensures users see suggestions that closely match their interests while still offering a broad and personalized selection of movies. More complex boosting criteria can be applied using various Couchbase query features and functions.

About the Vector Index

The sample dataset’s embeddings were generated using OpenAI’s text-embedding-ada-002 model. A vector search index was created with the following settings:

  

  

We can now use the SEARCH function to perform a vector search on the movies.

SELECT SEARCH_META().score AS score, rated, title, plot
FROM `travel-sample`.inventory.`movies_with_array_embedding`
WHERE
  SEARCH( movies_with_array_embedding,
    {
      "fields": ["*"],
      "knn": [
        {
          "k": 30,
          "field": "embeddingz",
          "vector": [ . . . ]
        }
      ]
    }
  )
ORDER BY score DESC
LIMIT 20

The vector value used in the query can be provided programmatically or in some other manner depending on the environment and tool in use. For example, Qarbine can dynamically determine a vector and then embed the value within the query for use at execution time.
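As a minimal sketch of the "provided programmatically" case, the Python snippet below assembles the query text with the vector inlined. The helper name `build_knn_query` and the three-element placeholder vector are hypothetical and used only for illustration; the keyspace, index name, and field name are copied from the query above. It only builds the statement text and does not call any Couchbase SDK.

```python
import json


def build_knn_query(vector, k=30, limit=20, field="embeddingz"):
    """Return SQL++ text for the vector search with the query vector inlined.

    Hypothetical helper for illustration only; it builds a string and does
    not talk to a Couchbase cluster.
    """
    search_request = {
        "fields": ["*"],
        "knn": [{"k": int(k), "field": field, "vector": list(vector)}],
    }
    lines = [
        "SELECT SEARCH_META().score AS score, rated, title, plot",
        "FROM `travel-sample`.inventory.`movies_with_array_embedding`",
        # json.dumps emits a JSON object, which is also a valid SQL++ object literal.
        "WHERE SEARCH(movies_with_array_embedding, "
        + json.dumps(search_request)
        + ")",
        "ORDER BY score DESC",
        "LIMIT " + str(int(limit)),
    ]
    return "\n".join(lines)


# Placeholder vector: a real text-embedding-ada-002 vector has 1536 dimensions.
query_text = build_knn_query([0.12, -0.03, 0.5])
print(query_text)
```

In a real application the vector would come from the same embedding model that produced the stored embeddings, so that query and document vectors are comparable.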

When you use SEARCH() in the WHERE clause, Couchbase returns up to k (here, 30) nearest neighbors, but the result order is not guaranteed unless you specify ORDER BY. Results may arrive in the order the Search Service returns them, which is often by score, but not always. SQL++ (formerly N1QL) does not enforce any order unless you ask for one.

The answer set is shown below.

| # | Title | Rated | Score | Plot (truncated) |
|---|-------|-------|-------|------------------|
| 1 | Fantasia | G | 2.608 | A collection of animated interpretations of great… |
| 2 | In Old Arizona | G | 2.365 | A charming, happy-go-lucky bandit in old Arizona… |
| 3 | The Three Caballeros | PG | 2.324 | Donald receives his birthday gifts, which includ… |
| 4 | Cinderella | PG | 2.323 | When Cinderella's cruel stepmother prevents her … |
| 5 | In Old Chicago | G | 2.300 | The O'Leary brothers -- honest Jack and roguish … |
| 6 | Red River | G | 2.280 | Dunson leads a cattle drive, the culmination of … |
| 7 | Pinocchio | G | 2.269 | A living puppet, with the help of a cricket as h… |
| 8 | Tarzan the Ape Man | R | 2.248 | A trader and his daughter set off in search of t… |
| 9 | Yellow Sky | PG13 | 2.239 | Pistol-packing tomboy, and grandfather come to d… |
| 10 | Snow White and the Seven Dwarfs | R | 2.223 | Snow White, pursued by a jealous queen, hides wi… |
| 11 | Jungle Book | PG13 | 2.215 | A boy raised by wolves tries to adapt to human v… |
| 12 | Oklahoma! | R | 2.203 | In the Oklahoma territory at the turn of the twe… |
| 13 | The Crowd Roars | PG | 2.202 | Famous motor-racing champion Joe Greer returns t… |
| 14 | House of Dracula | G | 2.202 | Count Dracula and the Wolf Man seek a cure for t… |
| 15 | Scaramouche | G | 2.201 | After Andre Moreau finds he is the secret bastar… |
| 16 | The Blue Bird | G | 2.197 | Two peasant children, Mytyl and Tyltyl, are led … |
| 17 | Shane | G | 2.195 | A weary gunfighter attempts to settle down with … |
| 18 | The Son of Kong | R | 2.190 | The men who captured the giant ape King Kong, re… |
| 19 | The Ghost Goes West | G | 2.187 | An American businessman's family convinces him t… |
| 20 | Of Human Hearts | PG | 2.181 | This is a story about family relationships, set … |

Boosting Vector Search Scores

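Here is one approach to boosting scores for a PG13 preference: compute a second column, boosted_score, that multiplies the search score by 1.1 when the movie is rated PG13, and sort by that column instead of the raw score.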

SELECT SEARCH_META().score AS score, rated, title, plot,
  -- Repeat SEARCH_META().score here rather than relying on the score alias within the same SELECT list
  CASE WHEN rated = "PG13" THEN SEARCH_META().score * 1.1 ELSE SEARCH_META().score END AS boosted_score
FROM `travel-sample`.inventory.`movies_with_array_embedding`
WHERE
  SEARCH( movies_with_array_embedding,
    {
      "fields": ["*"],
      "knn": [
        {
          "k": 30,
          "field": "embeddingz",
          "vector": [ . . . ]
        }
      ]
    }
  )
ORDER BY boosted_score DESC
LIMIT 20

The effective answer set is shown below.

| # | Title | Rated | Score | Boosted Score | Plot (truncated) |
|---|-------|-------|-------|---------------|------------------|
| 1 | Fantasia | G | 2.608 | 2.608 | A collection of animated interpretations of great… |
| 2 | Yellow Sky | PG13 | 2.239 | 2.463 | Pistol-packing tomboy, and grandfather come to d… |
| 3 | Jungle Book | PG13 | 2.215 | 2.437 | A boy raised by wolves tries to adapt to human v… |
| 4 | The Thief of Bagdad | PG13 | 2.152 | 2.368 | A recalcitrant thief vies with a duplicitous Mon… |
| 5 | In Old Arizona | G | 2.365 | 2.365 | A charming, happy-go-lucky bandit in old Arizona… |
| 6 | The Three Caballeros | PG | 2.324 | 2.324 | Donald receives his birthday gifts, which includ… |
| 7 | Cinderella | PG | 2.323 | 2.323 | When Cinderella's cruel stepmother prevents her … |
| 8 | In Old Chicago | G | 2.300 | 2.300 | The O'Leary brothers -- honest Jack and roguish … |
| 9 | Red River | G | 2.280 | 2.280 | Dunson leads a cattle drive, the culmination of … |
| 10 | Pinocchio | G | 2.269 | 2.269 | A living puppet, with the help of a cricket as h… |
| 11 | Tarzan the Ape Man | R | 2.248 | 2.248 | A trader and his daughter set off in search of t… |
| 12 | Snow White and the Seven Dwarfs | R | 2.223 | 2.223 | Snow White, pursued by a jealous queen, hides wi… |
| 13 | Oklahoma! | R | 2.203 | 2.203 | In the Oklahoma territory at the turn of the twe… |
| 14 | The Crowd Roars | PG | 2.202 | 2.202 | Famous motor-racing champion Joe Greer returns t… |
| 15 | House of Dracula | G | 2.202 | 2.202 | Count Dracula and the Wolf Man seek a cure for t… |
| 16 | Scaramouche | G | 2.201 | 2.201 | After Andre Moreau finds he is the secret bastar… |
| 17 | The Blue Bird | G | 2.197 | 2.197 | Two peasant children, Mytyl and Tyltyl, are led … |
| 18 | Shane | G | 2.195 | 2.195 | A weary gunfighter attempts to settle down with … |
| 19 | The Son of Kong | R | 2.190 | 2.190 | The men who captured the giant ape King Kong, re… |
| 20 | The Ghost Goes West | G | 2.187 | 2.187 | An American businessman's family convinces him t… |
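The same boosting rule can be sketched outside the database. The Python snippet below is an illustration only, not the article's implementation: the function name `boost` and its parameters are hypothetical, and the rows are a handful of titles and rounded scores copied from the answer sets in this article. Because the inputs are rounded, the third decimal of a boosted score may differ slightly from the table.

```python
def boost(rows, preferred_rating="PG13", factor=1.1):
    """Add a boosted_score to each row and sort by it, highest first.

    Mirrors the CASE expression: preferred ratings get score * factor,
    everything else keeps its original score.
    """
    boosted = []
    for row in rows:
        multiplier = factor if row["rated"] == preferred_rating else 1.0
        boosted.append({**row, "boosted_score": row["score"] * multiplier})
    return sorted(boosted, key=lambda r: r["boosted_score"], reverse=True)


# A few rows taken from the answer sets in this article (scores are rounded).
rows = [
    {"title": "Fantasia", "rated": "G", "score": 2.608},
    {"title": "In Old Arizona", "rated": "G", "score": 2.365},
    {"title": "Yellow Sky", "rated": "PG13", "score": 2.239},
    {"title": "Jungle Book", "rated": "PG13", "score": 2.215},
    {"title": "The Thief of Bagdad", "rated": "PG13", "score": 2.152},
]

ranked = boost(rows)
for rank, row in enumerate(ranked, start=1):
    print(rank, row["title"], row["rated"], round(row["boosted_score"], 3))
```

Doing the boost in the query itself, as shown earlier, is usually preferable because the sort and LIMIT then apply to the boosted values on the server; a client-side version like this one can only reorder the rows it has already received.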

PG13 Preferences Impact

Boosting PG13 movies by 10% moved them noticeably higher in the ranking. A boosted PG13 movie overtakes any R, G, or PG movie whose original score is less than 10% higher than its own.

The #1 spot ("Fantasia", G) is unchanged because its score (2.608) is still higher than the highest boosted PG13 score (2.463). Let’s compare the two answer sets and highlight the row ordering changes due to the 10% score boost for "PG13" movies.

The largest jumps were:

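| Title | Original Rank | Boosted Rank | Score | Boosted Score |
|-------|---------------|--------------|-------|---------------|
| Yellow Sky | 9 | 2 | 2.239 | 2.463 |
| Jungle Book | 11 | 3 | 2.215 | 2.437 |
| The Thief of Bagdad | >20 | 4 | 2.152 | 2.368 |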

Non-PG13 movies that dropped include:

| Title | Original Rank | Boosted Rank |
|-------|---------------|--------------|
| In Old Arizona | 2 | 5 |
| The Three Caballeros | 3 | 6 |
| Cinderella | 4 | 7 |
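Rank movements like these can be derived mechanically from the two orderings. The following Python sketch assumes you have both title lists in hand; the function name `rank_changes` is hypothetical, and the lists are the first 11 titles of the original answer set and the first 7 of the boosted one. A title missing from the supplied original list is reported with an original rank of None (the article's table shows such a title as >20).

```python
def rank_changes(original_titles, boosted_titles):
    """Return (title, original_rank, boosted_rank) for every title that moved.

    Ranks are 1-based. original_rank is None when the title does not appear
    in original_titles at all.
    """
    original_rank = {title: i for i, title in enumerate(original_titles, start=1)}
    changes = []
    for new_rank, title in enumerate(boosted_titles, start=1):
        old_rank = original_rank.get(title)
        if old_rank != new_rank:
            changes.append((title, old_rank, new_rank))
    return changes


# First 11 titles of the original answer set, in order.
original = [
    "Fantasia", "In Old Arizona", "The Three Caballeros", "Cinderella",
    "In Old Chicago", "Red River", "Pinocchio", "Tarzan the Ape Man",
    "Yellow Sky", "Snow White and the Seven Dwarfs", "Jungle Book",
]
# First 7 titles of the boosted answer set, in order.
boosted = [
    "Fantasia", "Yellow Sky", "Jungle Book", "The Thief of Bagdad",
    "In Old Arizona", "The Three Caballeros", "Cinderella",
]

changes = rank_changes(original, boosted)
for title, old_rank, new_rank in changes:
    print(title, old_rank, "->", new_rank)
```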

Conclusion

The approach described here is an example you can adapt to apply preferences or other weighting criteria within your own Couchbase applications.

References

https://docs.couchbase.com/server/current/vector-search/vector-search.html