Enhancing Couchbase® Vector Search by Boosting Effective Scores
Overview
In this article, we explore how to use Couchbase SQL++ query features to apply preference-based boosting alongside vector search. Using a movies collection with embeddings, we increase the effective recommendation score by 10% for movies rated “PG13”, rather than requiring a “PG13” rating as a hard filter. The result set is then sorted by this boosted score, so personalized preferences are slightly prioritized in the recommendations while movies with other ratings remain eligible.
Scenario
You provide a movie recommendation service which allows users to find movies by describing what they want in natural language. By embedding movie plots as vectors, the application can match user queries to the most relevant films, even when the input is general or open-ended.
To improve the recommendations, you want to take user preferences into account, such as a preference for movies rated PG13. The approach described below lets users see suggestions that closely match their interests while still offering a broad and personalized selection of movies. More complex boosting criteria can be applied using other Couchbase query features and functions.
About the Vector Index
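The sample dataset’s embeddings were generated with OpenAI’s text-embedding-ada-002 model, which produces 1536-dimensional vectors. A vector search index was created over the embeddingz field (the field referenced in the queries below) so that the Search Service can answer k-nearest-neighbor (kNN) queries; the index's vector dimension must match the 1536 values produced by the model.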
Standard Vector Search
We can now use the SEARCH function to perform a vector search on the movies.
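```sql
SELECT SEARCH_META().score AS score, rated, title, plot
FROM `travel-sample`.inventory.`movies_with_array_embedding`
-- kNN search: up to 30 nearest neighbors of the query vector in the embeddingz field
WHERE SEARCH(movies_with_array_embedding,
  {
    "fields": ["*"],
    "knn": [
      {
        "k": 30,
        "field": "embeddingz",
        "vector": [ . . . ]
      }
    ]
  }
)
-- sort the candidates by search score and keep the best 20
ORDER BY score DESC
LIMIT 20
```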
The query vector can be supplied programmatically or in some other way, depending on the environment and tools in use. For example, Qarbine can determine a vector dynamically and embed it in the query text at execution time.
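As a rough illustration of embedding the vector into the query text at execution time, the Python sketch below builds the query above as a string. build_knn_query is a hypothetical helper, not part of Couchbase or Qarbine; the query vector is assumed to come from the same model used for the stored plots (text-embedding-ada-002), and the call that produces it is omitted. Because the SEARCH() request object is plain JSON, json.dumps can serialize it directly into the statement.

```python
import json
from typing import Sequence


def build_knn_query(query_vector: Sequence[float], k: int = 30, limit: int = 20) -> str:
    """Hypothetical helper: return the kNN SQL++ statement with the vector inlined."""
    if not query_vector:
        raise ValueError("query_vector must contain at least one value")
    # The search request is plain JSON, so json.dumps yields a valid
    # SQL++ object literal for the second argument of SEARCH().
    search_request = {
        "fields": ["*"],
        "knn": [
            {
                "k": int(k),
                "field": "embeddingz",
                "vector": [float(x) for x in query_vector],
            }
        ],
    }
    return (
        "SELECT SEARCH_META().score AS score, rated, title, plot\n"
        "FROM `travel-sample`.inventory.`movies_with_array_embedding`\n"
        f"WHERE SEARCH(movies_with_array_embedding, {json.dumps(search_request)})\n"
        "ORDER BY score DESC\n"
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    # A toy 3-value vector; real ada-002 embeddings have 1536 values.
    print(build_knn_query([0.1, -0.2, 0.3]))
```

With a real 1536-value embedding the statement text becomes long, which is the trade-off of inlining the vector rather than passing it separately.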
When SEARCH() is used in the WHERE clause, Couchbase returns up to k (here, 30) nearest neighbors, but their order is not guaranteed unless you specify ORDER BY. The rows may arrive in the order the Search Service returns them, which is often by score but not always, because SQL++ (formerly N1QL) does not impose an order unless the query requests one. Here, ORDER BY score DESC sorts the candidates and LIMIT 20 keeps the top 20 of them.
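To make the interplay of k, ORDER BY, and LIMIT concrete, here is a small in-memory Python sketch. It is not how the Search Service works internally: it uses brute-force dot-product similarity as a stand-in for the index's similarity metric (an assumption; the real metric is set in the index definition), and knn_hits deliberately returns its hits sorted by document ID to mimic a candidate set with no score order. Only the explicit sort in order_by_score guarantees score order.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

Hit = Tuple[str, float]  # (document ID, similarity score)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot-product similarity (the assumed metric for this sketch)."""
    return sum(x * y for x, y in zip(a, b))


def knn_hits(query: Sequence[float], docs: Dict[str, Sequence[float]], k: int) -> List[Hit]:
    """Return up to k nearest neighbors, deliberately NOT in score order."""
    scored = ((doc_id, dot(query, vec)) for doc_id, vec in docs.items())
    nearest = heapq.nlargest(k, scored, key=lambda hit: hit[1])
    # Like a WHERE-clause SEARCH(), the candidate set carries no ordering guarantee.
    return sorted(nearest, key=lambda hit: hit[0])


def order_by_score(hits: List[Hit], limit: int) -> List[Hit]:
    """Equivalent of ORDER BY score DESC LIMIT n."""
    return sorted(hits, key=lambda hit: hit[1], reverse=True)[:limit]


if __name__ == "__main__":
    docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7], "d": [-1.0, 0.0]}
    hits = knn_hits([1.0, 0.0], docs, k=3)
    print(hits)                     # [('a', 1.0), ('b', 0.0), ('c', 0.7)]
    print(order_by_score(hits, 2))  # [('a', 1.0), ('c', 0.7)]
```

Here k bounds the candidate set while the limit only trims it: with k = 30 and LIMIT 20, up to ten retrieved candidates never appear in the output unless something, such as a boost, reorders them.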
The answer set is shown below.
| # | Title | Rated | Score | Plot (truncated) |
|---|---|---|---|---|
| 1 | Fantasia | G | 2.608 | A collection of animated interpretations of great… |
| 2 | In Old Arizona | G | 2.365 | A charming, happy-go-lucky bandit in old Arizona… |
| 3 | The Three Caballeros | PG | 2.324 | Donald receives his birthday gifts, which includ… |
| 4 | Cinderella | PG | 2.323 | When Cinderella's cruel stepmother prevents her … |
| 5 | In Old Chicago | G | 2.300 | The O'Leary brothers -- honest Jack and roguish … |
| 6 | Red River | G | 2.280 | Dunson leads a cattle drive, the culmination of … |
| 7 | Pinocchio | G | 2.269 | A living puppet, with the help of a cricket as h… |
| 8 | Tarzan the Ape Man | R | 2.248 | A trader and his daughter set off in search of t… |
| 9 | Yellow Sky | PG13 | 2.239 | Pistol-packing tomboy, and grandfather come to d… |
| 10 | Snow White and the Seven Dwarfs | R | 2.223 | Snow White, pursued by a jealous queen, hides wi… |
| 11 | Jungle Book | PG13 | 2.215 | A boy raised by wolves tries to adapt to human v… |
| 12 | Oklahoma! | R | 2.203 | In the Oklahoma territory at the turn of the twe… |
| 13 | The Crowd Roars | PG | 2.202 | Famous motor-racing champion Joe Greer returns t… |
| 14 | House of Dracula | G | 2.202 | Count Dracula and the Wolf Man seek a cure for t… |
| 15 | Scaramouche | G | 2.201 | After Andre Moreau finds he is the secret bastar… |
| 16 | The Blue Bird | G | 2.197 | Two peasant children, Mytyl and Tyltyl, are led … |
| 17 | Shane | G | 2.195 | A weary gunfighter attempts to settle down with … |
| 18 | The Son of Kong | R | 2.190 | The men who captured the giant ape King Kong, re… |
| 19 | The Ghost Goes West | G | 2.187 | An American businessman's family convinces him t… |
| 20 | Of Human Hearts | PG | 2.181 | This is a story about family relationships, set … |
Boosting Vector Search Scores
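Here is one approach to boosting scores for a PG13 preference. A CASE expression multiplies the search score by 1.1 (a 10% boost) for movies rated PG13 and leaves other scores unchanged; the results are then ordered by this boosted score instead of the raw score.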
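```sql
SELECT SEARCH_META().score AS score, rated, title, plot,
       -- 10% boost for PG13 movies. Read SEARCH_META().score again here rather
       -- than the alias "score": projection aliases can be used in ORDER BY,
       -- but not elsewhere in the same SELECT list.
       CASE WHEN rated = "PG13" THEN SEARCH_META().score * 1.1
            ELSE SEARCH_META().score
       END AS boosted_score
FROM `travel-sample`.inventory.`movies_with_array_embedding`
-- same kNN search as before: up to 30 nearest neighbors
WHERE SEARCH(movies_with_array_embedding,
  {
    "fields": ["*"],
    "knn": [
      {
        "k": 30,
        "field": "embeddingz",
        "vector": [ . . . ]
      }
    ]
  }
)
-- rank by the boosted score instead of the raw search score
ORDER BY boosted_score DESC
LIMIT 20
```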
The effective answer set is shown below.
| # | Title | Rated | Score | Boosted Score | Plot (truncated) |
|---|---|---|---|---|---|
| 1 | Fantasia | G | 2.608 | 2.608 | A collection of animated interpretations of great… |
| 2 | Yellow Sky | PG13 | 2.239 | 2.463 | Pistol-packing tomboy, and grandfather come to d… |
| 3 | Jungle Book | PG13 | 2.215 | 2.437 | A boy raised by wolves tries to adapt to human v… |
| 4 | The Thief of Bagdad | PG13 | 2.152 | 2.368 | A recalcitrant thief vies with a duplicitous Mon… |
| 5 | In Old Arizona | G | 2.365 | 2.365 | A charming, happy-go-lucky bandit in old Arizona… |
| 6 | The Three Caballeros | PG | 2.324 | 2.324 | Donald receives his birthday gifts, which includ… |
| 7 | Cinderella | PG | 2.323 | 2.323 | When Cinderella's cruel stepmother prevents her … |
| 8 | In Old Chicago | G | 2.300 | 2.300 | The O'Leary brothers -- honest Jack and roguish … |
| 9 | Red River | G | 2.280 | 2.280 | Dunson leads a cattle drive, the culmination of … |
| 10 | Pinocchio | G | 2.269 | 2.269 | A living puppet, with the help of a cricket as h… |
| 11 | Tarzan the Ape Man | R | 2.248 | 2.248 | A trader and his daughter set off in search of t… |
| 12 | Snow White and the Seven Dwarfs | R | 2.223 | 2.223 | Snow White, pursued by a jealous queen, hides wi… |
| 13 | Oklahoma! | R | 2.203 | 2.203 | In the Oklahoma territory at the turn of the twe… |
| 14 | The Crowd Roars | PG | 2.202 | 2.202 | Famous motor-racing champion Joe Greer returns t… |
| 15 | House of Dracula | G | 2.202 | 2.202 | Count Dracula and the Wolf Man seek a cure for t… |
| 16 | Scaramouche | G | 2.201 | 2.201 | After Andre Moreau finds he is the secret bastar… |
| 17 | The Blue Bird | G | 2.197 | 2.197 | Two peasant children, Mytyl and Tyltyl, are led … |
| 18 | Shane | G | 2.195 | 2.195 | A weary gunfighter attempts to settle down with … |
| 19 | The Son of Kong | R | 2.190 | 2.190 | The men who captured the giant ape King Kong, re… |
| 20 | The Ghost Goes West | G | 2.187 | 2.187 | An American businessman's family convinces him t… |
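The re-ranking can be checked outside the database. The Python sketch below applies the same CASE logic (multiply by 1.1 for PG13) to the ratings and rounded scores from the two tables above, then sorts by the boosted score and keeps 20 rows, mirroring ORDER BY boosted_score DESC LIMIT 20. It uses only the 21 kNN candidates that appear in the tables; the names ROWS, PG13_BOOST, boost, and rerank are introduced for this sketch.

```python
from typing import List, Tuple

Row = Tuple[str, str, float]  # (title, rated, search score)

PG13_BOOST = 1.1  # 10% boost, as in the CASE expression

# Candidates visible in the two answer sets (scores rounded to 3 places).
ROWS: List[Row] = [
    ("Fantasia", "G", 2.608),
    ("In Old Arizona", "G", 2.365),
    ("The Three Caballeros", "PG", 2.324),
    ("Cinderella", "PG", 2.323),
    ("In Old Chicago", "G", 2.300),
    ("Red River", "G", 2.280),
    ("Pinocchio", "G", 2.269),
    ("Tarzan the Ape Man", "R", 2.248),
    ("Yellow Sky", "PG13", 2.239),
    ("Snow White and the Seven Dwarfs", "R", 2.223),
    ("Jungle Book", "PG13", 2.215),
    ("Oklahoma!", "R", 2.203),
    ("The Crowd Roars", "PG", 2.202),
    ("House of Dracula", "G", 2.202),
    ("Scaramouche", "G", 2.201),
    ("The Blue Bird", "G", 2.197),
    ("Shane", "G", 2.195),
    ("The Son of Kong", "R", 2.190),
    ("The Ghost Goes West", "G", 2.187),
    ("Of Human Hearts", "PG", 2.181),
    ("The Thief of Bagdad", "PG13", 2.152),
]


def boost(rated: str, score: float) -> float:
    """CASE WHEN rated = "PG13" THEN score * 1.1 ELSE score END"""
    return score * PG13_BOOST if rated == "PG13" else score


def rerank(rows: List[Row], limit: int = 20) -> List[Tuple[str, float]]:
    """ORDER BY boosted_score DESC LIMIT n (ties keep their input order)."""
    boosted = [(title, boost(rated, score)) for title, rated, score in rows]
    return sorted(boosted, key=lambda r: r[1], reverse=True)[:limit]


if __name__ == "__main__":
    for rank, (title, score) in enumerate(rerank(ROWS), start=1):
        print(f"{rank:2d}  {score:.3f}  {title}")
```

Running it reproduces the title order of the boosted answer set, including "The Thief of Bagdad" entering at #4 and "Of Human Hearts" falling out of the top 20. Python's sort is stable, so the two 2.202 scores keep their original relative order.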
PG13 Preferences Impact
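Boosting PG13 movies by 10% moved them markedly higher in the ranking: "Yellow Sky" rose from #9 to #2 and "Jungle Book" from #11 to #3. "The Thief of Bagdad" was not in the original top 20 at all; it was among the k = 30 kNN candidates, and the boost lifted it past the LIMIT 20 cutoff to #4. Because the boost is a 10% multiplier, a PG13 movie overtakes any non-PG13 movie whose original score is less than 10% higher than its own, which is why the PG13 titles passed G, PG, and R movies with similar or slightly higher original scores.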
The #1 spot ("Fantasia", G, 2.608) is unchanged because its score is still higher than the best boosted PG13 score (2.463 for "Yellow Sky"). Let’s compare the two answer sets and highlight the ordering changes caused by the 10% boost for "PG13" movies.
The largest jumps were:
| Title | Original Rank | Boosted Rank | Score | Boosted Score |
|---|---|---|---|---|
| Yellow Sky | 9 | 2 | 2.239 | 2.463 |
| Jungle Book | 11 | 3 | 2.215 | 2.437 |
| The Thief of Bagdad | >20 | 4 | 2.152 | 2.368 |
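These jumps follow directly from the CASE expression. Writing s_p for the original score of a PG13 movie and s_o for the original score of a non-PG13 movie, the boosted PG13 movie ranks higher exactly when 1.1 s_p > s_o. Checking this against the rounded scores from the tables:

```latex
\begin{aligned}
1.1\,s_p > s_o \;&\iff\; s_p > \frac{s_o}{1.1} \\[4pt]
\text{Fantasia (G):}\quad \frac{2.608}{1.1} &\approx 2.371 > 2.239 = s_{\text{Yellow Sky}} \\
\text{In Old Arizona (G):}\quad \frac{2.365}{1.1} &= 2.150 < 2.152 = s_{\text{The Thief of Bagdad}}
\end{aligned}
```

So all three PG13 candidates pass "In Old Arizona" ("The Thief of Bagdad" by a margin of only 0.002 on the rounded scores), but none reaches "Fantasia".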
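Every non-PG13 movie below #1 dropped between one and three places, and "Of Human Hearts" (PG, originally #20) fell out of the top 20. The highest-ranked movies that dropped were: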
| Title | Original Rank | Boosted Rank |
|---|---|---|
| In Old Arizona | 2 | 5 |
| The Three Caballeros | 3 | 6 |
| Cinderella | 4 | 7 |
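The rank comparison in these tables can be computed mechanically. Below is a minimal Python sketch, assuming the two answer sets are available as ranked title lists (copied here from the tables above); rank_changes is an illustrative helper, and None marks a title outside the top 20.

```python
from typing import Dict, List, Optional, Tuple

ORIGINAL = [
    "Fantasia", "In Old Arizona", "The Three Caballeros", "Cinderella",
    "In Old Chicago", "Red River", "Pinocchio", "Tarzan the Ape Man",
    "Yellow Sky", "Snow White and the Seven Dwarfs", "Jungle Book", "Oklahoma!",
    "The Crowd Roars", "House of Dracula", "Scaramouche", "The Blue Bird",
    "Shane", "The Son of Kong", "The Ghost Goes West", "Of Human Hearts",
]
BOOSTED = [
    "Fantasia", "Yellow Sky", "Jungle Book", "The Thief of Bagdad",
    "In Old Arizona", "The Three Caballeros", "Cinderella", "In Old Chicago",
    "Red River", "Pinocchio", "Tarzan the Ape Man", "Snow White and the Seven Dwarfs",
    "Oklahoma!", "The Crowd Roars", "House of Dracula", "Scaramouche",
    "The Blue Bird", "Shane", "The Son of Kong", "The Ghost Goes West",
]

Change = Tuple[Optional[int], Optional[int]]


def rank_changes(before: List[str], after: List[str]) -> Dict[str, Change]:
    """Map each title to (rank before, rank after); None means outside the list."""
    before_rank = {title: i for i, title in enumerate(before, start=1)}
    after_rank = {title: i for i, title in enumerate(after, start=1)}
    titles = list(dict.fromkeys(before + after))  # union, first-seen order
    return {t: (before_rank.get(t), after_rank.get(t)) for t in titles}


if __name__ == "__main__":
    for title, (old, new) in rank_changes(ORIGINAL, BOOSTED).items():
        if old != new:
            print(f"{title}: {old if old else '>20'} -> {new if new else '>20'}")
```

Printing only the changed titles lists every movie except "Fantasia", confirming that the boost reshuffled the whole result set below #1.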
Conclusion
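The approach described here is a starting point for applying preferences or other weighting criteria within your own Couchbase applications. Because the boost is applied after the kNN step, it can only reorder the k candidates that the vector search returns; choosing k larger than the LIMIT, as in this example (k = 30, LIMIT 20), gives preferred items room to move into the final results.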