Milvus
Overview
Milvus is an open source vector database for multi-modal AI applications. More information on Milvus can be found at https://milvus.io/. Qarbine supports native Milvus vector query interactions. These can take the form of a JSON specification, Qarbine’s SQL interface, or a combination of the two. Qarbine can easily analyze the resulting, often deeply nested, data and format an analysis report. This interaction can also be embedded into applications for a seamless end user experience. The results can then be exported into various popular formats and easily shared within leading collaboration tools.
Defining a Data Source
Overview
A Data Source is a Qarbine component responsible for retrieving data from somewhere. At a high level it has a name, a description, and an arbitrary query string which, when sent to the associated Qarbine Data Service endpoint, returns data. The overall execution flow for an analysis, including the optional prompt component, is shown below.
A single data source can be referenced by name from multiple Qarbine template components. This provides a single point of change when, for example, an index is added or some other query tweak is needed. Without it, you would have to find every template affected by a schema or index change. This component reusability is especially beneficial when team members have varying roles and skills.
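To make the single-point-of-change idea concrete, here is a minimal Python sketch of a name-keyed catalog. The DataSource class, the registry dict, and the template names are hypothetical illustrations of the concept, not Qarbine's internal API.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    """Hypothetical model of a data source: a name, a description and a query string."""
    name: str
    description: str
    query: str


# Hypothetical catalog: each data source is stored once and looked up by name.
registry: dict[str, DataSource] = {}


def register(source: DataSource) -> None:
    registry[source.name] = source


def query_for(source_name: str) -> str:
    # Templates hold only the data source name, so they always see the current query.
    return registry[source_name].query


register(DataSource("articles", "Medium articles", "select * from medium_articles limit 3"))
templates = {"summary": "articles", "detail": "articles"}  # both reference the same name

# One edit to the data source is picked up by every template that references it.
registry["articles"].query = "select * from medium_articles where reading_time > 10 limit 3"
for template, source_name in templates.items():
    print(template, "->", query_for(source_name))
```

Because templates store a name rather than a copy of the query text, a schema or index tweak is made in exactly one place.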
Enhanced Querying Options
Qarbine queries to Milvus use version 1 of the Milvus REST API. The full expressive power of Milvus is available in 3 forms:
- a JSON specification,
- a SQL-like syntax, and
- a hybrid of the two.
JSON Specification
One way to specify a Milvus query in Qarbine is to use a JSON-like structure. Below is an example to retrieve up to 3 matches from the quick_setup collection.
{
"collectionName": "quick_setup",
"annsField" : "vector",
"limit": 3,
"data": [ [0.19886813, 0.060235605, 0.6976963, 0.26144746, 0.8387295] ]
}
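The specification above is plain JSON, so it can be built and sanity-checked like any other data structure before it is sent. The Python sketch below assumes the quick_setup collection's "vector" field has dimension 5 (matching the five values in "data"); check_spec is a hypothetical pre-flight helper, not part of Qarbine or of the Milvus client libraries.

```python
import json

# The JSON specification above, expressed as a Python dict.
spec = {
    "collectionName": "quick_setup",
    "annsField": "vector",
    "limit": 3,
    "data": [[0.19886813, 0.060235605, 0.6976963, 0.26144746, 0.8387295]],
}

EXPECTED_DIM = 5  # assumption: dimension of the "vector" field in quick_setup


def check_spec(spec: dict, dim: int) -> None:
    """Hypothetical pre-flight check before the spec is sent to Milvus."""
    for key in ("collectionName", "annsField", "limit", "data"):
        if key not in spec:
            raise ValueError(f"missing key: {key}")
    if not isinstance(spec["limit"], int) or spec["limit"] <= 0:
        raise ValueError("limit must be a positive integer")
    # "data" holds one list per query vector; each must match the field dimension.
    for vector in spec["data"]:
        if len(vector) != dim:
            raise ValueError(f"expected {dim} values, got {len(vector)}")


check_spec(spec, EXPECTED_DIM)
body = json.dumps(spec)  # the request body is plain JSON text
print(body)
```

Since "data" is a list of vectors, a single request can carry more than one query vector; here it carries one.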
Qarbine SQL Interaction
Milvus supports both semantic (i.e., vector) search and lexical (i.e., scalar/matching) search. The JSON specification structure described above can be verbose and cumbersome, though. To improve readability and productivity when authoring Milvus retrievals, Qarbine provides a SQL-oriented option. For example, here is a vector search retrieval for the quick_setup collection.
{
"collectionName": "quick_setup",
"outputFields": ["*"],
"annsField": "vector",
"data": [ [0.19886813, 0.060235605, 0.6976963, 0.26144746, 0.8387295] ],
"filter": "color in ['pink_8682', 'red_9392']",
"limit": 10
}
The Qarbine SQL equivalent is simply
select *
from quick_setup
where nearVector (0.19886813, 0.060235605, 0.6976963, 0.26144746, 0.8387295)
and withOption('annsField', "vector")
and color in ('pink_8682', 'red_9392')
limit 10
This example shows the use of Qarbine’s SQL functions nearVector() and withOption(). The nearVector() arguments are already a raw embedding, so Qarbine passes them directly as the query vector in the call to Milvus Cloud, and withOption() supplies the annsField setting. (The nearText() function, used in the next example, instead has Qarbine call an embedding service to obtain the embedding for a phrase first.) In the end, a native interaction with Milvus using its REST API is performed. The Milvus answer set is accepted and moved along the execution pathway.
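The correspondence between the SQL form and the JSON specification shown earlier in this section can be sketched in Python. This is not Qarbine's translator: it assumes the SQL has already been parsed into its parts (collection name, nearVector() values, withOption() pairs, the remaining scalar condition, and the limit), and build_search_spec and in_filter are hypothetical helper names.

```python
def in_filter(field: str, values: list[str]) -> str:
    """Render an SQL-style IN list as a Milvus boolean expression.
    Milvus writes IN lists with square brackets; SQL uses parentheses.
    Assumes the values contain no quote characters."""
    return f"{field} in {values!r}"


def build_search_spec(collection: str, vector: list[float], options: dict,
                      filter_expr: str, limit: int,
                      output_fields: tuple = ("*",)) -> dict:
    spec = {
        "collectionName": collection,         # from "from quick_setup"
        "outputFields": list(output_fields),  # from "select *"
        "data": [list(vector)],               # nearVector(...) values become one query vector
        "filter": filter_expr,                # remaining scalar conditions
        "limit": limit,                       # from "limit 10"
    }
    spec.update(options)                      # withOption('annsField', "vector")
    return spec


spec = build_search_spec(
    collection="quick_setup",
    vector=[0.19886813, 0.060235605, 0.6976963, 0.26144746, 0.8387295],
    options={"annsField": "vector"},
    filter_expr=in_filter("color", ["pink_8682", "red_9392"]),
    limit=10,
)
print(spec["filter"])  # color in ['pink_8682', 'red_9392']
```

Keeping withOption() as a generic key/value pass-through means any search parameter can be set from SQL without a dedicated function for each one.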
Answer Set Row Shape
Milvus can store complex JSON documents. Below is an example row.
{
'title': 'The Reported Mortality Rate of Coronavirus Is Not Important',
'title_vector': [0.041732933, 0.013779674, -0.027564144, ..., 0.030096486],
'article_meta': {
'link': 'https://medium.com/swlh/the-important-369989c8d912',
'reading_time': 13,
'publication': 'The Startup',
'claps': 1100,
'responses': 18,
'tag_1': [4, 15, 6, 7, 9],
'tag_2': [[2, 3, 4], [7, 8, 9], [5, 6, 1]]
}
}
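Because rows like this are nested, analysis formulas often need to reach into sub-documents and arrays. A minimal Python sketch of dotted-path access follows; the get_path helper and its path syntax (numeric segments index into lists) are illustrative assumptions, not Qarbine's formula syntax. The title_vector and a few other fields are omitted from the sample row for brevity.

```python
from typing import Any

row = {
    "title": "The Reported Mortality Rate of Coronavirus Is Not Important",
    "article_meta": {
        "reading_time": 13,
        "publication": "The Startup",
        "claps": 1100,
        "tag_2": [[2, 3, 4], [7, 8, 9], [5, 6, 1]],
    },
}


def get_path(doc: Any, path: str, default: Any = None) -> Any:
    """Follow a dotted path such as 'article_meta.claps' or 'article_meta.tag_2.1.0'.
    Numeric segments index into lists; a missing step returns the default."""
    current = doc
    for part in path.split("."):
        if isinstance(current, dict) and part in current:
            current = current[part]
        elif isinstance(current, list) and part.isdigit() and int(part) < len(current):
            current = current[int(part)]
        else:
            return default
    return current


print(get_path(row, "article_meta.claps"))  # 1100
```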
Example
The data source query specification below retrieves Medium articles semantically near the phrase “python”.
select * from medium_articles
where reading_time > 10
and nearText("python", "myGoogleAI")
order by reading_time
limit 3
When run, Qarbine automatically interfaces with Google Gemini in this example to obtain the raw embedding value for the nearText() phrase. That embedding is then used in the call to Milvus Cloud using the REST endpoint. The Milvus answer set is accepted and moved along the execution pathway.
The results are shown below.
A sample result element is shown below.
This example is in the catalog at “example/Milvus/Search articles for python”.
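The two-step nearText() flow described above (embed the phrase, then search with the resulting vector) can be sketched in Python. Here "myGoogleAI" is assumed to name a configured embedding service; in the sketch the embedder is just a function parameter, and fake_embedder is a hypothetical deterministic stand-in that does not call Gemini and produces vectors with no semantic meaning. The order by clause is left out because the sorting is applied by Qarbine after Milvus returns its answer set.

```python
import hashlib
from typing import Callable

Embedder = Callable[[str], list[float]]


def fake_embedder(text: str, dim: int = 8) -> list[float]:
    """Hypothetical stand-in for a real embedding service such as Google Gemini.
    Deterministic pseudo-vector derived from a hash; it carries no meaning."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]


def near_text_spec(collection: str, phrase: str, embed: Embedder,
                   filter_expr: str, limit: int) -> dict:
    # Step 1: turn the nearText() phrase into a raw embedding vector.
    vector = embed(phrase)
    # Step 2: use that vector as the query vector in the native Milvus search body.
    # No annsField is set, as the SQL above does not specify one.
    return {
        "collectionName": collection,
        "outputFields": ["*"],
        "data": [vector],
        "filter": filter_expr,
        "limit": limit,
    }


spec = near_text_spec("medium_articles", "python", fake_embedder, "reading_time > 10", 3)
print(len(spec["data"][0]), spec["filter"], spec["limit"])  # 8 reading_time > 10 3
```

A real embedding service returns vectors whose dimension must match the collection's vector field; the 8 values here are only for illustration.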
Managing Answer Set Size
The default maximum number of rows is 25 for a new data source. This cap is useful while evolving a query from a concept into one that you have verified returns the desired answer set. Even so, a native way of limiting the answer set size (such as a limit clause) is the preferred approach. This setting is in the component dialog as shown below and is also accessible by clicking the ‘Gear’ icon.
Once you are done drafting you can adjust this parameter. A “0” indicates there is no maximum. A number greater than 0 limits the final answer set size to that number of rows. This truncation is applied after any native query limit. So, if the answer set from the data endpoint is large, all of that content must first be returned to the Qarbine host, which then truncates it. It is best to truncate at the query level (i.e., use a limit) to reduce the content sent from the data endpoint to the Qarbine host in the first place.
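The ordering of the two limits can be modeled in a few lines of Python. apply_max_rows is a hypothetical name; the sketch encodes only the rule stated above (0 means no maximum, a positive number truncates, and truncation happens after the rows have already arrived from the endpoint).

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def apply_max_rows(rows: Sequence[T], max_rows: int) -> list[T]:
    """Host-side truncation: 0 means no maximum, a positive number keeps that many rows."""
    if max_rows < 0:
        raise ValueError("max_rows must be 0 or positive")
    if max_rows == 0:
        return list(rows)
    return list(rows[:max_rows])


# Suppose the endpoint's native limit let 100 rows through: all 100 still travel
# to the Qarbine host before the 25-row cap trims them.
endpoint_rows = list(range(100))
shown = apply_max_rows(endpoint_rows, 25)
print(len(endpoint_rows), "rows transferred,", len(shown), "rows kept")  # 100 rows transferred, 25 rows kept
```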
Adjusting Maximum Rows
Recall the default maximum rows at the component level is 25. When you are satisfied with your query you can change that setting by clicking the ‘Gear’ icon.
Adjust the setting to “0” indicating no Qarbine answer set truncation.
To close the dialog click
To keep this setting save the data source to the catalog.
Prompt Integration
Overview
Qarbine prompts provide a way to obtain runtime values and variables for data source and template execution. To avoid hardcoding, prompts can use macro formulas to run queries which populate list widgets. Prompts are defined in a no code manner using the Prompt Designer. Shown below is the execution flow when there is a Prompt component.
The Prompt Designer supports a large variety of input widgets including entry fields, check boxes, radio button groups, sliders, and file input.
Example
The Qarbine Prompt Designer supports 10+ different widget types. Below is the prompt for this example which has a heading and text input widget.
The prompt can be defined with the following elements.
The primary properties of the first element are shown below.
The primary properties of the second element are shown below.
This prompt is in the catalog at “example/Milvus/Prompt for @userPhrase”.
Adjusting the Data Source
The prompt sets a runtime variable named ‘@userPhrase’ which propagates along to the execution process. A new data source can be defined to use this value based on the following query specification.
select * from medium_articles
where reading_time > 10
and nearText(@userPhrase, "myGoogleAI")
order by reading_time
limit 3
It uses the convenient SQL-like syntax and also includes a Qarbine-provided sorting stage (the order by clause) that is applied to the Milvus answer set after it is returned. These are unique Qarbine features that enhance the Milvus data analysis experience.
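A post-query sort only reorders the rows Milvus already selected. The Python sketch below illustrates this, assuming the limit of 3 is passed to Milvus so that Milvus picks the top matches by similarity; the rows, titles, and distance values are hypothetical sample data.

```python
from operator import itemgetter

# Hypothetical rows as they might come back from Milvus, ranked by similarity.
milvus_rows = [
    {"title": "A", "reading_time": 14, "distance": 0.91},
    {"title": "B", "reading_time": 11, "distance": 0.88},
    {"title": "C", "reading_time": 23, "distance": 0.85},
]

# "order by reading_time" applied after Milvus has chosen its top 3 matches:
# the sort changes the order of that answer set, not which rows are in it.
sorted_rows = sorted(milvus_rows, key=itemgetter("reading_time"))
print([row["title"] for row in sorted_rows])  # ['B', 'A', 'C']
```

The distinction matters: sorting by reading_time before the similarity limit would select different rows than sorting after it.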
This example data source is in the catalog at “example/Milvus/Search articles for @userPhrase”. Its properties referencing the prompt are noted below.
When run, the prompt is presented, the variable is substituted, and the query is run. The answer set is then displayed.
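The substitution step can be sketched in Python. Qarbine's actual substitution rules (quoting, escaping, non-string types) are not described here, so this sketch simply assumes string values are inserted as double-quoted literals; substitute is a hypothetical helper name.

```python
import re


def substitute(query: str, variables: dict[str, str]) -> str:
    """Replace @name tokens with double-quoted string literals.
    Simplification: it does not skip '@' characters inside existing string literals."""
    def replace(match: re.Match) -> str:
        name = match.group(0)
        if name not in variables:
            raise KeyError(f"no value supplied for {name}")
        # Escape backslashes and double quotes so the value stays one literal.
        value = variables[name].replace("\\", "\\\\").replace('"', '\\"')
        return f'"{value}"'

    return re.sub(r"@\w+", replace, query)


query = 'and nearText(@userPhrase, "myGoogleAI")'
print(substitute(query, {"@userPhrase": "vampire novels"}))
# and nearText("vampire novels", "myGoogleAI")
```

Escaping the value keeps a phrase typed into the prompt from breaking out of its string literal.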
Defining an Analysis Template
Overview
A template defines how to process the data being retrieved from Data Source queries and other data expressions. It also defines formulas, formatting options, and other analysis and presentation options. The overall execution flow for an analysis, including the optional prompt component, is shown below.
Using the Template Designer
The result of running the template described in this section is shown below.
A Template defines how to analyze and present the retrieved query data. Qarbine has numerous options to produce publication quality output which is also interactive.
In this example the link and copy images are interactive elements.
Clicking the link image opens a browser tab on the Medium article.
Clicking the copy image copies the Medium article link to the clipboard.
Qarbine “custom cells” are used for the Milvus logo and the star claps images.
The Milvus vector search distance is included in the output with 2 decimal places of precision.
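For reference, two-decimal formatting of a distance value looks like this in Python; format_distance is an illustrative name, not a Qarbine formula, and the input value is a hypothetical sample. How a distance should be read depends on the collection's metric type: with L2 a smaller value means closer, while with IP or COSINE a larger value means more similar.

```python
def format_distance(distance: float, places: int = 2) -> str:
    """Render a Milvus search distance with a fixed number of decimal places."""
    return f"{distance:.{places}f}"


print(format_distance(0.8733192))  # 0.87
```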
In some cases a Qarbine template can completely avoid the tedious coding of traditional web pages. This dramatically saves developer time and provides a highly interactive experience for end users.
This template is in the catalog at “example/Milvus/Medium articles for @userPhrase”.
The template’s primary properties of interest are noted below.
Since the data source references a prompt, the template need not reference one in its properties.
Next Steps
Accessing Your Database
To configure access to your database see the guides at
http://doc.qarbine.com/docs/category/data-service-configuration
Querying Your Database
For database specific interaction guides navigate to
http://doc.qarbine.com/docs/category/data-source-designer
References
A good starting point for further information on querying Milvus can be found at https://milvus.io/docs/v2.0.x/search_and_query.md