Skip to main content

Data Source Designer Guide

Overview

The Data Source Designer provides a free form query interface for those skills with authoring database queries. It complements other popular query tools such as MongoDB Compass by providing a centralized catalog to maintain and share queries. The Qarbine Catalog also supports folder visibility and component permissions for even better team collaboration. The overall execution flow for an analysis, including the optional prompt component, is shown below.

  

Component Examples

The standard Qarbine installation includes many types of example components including data sources covering the variety of supported databases. Each Data Source references a Data Service which is defined by the Qarbine administrator. The examples may refer to yet undefined data services. An example of running an AWS DynamoDB example may show.

  

This likely indicates that there is no Qarbine Data Service yet defined by the Qarbine administrator named “AWS DynamoDB”. To run this example component, such a data service must first be defined by the Qarbine administrator. Refer to the associated Qarbine configuration and tutorial documentation for the given data driver for more information. Qarbine has preset access to a set of commonly used sample MongoDB data through the “Sample Data Service” definition.

For some data drivers there is a sample data file and optionally sample code to load that data provided. For example, the Qarbine Administrator can SSH into the Qarbine host and review the contents of the ˜/qarbine.service/sample folder. The Qarbine documentation for the driver will refer to any support sample data and data loader.

Prerequisites

Data service configured to access your data. In this example we will access the MongoDB sample store sales data.

Opening the Tool

You can access the Data Source Designer in several ways. Tools can be opened from the signOn page, the home page, and from various tools using the hamburger    menu.


From the Home tool.

  

Alternatively you can open the tool from the hamburger    menu on each tool.

  

Pressing the control key during the menu click opens the tool in another tab.

The layout of this tool is depicted below.

  

Specifying the Data to Query

Select your data service from the dropdown. The Qarbine administrator manages which ones are visible to any particular sign on.

  

The databases associated with that data service will populate the database dropdown.

  

The “database” concept varies by the actual cloud data endpoint being interacted with. For example, MongoDB and SQL Server have this concept, while Milvus and Weaviate do not.

Select the database “sample_airbnb” database as shown below.

  

On the left hand side the collections drop down will be populated. This area is informational in nature and independent of what you type into the query text field. “Collections” is the generic term. For SQL databases this maps to “tables” for example.

Select a collection from the drop down.

  

Its general structure will be shown.

Main Fields Some Additional Fields
  




  

The large text area in the middle is where you enter your query specification. The syntax varies again by actual cloud data endpoint. It may be MongoDB Query Language (MSQL), SQL, or even a JSON looking like structure. See the Data Source Query Guides for details.

Here is an example MongoDB query.

db.listingsAndReviews.find( { "address.country_code": 'AU'} )
.project( {name: 1, summary: 1, address: 1} )

Note there is a dot before the “project” clause.

  

Clicking the run icon    performs the query and the results are shown in the bottom area.
 

Selecting one of the rows populates the right hand area as shown below.

  

Date Display and Maximum Element Retrieval Options

Note that the sample data above has sale dates stored as UTC dates. By default dates are shown in the local timezone. To change this presentation click the   button to go to the property dialog.

Check the “ISO UTC” option as shown below.

  

This dialog is also one way you can change the maximum number of elements in the answer set you want processed.

NOTE- The “Max Objects” value is an action performed by Qarbine. Specifying zero indicates no limitation on Qarbine’s part. It is best to limit the size of the result set within the query itself which will then be optimized by the target database. While you are formulating your query this may be the best strategy.

Defining Queries

Overview

The syntax of a Data Source queries are based on the underlying database or endpoint. It may range from MongoDB query language, to SQL, to even a JSON object structure. Review the documentation matching the database or endpoint of interest for specific details on query interaction and syntax.

The query text entered can include non-backend native lines of text. The simplest example is starting a line off with “//”. This indicates a comment or other text that is not sent to the backend as part of the native query. It is a convenient way to document your query independently of the data source’s description property. You can also easily enable and disable query snippets.

Data Types

The data types returned vary based on the Qarbine integration driver being used. In general numbers, strings, and booleans are returned as their primitive counterparts. Some drivers encode dates for example as objects with specific fields. Their automatic conversion into JavaScript dates is usually done by the Qarbine integration driver.

Using Pragmas to Manipulate Answer Sets

The queries to your data may be defined using multiple lines of text for formatting and other purposes. Like the ‘//’ comment lines above, there are a few recognized lines that are preprocessed out by Qarbine and not sent to the backend data.

Starts With Description
//Used to comment as noted above.
#pragmaThere are many general pragma keywords recognized as described below and a few for specific data drivers.
Pragma Keyword Description
addArraysThe first 2 arguments are fields that each have array values. The last argument is the name of the field to get them combined. Given this data.   The line
“#pragma addArrays events_properties_names, events_properties_values, events”
results in
  
convertToDateProvide a CSV list of fields to convert from values to dates. The values can be strings or numerics to “new Date(...)”. The convert keywords accept field paths of up to 2 levels. The first level can be a document or an array of documents. This is done in-place.
convertToNumberProvide a CSV list of fields to convert from values to dates. The values can be strings or numerics to “Number(...)”. This is done in-place.
convertToObjectProvide a CSV list of fields to convert strings to JSON objects via JSON.parse(someString). This is done in-place.
convertToObjectsProvide a CSV list of ARRAY fields to convert their element strings to JSON objects via JSON.parse(someString). This is done in-place.
convertToValueListUse this when the raw query result is a list of strings or numbers. It produces a list of objects each with a “sValue” and “ordIndex” fields. For example, the array[ 10, 2, 43] into [ {orgValue:10, orgIndex:1}, {orgValue: 2, orgIndex:2}, {orgValue: 43, orgIndex:3} ]. This keeps the template processing world happy as it expects to iterate over a list of objects, not a list of primitives such as strings or numbers. After applying the sortResultsBy pragma can be used as well.
deleteFieldsProvide a CSV list of fields to delete. The arguments may have field paths of up to 2 levels. The first level can be a document or an array of documents. The deleting is done in-place. This is convenient when the result row/document has many fields and you want all just a few of them. Rather than explicitly list the 20 fields of say 23, just ‘delete’ those 3 from the answer set.
mergeFieldsThere are 3 arguments for each action: key, value, and destination fields. You can have as many triplets per line as you would like and you can have several lines with this pragma. The key and value fields of the row are arrays. They are used to create a new object. That object is then saved into the target field of the result row. If the destination field is ‘row’, then the fields are assigned to the row with no intermediate field.
The line
#pragma mergeFields keys, values, name
with this record
"keys" : ["first", "last"], "values": ["tom", "jones"]
results in the object
name { first: "tom", last” : "jones"}
mergeArraysProvide a CSV list of field names with the last being the field to receive the merged field values. Each of the source fields is an array and they all have the same number of values. The line
#pragma mergeArrays solist_names, soloist_role, soloist_instruments, soloists
with this record
{ "solist_names": [ "Otto, Antoinette", "Timm, Henry C." ], "soloist_role": [ "S", "A" ], "soloist_instruments": [ "Soprano", "Piano" ],}
results in
"soloists": [ { "solist_name": "Otto, Antoinette", "soloist_role": "S", "soloist_instrument": "Soprano" }, { "solist_name": "Timm, Henry C.", "soloist_role": "A", "soloist_instrument": "Piano" } ]
runPostQueryApply either a SQL statement or a MongoDB query to the result data. Use a table name or collection name of ‘data’. The query must be defined on a single line as part of the pragma line. See the sorting discussion below for more information
sortResultByApply a SQL oriented ORDER BY operation to the answer set.
#pragma sortResultBy age desc
If the result is a list of strings or numbers then use
#pragma sortResultBy sValue
See the sorting discussion below for more information.
pullValuesUpProvide a CSV list of dotted paths to pull their values up to the first level. This can be useful when there is an inner field that is better placed at the first level. This line
#pragma pullValuesUp container.subpath
copies the given path’s value to the container_subpath field at the first level.You can set a specific field prefix using the slash syntax,
#pragma pullValuesUp container.subpath/stuff_
pullFieldsUpProvide a CSV list of object fields to pull their contents up to the first level. The original field containing the object is deleted. This can be useful when there are many inner fields that can be part of the first level. Instead of several template formulas like #container.first and #container.last you can simply use #first and #last via the line
#pragma pullFieldsUp container
You can set a specific field prefix using the slash syntax,
#pragma pullValuesUp container.subpath/stuff_

Sorting Considerations

The runPostQuery pragma allows an optional SQL “ORDER BY” clause and the sortResultBy pragma implies such a clause. The default sorting order is ascending and for strings it is case sensitive. To sort strings in a non-case sensitive manner use “COLLATE NOCASE” or one of the “ascIns” and “descIns” shortcuts. For example,

#pragma sortResultBy stringField COLLATE NOCASE DESC
#pragma sortResultBy stringField ascIns
#pragma sortResultBy stringField descIns

Here are some variations of order by snippets.

someField 
someField asc
someField COLLATE NOCASE
someField COLLATE NOCASE asc
someField desc
someField COLLATE NOCASE desc

The someField value is case sensitive. The keywords are not.

Interacting with Query Results

Overview

Toward the right of cent on the tool bar are result oriented buttons.

  

Once you have a result you can then do other actions such as:

  copy it to the clipboard in JSON format.

   view the JSON in another window.

   send it in various formats to a particular destination such as a download folder.

   Open the Ask AI Assistant to interact with the results.

Ask AI Assistant Interaction

Consider the sample query

db.listingsAndReviews.find( { "beds": 4} )
.project( {name: 1, summary: 1, "address.country": 1} )

Run it to obtain a listing of accommodations across a variety of country locations.

Shown below is the Ask AI dialog for result interactions.

  

Note that many AI interactions have limits on the amount of input that can be provided as part of the interaction context. Thus the options related to which position of the result to send. Shown below is an error example.

  

Enter your AI prompt

  

Provide a count of the accommodations based on countries.

Indicate what rows to send.

  

Trim the context based on fields.

  

To send the request click

  

The AI reply is shown below.