Query
This guide explains how to perform semantic queries on documents in OneNode DB using the /document/query endpoint of the REST API. The examples below demonstrate how to use EmbJSON for effective semantic search.
Key Concepts and Query Parameters
EmbJSON Fields
The query operation works exclusively with EmbJSON fields, such as EmbText. For more details, refer to the EmbJSON documentation.
Automatic Embedding
The query text is embedded automatically using the specified embedding model, and results are based on their semantic similarity to the stored EmbJSON fields.
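To make "semantic similarity" concrete, here is a minimal sketch that ranks a few stored chunks against a query vector. It assumes cosine similarity, a common choice for embedding search; this guide does not state which metric OneNode DB uses. The three-dimensional vectors and chunk names are made up for illustration (real embeddings have far more dimensions).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embeddings (hypothetical values).
query_vector = [1.0, 0.0, 1.0]
stored_chunks = {
    "chunk_a": [1.0, 0.0, 1.0],  # same direction as the query
    "chunk_b": [1.0, 1.0, 0.0],  # partially aligned with the query
    "chunk_c": [0.0, 1.0, 0.0],  # orthogonal to the query
}

# Rank chunk names from most to least similar to the query.
ranked = sorted(
    stored_chunks,
    key=lambda name: cosine_similarity(query_vector, stored_chunks[name]),
    reverse=True,
)
print(ranked)
```

In the actual service, the embedding and ranking happen server-side; the sketch only illustrates the idea of ordering chunks by vector similarity.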
Embedding Model Compatibility
The emb_model specified in the query does not need to match every model used by the EmbJSON fields in the collection. The query searches only the EmbJSON fields that were embedded with the same model as the one specified in the query; fields embedded with other models are skipped.
For example, if your collection contains multiple embedding models and you specify a particular model, only the documents using that same model will be targeted. If you need to search across all documents using different embedding models, you must make separate API calls for each embedding model to ensure complete coverage of the collection.
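The one-call-per-model pattern described above could be sketched as follows. The helper name query_each_model and the injected send callable are hypothetical, not part of the OneNode API; send is expected to POST a request body to the query endpoint and return the parsed JSON. The second model name is only a placeholder for whatever other model your collection uses. Results are kept grouped by model rather than merged, on the assumption that scores from different embedding models are not directly comparable.

```python
from typing import Callable, Dict, List


def query_each_model(
    send: Callable[[dict], dict],
    query: str,
    models: List[str],
    top_k: int = 10,
) -> Dict[str, list]:
    """Run the same query once per embedding model, keeping results per model."""
    results = {}
    for model in models:
        body = {"query": query, "emb_model": model, "top_k": top_k}
        results[model] = send(body).get("matches", [])
    return results


def fake_send(body: dict) -> dict:
    """Stand-in transport for a dry run; returns one made-up match per call."""
    return {
        "matches": [
            {"path": "bio", "chunk": f"result for {body['emb_model']}", "score": 0.9}
        ]
    }


# "text-embedding-3-large" is an assumed second model, used only as a placeholder.
models = ["text-embedding-3-small", "text-embedding-3-large"]
results = query_each_model(fake_send, "Software engineer with expertise in AI", models, top_k=3)
for model, matches in results.items():
    print(model, len(matches))
```

To hit the real endpoint, send could wrap a requests.post call to {collection_url}/document/query that returns response.json().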
Query Parameters
Parameter | Description | Data Type / Format |
---|---|---|
query | The text to be embedded and matched against stored EmbJSON fields. This parameter is required. | string |
emb_model (optional) | The embedding model used for the query. Defaults to OpenAI's text-embedding-3-small. Users can select from supported embedding models listed at /models. Refer to Supported Embedding Models for more details. If the specified model does not match those used in the stored EmbJSON, only matching fields will be targeted. | string |
top_k (optional) | The maximum number of top-matching chunks to return, sorted by semantic similarity. Default is 10. | integer |
include_values (optional) | Boolean flag that controls whether the vector values of each matched chunk are included in the response. Default is false. | boolean |
projection (optional) | Defines which fields to return in the response. Accepts a required mode (include or exclude) and an optional fields list. The default value is { "mode": "exclude" }. See the Projection Parameter Scenarios table for how different values of projection affect the response. | JSON object |
Format of the projection Parameter
Key | Description | Format |
---|---|---|
mode | Required. Specifies whether to include or exclude certain fields in the response. | string ("include" or "exclude") |
fields | Optional. A list of specific fields to include or exclude based on the mode setting. | list of strings (["field1", "field2"]) |
Projection Parameter Scenarios
Example Projection Value | Result |
---|---|
{ "mode": "include" } | The entire document is returned. |
{ "mode": "include", "fields": ["title", "author"] } | Only the title and author fields are returned. |
{ "mode": "exclude" } | Only the _id field is returned. |
{ "mode": "exclude", "fields": ["title", "author"] } | All fields except title and author are returned. |
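The four scenarios in the table can be mimicked locally with a small function. This is only an illustration of the semantics the table describes, not the server's implementation; apply_projection is a hypothetical helper, and its handling of an empty fields list is an assumption the table does not cover.

```python
def apply_projection(document: dict, projection: dict = None) -> dict:
    """Return the subset of `document` that the table's rules would keep."""
    if projection is None:
        projection = {"mode": "exclude"}  # documented default
    mode = projection["mode"]
    fields = projection.get("fields")
    if mode == "include":
        if fields is None:
            return dict(document)  # entire document
        return {k: v for k, v in document.items() if k in fields}
    if mode == "exclude":
        if fields is None:
            return {k: v for k, v in document.items() if k == "_id"}  # only _id
        return {k: v for k, v in document.items() if k not in fields}
    raise ValueError('mode must be "include" or "exclude"')


# Made-up document for demonstration.
doc = {"_id": "abc", "title": "T", "author": "A", "year": 2024}
print(apply_projection(doc, {"mode": "include"}))
print(apply_projection(doc, {"mode": "include", "fields": ["title", "author"]}))
print(apply_projection(doc, {"mode": "exclude"}))
print(apply_projection(doc, {"mode": "exclude", "fields": ["title", "author"]}))
```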
Query Operation Overview
The query operation retrieves documents based on semantic matches to the provided query text. Optional parameters like emb_model, top_k, and projection allow you to control the embedding model used, the number of results, and the fields returned.
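A request body for this operation might be assembled as in the sketch below. build_query_body is a hypothetical helper; it sends only the optional parameters you set, so the server-side defaults apply to the rest. The validation rules (non-empty query, top_k of at least 1) are client-side sanity checks assumed for illustration, not documented server behavior.

```python
from typing import Optional


def build_query_body(
    query: str,
    emb_model: Optional[str] = None,
    top_k: Optional[int] = None,
    include_values: Optional[bool] = None,
    projection: Optional[dict] = None,
) -> dict:
    """Build a /document/query request body, omitting unset optional parameters."""
    if not isinstance(query, str) or not query:
        raise ValueError("query is required and must be a non-empty string")
    body = {"query": query}
    if emb_model is not None:
        body["emb_model"] = emb_model
    if top_k is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(top_k, bool) or not isinstance(top_k, int) or top_k < 1:
            raise ValueError("top_k must be a positive integer")
        body["top_k"] = top_k
    if include_values is not None:
        body["include_values"] = bool(include_values)
    if projection is not None:
        if projection.get("mode") not in ("include", "exclude"):
            raise ValueError('projection requires mode "include" or "exclude"')
        body["projection"] = projection
    return body


print(build_query_body("Software engineer with expertise in AI"))
print(build_query_body(
    "Software engineer with expertise in AI",
    top_k=3,
    projection={"mode": "include", "fields": ["name", "bio"]},
))
```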
Endpoint
To execute a query, use the following endpoint:
{collection_url}/document/query
Where {collection_url} is the complete collection URL, including your db_id and collection_name.
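As a rough sketch, the endpoint can be assembled from its parts. The URL layout here is an assumption inferred from the sample URL used in this guide's Python example (https://api.onenode.ai/v1/db/123abc/collection/my_collection); the helper names are hypothetical, so check your own collection URL before relying on this pattern.

```python
from urllib.parse import quote

# Base URL taken from the sample collection URL in this guide (assumed layout).
BASE_URL = "https://api.onenode.ai/v1"


def collection_url(db_id: str, collection_name: str, base_url: str = BASE_URL) -> str:
    """Build the collection URL, percent-encoding each path segment."""
    return (
        f"{base_url}/db/{quote(db_id, safe='')}"
        f"/collection/{quote(collection_name, safe='')}"
    )


def query_endpoint(db_id: str, collection_name: str, base_url: str = BASE_URL) -> str:
    """Append the /document/query path to the collection URL."""
    return f"{collection_url(db_id, collection_name, base_url)}/document/query"


print(query_endpoint("123abc", "my_collection"))
```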
Example: Python Code for the query Operation
Below is an example of how to perform a semantic query using Python. This example includes EmbJSON fields that align with the type of data you may have inserted previously:
import json
import requests
# Hardcoded API key and collection URL (example usage only, not for production)
ONENODE_API_KEY = "your_example_api_key_here"
collection_url = "https://api.onenode.ai/v1/db/123abc/collection/my_collection"
# Query URL
url = f"{collection_url}/document/query"
# Query parameters
query_text = "Software engineer with expertise in AI"
emb_model = "text-embedding-3-small"
top_k = 3
include_values = True
projection = {
    "mode": "include",
    "fields": ["name", "bio"]
}
# Request body
data = {
    "query": query_text,
    "emb_model": emb_model,
    "top_k": top_k,
    "include_values": include_values,
    "projection": projection
}
# Headers with API key
headers = {
    "Authorization": f"Bearer {ONENODE_API_KEY}",
    "Content-Type": "application/json"
}
# Send the request; the timeout keeps the call from hanging indefinitely
response = requests.post(url, headers=headers, data=json.dumps(data), timeout=30)
# Raise an exception on 4xx/5xx responses before parsing the body
response.raise_for_status()
# Print the parsed JSON response
print(response.json())
Query Response
A successful query operation will return a JSON response containing matched chunks sorted by semantic similarity. Below is an example response:
{
  "matches": [
    {
      "document": {
        "name": "John Doe",
        "bio": {
          "@embText": {
            "text": "John is a software engineer with expertise in AI.",
            "model": "text-embedding-3-small"
          }
        }
      },
      "path": "bio",
      "chunk": "John is a software engineer with expertise in AI.",
      "chunk_n": 0,
      "score": 0.95,
      "values": [
        0.123, 0.456, 0.789, ...
      ]
    },
    {
      "document": {
        "name": "Alice Smith",
        "bio": {
          "@embText": {
            "text": "Alice is a data scientist with a background in machine learning.",
            "model": "text-embedding-3-small"
          }
        }
      },
      "path": "bio",
      "chunk": "Alice is a data scientist with a background in machine learning.",
      "chunk_n": 1,
      "score": 0.89,
      "values": [
        0.234, 0.567, 0.890, ...
      ]
    }
  ]
}
The matches field contains an array of matched chunks, sorted by semantic similarity. Each entry holds the document the chunk belongs to (shaped by projection) together with metadata about the matched chunk: path, chunk, chunk_n, score, and, if requested through include_values, values. Note that only documents containing EmbJSON fields are returned in the response.
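A response of this shape could be walked as in the sketch below. summarize_matches is a hypothetical helper, and sample_response is a trimmed copy of the example above with the values arrays left out.

```python
def summarize_matches(response: dict) -> list:
    """Return (score, path, chunk_n, chunk) for each match, in response order."""
    rows = []
    for match in response.get("matches", []):
        rows.append((match["score"], match["path"], match["chunk_n"], match["chunk"]))
    return rows


# Trimmed copy of the example response from this guide.
sample_response = {
    "matches": [
        {
            "document": {"name": "John Doe"},
            "path": "bio",
            "chunk": "John is a software engineer with expertise in AI.",
            "chunk_n": 0,
            "score": 0.95,
        },
        {
            "document": {"name": "Alice Smith"},
            "path": "bio",
            "chunk": "Alice is a data scientist with a background in machine learning.",
            "chunk_n": 1,
            "score": 0.89,
        },
    ]
}

for score, path, chunk_n, chunk in summarize_matches(sample_response):
    print(f"{score:.2f}  {path}[{chunk_n}]  {chunk}")
```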
Example Response When Omitting projection
If the projection parameter is omitted, the default behavior is to exclude all fields except the _id field. Below is an example response:
{
  "matches": [
    {
      "document": {
        "_id": "64d2f8f01234abcd5678ef90"
      },
      "path": "bio",
      "chunk": "John is a software engineer with expertise in AI.",
      "chunk_n": 0,
      "score": 0.95,
      "values": [
        0.123, 0.456, 0.789, ...
      ]
    },
    {
      "document": {
        "_id": "64d2f8f01234abcd5678ef91"
      },
      "path": "bio",
      "chunk": "Alice is a data scientist with a background in machine learning.",
      "chunk_n": 1,
      "score": 0.89,
      "values": [
        0.234, 0.567, 0.890, ...
      ]
    }
  ]
}
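In this response, each document contains only the _id field because the projection parameter was not specified. The values arrays are present because include_values is still set to true in this request; its default is false.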
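With the default projection, a common next step is to collect the matched document IDs, for example to fetch the full documents afterwards. The sketch below uses a hypothetical helper, matched_document_ids, and removes duplicates on the assumption that several chunks of the same document may appear among the matches. The sample data reuses the IDs from the example above, with a made-up third match added to show the de-duplication.

```python
def matched_document_ids(response: dict) -> list:
    """Return unique document _id values in the order they first appear."""
    seen = set()
    ids = []
    for match in response.get("matches", []):
        doc_id = match.get("document", {}).get("_id")
        if doc_id is not None and doc_id not in seen:
            seen.add(doc_id)
            ids.append(doc_id)
    return ids


id_only_response = {
    "matches": [
        {"document": {"_id": "64d2f8f01234abcd5678ef90"}, "path": "bio", "score": 0.95},
        {"document": {"_id": "64d2f8f01234abcd5678ef91"}, "path": "bio", "score": 0.89},
        # Made-up extra match from the first document, to show de-duplication.
        {"document": {"_id": "64d2f8f01234abcd5678ef90"}, "path": "bio", "score": 0.80},
    ]
}

print(matched_document_ids(id_only_response))
```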
If you need additional help with the OneNode DB API, feel free to explore our documentation or reach out to support.