Query

This guide explains how to perform semantic queries on documents in OneNode. Semantic queries retrieve documents by matching the meaning of the provided query text with indexed multimodal data in the database.

Semantic Search Out of the Box!

OneNode provides semantic search capabilities out of the box, with no complex setup required:

  • Semantic Search Support - Intelligent meaning-based queries that understand context and intent
  • Multimodal Data Support - Search across text, images, and other data types seamlessly
  • Zero Configuration - Start searching immediately without complex embedding setup
  • High Performance - Optimized vector search with automatic indexing

Basic Query Operation

The simplest way to use the query operation is to provide just the query text. This gives you an easy, intuitive way to search your data semantically without any additional parameters.

Basic Example

# Simple query example
query_text = "Software engineer with expertise in AI"

response = collection.query(query_text)

# Process the results - the response is a list of QueryMatch objects
for match in response:
    print(f"Match: {match.chunk} (Score: {match.score})")

Default Response

A successful query operation returns a list of matches, shown below in JSON form. By default, each match includes the matched text chunk, its location in the document, a similarity score, and the full document:

[
  {
    "chunk": "John is a software engineer with expertise in AI.",
    "path": "bio",
    "chunk_n": 0,
    "score": 0.95,
    "document": {
      "_id": ObjectId("64d2f8f01234abcd5678ef90")
      // All document fields are returned here (name, bio, skills, etc.)
    }
  },
  {
    "chunk": "Alice is a data scientist with a background in machine learning.",
    "path": "bio",
    "chunk_n": 1,
    "score": 0.89,
    "document": {
      "_id": ObjectId("64d2f8f01234abcd5678ef91")
      // Complete document data is returned by default
    }
  }
]
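The fields carried by each match above can be modeled roughly as follows. This is an illustrative sketch, not the SDK's actual QueryMatch class definition:

```python
from dataclasses import dataclass

# Rough model of a single match, mirroring the JSON response above.
@dataclass
class QueryMatch:
    chunk: str      # the matched text chunk
    path: str       # field path the chunk came from, e.g. "bio"
    chunk_n: int    # index of the chunk within that field
    score: float    # similarity score; higher means more similar
    document: dict  # the full document (returned by default)

match = QueryMatch(
    chunk="John is a software engineer with expertise in AI.",
    path="bio",
    chunk_n=0,
    score=0.95,
    document={"_id": "64d2f8f01234abcd5678ef90"},
)
print(match.path, match.score)
```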

By default, the system will:

  • Use OpenAI's text-embedding-3-small as the embedding model
  • Return the top 10 matching results
  • Exclude the vector values from the response
  • Return the whole document data, not just minimal metadata

Advanced Query Operation

For more control over your semantic searches, you can customize the query operation with additional parameters. These parameters allow you to fine-tune the search behavior, filter results, and specify what data to include in the response.

Using Filters

The filter parameter allows you to apply MongoDB-style query filters to narrow down documents before performing semantic search.

# Query with filter to search only active users
query_text = "Software engineer with expertise in AI"

response = collection.query(
    query_text,
    filter={"status": "active", "experience": {"$gte": 3}}
)

for match in response:
    print(f"Match: {match.chunk} (Score: {match.score})")
    print(f"User status: {match.document['status']}")

📖 Learn More

For detailed filter syntax and examples, see our Filter Syntax Guide.
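To make the filter semantics concrete, here is a small pure-Python sketch of how a MongoDB-style filter narrows a document set before the semantic search runs. Operator support here is deliberately minimal ($gte only, for illustration); the actual server-side matcher supports the full filter syntax described in the guide above:

```python
def matches(doc, flt):
    """Return True if doc satisfies every condition in flt (equality or $gte)."""
    for field, cond in flt.items():
        value = doc.get(field)
        if isinstance(cond, dict):  # operator clause, e.g. {"$gte": 3}
            if "$gte" in cond and not (value is not None and value >= cond["$gte"]):
                return False
        elif value != cond:  # plain equality
            return False
    return True

docs = [
    {"name": "A", "status": "active", "experience": 5},
    {"name": "B", "status": "inactive", "experience": 7},
    {"name": "C", "status": "active", "experience": 1},
]
flt = {"status": "active", "experience": {"$gte": 3}}
filtered = [d["name"] for d in docs if matches(d, flt)]
print(filtered)  # only "A" is both active and has >= 3 years of experience
```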

Using Projection

The projection parameter controls which fields are included or excluded in the returned documents.

Note: When using projection, the matched chunk may not be available in the response if it comes from a field that's excluded by the projection.

# Query with projection to include only specific fields
query_text = "Data scientist with machine learning experience"

projection = {
    "mode": "include",
    "fields": ["name", "bio", "skills"]
}

response = collection.query(query_text, projection=projection)

for match in response:
    print(f"Match: {match.chunk}")  # None if the chunk's source field is excluded by the projection
    print(f"Name: {match.document.get('name')}")
    print(f"Skills: {match.document.get('skills')}")

📖 Learn More

For comprehensive projection syntax and advanced examples, see our Projection Syntax Guide.
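The include/exclude semantics can be sketched on a plain dict as below. This is an illustration only; the real projection is applied server-side, and whether "_id" is always retained is an assumption here, not documented behavior:

```python
def project(doc, projection):
    """Apply an include/exclude projection to a document dict (sketch)."""
    mode = projection["mode"]
    fields = set(projection["fields"])
    if mode == "include":
        # Keep only the listed fields (assuming "_id" is always kept)
        return {k: v for k, v in doc.items() if k in fields or k == "_id"}
    if mode == "exclude":
        # Drop the listed fields, keep everything else
        return {k: v for k, v in doc.items() if k not in fields}
    raise ValueError(f"unknown projection mode: {mode}")

doc = {"_id": "abc123", "name": "Alice", "bio": "Data scientist", "salary": 90000}
print(project(doc, {"mode": "include", "fields": ["name", "bio"]}))
print(project(doc, {"mode": "exclude", "fields": ["salary"]}))
```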

Using Custom Embedding Model

The emb_model parameter allows you to specify which embedding model to use for the query.

# Query with custom embedding model
from onenode import Models

query_text = "Frontend developer with React expertise"

response = collection.query(
    query_text,
    emb_model=Models.TextToEmbedding.OpenAI.TEXT_EMBEDDING_3_LARGE
)

for match in response:
    print(f"Match: {match.chunk} (Score: {match.score})")
    print(f"Document ID: {match.document['_id']}")

📖 Learn More

For a complete list of available embedding models and their specifications, see our Embedding Models Guide.

Limiting Results with top_k

The top_k parameter controls the maximum number of results returned from the query.

# Query with limited results
query_text = "Python developer with Django experience"

response = collection.query(query_text, top_k=5)

print(f"Found {len(response)} matches:")
for i, match in enumerate(response, 1):
    print(f"{i}. {match.chunk} (Score: {match.score:.3f})")
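Conceptually, top_k keeps only the k highest-scoring matches. The database performs this selection during the vector search itself, which is why smaller top_k values also reduce work and response size; the sketch below just illustrates the selection on mock matches:

```python
import heapq

# Mock matches with similarity scores
matches = [
    {"chunk": "a", "score": 0.95},
    {"chunk": "b", "score": 0.72},
    {"chunk": "c", "score": 0.89},
    {"chunk": "d", "score": 0.61},
]

# Keep the 2 highest-scoring matches, best first
top = heapq.nlargest(2, matches, key=lambda m: m["score"])
print([m["chunk"] for m in top])  # ['a', 'c']
```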

Including Embedding Values

The include_embedding parameter determines whether to include the raw embedding vector values in the response.

# Query with embedding values included
query_text = "DevOps engineer with Kubernetes skills"

response = collection.query(query_text, include_embedding=True)

for match in response:
    print(f"Match: {match.chunk} (Score: {match.score})")
    if match.embedding:
        print(f"Embedding dimensions: {len(match.embedding)}")
        print(f"First few values: {match.embedding[:5]}")
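Similarity scores like 0.95 typically come from cosine similarity between the query embedding and a chunk embedding. Whether OneNode uses cosine similarity specifically is an assumption here; this sketch just shows the common computation you might apply to the raw vectors returned with include_embedding=True:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # identical direction -> 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal -> 0.0
```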

Parameters for Query Operations

  • query (required) - The text to be embedded and matched against stored indexed fields.
  • filter (optional) - MongoDB-style query filter applied to documents before the semantic search, narrowing the document set first.
  • projection (optional) - Specifies which fields to include or exclude in the returned documents. Format: {"mode": "include", "fields": ["field1", "field2"]} or {"mode": "exclude", "fields": ["field3"]}.
  • emb_model (optional) - The embedding model used for the query. Defaults to OpenAI's text-embedding-3-small. Users can select from the supported embedding models. If the specified model does not match the one used in the stored data, only matching fields will be targeted. Note: Use embModel in TypeScript/JavaScript.
  • top_k (optional) - The maximum number of matches to return. Defaults to 10. Increase this value to get more results; decrease it to improve performance and reduce response size. Note: Use topK in TypeScript/JavaScript.
  • include_embedding (optional) - Whether to include the embedding vector values in the response. Defaults to false. Set to true if you need the raw vector data for further processing. Note: Use includeEmbedding in TypeScript/JavaScript.