LLM Models

OneNode uses a two-step embedding architecture to enable true multimodal search across text and images.

The Two-Step Process

OneNode's unique approach uses two specialized models working in sequence:

1. Vision Model: Visual → Text

Converts images into detailed text descriptions that capture visual content, context, and relationships.

Example

Input: [Image of a red Tesla in a parking lot]
Vision Model Output:
"A red Tesla Model 3 electric sedan parked in an outdoor parking lot with white painted lines. The vehicle features sleek aerodynamic design, chrome door handles, and LED headlights."
2. Embedding Model: Text → Vectors

Converts all text (original + vision-generated) into semantic vectors for mathematical comparison.

Example

Text Input 1: "I bought a red Tesla last month"
Text Input 2: "A red Tesla Model 3 electric sedan parked..." (from vision)
Result: Both get similar embedding vectors → semantically related
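
OneNode's embedding model isn't named in this section either, so here is a minimal sketch of the idea using sentence-transformers as a stand-in: both pieces of text are encoded into vectors, and their cosine similarity is high because they describe the same thing.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in embedding model

original_text = "I bought a red Tesla last month"
vision_text = "A red Tesla Model 3 electric sedan parked in an outdoor parking lot"

vectors = model.encode([original_text, vision_text])
score = util.cos_sim(vectors[0], vectors[1]).item()
print(f"cosine similarity: {score:.2f}")  # high score -> semantically related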

Step-by-Step: Document Processing

Here's exactly what happens when you store multimodal data (a code sketch follows the list):

1. Submit Document: Upload document with Text and Image objects
2. Vision Processing: Each Image object → detailed text description via vision model
3. Text Consolidation: Original text + vision descriptions = unified text format
4. Embedding Generation: All text → semantic vectors via embedding model
5. Unified Search: Single query finds content across all modalities
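
The exact OneNode client calls are covered elsewhere in the docs, so the following is a hypothetical, self-contained walk-through of the five steps in plain Python. The Text and Image classes, the stubbed vision call, and the sentence-transformers embedder are all illustrative assumptions, not OneNode's real API.

from dataclasses import dataclass

import numpy as np
from sentence_transformers import SentenceTransformer

@dataclass
class Text:          # hypothetical stand-in for OneNode's Text object
    content: str

@dataclass
class Image:         # hypothetical stand-in for OneNode's Image object
    url: str

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in embedding model

def describe_image(image: Image) -> str:
    # Step 2: vision processing. A real vision model would run here;
    # this stub returns a canned description purely for illustration.
    return "A red Tesla Model 3 electric sedan parked in an outdoor parking lot."

def store_document(fields):
    # Step 3: text consolidation (original text + vision descriptions).
    parts = [describe_image(f) if isinstance(f, Image) else f.content for f in fields]
    unified_text = "\n".join(parts)
    # Step 4: embedding generation.
    vector = embedder.encode(unified_text, normalize_embeddings=True)
    return unified_text, vector

# Step 1: submit a document containing both Text and Image objects.
unified, doc_vector = store_document([
    Text("I bought a red Tesla last month"),
    Image("https://example.com/red-tesla.jpg"),
])

# Step 5: unified search - one text query matches content from any modality.
query_vector = embedder.encode("electric car in a parking lot", normalize_embeddings=True)
print("match score:", float(np.dot(doc_vector, query_vector)))
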
🤔 Why Two Steps?

Simplicity: One embedding model handles all final processing
Interpretability: You can see the text description that caused a match
Extensibility: Add new modalities by converting them to text
Efficiency: Reuses mature text processing infrastructure
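
The interpretability point can be seen directly in the hypothetical sketch above: the consolidated text stored for a document is plain, human-readable prose, so printing it shows exactly which description produced a match.

print(unified)
# I bought a red Tesla last month
# A red Tesla Model 3 electric sedan parked in an outdoor parking lot.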

Learn More About Specific Models

Dive deeper into the specific models available in OneNode and learn how to optimize them for your use cases.

