LLM Models
OneNode uses a two-step embedding architecture to enable true multimodal search across text and images.
The Two-Step Process
OneNode's approach uses two specialized models working in sequence:
Vision Model: Visual → Text
Converts images into detailed text descriptions that capture visual content, context, and relationships.
Example
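As an illustration, the sketch below shows the shape of this step: an image goes in, a detailed text description comes out. The describe_image function and the wording of the description are hypothetical stand-ins for illustration, not OneNode's actual API or output.

```python
# Illustrative sketch only: the vision step turns an image into text.
# `describe_image` is a hypothetical stand-in, not OneNode's actual API.
def describe_image(path: str) -> str:
    """Placeholder for a vision model call that returns a text description."""
    return (
        "A golden retriever running across a grassy park, carrying a red ball, "
        "with two children playing in the background on a sunny afternoon."
    )

print(describe_image("dog_in_park.jpg"))
```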
Embedding Model: Text → Vectors
Converts all text (original + vision-generated) into semantic vectors for mathematical comparison.
Example
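For instance, a search query and a stored description can both be embedded and then compared with cosine similarity. The embed function below is a toy stand-in so the sketch runs on its own; the real model produces dense semantic vectors rather than character codes.

```python
# Illustrative sketch only: the embedding step turns text into vectors
# that can be compared mathematically. `embed` is a toy stand-in here.
import math

def embed(text: str) -> list[float]:
    """Placeholder embedding; a real model returns dense semantic vectors."""
    padded = text.lower().ljust(16)[:16]
    return [ord(ch) / 255.0 for ch in padded]

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

query = embed("dog playing in a park")
description = embed("A golden retriever running across a grassy park")
print(round(cosine_similarity(query, description), 3))  # higher = more similar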
Step-by-Step: Document Processing
Here's what happens when you store multimodal data:
1. Your original text fields are kept as-is.
2. The vision model converts each image into a detailed text description.
3. The embedding model converts all text, original and vision-generated, into semantic vectors.
4. The vectors are stored so queries can be compared against them mathematically.
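The sketch below walks through these steps end to end, assuming the flow matches the two-step process described above. Every name in it (describe_image, embed, the document shape) is a hypothetical placeholder for illustration, not OneNode's SDK.

```python
# Hypothetical walk-through of storing a document with text and an image.
# All names are placeholders chosen for illustration, not OneNode's SDK.

def describe_image(path: str) -> str:
    """Stand-in for the vision model: image -> text description."""
    return "A red bicycle leaning against a brick wall at sunset."

def embed(text: str) -> list[float]:
    """Stand-in for the embedding model: text -> semantic vector."""
    return [ord(ch) / 255.0 for ch in text.lower().ljust(16)[:16]]

document = {"caption": "Our new city bike", "image": "bike.jpg"}

# Step 2: the vision model converts the image into a text description.
image_description = describe_image(document["image"])

# Step 3: the embedding model converts all text (original + generated) into vectors.
vectors = {
    "caption": embed(document["caption"]),
    "image_description": embed(image_description),
}

# Step 4: the document, the generated description, and the vectors are stored
# together so queries can be compared against them mathematically.
stored = {**document, "image_description": image_description, "vectors": vectors}
print(sorted(stored.keys()))
```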
This text-first design has several benefits:
Simplicity: one embedding model handles all final processing.
Interpretability: you can see the text description that caused a match.
Extensibility: new modalities are added by converting them to text.
Efficiency: it reuses mature text processing infrastructure.
Dive deeper into the specific models available in OneNode and learn how to optimize them for your use cases.