Vision Models
Vision models enable OneNode to process and understand image content, creating semantic representations that can be used for advanced image search and analysis. These models extract features and context from images, similar to how embedding models work with text.
When working with `Image` in OneNode, you can specify which vision model to use via the `vision_model` parameter.
Supported Vision Models
| Model | Provider | Description |
|---|---|---|
| gpt-4o | OpenAI | High-quality multimodal model capable of understanding images with excellent detail recognition and contextual understanding |
| gpt-4o-mini | OpenAI | Smaller, more cost-effective version of GPT-4o with good performance for most image processing tasks |
| o4-mini | OpenAI | Advanced mini vision model optimized for efficiency and performance |
| o3 | OpenAI | Next-generation vision model with enhanced reasoning capabilities |
| o1 | OpenAI | Advanced vision model optimized for high-fidelity understanding of complex visual content |
| o1-pro | OpenAI | Professional-grade version of o1 with enhanced capabilities for production use |
| gpt-4.1 | OpenAI | Latest iteration of GPT-4 with improved vision understanding capabilities |
| gpt-4.1-mini | OpenAI | Compact version of GPT-4.1 optimized for cost-effective vision processing |
| gpt-4.1-nano | OpenAI | Ultra-lightweight version of GPT-4.1 for high-volume image processing |
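In code, the models in this table are referenced through the `Models.ImageToText.OpenAI` enum rather than by their string names. The sketch below shows the mapping; only `GPT_4O` is confirmed by the examples on this page, and the commented-out members are assumptions that follow the same naming pattern, so verify them against your SDK's `Models` reference.

```python
from onenode import Models

# Confirmed by the examples on this page:
vision_model = Models.ImageToText.OpenAI.GPT_4O        # "gpt-4o"

# Assumed to follow the same naming pattern (verify against the SDK):
# Models.ImageToText.OpenAI.GPT_4O_MINI                # "gpt-4o-mini"
# Models.ImageToText.OpenAI.O4_MINI                    # "o4-mini"
# Models.ImageToText.OpenAI.GPT_4_1_NANO               # "gpt-4.1-nano"
```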
Using Vision Models in OneNode
To specify a vision model when working with `Image`, use the `vision_model` parameter:
from onenode import Image, Models

# Document payload with an image field indexed by a vision model
doc = {
    "product_image": Image("product.jpg", mime_type="image/jpeg").enable_index(
        vision_model=Models.ImageToText.OpenAI.GPT_4O
    )
    # Or construct the Image from a base64-encoded string:
    # "product_image": Image(
    #     "base64_encoded_image_data",
    #     mime_type="image/jpeg"
    # ).enable_index(vision_model=Models.ImageToText.OpenAI.GPT_4O)
}
In the example above, we use the `GPT_4O` model from OpenAI to process and understand the image content.
Combined Usage with Embedding Models
You can use both vision models and embedding models together with `Image` to get the benefits of both:
from onenode import Image, Models

# Document payload combining a vision model with an embedding model
doc = {
    "product_image": Image("product.jpg", mime_type="image/jpeg").enable_index(
        vision_model=Models.ImageToText.OpenAI.GPT_4O,
        emb_model=Models.TextToEmbedding.OpenAI.TEXT_EMBEDDING_3_LARGE
    )
    # Or construct the Image from binary data read beforehand:
    # with open("product.jpg", "rb") as f:
    #     image_bytes = f.read()
    # "product_image": Image(image_bytes, mime_type="image/jpeg").enable_index(
    #     vision_model=Models.ImageToText.OpenAI.GPT_4O,
    #     emb_model=Models.TextToEmbedding.OpenAI.TEXT_EMBEDDING_3_LARGE
    # )
}
This combination allows OneNode to describe the image's visual content (via the vision model) and then encode that description for semantic search (via the embedding model), enabling comprehensive multimodal search.
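To make that division of labor concrete, here is a rough standalone sketch of the same two-step idea written directly against the OpenAI Python SDK rather than against OneNode. OneNode performs this processing on the server, so this block only illustrates the concept, not how the library is implemented; the prompt text and file name are assumptions.

```python
import base64
from openai import OpenAI  # standalone illustration; not part of the onenode SDK

client = OpenAI()

# Step 1 - vision model: turn the image into a textual description.
with open("product.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

description = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image for search indexing."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
).choices[0].message.content

# Step 2 - embedding model: encode the description as a vector for semantic search.
vector = client.embeddings.create(
    model="text-embedding-3-large",
    input=description,
).data[0].embedding

print(len(vector))  # text-embedding-3-large produces 3072-dimensional vectors
```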
Best Practices
- Use `gpt-4o-mini` or `o4-mini` for cost-efficient image processing where the highest level of detail recognition is not required.
- Choose `gpt-4o` for high-quality image understanding in production applications.
- Consider `o3` or `o1` for applications that require the most advanced image understanding capabilities.
- Use `gpt-4.1-nano` for high-volume image processing where cost optimization is critical.
- When working with a large number of images, be mindful of processing costs and consider using a more economical vision model for initial processing (see the sketch after this list).
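As a concrete take on the cost guidance above, the sketch below chooses a vision model based on batch size before building documents. The `vision_model_for` helper and the `GPT_4O_MINI` member name are illustrative assumptions (only `GPT_4O` appears in this page's examples); substitute the exact enum members from your SDK version.

```python
from onenode import Image, Models

def vision_model_for(batch_size: int):
    """Hypothetical helper: pick a cheaper vision model for large batches."""
    if batch_size > 1000:
        # Assumed enum member, following the GPT_4O naming pattern:
        return Models.ImageToText.OpenAI.GPT_4O_MINI
    return Models.ImageToText.OpenAI.GPT_4O

image_paths = ["img_001.jpg", "img_002.jpg"]  # illustrative file names
model = vision_model_for(len(image_paths))

docs = [
    {"product_image": Image(path, mime_type="image/jpeg").enable_index(vision_model=model)}
    for path in image_paths
]
```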