image_url content type.
The model processes both modalities together, so it can answer questions about
image content, compare multiple images, or extract structured data from
screenshots.
Not all models support vision. Check the table below before sending image
inputs.
Supported models
| Model | Slug |
|---|---|
| Kimi K2.5 | moonshotai/Kimi-K2.5 |
| Kimi K2.6 | moonshotai/Kimi-K2.6 |
Send a vision request
Use theimage_url content type to include images in your messages.
Baseten retrieves image URLs from the inference service, so the URL must be reachable over HTTPS from Baseten’s environment (for example your own object storage, Hugging Face artifact links, or other hosts that allow server-side fetches). Prefer stable, direct HTTPS links.
Optional image_url.detail controls preprocessing resolution: low, high, original, or auto (OpenAI-compatible). When in doubt, use auto.
- Python
- JavaScript
- cURL
Image and video limits (Model APIs)
For Kimi, multimodal limits come from the model’s deployment config (b10_vision_config under baseten/mp/baseten_dynamo/deploy/model-apis/) and its encoder template (baseten/mp/baseten_dynamo/cache_aware_routing_trtllm/encoder/template_configs/).
| Limit | Kimi K2.5 | Kimi K2.6 |
|---|---|---|
| Max images per request | 96 | 96 |
| Max videos per request | 12 | 12 |
| Max total media size per request (URL) | 240 MB | 240 MB |
| Max size per image (URL) | 90 MB | 80 MB |
| Max request size (base64) | 100 MB | 100 MB |
Other Model APIs models use their own
b10_vision_config values. Confirm limits for a slug in the Baseten app, via /v1/models, or by reading that model’s YAML under deploy/model-apis/.Pricing
There is no additional per-image fee. Images are converted to input tokens and priced at the model’s standard input rate. Higher resolution images produce more tokens and cost more to process. The exact conversion from pixels to tokens depends on the model. Kimi K2.5 and Kimi K2.6 divide each image into 14×14 pixel tiles where each tile becomes one input token. The cost table below uses Kimi K2.5’s uncached input rate ($0.60 per million tokens); for Kimi K2.6 and other models, use the rates in the pricing table on the Model APIs overview.| Image resolution | Tiles | Input tokens | Cost at $0.60/M |
|---|---|---|---|
| 256×256 | 324 | 324 | $0.0002 |
| 512×512 | 1,296 | 1,296 | $0.0008 |
| 1024×1024 | 5,329 | 5,329 | $0.0032 |
| 1920×1080 | 10,234 | 10,234 | $0.0061 |