Vision

Model APIs accept text and, on vision-capable models, images in the same request through the OpenAI-compatible `image_url` content type. The model processes both modalities together, so it can answer questions about image content, compare multiple images, or extract structured data from screenshots. Not all models support vision, so check the table below before sending image inputs.

Supported models

Model	Slug
Kimi K2.5	`moonshotai/Kimi-K2.5`

Send a vision request

Use the image_url content type to include images in your messages.

Python

from openai import OpenAI
import os

client = OpenAI(
    base_url="https://inference.baseten.co/v1",
    api_key=os.environ["BASETEN_API_KEY"],
)

response = client.chat.completions.create(
    model="moonshotai/Kimi-K2.5",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe the natural environment in the image.",
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/seashore.png"
                    },
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)

JavaScript

import OpenAI from "openai";

const client = new OpenAI({
    baseURL: "https://inference.baseten.co/v1",
    apiKey: process.env.BASETEN_API_KEY,
});

const response = await client.chat.completions.create({
    model: "moonshotai/Kimi-K2.5",
    messages: [
        {
            role: "user",
            content: [
                {
                    type: "text",
                    text: "Describe the natural environment in the image.",
                },
                {
                    type: "image_url",
                    image_url: {
                        url: "https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/seashore.png",
                    },
                },
            ],
        },
    ],
});

console.log(response.choices[0].message.content);

cURL

curl https://inference.baseten.co/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Api-Key $BASETEN_API_KEY" \
  -d '{
    "model": "moonshotai/Kimi-K2.5",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe the natural environment in the image."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/seashore.png"
            }
          }
        ]
      }
    ]
  }'
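Because the model processes all parts of a message together, comparing several images only requires adding more `image_url` parts to the same `content` list. The following is a minimal sketch of how such a message could be assembled. The helper name `build_vision_message` and the `example.com` URLs are hypothetical placeholders, and the default cap of 8 comes from the image constraints listed on this page.

```python
def build_vision_message(prompt, image_urls, max_images=8):
    """Build one user message: a text part followed by one image_url part per URL.

    build_vision_message is a hypothetical helper, not part of any SDK.
    """
    if len(image_urls) > max_images:
        raise ValueError(
            f"at most {max_images} images per request, got {len(image_urls)}"
        )
    content = [{"type": "text", "text": prompt}]
    for url in image_urls:
        content.append({"type": "image_url", "image_url": {"url": url}})
    return {"role": "user", "content": content}


message = build_vision_message(
    "What differs between these two images?",
    [
        "https://example.com/before.png",  # placeholder URL
        "https://example.com/after.png",   # placeholder URL
    ],
)
```

The resulting dictionary can be passed as the single element of `messages` in the `client.chat.completions.create` call shown above.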

Image constraints

Pass images as URLs or as base64-encoded data.

Constraint	Limit
Max size per image (URL)	10 MB
Max total media size per request (URL)	50 MB
Max images per request	8
Max request size (base64)	50 MB
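For base64 input, one common approach with OpenAI-compatible APIs is to place a `data:` URL in the same `image_url.url` field. The sketch below assumes this endpoint accepts that form, which this page does not state explicitly. The helper name `image_part_from_bytes` is hypothetical, and the size check treats 50 MB as 50 × 1024 × 1024 bytes, which is also an assumption.

```python
import base64

# Assumption: the 50 MB base64 request limit is interpreted as 50 * 1024 * 1024 bytes.
MAX_BASE64_REQUEST_BYTES = 50 * 1024 * 1024


def image_part_from_bytes(data: bytes, mime_type: str = "image/png") -> dict:
    """Encode raw image bytes as a data URL inside an image_url content part.

    image_part_from_bytes is a hypothetical helper, not part of any SDK.
    """
    encoded = base64.b64encode(data).decode("ascii")
    url = f"data:{mime_type};base64,{encoded}"
    # Base64 grows the payload by roughly one third, so check the encoded size.
    if len(url) > MAX_BASE64_REQUEST_BYTES:
        raise ValueError("encoded image alone exceeds the base64 request limit")
    return {"type": "image_url", "image_url": {"url": url}}


# In practice the bytes would come from a file, e.g. open(path, "rb").read().
png_signature = b"\x89PNG\r\n\x1a\n"  # stand-in bytes for illustration only
part = image_part_from_bytes(png_signature)
```

The returned part slots into a message's `content` list in place of a URL-based `image_url` part. The check covers a single image only. The limit applies to the whole request, so several encoded images must fit within it together.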

Pricing

There is no additional per-image fee. Images are converted to input tokens and priced at the model’s standard input rate. Higher-resolution images produce more tokens and cost more to process. The exact conversion from pixels to tokens depends on the model. For example, Kimi K2.5 divides each image into 14×14-pixel tiles, and each tile becomes one input token. At Kimi K2.5’s input rate of $0.60 per million tokens:

Image resolution	Tiles	Input tokens	Cost at $0.60/M
256×256	324	324	$0.0002
512×512	1,296	1,296	$0.0008
1024×1024	5,329	5,329	$0.0032
1920×1080	10,234	10,234	$0.0061
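The arithmetic behind the table can be approximated with a short estimator. This sketch assumes partial tiles at the image edges are dropped (floor division), which reproduces the three square rows above. It does not reproduce the 1920×1080 row (floor division gives 137 × 77 = 10,549 tiles rather than 10,234), so treat it as a rough estimate rather than the model's exact preprocessing. The function names are hypothetical.

```python
TILE_SIZE = 14                 # pixels per tile side, per the Kimi K2.5 example above
INPUT_RATE_PER_MILLION = 0.60  # USD per million input tokens, per the text above


def estimate_image_tokens(width: int, height: int, tile: int = TILE_SIZE) -> int:
    """Estimate input tokens as the number of whole tiles that fit in the image.

    Assumption: partial edge tiles are dropped (floor division).
    """
    return (width // tile) * (height // tile)


def estimate_image_cost(
    width: int, height: int, rate: float = INPUT_RATE_PER_MILLION
) -> float:
    """Estimate the USD cost of one image at the given per-million-token rate."""
    return estimate_image_tokens(width, height) * rate / 1_000_000


for w, h in [(256, 256), (512, 512), (1024, 1024)]:
    tokens = estimate_image_tokens(w, h)
    print(f"{w}x{h}: {tokens} tokens, ${estimate_image_cost(w, h):.4f}")
# prints:
# 256x256: 324 tokens, $0.0002
# 512x512: 1296 tokens, $0.0008
# 1024x1024: 5329 tokens, $0.0032
```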

For videos, token count scales with both resolution and the number of sampled frames.
