Function calling is supported by Baseten engines including BIS-LLM and Engine-Builder-LLM, as well as Model APIs for instant access. It’s also compatible with other inference frameworks like vLLM and SGLang.
Overview
Function calling (also known as tool calling) lets a model choose a tool and produce arguments based on a user request. Important: the model does not execute your Python function. Your application must:
- run the tool, and
- optionally send the tool’s output back to the model to produce a final, user-facing response.
How tool calling works
A typical tool-calling loop looks like:
- Send the user message and a list of tools.
- The model returns either normal text or one or more tool calls (name and JSON arguments).
- Execute the tool calls in your application.
- Send tool output back to the model.
- Receive a final response or additional tool calls.
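As a sketch, here is that loop with the OpenAI-compatible Python client. The base URL, API key, model slug, and the `get_weather` tool and `tools` array (both defined in the steps below) are placeholders to adapt to your own deployment:

```python
import json

from openai import OpenAI

# Placeholder base URL, key, and model; point these at your own deployment.
client = OpenAI(base_url="https://inference.baseten.co/v1", api_key="YOUR_API_KEY")

messages = [{"role": "user", "content": "What's the weather in Paris?"}]

while True:
    response = client.chat.completions.create(
        model="YOUR_MODEL",
        messages=messages,
        tools=tools,  # serialized tool definitions from step 2 below
    )
    message = response.choices[0].message
    if not message.tool_calls:
        break  # plain text: this is the final, user-facing answer
    messages.append(message)  # keep the tool calls in context
    for call in message.tool_calls:
        args = json.loads(call.function.arguments)
        result = get_weather(**args)  # your application executes the tool
        messages.append(
            {"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)}
        )

print(message.content)
```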
1. Define tools
Tools can be anything: API calls, database queries, internal scripts, etc. Docstrings matter. Models use them to decide which tool to call and how to fill parameters:
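For example, a hypothetical weather lookup (the name, fields, and return value are illustrative):

```python
def get_weather(city: str, unit: str = "celsius") -> dict:
    """Get the current weather for a city.

    Args:
        city: City name, e.g. "Paris".
        unit: Temperature unit, either "celsius" or "fahrenheit".
    """
    # A real implementation would call a weather API here.
    return {"city": city, "temperature": 18, "unit": unit}
```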
Tool-writing tips
Design small, single-purpose tools and document constraints in docstrings (units, allowed values, required fields). Treat model-provided arguments as untrusted input and validate before execution.
2. Serialize functions
Convert functions into JSON-schema tool definitions (OpenAI-compatible format):
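A definition for the hypothetical `get_weather` tool above might look like:

```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name, e.g. 'Paris'.",
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit.",
                    },
                },
                "required": ["city"],
            },
        },
    }
]
```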
3. Call the model
Include the `tools` array in your request:
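For example, with the OpenAI-compatible Python client (the base URL and model slug are placeholders for your deployment):

```python
from openai import OpenAI

client = OpenAI(base_url="https://inference.baseten.co/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="YOUR_MODEL",  # placeholder model slug
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
```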
4. Control tool selection
Set `tool_choice` to control how the model uses tools. With `auto` (default), the model may respond with text or tool calls. With `required`, the model must return at least one tool call. With `none`, the model returns plain text only. To force a specific tool:
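For example, forcing the hypothetical `get_weather` tool:

```python
response = client.chat.completions.create(
    model="YOUR_MODEL",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    # Require a call to this specific function instead of letting the model choose.
    tool_choice={"type": "function", "function": {"name": "get_weather"}},
)
```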
5. Parse and execute tool calls
Depending on the engine and model, tool calls are typically returned in an assistant message under `tool_calls`:
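A sketch of parsing and dispatching those calls, assuming the `get_weather` example above:

```python
import json

message = response.choices[0].message

if message.tool_calls:
    for call in message.tool_calls:
        if call.function.name == "get_weather":
            # Arguments arrive as a JSON string; parse and validate before use.
            args = json.loads(call.function.arguments)
            result = get_weather(**args)
else:
    # No tool calls: the model answered with plain text.
    print(message.content)
```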
Full loop: send tool output back for a final answer
If you want the model to turn raw tool output into a user-facing response, append the assistant message and a tool response with the matching `tool_call_id`:
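Continuing the sketch above, a single round trip might look like:

```python
messages = [{"role": "user", "content": "What's the weather in Paris?"}]

first = client.chat.completions.create(
    model="YOUR_MODEL", messages=messages, tools=tools
)
assistant_message = first.choices[0].message
messages.append(assistant_message)  # preserve the tool calls in context

for call in assistant_message.tool_calls:
    result = get_weather(**json.loads(call.function.arguments))
    messages.append(
        {
            "role": "tool",
            "tool_call_id": call.id,  # must match the assistant's tool call id
            "content": json.dumps(result),
        }
    )

final = client.chat.completions.create(
    model="YOUR_MODEL", messages=messages, tools=tools
)
print(final.choices[0].message.content)
```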
Practical tips
- Use a low temperature (0.0–0.3) for reliable tool selection and argument values.
- Add `enum` and `required` constraints in your JSON schema to guide model outputs.
- Consider parallel tool calls only if your model supports them.
- Always validate and sanitize inputs before calling real systems.
Further reading
- Chains: Orchestrate multi-step workflows.
- Custom engine builder: Advanced configuration options.