Baseten engines including BIS-LLM and Engine-Builder-LLM support function calling, as do Model APIs for instant access. Other inference frameworks like vLLM and SGLang also support it.
How tool calling works
A typical tool-calling loop looks like:- Send the user message and a list of tools.
- The model returns normal text or one or more tool calls (a name and JSON arguments).
- Execute the tool calls in your application.
- Send the tool output back to the model.
- Receive a final response or additional tool calls.
Define tools
Tools can be anything: API calls, database queries, or internal scripts. Docstrings matter. Models use them to decide which tool to call and how to fill parameters:tools.py
Tool-writing tips
Design small, single-purpose tools and document constraints in docstrings (units, allowed values, required fields). Treat model-provided arguments as untrusted input and validate before execution.Serialize functions
Convert functions into JSON-schema tool definitions (OpenAI-compatible format):serialize.py
Call the model
Include thetools array in your request. The payload is the same on both surfaces; Model APIs take a model slug and a shared endpoint, while dedicated deployments use your model’s own URL:
- Model APIs
- Dedicated deployment
call_model.py
Control tool selection
Settool_choice to control how the model uses tools. With auto (default), the model can respond with text or tool calls. With required, the model must return at least one tool call. With none, the model returns plain text only. To force a specific tool:
tool_choice.py
Parse and execute tool calls
Depending on the engine and model, tool calls are typically returned in an assistant message undertool_calls:
parse_tool_calls.py
Full loop: send tool output back for a final answer
If you want the model to turn raw tool output into a user-facing response, append the assistant message and a tool response with the matchingtool_call_id:
full_loop.py
Practical tips
Use low temperature (0.0-0.3) for reliable tool selection and argument values. Addenum and required constraints in your JSON schema to guide model outputs. Consider parallel tool calls only if your model supports them. Always validate and sanitize inputs before calling real systems.
Related
- Chains: Orchestrate multi-step workflows.
- Custom engine builder: Advanced configuration options.