Quick start
Engine compatibility
| Feature | Engine-Builder-LLM | BIS-LLM |
|---|---|---|
| Lookahead decoding | ✅ Supported | ✅ Gated Feature |
| Structured outputs | ❌ Incompatible | ✅ Supported |
| Tool calling | ❌ Incompatible | ✅ Supported |
| Eagle speculation | ❌ Not supported | ✅ Gated Feature |
Configuration examples
Code generation (Qwen2.5-Coder)
Large model (Llama-3.3-70B)
Parameter tuning
See the lookahead decoding documentation for detailed parameter explanations. Quick guidelines:
- windows_size: 1-7 (set to 1 for predictable content, 3 or 5 otherwise)
- ngram_size: 4-32 (larger for code, smaller for creative tasks)
- verification_set_size: usually equal to windows_size
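To make `ngram_size` and `verification_set_size` more concrete, here is a toy Python sketch of the verification branch as described in the lookahead decoding paper (Fu et al., 2024). It keeps an n-gram pool keyed by each n-gram's first token and checks up to `verification_set_size` candidates against the model's greedy predictions. `NGramPool`, `verify_step`, and the `next_token` stand-in model are hypothetical names used only for illustration. This is not the engine's implementation: a real engine verifies all candidates in one batched forward pass, whereas this sketch calls the model token by token.

```python
from collections import defaultdict, deque
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical stand-in for the target model under greedy decoding:
# given the token prefix, return the next token.
NextToken = Callable[[Sequence[int]], int]


class NGramPool:
    """Toy n-gram pool keyed by each n-gram's first token (illustrative only)."""

    def __init__(self, ngram_size: int, verification_set_size: int) -> None:
        self.ngram_size = ngram_size
        self.verification_set_size = verification_set_size
        # Per key, keep at most verification_set_size continuations, newest first.
        self._pool: Dict[int, deque] = defaultdict(
            lambda: deque(maxlen=verification_set_size)
        )

    def add(self, ngram: Sequence[int]) -> None:
        if len(ngram) != self.ngram_size:
            raise ValueError(f"expected {self.ngram_size} tokens, got {len(ngram)}")
        key, rest = ngram[0], tuple(ngram[1:])
        if rest not in self._pool[key]:
            self._pool[key].appendleft(rest)  # drops the oldest entry when full

    def candidates(self, last_token: int) -> List[Tuple[int, ...]]:
        return list(self._pool.get(last_token, ()))


def verify_step(prefix: List[int], pool: NGramPool, next_token: NextToken) -> List[int]:
    """Return the tokens accepted in one decoding step.

    Each candidate is accepted up to its first token that disagrees with the
    model; the longest accepted run wins. The model's own next token is always
    appended, so a step never yields fewer tokens than plain decoding.
    """
    best: List[int] = []
    for cand in pool.candidates(prefix[-1]):
        ctx = list(prefix)
        accepted: List[int] = []
        for tok in cand:
            if next_token(ctx) != tok:
                break
            accepted.append(tok)
            ctx.append(tok)
        if len(accepted) > len(best):
            best = accepted
    return best + [next_token(prefix + best)]


if __name__ == "__main__":
    # Toy "model": always predicts (last token + 1) mod 10.
    model = lambda ctx: (ctx[-1] + 1) % 10
    pool = NGramPool(ngram_size=4, verification_set_size=2)
    pool.add([2, 3, 4, 5])
    pool.add([2, 3, 9, 9])
    print(verify_step([1, 2], pool, model))  # [3, 4, 5, 6]
```

In this toy run, one step accepts four tokens instead of one. A larger `ngram_size` lets a single matching candidate advance more tokens per step, while `verification_set_size` caps how many candidates are checked per step.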
Use cases
| Use case | windows_size | ngram_size | Why |
|---|---|---|---|
| Code generation | 7 | 3 | Code patterns, smaller n-grams |
| Free-form JSON/YAML | 5 | 5 | Balanced for structured data |
| Template completion | 7-10 | 5-7 | Highly predictable content |
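To compare the rows above by cost, it can help to estimate how many tokens each step feeds through the model. Following the lookahead decoding paper's description (a lookahead window of `windows_size` columns by `ngram_size - 1` rows, plus up to `verification_set_size` candidate n-grams of `ngram_size - 1` tokens each), a rough estimate is (W + G) * (N - 1). The `tokens_per_step` helper below is a hypothetical sketch of that approximation, assuming `verification_set_size` defaults to `windows_size`; the engine's actual accounting may differ.

```python
from typing import Optional


def tokens_per_step(windows_size: int, ngram_size: int,
                    verification_set_size: Optional[int] = None) -> int:
    """Rough per-step token budget for lookahead decoding (approximation only).

    Hypothetical helper: lookahead branch ~ windows_size * (ngram_size - 1)
    tokens, verification branch ~ verification_set_size * (ngram_size - 1).
    """
    if windows_size < 1 or ngram_size < 2:
        raise ValueError("windows_size must be >= 1 and ngram_size >= 2")
    # Guideline above: verification_set_size is usually equal to windows_size.
    g = windows_size if verification_set_size is None else verification_set_size
    return (windows_size + g) * (ngram_size - 1)


if __name__ == "__main__":
    # Single-value rows from the use-case table.
    for name, w, n in [("Code generation", 7, 3), ("Free-form JSON/YAML", 5, 5)]:
        print(f"{name}: ~{tokens_per_step(w, n)} tokens per step")
```

Under this approximation the two rows cost about 28 and 40 tokens per step, and the template-completion range (windows_size 7-10, ngram_size 5-7) spans roughly 56 to 120, so larger settings trade extra compute per step for more chances to accept several tokens at once.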
Limitations
❌ Not compatible with:
- Structured outputs - use BIS-LLM instead
- Function calling - use BIS-LLM instead
- BIS-LLM engine - on the V2 stack, lookahead decoding is a gated feature and is not self-serviceable
Further reading
- Lookahead decoding guide - Complete reference config
- Engine-Builder-LLM overview - Dense model engine
- BIS-LLM overview - MoE engine with structured outputs
- Quantization guide - Performance optimization