Lookahead decoding is a speculative decoding mode that uses n-gram patterns to accelerate inference for predictable workloads.

Quick start

trt_llm:
  build:
    speculator:
      enable_b10_lookahead: true
      speculative_decoding_mode: LOOKAHEAD_DECODING
      windows_size: 1
      ngram_size: 8
      verification_set_size: 1

Engine compatibility

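| Feature             | Engine-Builder-LLM | BIS-LLM          |
|---------------------|--------------------|------------------|
| Lookahead decoding  | ✅ Supported       | ✅ Gated Feature |
| Structured outputs  | ❌ Incompatible    | ✅ Supported     |
| Tool calling        | ❌ Incompatible    | ✅ Supported     |
| Eagle speculation   | ❌ Not supported   | ✅ Gated Feature |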

Configuration examples

Code generation (Qwen2.5-Coder)

model_name: Qwen2.5-Coder-7B-Lookahead
resources:
  accelerator: H100
trt_llm:
  build:
    base_model: decoder
    checkpoint_repository:
      source: HF
      repo: "Qwen/Qwen2.5-Coder-7B-Instruct"
    quantization_type: fp8_kv
    speculator:
      enable_b10_lookahead: true
      speculative_decoding_mode: LOOKAHEAD_DECODING
      windows_size: 3
      ngram_size: 8
      verification_set_size: 3

Large model (Llama-3.3-70B)

model_name: Llama-3.3-70B-Lookahead
resources:
  accelerator: H100:2
trt_llm:
  build:
    base_model: decoder
    checkpoint_repository:
      source: HF
      repo: "meta-llama/Llama-3.3-70B-Instruct"
    quantization_type: fp8_kv
    tensor_parallel_count: 2
    speculator:
      enable_b10_lookahead: true
      speculative_decoding_mode: LOOKAHEAD_DECODING
      windows_size: 3
      ngram_size: 5
      verification_set_size: 3

Parameter tuning

See the lookahead decoding documentation for detailed parameter explanations. Quick guidelines:
  • windows_size: 1-7 (1 for predictable content, 3 or 5 otherwise)
  • ngram_size: 4-32 (large for code, smaller for creative tasks)
  • verification_set_size: Usually equal to windows_size

Use cases

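| Use case            | windows_size | ngram_size | Why                            |
|---------------------|--------------|------------|--------------------------------|
| Code generation     | 7            | 3          | Code patterns, smaller n-grams |
| Free-form JSON/YAML | 5            | 5          | Balanced for structured data   |
| Template completion | 7-10         | 5-7        | Highly predictable content     |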
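
As a rough sketch of how a row of this table maps onto a config, the free-form JSON/YAML row could be written as the speculator fragment below. The table does not give a verification_set_size, so the value here is an assumption taken from the guideline that it usually equals windows_size.

```yaml
trt_llm:
  build:
    speculator:
      enable_b10_lookahead: true
      speculative_decoding_mode: LOOKAHEAD_DECODING
      windows_size: 5            # from the free-form JSON/YAML row
      ngram_size: 5              # from the free-form JSON/YAML row
      verification_set_size: 5   # assumption: set equal to windows_size
```

Treat these values as a starting point and adjust them against your own workload.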

Limitations

Not compatible with (on Engine-Builder-LLM, per the Engine compatibility table):
  • Structured outputs
  • Tool calling

Further reading