Named entity recognition (NER) classifies each token in an input string into entity categories such as person (PER), organization (ORG), location (LOC), and miscellaneous (MISC). NER models use the ForTokenClassification architecture and the /predict_tokens endpoint. NER requires BEI-Bert (base_model: encoder_bert); BEI does not support token-level outputs.
Recommended models
- dslim/bert-base-NER-uncased: fast, compact NER for English. (Truss example)
- tanaos/tanaos-NER-v1: general-purpose NER.
Configuration
Add to config.yaml:
Request format
| Field | Type | Description |
|---|---|---|
| inputs | list of list of strings | Batched text inputs to classify. Each inner list is a batch of texts. |
| raw_scores | boolean | When true, returns raw logit scores for all labels per token. When false, returns the top predicted label with its probability. |
| truncate | boolean | Truncates inputs that exceed the model’s max sequence length. |
| truncation_direction | string | Controls which end is truncated. Defaults to "Right". |
| aggregation_strategy | string | Merges sub-word tokens into entity spans. Accepts "none", "simple", "first", "average", or "max". Use "max" to match transformers.pipeline("ner", aggregation_strategy="max"). Use "none" for token-level predictions. |
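
To make the sub-word strategies concrete, here is a minimal sketch of how "first", "average", and "max" could each pick one label for a word from its sub-token scores. It follows the behavior of the transformers token-classification pipeline; whether the server implements the strategies identically is an assumption, and the function name, label set, and scores below are hypothetical.

```python
def aggregate_word(subtoken_scores, labels, strategy):
    """Pick one (label, score) for a word from its sub-token scores.

    subtoken_scores: one list of per-label probabilities per sub-token.
    labels: label names, in the same order as each score list.
    """
    if strategy == "first":
        # Use only the first sub-token's scores.
        scores = subtoken_scores[0]
    elif strategy == "average":
        # Average each label's score across all sub-tokens.
        n = len(subtoken_scores)
        scores = [sum(column) / n for column in zip(*subtoken_scores)]
    elif strategy == "max":
        # Use the scores of the sub-token holding the single highest score.
        scores = max(subtoken_scores, key=max)
    else:
        raise ValueError(f"unsupported strategy in this sketch: {strategy!r}")
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best], scores[best]


# Hypothetical scores for one word split into three sub-tokens.
labels = ["O", "B-LOC", "B-ORG"]
word_scores = [
    [0.50, 0.45, 0.05],
    [0.10, 0.85, 0.05],
    [0.30, 0.20, 0.50],
]

for strategy in ("first", "average", "max"):
    print(strategy, aggregate_word(word_scores, labels, strategy))
```

The sketch leaves out "none", which returns token-level predictions unchanged, and "simple", which groups adjacent tokens without resolving disagreements inside a word.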
Response format
With aggregation_strategy: "max" (recommended for production):
With aggregation_strategy: "none" and raw_scores: true (token-level with BIO labels):
B- marks the beginning of an entity, I- marks a continuation, and O means outside any entity.
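
As an illustration of the BIO scheme, the sketch below groups token-level labels into entity spans on the client side. The function name and the sample tokens are hypothetical, and it joins tokens with spaces, which assumes word-level tokens and not sub-word pieces.

```python
def bio_to_spans(tokens, labels):
    """Group (token, BIO label) pairs into (entity_type, text) spans."""
    spans = []
    current_type, current_tokens = None, []

    def flush():
        # Emit the entity collected so far, if any.
        if current_tokens:
            spans.append((current_type, " ".join(current_tokens)))

    for token, label in zip(tokens, labels):
        prefix, _, entity_type = label.partition("-")
        if prefix == "B" or (prefix == "I" and entity_type != current_type):
            # B- starts a new entity; a stray I- of another type is
            # treated leniently as a new entity as well.
            flush()
            current_type, current_tokens = entity_type, [token]
        elif prefix == "I":
            # I- continues the entity that is currently open.
            current_tokens.append(token)
        else:
            # O closes any open entity.
            flush()
            current_type, current_tokens = None, []
    flush()
    return spans


tokens = ["Ada", "Lovelace", "lived", "in", "London"]
labels = ["B-PER", "I-PER", "O", "O", "B-LOC"]
print(bio_to_spans(tokens, labels))
# [('PER', 'Ada Lovelace'), ('LOC', 'London')]
```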
Python example
Using the Baseten Performance Client:
You can also call /predict_tokens directly. The route also supports async inference.
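
A direct call to /predict_tokens can be sketched with the Python standard library alone. The base URL placeholder, the `Api-Key` authorization header format, the `BASETEN_API_KEY` and `BASETEN_BASE_URL` environment variable names, and the helper name are assumptions made for illustration; copy the real URL and credentials from your own deployment.

```python
import json
import os
import urllib.request

# Hypothetical placeholder: replace with your deployment's base URL.
BASE_URL = "https://model-xxxxxx.api.baseten.co/environments/production/sync"


def build_predict_tokens_request(texts, api_key, base_url=BASE_URL,
                                 aggregation_strategy="max"):
    """Build (but do not send) a POST request for /predict_tokens."""
    payload = {
        "inputs": [texts],  # list of list of strings: one inner batch
        "raw_scores": False,
        "truncate": True,
        "truncation_direction": "Right",
        "aggregation_strategy": aggregation_strategy,
    }
    return urllib.request.Request(
        url=f"{base_url}/predict_tokens",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Api-Key {api_key}",  # assumed header format
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    api_key = os.environ.get("BASETEN_API_KEY")
    base_url = os.environ.get("BASETEN_BASE_URL")
    # Only send the request when both values are configured.
    if api_key and base_url:
        request = build_predict_tokens_request(
            ["Ada Lovelace lived in London."], api_key, base_url
        )
        with urllib.request.urlopen(request, timeout=30) as response:
            print(json.load(response))
```

Keeping request construction separate from sending makes the payload easy to inspect or unit test without network access.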
Related
- BEI-Bert overview: Bidirectional encoder engine that hosts NER deployments.
- BEI configuration reference: Full trt_llm schema for build and runtime fields.