Named entity recognition

Named entity recognition (NER) assigns each token in an input string to an entity category such as person (PER), organization (ORG), location (LOC), or miscellaneous (MISC). NER models use the ForTokenClassification architecture and are served from the /predict_tokens endpoint. They require BEI-Bert (base_model: encoder_bert), which produces token-level outputs.

Recommended models

dslim/bert-base-NER-uncased: fast, compact NER for English. (Truss example)
tanaos/tanaos-NER-v1: general-purpose NER.

Configuration

Add to config.yaml:

trt_llm:
  build:
    base_model: encoder_bert
    checkpoint_repository:
      source: HF
      repo: "baseten-admin/bert-base-ner-uncased"
      revision: main
    max_num_tokens: 16384
  runtime:
    webserver_default_route: /predict_tokens

Request format

{
  "inputs": ["Apple is looking at buying U.K. startup for $1 billion"],
  "truncate": true,
  "raw_scores": false,
  "aggregation_strategy": "max"
}

Field	Type	Description
`inputs`	list of strings	Batched text inputs to classify. Each string is classified independently, and the response contains one list of entities per input.
`raw_scores`	boolean	When `true`, returns raw logit scores for all labels per token. When `false`, returns the top predicted label with its probability.
`truncate`	boolean	Truncates inputs that exceed the model’s max sequence length.
`truncation_direction`	string	Controls which end is truncated. Defaults to `"Right"`.
`aggregation_strategy`	string	Merges sub-word tokens into entity spans. Accepts `"none"`, `"simple"`, `"first"`, `"average"`, or `"max"`. Use `"max"` to match `transformers.pipeline("ner", aggregation_strategy="max")`. Use `"none"` for token-level predictions.
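
The fields above can be assembled and checked on the client before a request is sent. The following is a minimal sketch, not part of any SDK: `build_ner_payload` and `AGGREGATION_STRATEGIES` are hypothetical names, and the only validation performed is against the values listed in the table.

```python
from typing import Any

# Values the table above lists for `aggregation_strategy`.
AGGREGATION_STRATEGIES = {"none", "simple", "first", "average", "max"}


def build_ner_payload(
    texts: list[str],
    aggregation_strategy: str = "max",
    raw_scores: bool = False,
    truncate: bool = True,
) -> dict[str, Any]:
    """Build a /predict_tokens request body (hypothetical helper)."""
    if isinstance(texts, str):
        # `inputs` is a batch, so a bare string is almost certainly a mistake.
        raise TypeError("inputs must be a list of strings, not a single string")
    if aggregation_strategy not in AGGREGATION_STRATEGIES:
        raise ValueError(f"unsupported aggregation_strategy: {aggregation_strategy!r}")
    return {
        "inputs": list(texts),
        "truncate": truncate,
        "raw_scores": raw_scores,
        "aggregation_strategy": aggregation_strategy,
    }


payload = build_ner_payload(["Apple is looking at buying U.K. startup for $1 billion"])
```

Rejecting an unknown strategy locally gives a clearer error than a failed request, at the cost of keeping the allowed set in sync with the server.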

Response format

With aggregation_strategy: "max" (recommended for production):

[
  [
    {"token": "Apple", "token_id": 0, "start": 0, "end": 5, "results": {"ORG": 0.9975586}},
    {"token": "U.K.", "token_id": 0, "start": 27, "end": 31, "results": {"LOC": 0.9980469}}
  ]
]
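
As a rough sketch of how an aggregated response might be consumed, the snippet below slices each entity out of the original input using `start` and `end`. It assumes these are character offsets with an exclusive end, which is inferred from the example values rather than stated by the API; `extract_entities` and `min_score` are hypothetical names.

```python
text = "Apple is looking at buying U.K. startup for $1 billion"

# The aggregated response shown above: one list of entities per input string.
response = [
    [
        {"token": "Apple", "token_id": 0, "start": 0, "end": 5, "results": {"ORG": 0.9975586}},
        {"token": "U.K.", "token_id": 0, "start": 27, "end": 31, "results": {"LOC": 0.9980469}},
    ]
]


def extract_entities(text, entities, min_score=0.0):
    """Return (surface text, label, score) tuples, slicing the input by offset."""
    found = []
    for entity in entities:
        # With raw_scores false, `results` holds one label and its probability;
        # taking the max also works if more than one label is present.
        label, score = max(entity["results"].items(), key=lambda item: item[1])
        if score >= min_score:
            found.append((text[entity["start"]:entity["end"]], label, score))
    return found


entities = extract_entities(text, response[0])
for surface, label, score in entities:
    print(f"{surface}: {label} ({score:.4f})")
```

Slicing by offset rather than trusting `token` keeps the original casing and spacing of the input, which can matter with uncased models.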

With aggregation_strategy: "none" and raw_scores: true (token-level with BIO labels):

[
  [
    {
      "token": "Apple",
      "token_id": 6207,
      "start": 0,
      "end": 5,
      "results": {
        "B-ORG": 6.7578125,
        "O": -1.7929688,
        "B-LOC": 0.6015625,
        "B-MISC": 0.2467041,
        "B-PER": 0.17675781,
        "I-ORG": -0.6484375,
        "I-MISC": -1.9873047,
        "I-LOC": -1.3808594,
        "I-PER": -2.21875
      }
    }
  ]
]
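
Raw logits are not probabilities. If a probability is needed on the client, a softmax over the label scores is the usual conversion; whether the server applies exactly this step when `raw_scores` is `false` is an assumption here. A minimal sketch using the logits from the response above:

```python
import math

# Raw logits for the token "Apple", copied from the response above.
logits = {
    "B-ORG": 6.7578125, "O": -1.7929688, "B-LOC": 0.6015625,
    "B-MISC": 0.2467041, "B-PER": 0.17675781, "I-ORG": -0.6484375,
    "I-MISC": -1.9873047, "I-LOC": -1.3808594, "I-PER": -2.21875,
}


def softmax(scores):
    """Convert a label -> logit mapping into a label -> probability mapping."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(value - peak) for label, value in scores.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


probs = softmax(logits)
best_label = max(probs, key=probs.get)
print(best_label, round(probs[best_label], 4))
```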

Token-level labels follow the BIO tagging scheme: B- marks the beginning of an entity, I- marks a continuation, and O means outside any entity.
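
To illustrate how BIO tags map to entity spans, here is a simplified sketch that merges consecutive token predictions. It is not the server's aggregation logic: `group_bio` is a hypothetical helper, the input tuples are made-up data, and sub-word handling and score merging are ignored.

```python
def group_bio(tokens):
    """Merge (start, end, label) token predictions into (start, end, entity_type) spans."""
    spans = []
    current = None  # [start, end, entity_type] of the span being built
    for start, end, label in tokens:
        prefix, _, entity_type = label.partition("-")
        if prefix == "I" and current and current[2] == entity_type:
            current[1] = end  # continuation: extend the open span
            continue
        if current:
            spans.append(tuple(current))
            current = None
        if prefix in ("B", "I"):
            # B- opens a span; a stray I- is treated as an opening tag too.
            current = [start, end, entity_type]
    if current:
        spans.append(tuple(current))
    return spans


# Hypothetical token-level predictions for "New York hosts Apple".
tokens = [(0, 3, "B-LOC"), (4, 8, "I-LOC"), (9, 14, "O"), (15, 20, "B-ORG")]
spans = group_bio(tokens)
print(spans)
```

In practice, setting `aggregation_strategy` to something other than `"none"` lets the server do this merging, so client-side grouping is only needed when working from token-level output.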

Python example

Using the Baseten Performance Client:

from baseten_performance_client import PerformanceClient
import os

client = PerformanceClient(
    api_key=os.environ['BASETEN_API_KEY'],
    base_url="https://model-xxxxxx.api.baseten.co/environments/production/sync"
)

response = client.batch_post(
    url_path="/predict_tokens",
    payloads=[{
        "inputs": ["Apple is looking at buying U.K. startup for $1 billion"],
        "truncate": True,
        "raw_scores": False,
        "aggregation_strategy": "max"
    }]
)

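# response.data holds one response per payload, and each response
# holds one list of entities per input string.
for entities in response.data[0]:
    for entity in entities:
        label = next(iter(entity["results"]))
        score = entity["results"][label]
        print(f"{entity['token']}: {label} ({score:.4f})")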

NER models do not expose an OpenAI-compatible endpoint. Call /predict_tokens directly. The route also supports async inference.
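
For clients that do not use the Performance Client, a direct call might look like the sketch below, which builds the request with Python's standard library and does not send it. Two things are assumptions: the full URL is formed by appending the route to the placeholder base URL from the example above, and the `Authorization` header uses the `Api-Key` scheme. `build_request` is a hypothetical helper.

```python
import json
import os
import urllib.request

# Placeholder model URL from the example above; replace model-xxxxxx with your model ID.
BASE_URL = "https://model-xxxxxx.api.baseten.co/environments/production/sync"


def build_request(payload, api_key, base_url=BASE_URL):
    """Build (but do not send) a POST request for the /predict_tokens route."""
    return urllib.request.Request(
        url=f"{base_url}/predict_tokens",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumption: the `Api-Key` authorization scheme.
            "Authorization": f"Api-Key {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_request(
    {"inputs": ["Apple is looking at buying U.K. startup"], "aggregation_strategy": "max"},
    api_key=os.environ.get("BASETEN_API_KEY", "<your-api-key>"),
)
# To send it:
#   with urllib.request.urlopen(request) as resp:
#       entities = json.load(resp)
```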

Related

BEI-Bert overview: Bidirectional encoder engine that hosts NER deployments.
BEI configuration reference: Full trt_llm schema for build and runtime fields.
