Named entity recognition (NER) assigns each token in an input string to an entity category such as person (PER), organization (ORG), location (LOC), or miscellaneous (MISC). NER models use a ForTokenClassification architecture (for example, BertForTokenClassification) and are served through the /predict_tokens endpoint. NER requires BEI-Bert (base_model: encoder_bert) because BEI does not support token-level outputs.
  • dslim/bert-base-NER-uncased: fast, compact NER for English. (Truss example)
  • tanaos/tanaos-NER-v1: general-purpose NER.

Configuration

Add to config.yaml:
trt_llm:
  build:
    base_model: encoder_bert
    checkpoint_repository:
      source: HF
      repo: "baseten-admin/bert-base-ner-uncased"
      revision: main
    max_num_tokens: 16384
  runtime:
    webserver_default_route: /predict_tokens

Request format

{
  "inputs": [["Apple is looking at buying U.K. startup for $1 billion"]],
  "truncate": true,
  "raw_scores": false,
  "aggregation_strategy": "max"
}
| Field | Type | Description |
| --- | --- | --- |
| inputs | list of list of strings | Batched text inputs to classify. Each inner list is a batch of texts. |
| raw_scores | boolean | When true, returns raw logit scores for all labels per token. When false, returns the top predicted label with its probability. |
| truncate | boolean | Truncates inputs that exceed the model's max sequence length. |
| truncation_direction | string | Controls which end is truncated. Defaults to "Right". |
| aggregation_strategy | string | Merges sub-word tokens into entity spans. Accepts "none", "simple", "first", "average", or "max". Use "max" to match transformers.pipeline("ner", aggregation_strategy="max"). Use "none" for token-level predictions. |
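The fields above can be assembled on the client side before sending a request. The following is a minimal sketch, not part of any Baseten library: the helper name build_ner_payload and its defaults are hypothetical, and it only validates aggregation_strategy against the values listed above.

```python
import json

# Values accepted by aggregation_strategy, as listed in the field table.
AGGREGATION_STRATEGIES = {"none", "simple", "first", "average", "max"}


def build_ner_payload(texts, aggregation_strategy="max", raw_scores=False, truncate=True):
    """Hypothetical helper: build a /predict_tokens request body from a list of texts."""
    if isinstance(texts, str):
        # A bare string would otherwise be split into characters by list().
        raise TypeError("texts must be a list of strings, not a single string")
    if aggregation_strategy not in AGGREGATION_STRATEGIES:
        raise ValueError(f"unsupported aggregation_strategy: {aggregation_strategy!r}")
    return {
        "inputs": [list(texts)],  # one inner list holding a batch of texts
        "truncate": truncate,
        "raw_scores": raw_scores,
        "aggregation_strategy": aggregation_strategy,
    }


payload = build_ner_payload(["Apple is looking at buying U.K. startup for $1 billion"])
print(json.dumps(payload, indent=2))
```

Validating the strategy locally surfaces a typo before the request leaves the client; the server remains the authority on which values it accepts.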

Response format

With aggregation_strategy: "max" (recommended for production):
[
  [
    {"token": "Apple", "token_id": 0, "start": 0, "end": 5, "results": {"ORG": 0.9975586}},
    {"token": "U.K.", "token_id": 0, "start": 27, "end": 31, "results": {"LOC": 0.9980469}}
  ]
]
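To illustrate how an aggregated response can be consumed, here is a minimal sketch that maps each entry back to the input text. It assumes that start and end are character offsets with an exclusive end (which is consistent with the sample above) and that the first element of the response corresponds to the first input text. The function name extract_entities is hypothetical.

```python
text = "Apple is looking at buying U.K. startup for $1 billion"

# Sample response copied from the aggregated example above.
response = [
    [
        {"token": "Apple", "token_id": 0, "start": 0, "end": 5, "results": {"ORG": 0.9975586}},
        {"token": "U.K.", "token_id": 0, "start": 27, "end": 31, "results": {"LOC": 0.9980469}},
    ]
]


def extract_entities(text, entries):
    """Hypothetical helper: turn response entries into (text, label, score) records."""
    entities = []
    for entry in entries:
        # Pick the highest-scoring label; aggregated entries carry a single label.
        label, score = max(entry["results"].items(), key=lambda item: item[1])
        entities.append({
            "text": text[entry["start"]:entry["end"]],  # assumes character offsets
            "label": label,
            "score": score,
        })
    return entities


entities = extract_entities(text, response[0])
for entity in entities:
    print(entity)
```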
With aggregation_strategy: "none" and raw_scores: true (token-level with BIO labels):
[
  [
    {
      "token": "Apple",
      "token_id": 6207,
      "start": 0,
      "end": 5,
      "results": {
        "B-ORG": 6.7578125,
        "O": -1.7929688,
        "B-LOC": 0.6015625,
        "B-MISC": 0.2467041,
        "B-PER": 0.17675781,
        "I-ORG": -0.6484375,
        "I-MISC": -1.9873047,
        "I-LOC": -1.3808594,
        "I-PER": -2.21875
      }
    }
  ]
]
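Raw logits are not probabilities. A common way to turn them into a probability distribution is a softmax over the labels, sketched below on the sample scores above. Whether the server computes its probabilities exactly this way is an assumption; the helper name softmax_scores is hypothetical.

```python
import math

# Raw logits copied from the sample response above.
raw_results = {
    "B-ORG": 6.7578125, "O": -1.7929688, "B-LOC": 0.6015625,
    "B-MISC": 0.2467041, "B-PER": 0.17675781, "I-ORG": -0.6484375,
    "I-MISC": -1.9873047, "I-LOC": -1.3808594, "I-PER": -2.21875,
}


def softmax_scores(logits):
    """Hypothetical helper: standard softmax over a label -> logit mapping."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(value - peak) for label, value in logits.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


probabilities = softmax_scores(raw_results)
top_label = max(probabilities, key=probabilities.get)
print(top_label, round(probabilities[top_label], 4))
```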
Token-level labels follow the BIO tagging scheme: B- marks the beginning of an entity, I- marks a continuation, and O means outside any entity.
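The BIO scheme can be decoded into entity spans on the client. The sketch below is a simple illustration of that decoding, not the server's aggregation logic. The token list is made-up illustrative data (not model output), and merge_bio is a hypothetical helper that treats a stray I- tag as the start of a new span.

```python
def merge_bio(tokens):
    """Hypothetical helper: group BIO-tagged tokens into entity spans."""
    spans = []
    current = None
    for token in tokens:
        label = token["label"]
        if label == "O":
            current = None  # outside any entity: close the open span
            continue
        prefix, _, entity_type = label.partition("-")
        if prefix == "I" and current is not None and current["label"] == entity_type:
            current["end"] = token["end"]  # continuation: extend the open span
        else:
            # B- tag (or an I- tag with no matching open span): start a new span
            current = {"label": entity_type, "start": token["start"], "end": token["end"]}
            spans.append(current)
    return spans


# Made-up example: labels are illustrative, not produced by a model.
sample_text = "new york is big"
sample_tokens = [
    {"token": "new", "start": 0, "end": 3, "label": "B-LOC"},
    {"token": "york", "start": 4, "end": 8, "label": "I-LOC"},
    {"token": "is", "start": 9, "end": 11, "label": "O"},
    {"token": "big", "start": 12, "end": 15, "label": "O"},
]

spans = merge_bio(sample_tokens)
for span in spans:
    print(span["label"], sample_text[span["start"]:span["end"]])
```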

Python example

Using the Baseten Performance Client:
import os

from baseten_performance_client import PerformanceClient

# Read the API key from the environment and replace model-xxxxxx with your model ID.
client = PerformanceClient(
    api_key=os.environ["BASETEN_API_KEY"],
    base_url="https://model-xxxxxx.api.baseten.co/environments/production/sync"
)

# Send one payload to the /predict_tokens route.
response = client.batch_post(
    route="/predict_tokens",
    payloads=[{
        "inputs": [["Apple is looking at buying U.K. startup for $1 billion"]],
        "truncate": True,
        "raw_scores": False,
        "aggregation_strategy": "max"
    }]
)

# With raw_scores set to False, "results" holds one label and its score.
for entity in response.data[0]:
    label = next(iter(entity["results"]))
    score = entity["results"][label]
    print(f"{entity['token']}: {label} ({score:.4f})")
NER models do not expose an OpenAI-compatible endpoint. Call /predict_tokens directly. The route also supports async inference.
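For callers who prefer not to use the Performance Client, calling /predict_tokens directly can be sketched with the Python standard library. This is a sketch under stated assumptions: the URL is the placeholder base URL from the example above with the route appended, and the "Api-Key" authorization scheme is an assumption to verify against your deployment. The function names build_request and predict_tokens are hypothetical.

```python
import json
import urllib.request

# Placeholder base URL from the example above; replace model-xxxxxx with your model ID.
BASE_URL = "https://model-xxxxxx.api.baseten.co/environments/production/sync"


def build_request(payload, api_key, base_url=BASE_URL):
    """Hypothetical helper: prepare a POST request for the /predict_tokens route."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict_tokens",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Api-Key {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


def predict_tokens(payload, api_key):
    """Send the request and decode the JSON response (performs a network call)."""
    with urllib.request.urlopen(build_request(payload, api_key), timeout=30) as resp:
        return json.load(resp)


request = build_request(
    {
        "inputs": [["Apple is looking at buying U.K. startup for $1 billion"]],
        "truncate": True,
        "raw_scores": False,
        "aggregation_strategy": "max",
    },
    api_key="YOUR_API_KEY",
)
print(request.get_method(), request.full_url)
```

Only build_request runs at import time; predict_tokens is left uncalled so the sketch can be inspected without network access.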