# Qwen3 Reranker

> Alibaba's Qwen3 Reranker is an 8B cross-encoder for high-quality passage reranking in retrieval-augmented generation pipelines.

Reranking Cross-encoder

## Setup

To get started, sign in to Baseten with Truss and then install the Python `requests` library.

**Sign in to Baseten**

```sh theme={"system"}
uvx truss login --browser
```

**Install requests**

```sh theme={"system"}
uv pip install requests
```

[Qwen/Qwen3-Reranker-8B](https://huggingface.co/Qwen/Qwen3-Reranker-8B) is an 8B-parameter dense model.

H100 TRT-LLM

## Write the config

Create and move into the project directory:

```sh theme={"system"}
mkdir qwen3-reranker-8b && cd qwen3-reranker-8b
```

Then create a file named `config.yaml` and paste the following:

```yaml config.yaml theme={"system"}
# this file was autogenerated by `generate_templates.py` - please do change via template only
model_metadata:
  example_model_input:
    inputs:
    - - Baseten is a fast inference provider
    - - Classify this separately.
    raw_scores: true
    truncate: true
    truncation_direction: Right
model_name: "model:qwen3-reranker-8b preset:throughput"
python_version: py39
resources:
  accelerator: H100
  cpu: '1'
  memory: 10Gi
  use_gpu: true
trt_llm:
  build:
    base_model: encoder
    checkpoint_repository:
      repo: michaelfeil/Qwen3-Reranker-8B-seq
      revision: main
      source: HF
    max_num_tokens: 40960
    num_builder_gpus: 1
    quantization_type: fp8
  runtime:
    webserver_default_route: /predict
```

## Key parameters

[Baseten Embeddings Inference](/engines/bei/overview) (BEI) reads these fields from the `trt_llm` block. Each one shapes how the engine is built and served:

| Parameter       | Value     |
| --------------- | --------- |
| Quantization    | `fp8`     |
| Base model type | `encoder` |

## Deploy

Push the config to Baseten:

```sh theme={"system"}
uvx truss push
```

You should see output similar to:

```output theme={"system"}
✨ Model qwen3-reranker-8b was successfully pushed ✨
Model ID: abc1d2ef
Deployment ID: xyz123
Endpoint: model-abc1d2ef.api.baseten.co
Logs: https://app.baseten.co/models/abc1d2ef/logs/xyz123
```

Your **model ID** is printed in the `truss push` output (`abc1d2ef` in the example).
Use it wherever you see `{model_id}` in the next section.

## Call the model

Your deployment exposes a cross-encoder scoring endpoint at `/predict`. Replace `{model_id}` with your model ID and make sure `BASETEN_API_KEY` is set.

Now call your deployment to score candidates:

```python main.py theme={"system"}
import os
import requests

response = requests.post(
    "https://model-{model_id}.api.baseten.co/environments/production/sync/predict",
    headers={"Authorization": f"Bearer {os.environ['BASETEN_API_KEY']}"},
    json={
        "query": "fast inference platform",
        "texts": [
            "Baseten serves models on dedicated GPUs.",
            "The Eiffel Tower is in Paris.",
            "Cold-start latency matters for autoscaling.",
        ],
    },
)

for hit in response.json():
    print(hit["score"], hit["text"])
```

```sh theme={"system"}
curl -s https://model-{model_id}.api.baseten.co/environments/production/sync/predict \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $BASETEN_API_KEY" \
  -d '{
    "query": "fast inference platform",
    "texts": [
      "Baseten serves models on dedicated GPUs.",
      "The Eiffel Tower is in Paris.",
      "Cold-start latency matters for autoscaling."
    ]
  }'
```

For batch scoring at higher throughput, use the [Baseten Performance Client](https://www.baseten.co/blog/your-client-code-matters-10x-higher-embedding-throughput-with-python-and-rust/).
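
In a retrieval pipeline, the scores from the call above are usually sorted so that only the best candidates are passed on. The following is a minimal sketch of that step, assuming each item in the response is a dict with `score` and `text` keys, as the loop in `main.py` implies. The helper name `rank_hits` and the sample scores are hypothetical and exist only for illustration; the endpoint may already return results in ranked order, in which case the sort is a harmless no-op.

```python
from typing import Any, Dict, List, Optional


def rank_hits(hits: List[Dict[str, Any]], top_k: Optional[int] = None) -> List[Dict[str, Any]]:
    """Return hits sorted by descending score, optionally cut to the top_k best."""
    ranked = sorted(hits, key=lambda hit: hit["score"], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


# Made-up scores for illustration only; real values come from response.json().
sample_hits = [
    {"score": 0.12, "text": "The Eiffel Tower is in Paris."},
    {"score": 0.91, "text": "Baseten serves models on dedicated GPUs."},
    {"score": 0.47, "text": "Cold-start latency matters for autoscaling."},
]

# Keep the two highest-scoring passages.
top_two = rank_hits(sample_hits, top_k=2)
for hit in top_two:
    print(f"{hit['score']:.2f}  {hit['text']}")
```

With a live deployment, you would pass `response.json()` to `rank_hits` in place of `sample_hits`.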