Model details
Zerank 1 Small is a state-of-the-art open-source reranking model from ZeroEntropy for accurate search and retrieval. ZeroEntropy also published the full-size Zerank 1 as a source-available model that can be licensed for commercial use.
Zerank offers exceptional performance, comparing favorably to top closed-source models on reranking tasks.

With Baseten, you can run high-throughput, low-latency deployments of Zerank powered by BEI (Baseten Embeddings Inference), benchmarked on a single H100 GPU with 64 concurrent requests of 500 tokens each.
And with the Baseten Performance Client, shown in the sample inference code below, you can run high-volume reranking jobs up to 12x faster than with ordinary client code.

If you're building search at scale, try Zerank on Baseten today!
Sample inference code
Input
import os
from baseten_performance_client import (
    PerformanceClient, ClassificationResponse
)

api_key = os.environ["BASETEN_API_KEY"]
model_id = "xxxxxxx"
base_url = f"https://model-{model_id}.api.baseten.co/environments/production/sync"

client = PerformanceClient(base_url=base_url, api_key=api_key)

# Chat template wrapping each query/document pair: a system instruction up
# front, then the user turn, closed with an empty <think> block.
prefix = "<|im_start|>system\nJudge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be \"yes\" or \"no\".<|im_end|>\n<|im_start|>user\n"
suffix = "<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n"

def format_instruction(instruction, query, doc):
    if instruction is None:
        instruction = "Given a web search query, retrieve relevant passages that answer the query"
    return f"{prefix}<Instruct>: {instruction}\n<Query>: {query}\n<Document>: {doc}{suffix}"

texts_to_classify = [
    format_instruction(instruction=None, query="What is the capital of China?", doc="The capital of China is Beijing."),
    format_instruction(instruction=None, query="What is the capital of China?", doc="The capital of France is Paris."),
]

# batch_size and max_concurrent_requests control how the client fans
# requests out in parallel.
response: ClassificationResponse = client.classify(
    input=texts_to_classify,
    model="my_model",
    truncate=True,
    batch_size=16,
    max_concurrent_requests=32,
)
JSON output
[
  {
    "score": 0.9861514,
    "label": "yes"
  },
  {
    "score": 0.01384861,
    "label": "no"
  }
]
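
To turn these scores into a ranking, pair each score with its source document and sort in descending order. Here is a minimal sketch that assumes the scores come back in the same order as the inputs and have been parsed into the list-of-dicts shape shown above; the names documents, results, and ranked are illustrative, not part of the client API.

# Documents in the same order they were formatted and sent above.
documents = [
    "The capital of China is Beijing.",
    "The capital of France is Paris.",
]

# Assumed: the JSON output shown above, one {"score", "label"} entry per
# input document, in input order.
results = [
    {"score": 0.9861514, "label": "yes"},
    {"score": 0.01384861, "label": "no"},
]

# Rerank: highest relevance score first.
ranked = sorted(
    zip(documents, results),
    key=lambda pair: pair[1]["score"],
    reverse=True,
)

for doc, result in ranked:
    print(f"{result['score']:.4f}  {doc}")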