Model details
Example usage
Qwen-3-embeddings is a text-embedding model: given an input text, it produces a one-dimensional embedding vector. It is frequently used with vector databases and for downstream tasks such as clustering and retrieval.
This deployment is quantized to FP8, which is supported by NVIDIA's recent GPUs, e.g. H100, H100_40GB, B200, or L4. Quantization is optional, but it improves efficiency.
The client library, baseten-performance-client, can be installed via pip; its source is available at:
https://github.com/basetenlabs/truss/tree/main/baseten-performance-client
Alternatively, you may also use the OpenAI embeddings client.
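A minimal sketch of that alternative, using the official openai package pointed at the deployment's endpoint. The /v1 suffix on the base URL is an assumption about the OpenAI-compatible route; the model name "my_model" matches the placeholder used below.

```python
import os

model_id = "yqv0rjjw"  # same deployment id as in the example below
# Assumption: the OpenAI-compatible route lives under /sync/v1.
base_url = f"https://model-{model_id}.api.baseten.co/environments/production/sync/v1"

def embed_with_openai_client(texts):
    """Embed `texts` via the OpenAI SDK against the Baseten endpoint."""
    from openai import OpenAI  # lazy import keeps the sketch importable

    client = OpenAI(api_key=os.environ["BASETEN_API_KEY"], base_url=base_url)
    resp = client.embeddings.create(model="my_model", input=texts)
    # One embedding vector per input, in submission order.
    return [d.embedding for d in resp.data]
```

The lazy import and environment-variable lookup keep credentials out of the source; calling `embed_with_openai_client(["Explain gravity"])` returns a list of float vectors.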
Input
import os
from baseten_performance_client import PerformanceClient, OpenAIEmbeddingsResponse

api_key = os.environ.get("BASETEN_API_KEY")
model_id = "yqv0rjjw"
base_url = f"https://model-{model_id}.api.baseten.co/environments/production/sync"

client = PerformanceClient(base_url=base_url, api_key=api_key)

def get_detailed_instruct(task_description: str, query: str) -> str:
    # Qwen-3-embedding style query formatting: queries carry the task
    # instruction, while documents are embedded as-is.
    return f'Instruct: {task_description}\nQuery:{query}'

task = 'Given a web search query, retrieve relevant passages that answer the query'
texts = [
    get_detailed_instruct(task, 'Explain gravity'),
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
]

response: OpenAIEmbeddingsResponse = client.embed(
    input=texts,
    model="my_model",
    batch_size=16,
    max_concurrent_requests=32,
)
array = response.numpy()
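Once the embeddings are in a NumPy array, query–document relevance is typically scored with cosine similarity. A minimal sketch, with toy vectors standing in for array[0] (the query) and array[1] (the document):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two 1D embedding vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors; in practice these would be rows of the `array` returned above.
query_vec = np.array([0.1, 0.3, 0.6])
doc_vec = np.array([0.2, 0.2, 0.7])
score = cosine_similarity(query_vec, doc_vec)
```

For large corpora the same scoring is usually delegated to a vector database rather than computed pairwise in Python.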
JSON output
{
  "data": [
    {
      "embedding": [
        0
      ],
      "index": 0,
      "object": "embedding"
    }
  ],
  "model": "thenlper/gte-base",
  "object": "list",
  "usage": {
    "prompt_tokens": 512,
    "total_tokens": 512
  }
}
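Because the response follows the OpenAI embeddings schema, it can be parsed with the standard library alone. A small sketch using the sample payload above:

```python
import json

raw = """
{
  "data": [
    {"embedding": [0], "index": 0, "object": "embedding"}
  ],
  "model": "thenlper/gte-base",
  "object": "list",
  "usage": {"prompt_tokens": 512, "total_tokens": 512}
}
"""
payload = json.loads(raw)
# Sort by "index" so vectors line up with the order of the submitted inputs.
embeddings = [
    item["embedding"]
    for item in sorted(payload["data"], key=lambda d: d["index"])
]
total_tokens = payload["usage"]["total_tokens"]
```

The "usage" block is useful for tracking token consumption across batched requests.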