
Qwen3 8B Embedding

Leading open-source model for embeddings

Example usage

Qwen3 8B Embedding is a text-embedding model: given an input text, it produces a single one-dimensional embedding vector. It is frequently used for downstream tasks such as clustering and retrieval backed by a vector database.
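
As a minimal sketch of one such downstream task, the snippet below ranks documents against a query by cosine similarity. The three-dimensional vectors are made-up stand-ins for real model output (actual embeddings have far more dimensions), and `cosine_similarity` and `rank_documents` are illustrative helper names, not part of any client library.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def rank_documents(query_vec, doc_vecs):
    """Return (document index, score) pairs, most similar first."""
    scores = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy vectors standing in for embeddings returned by the model.
query = [0.9, 0.1, 0.0]
docs = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.5, 0.5, 0.0]]
ranking = rank_documents(query, docs)
print([i for i, _ in ranking])
```

A vector database performs essentially this comparison at scale, usually with an approximate nearest-neighbor index rather than an exhaustive scan.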

This model is quantized to FP8 for deployment, which is supported by NVIDIA's newer GPUs, e.g. H100, H100_40GB, B200, or L4. Quantization is optional but leads to higher efficiency.

The client code (the Baseten Performance Client) can be installed via pip. Its source is available at:
https://github.com/basetenlabs/truss/tree/main/baseten-performance-client

Alternatively, you may also use the OpenAI embeddings client.
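
A rough sketch of the OpenAI-client route might look like the following. It assumes the `openai` Python package is installed and that the deployment exposes an OpenAI-compatible route under a `/v1` suffix of the sync URL; that suffix, the helper names `openai_base_url` and `embed_with_openai`, and the placeholder model name `"my_model"` are assumptions for illustration, so check them against your deployment before relying on them.

```python
import os

def openai_base_url(model_id: str) -> str:
    # Assumption: the OpenAI-compatible route lives under /sync/v1.
    return f"https://model-{model_id}.api.baseten.co/environments/production/sync/v1"

def embed_with_openai(texts, model_id):
    """Embed a list of strings and return one list of floats per input."""
    # Imported lazily so the URL helper works without the package installed.
    from openai import OpenAI

    client = OpenAI(
        api_key=os.environ["BASETEN_API_KEY"],
        base_url=openai_base_url(model_id),
    )
    # "my_model" is a placeholder model name, as in the example on this page.
    response = client.embeddings.create(input=texts, model="my_model")
    return [item.embedding for item in response.data]
```

With `BASETEN_API_KEY` set, calling `embed_with_openai(["Explain gravity"], model_id="yqv0rjjw")` should return a single embedding. Unlike the performance client, this route does not batch or parallelize requests for you.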

Input
import os
from baseten_performance_client import PerformanceClient, OpenAIEmbeddingsResponse

api_key = os.environ.get("BASETEN_API_KEY")
model_id = "yqv0rjjw"
base_url = f"https://model-{model_id}.api.baseten.co/environments/production/sync"

client = PerformanceClient(base_url=base_url, api_key=api_key)

def format_query(task_description: str, query: str) -> str:
    # Qwen3-Embedding-style query formatting: the instruction prefixes the query.
    return f'Instruct: {task_description}\nQuery:{query}'

task = 'Given a web search query, retrieve relevant passages that answer the query'
texts = [
    # The query carries the task instruction.
    format_query(task, 'Explain gravity'),
    # The document is embedded as-is.
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]
response: OpenAIEmbeddingsResponse = client.embed(
    input=texts,
    model="my_model",
    batch_size=16,
    max_concurrent_requests=32,
)
# Convert the returned embeddings to a NumPy array.
array = response.numpy()

JSON output
{
    "data": [
        {
            "embedding": [
                0
            ],
            "index": 0,
            "object": "embedding"
        }
    ],
    "model": "thenlper/gte-base",
    "object": "list",
    "usage": {
        "prompt_tokens": 512,
        "total_tokens": 512
    }
}
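
If you call the endpoint without a client library, a response shaped like the one above can be unpacked with the standard library. The following is a sketch only: `embeddings_in_order` is a hypothetical helper, and the sample payload uses made-up two-dimensional vectors, deliberately listed out of order to show why sorting on `index` is a sensible precaution.

```python
import json

def embeddings_in_order(payload: str):
    """Parse an embeddings response and return vectors sorted by input index."""
    body = json.loads(payload)
    items = sorted(body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]

# Made-up sample response; real vectors are much longer.
sample = """
{
    "data": [
        {"embedding": [0.3, 0.4], "index": 1, "object": "embedding"},
        {"embedding": [0.1, 0.2], "index": 0, "object": "embedding"}
    ],
    "object": "list",
    "usage": {"prompt_tokens": 8, "total_tokens": 8}
}
"""

vectors = embeddings_in_order(sample)
print(vectors)
```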
