EmbeddingGemma

A 300-million-parameter embedding model by Google with leading size-adjusted performance
Model details

EmbeddingGemma is a 300-million-parameter text embedding model from Google, based on the Gemma 3 architecture. On the MTEB benchmark it offers the highest quality of any embedding model under half a billion parameters, supports over 100 languages, and accepts text input of up to 2K tokens.

Example usage
Input
import os
from baseten_performance_client import (
    PerformanceClient, OpenAIEmbeddingsResponse,
)

api_key = os.environ.get("BASETEN_API_KEY")
model_id = "abcd1234"
base_url = f"https://model-{model_id}.api.baseten.co/environments/production/sync"

client = PerformanceClient(base_url=base_url, api_key=api_key)

# EmbeddingGemma expects each input to carry a task-specific prompt prefix.
prompts = {
    "query": "task: search result | query: ",
    "document": "title: none | text: ",
    "BitextMining": "task: search result | query: ",
    "Clustering": "task: clustering | query: ",
    "Classification": "task: classification | query: ",
    "InstructionRetrieval": "task: code retrieval | query: ",
    "MultilabelClassification": "task: classification | query: ",
    "PairClassification": "task: sentence similarity | query: ",
    "Reranking": "task: search result | query: ",
    "Retrieval": "task: search result | query: ",
    "Retrieval-query": "task: search result | query: ",
    "Retrieval-document": "title: none | text: ",
    "STS": "task: sentence similarity | query: ",
    "Summarization": "task: summarization | query: ",
}

def get_detailed_instruct(query: str, task: str) -> str:
    """Prepend the prompt prefix for the given task to the query text."""
    task_str = prompts[task]
    return f'{task_str}{query}'

# Embed one retrieval query and two candidate documents, each with the
# prompt prefix that matches its role.
texts = [
    get_detailed_instruct("Which planet is known as the Red Planet?", "Retrieval-query"),
    get_detailed_instruct("Mars, known for its reddish appearance, is often referred to as the Red Planet.", "Retrieval-document"),
    get_detailed_instruct("Jupiter, the largest planet in our solar system, has a prominent red spot.", "Retrieval-document"),
]

response: OpenAIEmbeddingsResponse = client.embed(
    input=texts,
    model="embeddinggemma",
    batch_size=32,
    max_concurrent_requests=128,
)
array = response.numpy()
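The returned array holds one embedding per row, in the same order as texts. As a minimal follow-up sketch (assuming the returned vectors are not already L2-normalized), the two documents can be ranked against the query by cosine similarity:

import numpy as np

# Row 0 is the query; rows 1 and 2 are the documents (order matches texts).
normed = array / np.linalg.norm(array, axis=1, keepdims=True)  # L2-normalize each row
scores = normed[1:] @ normed[0]  # cosine similarity of each document vs. the query
best = int(scores.argmax())      # index of the best-matching document
print(scores, best)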
JSON output
{
  "data": [
    {
      "embedding": [
        0
      ],
      "index": 0,
      "object": "embedding"
    }
  ],
  "model": "embeddinggemma",
  "object": "list",
  "usage": {
    "prompt_tokens": 512,
    "total_tokens": 512
  }
}
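The same response shape can also be fetched without the performance client. Below is a minimal sketch using plain HTTP, assuming the deployment exposes an OpenAI-compatible /v1/embeddings route under the base URL used above:

import os
import requests

model_id = "abcd1234"
url = f"https://model-{model_id}.api.baseten.co/environments/production/sync/v1/embeddings"

resp = requests.post(
    url,
    headers={"Authorization": f"Api-Key {os.environ['BASETEN_API_KEY']}"},
    json={
        "input": ["task: search result | query: Which planet is known as the Red Planet?"],
        "model": "embeddinggemma",
    },
)
print(resp.json())  # parses into the structure shown above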