GLM-4.6V
A frontier vision language model by Z AI with native multimodal function calling and interleaved image-text content generation
Model details
Example usage
GLM-4.6V scales its context window to 128K tokens during training and achieves state-of-the-art performance in visual understanding among models of similar parameter scale. Crucially, it integrates native Function Calling capabilities for the first time, effectively bridging the gap between visual perception and executable action and providing a unified technical foundation for multimodal agents in real-world business scenarios. You can deploy GLM-4.6V on NVIDIA H100 GPUs with Baseten today.
GLM-4.6V benchmarks

Deployments of GLM-4.6V are OpenAI-compatible.
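Because deployments are OpenAI-compatible, the model's native function calling can be exercised through the standard tools and tool_choice parameters. The snippet below is a minimal sketch rather than a definitive integration: the get_product_price tool, its schema, and the image URL are hypothetical placeholders, and the image/text content format simply mirrors the chat example further down.

from openai import OpenAI
import os

model_url = "" # Copy in from API pane in Baseten model dashboard

client = OpenAI(
    api_key=os.environ['BASETEN_API_KEY'],
    base_url=model_url
)

# Hypothetical tool the model can decide to call after inspecting the image
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_product_price",
            "description": "Look up the current price of a product by name.",
            "parameters": {
                "type": "object",
                "properties": {
                    "product_name": {
                        "type": "string",
                        "description": "Name of the product shown in the image."
                    }
                },
                "required": ["product_name"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="zai-org/GLM-4.6V",
    messages=[
        {"role": "system", "content": "You are a helpful vision-language assistant."},
        {"role": "user", "content": [
            {"url": "https://example.com/product-photo.png", "type": "image"},
            {"text": "What does this product cost right now?", "type": "text"}
        ]}
    ],
    tools=tools,
    tool_choice="auto"
)

# If the model chose to call the tool, the call appears on the message
print(response.choices[0].message.tool_calls)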
Input
from openai import OpenAI
import os

model_url = "" # Copy in from API pane in Baseten model dashboard

client = OpenAI(
    api_key=os.environ['BASETEN_API_KEY'],
    base_url=model_url
)

# Chat completion with an image and a text prompt
response_chat = client.chat.completions.create(
    model="zai-org/GLM-4.6V",
    messages=[
        {"role": "system", "content": "You are a helpful vision-language assistant."},
        {"role": "user", "content": [
            {"url": "https://upload.wikimedia.org/wikipedia/commons/f/fa/Grayscale_8bits_palette_sample_image.png", "type": "image"},
            {"text": "Describe this image in detail.", "type": "text"}
        ]}
    ],
    max_tokens=1024,
    temperature=0.7
)
print(response_chat)

JSON output
{
  "id": "143",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "[Model output here]",
        "role": "assistant",
        "audio": null,
        "function_call": null,
        "tool_calls": null
      }
    }
  ],
  "created": 1741224586,
  "model": "",
  "object": "chat.completion",
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "completion_tokens": 145,
    "prompt_tokens": 38,
    "total_tokens": 183,
    "completion_tokens_details": null,
    "prompt_tokens_details": null
  }
}
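For interactive applications, the same endpoint can also stream tokens as they are generated. The snippet below is a minimal sketch, assuming the deployment supports the standard OpenAI streaming interface (stream=True); the prompt is illustrative.

from openai import OpenAI
import os

model_url = "" # Copy in from API pane in Baseten model dashboard

client = OpenAI(
    api_key=os.environ['BASETEN_API_KEY'],
    base_url=model_url
)

# Stream the completion and print tokens as they arrive
stream = client.chat.completions.create(
    model="zai-org/GLM-4.6V",
    stream=True,
    messages=[
        {"role": "user", "content": "Describe GLM-4.6V in one sentence."}
    ],
    max_tokens=256
)

for chunk in stream:
    # Each chunk carries an incremental delta of the assistant message
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)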