Example usage
Phi 3.5 accepts the standard set of LLM inference parameters, such as temperature, and supports optional streaming output.
Input
import requests
import os

# Replace the empty string with your model ID below
model_id = ""
baseten_api_key = os.environ["BASETEN_API_KEY"]

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]
data = {
    "messages": messages,
    "stream": True,
    "temperature": 0.5,
}

# Call the model endpoint
res = requests.post(
    f"https://model-{model_id}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json=data,
    stream=True,
)

# Print the generated tokens as they are streamed back
for content in res.iter_content():
    print(content.decode("utf-8"), end="", flush=True)
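A note on decoding: iter_content() yields raw bytes, and a multi-byte UTF-8 character can in principle be split across two chunks, which would make the per-chunk decode() raise. If you run into that, the sketch below reuses res from the request above and swaps in the standard library's incremental decoder; this is a generic requests/codecs pattern, not a Baseten-specific API.

import codecs

# The incremental decoder buffers partial multi-byte sequences
# until enough bytes have arrived to decode them
decoder = codecs.getincrementaldecoder("utf-8")()
for chunk in res.iter_content(chunk_size=None):
    print(decoder.decode(chunk), end="", flush=True)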
JSON output
[
  "arrrg",
  "me hearty",
  "I",
  "be",
  "doing",
  "..."
]
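To disable streaming, set "stream": False in the payload and read the response in one shot. The sketch below reuses model_id, baseten_api_key, and messages from the example above; the exact shape of the non-streaming response depends on the model's output schema, so printing the parsed JSON is shown only as a starting point.

# Non-streaming variant of the same request
data = {
    "messages": messages,
    "stream": False,
    "temperature": 0.5,
}

res = requests.post(
    f"https://model-{model_id}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json=data,
)

# Response shape varies by deployment; inspect the parsed JSON first
print(res.json())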