
Phi 3 Mini 128K Instruct

A highly capable LLM with just 3.8 billion parameters and a 128K-token context window


Model details

Developed by: Microsoft
Model family: Phi
Use case: large language
Version: 3
Variant: 128k
Size: 3.8B
Hardware: T4
License: MIT

Example usage

Phi 3 accepts the standard set of LLM parameters, such as max_new_tokens and temperature, and can optionally stream its output as tokens are generated.

Input

import os

import requests

# Replace the empty string with your model id below
model_id = ""
# Read the API key from the environment rather than hard-coding it
baseten_api_key = os.environ["BASETEN_API_KEY"]

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]
data = {
    "messages": messages,
    "stream": True,
    "max_new_tokens": 512,
    "temperature": 0.5
}

# Call model endpoint
res = requests.post(
    f"https://model-{model_id}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json=data,
    stream=True
)
# Fail early on an HTTP error instead of printing the error body as tokens
res.raise_for_status()

# Print the generated tokens as they get streamed.
# Decoding incrementally avoids errors when a multi-byte UTF-8 character
# is split across chunks.
res.encoding = "utf-8"
for content in res.iter_content(chunk_size=None, decode_unicode=True):
    print(content, end="", flush=True)

JSON output

[
    "arrrg",
    "me hearty",
    "I",
    "be",
    "doing",
    "..."
]
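
Because streaming is optional, the same endpoint can presumably be called with "stream" set to False and the whole reply read in one go. The sketch below is a minimal illustration under that assumption, not Baseten's documented client: build_payload and predict are hypothetical helper names, the parameter names and URL pattern are copied from the example above, and the standard library's urllib is used only to keep the sketch dependency-free. The shape of the non-streaming response body is not shown on this page, so the helper returns it as raw text.

```python
import json
import urllib.request


def build_payload(messages, stream=False, max_new_tokens=512, temperature=0.5):
    """Assemble the request body with the parameter names used in the example above."""
    return {
        "messages": messages,
        "stream": stream,
        "max_new_tokens": max_new_tokens,
        "temperature": temperature,
    }


def predict(model_id, api_key, payload):
    """Hypothetical helper: POST the payload and return the raw response body as text."""
    request = urllib.request.Request(
        f"https://model-{model_id}.api.baseten.co/production/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Api-Key {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    # The response format for non-streaming calls is an unknown here,
    # so the body is returned undecoded beyond UTF-8 text.
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

# Building the payload makes no network call; only predict() contacts the endpoint.
payload = build_payload(messages)
```

To send the request, call predict with your model id, the value of the BASETEN_API_KEY environment variable, and the payload. Keeping payload construction separate from the network call makes it easy to switch between streaming and non-streaming requests by changing a single argument.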

