Gemma 4

Family of LLMs developed by Google

Gemma is a family of generative artificial intelligence models that you can use for a wide variety of generation tasks, including question answering, summarization, and reasoning. Gemma models are provided with open weights and permit responsible commercial use, allowing you to tune and deploy them in your own projects and applications.
The Gemma 4 model family spans three distinct architectures, each tailored to specific hardware requirements:

  • Small Sizes: 2B and 4B effective parameter models built for ultra-mobile, edge, and browser deployment (e.g., Pixel, Chrome).

  • Dense: A powerful 31B parameter dense model that bridges the gap between server-grade performance and local execution.

  • Mixture-of-Experts: A highly efficient 26B MoE model designed for high-throughput inference and advanced reasoning.