Baseten Blog | Page 1
How to build function calling and JSON mode for open-source and fine-tuned LLMs
Use a state machine to generate token masks for logit biasing to enable function calling and structured output at the model server level.
Introducing function calling and structured output for open-source and fine-tuned LLMs
Add function calling and structured output capabilities to any open-source or fine-tuned large language model supported by TensorRT-LLM automatically.
The best open-source image generation model
Explore the strengths and weaknesses of state-of-the-art image generation models like FLUX.1, Stable Diffusion 3, SDXL Lightning, and Playground 2.5.
How to double tokens per second for Llama 3 with Medusa
We observe up to a 122% increase in tokens per second for Llama 3 after training custom Medusa heads and running the updated model with TensorRT-LLM
SPC hackathon winners build with Llama 3.1 on Baseten
SPC hackathon winner TestNinja and finalist VibeCheck used Baseten to power apps for test generation and mood board creation.
Introducing Baseten Self-hosted
Gain granular control over data locality, align with strict compliance standards, meet specific performance requirements, and more with Baseten Self-hosted.
Compound AI systems explained
Compound AI systems combine multiple models and processing steps, and are forming the next generation of AI products.
Introducing automatic LLM optimization with TensorRT-LLM Engine Builder
The TensorRT-LLM Engine Builder empowers developers to deploy extremely efficient and performant inference servers for open source and fine-tuned LLMs.
Deploying custom ComfyUI workflows as APIs
Easily package your ComfyUI workflow to use any custom node or model checkpoint.
Ten reasons to join Baseten
Baseten is a Series B startup building infrastructure for AI. We're actively hiring for many roles — here are ten reasons to join the Baseten team.