Company Overview
Poolside is a frontier AI research company on a mission to build the most capable AI system for software engineering. Founded in 2023, Poolside has rapidly become one of the most well-funded and closely watched names in enterprise AI by building frontier coding models exclusively for self-hosted deployments.
Their flagship model family, Laguna, is purpose-built for agentic coding and long-horizon work: Laguna M.1 (225B total parameters with 23B activated) handles complex, multi-file generation, refactoring, and agentic tasks, while Laguna XS.2 (33B total parameters with 3B activated) is best-in-class among Western open-weight coding agent models. Underlying both is Poolside’s Reinforcement Learning from Code Execution Feedback (RLCEF) training methodology, which continuously generates synthetic code to break through traditional data ceilings.
Challenge
As Poolside prepared to launch its Laguna model family to the global AI community in April 2026, the team faced a fundamental infrastructure challenge: how could they serve a frontier-scale model to thousands of developers with a branded production-grade API but without having to build the operational infrastructure themselves?
Their requirements were demanding:
A white-labeled, OpenAI-compatible API endpoint under their own brand
Performance that matched or exceeded their own internal inference benchmarks
Billing metering and usage limits enforced per API key
Traffic recording with configurable sampling rates, feeding back into model training and evaluations
Secure API key federation between Poolside’s platform and Baseten’s infrastructure
A partner capable of rapid iteration with a launch deadline of April 27, 2026
Solution
Baseten deployed a comprehensive inference platform for Poolside that combines dedicated model hosting with the Baseten Inference Stack, the Baseten Frontier Gateway, billing infrastructure, and training data tooling, all operational within weeks of the partnership kicking off.
Core elements of the solution include:
Dedicated Model Deployment
Baseten deployed a preview version of Poolside’s Laguna M.1 model as a white-labeled Model API accessible via an OpenAI- and Anthropic-compatible interface directly through Baseten production infrastructure. Within 48 hours of Poolside creating their account, the model was live and responding to queries.
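Because the endpoint is OpenAI-compatible, developers can call it with any OpenAI-style client or a plain HTTP request. The sketch below assembles such a request using only the standard library; the base URL, API key, and model name are hypothetical placeholders, not Poolside’s actual values.

```python
import json

# Hypothetical endpoint and model identifier for illustration only;
# the real values come from Poolside's developer documentation.
BASE_URL = "https://api.example-poolside-endpoint.com/v1"


def build_chat_request(api_key: str, model: str, prompt: str) -> dict:
    """Assemble an OpenAI-compatible /chat/completions request."""
    return {
        "url": f"{BASE_URL}/chat/completions",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }),
    }


req = build_chat_request("sk-demo", "laguna-m1", "Refactor this function.")
```

The same payload shape works with the official OpenAI SDKs by pointing their base URL at the white-labeled endpoint.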
KV Cache-Aware Routing
Baseten’s Inference Stack implements KV cache-aware routing to maximize GPU efficiency at scale, which was critical for meeting Poolside’s expected post-launch traffic profile. This matters especially for agentic coding workloads, which exhibit high cache utilization.
Billing Webhooks & Usage Limits
Baseten’s Frontier Gateway product supports a batched billing webhook that emits detailed token-level usage events (including cached input tokens) to Poolside’s metering infrastructure without impacting inference. In the inference path, Baseten enforces per-API-key usage limits, giving Poolside precise control over developer usage.
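The pattern described above can be sketched as a small meter that enforces limits inline while buffering usage events for a batched webhook off the hot path. Event fields and batch sizing here are illustrative assumptions, not the actual webhook schema.

```python
import time


class UsageMeter:
    """Illustrative sketch: enforce per-key token limits inline and
    buffer token-level usage events into batched webhook payloads."""

    def __init__(self, limits: dict, flush_size: int = 50):
        self.limits = limits   # api key -> max billable tokens
        self.totals = {}       # api key -> tokens consumed so far
        self.pending = []      # buffered usage events awaiting flush
        self.flush_size = flush_size

    def allow(self, key: str) -> bool:
        """Inline check on the inference path: is this key under limit?"""
        return self.totals.get(key, 0) < self.limits.get(key, 0)

    def record(self, key, input_tokens, cached_input_tokens, output_tokens):
        """Record one request's usage; returns a batch when one is full."""
        self.totals[key] = self.totals.get(key, 0) + input_tokens + output_tokens
        self.pending.append({
            "api_key": key,
            "input_tokens": input_tokens,
            "cached_input_tokens": cached_input_tokens,
            "output_tokens": output_tokens,
            "ts": time.time(),
        })
        if len(self.pending) >= self.flush_size:
            return self.flush()
        return None

    def flush(self):
        """Drain the buffer into one payload (POSTed to the webhook)."""
        batch, self.pending = self.pending, []
        return {"events": batch}
```

Batching keeps metering out of the latency-critical path: only the cheap `allow` check runs per request, while webhook delivery happens asynchronously.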
Traffic Sampling for RLCEF
To power Poolside’s proprietary training loop, Baseten built a traffic sampling system that captures request headers and inputs at configurable sampling rates. This closes the feedback loop between production inference and continuous model improvements via RLCEF.
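A common way to implement configurable-rate sampling is to hash a stable request identifier into a bucket, so the decision is deterministic and reproducible. The sketch below assumes that approach; the actual capture mechanism and record format are Baseten internals not described in the source.

```python
import hashlib


def sample(request_id: str, rate: float) -> bool:
    """Deterministically decide whether to record a request: hash the
    id into [0, 1) and compare against the configured sampling rate."""
    digest = hashlib.sha256(request_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate


def maybe_record(request_id, headers, body, rate, sink):
    """Capture headers and inputs for sampled requests only."""
    if sample(request_id, rate):
        sink.append({"id": request_id, "headers": headers, "input": body})
```

Deterministic hashing means the same request id is always in or out at a given rate, which keeps sampled datasets stable across retries and replays.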
Secure API Key Federation
Baseten designed a token-based authentication scheme, allowing Poolside to issue API keys that are valid across both platforms without exposing raw Baseten credentials. This standard, secure protocol satisfies both teams’ security requirements and supports clean key rotation over time.
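One standard way to federate keys like this is a signature over a key identifier using a shared secret: the issuer mints keys, and the other platform verifies them without either side embedding raw credentials in the key. This HMAC sketch is an illustrative stand-in, not Poolside and Baseten’s actual protocol.

```python
import hashlib
import hmac


def issue_key(shared_secret: bytes, key_id: str) -> str:
    """Mint an API key whose signature the partner platform can verify
    with the shared secret; no raw platform credential is embedded."""
    sig = hmac.new(shared_secret, key_id.encode(), hashlib.sha256).hexdigest()
    return f"{key_id}.{sig}"


def verify_key(shared_secret: bytes, api_key: str) -> bool:
    """Recompute the signature and compare in constant time."""
    key_id, _, sig = api_key.partition(".")
    expected = hmac.new(shared_secret, key_id.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)
```

Rotation falls out naturally: rolling the shared secret (with a brief overlap window) invalidates old keys without any per-key revocation list.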
Auto-Deploy Workflow for Checkpoint Updates
Baseten built an automated model update pipeline where:
Poolside provides a new checkpoint + model implementation
Baseten deploys a staging endpoint
Poolside runs quality evals and performance benchmarks
Checkpoints are promoted to production
This workflow reduces friction on every future model update.
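The four-step workflow above amounts to a gated promotion loop: stage the checkpoint, run evals, and promote only on success. A minimal sketch with hypothetical callables standing in for each step:

```python
def run_update(deploy_staging, run_evals, promote_to_prod, checkpoint):
    """Sketch of the auto-deploy loop: stage a new checkpoint, gate on
    Poolside's evals and benchmarks, then promote. A failed eval leaves
    the current production deployment untouched."""
    endpoint = deploy_staging(checkpoint)   # steps 1-2: stage the checkpoint
    if run_evals(endpoint):                 # step 3: quality + performance gates
        return promote_to_prod(endpoint)    # step 4: promote to production
    return None                             # keep serving the current prod
```

Keeping promotion behind an explicit eval gate is what lets checkpoint updates ship routinely without risking a regression in the live API.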
Results
The partnership delivered results that exceeded expectations on every dimension: performance, speed of execution, and relationship quality.
“When we set out to make our Laguna family of models available to the world, we knew the inference layer would make or break the developer experience at launch. Baseten exceeded our quality and performance bar. The speed from first conversation to a production-grade, white-labeled API was unlike anything we had seen from an infrastructure partner.”
Baseten’s engineering team achieved a breakthrough by leveraging a Triton MoE backend for Laguna inference. Key results include:
P50 TTFT: 146ms for Laguna XS.2 and 605ms for Laguna M.1
P90 TTFT: 1.5s for Laguna XS.2 and 3.9s for Laguna M.1