"Inference Engineering" is now available. Get your copy here
Case study

Posit launches real-time AI code suggestions with Baseten

  • 1 minute to acquire compute

  • 60% faster than other providers

  • <200ms latency

Background: Building AI into the data science workflow

The Posit team recently launched Posit AI, a new feature embedding intelligence directly into RStudio through two core features:

  • Posit Assistant: A conversational agent embedded in RStudio with full context of the user's live R session, capable of writing, refactoring, and executing code, collaborating on exploratory data analysis, debugging, and generating Quarto reports.

  • Next Edit Suggestions (NES): Context-aware autocompletion that predicts the developer's next edit, similar in spirit to Cursor Tab, but uniquely powered by a suite of small LLMs fine-tuned to incorporate live computational session context (variable names, data frame column types, function signatures) alongside code context. NES is hosted and served through Baseten.

The latency bar

For NES, latency was the critical factor. Because suggestions appear inline as "ghost text" while the user types, and can predict edits elsewhere in the document (such as renaming a column reference downstream), the feature needed to feel instant. Anything above a few hundred milliseconds would break the user experience. This constraint ruled out most LLM providers, whose fastest responses were still roughly 400ms too slow.

Baseten’s deployment optimized performance and latency for NES

Posit needed infrastructure that could deliver sub-200ms latency and give them the flexibility to rapidly experiment with fine-tuning approaches to push model quality further.

Challenge: Closing the gap between general-purpose and context-aware models

The Posit team's initial exploration started with open-source coding models. These models performed well on pure code completion, but the Posit team needed to go further. They wanted to incorporate live computational session context, including variable names, data frame column types, function signatures, and runtime state, alongside code context to make suggestions dramatically more relevant. This is what sets NES apart from every other code completion tool on the market: the model can "see" the structure of the data the user is actively working with.
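The idea of blending session context with code context can be illustrated with a minimal sketch. All names and the prompt layout below are hypothetical illustrations of the general technique, not Posit's actual implementation: a snapshot of the live session's variables, data frame schemas, and function signatures is serialized into a structured block and prepended to the code surrounding the cursor before it is sent to the model.

```python
from dataclasses import dataclass, field

@dataclass
class SessionContext:
    """Snapshot of a live R session (hypothetical structure)."""
    variables: dict                                 # name -> R type
    data_frames: dict                               # name -> {column: dtype}
    functions: list = field(default_factory=list)   # visible signatures

def build_prompt(session: SessionContext, code_before: str, code_after: str) -> str:
    """Combine live session context with code context into one model prompt."""
    lines = ["# Session context"]
    for name, rtype in session.variables.items():
        lines.append(f"# var {name}: {rtype}")
    for df, cols in session.data_frames.items():
        col_desc = ", ".join(f"{c}:{t}" for c, t in cols.items())
        lines.append(f"# data.frame {df}({col_desc})")
    for sig in session.functions:
        lines.append(f"# fn {sig}")
    # The model sees the session block, then the code with a cursor marker.
    return "\n".join(lines) + f"\n{code_before}<CURSOR>{code_after}"

# With this context, the model can "see" that `sales` has a `revenue` column
# and suggest a completion that references it.
ctx = SessionContext(
    variables={"n_iter": "integer"},
    data_frames={"sales": {"region": "chr", "revenue": "dbl"}},
)
prompt = build_prompt(ctx, "sales |> dplyr::summarise(", ")")
```

A plain code-completion model sees only the text of the file; serializing runtime state this way is what lets suggestions reference columns and variables the model could not otherwise know exist.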

NES speeds up your coding process by offering targeted code completions as you type

However, open-source models trained only on code context became unstable when this additional context was introduced. Fine-tuning was the natural next step, but the team faced several challenges:

Speed of experimentation: With a product launch on the horizon, the team could not afford slow, sequential training runs. They needed to test multiple fine-tuning approaches quickly enough to determine whether this path would work or if they needed to change their approach.

Cost uncertainty: Standing up GPU infrastructure for training experimentation typically comes with unpredictable and often steep costs, especially when the outcome is uncertain.

Solution: Rapid experimentation on Baseten Training

Posit used Baseten Training to run a fast, iterative cycle of fine-tuning experiments, stand up compute in minutes, and script the full workflow for training, deploying, and evaluating models.

Instant compute, controlled costs

"Being able to stand up compute in a matter of minutes helped us to control costs while rapidly iterating on our training runs. I initially thought there was some sort of billing error after kicking off the first training run. We had anticipated orders of magnitude greater costs – the pricing model is very reasonable."
Simon Couch, AI Core Team

Baseten's on-demand compute model meant Posit only paid for what it used, eliminating the overhead of maintaining idle GPU infrastructure and yielding dramatic cost savings.

Code-first workflow with Truss

Rather than navigating a point-and-click interface, Posit's engineers interfaced with Baseten Training via the open-source Truss library, scripting the entire training, deployment, and evaluation workflow programmatically.

"Interfacing with the training platform through the open-source Truss library was a huge boost. Scripting the workflow of training, deploying, and evaluating allowed us to iterate much faster."
Simon Couch, AI Core Team
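In spirit, the scripted workflow looks like the loop below. The function names are hypothetical stand-ins for platform calls, not the actual Truss or Baseten Training API; the point is that training, deployment, and evaluation are driven from one script, so each new configuration runs end to end without manual steps.

```python
# Hypothetical sketch of a scripted train/deploy/evaluate loop.
# launch_training, deploy_checkpoint, and run_eval stand in for real
# platform calls; they are not actual Truss/Baseten function names.

def launch_training(config: dict) -> dict:
    """Kick off a fine-tuning run; returns a checkpoint reference."""
    return {"checkpoint": f"ckpt-lr-{config['lr']}"}

def deploy_checkpoint(checkpoint: str) -> str:
    """Deploy the trained model; returns an inference endpoint."""
    return f"https://example.invalid/models/{checkpoint}"

def run_eval(endpoint: str) -> dict:
    """Score the deployed model against a benchmark suite."""
    return {"accept_rate": 0.0, "p95_latency_ms": 0.0}

def experiment(configs: list) -> list:
    """Run every configuration end to end and collect the results."""
    results = []
    for cfg in configs:
        ckpt = launch_training(cfg)["checkpoint"]
        endpoint = deploy_checkpoint(ckpt)
        results.append({"config": cfg, **run_eval(endpoint)})
    return results

# Two candidate fine-tuning configurations, evaluated in one pass.
results = experiment([{"lr": 1e-4}, {"lr": 5e-5}])
```

Because the whole cycle is code, configurations can be queued or parallelized rather than clicked through one at a time, which is what makes rapid comparison of fine-tuning approaches practical.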

Sub-200ms latency in production

By deploying and optimizing the model on the Baseten Inference Stack, Posit achieved sub-200ms latencies, which was the critical threshold for the inline edit suggestion experience. No other provider they evaluated could meet this bar.

Results: Fast experimentation, confident decisions, and a successful launch

Baseten Training gave Posit the speed to explore fine-tuning approaches and the clarity to make the right product decision, which was to use an existing open-weights model that was not fine-tuned specifically for code completions, all before launch.

The platform's on-demand compute and code-first workflow meant the team could test multiple configurations in parallel rather than waiting days for sequential runs. They were able to rapidly assess different fine-tuning strategies, evaluate model quality against their benchmarks, and determine the best path forward for their Next Edit Suggestions feature, all within a compressed timeline.

Posit AI officially launched last week, powered by Baseten. Check it out here.
