
Baseten AI Wrapped: 3 trends to help you build better in 2026

Infrastructure learnings that will help you engineer better AI apps


What does it take to run AI in production today?

The answer changed in 2025. DeepSeek R1's release in January catalyzed a market-wide shift: open source models are now viable, and sometimes preferred, alternatives for production workloads. At Baseten, we power the fastest-growing enterprise AI applications with performant open source models. In the spirit of the holiday season, here's our version of Spotify Wrapped: 3 infrastructure learnings that will help you build better in 2026.

1) Reliability is the new moat

2023 and 2024 were the years of experimentation. Everyone was trying out models from a select set of closed-source options. While engineers were still grasping this new technology, they tolerated availability hiccups and occasional downtime. But as AI apps gain wider adoption and power mission-critical industries, uptime becomes a key differentiator in inference.

In healthcare, decision makers and engineers are now asking for four or even five nines of uptime in their SLAs. Take OpenEvidence or Abridge, both of whom we've partnered with at Baseten. For a doctor searching medical databases or reviewing a transcription of a patient conversation, a few minutes of downtime directly delays care for tens of thousands of patients. Separately, application builders are hitting the walls of closed source because of the lack of control and transparency. They are instead choosing open source for the increased robustness gained from customization. In the medical setting, for example, SFT and RL provide the dials needed to reach recall and precision targets on your users' unique data distribution.
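To make the recall and precision targets concrete, here is a minimal sketch of how those two metrics are computed for a binary eval set, the kind of numbers an SFT/RL loop would be tuned against. The labels below are illustrative, not from any real medical dataset.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Example: the model catches 3 of 4 actual positives, with 1 false alarm.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
p, r = precision_recall(y_true, y_pred)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.75
```

Post-training lets you trade these two off deliberately: in a high-stakes setting you might accept a lower precision to push recall toward your SLA target.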

2) Speed is the product

Code generation is now a primary part of every developer's workflow, and it is the canonical case for building performant AI applications. There is an increased appetite for trying out new models and picking a favorite combination for the work at hand. Many are choosing slower, larger, more expensive models for novel problems such as planning the architecture of an application, and faster, smaller, cheaper models for producing boilerplate code that matches existing patterns.
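That split can be wired up as a simple router that sends planning-style requests to a larger model and boilerplate generation to a smaller, faster one. This is a minimal sketch; the model names and keyword heuristic are hypothetical placeholders, and a real system would use a trained classifier or explicit user choice.

```python
LARGE_MODEL = "large-reasoning-model"  # hypothetical model identifier
SMALL_MODEL = "small-fast-model"       # hypothetical model identifier

def classify_task(prompt: str) -> str:
    """Naive keyword heuristic standing in for a real task classifier."""
    planning_keywords = ("architecture", "design", "plan", "tradeoff")
    if any(k in prompt.lower() for k in planning_keywords):
        return "planning"
    return "boilerplate"

def route(prompt: str) -> str:
    """Pick a model identifier based on the classified task type."""
    return LARGE_MODEL if classify_task(prompt) == "planning" else SMALL_MODEL

print(route("Plan the architecture for a new billing service"))  # large-reasoning-model
print(route("Write a getter for this field"))                    # small-fast-model
```

The design point is that routing happens per request, so the expensive model is only paid for when the task actually needs it.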

Instead of defaulting to closed-source labs, developers who want lower latency for tab completion and higher throughput for longer reasoning or tool-use traces are turning to optimized models hosted by performant providers. And in improving these models, we are seeing that simulating agent trajectories and building strong RL environments tailored to an application, then learning from those signals, is a more consistent path to improvement than relying on large-scale pretraining and scaling laws. Ultimately, inference performance is how you show up for customers.

3) Developers want the wheel

Developer experience is front and center as more people than ever become AI engineers. Before the Cambrian explosion of models, software engineers were accustomed to, and more importantly constrained to, using a few intelligent models behind a black-box inference API. However, open source combined with SFT, RL, and other post-training techniques has led to a proliferation of useful knobs that produce tangible results. Specifically, smaller open source models adapted to a use case can now match the performance of larger closed-source models.

Now there is a downstream effect from the popularity of custom models: a suite of developer experience tools that help teams train, serve, and monitor models. Companies want their engineers to have comprehensive visibility into how everything works under the hood. Logs and metrics help them plan future compute capacity, monitor usage patterns, and resolve errors on their custom deployments in real time. This tooling allows teams to iterate faster and make better decisions for their needs without having to build or maintain it themselves. We are seeing developers become increasingly invested in optimizing their own experience of training and serving models.

Conclusion

As we head into 2026, the infrastructure layer is growing in capabilities alongside the applications. The shift from experimentation to production-grade AI means that reliability, performance, and developer experience are no longer nice-to-haves; they're table stakes in a crowded market.

At Baseten, we're committed to building the infrastructure that makes it possible for teams to ship AI applications that are fast, reliable, and delightful to work with. Whether you're powering critical healthcare workflows, enabling developers to code faster, or building a mission-critical application, we're here to help you focus on what matters: serving your users. Happy building!


