NVIDIA BioNeMo Agent Toolkit on Baseten

Scientific AI is evolving from isolated model inference toward agentic workflows that can reason, plan, and execute across multi-step scientific tasks. Scientists increasingly need AI systems that can understand scientific literature, predict protein structures, design molecules, analyze genomic data, identify therapeutic targets, and recommend next experiments.

NVIDIA BioNeMo Agent Toolkit provides the domain-specific tools, models, and skills needed to enable this new generation of AI scientists. It combines BioNeMo skills, open models, NVIDIA NIM microservices, and agent orchestration infrastructure to help agents move from reading science to doing science.

At Baseten, we built our platform for teams that need to run inference at scale, and increasingly those teams are in life sciences. The shift toward agentic workflows is changing what 'running a model' means: instead of making a single call, an agent now orchestrates a pipeline of specialized models, each passing structured outputs to the next. That is exactly the infrastructure problem Baseten was built to solve. When NVIDIA shared the BioNeMo Agent Toolkit with us, it gave us the biological reasoning layer our customers were missing. BioNeMo NIM microservices are domain-specific by design: not general-purpose models adapted for biology, but models trained from the ground up for structure prediction, molecule generation, and genomics. Pairing them with Baseten's inference infrastructure means a life science research team can run a full discovery pipeline end to end without standing up and managing the underlying compute themselves.

What is an NVIDIA NIM microservice?

An NVIDIA NIM microservice is a Docker container that serves exactly one model behind a small HTTP API, with the weights, CUDA, and the serving stack baked in. BioNeMo is an NVIDIA platform for AI-driven biology and drug discovery, providing open models, libraries, datasets, tools, and NIM microservices for scientific AI workloads. The BioNeMo portfolio spans three broad categories:

Understanding Biology - models that analyze biological sequences and identify biological meaning, such as reading a sequence to understand its properties, evolution, and function. Examples include Evo 2.
Design New Therapeutics - models that generate proteins, antibodies, and molecules, such as designing entirely new, functional proteins to target specific diseases. Examples include RFdiffusion, ProteinMPNN, GenMol, MolMIM.
Predict Structures and Interactions - models that predict how biological molecules behave and interact, such as mapping a molecule's 3D shape from its raw sequence. Examples include OpenFold2, OpenFold3, Boltz2, DiffDock.
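Because each NIM is a single model behind a small HTTP API, calling one from Python needs nothing beyond the standard library. The sketch below builds and sends a JSON POST request. The base URL, the path, and the `Api-Key` authorization scheme are assumptions for illustration. Check each model's page in the Baseten Model Library, or its skill, for the real endpoint, header, and payload.

```python
import json
import urllib.request


def build_nim_request(base_url: str, path: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request to a NIM-style HTTP endpoint.

    The path and the "Api-Key" authorization scheme are illustrative
    assumptions; each model documents its own endpoint and payload fields.
    """
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Api-Key {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


def call_nim(request: urllib.request.Request, timeout: float = 300.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating request construction from sending keeps the payload logic testable without a network connection or a deployed model.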

The table below summarizes the BioNeMo NIM microservices available on Baseten:

BioNeMo skills turn models into agent workflows

Alongside the containers, NVIDIA is publishing BioNeMo skills. A skill is a markdown specification written for an agent. It tells the agent which endpoint to call for a given task, which authentication header to send, the exact payload fields, how to parse the response, and what common errors mean. Each model gets one SKILL.md plus reference files, and two "meta-skills" compose several models into full pipelines.
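To make a skill's contents concrete, here is a minimal sketch of the information one SKILL.md carries, written as a Python dataclass. The class, its field names, and the example values are hypothetical illustrations, not the format NVIDIA publishes. Real skills are markdown that the agent reads directly.

```python
from dataclasses import dataclass, field


@dataclass
class SkillSpec:
    """Hypothetical in-memory view of what a SKILL.md tells an agent."""

    name: str
    endpoint: str                     # which endpoint to call
    auth_header: str                  # which header carries the credential
    required_fields: tuple[str, ...]  # exact payload fields the endpoint expects
    response_key: str                 # where the useful result lives in the response
    errors: dict[int, str] = field(default_factory=dict)  # status code -> meaning

    def check_payload(self, payload: dict) -> list[str]:
        """Return the required fields missing from a payload (empty list means OK)."""
        return [f for f in self.required_fields if f not in payload]

    def parse_response(self, response: dict):
        """Pull out the part of the response the next step needs."""
        return response[self.response_key]

    def explain_error(self, status: int) -> str:
        """Translate an HTTP status into the meaning the skill documents."""
        return self.errors.get(status, f"unrecognized status {status}")
```

For example, a hypothetical `SkillSpec(name="msa-search", endpoint="/hypothetical/msa", auth_header="Authorization", required_fields=("sequence", "databases"), response_key="alignments")` would flag a payload missing `databases` before any request is sent.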

When you point an agent at these skills, it can decide on its own that folding a sequence means gathering an alignment first and folding second, format both payloads correctly, and then carry the data from one model to the next.

In short, skills turn a pile of HTTP endpoints into something an agent can drive end to end to deliver a complete scientific workflow.
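The ordering decision described above (alignment first, fold second) can be sketched as dependency resolution over the steps a goal needs. This is a minimal sketch under stated assumptions. The step names and the `PREREQUISITES` table are hypothetical, since an agent infers these dependencies by reading the skills rather than from a hard-coded dict. The stand-in steps are plain functions, not real model calls.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable

# Hypothetical dependency table an agent might distill from the skills:
# each step lists the steps whose outputs it needs.
PREREQUISITES: dict[str, set[str]] = {
    "msa_search": set(),
    "fold": {"msa_search"},
}


def plan(goal: str, prerequisites: dict[str, set[str]]) -> list[str]:
    """Return an execution order that ends at `goal`, including only the steps it needs."""
    needed: set[str] = set()
    stack = [goal]
    while stack:
        step = stack.pop()
        if step not in needed:
            needed.add(step)
            stack.extend(prerequisites.get(step, set()))
    graph = {step: prerequisites.get(step, set()) & needed for step in needed}
    return list(TopologicalSorter(graph).static_order())


def run(order: list[str], steps: dict[str, Callable[[dict], Any]], context: dict) -> dict:
    """Run steps in order; each reads earlier outputs from, and writes its own into, `context`."""
    for name in order:
        context[name] = steps[name](context)
    return context
```

A shared context dict, rather than passing only the previous output, matters here because the folding step needs both the original sequence and the alignment.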

Search first: context powers biology

Structure models need context before they can run: the accuracy of AlphaFold-class models does not come from the sequence alone but from evolution. Find hundreds of related proteins across other species and line them up, and the columns that mutate together over millions of years tend to be in contact in 3D. That coevolution signal in the alignment is most of what the folder reads, which is why folding a single sequence on its own yields noticeably worse results.
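The simplest way to see the coevolution signal is mutual information (MI) between two alignment columns. MI is high when knowing the residue at one position tells you the residue at the other. The sketch below uses raw MI on a made-up four-sequence toy alignment. Practical contact methods add sequence weighting, pseudocounts, and corrections such as the average product correction, and AlphaFold-class folders learn the signal rather than computing MI directly.

```python
import math
from collections import Counter


def mutual_information(msa: list[str], i: int, j: int) -> float:
    """Mutual information, in bits, between columns i and j of an aligned MSA."""
    n = len(msa)
    col_i = [seq[i] for seq in msa]
    col_j = [seq[j] for seq in msa]
    count_i, count_j = Counter(col_i), Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        p_a, p_b = count_i[a] / n, count_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Toy family of four aligned sequences: columns 0 and 1 always change
# together (A with K, R with E), while column 2 varies independently of column 0.
toy_msa = ["AKLE", "AKVD", "RELE", "REVD"]
```

Columns 0 and 1 score 1.0 bit, while columns 0 and 2 score 0. A lone sequence scores 0 for every pair, a tiny version of why single-sequence folds are weaker.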

Gathering that context is the whole job of MSA-Search (MSA stands for “multiple sequence alignment”). It runs GPU-accelerated MMseqs2 over a roughly 1.4 TB reference database, turning a search that takes tens of minutes to hours on CPU into one that finishes in seconds. It is plumbing, but the fold depends on it, so it is the first step in every pipeline outlined below.
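MMseqs2-based pipelines commonly emit alignments in A3M format. Whether MSA-Search returns exactly this format is an assumption, so check the MSA-Search skill for the actual response shape. In A3M, uppercase letters and `-` are columns aligned to the query, and lowercase letters are insertions relative to it. Dropping the insertions leaves every row the same length as the query, which is the form per-column analyses expect.

```python
def parse_a3m(text: str) -> list[tuple[str, str]]:
    """Parse A3M alignment text into (header, aligned_sequence) pairs.

    Lowercase letters are insertions relative to the query; removing them
    leaves every row aligned column-for-column with the query.
    """
    records: list[tuple[str, str]] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and '#' metadata lines
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append("".join(ch for ch in line if not ch.islower()))
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

The number of records is a quick sanity check on alignment depth before the text is handed to the folder.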

Watching an agent fold a protein

Here is the search-then-fold workflow, drawn as the agent sees it. The agent reads the meta-skill and makes two calls, MSA-Search followed by a folding model, and the alignment text returned by the search becomes the folder's input.

[Figure: Watching an agent fold a protein]

When the input is a complex of two chains, the agent picks paired search instead of standard search. It can also run OpenFold3 twice, once with the full alignment and once with a single-sequence alignment, then read the confidence scores to decide whether to trust the answer. That decision-making is what the BioNeMo skills enable.
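The two decisions above can be sketched as a couple of small functions. This is a minimal sketch under stated assumptions. `FoldResult` is a hypothetical container. Confidence is assumed to be on a 0 to 100 scale, such as a mean pLDDT. The default trust threshold of 70 follows a common pLDDT convention rather than anything stated in the skills. Treating any multi-chain input as paired extends the two-chain case in the text.

```python
from dataclasses import dataclass


@dataclass
class FoldResult:
    alignment: str     # which alignment the fold used, e.g. "full" or "single_sequence"
    confidence: float  # assumed 0-100 scale, such as a mean pLDDT


def choose_search_mode(chains: list[str]) -> str:
    """Paired search pairs homologs across chains, so it only applies to complexes."""
    return "paired" if len(chains) >= 2 else "standard"


def pick_fold(results: list[FoldResult], trust_threshold: float = 70.0) -> tuple[FoldResult, bool]:
    """Return the most confident fold and whether it clears the (assumed) trust threshold."""
    best = max(results, key=lambda r: r.confidence)
    return best, best.confidence >= trust_threshold
```

Returning the trust flag separately from the best fold lets the agent still report a structure while warning that its confidence is low.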

Beyond folding: scientific workflow at scale

Protein folding is only one example. BioNeMo Agent Toolkit enables broader scientific workflows, including:

Protein Binder Design - design proteins that bind to specific biological targets and computationally validate their effectiveness before laboratory testing.
Virtual Screening - generate, dock, score, and rank candidate drug molecules from millions of possibilities before expensive experimental validation.
Genomic Analysis - apply genomic foundation models such as Evo 2 to DNA sequences to understand their properties, evolution, and function.
Target Discovery - combine genomics, biological foundation models, and agentic reasoning to identify and prioritize promising therapeutic targets.

These workflows transform isolated model calls into complete scientific discovery pipelines.

Drug discovery in action

For drug discovery, the models chain together like this:

GenMol generates candidate molecules from a scaffold.
DiffDock docks each candidate against the target protein and keeps only those that fit.
Boltz2 then predicts how tightly each survivor would actually bind.

End to end, the agent generates, docks, scores, sorts, and hands back a shortlist ranked by predicted binding strength. It is the same pattern as folding, with three models instead of two, and here the chemistry-design models do the work that the folders cannot.

[Figure: Drug discovery pipeline]
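The generate, dock, score, and sort funnel described above can be sketched as a function that takes the three model calls as plain callables. The callables stand in for GenMol, DiffDock, and Boltz2 requests and are hypothetical. The score conventions are also assumptions to adapt to whatever each model actually returns:

- A docking score where higher is better, with `None` meaning no acceptable pose.
- An affinity score where higher means tighter binding.

```python
from typing import Callable, Optional


def screen(
    scaffold: str,
    generate: Callable[[str], list[str]],     # GenMol-like step: scaffold -> candidate SMILES
    dock: Callable[[str], Optional[float]],   # DiffDock-like step: pose score, None if nothing fits
    affinity: Callable[[str], float],         # Boltz2-like step: assumed higher = tighter binding
    min_dock_score: float,
    top_k: int,
) -> list[tuple[str, float]]:
    """Generate candidates, keep those that dock well, rank survivors by predicted affinity."""
    candidates = generate(scaffold)
    survivors = [m for m in candidates if (s := dock(m)) is not None and s >= min_dock_score]
    scored = [(m, affinity(m)) for m in survivors]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # best predicted binders first
    return scored[:top_k]
```

Injecting the model calls keeps the ranking logic testable with stubs. Swapping a stub for a real HTTP call does not change the funnel itself.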

BioNeMo at Baseten

All BioNeMo NIM microservices are available today in the Baseten Model Library.

Developers can deploy them directly through Baseten's MCP server and invoke them using Baseten skills. The same pattern that folds a protein or ranks drug candidates then works end to end: the agent reads the skill, picks the right NIM from the library, deploys it, and starts calling it.

By making BioNeMo Agent Toolkit available on Baseten, developers can rapidly deploy and scale scientific AI workloads without managing the complexity of model serving, GPU infrastructure, or orchestration.

Teams like Benchling are already using Baseten to run BioNeMo workloads in production, accelerating the path from sequence to structure to candidate.

Whether you’re building protein design tools, virtual screening pipelines, genomic analysis applications, or next-generation AI scientists, Baseten provides a production-ready platform for BioNeMo-powered workloads.

Get Started

Explore BioNeMo models in the Baseten Model Library and start building AI agents for protein design, virtual screening, genomics, and drug discovery today.

If you are building scientific AI applications and need help scaling them, talk to our engineers.
