Engineering
Use a state machine to generate token masks that bias logits, enabling function calling and structured output at the model-server level.
Automatically add function calling and structured output to any open-source or fine-tuned large language model that TensorRT-LLM supports.
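The masking idea can be illustrated with a minimal, framework-independent sketch. It assumes a toy character-level state machine that accepts JSON-style integers (`-?(0|[1-9][0-9]*)`), a six-token vocabulary, and a fake model with fixed logits. All names here (`step`, `token_mask`, `constrained_greedy_decode`, and so on) are hypothetical. This is not TensorRT-LLM's API or the authors' implementation. At each decoding step, the sketch keeps only the tokens that leave the state machine in a live state, sets every other token's logit to negative infinity, and then picks the best remaining token greedily.

```python
import math

EOS = "<eos>"
ACCEPTING = {2, 3}  # states where the integer is complete and EOS is legal


def step(state, ch):
    """Toy DFA for -?(0|[1-9][0-9]*); returns None on a dead transition."""
    if state == 0:  # start
        if ch == "-":
            return 1
        if ch == "0":
            return 2
        if ch in "123456789":
            return 3
    elif state == 1:  # after a leading minus sign
        if ch == "0":
            return 2
        if ch in "123456789":
            return 3
    elif state == 3:  # inside a multi-digit number
        if ch in "0123456789":
            return 3
    # State 2 ("0" alone) has no outgoing character transitions.
    return None


def advance(state, token):
    """Feed every character of a token through the DFA."""
    for ch in token:
        state = step(state, ch)
        if state is None:
            return None
    return state


def token_mask(state, vocab):
    """True for each token that keeps the DFA alive from `state`."""
    return [
        (state in ACCEPTING) if tok == EOS else advance(state, tok) is not None
        for tok in vocab
    ]


def apply_mask(logits, mask):
    """Logit biasing: disallowed tokens get -inf so they can never be chosen."""
    return [x if ok else -math.inf for x, ok in zip(logits, mask)]


def constrained_greedy_decode(logit_fn, vocab, max_steps=10):
    state, out = 0, []
    for _ in range(max_steps):
        logits = apply_mask(logit_fn(out), token_mask(state, vocab))
        best = max(range(len(vocab)), key=lambda i: logits[i])
        if vocab[best] == EOS:
            break
        out.append(vocab[best])
        state = advance(state, vocab[best])
    return "".join(out)


VOCAB = ["a", EOS, "00", "-", "12", "7"]
LOGITS = [5.0, 4.0, 3.0, 2.0, 1.0, 0.0]  # fake model: always prefers "a"

if __name__ == "__main__":
    # Unconstrained greedy decoding would emit "a"; the mask forces a valid integer.
    print(constrained_greedy_decode(lambda out: LOGITS, VOCAB))  # prints "-12"
```

A production server would likely precompute one mask per state over the full tokenizer vocabulary rather than re-scanning tokens at every step. Function calling would need a larger state machine, for example one derived from a tool's JSON schema. Both points are assumptions about the general technique, not details taken from this post.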