Fast Mistral 7B, fractional H100 GPUs, FP8 quantization, and API endpoints for model management.
3x throughput with H100 GPUs, 40% lower SDXL latency with TensorRT, and multimodal open source models.
A library for open source models, general availability for L4 GPUs, and performance benchmarking for ML inference.
Faster Mixtral inference, Playground v2 image generation, and ComfyUI pipelines as API endpoints.
Switching to open source ML, a guide to model inference math, and Stability.ai's new generative AI image-to-video model.
All-new model management, a text embedding model that matches OpenAI, and misgif, the most fun you'll have with AI all week.
Mistral 7B LLM, GPU comparisons, model observability features, and an open source AI event series.
Truss's latest update addresses key ML model serving issues. Discover how to speed up SDXL inference to 3 seconds and build ChatGPT-like apps with Llama 2 and Chainlit.