One of the slowest parts of deploying a model—whether for the first time or as a cold start for a scaled-to-zero model service—is downloading the model weights. These files can exceed 10 GB for some common foundation models.
We developed a network accelerator to speed up model loads from common model artifact stores, including Hugging Face, CloudFront, S3, and OpenAI. Our accelerator issues byte-range requests in the background so that many chunks of each file download in parallel.
The network accelerator speeds up downloading large model files by 500-600%.
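The core idea behind byte-range parallelism can be sketched in a few lines. This is an illustrative example, not Baseten's actual implementation: `split_ranges`, `fetch_range`, and `parallel_download` are hypothetical names, and a production accelerator would add retries, checksum validation, and backpressure.

```python
import concurrent.futures
import urllib.request


def split_ranges(total_size, num_parts):
    """Divide [0, total_size) into contiguous (start, end) byte ranges, inclusive."""
    part = total_size // num_parts
    ranges = []
    for i in range(num_parts):
        start = i * part
        # The last range absorbs any remainder bytes.
        end = total_size - 1 if i == num_parts - 1 else start + part - 1
        ranges.append((start, end))
    return ranges


def fetch_range(url, start, end):
    """Fetch one chunk using an HTTP Range header (requires server support)."""
    req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req) as resp:
        return start, resp.read()


def parallel_download(url, total_size, num_parts=8):
    """Download a file as num_parts concurrent byte-range requests."""
    buf = bytearray(total_size)
    with concurrent.futures.ThreadPoolExecutor(max_workers=num_parts) as pool:
        futures = [
            pool.submit(fetch_range, url, s, e)
            for s, e in split_ranges(total_size, num_parts)
        ]
        for fut in concurrent.futures.as_completed(futures):
            start, chunk = fut.result()
            buf[start:start + len(chunk)] = chunk
    return bytes(buf)
```

Because each range request is an independent HTTP connection, throughput scales with the number of parallel streams until network or disk bandwidth saturates, which is why this approach helps most on multi-gigabyte weight files.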
The network accelerator uses a proxy and a sidecar to speed up model downloads. These services run on our SOC 2 Type II certified and HIPAA-compliant infrastructure. If you prefer to disable network acceleration for your Baseten workspace, contact our support team at firstname.lastname@example.org and we will disable the feature for your workspace.