Stable Diffusion XL

A text-to-image foundation model that generates detailed images from short prompts.

Deploy Stable Diffusion XL behind an API endpoint in seconds.


Stable Diffusion XL 1.0 (SDXL) is a text-to-image foundation model that creates high-quality images in a variety of styles, from realistic photos to paintings and cartoons. SDXL was released by Stability AI in July 2023.

SDXL works well with short, descriptive prompts. Unlike earlier text-to-image models, you don’t need to end your prompt with instructions like “4k ultra hd realistic” to get a high-quality image. A prompt like “a man in a space suit playing a guitar” is great for SDXL.
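For example, here’s a minimal sketch of generating an image from a short prompt using the Hugging Face diffusers library (one of several ways to run SDXL; the output file name is illustrative):

import torch
from diffusers import StableDiffusionXLPipeline

# Load the SDXL base model in half precision (see the GPU section below)
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.to("cuda")

# Short, concrete prompts work well; no quality keywords needed
image = pipe(prompt="a man in a space suit playing a guitar").images[0]
image.save("astronaut_guitar.png")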

How do I make better images with Stable Diffusion XL?

SDXL works in steps, starting from random noise and ending with a finished image. While 20 or 30 inference steps are often enough to get a great image, you can increase the step count to 50 to maximize detail and clarity.
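In diffusers, the step count is a single parameter. A sketch, reusing the pipe object from the example above:

image = pipe(
    prompt="a man in a space suit playing a guitar",
    num_inference_steps=50,  # 20-30 is often enough; raise toward 50 for maximum detail
).images[0]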


Stable Diffusion models have long struggled to generate realistic hands, faces, and other key details. SDXL comes with an optional refiner model that you can run to improve the detail and accuracy of these key areas of your image.
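One common way to run the refiner is diffusers’ ensemble-of-experts pattern, where the base model handles the early denoising steps and the refiner finishes the image. A sketch, again reusing the pipe object from above:

from diffusers import StableDiffusionXLImg2ImgPipeline

refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    text_encoder_2=pipe.text_encoder_2,  # share components with the base pipeline
    vae=pipe.vae,
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

prompt = "a man in a space suit playing a guitar"
# The base model runs the first 80% of denoising and hands off latents
latents = pipe(prompt=prompt, denoising_end=0.8, output_type="latent").images
# The refiner finishes the last 20%, sharpening faces, hands, and fine detail
image = refiner(prompt=prompt, denoising_start=0.8, image=latents).images[0]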

Overall, the best way to keep output quality high is to understand the capabilities and limitations of the model and ask for images that it is good at making. Use short, concrete prompts and avoid asking for images with numerous objects or difficult details. And if image quality is the highest priority, run more inference steps to give the model longer to get the details right.

What size of images can Stable Diffusion XL 1.0 create?

SDXL is trained to create 1024 x 1024 pixel images. However, it can create images at other aspect ratios with a similar total pixel count, for example 1536 x 640 pixels. Images generated at extreme aspect ratios may be lower quality or less realistic.
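Output dimensions are set per request. A sketch with diffusers (width and height should be multiples of 8):

# A wide 1536 x 640 image: roughly the same pixel count as 1024 x 1024
image = pipe(
    prompt="a man in a space suit playing a guitar",
    width=1536,
    height=640,
).images[0]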

Can Stable Diffusion XL 1.0 be used commercially?

Yes. This model, Stable Diffusion XL 1.0, is licensed under an OpenRAIL++-M license that allows for commercial use. Note that some variants of SDXL, such as SDXL Turbo, are instead licensed under a Stability Membership which has its own rules for commercial use.

What GPU is best for Stable Diffusion XL 1.0 inference?

Stable Diffusion XL and its associated helper models, like variational autoencoders (VAEs) and refiners, run in float16 precision and require about 7 GB of VRAM to load model weights, plus additional headroom for inference.

Given its low VRAM requirements, Stable Diffusion XL can be run on a card as small as a T4. However, it performs much better on midsize L4 and A10G GPUs. And when the fastest image generation is required, A100s are capable of creating Stable Diffusion images in as little as two seconds.
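If you’re running on a smaller card like a T4, diffusers can offload submodels to CPU when they aren’t in use, trading generation speed for VRAM. A sketch:

# Loads weights in float16 (~7 GB) and moves each submodel to the GPU
# only while it's in use, at some cost to generation speed
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.enable_model_cpu_offload()  # replaces pipe.to("cuda")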

By default, Stable Diffusion XL will be deployed on an A10G GPU.

Can Stable Diffusion XL 1.0 be fine-tuned?

As a foundation model, Stable Diffusion XL 1.0 is a perfect candidate for fine-tuning. Many techniques, from DreamBooth to LoRAs, make it easier and more cost-effective to fine-tune SDXL for narrow use cases, like generating images in a specific style.
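Once a LoRA is trained, applying it at inference time is lightweight. A sketch with diffusers; the repository ID here is a hypothetical placeholder:

# Load LoRA adapter weights on top of the base SDXL pipeline
# ("your-username/sdxl-watercolor-lora" is a hypothetical example)
pipe.load_lora_weights(
    "your-username/sdxl-watercolor-lora",
    weight_name="pytorch_lora_weights.safetensors",
)
image = pipe(prompt="a man in a space suit playing a guitar, watercolor style").images[0]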

One model trained on the same architecture as SDXL is Playground V2 Aesthetic, which produces images stylistically similar to Midjourney.

Deploy any model in just a few commands

Avoid getting tangled in complex deployment processes. Deploy best-in-class open-source models and take advantage of optimized serving for your own models.

$ truss init --example stable-diffusion-2-1-base ./my-sd-truss
$ cd ./my-sd-truss
$ export BASETEN_API_KEY=MdNmOCXc.YBtEZD0WFOYKso2A6NEQkRqTe
$ truss push
INFO Serializing Stable Diffusion 2.1 truss.
INFO Making contact with Baseten 👋 👽
INFO 🚀 Uploading model to Baseten 🚀
Upload progress: 0% | | 0.00G/2.39G