🚀 SGLang: High-Performance Serving for SANA

SGLang is an inference framework for accelerated image and video generation. It supports SANA models natively and serves them through an OpenAI-compatible API, a CLI, and a Python SDK.

Supported Models

Model                     HuggingFace ID
Sana 0.6B (512px)         Efficient-Large-Model/Sana_600M_512px_diffusers
Sana 0.6B (1024px)        Efficient-Large-Model/Sana_600M_1024px_diffusers
Sana 1.6B (512px)         Efficient-Large-Model/Sana_1600M_512px_diffusers
Sana 1.6B (1024px)        Efficient-Large-Model/Sana_1600M_1024px_diffusers
SANA-1.5 1.6B (1024px)    Efficient-Large-Model/SANA1.5_1.6B_1024px_diffusers
SANA-1.5 4.8B (1024px)    Efficient-Large-Model/SANA1.5_4.8B_1024px_diffusers

Installation

uv pip install 'sglang[diffusion]' --prerelease=allow

For more installation methods (e.g. Docker, ROCm/AMD), check the SGLang installation guide.


Quick Start

1. CLI

The simplest way to generate an image:

sglang generate \
    --model-path Efficient-Large-Model/SANA1.5_1.6B_1024px_diffusers \
    --prompt 'a cyberpunk cat with a neon sign that says "Sana"' \
    --save-output

2. Python SDK

from sglang.multimodal_gen import DiffGenerator

generator = DiffGenerator.from_pretrained(
    model_path="Efficient-Large-Model/SANA1.5_1.6B_1024px_diffusers",
    num_gpus=1,
)

image = generator.generate(
    sampling_params_kwargs=dict(
        prompt='a cyberpunk cat with a neon sign that says "Sana"',
        height=1024,
        width=1024,
        num_inference_steps=20,
        guidance_scale=4.5,
        seed=42,
        save_output=True,
        output_path="outputs/",
    )
)

Server Mode (OpenAI-Compatible API)

Launch the Server

sglang serve --model-path Efficient-Large-Model/SANA1.5_1.6B_1024px_diffusers \
    --host 0.0.0.0 --port 30000
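
Model loading can take a while, so a client script may want to wait until the port accepts connections before sending requests. The following is a minimal sketch using only the Python standard library; `wait_for_port` is a hypothetical helper, not part of SGLang, and an open TCP port only shows that the process is listening, not that the model has finished loading.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to (host, port) succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait briefly and retry.
            time.sleep(interval)
    return False
```

With the launch command above, `wait_for_port("127.0.0.1", 30000)` would block for up to 60 seconds and report whether the server port became reachable.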

Send Requests

Once the server is running, use the OpenAI-compatible image generation API:

import requests

response = requests.post(
    "http://127.0.0.1:30000/v1/images/generations",
    json={
        "prompt": 'a cyberpunk cat with a neon sign that says "Sana"',
        "size": "1024x1024",
        "num_inference_steps": 20,
        "guidance_scale": 4.5,
        "seed": 42,
        "response_format": "b64_json",
        "n": 1,
    },
)

response.raise_for_status()  # fail early on HTTP errors
result = response.json()
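
Because the request asks for `"response_format": "b64_json"`, the images come back as base64 strings rather than files. The sketch below decodes them to disk. It assumes the response follows the OpenAI images layout, `{"data": [{"b64_json": "..."}]}`; the helper name `save_b64_images`, the `image_<i>.png` file names, and the `.png` extension are assumptions for illustration, so check the actual payload before relying on them.

```python
import base64
from pathlib import Path


def save_b64_images(result: dict, out_dir: str = "outputs") -> list[Path]:
    """Decode each `b64_json` entry in result["data"] and write it to out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, item in enumerate(result.get("data", [])):
        # Assumed layout: one dict per image with a base64-encoded payload.
        path = out / f"image_{i}.png"
        path.write_bytes(base64.b64decode(item["b64_json"]))
        paths.append(path)
    return paths
```

Calling `save_b64_images(result)` after the request above would write one file per returned image and return the list of paths.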

Memory Optimization

For GPUs with limited VRAM, SGLang provides CPU offloading options:

sglang generate \
    --model-path Efficient-Large-Model/SANA1.5_1.6B_1024px_diffusers \
    --text-encoder-cpu-offload \
    --vae-cpu-offload \
    --pin-cpu-memory \
    --prompt "A beautiful landscape" \
    --save-output

Option                        Description
--dit-cpu-offload             Offload the DiT model to CPU
--text-encoder-cpu-offload    Offload the text encoder to CPU
--vae-cpu-offload             Offload the VAE to CPU
--pin-cpu-memory              Pin CPU memory for faster transfers
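
When the offload flags need to vary per machine, it can help to assemble the command in a script rather than by hand. The following is a rough sketch that uses only the flags listed above; `OFFLOAD_FLAGS` and `build_generate_cmd` are hypothetical names introduced here, not part of SGLang.

```python
import shlex

# Flags copied from the option table above; no other flags are assumed to exist.
OFFLOAD_FLAGS = {
    "dit": "--dit-cpu-offload",
    "text_encoder": "--text-encoder-cpu-offload",
    "vae": "--vae-cpu-offload",
}


def build_generate_cmd(model_path, prompt, offload=(), pin_cpu_memory=False):
    """Build an argv list for `sglang generate` with the chosen offload flags."""
    cmd = ["sglang", "generate", "--model-path", model_path]
    for component in offload:
        # Raises KeyError for a component that is not in the table.
        cmd.append(OFFLOAD_FLAGS[component])
    if pin_cpu_memory:
        cmd.append("--pin-cpu-memory")
    cmd += ["--prompt", prompt, "--save-output"]
    return cmd


cmd = build_generate_cmd(
    "Efficient-Large-Model/SANA1.5_1.6B_1024px_diffusers",
    "A beautiful landscape",
    offload=("text_encoder", "vae"),
    pin_cpu_memory=True,
)
print(shlex.join(cmd))  # shell-quoted form, convenient for logging
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; building it as a list avoids shell-quoting problems with prompts that contain quotes.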

LoRA Support

Apply LoRA adapters during inference:

sglang generate \
    --model-path Efficient-Large-Model/SANA1.5_1.6B_1024px_diffusers \
    --lora-path <your-lora-path> \
    --prompt "A beautiful landscape" \
    --save-output


Citation

@misc{xie2024sana,
      title={Sana: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer},
      author={Enze Xie and Junsong Chen and Junyu Chen and Han Cai and Haotian Tang and Yujun Lin and Zhekai Zhang and Muyang Li and Ligeng Zhu and Yao Lu and Song Han},
      year={2024},
      eprint={2410.10629},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2410.10629},
}