SANA × Cosmos-RL: Post-Training (SFT/RL) for Image & Video Diffusion Models
We have partnered with Cosmos-RL to provide a complete RL infrastructure for SANA. This document summarizes how to post-train (SFT/RL) SANA-Image or SANA-Video on Cosmos-RL.
Overview
SANA is an efficiency-oriented codebase for high-resolution image and video generation. Cosmos-RL is NVIDIA’s flexible and scalable reinforcement learning framework. Together they support:
- SFT (Supervised Fine-Tuning): full and LoRA for image and video
- RL (e.g. DiffusionNFT & Flow-GRPO): image and video with async reward service and configurable datasets
Supported Algorithms & Features
Cosmos-RL supports state-of-the-art algorithms: GRPO and DAPO for LLMs, and Flow-GRPO, DDRL, and DiffusionNFT for diffusion and world models. SANA is natively supported. For full details, see the Cosmos-RL documentation and the post-training of diffusion models overview.
Configuration
Configs: SANA configs live in configs/sana. Presets include:
- SFT: sana-image-sft and sana-image-sft-lora (image); sana-video-sft and sana-video-sft-lora (video)
- RL: sana-image-nft (image); sana-video-nft (video)
See the Configuration Page for argument details.
Reward service: Use a separate async reward service; see reward_service/README.md. Set REMOTE_REWARD_TOKEN, REMOTE_REWARD_ENQUEUE_URL, and REMOTE_REWARD_FETCH_URL for the trainer.
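A quick preflight check can catch a missing reward-service setting before a long RL run starts. The sketch below assumes the three names above are read from the process environment (the text does not say how the trainer consumes them); the helper name missing_reward_vars is hypothetical and not part of Cosmos-RL.

```python
import os
from typing import Mapping, Optional

# Variable names come from the text above. That the trainer reads them
# from the process environment is an assumption of this sketch.
REQUIRED_REWARD_VARS = (
    "REMOTE_REWARD_TOKEN",
    "REMOTE_REWARD_ENQUEUE_URL",
    "REMOTE_REWARD_FETCH_URL",
)


def missing_reward_vars(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the required reward-service variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_REWARD_VARS if not env.get(name, "").strip()]
```

Calling missing_reward_vars() with no argument inspects the current environment; an empty list means all three settings are present. It does not check that the token or URLs are valid.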
Training
SFT (example with image LoRA):
cosmos-rl --config ./configs/sana/sana-image-sft-lora.toml cosmos_rl.tools.dataset.diffusers_dataset
RL (image NFT):
cosmos-rl --config ./configs/sana/sana-image-nft.toml cosmos_rl.tools.dataset.diffusion_nft
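Both commands above share one shape: a preset config followed by a dataset module. As a rough sketch, a script could assemble that command line from a preset name. It assumes every preset listed under Configuration has a matching .toml file in ./configs/sana (only two such paths appear in this document); build_launch_command is a hypothetical helper, not a Cosmos-RL API, and the dataset module must be supplied by the caller.

```python
import shlex

# Preset names as listed in the Configuration section.
SANA_PRESETS = (
    "sana-image-sft", "sana-image-sft-lora",
    "sana-video-sft", "sana-video-sft-lora",
    "sana-image-nft", "sana-video-nft",
)


def build_launch_command(preset: str, dataset_module: str) -> list[str]:
    """Build the argv for a cosmos-rl run from a preset name and dataset module."""
    if preset not in SANA_PRESETS:
        raise ValueError(f"unknown SANA preset: {preset!r}")
    # Assumed layout: ./configs/sana/<preset>.toml, following the two commands above.
    return ["cosmos-rl", "--config", f"./configs/sana/{preset}.toml", dataset_module]


argv = build_launch_command("sana-image-nft", "cosmos_rl.tools.dataset.diffusion_nft")
print(shlex.join(argv))  # reproduces the RL (image NFT) command shown above
```

The returned list can be passed directly to subprocess.run, which avoids shell-quoting issues.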
Datasets: SFT uses local directories containing *.json files plus *.jpg (image) or *.mp4 (video) files. RL supports image datasets (e.g. pickscore, ocr, geneval) and video datasets (e.g. filtered VidProM). To customize datasets, see cosmos_rl/tools/dataset/diffusion_nft.py and the Customization guide.
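Before starting an SFT run, it can help to check that the local dataset directory is complete. The sketch below assumes each media file is paired with a .json file of the same stem (for example 0001.jpg with 0001.json); this document does not state the pairing rule, so treat that as a guess and adjust it to the actual dataset layout. The function find_unpaired_media is hypothetical.

```python
from pathlib import Path

MEDIA_SUFFIXES = {".jpg", ".mp4"}  # extensions named in the text above


def find_unpaired_media(root: str) -> list[str]:
    """List media files under root that lack a same-stem .json file (assumed pairing)."""
    unpaired = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in MEDIA_SUFFIXES:
            # Assumption: the JSON file sits next to the media file, same stem.
            if not path.with_suffix(".json").is_file():
                unpaired.append(str(path.relative_to(root)))
    return unpaired
```

An empty result means every image or video found has a JSON file next to it. The check says nothing about whether the JSON contents match what the dataset loader expects.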