

SANA × Cosmos-RL: Post-Training (SFT/RL) for Image & Video Diffusion Models


We have partnered with Cosmos-RL to provide complete RL infrastructure for SANA. This document summarizes how to post-train SANA-Image or SANA-Video with SFT or RL on Cosmos-RL.


Overview

SANA is an efficiency-oriented codebase for high-resolution image and video generation. Cosmos-RL is NVIDIA’s flexible and scalable reinforcement learning framework. Together they support:

  • SFT (Supervised Fine-Tuning): full-parameter and LoRA fine-tuning for image and video
  • RL (e.g., DiffusionNFT and Flow-GRPO): image and video, with an asynchronous reward service and configurable datasets

Supported Algorithms & Features

Cosmos-RL supports state-of-the-art algorithms: GRPO and DAPO for LLMs, and Flow-GRPO, DDRL, and DiffusionNFT for diffusion and world models. SANA is natively supported. For full details, see the Cosmos-RL documentation and its overview of post-training for diffusion models.
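For background on the RL entries above: GRPO-style methods such as Flow-GRPO score a group of samples generated from the same prompt. Each reward is then normalized against its group's mean and standard deviation, so samples above the group average receive positive advantages and samples below it receive negative ones. The sketch below shows that normalization in plain Python. The function name `group_relative_advantages` and the `eps` default are illustrative assumptions. Cosmos-RL's own implementation, and the way DiffusionNFT consumes rewards, may differ in detail.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one prompt's group of samples (GRPO-style).

    Hypothetical helper for illustration; not part of the Cosmos-RL API.
    """
    if not rewards:
        raise ValueError("need at least one reward")
    mean = fmean(rewards)
    std = pstdev(rewards)  # population standard deviation over the group
    # eps keeps the division finite when every sample got the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


# Four images generated for the same prompt, each scored by a reward model.
advantages = group_relative_advantages([0.2, 0.4, 0.6, 0.8])
print([round(a, 3) for a in advantages])
```

Because only relative quality within a group matters, a prompt on which every sample scores the same contributes zero advantage, whatever the absolute reward level.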

Configuration

Configs: SANA configs live in configs/sana. Presets include:

  • SFT — Image: sana-image-sft, sana-image-sft-lora; Video: sana-video-sft, sana-video-sft-lora
  • RL — Image: sana-image-nft; Video: sana-video-nft

See the Configuration Page for argument details.
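The preset names follow a regular pattern of modality, stage, and an optional LoRA suffix. The hypothetical helper below encodes the six presets listed above so that a script can select one by stage and modality. It assumes each preset is stored as `<name>.toml` under `./configs/sana`, as in the training commands on this page. It also assumes the RL presets have no LoRA variant, because none is listed.

```python
# The six presets listed under configs/sana, keyed by (stage, modality, lora).
PRESETS = {
    ("sft", "image", False): "sana-image-sft",
    ("sft", "image", True): "sana-image-sft-lora",
    ("sft", "video", False): "sana-video-sft",
    ("sft", "video", True): "sana-video-sft-lora",
    ("rl", "image", False): "sana-image-nft",
    ("rl", "video", False): "sana-video-nft",
}


def preset_path(stage: str, modality: str, lora: bool = False,
                root: str = "./configs/sana") -> str:
    """Return the TOML path of a SANA preset (hypothetical helper)."""
    key = (stage.lower(), modality.lower(), lora)
    if key not in PRESETS:
        raise ValueError(f"no preset for stage={stage!r}, modality={modality!r}, lora={lora}")
    # Assumes every preset file uses the .toml extension.
    return f"{root}/{PRESETS[key]}.toml"


print(preset_path("sft", "image", lora=True))  # ./configs/sana/sana-image-sft-lora.toml
```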

Reward service: Rewards are computed by a separate asynchronous reward service; see reward_service/README.md for setup. Set REMOTE_REWARD_TOKEN, REMOTE_REWARD_ENQUEUE_URL, and REMOTE_REWARD_FETCH_URL as environment variables for the trainer.
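The trainer needs all three reward-service settings, so a quick pre-flight check can catch a missing one before a long job starts. The sketch below is a hypothetical helper and not part of Cosmos-RL. It assumes the three values are read from environment variables, and it treats an empty string as unset. It does not contact the service. See reward_service/README.md for what each value should contain.

```python
import os
from collections.abc import Mapping
from typing import Optional

# Variable names taken from this page.
REQUIRED_REWARD_VARS = (
    "REMOTE_REWARD_TOKEN",
    "REMOTE_REWARD_ENQUEUE_URL",
    "REMOTE_REWARD_FETCH_URL",
)


def missing_reward_vars(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the reward-service variables that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_REWARD_VARS if not env.get(name)]


# Check the current process environment before launching cosmos-rl.
missing = missing_reward_vars()
print("missing reward variables:", ", ".join(missing) if missing else "none")
```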

Training

SFT (example with image LoRA):

cosmos-rl --config ./configs/sana/sana-image-sft-lora.toml cosmos_rl.tools.dataset.diffusers_dataset

RL (image NFT):

cosmos-rl --config ./configs/sana/sana-image-nft.toml cosmos_rl.tools.dataset.diffusion_nft
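When scripting several runs, it can help to assemble the command programmatically. The sketch below rebuilds the two commands above from a config path and a stage. The helper name `build_command` is hypothetical. The sketch also assumes that every SFT preset uses cosmos_rl.tools.dataset.diffusers_dataset and every RL preset uses cosmos_rl.tools.dataset.diffusion_nft. This page only shows one example of each pairing.

```python
import shlex

# Stage -> dataset module, taken from the two example commands above.
# Applying the same pairing to every preset is an assumption.
DATASET_MODULES = {
    "sft": "cosmos_rl.tools.dataset.diffusers_dataset",
    "rl": "cosmos_rl.tools.dataset.diffusion_nft",
}


def build_command(config: str, stage: str) -> list[str]:
    """Build the cosmos-rl argv for a config file and stage (hypothetical helper)."""
    try:
        module = DATASET_MODULES[stage]
    except KeyError:
        raise ValueError(f"unknown stage {stage!r}; expected one of {sorted(DATASET_MODULES)}") from None
    return ["cosmos-rl", "--config", config, module]


cmd = build_command("./configs/sana/sana-image-nft.toml", "rl")
print(shlex.join(cmd))
# cosmos-rl --config ./configs/sana/sana-image-nft.toml cosmos_rl.tools.dataset.diffusion_nft
# To actually launch it: subprocess.run(cmd, check=True)
```

Returning an argv list rather than a single string avoids shell quoting issues when the list is passed directly to `subprocess.run`.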

Datasets: SFT reads local directories of *.json files paired with *.jpg (image) or *.mp4 (video) files. RL supports image datasets (e.g., pickscore, ocr, geneval) and video datasets (e.g., a filtered version of VidProM). To customize datasets, modify cosmos_rl/tools/dataset/diffusion_nft.py and see the Customization guide.
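Before an SFT run, it can be worth checking that every annotation in a local directory has its media file. The exact layout and JSON schema that cosmos_rl.tools.dataset.diffusers_dataset expects are not documented on this page. The hypothetical helper below therefore assumes that each *.json annotation shares its file stem with a *.jpg or *.mp4 file (for example, 0001.json next to 0001.jpg). It prefers .jpg if both media files exist.

```python
import tempfile
from pathlib import Path

MEDIA_SUFFIXES = (".jpg", ".mp4")  # image and video media, per the dataset description


def pair_sft_samples(root: str) -> list[tuple[Path, Path]]:
    """Pair each *.json annotation with a same-stem media file (hypothetical helper).

    Raises FileNotFoundError if an annotation has no matching media file.
    """
    pairs = []
    for meta in sorted(Path(root).glob("*.json")):
        media = next(
            (meta.with_suffix(s) for s in MEDIA_SUFFIXES if meta.with_suffix(s).exists()),
            None,
        )
        if media is None:
            raise FileNotFoundError(f"no .jpg or .mp4 next to {meta.name}")
        pairs.append((meta, media))
    return pairs


# Demo on a throwaway directory with one image sample and one video sample.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("0001.json", "0001.jpg", "0002.json", "0002.mp4"):
        (Path(tmp) / name).touch()
    for meta, media in pair_sft_samples(tmp):
        print(meta.name, "->", media.name)
```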