Sol-RL: FP4 Explore, BF16 Train for SANA, FLUX.1, and SD3.5-L
This guide covers Sol-RL post-training in Sana, including single-node launchers, config families, reward setup, and model-specific notes for SANA, FLUX.1, and SD3.5-L.
Base installation is shared with the rest of the repo and is documented in Installation.
If you want the NVFP4 path (`*_naive_quant_*` or `*_sol_rl_*` configs), also install `transformer-engine` with the same Python interpreter that `torchrun` uses.
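A common failure is installing Transformer Engine into a different environment than the one `torchrun` runs in. The sketch below checks this before an NVFP4 run. It assumes `torchrun` is a pip-installed console script whose first line is a shebang naming its interpreter (for example `#!/opt/conda/bin/python3` or `#!/usr/bin/env python3`); some installs use other wrappers, and the helper names are hypothetical, not part of this repo.

```bash
# Sketch only: assumes `torchrun` is a console script with a shebang line that
# names its Python interpreter. Helper names are hypothetical.

# Print the interpreter command from a script's shebang line, without "#!".
# Prints nothing if the first line is not a shebang.
launcher_python() {
  head -n 1 "$1" | sed -n 's/^#!//p'
}

# Report which interpreter torchrun uses and whether it can import transformer_engine.
check_te_for_torchrun() {
  local script py
  script="$(command -v torchrun)" || { echo "torchrun not found on PATH" >&2; return 1; }
  py="$(launcher_python "$script")"
  [[ -n "$py" ]] || { echo "no shebang found in $script" >&2; return 1; }
  echo "torchrun interpreter: $py"
  # $py is deliberately unquoted so "/usr/bin/env python3" splits into two words.
  if $py -c "import transformer_engine" 2>/dev/null; then
    echo "transformer_engine: importable"
  else
    echo "transformer_engine: NOT importable with this interpreter" >&2
    return 1
  fi
}
```

Source the snippet and run `check_te_for_torchrun`; a non-zero exit means `transformer-engine` still needs to be installed for that interpreter.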
How to Train
Default single-node 8-GPU launchers, one per model:
```bash
bash train_scripts/sol_rl/run_sana_single_node_8gpu.sh
bash train_scripts/sol_rl/run_sd3_single_node_8gpu.sh
bash train_scripts/sol_rl/run_flux1_single_node_8gpu.sh
```
To train a specific config, set `CONFIG_SPEC` when calling the launcher. Examples:
```bash
CONFIG_SPEC=configs/sol_rl/sana.py:sana_diffusionnft_pickscore \
bash train_scripts/sol_rl/run_sana_single_node_8gpu.sh

CONFIG_SPEC=configs/sol_rl/sd3.py:sd3_compile_hpsv2 \
bash train_scripts/sol_rl/run_sd3_single_node_8gpu.sh

CONFIG_SPEC=configs/sol_rl/flux1.py:flux1_sol_rl_imagereward \
bash train_scripts/sol_rl/run_flux1_single_node_8gpu.sh
```
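In the examples, `CONFIG_SPEC` takes the form `<config file>:<config name>`, and each `configs/sol_rl/<model>.py` file is paired with `run_<model>_single_node_8gpu.sh`. A minimal sketch of a wrapper that derives the launcher from the spec, assuming that pairing holds for every config (`launcher_for_spec` and `run_config` are hypothetical helpers, not repo scripts):

```bash
# Hypothetical helpers: pick the launcher that matches a CONFIG_SPEC, assuming
# configs/sol_rl/<model>.py always pairs with run_<model>_single_node_8gpu.sh.

# Print the launcher path for a spec like configs/sol_rl/sd3.py:sd3_compile_hpsv2.
launcher_for_spec() {
  local spec="$1" file model
  if [[ "$spec" != *.py:* ]]; then
    echo "expected <config file>.py:<config name>, got: $spec" >&2
    return 1
  fi
  file="${spec%%:*}"               # e.g. configs/sol_rl/sd3.py
  model="$(basename "$file" .py)"  # e.g. sd3
  echo "train_scripts/sol_rl/run_${model}_single_node_8gpu.sh"
}

# Launch one config with its matching launcher.
run_config() {
  local spec="$1" launcher
  launcher="$(launcher_for_spec "$spec")" || return 1
  CONFIG_SPEC="$spec" bash "$launcher"
}
```

For example, `run_config configs/sol_rl/flux1.py:flux1_sol_rl_imagereward` runs the same command as the last example above.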
Configuration Families
Config names follow the pattern `<model>_<family>_<reward>`. Examples:
- `sana_diffusionnft_pickscore`
- `sd3_compile_hpsv2`
- `flux1_sol_rl_imagereward`
| Family | Meaning | Rollout shape | TE / NVFP4 needed |
|---|---|---|---|
| `diffusionnft` | PEFT-only baseline | 24-in-24 | No |
| `naive_scaling` | PEFT brute-force scaling | 24-in-96 | No |
| `compile` | BF16 compiled brute-force scaling | 24-in-96 | No |
| `naive_quant` | Direct NVFP4 compiled rollout | 24-in-96 | Yes |
| `sol_rl` | Two-stage decoupled rollout | 24-in-96 | Yes |
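Because `naive_scaling`, `naive_quant`, and `sol_rl` contain underscores themselves, a config name cannot be split on `_` alone. A minimal sketch that matches the family against the table above, assuming names always follow `<model>_<family>_<reward>` with a single-token model prefix (`parse_config_name` and `needs_transformer_engine` are hypothetical helpers):

```bash
# Hypothetical helpers for config names of the form <model>_<family>_<reward>.

# Print "<model> <family> <reward>"; fails if the family is not in the table.
parse_config_name() {
  local name="$1" model rest family
  model="${name%%_*}"   # first token, e.g. sana, sd3, flux1
  rest="${name#*_}"     # remainder: <family>_<reward>
  for family in diffusionnft naive_scaling compile naive_quant sol_rl; do
    if [[ "$rest" == "${family}_"* ]]; then
      echo "$model $family ${rest#"${family}_"}"
      return 0
    fi
  done
  echo "unknown family in config name: $name" >&2
  return 1
}

# Per the "TE / NVFP4 needed" column, only the NVFP4 families need Transformer Engine.
needs_transformer_engine() {
  case "$1" in
    naive_quant|sol_rl) return 0 ;;
    *) return 1 ;;
  esac
}
```

For instance, `parse_config_name flux1_sol_rl_imagereward` prints `flux1 sol_rl imagereward`, and `needs_transformer_engine sol_rl` succeeds.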
In this repository, each family sets the following rollout options:
- `diffusionnft`: `preview_model="peft"`, `fullrollout_model="peft"`
- `naive_scaling`: `preview_model="peft"`, `fullrollout_model="peft"`
- `compile`: `fullrollout_model="compile"`
- `naive_quant`: `fullrollout_model="compile_nvfp4"`
- `sol_rl`: `preview_step=6`, `preview_model="compile_nvfp4"`, `fullrollout_model="compile"`
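The rollout options explain the "TE / NVFP4 needed" column: a family needs Transformer Engine exactly when one of its rollout models is `compile_nvfp4`. A small bash 4+ sketch that encodes the settings listed above and derives that column from them; the array and function names are illustrative only, and options not listed for a family are simply left out here.

```bash
# Rollout options per family, copied from the list above (requires bash >= 4).
declare -A FAMILY_SETTINGS=(
  [diffusionnft]='preview_model="peft" fullrollout_model="peft"'
  [naive_scaling]='preview_model="peft" fullrollout_model="peft"'
  [compile]='fullrollout_model="compile"'
  [naive_quant]='fullrollout_model="compile_nvfp4"'
  [sol_rl]='preview_step=6 preview_model="compile_nvfp4" fullrollout_model="compile"'
)

# Succeeds if any rollout model of the family is NVFP4; returns 2 for an unknown family.
uses_nvfp4() {
  local family="$1"
  if [[ -z "$family" || -z "${FAMILY_SETTINGS[$family]+set}" ]]; then
    echo "unknown family: $family" >&2
    return 2
  fi
  [[ "${FAMILY_SETTINGS[$family]}" == *compile_nvfp4* ]]
}
```

Looping over the five families with `uses_nvfp4` reproduces the Yes/No column of the table, which is a quick consistency check when adding a new family.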
Recommended first runs:
- `sana_diffusionnft_pickscore`
- `sd3_diffusionnft_pickscore`
- `flux1_diffusionnft_pickscore`
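The three recommended baselines can be launched back to back with the same `CONFIG_SPEC` pattern as the earlier examples. A minimal sketch, assuming each baseline lives in `configs/sol_rl/<model>.py` like the configs shown above; `run_first_baselines` and the `DRY_RUN` switch are hypothetical, not part of the repo.

```bash
# Hypothetical wrapper: run the recommended diffusionnft/pickscore baselines in order.
# Set DRY_RUN=1 to print the commands instead of launching them.
run_first_baselines() {
  local model spec launcher
  for model in sana sd3 flux1; do
    spec="configs/sol_rl/${model}.py:${model}_diffusionnft_pickscore"
    launcher="train_scripts/sol_rl/run_${model}_single_node_8gpu.sh"
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
      echo "CONFIG_SPEC=${spec} bash ${launcher}"
    else
      # Stop at the first failed run instead of continuing to the next model.
      CONFIG_SPEC="$spec" bash "$launcher" || return 1
    fi
  done
}
```

Running `DRY_RUN=1 run_first_baselines` first is a cheap way to confirm the config and launcher paths before spending GPU time.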
Reward Models
Current online reward suffixes:
- `pickscore`
- `clipscore`
- `hpsv2`
- `imagereward`
Manual Reward Checkpoints
HPSv2 expects these local files under `reward_ckpts/`:
```bash
mkdir -p reward_ckpts
cd reward_ckpts
wget https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K/resolve/main/open_clip_pytorch_model.bin
wget https://huggingface.co/xswu/HPSv2/resolve/main/HPS_v2.1_compressed.pt
cd ..
```
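An interrupted `wget` can leave a truncated or empty checkpoint behind. A small pre-flight sketch that checks both HPSv2 files exist and are non-empty before launching an `*_hpsv2` config; `check_hpsv2_ckpts` is a hypothetical helper, and it only checks presence and size, not file integrity.

```bash
# Hypothetical pre-flight check: both HPSv2 files must exist and be non-empty.
# Usage: check_hpsv2_ckpts [dir]   (defaults to reward_ckpts)
check_hpsv2_ckpts() {
  local dir="${1:-reward_ckpts}" f missing=0
  for f in open_clip_pytorch_model.bin HPS_v2.1_compressed.pt; do
    if [[ ! -s "$dir/$f" ]]; then
      echo "missing or empty: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Calling `check_hpsv2_ckpts && CONFIG_SPEC=configs/sol_rl/sd3.py:sd3_compile_hpsv2 bash train_scripts/sol_rl/run_sd3_single_node_8gpu.sh` fails fast instead of partway into a run.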
Auto-Downloaded Reward Models
The other reward models are downloaded automatically on first use:
- `clipscore`: `openai/clip-vit-large-patch14`
- `pickscore`: `laion/CLIP-ViT-H-14-laion2B-s32B-b79K` and `yuvalkirstain/PickScore_v1`
- `imagereward`: `ImageReward-v1.0`
Acknowledgements
- Sol-RL training recipes in this repo draw on Advantage Weighted Matching and DiffusionNFT.