Skip to content

Sol-RL Logo

Sol-RL: FP4 Explore, BF16 Train for SANA, FLUX.1, and SD3.5-L

Sol-RL HomepageSol-RL Arxiv

This guide covers Sol-RL post-training in Sana, including single-node launchers, config families, reward setup, and model-specific notes for SANA, FLUX.1, and SD3.5-L.

Base installation is shared with the rest of the repo and is documented in Installation.

If you want the NVFP4 path (*_naive_quant_* or *_sol_rl_*), also install transformer-engine with the same Python interpreter used by torchrun:

python -m pip install --no-build-isolation "transformer-engine[pytorch]"

How to Train

Default single-node launchers:

bash train_scripts/sol_rl/run_sana_single_node_8gpu.sh
bash train_scripts/sol_rl/run_sd3_single_node_8gpu.sh
bash train_scripts/sol_rl/run_flux1_single_node_8gpu.sh

Examples:

CONFIG_SPEC=configs/sol_rl/sana.py:sana_diffusionnft_pickscore \
bash train_scripts/sol_rl/run_sana_single_node_8gpu.sh
CONFIG_SPEC=configs/sol_rl/sd3.py:sd3_compile_hpsv2 \
bash train_scripts/sol_rl/run_sd3_single_node_8gpu.sh
CONFIG_SPEC=configs/sol_rl/flux1.py:flux1_sol_rl_imagereward \
bash train_scripts/sol_rl/run_flux1_single_node_8gpu.sh

Configuration Families

Config naming pattern:

<model>_<family>_<reward>

Examples:

  • sana_diffusionnft_pickscore
  • sd3_compile_hpsv2
  • flux1_sol_rl_imagereward
Family Meaning Rollout shape TE / NVFP4 needed
diffusionnft PEFT-only baseline 24-in-24 No
naive_scaling PEFT brute-force scaling 24-in-96 No
compile BF16 compiled brute-force scaling 24-in-96 No
naive_quant Direct NVFP4 compiled rollout 24-in-96 Yes
sol_rl Two-stage decoupled rollout 24-in-96 Yes

In this repository:

  • diffusionnft: preview_model="peft", fullrollout_model="peft"
  • naive_scaling: preview_model="peft", fullrollout_model="peft"
  • compile: fullrollout_model="compile"
  • naive_quant: fullrollout_model="compile_nvfp4"
  • sol_rl: preview_step=6, preview_model="compile_nvfp4", fullrollout_model="compile"

Recommended first runs:

  • sana_diffusionnft_pickscore
  • sd3_diffusionnft_pickscore
  • flux1_diffusionnft_pickscore

Reward Models

Current online reward suffixes:

  • pickscore
  • clipscore
  • hpsv2
  • imagereward

Manual Reward Checkpoints

HPSv2 expects local files under reward_ckpts/:

mkdir -p reward_ckpts
cd reward_ckpts

wget https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K/resolve/main/open_clip_pytorch_model.bin
wget https://huggingface.co/xswu/HPSv2/resolve/main/HPS_v2.1_compressed.pt

cd ..

Auto-Downloaded Reward Models

The other reward models are downloaded automatically on first use:

  • clipscore: openai/clip-vit-large-patch14
  • pickscore: laion/CLIP-ViT-H-14-laion2B-s32B-b79K and yuvalkirstain/PickScore_v1
  • imagereward: ImageReward-v1.0

Acknowledgements