SEED BVH Data Preparation (SOMA Skeleton)#

This guide covers converting BONES-SEED BVH motion data (SOMA format, 77 joints) into ProtoMotions format for the soma23 humanoid.

Note

The BONES-SEED dataset contains ~142K motions — significantly larger than AMASS (~15K motions). At this scale, both conversion and training require special handling:

Conversion should be parallelized across multiple CPUs (see Scaling Up: Parallel Conversion).
Training should use sharded MotionLib chunks so each GPU only loads a subset of motions, avoiding GPU memory exhaustion (see Training with MotionLib).

Overview#

The BONES-SEED dataset contains motion capture data in BVH format using the 77-joint SOMA skeleton. BONES-SEED provides two BVH variants: SOMA Uniform (standardized skeleton shared across all motions) and SOMA Proportional (per-actor body proportions). Use SOMA Uniform — it matches the single soma23_humanoid.xml MJCF model that ProtoMotions uses.

The conversion pipeline:

Parses BVH files into local rotation matrices and root translations
Converts from the BVH bone-axis-aligned zero-pose to the standard T-pose using precomputed global rotation offsets
Subselects from 77 joints to the 23 actuated joints in the soma23_humanoid.xml MJCF
Applies Y-up to Z-up coordinate transforms and computes full rigid body states

BONES-SEED .bvh files (77 joints, 120 fps, Y-up)
     │
     ▼ (convert_soma23_bvh_to_proto.py)
ProtoMotions .motion files (23 bodies, 30 fps, Z-up)
     │
     ▼ (motion_lib.py --motion-path)
Packaged .pt MotionLib (one per chunk, or single file for small sets)

Prerequisites#

Download BONES-SEED: Download the dataset from Hugging Face. After downloading, extract the SOMA Uniform tar archive:

cd bones-seed
tar -xf soma_uniform.tar

The expected directory structure is:

bones-seed/
  └── soma_uniform/
      └── bvh/
          ├── <date_1>/
          │   ├── motion_001.bvh
          │   └── motion_002.bvh
          └── <date_2>/
              └── ...

T-pose offsets: The file data/soma/standard_t_pose_global_offsets_rots.p is included in the repository. It contains precomputed per-body global rotation offsets that convert the BVH’s bone-axis-aligned zero-pose to the standard T-pose.

Quick Start: Small Subset#

For a small number of BVH files (e.g., testing with a few motions), convert and package into a single MotionLib file:

# Convert
python data/scripts/convert_soma23_bvh_to_proto.py \
    --input-dir /path/to/bones-seed/soma_uniform/bvh \
    --output-dir /path/to/output/motions \
    --input-fps 120 \
    --output-fps 30

# Package into a single .pt
python protomotions/components/motion_lib.py \
    --motion-path /path/to/output/motions/ \
    --output-file /path/to/seed_bvh_motions.pt

# Verify visually
python examples/motion_libs_visualizer.py \
    --motion_files /path/to/seed_bvh_motions.pt \
    --robot soma23 \
    --simulator isaacgym

Key arguments:

--input-dir: Root directory to search recursively for .bvh files
--output-dir: Directory to save .motion files (preserves subdirectory structure)
--input-fps: Source BVH frame rate (default: 120)
--output-fps: Target output frame rate (default: 30)
--force-remake: Overwrite existing .motion files
--ignore-motion-filter: Skip quality filtering (useful for debugging)

What the Converter Does#

Parse BVH: Reads the BVH hierarchy (excluding the dummy Root joint) to get local rotation matrices (T, 77, 3, 3) and root translations (T, 3) in centimeters. Root translations are converted to meters.
T-pose conversion: The BVH zero-pose (all rotations = identity) places bones along their primary axis, which is NOT a natural T-pose. The change_tpose() function re-expresses the local rotations in the standard T-pose convention using precomputed global rotation offsets.
Body subselection: The 77 SOMASkeleton joints are subselected to the 23 bodies in soma23_humanoid.xml. The 54 dropped joints are leaf end-effectors (finger details, face joints, toe ends, etc.) without actuators.
Coordinate transform + FK: The Y-up local rotations are transformed to Z-up and run through the MJCF forward kinematics to produce world positions, rotations, velocities, DOF positions/velocities, and ground contact labels.
Contact label estimation: Heuristic ground contact labels are computed for all bodies based on height and velocity thresholds. These labels are used by certain reward functions and observations during training (e.g., contact-aware tracking rewards).
Quality filtering (enabled by default): Motions are rejected if they have extreme velocities, underground body parts, or unnaturally airborne segments. Use --ignore-motion-filter to disable.

Scaling Up: Parallel Conversion#

For the full BONES-SEED dataset (~142K BVH files), single-process conversion can be slow. The converter supports chunk-based parallelism via --num-rank and --slurm-rank: each rank processes a deterministic subset of files (assigned via SHA-256 hash), so all ranks run independently.

Splitting into chunks also keeps each packaged .pt file at a manageable size. Loading the entire dataset into a single file would require excessive memory.

Example: 15 parallel workers with SLURM

#!/bin/bash
#SBATCH --job-name=seed_bvh_to_proto
#SBATCH --array=0-14
#SBATCH --cpus-per-task=8
#SBATCH --mem=64G
#SBATCH --time=8:00:00

CHUNK=${SLURM_ARRAY_TASK_ID}
CHUNK_DIR=/path/to/output/chunk_$(printf '%02d' $CHUNK)
CHUNK_PT=/path/to/output/chunk_$(printf '%02d' $CHUNK).pt

python data/scripts/convert_soma23_bvh_to_proto.py \
    --input-dir /path/to/bones-seed/soma_uniform/bvh \
    --output-dir $CHUNK_DIR \
    --input-fps 120 --output-fps 30 \
    --num-rank 15 --slurm-rank $CHUNK

python protomotions/components/motion_lib.py \
    --motion-path $CHUNK_DIR/ \
    --output-file $CHUNK_PT

Without SLURM (GNU parallel / shell loop):

for RANK in $(seq 0 14); do
    python data/scripts/convert_soma23_bvh_to_proto.py \
        --input-dir /path/to/bones-seed/soma_uniform/bvh \
        --output-dir /path/to/output/chunk_$(printf '%02d' $RANK) \
        --input-fps 120 --output-fps 30 \
        --num-rank 15 --slurm-rank $RANK &
done
wait

for RANK in $(seq 0 14); do
    python protomotions/components/motion_lib.py \
        --motion-path /path/to/output/chunk_$(printf '%02d' $RANK)/ \
        --output-file /path/to/output/chunk_$(printf '%02d' $RANK).pt
done

Training with MotionLib#

Single-file MotionLib (small datasets):

python protomotions/train_agent.py \
    --robot-name soma23 \
    --simulator isaacgym \
    --experiment-path examples/experiments/mimic/mlp.py \
    --experiment-name soma23_seed_bvh \
    --motion-file /path/to/seed_bvh_motions.pt \
    --num-envs 4096 \
    --batch-size 16384

Sharded MotionLib (full BONES-SEED, multi-GPU):

For the full dataset, use sharded chunks so each GPU only loads one chunk into memory. Name your chunk files with the slurmrank placeholder pattern, e.g. chunk_slurmrank.pt. At runtime, MotionLib (see protomotions/components/motion_lib.py:process_packaged_motion_file_name_multi_gpu) discovers all matching files (chunk_00.pt, chunk_01.pt, …) and assigns each GPU rank a chunk via round-robin (rank % num_chunks).

python protomotions/train_agent.py \
    --robot-name soma23 \
    --simulator isaaclab \
    --experiment-path examples/experiments/mimic/mlp.py \
    --experiment-name soma23_seed_bvh_allchunks \
    --motion-file /path/to/output/chunk_slurmrank.pt \
    --ngpu 8 --nodes 3 \
    --num-envs 8192 \
    --batch-size 16384 \
    --training-max-steps 10000000000000 \
    --use-slurm --use-wandb

This launches 24 GPUs (3 nodes × 8 GPUs), each loading one of the 15 chunks (wrapped around via rank % 15). Each GPU trains on its own motion subset, keeping per-GPU memory usage bounded.

Next Steps#

Kimodo-Generated Motion Preparation - Prepare motions generated by Kimodo
SMPL Training on AMASS - SMPL training workflow