Core Abstractions#
This document covers the main components of ProtoMotions in detail.
MotionLib#
Location: protomotions/components/motion_lib.py
The MotionLib stores reference motion data for imitation learning.
Why pack motions into tensors?
When running 4096 parallel environments, each needs to query a different motion at a different time. Naive per-motion storage would require 4096 separate lookups.
Instead, we concatenate all motions into contiguous tensors:
gts: Tensor[total_frames, num_bodies, 3] # positions
grs: Tensor[total_frames, num_bodies, 4] # rotations (quaternion)
gvs: Tensor[total_frames, num_bodies, 3] # linear velocities
gavs: Tensor[total_frames, num_bodies, 3] # angular velocities
dps: Tensor[total_frames, num_dofs] # DOF positions
dvs: Tensor[total_frames, num_dofs] # DOF velocities
contacts: Tensor[total_frames, num_bodies] # contact labels
Frame boundaries are tracked via:
length_starts: Tensor[num_motions] # Start index of each motion
motion_num_frames: Tensor[num_motions] # Number of frames per motion
motion_lengths: Tensor[num_motions] # Duration in seconds
Querying motions:
# Get state for multiple envs at once
state = motion_lib.get_motion_state(
motion_ids=torch.tensor([0, 1, 2, ...]), # Which motion per env
motion_times=torch.tensor([0.5, 1.2, ...]) # Time in seconds
)
# Returns RobotState with interpolated values
Frame interpolation provides smooth motion even at fractional times.
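As a rough sketch of how such a query can resolve against the packed tensors (indexing and blending below are illustrative only; the actual MotionLib also slerps quaternions and handles per-motion frame rates):
import torch

# Illustrative sketch, assuming a fixed frame interval `dt`.
frame_time = motion_times / dt                      # fractional frame index
idx0 = frame_time.floor().long()
idx1 = torch.minimum(idx0 + 1, motion_num_frames[motion_ids] - 1)
blend = (frame_time - idx0.float()).view(-1, 1, 1)  # in [0, 1)

# Offset by each motion's start row to index the concatenated tensors.
row0 = length_starts[motion_ids] + idx0
row1 = length_starts[motion_ids] + idx1

body_pos = (1.0 - blend) * gts[row0] + blend * gts[row1]  # [num_envs, num_bodies, 3]
# Rotations (grs) are blended with quaternion slerp rather than a plain lerp.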
Terrain#
Location: protomotions/components/terrains/
Terrain generates procedural heightfields for training robustness.
Configuration:
TerrainConfig(
    sim_config=TerrainSimConfig(
        static_friction=1.0,
        dynamic_friction=1.0,
    ),
    # Subterrain types, proportions, etc.
)
Complex terrain is optional; for basic training we default to flat terrain.
SceneLib#
Location: protomotions/components/scene_lib.py
SceneLib manages objects in the environment for interaction tasks.
Object types:
Box: Simple box primitive
Sphere: Sphere primitive
Mesh: Custom mesh from file
Example:
scene = Scene(objects=[
    SceneObject(
        object_type="box",
        position=[1.0, 0.0, 0.5],
        dimensions=[0.5, 0.5, 1.0],
        fix_base_link=True,  # Static object
    )
])
Point cloud generation: SceneLib can generate point clouds of objects for perception-based policies.
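To illustrate the idea only (this is not the SceneLib implementation; the helper below is hypothetical and samples the surface of an axis-aligned box):
import torch

def sample_box_surface(position, dimensions, num_points=256):
    # Hypothetical sketch: roughly uniform points on the surface of a box.
    pos = torch.tensor(position)
    half = torch.tensor(dimensions) / 2.0
    dx, dy, dz = dimensions
    # Two faces per axis, weighted by face area.
    areas = torch.tensor([dy * dz, dy * dz, dx * dz, dx * dz, dx * dy, dx * dy])
    face = torch.multinomial(areas / areas.sum(), num_points, replacement=True)
    pts = (torch.rand(num_points, 3) * 2.0 - 1.0) * half  # random point inside the box
    axis = face // 2                                       # which axis gets pinned to a face
    sign = 1.0 - 2.0 * (face % 2).float()                  # +face or -face
    pts[torch.arange(num_points), axis] = sign * half[axis]
    return pts + pos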
Robot Config#
Location: protomotions/robot_configs/
Robot configs define “what to simulate” beyond the MJCF file.
Why not just use MJCF?
MJCF defines the robot’s physical structure, but we need additional information:
Semantic body mappings: Which bodies are feet, hands, head?
Simulator-specific settings: Different solvers need different iterations
Control parameters: PD gains, action scaling
Asset paths: MJCF for physics, USD for IsaacLab
Key fields:
@dataclass
class RobotConfig:
    # Semantic mappings (used for contact detection, observations)
    common_naming_to_robot_body_names: Dict[str, List[str]]
    # Physics assets
    asset: RobotAssetConfig
    # Control (PD gains, action scaling)
    control: ControlConfig
    # Per-simulator physics settings
    simulation_params: SimulatorParams
    # Populated from MJCF (auto-extracted)
    kinematic_info: KinematicInfo  # Body hierarchy, joint info
Relationship to PoseLib:
pose_lib.extract_kinematic_info() parses the MJCF and populates kinematic_info.
This provides the body hierarchy needed for FK/IK and observation computation.
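As a hedged illustration of how a semantic mapping might be consumed (the body names below are hypothetical, and `kinematic_info.body_names` is an assumed field used only for the sketch):
# Illustrative only: resolve a semantic group to body indices for contact checks.
common_naming_to_robot_body_names = {
    "feet": ["left_ankle_link", "right_ankle_link"],    # hypothetical body names
    "hands": ["left_wrist_link", "right_wrist_link"],
}

body_names = robot_config.kinematic_info.body_names     # assumed: list of all body names
feet_ids = [body_names.index(n) for n in common_naming_to_robot_body_names["feet"]]
# feet_ids can then index contact tensors, e.g. contacts[:, feet_ids]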
Simulator#
Location: protomotions/simulator/base_simulator/
The simulator abstraction wraps different physics backends.
Interface:
class BaseSimulator:
    def step(self, actions: Tensor) -> None: ...
    def get_state(self) -> SimulatorState: ...
    def set_state(self, state: SimulatorState) -> None: ...
    def reset_envs(self, env_ids: Tensor) -> None: ...
All backends (IsaacGym, IsaacLab, Newton, Genesis) implement this interface, allowing environment code to be simulator-agnostic.
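Because every backend exposes the same methods, higher-level code can be written once against the abstraction. A minimal sketch, assuming an already-constructed simulator and policy:
# Sketch only: a backend-agnostic rollout loop written against BaseSimulator.
def rollout(simulator, policy, num_steps):
    for _ in range(num_steps):
        state = simulator.get_state()   # same call for IsaacGym, IsaacLab, Newton, Genesis
        actions = policy(state)         # policy consumes the shared SimulatorState
        simulator.step(actions)         # backend-specific physics happens here

    # Resetting a subset of environments uses the same interface everywhere, e.g.:
    # simulator.reset_envs(env_ids=torch.tensor([0, 3, 7]))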
SimulatorState:
This is the central data structure shared among data prep, simulator, and env. See Simulator State for more details.
Environment#
Location: protomotions/envs/
Environments orchestrate the training loop through modular components. The base environment delegates to specialized managers for observations, rewards, terminations, and control.
Structure:
BaseEnv: Core environment with component managers
MimicEnv: Extends BaseEnv for motion imitation
SteeringEnv: Extends BaseEnv for locomotion control
Component System#
Location: protomotions/envs/managers/
ProtoMotions uses a component-based architecture where observations, rewards, terminations, and control are defined as modular, reusable components.
Why components?
Instead of hardcoding observations and rewards in environment classes, components allow you to:
Mix and match different observation/reward combinations via config
Add new rewards without modifying environment code
Share components across different environment types
Configure everything in experiment files
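For example, a single experiment can wire all four component types together purely through configuration (a rough sketch; the top-level container shown here is illustrative, and the individual configs and factories are described in the sections below):
# Rough sketch: components are mixed and matched in config, not in env code.
env_config = dict(
    control_components={"mimic": MimicControlConfig()},
    observation_components={"max_coords_obs": max_coords_obs_factory()},
    reward_components={"gt_rew": gt_rew_factory(weight=0.5, coefficient=-100.0)},
    termination_components={"tracking_error": tracking_error_factory(threshold=0.5)},
)
# Swapping a reward or adding an observation is a config change, not an environment change.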
Control Components#
Location: protomotions/envs/control/
Control components are stateful task managers that define environment behavior.
class ControlComponent(ABC):
    def reset(self, env_ids: Tensor): ...
    def step(self): ...
    def get_context(self) -> Dict[str, Any]: ...
    def should_terminate(self) -> Tuple[Tensor, Tensor]: ...
Key characteristics:
Maintain state across timesteps (e.g., current target motion, path waypoints)
Provide context variables for observations and rewards
Can define custom termination conditions
Have access to the full environment
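A minimal hypothetical component illustrating these characteristics (not a built-in component; names, state, and return conventions are assumptions for the sketch):
import torch

class TargetHeadingControl(ControlComponent):
    """Hypothetical sketch of a control component."""

    def __init__(self, env):
        self.env = env                                    # components can access the full env
        self.target_heading = torch.zeros(env.num_envs)   # state kept across timesteps

    def reset(self, env_ids):
        # Sample a new target heading for the environments being reset.
        self.target_heading[env_ids] = torch.rand(len(env_ids)) * 2 * torch.pi

    def step(self):
        pass  # nothing to advance each step in this toy example

    def get_context(self):
        # Values exposed to observation/reward/termination functions via the context dict.
        return {"target_heading": self.target_heading}

    def should_terminate(self):
        # No custom termination; return all-False tensors (shapes/meaning assumed).
        no_term = torch.zeros(self.env.num_envs, dtype=torch.bool)
        return no_term, no_term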
Built-in components:
MimicControlComponent: Motion tracking (samples motions, tracks progress)
SteeringControlComponent: Heading and speed targets
PathFollowerControlComponent: Path generation and following
Configuration:
control_components = {
    "mimic": MimicControlConfig(
        bootstrap_on_episode_end=True,
    )
}
Observation Components#
Location: protomotions/envs/obs/
Observation components are stateless functions that compute observations from context variables.
@dataclass
class ObservationComponentConfig:
    function: Callable[..., Tensor]  # Pure function
    variables: Dict[str, str]        # Maps args to context keys
Key characteristics:
Pure functions with no side effects
Receive context from control components
Can specify body subsets via indices_subset
Support observation noise injection
Configuration:
from protomotions.envs.obs import (
    max_coords_obs_factory,
    mimic_target_poses_max_coords_factory,
)
observation_components = {
    "max_coords_obs": max_coords_obs_factory(),
    "target_poses": mimic_target_poses_max_coords_factory(),
}
Factory functions create pre-configured ObservationComponentConfig instances.
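A factory simply bundles a pure function with its context wiring. A hedged sketch of what one might look like (the real factories in protomotions/envs/obs may differ; quat_to_heading is an assumed helper):
# Hypothetical sketch of an observation factory.
def heading_error_obs_factory() -> ObservationComponentConfig:
    def heading_error(root_rot, target_heading):
        # Pure function: inputs come from the context dict, output is the observation.
        current = quat_to_heading(root_rot)   # assumed helper, for illustration only
        return (target_heading - current).unsqueeze(-1)

    return ObservationComponentConfig(
        function=heading_error,
        variables={"root_rot": "root_rot", "target_heading": "target_heading"},
    )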
Reward Components#
Location: protomotions/envs/utils/rewards.py
Reward components are stateless functions that compute reward terms.
@dataclass
class RewardComponentConfig:
    function: Callable[..., Tensor]
    variables: Dict[str, str]
    weight: float = 1.0
    grace_period: float = 0.0            # Seconds before reward activates
    reward_type: str = "multiplicative"  # or "additive"
Configuration:
from protomotions.envs.rewards import (
    gt_rew_factory,
    action_smoothness_factory,
)
reward_components = {
    "gt_rew": gt_rew_factory(weight=0.5, coefficient=-100.0),
    "action_smoothness": action_smoothness_factory(weight=-0.02),
}
Reward types:
multiplicative: Combined as a product (default for tracking rewards)
additive: Summed directly (for penalties like action smoothness)
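As a sketch of how the two types can combine (illustrative only; term names are made up and the RewardManager's exact formula, including how weights enter, may differ):
import torch

# Illustrative combination of reward terms.
multiplicative_terms = [gt_rew, gr_rew]         # tracking-style terms in [0, 1]
additive_terms = [action_smoothness_penalty]    # signed penalty terms

reward = torch.ones(num_envs)
for term in multiplicative_terms:
    reward = reward * term                      # any poorly tracked term drags the product down
for term in additive_terms:
    reward = reward + term                      # penalties are added on top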
Termination Components#
Location: protomotions/envs/utils/terminations.py
Termination components check for episode termination conditions.
@dataclass
class TerminationComponentConfig:
    function: Callable[..., Tensor]
    variables: Dict[str, str]
Configuration:
from protomotions.envs.terminations import tracking_error_factory
termination_components = {
    "tracking_error": tracking_error_factory(threshold=0.5),
}
Built-in terminations:
tracking_error: Terminates when the tracking error exceeds a threshold
Height termination and max episode length are handled by BaseEnv directly
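A termination function is just a pure boolean check over context variables. A hedged sketch (not the actual tracking_error implementation; argument names are illustrative):
# Hypothetical sketch of a termination function.
def tracking_error_termination(body_pos, ref_body_pos, threshold):
    # Mean per-body position error against the reference motion.
    error = (body_pos - ref_body_pos).norm(dim=-1).mean(dim=-1)  # [num_envs]
    return error > threshold                                     # True => terminate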
Managers#
Location: protomotions/envs/managers/
Managers orchestrate component evaluation:
ControlManager: Initializes and steps control components
ObservationManager: Evaluates observation functions, caches body indices
RewardManager: Evaluates rewards, applies grace periods, combines terms
TerminationManager: Evaluates termination conditions
Data flow:
Control Components → Context Dict → Observation/Reward/Termination Functions
        │                                              │
        └──────────────────────┬───────────────────────┘
                               ↓
                       Environment Step
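Putting it together, an environment step roughly follows this pattern (a simplified sketch; the manager method names here are illustrative, not the exact API):
# Simplified sketch of how the managers might be orchestrated each step.
def env_step(self, actions):
    self.simulator.step(actions)

    self.control_manager.step()                        # advance control components
    context = self.control_manager.get_context()       # shared context dict

    obs = self.observation_manager.compute(context)    # pure observation functions
    reward = self.reward_manager.compute(context)      # weighted / combined reward terms
    terminated = self.termination_manager.compute(context)

    return obs, reward, terminated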
Agent#
Location: protomotions/agents/
Agents implement RL algorithms.
Structure:
BaseAgent: Training loop, checkpointing
PPOAgent: Proximal Policy Optimization
AMPAgent: Adversarial Motion Priors
ASEAgent: Adversarial Skill Embeddings
MaskedMimicAgent: Masked motion imitation with tracking rewards
Next Steps#
Environment Context - Context dictionary connecting components
PoseLib Toolkit - MJCF parsing and FK/IK utilities
Simulator State - State representation details