Closed Loop Evaluation#

Closed loop evaluation tests a trained mindmap model in Isaac Lab simulation. Observation data is fed to the model in real time, and the model's actions are executed in the simulation.

Task Demonstrations#

[Animated GIFs showing mindmap models successfully completing each benchmark task.]

Prerequisites#

  1. Make sure you have set up mindmap and are inside the interactive Docker container.

  2. Obtain HDF5 demonstration files, either by downloading a pre-generated dataset or by generating the demonstrations yourself.

  3. Obtain a trained model, either by downloading a pre-trained checkpoint or by training a model yourself.

Running Closed Loop Evaluation#

Evaluate your model on the chosen task:

torchrun_local run_closed_loop_policy.py \
    --task cube_stacking \
    --data_type rgbd_and_mesh \
    --feature_type radio_v25_b \
    --checkpoint <LOCAL_CHECKPOINT_PATH>/best.pth \
    --demos_closed_loop 150-249 \
    --hdf5_file <LOCAL_DATASET_PATH>/mindmap_franka_cube_stacking_1000_demos.hdf5

Note

The demo range selected in the command above (150-249) corresponds to the 100 demonstrations from the evaluation set of the pre-trained models.

Note

Closed loop evaluation runs can be recorded by passing the --record_videos flag; the resulting videos are stored at the path specified with --record_camera_output_path <VIDEO_OUTPUT_DIR>.
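For example, a recording run might look like the following (a sketch assuming the two flags are simply appended to the evaluation command above; <VIDEO_OUTPUT_DIR> is a placeholder for a directory of your choice):

torchrun_local run_closed_loop_policy.py \
    --task cube_stacking \
    --data_type rgbd_and_mesh \
    --feature_type radio_v25_b \
    --checkpoint <LOCAL_CHECKPOINT_PATH>/best.pth \
    --demos_closed_loop 150-249 \
    --hdf5_file <LOCAL_DATASET_PATH>/mindmap_franka_cube_stacking_1000_demos.hdf5 \
    --record_videos \
    --record_camera_output_path <VIDEO_OUTPUT_DIR>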

Alternatively, you can evaluate a task in ground truth mode, which replays the ground truth keyposes from a dataset:

torchrun_local run_closed_loop_policy.py \
    --task cube_stacking \
    --data_type rgbd_and_mesh \
    --feature_type radio_v25_b \
    --dataset <LOCAL_DATASET_PATH> \
    --demos_closed_loop 0-9 \
    --hdf5_file <LOCAL_DATASET_PATH>/mindmap_franka_cube_stacking_1000_demos.hdf5 \
    --demo_mode execute_gt_goals

Running in ground truth mode is useful for validating the keypose extraction pipeline and estimating the model’s maximum achievable performance before training.

Evaluation Results#

After completing all selected demonstrations, check the model’s success rate:

  • Console output: Look for the lines after Summary of closed loop evaluation

  • Evaluation file: Check the file specified by --eval_file_path (if provided)

The success rate is the fraction of the selected demonstrations that the model completed successfully.
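For example, to persist the results to a file, the evaluation command can be run with --eval_file_path set (a sketch; <EVAL_FILE_PATH> is a placeholder for a file path of your choice):

torchrun_local run_closed_loop_policy.py \
    --task cube_stacking \
    --data_type rgbd_and_mesh \
    --feature_type radio_v25_b \
    --checkpoint <LOCAL_CHECKPOINT_PATH>/best.pth \
    --demos_closed_loop 150-249 \
    --hdf5_file <LOCAL_DATASET_PATH>/mindmap_franka_cube_stacking_1000_demos.hdf5 \
    --eval_file_path <EVAL_FILE_PATH>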

Note

Closed loop evaluation is not deterministic, i.e., the same demonstration can succeed or fail across runs even when the same model or the same ground truth goals are used. It is therefore important to run enough demonstrations to obtain a statistically meaningful result. For example, under a simple binomial model, a success rate of 80% measured over 100 demonstrations has a standard error of about 4 percentage points, i.e., a 95% confidence interval of roughly ±8 points.

Visualization Options#

To visualize model inputs and outputs during evaluation:

  1. Add the --visualize flag to your command

  2. Select a visualization window by clicking on it

  3. Press space to trigger the next inference step

This allows you to observe how the model processes spatial information and makes real-time predictions.
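For example, an interactive run might be launched as follows (a sketch assuming --visualize is simply added to the evaluation command from above):

torchrun_local run_closed_loop_policy.py \
    --task cube_stacking \
    --data_type rgbd_and_mesh \
    --feature_type radio_v25_b \
    --checkpoint <LOCAL_CHECKPOINT_PATH>/best.pth \
    --demos_closed_loop 150-249 \
    --hdf5_file <LOCAL_DATASET_PATH>/mindmap_franka_cube_stacking_1000_demos.hdf5 \
    --visualize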

Note

Replace <LOCAL_DATASET_PATH> with your dataset directory path and <LOCAL_CHECKPOINT_PATH> with your checkpoint directory path.

Note

For more information on parameter choices and available options, see Parameters.