🤖 Robotic Ultrasound Training with PI Zero
This repository provides a complete workflow for training PI Zero models for robotic ultrasound applications. PI Zero is a powerful vision-language-action model developed by Physical Intelligence that can be fine-tuned for various robotic tasks.
📋 Table of Contents
- Overview
- Installation
- Data Collection
- Data Conversion
- Training Configuration
- Running Training
- Understanding Outputs
- Testing Inference
- Troubleshooting
- Acknowledgements
🔍 Overview
This workflow enables you to:
- Collect robot trajectories and camera images while the robot performs an ultrasound scan
- Convert the collected HDF5 data to LeRobot format (required by PI Zero)
- Fine-tune a PI Zero model using either full supervised fine-tuning or LoRA (Low-Rank Adaptation)
- Deploy the trained model for inference
🛠️ Installation
First, install the OpenPI repository and its dependencies using our provided script:
# Install OpenPI with dependencies
./tools/env_setup_robot_us.sh
This script:
- Clones the OpenPI repository at a specific commit
- Updates Python version requirements in pyproject.toml
- Installs LeRobot and other dependencies
- Sets up the OpenPI client and core packages
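After the script finishes, you can verify the environment with a quick import check. This is a minimal sketch; the package names below are assumed from the OpenPI repository layout:

```python
# Sanity check: confirm the key packages are importable after setup.
# Package names are assumed from the OpenPI repository layout.
import openpi_client  # lightweight OpenPI inference client
import lerobot        # dataset tooling used during data conversion

print("OpenPI environment is ready")
```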
📊 Data Collection
To train a PI Zero model, you'll need to collect robotic ultrasound data. We provide a state machine implementation in the simulation environment that generates training episodes emulating a liver ultrasound scan.
See the simulation README for more information on how to collect data.
🔄 Data Conversion
PI Zero uses the LeRobot data format for training. We provide a script to convert your HDF5 data to this format.
Move to the training folder and execute:
python convert_hdf5_to_lerobot.py /path/to/your/hdf5/data
Replace /path/to/your/hdf5/data with the actual path to your dataset.
Arguments:
- data_dir: Path to the directory containing HDF5 files
- --repo_id: Name for your dataset (default: "i4h/robotic_ultrasound")
- --task_prompt: Text description of the task (default: "Perform a liver ultrasound.")
- --image_shape: Shape of the image data as a comma-separated string, e.g., '224,224,3' (default: "224,224,3")
The script will:
1. Create a LeRobot dataset with the specified name
2. Process each HDF5 file and extract observations, states, and actions
3. Save the data in LeRobot format
4. Consolidate the dataset for training
The converted dataset will be saved in ~/.cache/huggingface/lerobot/<repo_id>.
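To sanity-check the conversion, you can load the dataset back with LeRobot. This is a sketch; the import path is assumed from the lerobot package and may differ across versions:

```python
# Load the converted dataset and inspect its contents.
# Import path assumed from the lerobot package; it may differ across versions.
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

dataset = LeRobotDataset("i4h/robotic_ultrasound")
print(f"episodes: {dataset.num_episodes}, frames: {dataset.num_frames}")

# Inspect the first frame: each key maps to an image, state, or action tensor.
frame = dataset[0]
for key, value in frame.items():
    print(key, getattr(value, "shape", value))
```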
⚙️ Training Configuration
Full Fine-tuning vs. LoRA
- Full Fine-tuning (robotic_ultrasound): Updates all model parameters. Requires more GPU memory (>70GB) but can achieve better performance on larger datasets.
- LoRA Fine-tuning (robotic_ultrasound_lora): Uses Low-Rank Adaptation to update a small subset of parameters. Requires less GPU memory (~22.5GB) while still achieving good results.
🚀 Running Training
To start training with the default LoRA configuration, move to the pi_zero folder and execute:
python train.py --config robotic_ultrasound_lora --exp_name liver_ultrasound
Arguments:
- --config: Training configuration to use (options: robotic_ultrasound, robotic_ultrasound_lora)
- --exp_name: Name for your experiment (used for wandb logging and checkpoints)
- --repo_id: Repository ID for the dataset (default: i4h/robotic_ultrasound)
For better GPU memory utilization, you can raise the fraction of GPU memory that JAX preallocates (the default is 0.75):
XLA_PYTHON_CLIENT_MEM_FRACTION=0.9 python train.py --config robotic_ultrasound_lora --exp_name liver_ultrasound
📊 Understanding Outputs
Normalization Statistics
The training script automatically computes normalization statistics for your dataset on the first run. These statistics are stored in the assets
directory and are used to normalize inputs during training and inference.
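Conceptually, these statistics are per-dimension means and standard deviations over the dataset's states and actions. The following is a sketch of the idea, not the OpenPI implementation; the array names and shapes are hypothetical:

```python
import numpy as np

# Hypothetical stacked arrays: one row per timestep across all episodes.
states = np.random.rand(10_000, 7)   # e.g., end-effector or joint states
actions = np.random.rand(10_000, 7)  # e.g., commanded actions

# Per-dimension mean/std, used to scale inputs to roughly zero mean, unit variance.
norm_stats = {
    "state": {"mean": states.mean(axis=0), "std": states.std(axis=0)},
    "actions": {"mean": actions.mean(axis=0), "std": actions.std(axis=0)},
}

# Normalize one state vector (epsilon guards against zero variance).
normalized = (states[0] - norm_stats["state"]["mean"]) / (norm_stats["state"]["std"] + 1e-6)
```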
Checkpoints
Training checkpoints are saved in the checkpoints
directory with the following structure:
checkpoints/
└── [config_name]/
└── [exp_name]/
├── 1000/
├── 2000/
└── ...
Each numbered directory contains a checkpoint saved at that training step.
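Given this layout, you can locate the most recent checkpoint with a small helper (a sketch; adjust config_name and exp_name to your run):

```python
from pathlib import Path

def latest_checkpoint(config_name: str, exp_name: str) -> Path:
    """Return the checkpoint directory with the highest training step."""
    run_dir = Path("checkpoints") / config_name / exp_name
    steps = [p for p in run_dir.iterdir() if p.is_dir() and p.name.isdigit()]
    return max(steps, key=lambda p: int(p.name))

print(latest_checkpoint("robotic_ultrasound_lora", "liver_ultrasound"))
```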
Logging
Training progress is logged to:
1. The console (real-time updates)
2. Weights & Biases (if configured)
To view detailed training metrics, ensure you log into W&B:
wandb login
🚀 Testing Inference
See the policy_runner README for more information on how to test inference with the trained model.
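As a preview, OpenPI serves policies over a websocket, and the client side looks roughly like the sketch below. It assumes the openpi_client package; the observation keys are hypothetical placeholders and must match your training config:

```python
# Query a served PI Zero policy over a websocket.
# Assumes the openpi_client package; the observation keys below are
# hypothetical placeholders and must match your training config.
import numpy as np
from openpi_client import websocket_client_policy

policy = websocket_client_policy.WebsocketClientPolicy(host="localhost", port=8000)

observation = {
    "observation/image": np.zeros((224, 224, 3), dtype=np.uint8),
    "observation/state": np.zeros(7, dtype=np.float32),
    "prompt": "Perform a liver ultrasound.",
}
result = policy.infer(observation)
print(result["actions"].shape)  # a chunk of predicted actions
```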
🔧 Troubleshooting
Common Issues
- Out of memory errors: Try using the LoRA configuration (robotic_ultrasound_lora) instead of full fine-tuning, or reduce the batch size in the config.
- Data format errors: Ensure your HDF5 data follows the expected format. Check the playback script to validate, or inspect files directly as shown in the sketch below.
- Missing normalization statistics: The first training run should generate these automatically. If missing, check permissions in the assets directory.
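A quick way to inspect an HDF5 episode is to walk its tree with h5py (a sketch; the file name is a placeholder and your dataset keys may differ):

```python
# Print the group/dataset tree of an HDF5 episode to verify its layout.
import h5py

def print_node(name, obj):
    # visititems callback: report shape and dtype for each dataset.
    if isinstance(obj, h5py.Dataset):
        print(f"{name}: shape={obj.shape}, dtype={obj.dtype}")
    else:
        print(f"{name}/")

# The file name is a placeholder; point this at one of your episodes.
with h5py.File("/path/to/your/hdf5/data/episode_0.hdf5", "r") as f:
    f.visititems(print_node)
```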
🙏 Acknowledgements
This project builds upon the PI Zero model developed by Physical Intelligence. We thank them for making their models and training infrastructure available to the community.
Happy training! If you encounter any issues or have questions, please open an issue in this repository.