
GR00T-H

GR00T-H is a post-trained variant of NVIDIA Isaac GR00T N1.6 for surgical robotics. It builds on the GR00T N1.6 VLA foundation model and adapts it using the Open-H-Embodiment dataset. The architecture combines a vision-language foundation model (SigLip2 + Eagle VLM) with a diffusion transformer head that denoises continuous actions via flow matching. The model was trained on 601.50 hours of data from the Open-H-Embodiment dataset spanning 7 robotic embodiments. For research and development only; not intended for clinical deployment.
| Property | Details |
| --- | --- |
| Model size | 3B parameters (GR00T N1.6) |
| Model type | Vision Language Action (VLA); PyTorch, TensorRT. Input: vision (variable RGB frames), state (proprioception vector), language instruction. Output: continuous-value action vectors. Supported architectures: Ampere, Blackwell, Hopper, Lovelace, Jetson. OS: Ubuntu. |
| Performance | Trained on 601.50 hours from Open-H-Embodiment (58 datasets, 7 embodiments). Evaluations conducted on real-world robots. |
| Workflow | |
| Hugging Face | nvidia/GR00T-H |
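To illustrate the flow-matching inference described above, the sketch below integrates a learned velocity field from Gaussian noise (t=0) to an action chunk (t=1) with Euler steps. This is a minimal toy, not GR00T-H's implementation: the action dimension, horizon, conditioning size, and the random linear "velocity network" (standing in for the diffusion transformer head conditioned on VLM features) are all assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
ACTION_DIM = 7   # e.g. joint targets for one arm (assumption)
HORIZON = 16     # action chunk length (assumption)
COND_DIM = 32    # placeholder size for VLM conditioning features

# Toy stand-in for the diffusion transformer head v(x_t, t | cond):
# a single random linear layer with a tanh nonlinearity.
W = rng.normal(scale=0.1,
               size=(HORIZON * ACTION_DIM + COND_DIM + 1,
                     HORIZON * ACTION_DIM))

def velocity(x_t, t, cond):
    """Predict the flow velocity at noised actions x_t and time t."""
    inp = np.concatenate([x_t.ravel(), cond, [t]])
    return np.tanh(inp @ W).reshape(HORIZON, ACTION_DIM)

def sample_actions(cond, steps=10):
    """Flow-matching inference: start from noise at t=0 and integrate
    dx/dt = v(x_t, t | cond) to t=1 with fixed Euler steps."""
    x = rng.normal(size=(HORIZON, ACTION_DIM))
    dt = 1.0 / steps
    for i in range(steps):
        x = x + dt * velocity(x, i * dt, cond)
    return x

vlm_features = rng.normal(size=COND_DIM)  # placeholder VLM embedding
actions = sample_actions(vlm_features)
print(actions.shape)  # (16, 7): a denoised chunk of continuous actions
```

In the real model the velocity network is a transformer conditioned on vision-language tokens and proprioceptive state, but the denoising loop has this same shape: a few integration steps turn a noise sample into a continuous action chunk.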