# GR00T-H
GR00T-H is a post-trained variant of NVIDIA Isaac GR00T N1.6 for surgical robotics. It builds on the GR00T N1.6 VLA foundation model and adapts it using the Open-H-Embodiment dataset. The architecture combines a vision-language foundation model (SigLIP 2 + Eagle VLM) with a diffusion transformer head that denoises continuous actions via flow matching. The model was trained on 601.5 hours of data from the Open-H-Embodiment dataset across 7 robotic embodiments. It is intended for research and development only, not for clinical deployment.
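To make the action-generation step concrete, the sketch below shows how a flow-matching head produces an action chunk: start from Gaussian noise and integrate a learned velocity field from t=0 to t=1 with Euler steps. Everything here is illustrative, not the GR00T-H API: `velocity_model` is a toy stand-in for the diffusion transformer, and the action dimension, horizon, and function names are assumptions.

```python
import numpy as np

ACTION_DIM = 7   # hypothetical action dimensionality
HORIZON = 16     # hypothetical action-chunk length

def velocity_model(x, t, context):
    """Stand-in for the DiT head: predicts a velocity v(x_t, t | context).
    Here a toy linear field that pushes the sample toward `context`;
    the real model is a transformer conditioned on vision/language/state."""
    return context - x

def denoise_actions(context, num_steps=10, rng=None):
    """Integrate the ODE dx/dt = v(x_t, t) from t=0 (pure noise) to t=1."""
    rng = rng or np.random.default_rng(0)
    x = rng.standard_normal((HORIZON, ACTION_DIM))  # start from Gaussian noise
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt
        x = x + dt * velocity_model(x, t, context)  # Euler step
    return x

# Hypothetical conditioning target; a real policy conditions on VLM features.
target = np.zeros((HORIZON, ACTION_DIM))
actions = denoise_actions(target)
print(actions.shape)  # (16, 7)
```

With more integration steps the samples converge toward the conditional action distribution; the trade-off between step count and latency is a standard knob for flow-matching policies.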
| Property | Details |
|---|---|
| Model size | 3B parameters (GR00T N1.6) |
| Model type | Vision Language Action (VLA). Frameworks: PyTorch, TensorRT. Input: vision (a variable number of RGB frames), state (proprioception vector), and a language instruction. Output: continuous-valued action vectors. Supported GPU architectures: Ampere, Blackwell, Hopper, Lovelace, Jetson. OS: Ubuntu. |
| Performance | Trained on 601.5 hours of data from Open-H-Embodiment (58 datasets, 7 embodiments). Evaluations were conducted on real-world robots. |
| Workflow | — |
| Hugging Face | nvidia/GR00T-H |