# GR00T-H
GR00T-H is a post-trained variant of NVIDIA Isaac GR00T N1.6 for surgical robotics. It builds on the GR00T N1.6 VLA foundation model and adapts it using the Open-H-Embodiment dataset. The architecture combines a vision-language foundation model (SigLIP 2 + Eagle VLM) with a diffusion transformer head that denoises continuous actions via flow matching. The model was trained on 601.5 hours of data from the Open-H-Embodiment dataset across 7 robotic embodiments. It is intended for research and development only, not for clinical deployment.
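To make the action-generation step concrete, the sketch below shows how a flow-matching head produces an action chunk: start from Gaussian noise and integrate a learned velocity field from t=0 to t=1 with Euler steps. Everything here is illustrative, not the GR00T-H API: `velocity_model` is a toy stand-in for the diffusion transformer, and the action dimension, horizon, and function names are assumptions.

```python
import numpy as np

ACTION_DIM = 7   # hypothetical action dimensionality
HORIZON = 16     # hypothetical action-chunk length

def velocity_model(x, t, context):
    """Stand-in for the DiT head: predicts a velocity v(x_t, t | context).
    Here a toy linear field that pushes the sample toward `context`;
    the real model is a transformer conditioned on vision/language/state."""
    return context - x

def denoise_actions(context, num_steps=10, rng=None):
    """Integrate the ODE dx/dt = v(x_t, t) from t=0 (pure noise) to t=1."""
    rng = rng or np.random.default_rng(0)
    x = rng.standard_normal((HORIZON, ACTION_DIM))  # start from Gaussian noise
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt
        x = x + dt * velocity_model(x, t, context)  # Euler step
    return x

# Hypothetical conditioning target; a real policy conditions on VLM features.
target = np.zeros((HORIZON, ACTION_DIM))
actions = denoise_actions(target)
print(actions.shape)  # (16, 7)
```

With more integration steps the samples converge toward the conditional action distribution; the trade-off between step count and latency is a standard knob for flow-matching policies.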
| Property | Details |
|---|---|
| Model size | 3B parameters (GR00T N1.6) |
| Model type | Vision Language Action (VLA). Frameworks: PyTorch, TensorRT. Input: vision (a variable number of RGB frames), state (proprioception vector), and a language instruction. Output: continuous-valued action vectors. Supported GPU architectures: Ampere, Blackwell, Hopper, Lovelace, Jetson. OS: Ubuntu. |
| Performance | Trained on 601.5 hours of data from Open-H-Embodiment (58 datasets, 7 embodiments). Evaluations were conducted on real-world robots. |
| Workflow | — |
| Hugging Face | nvidia/GR00T-H |