Models & Policies#

Pre-trained policies and evaluation frameworks for healthcare robotics applications

The following pre-trained models are used in our End-to-End Workflows and are hosted on Hugging Face by NVIDIA. Each model includes size, type, performance notes, and a link to its model card.

GR00T-N1.6-Rheo Pick-N-Place Tray#

GR00T-N1.6-Rheo-PickNPlace is a vision-language-action (VLA) model fine-tuned for surgical instrument handling in the Isaac for Healthcare Rheo workflow. It performs pick-and-place of a sterilized box from a shelf to a cart using a G1 embodiment. Intended for Rheo simulation workflows only; not for real-world clinical deployment. Licensed under the NVIDIA License (Apache-2.0 for the Qwen2.5-7B-Instruct and SigLIP2-SO400M components). Ready for commercial and non-commercial use.
| Property | Details |
| --- | --- |
| Model size | 3B parameters (GR00T N1.6) |
| Model type | Vision Language Action (VLA); PyTorch 2.8.0; GR00T N1.6. Input: vision (480×640 RGB), state (1×31), language. Output: 16×32 action tensor. Linux Ubuntu 22.04/24.04. Supported: Ampere, Blackwell, Hopper. |
| Performance | NVIDIA RTX 5880 Ada: 92.4 ± 1.3 ms latency, 8 GB VRAM. Trained on 120 simulation samples (manual teleoperation + Isaac Lab Mimic). |
| Workflow | Rheo |
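The input and output shapes listed above can be sanity-checked with a minimal sketch. This is shapes-only illustration; the dictionary keys and the policy function are hypothetical, not the actual GR00T N1.6 interface.

```python
import numpy as np

# Illustrative observation matching the documented shapes; the key
# names here are hypothetical, not the real GR00T N1.6 API.
obs = {
    "vision": np.zeros((480, 640, 3), dtype=np.uint8),  # one RGB frame
    "state": np.zeros((1, 31), dtype=np.float32),       # robot state
    "language": "pick the sterilized box and place it on the cart",
}

# Dummy stand-in for the model: the real policy returns a 16x32
# action tensor (16 future timesteps, 32 action dimensions).
def dummy_policy(obs):
    return np.zeros((16, 32), dtype=np.float32)

actions = dummy_policy(obs)
assert actions.shape == (16, 32)
```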

GR00T-N1.6-Rheo Sim Push Cart#

GR00T-N1.6-Rheo-PushCart is a vision-language-action (VLA) model fine-tuned for surgical instrument transport in the Isaac for Healthcare Rheo workflow. It performs push-cart behavior by grasping the cart handle and moving a cart loaded with a sterilized tray to the surgical table using a G1 embodiment. Intended for Rheo simulation workflows only; not for real-world clinical deployment. Licensed under the NVIDIA License (Apache-2.0 for the Qwen2.5-7B-Instruct and SigLIP2-SO400M components). Ready for commercial and non-commercial use.
| Property | Details |
| --- | --- |
| Model size | 3B parameters (GR00T N1.6) |
| Model type | Vision Language Action (VLA); PyTorch 2.8.0; GR00T N1.6. Input: vision (480×640 RGB), state (1×31), language. Output: 16×32 action tensor. Linux Ubuntu 22.04/24.04. Supported: Ampere, Blackwell, Hopper. |
| Performance | NVIDIA RTX 5880 Ada: 89.3 ± 1.9 ms latency, 8 GB VRAM. Trained on 90 simulation samples (manual teleoperation + Isaac Lab Mimic). |
| Workflow | Rheo |

GR00T-N1.5-RL-Rheo Assemble Trocar#

GR00T-N1.5-RL-Rheo-AssembleTrocar is a vision-language-action (VLA) model fine-tuned for surgical instrument handling in the Isaac for Healthcare Rheo workflow. Using a G1 embodiment, it performs trocar assembly: it retrieves the trocar (obturator and cannula) from a surgical tray on the left, assembles it, and places it on a Mayo Stand on the right. Intended for Rheo simulation workflows only; not for real-world clinical deployment. Licensed under the NVIDIA License (Apache-2.0 for the Qwen2.5-7B-Instruct and SigLIP2-SO400M components). Ready for commercial and non-commercial use.
| Property | Details |
| --- | --- |
| Model size | 3B parameters (GR00T N1.5) |
| Model type | Vision Language Action (VLA); PyTorch 2.8.0; GR00T N1.5. Input: vision (3×480×640 RGB, head + 2 wrist cameras), state (1×28), language. Output: 16×28 action tensor. Linux Ubuntu 22.04/24.04. Supported: Ampere, Blackwell, Hopper. |
| Performance | NVIDIA RTX 5880 Ada: 54.2 ± 8.5 ms latency, 8 GB VRAM. Trained on 59 simulation samples (manual teleoperation). |
| Workflow | Rheo |

Liver Scan GR00T with Cosmos#

A GR00T-N1 VLA fine-tuned to mimic a simple liver ultrasound sweep in the Isaac for Healthcare ultrasound environment. It uses a relative action space and Cosmos data augmentation. Input: wrist + room camera (224×224 RGB), text prompt (250 tokens), 7D joint state. Output: the next 16 6D relative actions. Trained on 400 simulated liver ultrasound sweeps at 30 Hz (~210 steps each). For use within Isaac for Healthcare only.
| Property | Details |
| --- | --- |
| Model size | 2.2B parameters (GR00T N1) |
| Model type | Vision Language Action (VLA); PyTorch 2.5.1; Eagle-2 VLM + Diffusion Transformer |
| Performance | 83.8% average success rate @ 0.01 m (50 eval examples, 3 runs). Inference: Ampere RTX A6000, 350 ms, 9.45 GB. Supported: Ampere, Blackwell, Hopper. |
| Workflow | Robotic Ultrasound |
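In a relative action space, each 6D action is a delta on the current end-effector pose rather than an absolute target, so executing a chunk means accumulating deltas step by step. A rough sketch (the pose layout and the plain Euler-angle summation are simplifications, not the model's actual convention):

```python
import numpy as np

def apply_relative_actions(pose, actions):
    """Accumulate a chunk of 6D relative actions onto a pose.

    pose:    (6,) [x, y, z, roll, pitch, yaw]
    actions: (N, 6) per-step deltas in the same layout.
    Returns the (N, 6) trajectory of absolute poses. Summing Euler
    angles directly is a simplification valid only for small rotations.
    """
    return pose + np.cumsum(actions, axis=0)

start = np.zeros(6)                  # current end-effector pose
chunk = np.full((16, 6), 0.001)      # 16 small deltas, as the model outputs
traj = apply_relative_actions(start, chunk)
assert traj.shape == (16, 6)
```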

Liver Scan PI0 with Cosmos#

A π0 (Physical Intelligence) VLA fine-tuned for a liver ultrasound sweep in Isaac for Healthcare. It uses Cosmos-transfer-augmented data. Input: wrist + room camera (224×224), text (250 tokens), 7D joint state. Output: the next 50 6D relative actions. Trained on 400 simulated sweeps plus 400 Cosmos-transfer-augmented sweeps. For use within Isaac for Healthcare only.
| Property | Details |
| --- | --- |
| Model size | 3B parameters (π0) |
| Model type | Vision Language Action (VLA); JAX 0.5.0; PaliGemma VLM + MoE action-expert |
| Performance | 77.0% average success rate @ 0.01 m (50 eval examples, 3 runs). Inference: RTX 4090, 100 ms, 9 GB. Supported: Ampere, Blackwell, Hopper. |
| Workflow | Robotic Ultrasound |
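Chunked policies like these are commonly deployed in a receding-horizon loop: infer a chunk, execute only its first few actions, then re-infer from the new observation. A minimal sketch with a dummy policy (the re-planning interval and the inference call are assumptions for illustration, not part of the π0 release):

```python
import numpy as np

def run_receding_horizon(policy, n_steps, execute=10):
    """Execute only the first `execute` actions of each predicted
    chunk before re-querying the policy (a common VLA deployment
    pattern; the counts here are illustrative)."""
    executed = []
    while len(executed) < n_steps:
        chunk = policy()              # (50, 6) action chunk per inference
        executed.extend(chunk[:execute])
        # a real controller would also refresh the observation here
    return np.array(executed[:n_steps])

dummy = lambda: np.zeros((50, 6))     # stands in for one π0 inference
traj = run_receding_horizon(dummy, n_steps=120)
assert traj.shape == (120, 6)
```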

SO-ARM Starter GR00T#

A GR00T N1.5 VLA fine-tuned for autonomous surgical assistance (surgical instrument handling) with the SO-ARM101 in Isaac for Healthcare. Input: room + wrist camera (224×224), robot state, language instruction. Output: 16×6 action tensor. Trained with both simulation and real-world teleoperation data. For SO-ARM starter scrub-nurse-style tasks only.
| Property | Details |
| --- | --- |
| Model size | 3B parameters (GR00T N1.5) |
| Model type | Vision Language Action (VLA); PyTorch 2.5.1, TensorRT 10.11; Linux Ubuntu 22.04/24.04 |
| Performance | Ada RTX 6000: PyTorch 42.2 ms, TensorRT 27.0 ms; 5.7–6.6 GB. Supported: Ampere, Blackwell, Hopper. |
| Workflow | SO-ARM Starter |

Model retrieval typically happens automatically when you run each workflow. You can also download a model manually, e.g. `huggingface-cli download <repo_id>`.