Models & Policies
Pre-trained policies and evaluation frameworks for healthcare robotics applications
The following pre-trained models are used in our End-to-End Workflows and are hosted on Hugging Face by NVIDIA. Each entry lists the model's size, type, and performance notes, along with a link to its model card.
GR00T-N1.6-Rheo Pick-N-Place Tray
GR00T-N1.6-Rheo-PickNPlace is a vision language action (VLA) model fine-tuned for surgical instrument handling in the Isaac for Healthcare Rheo workflow. Using a G1 embodiment, it picks a sterilized box from a shelf and places it on a cart. It is intended for Rheo simulation workflows only, not for real-world clinical deployment. License: NVIDIA License, with Apache-2.0 for Qwen2.5-7B-Instruct and SigLIP2-SO400M. Ready for commercial and non-commercial use.
| Property | Details |
|---|---|
| Model size | 3B parameters (GR00T N1.6) |
| Model type | Vision Language Action (VLA); PyTorch 2.8.0; GR00T N1.6. Input: vision (480×640 RGB), state (1×31), language. Output: 16×32 action tensor. OS: Ubuntu Linux 22.04/24.04. Supported GPU architectures: Ampere, Blackwell, Hopper. |
| Performance | NVIDIA RTX 5880 Ada: 92.4 ± 1.3 ms latency, 8 GB VRAM. Trained on 120 simulation samples (manual teleoperation + Isaac Lab Mimic). |
| Workflow | Rheo |
GR00T-N1.6-Rheo Sim Push Cart
GR00T-N1.6-Rheo-PushCart is a vision language action (VLA) model fine-tuned for surgical instrument transport in the Isaac for Healthcare Rheo workflow. Using a G1 embodiment, it grasps the cart handle and pushes a cart loaded with a sterilized tray to the surgical table. It is intended for Rheo simulation workflows only, not for real-world clinical deployment. License: NVIDIA License, with Apache-2.0 for Qwen2.5-7B-Instruct and SigLIP2-SO400M. Ready for commercial and non-commercial use.
| Property | Details |
|---|---|
| Model size | 3B parameters (GR00T N1.6) |
| Model type | Vision Language Action (VLA); PyTorch 2.8.0; GR00T N1.6. Input: vision (480×640 RGB), state (1×31), language. Output: 16×32 action tensor. OS: Ubuntu Linux 22.04/24.04. Supported GPU architectures: Ampere, Blackwell, Hopper. |
| Performance | NVIDIA RTX 5880 Ada: 89.3 ± 1.9 ms latency, 8 GB VRAM. Trained on 90 simulation samples (manual teleoperation + Isaac Lab Mimic). |
| Workflow | Rheo |
GR00T-N1.5-RL-Rheo Assemble Trocar
GR00T-N1.5-RL-Rheo-AssembleTrocar is a vision language action (VLA) model fine-tuned for surgical instrument handling in the Isaac for Healthcare Rheo workflow. Using a G1 embodiment, it performs trocar assembly: it retrieves the trocar (obturator and cannula) from a surgical tray on the left, assembles it, and places it on a Mayo stand on the right. It is intended for Rheo simulation workflows only, not for real-world clinical deployment. License: NVIDIA License, with Apache-2.0 for Qwen2.5-7B-Instruct and SigLIP2-SO400M. Ready for commercial and non-commercial use.
| Property | Details |
|---|---|
| Model size | 3B parameters (GR00T N1.5) |
| Model type | Vision Language Action (VLA); PyTorch 2.8.0; GR00T N1.5. Input: vision (3×480×640 RGB, head + 2 wrist cameras), state (1×28), language. Output: 16×28 action tensor. OS: Ubuntu Linux 22.04/24.04. Supported GPU architectures: Ampere, Blackwell, Hopper. |
| Performance | NVIDIA RTX 5880 Ada: 54.2 ± 8.5 ms latency, 8 GB VRAM. Trained on 59 simulation samples (manual teleoperation). |
| Workflow | Rheo |
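
The state and action shapes listed for the three Rheo policies above can be collected into a small lookup table and checked before data is passed to or read from a policy. The following is a minimal sketch in plain Python. `PolicySpec`, `RHEO_SPECS`, `shape_of`, and `check_io` are hypothetical names introduced for illustration and are not part of any GR00T or Isaac for Healthcare API; only the numbers come from the tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicySpec:
    """I/O contract of one policy, copied from the tables above."""
    image_hw: tuple      # (height, width) of each RGB frame
    state_dim: int       # state is a 1 x state_dim vector
    action_horizon: int  # number of actions returned per inference
    action_dim: int      # size of each action vector


# Hypothetical lookup table; keys are the model names used in this page.
RHEO_SPECS = {
    "GR00T-N1.6-Rheo-PickNPlace": PolicySpec((480, 640), 31, 16, 32),
    "GR00T-N1.6-Rheo-PushCart": PolicySpec((480, 640), 31, 16, 32),
    "GR00T-N1.5-RL-Rheo-AssembleTrocar": PolicySpec((480, 640), 28, 16, 28),
}


def shape_of(nested):
    """Return the shape of a rectangular nested list."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        if not nested:
            break
        nested = nested[0]
    return tuple(shape)


def check_io(name, state, actions):
    """Raise ValueError if state or actions do not match the listed shapes."""
    spec = RHEO_SPECS[name]
    expected_state = (1, spec.state_dim)
    expected_actions = (spec.action_horizon, spec.action_dim)
    if shape_of(state) != expected_state:
        raise ValueError(f"{name}: state {shape_of(state)} != {expected_state}")
    if shape_of(actions) != expected_actions:
        raise ValueError(f"{name}: actions {shape_of(actions)} != {expected_actions}")
    return True


if __name__ == "__main__":
    state = [[0.0] * 31]
    actions = [[0.0] * 32 for _ in range(16)]
    print(check_io("GR00T-N1.6-Rheo-PickNPlace", state, actions))
```

A check like this mainly guards against mixing up the N1.6 policies (1×31 state, 16×32 actions) with the N1.5 trocar policy (1×28 state, 16×28 actions).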
Liver Scan GR00T with Cosmos
A GR00T N1 VLA fine-tuned to mimic a simple liver ultrasound sweep in the Isaac for Healthcare ultrasound environment. It uses a relative action space and Cosmos data augmentation. Input: wrist + room camera (224×224 RGB), text prompt (250 tokens), 7D joint state. Output: the next 16 relative actions (6D each). Trained on 400 simulated liver ultrasound sweeps at 30 Hz (~210 steps each). For use within Isaac for Healthcare only.
| Property | Details |
|---|---|
| Model size | 2.2B parameters (GR00T N1) |
| Model type | Vision Language Action (VLA); PyTorch 2.5.1; Eagle-2 VLM + Diffusion Transformer |
| Performance | 83.8% average success rate @ 0.01 m (50 eval examples, 3 runs). Inference on an RTX A6000 (Ampere): 350 ms, 9.45 GB. Supported GPU architectures: Ampere, Blackwell, Hopper. |
| Workflow | Robotic Ultrasound |
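
The training figures quoted for this model (400 sweeps, 30 Hz, about 210 steps each, 16 actions per prediction) imply a few derived quantities that are worked through below. This is only arithmetic on the stated numbers. The function names are hypothetical, and two assumptions are made that the source does not confirm: predicted actions are executed at the 30 Hz recording rate, and a full 16-action chunk is executed before the next inference.

```python
import math

SAMPLE_RATE_HZ = 30    # from the model description
STEPS_PER_SWEEP = 210  # approximate, from the model description
NUM_SWEEPS = 400       # from the model description
ACTION_HORIZON = 16    # actions predicted per inference


def sweep_duration_s(steps=STEPS_PER_SWEEP, rate_hz=SAMPLE_RATE_HZ):
    """Duration of one recorded sweep in seconds."""
    return steps / rate_hz


def total_training_steps(num_sweeps=NUM_SWEEPS, steps=STEPS_PER_SWEEP):
    """Approximate number of time steps in the simulated training set."""
    return num_sweeps * steps


def chunk_duration_s(horizon=ACTION_HORIZON, rate_hz=SAMPLE_RATE_HZ):
    """Time covered by one action chunk (assumes execution at rate_hz)."""
    return horizon / rate_hz


def inferences_per_sweep(steps=STEPS_PER_SWEEP, horizon=ACTION_HORIZON):
    """Inference calls per sweep (assumes each chunk is fully executed)."""
    return math.ceil(steps / horizon)


if __name__ == "__main__":
    print(f"sweep duration:       {sweep_duration_s():.1f} s")
    print(f"training steps:       {total_training_steps()}")
    print(f"chunk duration:       {chunk_duration_s():.3f} s")
    print(f"inferences per sweep: {inferences_per_sweep()}")
```

Under these assumptions a sweep lasts about 7 s and the simulated training set holds roughly 84,000 time steps.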
Liver Scan PI0 with Cosmos
A π0 (Physical Intelligence) VLA fine-tuned for a liver ultrasound sweep in Isaac for Healthcare. It uses Cosmos-transfer augmented data. Input: wrist + room camera (224×224), text (250 tokens), 7D joint state. Output: the next 50 relative actions (6D each). Trained on 400 simulated sweeps plus 400 Cosmos-transfer augmented sweeps. For use within Isaac for Healthcare only.
| Property | Details |
|---|---|
| Model size | 3B parameters (π0) |
| Model type | Vision Language Action (VLA); JAX 0.5.0; PaliGemma VLM + MoE action-expert |
| Performance | 77.0% average success rate @ 0.01 m (50 eval examples, 3 runs). Inference on an RTX 4090: 100 ms, 9 GB. Supported GPU architectures: Ampere, Blackwell, Hopper. |
| Workflow | Robotic Ultrasound |
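
Both liver scan tables report an average success rate at 0.01 m over 50 eval examples and 3 runs. The source does not define how the error is measured or how runs are combined, so the sketch below is only one plausible reading: an episode counts as a success when its position error is at most the tolerance, and the reported figure is the mean of the per-run success rates. The function names and the toy error values are hypothetical.

```python
from statistics import mean


def success_rate(errors_m, tolerance_m=0.01):
    """Fraction of episodes whose error (in metres) is within tolerance."""
    if not errors_m:
        raise ValueError("at least one episode is required")
    return sum(e <= tolerance_m for e in errors_m) / len(errors_m)


def average_success_rate(runs, tolerance_m=0.01):
    """Mean of the per-run success rates (assumed aggregation)."""
    return mean(success_rate(run, tolerance_m) for run in runs)


if __name__ == "__main__":
    # Toy data: 3 runs of 4 episodes each, not real evaluation results.
    toy_runs = [
        [0.004, 0.012, 0.009, 0.020],
        [0.003, 0.008, 0.011, 0.007],
        [0.010, 0.015, 0.002, 0.006],
    ]
    print(f"{100 * average_success_rate(toy_runs):.1f}%")
```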
SO-ARM Starter GR00T
A GR00T N1.5 VLA fine-tuned for autonomous surgical assistance (surgical instrument handling) with the SO-ARM101 in Isaac for Healthcare. Input: room + wrist camera (224×224), robot state, language instruction. Output: 16×6 action tensor. Trained with simulation and real-world teleoperation. For SO-ARM starter scrub-nurse–style tasks only.
| Property | Details |
|---|---|
| Model size | 3B parameters (GR00T N1.5) |
| Model type | Vision Language Action (VLA); PyTorch 2.5.1, TensorRT 10.11. OS: Ubuntu Linux 22.04/24.04 |
| Performance | RTX 6000 Ada: PyTorch 42.2 ms, TensorRT 27.0 ms latency; 5.7–6.6 GB. Supported GPU architectures: Ampere, Blackwell, Hopper. |
| Workflow | SO-ARM Starter |
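
The latency figures in the tables above can be converted into inference calls per second, and the two SO-ARM numbers give the TensorRT speedup over PyTorch. The sketch below does only that arithmetic; the dictionary and function names are hypothetical. Note that the measurements come from different GPUs, so the throughput values are not a like-for-like comparison between models.

```python
# Mean latencies in milliseconds, copied from the Performance rows above.
LATENCY_MS = {
    "GR00T-N1.6-Rheo-PickNPlace": 92.4,
    "GR00T-N1.6-Rheo-PushCart": 89.3,
    "GR00T-N1.5-RL-Rheo-AssembleTrocar": 54.2,
    "Liver Scan GR00T with Cosmos": 350.0,
    "Liver Scan PI0 with Cosmos": 100.0,
    "SO-ARM Starter GR00T (PyTorch)": 42.2,
    "SO-ARM Starter GR00T (TensorRT)": 27.0,
}


def inferences_per_second(latency_ms):
    """Upper bound on back-to-back inference calls per second."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


def speedup(baseline_ms, optimized_ms):
    """How many times faster the optimized path is than the baseline."""
    return baseline_ms / optimized_ms


if __name__ == "__main__":
    for name, ms in LATENCY_MS.items():
        print(f"{name:40s} {ms:6.1f} ms  {inferences_per_second(ms):5.1f} calls/s")
    ratio = speedup(
        LATENCY_MS["SO-ARM Starter GR00T (PyTorch)"],
        LATENCY_MS["SO-ARM Starter GR00T (TensorRT)"],
    )
    print(f"TensorRT speedup on SO-ARM Starter: {ratio:.2f}x")
```

Because each call returns a chunk of several actions, the achievable action rate can be higher than the call rate, depending on how many actions from each chunk are executed.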
Models are typically retrieved automatically when you run a workflow. You can also download them manually (e.g. huggingface-cli download <repo_id>).
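
For scripted setups, the manual download command can be wrapped in a small helper. The sketch below is a minimal example that assumes the `huggingface-cli` tool (installed with `pip install -U "huggingface_hub[cli]"`) is on the PATH. `build_download_command` and `download` are hypothetical helper names, the optional `--local-dir` flag selects the target folder, and `<repo_id>` remains a placeholder to be replaced with the repository ID from the relevant model card.

```python
import shutil
import subprocess


def build_download_command(repo_id, local_dir=None):
    """Build the argument list for `huggingface-cli download`."""
    if not repo_id or repo_id.startswith("<"):
        raise ValueError("replace the placeholder with a repo ID from the model card")
    cmd = ["huggingface-cli", "download", repo_id]
    if local_dir is not None:
        # --local-dir places the files in a chosen folder instead of the cache.
        cmd += ["--local-dir", str(local_dir)]
    return cmd


def download(repo_id, local_dir=None):
    """Run the download; raises if the CLI is missing or the command fails."""
    if shutil.which("huggingface-cli") is None:
        raise RuntimeError("huggingface-cli not found on PATH")
    subprocess.run(build_download_command(repo_id, local_dir), check=True)
```

Calling `download(repo_id, "models/my_policy")` with a real repository ID would then fetch the model files into that folder; the folder name here is only an example.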