Healthcare, the world’s largest customer service sector, faces urgent pressure to digitize patient interactions. Ambient voice AI is key to that shift, but the next leap will come from generative AI reasoning models. These models enable voice agents to provide intelligent, context-aware responses, automate documentation, and deliver highly personalized care, transforming workflows and scaling efficient, accurate healthcare delivery. This developer example provides the ingredients to build and scale such agents.
It covers two use cases: the Ambient Provider Voice Agent and the Ambient Patient Voice Agent.
The Ambient Provider Voice Agent does more than transcribe patient-provider conversations. It understands context, infers intent, and autonomously generates nuanced, structured clinical documentation in the SOAP note format (Subjective, Objective, Assessment, Plan), reducing manual input and supporting better clinical decisions.
The Ambient Patient Voice Agent manages high-volume patient touchpoints (e.g., clinic intake, surveys, appointment scheduling, information queries) without clinician involvement. Its ability to reason dynamically allows for more personalized, empathetic patient interactions and real-time problem-solving within complex healthcare contexts.
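
To make the SOAP structure mentioned for the Ambient Provider Voice Agent concrete, here is a minimal Python sketch of how a generated note could be represented and checked for empty sections. The `SoapNote` class, its field names, and its helper methods are hypothetical illustrations; they are not taken from the developer example's code.

```python
from dataclasses import dataclass
from typing import List

# The four SOAP sections, in the order named in the overview.
SECTION_ORDER = ("subjective", "objective", "assessment", "plan")


@dataclass
class SoapNote:
    """Hypothetical container for a SOAP note (not the repository's own type)."""

    subjective: str = ""   # what the patient reports
    objective: str = ""    # measurements and exam findings
    assessment: str = ""   # clinician's interpretation
    plan: str = ""         # next steps

    def missing_sections(self) -> List[str]:
        """Return the names of sections that are empty or whitespace only."""
        return [name for name in SECTION_ORDER if not getattr(self, name).strip()]

    def to_text(self) -> str:
        """Render the note as plain text, one labeled line per section."""
        lines = []
        for name in SECTION_ORDER:
            value = getattr(self, name).strip() or "(not documented)"
            lines.append(f"{name.capitalize()}: {value}")
        return "\n".join(lines)


note = SoapNote(
    subjective="Patient reports a dry cough for three days.",
    plan="Rest and fluids; follow up in one week.",
)
print(note.missing_sections())
print(note.to_text())
```

A check like `missing_sections()` is one way a reviewer could spot sections the conversation never covered before the note is signed off.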
Advanced Transcription
Fast Medical Reasoning
Comprehensive Speech Pipeline
Intelligent Guardrails
Note: Users may have to wait 5–10 minutes for the instance to start, depending on cloud availability.
The ambient healthcare agents developer example supports the following hardware and system configurations:
Ambient Provider Voice Agent:
Self-Hosted Configuration:
| Service | Use Case | Recommended GPU |
|---|---|---|
| Riva ASR NIM | Audio Transcription and Diarization | 1x GPU; options include L40, A100, and others (see model card) |
| Reasoning Model | Medical Note (SOAP) Generation | 2x H100 80 GB or 4x A100 80 GB |
NVIDIA API Catalog Configuration:
No GPU requirement when using public NVIDIA endpoints for NIM microservices (build.nvidia.com)
Ambient Patient Voice Agent:
Self-Hosted Configuration:
| Service | Use Case | Recommended GPU |
|---|---|---|
| Riva ASR NIM | Audio Transcription | 1x GPU; options include L40, A100, and others (see model card) |
| Riva TTS NIM | Speech Synthesis | 1x GPU; options include L40, A100, and others (see model card) |
| NemoGuard Content Safety Model (optional; needed only to enable NeMo Guardrails) | Content Safety | 1x GPU; options include A100, H100, L40S, and A6000 (see model card) |
| NemoGuard Topic Control Model (optional; needed only to enable NeMo Guardrails) | Topic Control | 1x GPU; options include A100, H100, L40S, and A6000 (see model card) |
| Instruct Model | Agent Reasoning and Tool Calling | 2x H100 80 GB or 4x A100 80 GB (see model card) |
NVIDIA API Catalog Configuration:
No GPU requirement when using public NVIDIA endpoints for NIM microservices (build.nvidia.com)
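
The choice between the two configurations above usually comes down to which base URL the client talks to. The following is a minimal sketch under stated assumptions: the environment variable names `USE_SELF_HOSTED_NIM` and `NVIDIA_API_KEY`, the hosted URL `https://integrate.api.nvidia.com/v1`, and the local URL `http://localhost:8000/v1` are illustrative and do not come from this page; check the repository's own configuration files for the actual settings.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumed values for illustration; confirm against the repository's configuration.
API_CATALOG_BASE_URL = "https://integrate.api.nvidia.com/v1"
SELF_HOSTED_BASE_URL = "http://localhost:8000/v1"


@dataclass(frozen=True)
class EndpointConfig:
    """Hypothetical summary of where inference requests are sent."""

    base_url: str
    api_key: Optional[str]
    needs_local_gpu: bool


def resolve_endpoint(env: Mapping[str, str]) -> EndpointConfig:
    """Pick a self-hosted or hosted endpoint from environment-style settings."""
    self_hosted = env.get("USE_SELF_HOSTED_NIM", "").lower() in ("1", "true", "yes")
    if self_hosted:
        # Self-hosted NIM containers run on local GPUs (see the tables above).
        return EndpointConfig(SELF_HOSTED_BASE_URL, None, needs_local_gpu=True)

    api_key = env.get("NVIDIA_API_KEY")
    if not api_key:
        raise ValueError("NVIDIA_API_KEY must be set to use the hosted endpoints")
    # Hosted endpoints need an API key but no local GPU.
    return EndpointConfig(API_CATALOG_BASE_URL, api_key, needs_local_gpu=False)


if __name__ == "__main__":
    try:
        print(resolve_endpoint(os.environ))
    except ValueError as exc:
        print(f"Configuration error: {exc}")
```

Keeping this decision in one function means the rest of an agent can stay identical across both deployment modes.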
Please be aware of the following security considerations when using this repository:
For more information, see the SECURITY.md file in the developer example repository.
NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When the models are downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure the models meet requirements for the relevant industry and use case and address unforeseen product misuse. For more detailed information on ethical considerations for the models, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI Concerns here.
The API trial service is governed by the NVIDIA API Trial Terms of Service. The developer example software is governed by the Apache 2.0 License. Use of the NIM containers is governed by the NVIDIA Software License Agreement and Product Specific Terms for NVIDIA AI Products. Use of the ASR Parakeet 1.1b CTC en-US, ASR Parakeet CTC Riva 1.1b, Magpie TTS Multilingual, and Llama-3.3-70b-Instruct models is governed by the NVIDIA Community Model License Agreement. Use of the Llama-3.1-Nemoguard-8b-Topic-Control, Llama-3.1-Nemoguard-8b-Content-Safety and Llama-3.3-Nemotron-Super-49B-v1 models is governed by the NVIDIA Open Model License Agreement. Use of the Ace-Controller software is governed by the BSD 2-Clause License. ADDITIONAL INFORMATION: For Llama-3.1-Nemoguard-8b-Topic-Control and Llama-3.1-Nemoguard-8b-Content-Safety, Llama 3.1 Community License Agreement. For Llama-3.3-Nemotron-Super-49b-v1 and Llama-3.3-70b-Instruct, Llama 3.3 Community License Agreement. Built with Llama.

Build advanced AI agents for providers and patients using this developer example, powered by NeMo Microservices, NVIDIA Nemotron, Riva ASR and TTS, and NVIDIA LLM NIM.