This blueprint utilizes Pipecat to create a deployable voice agent, built on NVIDIA NIM microservices, for seamless integration into production environments. A production-ready conversational voice agent requires the integration of several complex components, including multiple AI models (such as STT, LLM, TTS, and guardrails), conversation context management, and frameworks for state management and legacy system integration. Additionally, it involves handling hooks for RAG, phrase endpointing, interruption management, ultra-low latency network transport, echo cancellation, and background noise reduction. The solution also requires integration with telephony systems, client-side SDKs for connection management and multimedia exchange, and integration with evaluation and observability tools. All these elements must be managed to ensure conversational latency (500-1500ms for voice-to-voice responses).
Pipecat, created by Daily.co, is an open-source framework that addresses these challenges, supporting 40+ AI models and services as plugins and offering SDKs for various platforms including Python, JavaScript, React, iOS, Android, and C++.
This blueprint gives developers a one-click deployable conversational voice agent. Enterprise easily can build and deploy voice agents across use cases, including customer service, virtual assistants, productivity, gaming, and IoT.
The blueprint is:
The solution leverages NVIDIA's cloud-based API Catalog endpoints, eliminating the need for local GPU hardware. All model inference is performed on NVIDIA's cloud infrastructure.
NIM microservices
3rd-Party Technologies
NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure the models meet requirements for the relevant industry and use case and address unforeseen product misuse. For more detailed information on ethical considerations for the models, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI concerns here.
Use of the models in this blueprint are governed by the NVIDIA AI Foundation Models NVIDIA AI Foundation Models Community License.
GOVERNING TERMS: The blueprint is governed by the NVIDIA Agreements | Enterprise Software | NVIDIA Software License Agreement and NVIDIA Agreements | Enterprise Software | Product Specific Terms for AI Product.
GOVERNING TERMS: The NIM container is governed by the NVIDIA Software License Agreement and the Product Specific Terms for AI Products;
Use of this model is governed by the NVIDIA AI Foundation Models Community License Agreement. ADDITIONAL INFORMATION: Llama 3.3 Community License Agreement, Built with Llama.
NVIDIA Riva Models Please refer to the Governing terms for NVIDIA parakeet-ctc-1_1b-asr here Please refer to the Governing terms for NVIDIA FastPitch-HifiGAN here
Automate voice AI agents with NVIDIA NIM microservices and Pipecat.