The Content-Localization Blueprint is a reference architecture for media producers and creators that deliver news, sports, movies, and television programming. It helps these companies localize content for global audiences, unlocking new revenue opportunities without duplicating their existing production infrastructure. The blueprint offers a modular, extensible, scalable, NIM-centric design that supports post-production localization for both audio and video workflows. It orchestrates a suite of NVIDIA and partner AI microservices to enable key features such as speech translation, active speaker detection, and AI-driven lip sync for media.
The blueprint enables localization and translation of media, including syncing multiple speakers' lips to translated audio.
The blueprint is built around composable NVIDIA Inference Microservices (NIMs), custom controller logic, and client services, allowing customers and partners to integrate localization capabilities into existing broadcast or streaming pipelines without re-architecting production workflows. The blueprint integrates third-party speech-to-speech dubbing providers such as CAMB.AI and ElevenLabs alongside NVIDIA Riva.
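As a sketch of how custom controller logic might compose these microservices, the following Python example chains stand-ins for the speech translation, active speaker detection, and lip-sync stages. All function, type, and field names here are illustrative assumptions, not the blueprint's actual gRPC API; a real deployment would call each stage over gRPC against the deployed NIMs.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class MediaAsset:
    """Tracks a clip as it moves through the localization pipeline (hypothetical type)."""
    audio: str
    video: str
    language: str

def translate_speech(asset: MediaAsset, target_lang: str) -> MediaAsset:
    # Stand-in for a speech-to-speech dubbing service (e.g. NVIDIA Riva,
    # CAMB.AI, or ElevenLabs); here it just tags the audio track.
    return MediaAsset(f"{asset.audio}:{target_lang}", asset.video, target_lang)

def detect_active_speakers(asset: MediaAsset) -> List[str]:
    # Stand-in for active speaker detection, which would return the
    # on-screen speakers whose lips need resyncing.
    return ["speaker_0"]

def lipsync(asset: MediaAsset, speakers: List[str]) -> MediaAsset:
    # Stand-in for the lip-sync stage, applied per detected speaker.
    return MediaAsset(asset.audio, f"{asset.video}:lipsynced", asset.language)

def localize(asset: MediaAsset, target_lang: str) -> MediaAsset:
    """Controller logic: chain the microservice stages in order."""
    dubbed = translate_speech(asset, target_lang)
    speakers = detect_active_speakers(dubbed)
    return lipsync(dubbed, speakers)

clip = MediaAsset(audio="clip.wav", video="clip.mp4", language="en")
result = localize(clip, "es")
print(result.language, result.video)  # es clip.mp4:lipsynced
```

Because each stage takes and returns a media asset, individual stages can be swapped for different providers or omitted (for example, skipping lip sync for audio-only workflows) without changing the surrounding pipeline.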
To get access to the LipSync feature of the Content-Localization Blueprint, please request to join our NVIDIA AI for Media Private Access Program.
The architecture supports both single‑speaker and multi‑speaker scenarios, and is designed to evolve as additional NIM capabilities become available.
GOVERNING TERMS (restricted access): The blueprint software is governed by the Apache License 2.0, and enables use of separate open source and proprietary software, models and services governed by their respective licenses, including those below.
Sample Assets: Use of the assets is governed by the NVIDIA Sample Data License.
Deployment Geography: Global
The Content-Localization Blueprint is designed for engineering‑led media organizations evaluating or deploying AI‑driven localization within audio and video pipelines.
Runtime Engine(s): NVIDIA Dynamo-Triton (formerly NVIDIA Triton Inference Server)
Supported Hardware Microarchitecture Compatibility:
Supported Operating System(s):
Acceleration Engine: TensorRT, Triton
Test Hardware: Per-NIM.
For entire blueprint:
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this blueprint meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
You may not directly or indirectly use this Content Localization Blueprint to alter the name, likeness, image, or voice of any person in violation of applicable law or regulation or without the person’s express consent.
For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards.
Please report security vulnerabilities or NVIDIA AI Concerns here.

Localize and translate media and sync multiple speakers' lips to translated audio.