
AI systems often face a trade-off between accuracy, latency, and cost. Complex reasoning or multimodal queries need powerful models, but routing every request through the same large model wastes compute and increases response times. Simpler queries don’t need that level of reasoning or visual understanding.
This developer example makes model selection dynamic and data-driven. It supports both text and image inputs and offers two main routing strategies.
By evaluating each request’s complexity, modality, and intent in real time, the router can send lightweight queries to fast, efficient models and reserve high-capacity models for tasks that actually need them. The result is a system that maintains strong performance while reducing unnecessary compute costs.
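The core idea above can be sketched as a small routing function. This is a minimal, illustrative sketch only: the word-count heuristic, the policy table, and the model names are assumptions for the example, not the blueprint's actual classifier or configuration.

```python
def classify_complexity(prompt: str, has_image: bool) -> str:
    """Toy heuristic: multimodal or long prompts count as complex.
    (Assumption for illustration; the real router uses a trained
    classifier, not a word count.)"""
    if has_image or len(prompt.split()) > 100:
        return "complex"
    return "simple"

# Hypothetical policy table mapping a label to a backend model.
ROUTING_POLICY = {
    "simple": "small-fast-model",
    "complex": "large-reasoning-model",
}

def route(prompt: str, has_image: bool = False) -> str:
    """Return the backend model that should serve this request."""
    label = classify_complexity(prompt, has_image)
    return ROUTING_POLICY[label]

print(route("What is 2 + 2?"))                       # lightweight query
print(route("Describe this chart", has_image=True))  # multimodal query
```

The design point is that the routing decision is cheap relative to inference, so spending a few milliseconds classifying each request pays for itself whenever a simple query avoids the large model.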
This developer example includes architectural diagrams, Docker-based deployment configurations, Jupyter notebooks for exploration and training, and complete source code for local deployment and customization. The LLM Router example supports the following key features and components:
- NVIDIA NIM™ microservices and Nemotron Models
- External Models
- Infrastructure
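Because the router sits in front of both NIM microservices and external models, clients talk to a single entry point. The sketch below shows what a client request payload might look like, assuming an OpenAI-compatible chat-completions endpoint; the URL, port, and placeholder model name are illustrative assumptions, not the example's documented API.

```python
import json

# Hypothetical router endpoint; consult the example's deployment
# docs for the actual host and port.
ROUTER_URL = "http://localhost:8084/v1/chat/completions"

payload = {
    # A router typically ignores or overrides the client-supplied
    # model name and selects the real backend itself.
    "model": "",
    "messages": [
        {"role": "user", "content": "Summarize this paragraph."}
    ],
}

body = json.dumps(payload)
# An HTTP POST of `body` to ROUTER_URL would then be served by
# whichever backend model the routing policy selects.
print(body)
```

Keeping the client-facing API OpenAI-compatible means existing applications can adopt the router without code changes beyond pointing at a new base URL.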
NVIDIA believes trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure the models meet requirements for the relevant industry and use case and address unforeseen product misuse. For more detailed information on ethical considerations for the models, please see the Model Card++ Explainability, Bias, Safety and Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI concerns here.
Use of the models in this blueprint is governed by the NVIDIA AI Foundation Models Community License.
GOVERNING TERMS: The software is governed by the NVIDIA Software License Agreement and Product-Specific Terms for NVIDIA AI Products. Use of the Complexity and Task Qualifier model is governed by the NVIDIA Open Model License Agreement. Additional Information: MIT License.
GOVERNING TERMS: The NIM container is governed by the NVIDIA Software License Agreement and the Product Specific Terms for AI Products;
Use of these models is governed by the NVIDIA AI Foundation Models Community License Agreement. ADDITIONAL INFORMATION: Llama 3.1 Community License Agreement, Built with Llama;