
Deploying large language models (LLMs) at scale — where real users engage with AI to drive business outcomes — requires speed, flexibility and reliability. Yet as more LLMs and variants emerge, each with its own architecture and serving requirements, deployment becomes increasingly complex. Different inference frameworks can offer unique performance benefits and optimization opportunities, but managing them adds overhead.
This developer example shows you how to use the multi-LLM compatible NIM container to deploy a variety of LLMs for high-performance inference on NVIDIA-accelerated infrastructure with a simple, unified workflow. Examples include using NIM to deploy LLMs hosted on Hugging Face or on your local file system, exploring the available inference backend options for each model, and deploying models in different quantization formats.
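To make the workflow concrete, here is a sketch of launching the container against a Hugging Face-hosted model. The image tag, environment-variable names (`NIM_MODEL_NAME`, `NIM_MODEL_PROFILE`), and cache path are assumptions based on typical NIM usage, not taken from this example; check the NIM documentation for the current values.

```shell
# Hypothetical deployment sketch; image name and env vars are assumptions.
export NGC_API_KEY="<your NGC API key>"
export LOCAL_NIM_CACHE=~/.cache/nim

# Deploy a Hugging Face-hosted model by pointing NIM_MODEL_NAME at an hf:// URI.
docker run -it --rm --gpus all \
  -e NGC_API_KEY \
  -e NIM_MODEL_NAME="hf://meta-llama/Llama-3.1-8B-Instruct" \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -p 8000:8000 \
  nvcr.io/nim/nvidia/llm-nim:latest

# To serve a model from the local file system instead, mount it and pass a path:
#   -v /path/to/model:/opt/models/my-model \
#   -e NIM_MODEL_NAME="/opt/models/my-model" \
#
# To pin a specific inference backend or quantization, a profile can typically
# be selected with an env var such as:
#   -e NIM_MODEL_PROFILE="<profile id>" \
```

The same container and flags are reused for every model; only `NIM_MODEL_NAME` (and optionally a profile) changes, which is what makes the workflow unified.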
NIM supports a broad range of LLMs backed by NVIDIA TensorRT-LLM, vLLM and SGLang, including popular open LLMs and specialized variants on Hugging Face. For more details on supported LLM architectures, see the documentation.
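Regardless of which backend serves the model, a deployed NIM exposes an OpenAI-compatible API, so client code stays the same across models. The sketch below assumes the default local endpoint `http://localhost:8000/v1` and a served model name matching the deployed checkpoint; both are assumptions for illustration, not details from this example.

```python
"""Minimal client sketch for a locally deployed NIM endpoint (assumed at
http://localhost:8000/v1, the conventional default port)."""
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # adjust if you remapped the port


def build_chat_request(model: str, prompt: str, max_tokens: int = 128) -> dict:
    # OpenAI-style chat-completions payload accepted by /v1/chat/completions.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def chat(model: str, prompt: str) -> str:
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_request(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Model name is hypothetical; use whatever you passed as NIM_MODEL_NAME.
    print(chat("meta-llama/Llama-3.1-8B-Instruct", "Say hello in one sentence."))
```

Because the API surface is OpenAI-compatible, existing OpenAI SDK clients can also be pointed at the NIM base URL without code changes beyond the endpoint and model name.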
The developer example includes an architecture diagram, an NVIDIA Brev launchable with a Jupyter notebook for rapid exploration and experimentation, and source code for local deployment. The example uses the following NVIDIA software:
- NVIDIA NIM™ microservices: the NVIDIA multi-LLM compatible NIM microservice container
NVIDIA believes trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloading or using a model in accordance with our terms of service, developers should work with their supporting model team to ensure it meets the requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for the models, please see the Model Card++ Explainability, Bias, Safety and Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI concerns here.
GOVERNING TERMS: The software is governed by the NVIDIA Software License Agreement and Product-Specific Terms for NVIDIA AI Products.