NVIDIA AI-Q Blueprint for intelligent agents Blueprint by NVIDIA

NVIDIA AI-Q Blueprint

The NVIDIA AI-Q Blueprint (pronounced "IQ") enables developers to build fully customizable AI agents that they own, inspect, and control. Built on LangChain DeepAgents and accelerated by the NVIDIA NeMo Agent Toolkit, AI-Q is an open reference example for building AI agents. It provides quick answers, citations, and in-depth, report-style research in one system, with benchmarks and evaluation harnesses so you can measure quality and improve over time.

Architecture

Key Features

AI-Q is powered by a LangGraph-based state machine. While the system functions as a fully orchestrated workflow, each agent can also be executed as a standalone component. Here are the key components:

Orchestration node: Classifies intent (meta vs. research), produces meta responses when needed, and sets depth (shallow vs. deep) in one step
Shallow research agent: Bounded tool-augmented research optimized for speed
Deep research agent: Multi-phase research with planning, iteration, and citation management
Workflow configuration: YAML configs define agents, tools, LLMs, and routing behavior so you can tune workflows without code changes
Modular workflows: All agents (orchestration node, shallow researcher, deep researcher, clarifier) are composable; each can run standalone or as part of the full pipeline
Evaluation harnesses: Built-in benchmarks (e.g., FreshQA, DeepResearch) and evaluation scripts to measure quality and iterate on prompts and agent architecture
Frontend options: Run via CLI, web UI, or async jobs
Deployment options: Deployment assets for Docker Compose and Helm
MCP tool integration: Connect to MCP servers through NeMo Agent Toolkit
Skills and sandbox execution: Use AI-Q as a skill in agent harnesses, or add DeepAgents skills that run in a job-scoped sandbox
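
The routing described above (classify intent, set depth, then hand off to one agent) can be sketched in plain Python. This is an illustration only, not AI-Q's implementation: the names `Route`, `classify`, `HANDLERS`, and `run` are hypothetical, the keyword rules stand in for the LLM-backed orchestration node, and the handlers return placeholder strings rather than calling real agents or LangGraph.

```python
from dataclasses import dataclass
from typing import Callable, Literal, Optional

Intent = Literal["meta", "research"]
Depth = Literal["shallow", "deep"]


@dataclass
class Route:
    """Result of the single orchestration step: intent plus research depth."""
    intent: Intent
    depth: Optional[Depth] = None  # only set for research requests


def classify(query: str) -> Route:
    """Hypothetical stand-in for the LLM-backed classifier.

    Keyword rules are used only so the sketch runs without a model.
    """
    text = query.lower()
    if any(p in text for p in ("who are you", "what can you do", "hello")):
        return Route(intent="meta")
    deep_markers = ("report", "in-depth", "compare", "survey")
    depth: Depth = "deep" if any(m in text for m in deep_markers) else "shallow"
    return Route(intent="research", depth=depth)


def meta_response(query: str) -> str:
    return "meta: I am a research assistant."


def shallow_research(query: str) -> str:
    return f"shallow: quick answer for {query!r}"


def deep_research(query: str) -> str:
    return f"deep: planned, multi-phase report for {query!r}"


# Each handler is an ordinary callable, so it can also be run on its own.
HANDLERS: dict[str, Callable[[str], str]] = {
    "meta": meta_response,
    "shallow": shallow_research,
    "deep": deep_research,
}


def run(query: str) -> str:
    """Classify once, then dispatch to exactly one handler."""
    route = classify(query)
    key = "meta" if route.intent == "meta" else route.depth
    assert key is not None, "research routes must carry a depth"
    return HANDLERS[key](query)
```

Keeping the handlers in a lookup table mirrors the modular design described in the list: `deep_research("...")` can be called directly, or reached through `run("...")` as part of the full pipeline.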

Prerequisites

Required:

Python 3.11+
UV package manager
Node.js 22+ and npm (needed only for web UI mode)
API key for your chosen provider(s):
- NVIDIA API key (for NVIDIA NIM models)
- OpenAI API key (for OpenAI models)
- Anthropic API key (for Claude models)
- Google API key (for Gemini models)

Optional:

Tavily API key (for web search functionality)
Serper API key (for academic paper search functionality)
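
A short preflight check can confirm these prerequisites before starting a workflow. The sketch below is not part of AI-Q: the function `check_prerequisites` is hypothetical, and the environment variable names are assumed from common provider conventions, so confirm them against the AI-Q documentation before relying on them.

```python
import sys

# Assumed environment variable names; confirm them against the AI-Q docs.
PROVIDER_KEYS = {
    "NVIDIA NIM": "NVIDIA_API_KEY",
    "OpenAI": "OPENAI_API_KEY",
    "Anthropic": "ANTHROPIC_API_KEY",
    "Google Gemini": "GOOGLE_API_KEY",
}
OPTIONAL_KEYS = {
    "Tavily web search": "TAVILY_API_KEY",
    "Serper paper search": "SERPER_API_KEY",
}


def check_prerequisites(env, python_version=sys.version_info[:2]):
    """Return (problems, notes) for the given environment mapping.

    problems: blocking issues (old Python, no provider key at all).
    notes: optional features that will be unavailable.
    """
    problems, notes = [], []
    if tuple(python_version) < (3, 11):
        problems.append("Python 3.11+ is required")
    # At least one provider key is needed; which one depends on your models.
    providers = [name for name, var in PROVIDER_KEYS.items() if env.get(var)]
    if not providers:
        problems.append("no model provider API key is set")
    for name, var in OPTIONAL_KEYS.items():
        if not env.get(var):
            notes.append(f"{name} disabled ({var} not set)")
    return problems, notes
```

Passing `os.environ` as `env` checks the current shell. Taking the mapping as a parameter keeps the function easy to test with a plain dictionary.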

System Requirements

Local / Hybrid Development

Developer machine to run AI-Q instance (no local GPU required)
LlamaIndex (optional local RAG)
Provider / Services / RAG APIs

Fully Self-Hosted / On-Prem

Server for AI-Q instances
NVIDIA Nemotron 3 Super 120b (agent)
GPT OSS 120B (agent)
NVIDIA Nemotron 3 Nano 30b (agent)
NVIDIA Nemotron Mini 4b instruct (optional document summary)
NVIDIA RAG Blueprint (optional RAG)
LlamaIndex (optional RAG)
- Llama Nemotron Embed vl 1b v2
- Nemotron Nano 12b v2 vl

Hosted Service

Server for AI-Q instances
Provider APIs
NVIDIA RAG Blueprint (optional)
LlamaIndex (optional)

Hardware Requirements

Hardware requirements vary with the configured options. Refer to the documentation for details.

Software Components

NVIDIA Technology

3rd Party Software

LangChain for workflows
Tavily for web search
Serper for paper search (Google Scholar)
LlamaIndex for RAG

License

This project is licensed under the Apache License 2.0. See License for details.

Ethical Considerations

NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure the models meet requirements for the relevant industry and use case and address unforeseen product misuse. For more detailed information on ethical considerations for the models, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI concerns here.

Terms of Use

This service is governed by the NVIDIA API Trial Terms of Service.