Ingest massive volumes of live or archived videos and extract insights for summarization and interactive Q&A
The NVIDIA AI Blueprint for video search and summarization (VSS) makes it easy to start building and customizing video analytics AI agents. These insightful, accurate, and interactive agents are powered by generative AI, vision language models (VLMs), large language models (LLMs), and NVIDIA NIM™ microservices—helping a variety of industries make better decisions, faster. They can be given tasks through natural language and perform complex operations like video summarization and visual question-answering, unlocking entirely new application possibilities.
Test the VSS blueprint in the cloud with an NVIDIA Launchable, a set of pre-configured sandbox instances that let you quickly try the blueprint without bringing your own compute infrastructure.
The following NIM microservices are used in this blueprint (a minimal invocation sketch follows the list):
cosmos-nemotron-34b
meta / llama-3.1-70b-instruct
llama-3_2-nv-embedqa-1b-v2
llama-3_2-nv-rerankqa-1b-v2
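The LLM and retrieval NIM microservices in this list can be exercised through the OpenAI-compatible endpoints hosted on build.nvidia.com. The sketch below shows one way to send a summarization prompt to the Llama 3.1 70B Instruct NIM; the `NVIDIA_API_KEY` environment variable and the prompt text are assumptions of this example, not part of the blueprint.

```python
# Minimal sketch: calling the hosted Llama 3.1 70B Instruct NIM through the
# OpenAI-compatible API catalog endpoint. Assumes an NVIDIA_API_KEY
# environment variable with a valid build.nvidia.com key.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],
)

completion = client.chat.completions.create(
    model="meta/llama-3.1-70b-instruct",
    messages=[{"role": "user",
               "content": "Summarize: a forklift enters the warehouse aisle ..."}],
    temperature=0.2,
    max_tokens=256,
)
print(completion.choices[0].message.content)
```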
The core video search and summarization blueprint pipeline supports the following hardware:
NVIDIA AI Blueprints are customizable agentic workflow examples that include NIM microservices, reference code, documentation, and a Helm chart for deployment. This blueprint gives you a reference architecture to deploy a visual agent that can quickly generate insights from stored and streamed video through a scalable video ingestion pipeline, VLMs, and hybrid-RAG modules.
The user selects an example video and prompt to guide the agent in generating a detailed summary. The agent splits the input video into smaller segments, which the VLM pipeline processes in parallel (the preview uses OpenAI's GPT-4o) to produce detailed captions describing the events in each segment in a scalable and efficient manner. The agent then recursively summarizes these dense captions with an LLM, producing a final summary for the entire video once all segment captions are processed.
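The sketch below illustrates that split, caption-in-parallel, recursively-summarize flow. The `caption_chunk` and `summarize` helpers are hypothetical stand-ins for the VLM and LLM NIM calls, and the 10-second chunk length and batch size are arbitrary choices for illustration, not blueprint defaults.

```python
# Sketch of the split -> caption-in-parallel -> recursively-summarize pattern.
# caption_chunk() and summarize() are placeholders for the VLM and LLM calls
# made by the actual blueprint pipeline.
from concurrent.futures import ThreadPoolExecutor

CHUNK_SECONDS = 10   # illustrative chunk length, not a blueprint default
BATCH = 8            # how many captions the LLM condenses per pass


def split_video(duration_s: float) -> list[tuple[float, float]]:
    """Return (start, end) chunk boundaries covering the whole video."""
    starts = range(0, int(duration_s), CHUNK_SECONDS)
    return [(s, min(s + CHUNK_SECONDS, duration_s)) for s in starts]


def caption_chunk(chunk: tuple[float, float]) -> str:
    """Placeholder: the real pipeline sends the chunk's frames to a VLM."""
    return f"dense caption for segment {chunk[0]:.0f}-{chunk[1]:.0f}s"


def summarize(texts: list[str]) -> str:
    """Placeholder: the real pipeline asks an LLM to condense the texts."""
    return " / ".join(texts)[:500]


def summarize_video(duration_s: float) -> str:
    chunks = split_video(duration_s)
    # Chunks are independent, so captioning can run in parallel.
    with ThreadPoolExecutor(max_workers=8) as pool:
        captions = list(pool.map(caption_chunk, chunks))
    # Recursively reduce caption batches until a single summary remains.
    while len(captions) > 1:
        captions = [summarize(captions[i:i + BATCH])
                    for i in range(0, len(captions), BATCH)]
    return captions[0]


print(summarize_video(125.0))
```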
Additionally, these captions are stored in vector and graph databases to power the Q&A feature of this blueprint, allowing the user to ask any open-ended questions about the video.
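A minimal sketch of the retrieval half of that Q&A path, assuming the hosted llama-3.2-nv-embedqa-1b-v2 endpoint on build.nvidia.com: captions are embedded as passages, the question as a query, and the closest captions are handed to the LLM as context. The in-memory cosine search and example captions stand in for the blueprint's vector and graph databases, and the reranking step is omitted for brevity.

```python
# Sketch of caption retrieval for open-ended Q&A. An in-memory cosine search
# stands in for the blueprint's vector/graph databases.
import os
import numpy as np
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],
)


def embed(texts: list[str], input_type: str) -> np.ndarray:
    # input_type is "passage" for captions and "query" for questions,
    # per NVIDIA's retrieval-embedding conventions.
    resp = client.embeddings.create(
        model="nvidia/llama-3.2-nv-embedqa-1b-v2",
        input=texts,
        extra_body={"input_type": input_type, "truncate": "NONE"},
    )
    vecs = np.array([d.embedding for d in resp.data])
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)


# Example captions; the real pipeline stores VLM output per video chunk.
captions = [
    "0-10s: a forklift enters the warehouse aisle",
    "10-20s: a worker drops a box near the loading dock",
    "20-30s: the forklift reverses and exits the frame",
]
question = "When was a box dropped?"

caption_vecs = embed(captions, "passage")
query_vec = embed([question], "query")[0]
top = np.argsort(caption_vecs @ query_vec)[::-1][:2]   # top-2 captions
context = "\n".join(captions[i] for i in top)

answer = client.chat.completions.create(
    model="meta/llama-3.1-70b-instruct",
    messages=[{"role": "user",
               "content": f"Video captions:\n{context}\n\nQuestion: {question}"}],
    max_tokens=128,
)
print(answer.choices[0].message.content)
```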
Use of the models in this blueprint is governed by the NVIDIA AI Foundation Models Community License.
GOVERNING TERMS: This preview is governed by the NVIDIA API Trial Terms of Service.
For models that include a Llama 3.1 model: Llama 3.1 Community License Agreement; Built with Llama.
For the NVIDIA Retrieval QA Mistral 4B Reranking: Apache license.
For the NVIDIA Retrieval QA E5 Embedding v5: NV-EmbedQA-E5-v5: MIT license; NV-EmbedQA-Mistral7B-v2: Apache 2.0 license; and Snowflake arctic-embed-l: Apache 2.0 license.