
Accelerate post-training of end-to-end autonomous vehicle stacks with vector search and retrieval for large video datasets.
Autonomous vehicle (AV) development is a continuous process that requires repeated post-training to refine the end-to-end stack and add new capabilities. Post-training for a specific task requires a dataset targeted to that scenario, such as a highway curve in low-light conditions. To curate such datasets, developers must search through petabytes of multimodal training data, a labor-intensive process that depends on detailed annotations and high retrieval recall.
NVIDIA Cosmos Dataset Search is a vector search workflow that accelerates the data labeling and processing pipeline for AV developers. It uses the Cosmos Embed NIM to enable semantic search, which removes the need for human annotation and improves recall. It connects to NVIDIA Cosmos Curator to refine datasets and retrieve the queried data efficiently and accurately.
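To illustrate the general idea behind embedding-based semantic search, here is a minimal sketch in plain Python. It is not the Cosmos Embed NIM API or the blueprint's implementation. The clip names, the three-dimensional vectors, and the `search` helper are hypothetical and hand-written for illustration. In a real deployment the vectors would come from an embedding model and the lookup would be handled by a vector database.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def search(query_vec, index, top_k=2):
    """Rank every indexed clip by similarity to the query; return the top_k."""
    scored = [
        (clip_id, cosine_similarity(query_vec, vec))
        for clip_id, vec in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings (assumption: real embeddings are model outputs
# with far more dimensions; these values are made up for the example).
index = {
    "clip_highway_curve_night": [0.9, 0.1, 0.8],
    "clip_city_intersection_day": [0.1, 0.9, 0.1],
    "clip_highway_straight_day": [0.8, 0.2, 0.1],
}

# Hypothetical embedding of a text query such as
# "highway curve in low-light conditions".
query = [1.0, 0.0, 0.9]

results = search(query, index, top_k=2)
for clip_id, score in results:
    print(f"{clip_id}: {score:.3f}")
```

Because the ranking depends only on vector similarity, no per-clip annotation is needed at query time. This sketch scans every vector; at petabyte scale, a workflow like this would typically rely on an approximate nearest-neighbor index instead.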
By enabling rapid, precise discovery of targeted scenarios, AV developers can:
The overall experience is divided into the following parts:
Standalone Deployment
Kubernetes
Inference:
Indexing:
Search:
Note: B100, GB200, and RTX 6000 are not yet supported by the blueprint.
For the most up-to-date information, refer to the Blueprint Docs page.
Use of the software is governed by the NVIDIA Software License Agreement and the Product-Specific Terms for NVIDIA AI Products. Use of the NVIDIA Cosmos-Embed1-224p model is governed by the NVIDIA Community Model License. Use of the dataset is governed by the NVIDIA Autonomous Vehicle Dataset License Agreement. ADDITIONAL INFORMATION: For the Cosmos-Embed1-224p model, Apache 2.0 and MIT License.