NVIDIA
Explore
Models
Blueprints
GPUs
Docs
⌘KCtrl+K
Terms of Use
Privacy Policy
Your Privacy Choices
Contact

Copyright © 2025 NVIDIA Corporation

Search Results

Searching for: cv
Sorting by Most Recent

nvidiastreampetr

StreamPETR offers efficient 3D object detection for autonomous driving by propagating sparse object queries temporally.

autonomous vehiclesbevAV Stackautomotive

nvidiaparakeet-ctc-0.6b-zh-cn

Record-setting accuracy and performance for Mandarin English transcriptions.

ASRStreamingSpeech-to-TextMandarinNVIDIA NIM

nvidia3D Object Generation

Transform your scene idea into ready-to-use 3D assets using Llama 3.1 8B, NV SANA, and Microsoft TRELLIS

BlueprintRun-on-RTXNVIDIA AI

nvidiasparsedrive

End-to-end autonomous driving stack integrating perception, prediction, and planning with sparse scene representations for efficiency and safety.

autonomous vehiclesbevav stackautomotive

nvidianv-embedcode-7b-v1

The NV-EmbedCode model is a 7B Mistral-based embedding model optimized for code retrieval, supporting text, code, and hybrid queries.

nemo retrieverEmbeddingRetrieval Augmented Generation

nvidiallama-3.2-nv-embedqa-1b-v2

Multilingual and cross-lingual text question-answering retrieval with long context support and optimized data storage efficiency.

nemo retrieverrun-on-rtxembeddingRetrieval Augmented GenerationText-to-Embedding

nvidiallama-3.2-nv-rerankqa-1b-v2

Fine-tuned reranking model for multilingual, cross-lingual text question-answering retrieval, with long context support.

nemo retrieverrun-on-rtxRetrieval Augmented Generationreranking

nvidianv-yolox-page-elements-v1

Model for object detection, fine-tuned to detect charts, tables, and titles in documents.

Object DetectionData ingestionChart Detectionnemo retrieverTable Detectionrun-on-rtxextraction

nvidiaVulnerability Analysis for Container Security

Rapidly identify and mitigate container security vulnerabilities with generative AI.

generative aiLaunchablenv-embedqa-e5-v5Blueprintllama-3_1-70b-instructcybersecurityNVIDIA AI

nvidianv-dinov2

NV-DINOv2 is a visual foundation model that generates vector embeddings for the input image.

Image-to-Embeddingcomputer visiondeepstreamNVIDIA NIMobject Classification

nvidianv-grounding-dino

Grounding dino is an open vocabulary zero-shot object detection model.

Object Detectioncomputer visiondeepstreamNVIDIA NIM

nvidianv-rerankqa-mistral-4b-v3

Multilingual text reranking model.

nemo retrieverRerankingRetrieval Augmented Generation

nvidianv-embedqa-e5-v5

English text embedding model for question-answering retrieval.

Embeddingrun-on-rtxRetrieval Augmented GenerationNemo retrieverText-to-Embedding

nvidianv-embedqa-mistral-7b-v2

Multilingual text question-answering retrieval, transforming textual information into dense vector representations.

nemo retrieverEmbeddingRetrieval Augmented Generation

nvidiamaisi

MAISI is a pre-trained volumetric (3D) CT Latent Diffusion Generative Model.

Image GenerationMedical ImagingNVIDIA NIM

nvidianvclip

NV-CLIP is a multimodal embeddings model for image and text.

Computer visionmultimodal embeddingstext and imageRun-on-rtx

nvidiaocdrnet

OCDNet and OCRNet are pre-trained models designed for optical character detection and recognition respectively.

Optical Character RecognitionimageOptical Character Detectioncvvlmcomputer visionTAO Toolkitvideo

nvidianv-embed-v1

Generates high-quality numerical embeddings from text inputs.

Non-Commercial Use OnlyRetrieval Augmented GenerationText-to-Embedding

nvidiavisual-changenet

Visual Changenet detects pixel-level change maps between two images and outputs a semantic change segmentation mask

imageImage GenerationcvImage Segmentationvlmcomputer visionTAO ToolkitvideoNVIDIA NIM

nvidiaretail-object-detection

EfficientDet-based object detection network to detect 100 specific retail objects from an input video.

Object Detectionimagecvvlmcomputer visionTAO ToolkitvideoNVIDIA NIM

googlepaligemma

Vision language model adept at comprehending text and visual inputs to produce informative responses

imagecvVision AssistantvlmVisual Question Answeringcomputer visionLanguage GenerationImage-to-Textvideo