Models
Deploy and scale models on your GPU infrastructure of choice with NVIDIA NIM inference microservices
11 models
NVIDIA
Free Endpoint
streampetr
StreamPETR offers efficient 3D object detection for autonomous driving by propagating sparse object queries temporally.
autonomous vehicles
+3
13.64K
5mo
NVIDIA
Downloadable
parakeet-ctc-0.6b-zh-cn
Record-setting accuracy and performance for Mandarin-English transcription.
ASR
+4
2.94K
7mo
NVIDIA
Free Endpoint
sparsedrive
End-to-end autonomous driving stack integrating perception, prediction, and planning with sparse scene representations for efficiency and safety.
autonomous vehicles
+3
62
9mo
NVIDIA
Free Endpoint
nv-embedcode-7b-v1
The NV-EmbedCode model is a 7B Mistral-based embedding model optimized for code retrieval, supporting text, code, and hybrid queries.
nemo retriever
+2
157K
11mo
NVIDIA
Downloadable
llama-3.2-nv-embedqa-1b-v2
Multilingual and cross-lingual text question-answering retrieval with long context support and optimized data storage efficiency.
nemo retriever
+3
2.52M
9mo
NVIDIA
Downloadable
llama-3.2-nv-rerankqa-1b-v2
Fine-tuned reranking model for multilingual, cross-lingual text question-answering retrieval, with long context support.
nemo retriever
+2
103K
9mo
NVIDIA
Downloadable
nv-yolox-page-elements-v1
Model for object detection, fine-tuned to detect charts, tables, and titles in documents.
Object Detection
+6
1.23K
9mo
NVIDIA
Downloadable
nv-embedqa-e5-v5
English text embedding model for question-answering retrieval.
Embedding
+4
9.24M
9mo
NVIDIA
Downloadable
nvclip
NV-CLIP is a multimodal embeddings model for image and text.
Computer vision
+3
57.45K
10mo
NVIDIA
Free Endpoint
nv-embed-v1
Generates high-quality numerical embeddings from text inputs.
Non-Commercial Use Only
+2
3.67M
9mo
Google
Free Endpoint
paligemma
Vision language model adept at comprehending text and visual inputs to produce informative responses.
image
+8
28.56K
1y