
Record-setting accuracy and performance for Mandarin-English transcription.

End-to-end autonomous driving stack integrating perception, prediction, and planning with sparse scene representations for efficiency and safety.

The NV-EmbedCode model is a 7B Mistral-based embedding model optimized for code retrieval, supporting text, code, and hybrid queries.

Multilingual and cross-lingual text question-answering retrieval with long context support and optimized data storage efficiency.

Fine-tuned reranking model for multilingual and cross-lingual text question-answering retrieval, with long context support.

Object detection model fine-tuned to detect charts, tables, and titles in documents.

Grounding DINO is an open-vocabulary, zero-shot object detection model.

Most advanced language model for reasoning, code, and multilingual tasks; runs on a single GPU.

Multilingual text reranking model.

English text embedding model for question-answering retrieval.

Multilingual text question-answering retrieval, transforming textual information into dense vector representations.
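
As a rough illustration of how dense vector representations are typically used for retrieval, the sketch below ranks passages by cosine similarity to a query. The three-dimensional vectors are hand-written toy values, not output from any model listed here, and the helper names `cosine_similarity` and `rank_passages` are hypothetical; a real embedding model would produce much higher-dimensional vectors.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_passages(query_vec, passage_vecs):
    """Return (passage_index, score) pairs, best match first."""
    scores = [(i, cosine_similarity(query_vec, v))
              for i, v in enumerate(passage_vecs)]
    return sorted(scores, key=lambda s: s[1], reverse=True)

# Toy embeddings (assumed values, for illustration only).
query = [1.0, 0.0, 1.0]
passages = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [1.0, 0.0, 1.0],  # same direction as the query
    [1.0, 1.0, 0.0],  # partial overlap with the query
]

ranking = rank_passages(query, passages)
for index, score in ranking:
    print(f"passage {index}: {score:.3f}")
```

In this toy setup the passage pointing in the same direction as the query ranks first and the orthogonal one ranks last; a reranking model would usually be applied afterwards to reorder the top candidates.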

Generates high-quality numerical embeddings from text inputs.

Visual ChangeNet detects pixel-level change maps between two images and outputs a semantic change segmentation mask.

EfficientDet-based object detection network that detects 100 specific retail objects in an input video.