NVIDIA

Copyright © 2025 NVIDIA Corporation

Search Results

Searching for: TAO Toolkit
Sorting by Most Recent

stockmark/stockmark-2-100b-instruct

Japanese-specialized large language model that helps enterprises read and understand complex business documents.

sovereign ai, japanese, stockmark, chat, large language model

qwen/qwen3-next-80b-a3b-thinking

80B-parameter AI model with hybrid reasoning, an MoE architecture, and support for 119 languages.

Reasoning, chat, Text-to-Text

microsoft/TRELLIS

MSFT TRELLIS is a 3D AI model that generates high-quality 3D assets from text or image inputs.

text-to-3d, Run-on-RTX, image-to-3d

deepseek-ai/deepseek-v3.1

DeepSeek V3.1 Instruct is a hybrid AI model with fast reasoning, 128K context, and strong tool use.

Reasoning, chat, Text-to-Text

stabilityai/stable-diffusion-3.5-large

Stable Diffusion 3.5 is a popular text-to-image generation model.

Image Generation, Text-to-Image

openai/gpt-oss-20b

Smaller Mixture of Experts (MoE) text-only LLM for efficient AI reasoning and math.

text-to-text, chat, reasoning, math

openai/gpt-oss-120b

Mixture of Experts (MoE) reasoning LLM (text-only) designed to fit within an 80GB GPU.

text-to-text, chat, reasoning, math

nvidia/parakeet-tdt-0.6b-v2

Accurate and optimized English transcriptions with punctuation and word timestamps.

ASR, English, NVIDIA NIM, NVIDIA Riva, speech-to-text

opengpt-x/teuken-7b-instruct-commercial-v0.4

Multilingual 7B LLM, instruction-tuned on all 24 EU languages for stable, culturally aligned output.

sovereign ai, text-to-text, chat, european, Multilingual

nvidia/magpie-tts-flow

Expressive and engaging text-to-speech, generated from a short audio sample.

TTS, Text-to-Speech, NVIDIA NIM, NVIDIA Riva

meta/llama-guard-4-12b

Multimodal model that classifies the safety of input prompts as well as output responses.

LLM Multimodal Safety, Content Safety, Guardrail, Content Moderator

nvidia/llama-3.2-nemoretriever-500m-rerank-v2

GPU-accelerated model optimized for providing a probability score that a given passage contains the information to answer a question.

nemo retriever, Retrieval Augmented Generation, reranking

nvidia/cosmos-transfer1-7b

Generates physics-aware video world states for physical AI development using text prompts and multiple spatial control inputs derived from real-world data or simulation.

Synthetic Data Generation, Autonomous Vehicles, Physical AI, robotics, video-to-world

nvidia/Background Noise Removal

Removes unwanted noise from audio, improving speech intelligibility.

Nvidia Maxine, Speech-to-speech, Digital Human, Speech Enhancement

nvidia/magpie-tts-zeroshot

Expressive and engaging text-to-speech, generated from a short audio sample.

TTS, Text-to-Speech, NVIDIA NIM, NVIDIA Riva

utter-project/eurollm-9b-instruct

State-of-the-art, multilingual model tailored to all 24 official European Union languages.

Sovereign AI, chat, Text-to-Text, Multilingual, European, Regional Language Generation

gotocompany/gemma-2-9b-cpt-sahabatai-instruct

SOTA LLM pre-trained for instruction following and proficiency in the Indonesian language and its dialects.

Sovereign AI, chat, Indonesian, Text-to-Text, Regional Language Generation

nvidia/cosmos-predict1-5b

Generates future frames of a physics-aware world state from just an image or short video prompt, for physical AI development.

Synthetic Data Generation, Physical AI, policy evaluation, robotics, video-to-world

nvidia/sparsedrive

End-to-end autonomous driving stack integrating perception, prediction, and planning with sparse scene representations for efficiency and safety.

autonomous vehicles, bev, av stack, automotive

nvidia/nemoretriever-table-structure-v1

Model for object detection, fine-tuned to detect charts, tables, and titles in documents.

Object Detection, Chart Detection, nemo retriever, Table Detection, data ingestion, run-on-rtx

nvidia/nemoretriever-graphic-elements-v1

Model for object detection, fine-tuned to detect charts, tables, and titles in documents.

Object Detection, Chart Detection, nemo retriever, Table Detection, data ingestion, run-on-rtx

nvidia/nemoretriever-page-elements-v2

Model for object detection, fine-tuned to detect charts, tables, and titles in documents.

Object Detection, Chart Detection, nemo retriever, Table Detection, data ingestion, run-on-rtx

google/gemma-3-1b-it

A lightweight, multilingual, advanced SLM text model for edge computing and resource-constrained applications.

Translation, chat, Text-to-Text, Language Generation

microsoft/phi-4-mini-instruct

Lightweight multilingual LLM powering AI applications in latency-bound, memory- and compute-constrained environments.

chat, Code Generation, Text-to-Text, Language Generation

arc/evo2-40b

Evo 2 is a biological foundation model that is able to integrate information over long genomic sequences while retaining sensitivity to single-nucleotide changes.

DNA Generation, biology, nim, Bionemo, Drug Discovery

nvidia/canary-1b-asr

Multilingual model supporting speech-to-text recognition and translation.

Automatic Speech Recognition, Automatic Speech Translation, NVIDIA NIM, NVIDIA Riva

nvidia/llama-3.1-nemoguard-8b-topic-control

Topic control model to keep conversations focused on approved topics, avoiding inappropriate content.

Dialogue Safety, LLM safety, Guard Model, Content safety

qwen/qwen2.5-7b-instruct

Chinese and English LLM targeting language, coding, mathematics, reasoning, and more.

Chinese Language Generation, chat, Text-to-Text, Large Language Models

nvidia/cosmos-nemotron-34b

Multimodal vision-language model that understands text, images, and video and creates informative responses.

VLM, Vision language model, image caption, image to text

qwen/qwen2.5-coder-32b-instruct

Advanced LLM for code generation, reasoning, and fixing across popular programming languages.

code completion, code generation, chat, text-to-code

qwen/qwen2.5-coder-7b-instruct

Powerful mid-size code model with a 32K context length, excelling in coding in multiple languages.

code completion, code generation, chat, text-to-code

meta/llama-3.3-70b-instruct

Advanced LLM for reasoning, math, general knowledge, and function calling.

Reasoning, chat, Code Generation, Text-to-Text, Instruction following, Math

university-at-buffalo/cached

Context-aware chart extraction model that detects 18 classes of basic chart elements, excluding plot elements.

nemo retriever, Chart Element Detection, Image-To-Text

nvidia/nv-yolox-page-elements-v1

Model for object detection, fine-tuned to detect charts, tables, and titles in documents.

Object Detection, Data ingestion, Chart Detection, nemo retriever, Table Detection, run-on-rtx, extraction

nvidia/audio2face-3d

Converts streamed audio to facial blendshapes for real-time lip-syncing and facial performances.

Speech-to-Animation, Digital Humans, Audio-to-Face, NVIDIA NIM

nvidia/nemotron-4-mini-hindi-4b-instruct

A bilingual Hindi-English SLM for on-device inference, tailored specifically for the Hindi language.

Indic, chat, Text-to-Text, Language Generation

ibm/granite-guardian-3.0-8b

Detects jailbreaking, bias, violence, profanity, sexual content, and unethical behavior.

Guardrail, Text-to-text

nvidia/studiovoice

Enhances speech by correcting common audio degradations to create studio-quality speech output.

Nvidia Maxine, Speech-to-speech, Digital Human, Run-on-RTX, Speech Enhancement

nvidia/llama-3.1-nemotron-70b-reward

Leaderboard-topping reward model supporting RLHF for better alignment with human preferences.

Text-to-text, Reward Model, RLHF

meta/llama-3.2-3b-instruct

Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation.

chat, Code Generation, Text-to-Text, Language Generation

meta/llama-3.2-1b-instruct

Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation.

chat, Code Generation, Text-to-Text, Language Generation

qwen/qwen2-7b-instruct

Chinese and English LLM targeting language, coding, mathematics, reasoning, and more.

Chinese Language Generation, chat, Text-to-Text, Large Language Models

abacusai/dracarys-llama-3.1-70b-instruct

Fine-tuned Llama 3.1 70B model for code generation, summarization, and multi-language tasks.

chat, Code Generation, Text-to-Text

nvidia/vila

Multimodal vision-language model that understands text, images, and video and creates informative responses.

VLM, Vision language model, image caption, image to text

ai21labs/jamba-1.5-mini-instruct

Cutting-edge MoE-based LLM designed to excel in a wide array of generative AI tasks.

chat, Language Generation, Text-to-text

nvidia/nemotron-mini-4b-instruct

Optimized SLM for on-device inference, fine-tuned for roleplay, RAG, and function calling.

chat, Text-to-Text, Language Generation

nvidia/mistral-nemo-minitron-8b-base

State-of-the-art small language model delivering superior accuracy for chatbots, virtual assistants, and content generation.

language generation, text-to-text, chat, small language model

microsoft/phi-3.5-mini-instruct

Lightweight multilingual LLM powering AI applications in latency-bound, memory- and compute-constrained environments.

chat, Code Generation, Text-to-Text, Language Generation, Large Language Models

rakuten/rakutenai-7b-instruct

Advanced state-of-the-art LLM with language understanding, superior reasoning, and text generation.

chat, Text-to-Text, Language Generation, Large Language Models

rakuten/rakutenai-7b-chat

Advanced state-of-the-art LLM with language understanding, superior reasoning, and text generation.

chat, Text-to-Text, Language Generation, Large Language Models

google/shieldgemma-9b

Guardrail model to ensure that responses from LLMs are appropriate and safe.

Guardrail, Text-to-Text

google/gemma-2-2b-it

Advanced small generative language model for edge applications.

chat, Code Generation, Text-to-Text, Language Generation

nvidia/usdsearch

AI-powered search for OpenUSD data, 3D models, images, and assets using text or image-based inputs.

OpenUSD, Synthetic Data Generation, Digital Twin, USD, Text-to-3D

nvidia/eyecontact

Estimates the gaze angles of a person in a video and redirects the gaze to make it frontal.

telepresence, Nvidia Maxine, Digital Human

thudm/chatglm3-6b

Supports Chinese and English languages to handle tasks including chatbot, content generation, coding, and translation.

Text Translation, chat, Code Generation, Text-to-Text, Regional Language Generation

baichuan-inc/baichuan2-13b-chat

Supports Chinese and English chat, coding, math, instruction following, and quiz solving.

Chinese Language Generation, Text Translation, chat, Text-to-Text

meta/llama-3.1-70b-instruct

Powers complex conversations with superior contextual understanding, reasoning and text generation.

chat, Code Generation, Text-to-Text, Language Generation

meta/llama-3.1-8b-instruct

Advanced state-of-the-art model with language understanding, superior reasoning, and text generation.

chat, Code Generation, Text-to-Text, Language Generation, Run-on-RTX

microsoft/phi-3-medium-128k-instruct

Cutting-edge lightweight open language model excelling in high-quality reasoning.

chat, Code Generation, Text-to-Text, Language Generation, Large Language Models

google/gemma-2-27b-it

Cutting-edge text generation model for text understanding, transformation, and code generation.

chat, Code Generation, Text-to-Text, Language Generation

google/gemma-2-9b-it

Cutting-edge text generation model for text understanding, transformation, and code generation.

chat, Code Generation, Text-to-Text, Language Generation

nvidia/llama3-chatqa-1.5-8b

Advanced LLM to generate high-quality, context-aware responses for chatbots and search engines.

text-to-text, chat, Non-Commercial Use Only

mistralai/mistral-7b-instruct-v0.3

This LLM follows instructions, completes requests, and generates creative text.

chat, Text-to-Text, Language Generation

stabilityai/stable-diffusion-3-medium

Advanced text-to-image model for generating high-quality images.

Image Generation, Text-to-Image

nvidia/ocdrnet

OCDNet and OCRNet are pre-trained models designed for optical character detection and recognition respectively.

Optical Character Recognition, image, Optical Character Detection, cv, vlm, computer vision, TAO Toolkit, video

upstage/solar-10.7b-instruct

Excels in NLP tasks, particularly in instruction-following, reasoning, and mathematics.

Non-Commercial Use Only, chat, Text-to-Text, Language Generation, Large Language Models

mediatek/breeze-7b-instruct

LLM for improved language comprehension and chatbot-oriented capabilities in Traditional Chinese.

chat, Text-to-Text, Regional Language Generation

nvidia/visual-changenet

Visual Changenet detects pixel-level change maps between two images and outputs a semantic change segmentation mask.

image, Image Generation, cv, Image Segmentation, vlm, computer vision, TAO Toolkit, video, NVIDIA NIM

nvidia/retail-object-detection

EfficientDet-based object detection network to detect 100 specific retail objects from an input video.

Object Detection, image, cv, vlm, computer vision, TAO Toolkit, video, NVIDIA NIM

microsoft/phi-3-small-8k-instruct

Cutting-edge lightweight open language model excelling in high-quality reasoning.

chat, Code Generation, Text-to-Text, Language Generation, Large Language Models

microsoft/phi-3-small-128k-instruct

Long-context, cutting-edge lightweight open language model excelling in high-quality reasoning.

chat, Code Generation, Text-to-Text, Language Generation, Large Language Models

microsoft/phi-3-medium-4k-instruct

Cutting-edge lightweight open language model excelling in high-quality reasoning.

chat, Code Generation, Text-to-Text, Language Generation, Large Language Models

google/paligemma

Vision language model adept at comprehending text and visual inputs to produce informative responses.

image, cv, Vision Assistant, vlm, Visual Question Answering, computer vision, Language Generation, Image-to-Text, video

aisingapore/sea-lion-7b-instruct

LLM to represent and serve the linguistic and cultural diversity of Southeast Asia.

Chat, Text-to-Text, Regional Language Generation, Large Language Models

microsoft/phi-3-mini-4k-instruct

Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills.

chat, Code Generation, Text-to-Text, Language Generation, Large Language Models

microsoft/phi-3-mini-128k-instruct

Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills.

chat, Code Generation, Text-to-Text, Language Generation, Large Language Models

mistralai/mixtral-8x22b-instruct-v0.1

An MoE LLM that follows instructions, completes requests, and generates creative text.

Advanced Reasoning, chat, Code Generation, Text-to-Text, Large Language Models

meta/llama3-70b-instruct

Powers complex conversations with superior contextual understanding, reasoning and text generation.

chat, Large Language models, Code Generation, Text-to-Text, Language Generation

meta/llama3-8b-instruct

Advanced state-of-the-art LLM with language understanding, superior reasoning, and text generation.

chat, Code Generation, Text-to-Text, Language Generation, Large Language Models

nvidia/rerank-qa-mistral-4b

GPU-accelerated model optimized for providing a probability score that a given passage contains the information to answer a question.

Ranking, Retrieval Augmented Generation

google/gemma-7b

Cutting-edge text generation model for text understanding, transformation, and code generation.

chat, Code Generation, Text-to-Text, Language Generation

mistralai/mistral-7b-instruct-v0.2

This LLM follows instructions, completes requests, and generates creative text.

chat, Text-to-Text, Language Generation, NVIDIA NIM

mistralai/mixtral-8x7b-instruct-v0.1

An MoE LLM that follows instructions, completes requests, and generates creative text.

Advanced Reasoning, chat, Code Generation, Text-to-Text, Large Language Models