
Copyright © 2026 NVIDIA Corporation

66 results for

  • Qwen
    Downloadable

    qwen-image

    Qwen-Image is a text-to-image foundation model with advanced multilingual text rendering.
    Model
    Text-to-Image
    1mo
    Qwen
    Downloadable

    qwen-image-edit

    Qwen-Image-Edit is an image editing model with multilingual text editing and strong subject consistency.
    Model
    Text-to-Image
    1mo

    Use to promote overlay files and built artifacts into the staged BSP image. Do NOT use to flash or build. Triggers: promote bsp image.
    Skill
    Developer
    12
    1d

    Use to flash a promoted BSP image to a Jetson DUT in RCM mode via flash.sh or l4t_initrd_flash.sh. Do NOT use for BSP customization, image promotion, or carrier derivation.
    Skill
    Developer
    12
    1d

    Two-step image grounding pipeline: extracts referring expressions from (image, caption) pairs and grounds them to pixel-space bounding boxes via a VLM. Use when the user wants to ground captions to bboxes, generate phrase-grounded annotations, auto-label
    Skill
    Developer
    456
    10d

    Use after jetson-flash-image to run static BSP checks, on-target smoke/regression tests on a flashed DUT, or both. Not for build or flash steps. Triggers: validate bsp, on-target validation.
    Skill
    Developer
    12
    1d

    PyTorch-based TAO image classification. Supports a wide range of backbones (FAN, EfficientNet, ResNet, etc.) with distillation and quantization for deployment. Use when training, evaluating, distilling, quantizing, exporting, or running inference for a TA
    Skill
    Developer
    457
    10d

    Extract Jetson Linux + sample-rootfs tarballs and run apply_binaries.sh for the active target, then record bsp_image in the profile. Use after jetson-init-target; not for source-tree setup.
    Skill
    Developer
    12
    1d

    Use when the user wants to orchestrate defect image generation, run associated setup, or handle outputs on OSMO. The Day 0 path handles cold-start with USD-to-ROI, image-edit augmentation, and AnomalyGen to create initial PCBA datasets. The Day 1 path per
    Skill
    Developer
    684
    23d
    DGX Station
    45 MIN

    Image & Video Generation with ComfyUI

    Generate images and videos with FLUX, Wan 2.1, HunyuanVideo, and Cosmos on DGX Station
    Playbook
    Image Generation
    28d

    Runs the DEFT embed-then-mine workflow for VCN AOI iterations — embeds the gap-analysis target parquet, embeds a source pool, and mines nearest-neighbour source images for downstream augmentation. Use as the immediate next step after `tao-route-visual-cha
    Skill
    Developer
    455
    10d
    DGX Spark
    1 HR

    FLUX.1 Dreambooth LoRA Fine-tuning

    Fine-tune FLUX.1-dev 12B model using Dreambooth LoRA for custom image generation
    Playbook
    Image Generation
    8mo
    Meta
    Downloadable, Free Endpoint

    llama-3.2-90b-vision-instruct

    Cutting-edge vision-language model excelling in high-quality reasoning from images.
    Model
    Image-Text Retrieval
    2.69M
    1y
    Qwen
    Downloadable, Free Endpoint

    qwen3.5-397b-a17b

    Next-gen Qwen 3.5 VLM (400B MoE) brings advanced vision, chat, RAG, and agentic capabilities.
    Model
    MoE
    13.15M
    4mo
    Meta
    Downloadable, Free Endpoint

    llama-3.2-11b-vision-instruct

    Cutting-edge vision-language model excelling in high-quality reasoning from images.
    Model
    Image-Text Retrieval
    1.67M
    1y
    NVIDIA
    Downloadable, Free Endpoint

    llama-3.1-nemotron-nano-vl-8b-v1

    Multimodal vision-language model that understands text and images and creates informative responses.
    Model
    doc intelligence
    10.15M
    11mo
    Mistral AI
    Downloadable, Free Endpoint

    mistral-small-4-119b-2603

    Hybrid MoE model unifying instruct, reasoning, and coding with multimodal input and 256k context
    Model
    code generation
    12.52M
    3mo
    NVIDIA
    Free Endpoint

    cosmos3-nano

    Generates physics-aware videos from text prompts or an image prompt for physical AI development.
    Model
    autonomous vehicles
    1.79K
    22d
    DGX Spark
    1 HR

    Vision-Language Model Fine-tuning

    Fine-tune Vision-Language Models for image and video understanding tasks using Qwen2.5-VL and InternVL3
    Playbook
    DGX
    8mo
    Black-forest-labs
    Downloadable

    flux.2-klein-4b

    FLUX.2-klein-4B is a distilled image generation and editing model that produces outputs at lightning speed.
    Model
    image editing
    271K
    3mo
    Microsoft
    Downloadable

    TRELLIS

    Microsoft TRELLIS is a 3D AI model that generates high-quality 3D assets from text or image inputs.
    Model
    text-to-3d
    3.65K
    9mo
    Google
    Free Endpoint

    paligemma

    Vision language model adept at comprehending text and visual inputs to produce informative responses
    Model
    image
    10.22K
    1y
    NVIDIA
    Downloadable

    vista-3d

    VISTA-3D is a specialized interactive foundation model for segmenting and annotating human anatomies.
    Model
    Interactive Annotation
    824
    1y
    Qwen
    Downloadable, Free Endpoint

    qwen3.5-122b-a10b

    122B MoE LLM (10B active) for coding, reasoning, multimodal chat. Agent-ready.
    Model
    tool calling
    10.33M
    3mo