Copyright © 2026 NVIDIA Corporation

67 results for

Filters

  • Free Endpoint (15)
  • Partner Endpoint (15)
  • Download Available (27)
  • Developer Example (1)
  • Enterprise Blueprint (1)
  • Launchable (1)
  • Image Generation (7)
  • Image-to-Text (7)
  • Text-to-Image (7)
  • Optical Character Recognition (3)
  • Synthetic Data Generation (3)
  • DeepInfra (12)
  • Together AI (7)
  • OpenRouter (6)
  • Bitdeer (3)
  • GMI Cloud (3)
  • NVIDIA (48)
  • Black Forest Labs (4)
  • Qwen (4)
  • Google (3)
  • Meta (2)
  • Developer (23)
  • AI Engineer (22)
  • ML Engineer (21)
  • Application Developer (15)
  • Data Scientist (8)
  • NVIDIA AI (1)
  • NVIDIA Isaac GR00T (1)
  • NVIDIA Omniverse (1)
  • AI and Machine Learning (20)
  • Physical AI (5)
  • Developer Tools (2)
  • Infrastructure (1)
  • B200 (2)
  • A100 PG509 200 (1)
  • A100 SXM4 80GB (1)
  • A10G (1)
  • GB200 (1)
  • TAO Toolkit (19)
  • Jetson (7)
  • NeMo Retriever (1)
  • Physical AI Dataset (1)
    Qwen
    Downloadable

    qwen-image

    Qwen-Image is a text-to-image foundation model with advanced multilingual text rendering.
    Model
    Text-to-Image
    2mo
    Qwen
    Downloadable

    qwen-image-edit

    Qwen-Image-Edit is an image editing model with multilingual text editing and strong subject consistency.
    Model
    Text-to-Image
    2mo

    Use to promote overlay files and built artifacts into the staged BSP image. Do NOT use to flash or build. Triggers: promote bsp image.
    Skill
    Developer
    284
    10d

    Use to flash a promoted BSP image to a Jetson DUT in RCM mode via flash.sh or l4t_initrd_flash.sh. Do NOT use for BSP customization, image promotion, or carrier derivation.
    Skill
    Developer
    290
    10d

    Two-step image grounding pipeline: extracts referring expressions from (image, caption) pairs and grounds them to pixel-space bounding boxes via a VLM. Use when the user wants to ground captions to bboxes, generate phrase-grounded annotations, auto-label
    Skill
    Developer
    735
    19d

    Use after jetson-flash-image to run static BSP checks, on-target smoke/regression tests on a flashed DUT, or both. Not for build or flash steps. Triggers: validate bsp, on-target validation.
    Skill
    Developer
    284
    10d

    PyTorch-based TAO image classification. Supports a wide range of backbones (FAN, EfficientNet, ResNet, etc.) with distillation and quantization for deployment. Use when training, evaluating, distilling, quantizing, exporting, or running inference for a TA
    Skill
    Developer
    690
    19d

    Extract Jetson Linux + sample-rootfs tarballs and run apply_binaries.sh for the active target, then record bsp_image in the profile. Use after jetson-init-target; not for source-tree setup.
    Skill
    Developer
    290
    10d

    Use when the user wants to orchestrate defect image generation with NVIDIA Cosmos AnomalyGen (Cosmos-Predict2-derived) on OSMO for PCBA, metal surface, and glass inspection. The Day 0 path handles cold-start with USD-to-ROI, image-edit augmentation, and A
    Skill
    Developer
    984
    1mo
    DGX Station
    45 MIN

    Image & Video Generation with ComfyUI

    Generate images and videos with FLUX, Wan 2.1, HunyuanVideo, and Cosmos on DGX Station
    Playbook
    Image Generation
    1mo

    Runs the DEFT embed-then-mine workflow for VCN AOI iterations — embeds the gap-analysis target parquet, embeds a source pool, and mines nearest-neighbour source images for downstream augmentation. Use as the immediate next step after `tao-route-visual-cha
    Skill
    Developer
    732
    19d
    DGX Spark
    1 HR

    FLUX.1 Dreambooth LoRA Fine-tuning

    Fine-tune the 12B FLUX.1-dev model using DreamBooth LoRA for custom image generation
    Playbook
    Image Generation
    8mo
    Qwen
    Downloadable, Free Endpoint

    qwen3.5-397b-a17b

    Next-gen Qwen 3.5 VLM (400B MoE) brings advanced vision, chat, RAG, and agentic capabilities.
    Model
    MoE
    13M
    4mo
    Meta
    Downloadable, Free Endpoint

    llama-3.2-11b-vision-instruct

    Cutting-edge vision-language model excelling in high-quality reasoning from images.
    Model
    Image-Text Retrieval
    2M
    1y
    Meta
    Downloadable, Free Endpoint

    llama-3.2-90b-vision-instruct

    Cutting-edge vision-language model excelling in high-quality reasoning from images.
    Model
    Image-Text Retrieval
    3M
    1y
    NVIDIA
    Downloadable, Free Endpoint

    llama-3.1-nemotron-nano-vl-8b-v1

    Multimodal vision-language model that understands text and images and generates informative responses.
    Model
    doc intelligence
    10M
    1y
    Mistral AI
    Downloadable, Free Endpoint

    mistral-small-4-119b-2603

    Hybrid MoE model unifying instruct, reasoning, and coding with multimodal input and 256k context
    Model
    code generation
    13M
    3mo
    NVIDIA
    Free Endpoint

    cosmos3-nano

    Generates physics-aware videos from text prompts or an image prompt for physical AI development.
    Model
    autonomous vehicles
    2K
    1mo
    DGX Spark
    1 HR

    Vision-Language Model Fine-tuning

    Fine-tune Vision-Language Models for image and video understanding tasks using Qwen2.5-VL and InternVL3
    Playbook
    DGX
    8mo
    Black Forest Labs
    Downloadable

    flux.2-klein-4b

    FLUX.2-klein-4B is a distilled image generation and editing model that produces outputs at lightning speed.
    Model
    image editing
    271K
    3mo
    Microsoft
    Downloadable

    TRELLIS

    MSFT TRELLIS is a 3D AI model that generates high-quality 3D assets from text or image inputs.
    Model
    text-to-3d
    4K
    10mo
    Google
    Free Endpoint

    paligemma

    Vision-language model adept at comprehending text and visual inputs to produce informative responses.
    Model
    image
    10K
    1y
    NVIDIA
    Downloadable

    vista-3d

    VISTA-3D is a specialized interactive foundation model for segmenting and annotating human anatomies.
    Model
    Interactive Annotation
    824
    1y
    Qwen
    Downloadable, Free Endpoint

    qwen3.5-122b-a10b

    122B MoE LLM (10B active) for coding, reasoning, multimodal chat. Agent-ready.
    Model
    tool calling
    10M
    3mo