NVIDIA

Copyright © 2026 NVIDIA Corporation

40 results for

  • Hive
    ai-generated-image-detection
    Robust image classification model for detecting and managing AI-generated content.
    Model
    image classification
    9.61K
    11mo
    Hive
    deepfake-image-detection
    Advanced AI model detects faces and identifies deep fake images.
    Model
    computer vision
    4.42K
    11mo
    Qwen
    qwen3.5-397b-a17b
    Next-gen Qwen 3.5 VLM (400B MoE) brings advanced vision, chat, RAG, and agentic capabilities.
    Model
    MoE
    4.66M
    2w
    NVIDIA
    maisi
    MAISI is a pre-trained volumetric (3D) CT Latent Diffusion Generative Model.
    Model
    Image Generation
    631
    11mo
    Meta
    llama-3.2-11b-vision-instruct
    Cutting-edge vision-language model excelling in high-quality reasoning from images.
    Model
    Image-Text Retrieval
    550K
    9mo
    Meta
    llama-3.2-90b-vision-instruct
    Cutting-edge vision-language model excelling in high-quality reasoning from images.
    Model
    Image-Text Retrieval
    518K
    9mo
    NVIDIA
    visual-changenet
    Visual ChangeNet detects pixel-level changes between two images and outputs a semantic change segmentation mask
    Model
    image
    592
    1y
    NVIDIA
    cosmos-nemotron-34b
    Multi-modal vision-language model that understands text, images, and video and creates informative responses
    Model
    VLM
    6
    1y
    Mistral AI
    mistral-small-3.1-24b-instruct-2503
    Efficient multimodal model excelling at multilingual tasks, image understanding, and fast responses
    Model
    language generation
    1.05M
    9mo
    NVIDIA
    llama-3.1-nemotron-nano-vl-8b-v1
    Multi-modal vision-language model that understands text and images and creates informative responses
    Model
    doc intelligence
    5.85M
    8mo
    NVIDIA
    nvclip
    NV-CLIP is a multimodal embeddings model for image and text.
    Model
    Computer vision
    37.28K
    8mo
    Microsoft
    TRELLIS
    MSFT TRELLIS is a 3D AI model that generates high-quality 3D assets from text or image inputs.
    Model
    text-to-3d
    5.12K
    6mo
    University at Buffalo
    cached
    Context-aware chart extraction that can detect 18 classes of basic chart elements, excluding plot elements.
    Model
    nemo retriever
    911
    1y
    Google
    paligemma
    Vision language model adept at comprehending text and visual inputs to produce informative responses
    Model
    image
    324K
    1y
    NVIDIA
    retail-object-detection
    EfficientDet-based object detection network to detect 100 specific retail objects from an input video.
    Model
    Object Detection
    778
    1y
    NVIDIA
    vista-3d
    VISTA-3D is a specialized interactive foundation model for segmenting and annotating human anatomies.
    Model
    Interactive Annotation
    770
    10mo
    NVIDIA
    ocdrnet
    OCDNet and OCRNet are pre-trained models designed for optical character detection and recognition respectively.
    Model
    Optical Character Recognition
    785
    1y
    Baidu
    paddleocr
    Model for table extraction that receives an image as input, runs OCR on the image, and returns the text within the image and its bounding boxes.
    Model
    Optical Character Recognition
    231K
    7mo
    Stability AI
    stable-diffusion-3-medium
    Advanced text-to-image model for generating high quality images
    Model
    Image Generation
    37.01K
    1y
    Black-forest-labs
    FLUX.1-dev
    FLUX.1 is a state-of-the-art suite of image generation models
    Model
    Image Generation
    70.36K
    8mo
    Black-forest-labs
    FLUX.1-Kontext-dev
    FLUX.1 Kontext is a multimodal model that enables in-context image generation and editing.
    Model
    Image Generation
    3.86K
    6mo
    Black-forest-labs
    FLUX.1-schnell
    FLUX.1-schnell is a distilled image generation model, producing high quality images at fast speeds
    Model
    Image Generation
    37.07K
    8mo
    Moonshotai
    kimi-k2.5
    1T multimodal MoE for high‑capacity video and image understanding with efficient inference.
    Model
    Multimodal
    18.5M
    1mo
    NVIDIA
    nemoretriever-ocr
    Powerful OCR model for fast, accurate real-world image text extraction, layout, and structure analysis.
    Model
    Optical Character Recognition
    67.87K
    7mo