30 results for
Filters
Models (25)
Blueprints (2)
Other (3)
Qwen
Downloadable
qwen-image
Qwen-Image is a text-to-image foundation model with advanced multilingual text rendering.
Model
Text-to-Image
+1
2w
Qwen
Downloadable
qwen-image-edit
Qwen-Image-Edit is an image editing model with multilingual text editing and strong subject consistency.
Model
Text-to-Image
+1
2w
DGX Spark
1 HR
FLUX.1 Dreambooth LoRA Fine-tuning
Fine-tune FLUX.1-dev 12B model using Dreambooth LoRA for custom image generation
Playbook
Image Generation
+6
7mo
Meta
Downloadable
llama-3.2-90b-vision-instruct
Cutting-edge vision-language model excelling in high-quality reasoning from images.
Model
Image-Text Retrieval
+4
2.08M
11mo
Qwen
Downloadable
qwen3.5-397b-a17b
Next-gen Qwen 3.5 VLM (400B MoE) brings advanced vision, chat, RAG, and agentic capabilities.
Model
MoE
+3
11.36M
3mo
Meta
Downloadable
llama-3.2-11b-vision-instruct
Cutting-edge vision-language model excelling in high-quality reasoning from images.
Model
Image-Text Retrieval
+4
1.5M
11mo
NVIDIA
Downloadable
llama-3.1-nemotron-nano-vl-8b-v1
Multimodal vision-language model that understands text and images and creates informative responses.
Model
doc intelligence
+2
8.15M
10mo
DGX Spark
1 HR
Vision-Language Model Fine-tuning
Fine-tune Vision-Language Models for image and video understanding tasks using Qwen2.5-VL and InternVL3
Playbook
DGX
+6
7mo
Mistral AI
Downloadable
mistral-small-4-119b-2603
Hybrid MoE model unifying instruct, reasoning, and coding with multimodal input and 256k context
Model
code generation
+2
21.52M
2mo
Google
Free Endpoint
paligemma
Vision language model adept at comprehending text and visual inputs to produce informative responses
Model
image
+8
15.72K
1y
Qwen
Downloadable
qwen3.5-122b-a10b
122B MoE LLM (10B active) for coding, reasoning, multimodal chat. Agent-ready.
Model
tool calling
+3
11.26M
2mo
NVIDIA
Downloadable
vista-3d
VISTA-3D is a specialized interactive foundation model for segmenting and annotating human anatomies.
Model
Interactive Annotation
+3
1.15K
1y
Robotics
Enterprise
Synthetic Manipulation Motion Generation for Robotics
Generate exponentially large amounts of synthetic motion trajectories for robot manipulation from just a few human demonstrations.
Blueprint
synthetic data
+9
3mo
Moonshotai
Downloadable
kimi-k2.6
1T multimodal MoE for long-horizon coding, agentic tool use, and image/video understanding.
Model
Multimodal
+3
4.59M
2w
NVIDIA
Downloadable
nemoretriever-ocr
Powerful OCR model for fast, accurate real-world image text extraction, layout, and structure analysis.
Model
Table Extraction
+4
14.23K
10mo
NVIDIA
Downloadable
nemoretriever-ocr-v1
Powerful OCR model for fast, accurate real-world image text extraction, layout, and structure analysis.
Model
Table Extraction
+4
2.07M
9mo
NVIDIA
Downloadable
nemotron-ocr-v1
Powerful OCR model for fast, accurate real-world image text extraction, layout, and structure analysis.
Model
Table Extraction
+4
333K
2mo
Microsoft
Free Endpoint
phi-4-multimodal-instruct
Cutting-edge open multimodal model excelling in high-quality reasoning from image and audio inputs.
Model
Speech Recognition
+4
480K
12mo
Stability AI
Downloadable
stable-diffusion-3.5-large
Stable Diffusion 3.5 is a popular text-to-image generation model
Model
Text-to-Image
+1
9mo
Google
Free Endpoint
gemma-3n-e2b-it
An edge computing AI model which accepts text, audio and image input, ideal for resource-constrained environments
Model
language generation
+3
44.43M
10mo
Google
Free Endpoint
gemma-3n-e4b-it
An edge computing AI model which accepts text, audio and image input, ideal for resource-constrained environments
Model
language generation
+3
3.41M
10mo
NVIDIA
Downloadable
nemotron-nano-12b-v2-vl
Nemotron Nano 12B v2 VL enables multi-image and video understanding, along with visual Q&A and summarization capabilities.
Model
language generation
+3
2.8M
6mo
NVIDIA
Free Endpoint
cosmos-predict1-5b
Generates future frames of a physics-aware world state from just an image or short video prompt, for physical AI development.
Model
Synthetic Data Generation
+4
874
1y
DGX Spark
45 MIN
Comfy UI
Install and use Comfy UI to generate images
Playbook
DGX
+1
7mo