32 results for
Filters: Models (32), Blueprints (2), Skills (2), Other (5)
Qwen
Downloadable
qwen-image
Qwen-Image is a text-to-image foundation model with advanced multilingual text rendering.
Model
Text-to-Image
1mo
Qwen
Downloadable
qwen-image-edit
Qwen-Image-Edit is an image editing model with multilingual text editing and strong subject consistency.
Model
Text-to-Image
1mo
Meta
Downloadable
Free Endpoint
llama-3.2-11b-vision-instruct
Cutting-edge vision-language model excelling at high-quality reasoning from images.
Model
Image-Text Retrieval
1.71M
1y
Qwen
Downloadable
Free Endpoint
qwen3.5-397b-a17b
Next-gen Qwen 3.5 VLM (400B MoE) brings advanced vision, chat, RAG, and agentic capabilities.
Model
MoE
12.46M
3mo
Meta
Downloadable
Free Endpoint
llama-3.2-90b-vision-instruct
Cutting-edge vision-language model excelling at high-quality reasoning from images.
Model
Image-Text Retrieval
2.8M
1y
NVIDIA
Free Endpoint
cosmos3-nano
Generates physics-aware videos from text prompts or an image prompt for physical AI development.
Model
autonomous vehicles
1.58K
7d
Black-forest-labs
Downloadable
flux.2-klein-4b
FLUX.2-klein-4B is a distilled image generation and editing model, producing outputs at lightning speed.
Model
image editing
270K
2mo
Microsoft
Downloadable
TRELLIS
MSFT TRELLIS is a 3D AI model that generates high-quality 3D assets from text or image inputs.
Model
text-to-3d
3.99K
9mo
NVIDIA
Downloadable
Free Endpoint
llama-3.1-nemotron-nano-vl-8b-v1
Multimodal vision-language model that understands text and images and generates informative responses.
Model
doc intelligence
9.93M
11mo
Mistral AI
Downloadable
Free Endpoint
mistral-small-4-119b-2603
Hybrid MoE model unifying instruct, reasoning, and coding capabilities with multimodal input and a 256K context window.
Model
code generation
16.19M
2mo
Google
Free Endpoint
paligemma
Vision-language model adept at comprehending text and visual inputs to produce informative responses.
Model
image
10.29K
1y
NVIDIA
Downloadable
vista-3d
VISTA-3D is a specialized interactive foundation model for segmenting and annotating human anatomy.
Model
Interactive Annotation
803
1y
Qwen
Downloadable
Free Endpoint
qwen3.5-122b-a10b
122B MoE LLM (10B active) for coding, reasoning, and multimodal chat. Agent-ready.
Model
B200
10.34M
3mo
Baidu
Downloadable
paddleocr
Model for table extraction that receives an image as input, runs OCR on the image, and returns the text within the image and its bounding boxes.
Model
Optical Character Recognition
415K
11mo
Black-forest-labs
Downloadable
FLUX.1-dev
FLUX.1 is a state-of-the-art suite of image generation models.
Model
Text-to-Image
257K
1y
Black-forest-labs
Downloadable
FLUX.1-Kontext-dev
FLUX.1 Kontext is a multimodal model that enables in-context image generation and editing.
Model
Text-to-Image
3.63K
9mo
Black-forest-labs
Downloadable
FLUX.1-schnell
FLUX.1-schnell is a distilled image generation model, producing high-quality images at fast speeds.
Model
Text-to-Image
265K
1y
Moonshotai
Downloadable
Free Endpoint
kimi-k2.6
1T multimodal MoE for long-horizon coding, agentic tool use, and image/video understanding.
Model
B200
6.78M
1mo
NVIDIA
Downloadable
nemoretriever-ocr
Powerful OCR model for fast, accurate real-world image text extraction, layout, and structure analysis.
Model
Table Extraction
8.97K
10mo
NVIDIA
Downloadable
nemotron-ocr-v1
Powerful OCR model for fast, accurate real-world image text extraction, layout, and structure analysis.
Model
Table Extraction
351K
2mo
Microsoft
Free Endpoint
phi-4-multimodal-instruct
Cutting-edge open multimodal model excelling at high-quality reasoning from image and audio inputs.
Model
Speech Recognition
269K
1y
Stability AI
Downloadable
stable-diffusion-3.5-large
Stable Diffusion 3.5 is a popular text-to-image generation model.
Model
Text-to-Image
9mo
Google
Free Endpoint
gemma-3n-e2b-it
An edge AI model that accepts text, audio, and image input, ideal for resource-constrained environments.
Model
language generation
43.86M
10mo
Google
Free Endpoint
gemma-3n-e4b-it
An edge AI model that accepts text, audio, and image input, ideal for resource-constrained environments.
Model
language generation
3.74M
10mo