Try NVIDIA NIM APIs

nemoretriever-ocr

Powerful OCR model for fast, accurate real-world image text extraction, layout, and structure analysis.

69.61K

7mo

nemoretriever-ocr-v1

Powerful OCR model for fast, accurate real-world image text extraction, layout, and structure analysis.

557K

7mo

llama-3.1-nemotron-nano-vl-8b-v1

Multi-modal vision-language model that understands text/img and creates informative responses

doc intelligence

7.93M

8mo

Baidu

paddleocr

Model for table extraction that receives an image as input, runs OCR on the image, and returns the text within the image and its bounding boxes.

189K

8mo

ocdrnet

OCDNet and OCRNet are pre-trained models designed for optical character detection and recognition respectively.

771

LaunchableEnterprise

Build a Video Search and Summarization (VSS) Agent

Ingest massive volumes of live or archived videos and extract insights for summarization and interactive Q&A

Blueprint

vision

cosmos-predict1-5b

Generates future frames of a physics-aware world state based on simply an image or short video prompt for physical AI development.

Synthetic Data Generation

27.35K

11mo

cosmos-reason2-8b

Vision language model that excels in understanding the physical world using structured reasoning on videos or images.

video understanding

205K

2mo

cosmos-transfer1-7b

Generates physics-aware video world states for physical AI development using text prompts and multiple spatial control inputs derived from real-world data or simulation.

Synthetic Data Generation

15.84K

8mo

cosmos-transfer2.5-2b

Generates physics-aware video world states for physical AI development using text prompts and multiple spatial control inputs derived from real-world data or simulation.

Synthetic Data Generation

Microsoft

TRELLIS

MSFT TRELLIS is a 3D AI model that generates high-quality 3D assets from text or image inputs.

text-to-3d

5.66K

6mo

usdsearch

AI-powered search for OpenUSD data, 3D models, images, and assets using text or image-based inputs.