Try NVIDIA NIM APIs

Downloadable

nemoretriever-ocr

Powerful OCR model for fast, accurate real-world image text extraction, layout, and structure analysis.

Table Extraction

11mo

Items per page

of 1 pages

Downloadable

nemotron-ocr-v1

Powerful OCR model for fast, accurate real-world image text extraction, layout, and structure analysis.

Table Extraction

341K

4mo

Downloadable

nemotron-ocr-v2

Nemotron OCR v2 is a state-of-the-art multilingual text recognition model designed for robust end-to-end optical character recognition (OCR) on complex real-world images.

Table Extraction

338K

23d

Baidu

Downloadable

paddleocr

Model for table extraction that receives an image as input, runs OCR on the image, and returns the text within the image and its bounding boxes.

Optical Character Recognition

201K

Downloadable

nemoretriever-parse

Cutting-edge vision-language model exceling in retrieving text and metadata from images.

optical character recognition

386K

Downloadable

nemotron-parse

Cutting-edge vision-language model exceling in retrieving text and metadata from images.

text and table extraction

8mo

Downloadable

nv-yolox-page-elements-v1

Model for object detection, fine-tuned to detect charts, tables, and titles in documents.

Object Detection

191

Download NVIDIA Jetson Linux BSP artifacts (BSP tarball, sample rootfs, public_sources, x-tools, guides) for the active target. Use for Auto Setup; not for extraction or profile edits.

Developer

846

25d

Use to call the VIOS REST API (sensor list, timelines, clip extraction, snapshots, add/delete sensors and streams). Not for VLM inference or search.

Developer

1mo

CLIP vision-language model for image-text retrieval, zero-shot classification, embedding extraction, ONNX export, and TensorRT deployment. Use when fine-tuning or training CLIP, running zero-shot classification, computing image embeddings, or deploying CL

AI Engineer

1mo

SegFormer for semantic segmentation. Lightweight transformer-based architecture with hierarchical feature extraction, efficient for real-time segmentation tasks. Use when training, evaluating, exporting, quantizing, or running inference for a TAO SegForme