
Object detection model fine-tuned to detect table structure (cells, rows, and columns) in document images.
nemotron-table-structure-v1 is a specialized object detection model designed to identify and extract the structure of tables in images. Based on YOLOX (an anchor-free version of YOLO), it detects and localizes three key components within tables: cells, rows, and columns.
This specialized focus on table structure enables precise decomposition of complex tables into their constituent parts, forming the foundation for downstream retrieval tasks (for example, converting tables into Markdown to improve retrieval accuracy).
This model is ready for commercial/non-commercial use.
GOVERNING TERMS: The trial service is governed by the NVIDIA API Trial Terms of Service. Use of this model is governed by the NVIDIA Open Model License Agreement.
You are responsible for ensuring that your use of NVIDIA provided models complies with all applicable laws.
Model Developer: NVIDIA
Deployment Geography: Global
This model specializes in analyzing images containing tables by:
It is designed to work in conjunction with OCR systems to preserve relationships between table elements and enable accurate extraction of tabular data from images.
Ideal for:
Release Date: Build.NVIDIA.com 03/02/2026 via nemotron-table-structure-v1
Architecture Type: YOLOX
Network Architecture: DarkNet53 Backbone + FPN decoupled head (one 1x1 convolution + 2 parallel 3x3 convolutions: one for classification and one for bounding box prediction)
Number of Model Parameters: 5.4e7
Input Resize: (1024, 1024)
Input Types: Image
Input Formats: RGB
Input Parameters: Two Dimensional (2D)
Other Input Properties: Image is resized to (1024, 1024).
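Because the image is resized to (1024, 1024) before inference, boxes expressed in that resized frame have to be mapped back to the original image. The sketch below assumes a plain stretch resize (no aspect-ratio-preserving padding) and pixel-coordinate boxes in the resized frame; this card confirms neither, and the function name is hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def scale_box_to_original(box: Box, orig_width: int, orig_height: int,
                          model_size: Tuple[int, int] = (1024, 1024)) -> Box:
    """Map a box from the resized model frame back to original pixel coordinates.

    Assumes the image was stretched to model_size without padding, so each
    axis is rescaled independently.
    """
    scale_x = orig_width / model_size[0]
    scale_y = orig_height / model_size[1]
    x1, y1, x2, y2 = box
    return (x1 * scale_x, y1 * scale_y, x2 * scale_x, y2 * scale_y)


# A 2048x512 original: x coordinates double, y coordinates halve.
restored = scale_box_to_original((512, 256, 1024, 1024), 2048, 512)
```

If the deployed pipeline pads the image to preserve aspect ratio, the padding offset would have to be subtracted before scaling.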
Output Types: Structured detections (bounding boxes + class + confidence)
Output Format: JSON-compatible structure
Output Parameters: One Dimensional (1D)
Other Output Properties: Output classes include cell, row, and column. Thresholds used for non-maximum suppression: conf_thresh = 0.01; iou_thresh = 0.25.
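To show how the two thresholds above interact, here is a minimal sketch of greedy non-maximum suppression using conf_thresh = 0.01 and iou_thresh = 0.25. The detection dictionary keys (`label`, `confidence`, `box`) and the choice to suppress only within the same class are assumptions for illustration; the model's actual post-processing may differ.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(detections: List[Dict], conf_thresh: float = 0.01,
        iou_thresh: float = 0.25) -> List[Dict]:
    """Greedy per-class NMS: keep the highest-confidence box, drop overlapping ones."""
    candidates = sorted(
        (d for d in detections if d["confidence"] >= conf_thresh),
        key=lambda d: d["confidence"],
        reverse=True,
    )
    kept: List[Dict] = []
    for det in candidates:
        # Suppress only against already-kept boxes of the same class (assumption).
        overlaps = any(
            k["label"] == det["label"] and iou(k["box"], det["box"]) > iou_thresh
            for k in kept
        )
        if not overlaps:
            kept.append(det)
    return kept


detections = [
    {"label": "cell", "confidence": 0.90, "box": (0, 0, 10, 10)},
    {"label": "cell", "confidence": 0.80, "box": (1, 1, 11, 11)},          # overlaps the first cell
    {"label": "row", "confidence": 0.50, "box": (0, 0, 10, 10)},           # other class, kept
    {"label": "cell", "confidence": 0.005, "box": (100, 100, 110, 110)},   # below conf_thresh
]
kept = nms(detections)
```

Suppressing within each class matters here because row, column, and cell boxes overlap by design. The low confidence threshold keeps nearly all candidates, so downstream consumers would typically apply their own, stricter score cutoff.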
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Runtime Engines: TensorRT
Supported Hardware Microarchitecture Compatibility:
NVIDIA Ampere
NVIDIA Hopper
NVIDIA Lovelace
Operating Systems: Linux
The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
nemotron-table-structure-v1
Short Name: nemotron-table-structure-v1
Data Modality: Image
Training Data Collection: Automated
Training Labeling: Automated
Training Properties: Pretrained on COCO train2017 and fine-tuned on 23,977 images from the Digital Corpora dataset, with annotations from Azure AI Document Intelligence. Bounding boxes per class: 1,828,978 cells, 134,089 columns, and 316,901 rows. Annotations were generated with the Document Intelligence layout model, API version 2024-02-29-preview.
Evaluation Data Collection: Hybrid (Automated, Human)
Evaluation Labeling: Hybrid (Automated, Human)
Evaluation Properties: The primary evaluation set is 2,459 Digital Corpora images with Azure labels. Bounding boxes per class: 200,840 cells, 13,670 columns, and 34,575 rows. Mean average precision (mAP) was used as the evaluation metric. In addition, we evaluated with Azure labels from manually selected pages and manually inspected results on public PDFs and PowerPoint slides.
Per-class Performance Metrics:
| Class | Average Precision (%) | Average Recall (%) |
|---|---|---|
| cell | 58.365 | 60.647 |
| row | 76.992 | 81.115 |
| column | 85.293 | 87.434 |
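For readers unfamiliar with the metric, the following sketch shows one common way to compute average precision for a single class on a single image at one IoU threshold. This card does not state the IoU thresholds, matching rule, or interpolation scheme behind the figures above, so the 0.5 threshold, greedy matching, and all-point interpolation used here are assumptions, and the function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections: List[Tuple[float, Box]],
                      ground_truth: List[Box],
                      iou_thresh: float = 0.5) -> float:
    """AP for one class: area under the interpolated precision-recall curve."""
    if not ground_truth:
        return 0.0
    matched = [False] * len(ground_truth)
    precisions, recalls = [], []
    true_positives = 0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    for rank, (_, box) in enumerate(ranked, start=1):
        # Greedily match to the best still-unmatched ground-truth box.
        best_iou, best_idx = 0.0, -1
        for idx, gt in enumerate(ground_truth):
            overlap = iou(box, gt)
            if not matched[idx] and overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx >= 0 and best_iou >= iou_thresh:
            matched[best_idx] = True
            true_positives += 1
        precisions.append(true_positives / rank)
        recalls.append(true_positives / len(ground_truth))

    # Make precision non-increasing from right to left (all-point interpolation).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    ap, prev_recall = 0.0, 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


ground_truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (20, 20, 30, 30))]
ap = average_precision(detections, ground_truth)
```

In a full evaluation, detections are pooled across all images per class before ranking, and mAP is the mean of the per-class AP values (often also averaged over several IoU thresholds).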
Acceleration Engine: TensorRT
Test Hardware: NVIDIA Hopper (H100 PCIe/SXM)
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case, and address unforeseen product misuse.
For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards.
Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.