university-at-buffalo/cached

PREVIEW

Context-aware chart extraction that detects 18 classes of basic chart elements, excluding plot elements.

Model Overview

Description

CACHED (Context-Aware Chart Element Detection) is a state-of-the-art chart element detection model from the University at Buffalo. It was published in the proceedings of Document Analysis and Recognition (ICDAR 2023). The code is based on the MMDetection framework.

CACHED is paired with PaddleOCR to perform Optical Character Recognition (OCR). PaddleOCR is an ultra-lightweight OCR system from Baidu.

This model is ready for commercial use.

Third-Party Community Consideration

This model is not owned or developed by NVIDIA. It has been developed and built to a third party's requirements for this application and use case; see the links to the non-NVIDIA Context-Aware Chart Element Detection GitHub repository and the PaddleOCR toolkit.

Terms of use

CACHED is licensed under the MIT License. PaddleOCR is licensed under Apache-2.0.

You are responsible for ensuring that your use of models complies with all applicable laws.

References

Model Architecture

  • CACHED
    • Architecture Type: Region-Based Convolutional Neural Network (R-CNN)
    • Network Architecture: Cascade with local-global context fusion module
  • PaddleOCR
    • Architecture Type for Text Detector: CNN
    • Network Architecture for Text Detector: LK-PAN
    • Architecture Type for Text Recognition: Hybrid Transformer CNN
    • Network Architecture for Text Recognition: SVTR-LCNet (NRTR Head and CTCLoss head)

Input

Input Type(s): Image
Input Format(s): Red, Green, Blue (RGB)
Input Parameters: Two Dimensional (2D)
Other Properties Related to Input: Expected input is an ndarray image of shape [Channel, Width, Height], or an ndarray batch of images of shape [Batch, Channel, Width, Height].
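As a minimal sketch of the input layout described above (the exact preprocessing depends on your MMDetection config; the helper names and image sizes here are hypothetical), an HWC RGB image can be rearranged into the [Channel, Width, Height] ordering this card lists and stacked into a batch:

```python
import numpy as np

def to_model_input(image_hwc: np.ndarray) -> np.ndarray:
    """Rearrange an RGB image from [Height, Width, Channel] to the
    [Channel, Width, Height] layout listed on this card.

    Note: many detection pipelines use [Channel, Height, Width] instead;
    verify the ordering against your MMDetection config before relying
    on this sketch."""
    assert image_hwc.ndim == 3 and image_hwc.shape[2] == 3
    return np.transpose(image_hwc, (2, 1, 0))

def to_batch(images) -> np.ndarray:
    """Stack per-image arrays into [Batch, Channel, Width, Height]."""
    return np.stack(images, axis=0)

# Dummy 480x640 RGB image (hypothetical size).
img = np.zeros((480, 640, 3), dtype=np.uint8)
chw = to_model_input(img)      # shape (3, 640, 480)
batch = to_batch([chw, chw])   # shape (2, 3, 640, 480)
```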

Output

Output Type(s): Text associated with each of the following classes: "chart_title", "x_title", "y_title", "xlabel", "ylabel", "other", "legend_label", "legend_title", "mark_label", "value_label"
Output Format: Dictionary of strings
Output Parameters: 1D
Other Properties Related to Output: None
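To illustrate the "dictionary of strings" output shape described above, here is a small, hedged sketch of post-processing: the key names come from the class list on this card, but the `summarize` helper and the example values are hypothetical, not part of the CACHED API.

```python
# Text classes listed on this card.
TEXT_CLASSES = [
    "chart_title", "x_title", "y_title", "xlabel", "ylabel", "other",
    "legend_label", "legend_title", "mark_label", "value_label",
]

def summarize(detections: dict) -> dict:
    """Keep only known text classes with non-empty recognized text
    (illustrative helper; not part of the model's API)."""
    return {k: v for k, v in detections.items() if k in TEXT_CLASSES and v}

# Hypothetical model output for one chart image.
example = {
    "chart_title": "Quarterly revenue",
    "x_title": "Quarter",
    "y_title": "USD (millions)",
    "mark_label": "",  # empty recognition result, dropped below
}
print(summarize(example))
```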

Supported Hardware Microarchitecture Compatibility: NVIDIA Ampere, NVIDIA Hopper, NVIDIA Lovelace

Supported Operating System(s):

  • Linux

Model Version(s):

  • university-at-buffalo/cached

Training Dataset:

None of the models were trained by NVIDIA.

  • PubMed Central (PMC) Chart Dataset

    • Link: https://chartinfo.github.io/index_2022.html. No nSpect ID.
    • Data Collection Method: Automated, Human
    • Labeling Method: Human
Description: A real-world dataset collected from PubMed Central documents and manually annotated, released for the ICPR 2022 CHART-Infographics competition. It contains 5,614 images for chart element detection, 4,293 images for plot detection and data extraction, and 22,924 images for chart classification.
  • Text detection and recognition datasets

    • Link: Refer to https://github.com/PaddlePaddle/PaddleOCR. No nSpect ID.
    • Data Collection Method: Human, Synthetic, Unknown
    • Labeling Method: Human, Synthetic, Unknown
Description: PaddleOCR is trained on a collection of OCR datasets including LSVT (Sun et al. 2019), RCTW-17 (Shi et al. 2017), MTWI 2018 (He and Yang 2018), CASIA-10K (He et al. 2018), SROIE (Huang et al. 2019), MLT 2019 (Nayef et al. 2019), BDI (Karatzas et al. 2011), MSRA-TD500 (Yao et al. 2012), and CCPD 2019 (Xu et al. 2018).

Inference:

Engine: TensorRT
Test Hardware: Tested on all supported hardware listed in compatibility section

Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

Please report security vulnerabilities or NVIDIA AI Concerns here.