baidu/paddleocr

Model for table extraction that receives an image as input, runs OCR on the image, and returns the text within the image and its bounding boxes.

Model Overview

Description

PaddleOCR is an ultra-lightweight Optical Character Recognition (OCR) system developed by Baidu. It supports a variety of cutting-edge OCR algorithms and provides value at every stage of the AI pipeline, including data generation, model training, and inference.

This model is ready for commercial use.

Third-Party Community Consideration

This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party’s requirements for this application and use case; see link to Non-NVIDIA PaddleOCR Toolkit.

Terms of use

PaddleOCR is licensed under the Apache License 2.0. You are responsible for ensuring that your use of this model complies with all applicable laws.

References

GitHub, arXiv

Model Architecture

Architecture Type for Text Detector: CNN
Network Architecture for Text Detector: LK-PAN

Architecture Type for Text Recognition: Hybrid Transformer CNN
Network Architecture for Text Recognition: SVTR-LCNet (NRTR Head and CTCLoss head)
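
To illustrate what a CTC head's output looks like downstream, here is a minimal sketch of greedy CTC decoding: take the most likely class at each time step, collapse consecutive repeats, and drop blanks. The function name, the blank index of 0, and the toy character set are assumptions made for this example; this is not PaddleOCR's own decoder.

```python
import numpy as np

def ctc_greedy_decode(logits, charset, blank=0):
    """Greedy CTC decoding of one sequence.

    logits:  array of shape (time_steps, num_classes)
    charset: string where charset[i - 1] is the character for class i
             (class `blank` is reserved; this mapping is an assumption)
    """
    best = np.argmax(logits, axis=1)  # most likely class per time step
    chars = []
    prev = None
    for idx in best:
        # Emit a character only when it is not blank and not a repeat
        if idx != blank and idx != prev:
            chars.append(charset[idx - 1])
        prev = idx
    return "".join(chars)

# Toy example: one-hot scores for the class sequence 1, 1, 0, 1, 2, 2, 0
toy_logits = np.eye(4)[[1, 1, 0, 1, 2, 2, 0]]
decoded = ctc_greedy_decode(toy_logits, "abc")
print(decoded)  # aab
```

The blank between the two runs of class 1 is what lets the decoder emit the same character twice in a row.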

Input

Input Type(s): Image
Input Format(s): Red, Green, Blue (RGB)
Input Parameters: Two Dimensional (2D)
Other Properties Related to Input: An ndarray, or a batch of ndarrays, is passed in with shape [Batch, Channel, Width, Height]. PaddleOCR applies some internal thresholding; no additional thresholding is implemented on our side.
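
As a rough sketch of the input layout, the snippet below stacks RGB images into a single array in the [Batch, Channel, Width, Height] order stated above. The helper name `to_model_batch`, the uint8 dtype, and the assumption that images arrive as (Height, Width, 3) arrays of equal size are illustrative; any resizing or normalization the deployed model expects is not shown.

```python
import numpy as np

def to_model_batch(images):
    """Stack (H, W, 3) RGB images into one [Batch, Channel, Width, Height] array."""
    batch = []
    for img in images:
        if img.ndim != 3 or img.shape[2] != 3:
            raise ValueError("expected an RGB image with shape (H, W, 3)")
        # (H, W, C) -> (C, W, H), matching the axis order stated above
        batch.append(np.transpose(img, (2, 1, 0)))
    # np.stack requires every image to share the same width and height
    return np.stack(batch, axis=0)

# Two dummy RGB images, 32 pixels high and 100 pixels wide, stand in for real input
images = [np.zeros((32, 100, 3), dtype=np.uint8) for _ in range(2)]
batch = to_model_batch(images)
print(batch.shape)  # (2, 3, 100, 32)
```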

Output

Output Type(s): Text
Output Format: String
Output Parameters: 1D
Other Properties Related to Output: Batch of text strings.

Supported Hardware Microarchitecture Compatibility: NVIDIA Ampere, NVIDIA Hopper, NVIDIA Lovelace

Supported Operating System(s):

  • Linux

Model Version(s):

  • baidu/paddleocr

Training Dataset:

Link:

Text detection datasets include LSVT (Sun et al. 2019), RCTW-17 (Shi et al. 2017), MTWI 2018 (He and Yang 2018), CASIA-10K (He et al. 2018), SROIE (Huang et al. 2019), MLT 2019 (Nayef et al. 2019), BDI (Karatzas et al. 2011), MSRA-TD500 (Yao et al. 2012) and CCPD 2019 (Xu et al. 2018).

Text recognition datasets include, among others, OpenImages and InvoiceDatasets.

Data Collection Method by dataset: Unknown
Labeling Method by dataset: Unknown

Text Detection: 127K training images (68K real scene images from Baidu image search and public datasets and 59K synthetic images)

Text Recognition: 18.5M training images (7M real scene images from Baidu image search and public datasets and 11.5M synthetic images)

Inference:

Engine: TensorRT
Test Hardware: Tested on all supported hardware listed in compatibility section

Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

Please report security vulnerabilities or NVIDIA AI Concerns here.