baidu/paddleocr
Model for table extraction that receives an image as input, runs OCR on the image, and returns the text within the image and its bounding boxes.
Model Overview
Description
PaddleOCR is an ultra-lightweight Optical Character Recognition (OCR) system developed by Baidu. It supports a variety of cutting-edge OCR algorithms and provides value at every stage of the AI pipeline, including data generation, model training, and inference.
This model is ready for commercial use.
Third-Party Community Consideration
This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party’s requirements for this application and use case; see link to Non-NVIDIA PaddleOCR Toolkit.
Terms of use
PaddleOCR is licensed under the Apache License 2.0. You are responsible for ensuring that your use of models complies with all applicable laws.
References
Model Architecture
Architecture Type for Text Detector: CNN
Network Architecture for Text Detector: LK-PAN
Architecture Type for Text Recognition: Hybrid Transformer CNN
Network Architecture for Text Recognition: SVTR-LCNet (NRTR Head and CTCLoss head)
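To illustrate what a CTC head implies at inference time, the sketch below shows generic greedy CTC decoding: take the best class per time step, collapse consecutive repeats, then drop blanks. It is not PaddleOCR's own decoder. The function name `ctc_greedy_decode`, the three-character `charset`, and the convention that class index 0 is the blank symbol are all assumptions made for this example.

```python
import numpy as np


def ctc_greedy_decode(logits, charset, blank=0):
    """Greedy CTC decoding for one sequence.

    logits:  array of shape [time, classes] with per-step class scores.
    charset: string where charset[i - 1] is the character for class i
             (assumed layout; class `blank` carries no character).
    """
    best = np.argmax(logits, axis=1)  # best class at every time step
    chars = []
    prev = blank
    for idx in best:
        # Emit a character only when it is not blank and not a repeat
        # of the immediately preceding time step.
        if idx != blank and idx != prev:
            chars.append(charset[idx - 1])
        prev = idx
    return "".join(chars)


# Hypothetical per-step predictions: a a <blank> a b b <blank> c
path = [1, 1, 0, 1, 2, 2, 0, 3]
logits = np.eye(4)[path]  # one-hot scores, shape [8, 4]
decoded = ctc_greedy_decode(logits, charset="abc")
print(decoded)  # aabc
```

The blank between the two runs of `a` is what lets a doubled letter survive the repeat-collapsing step.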
Input
Input Type(s): Image
Input Format(s): Red, Green, Blue (RGB)
Input Parameters: Two Dimensional (2D)
Other Properties Related to Input: An ndarray, or a batch of ndarrays, is passed in with shape [Batch, Channel, Width, Height]. PaddleOCR applies some internal thresholding; no additional thresholding is applied on our side.
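As a minimal sketch of the input layout described above, the following NumPy snippet stacks RGB images into a single array of shape [Batch, Channel, Width, Height]. The helper name `to_batch`, the assumption that source images arrive as (height, width, 3) arrays, the `uint8` dtype, and the requirement that all images in a batch share one size are choices made for this example, not part of the model's documented API.

```python
import numpy as np


def to_batch(images):
    """Stack (height, width, 3) RGB arrays into [Batch, Channel, Width, Height]."""
    if not images:
        raise ValueError("need at least one image")
    first_shape = images[0].shape
    planes = []
    for img in images:
        if img.ndim != 3 or img.shape[2] != 3:
            raise ValueError("expected an RGB image shaped (height, width, 3)")
        if img.shape != first_shape:
            raise ValueError("all images in a batch must share one size")
        # (height, width, channel) -> (channel, width, height)
        planes.append(np.transpose(img, (2, 1, 0)))
    return np.stack(planes, axis=0)


# Two dummy RGB images, each 32 pixels high and 100 pixels wide.
images = [np.zeros((32, 100, 3), dtype=np.uint8) for _ in range(2)]
batch = to_batch(images)
print(batch.shape)  # (2, 3, 100, 32)
```

Images of different sizes would need resizing or padding before stacking; that step is left out here because the source does not specify one.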
Output
Output Type(s): Text
Output Format: String
Output Parameters: 1D
Other Properties Related to Output: Batch of text strings.
Supported Hardware Microarchitecture Compatibility: NVIDIA Ampere, NVIDIA Hopper, NVIDIA Lovelace
Supported Operating System(s):
- Linux
Model Version(s):
- baidu/paddleocr
Training Dataset:
Link:
Text detection datasets include LSVT (Sun et al. 2019), RCTW-17 (Shi et al. 2017), MTWI 2018 (He and Yang 2018), CASIA-10K (He et al. 2018), SROIE (Huang et al. 2019), MLT 2019 (Nayef et al. 2019), BDI (Karatzas et al. 2011), MSRATD500 (Yao et al. 2012), and CCPD 2019 (Xu et al. 2018).
Text recognition uses several datasets, including these two:
OpenImages
InvoiceDatasets
Data Collection Method by dataset: Unknown
Labeling Method by dataset: Unknown
Text Detection: 127K training images (68K real scene images from Baidu image search and public datasets, and 59K synthetic images)
Text Recognition: 18.5M training images (7M real scene images from Baidu image search and public datasets, and 11.5M synthetic images)
Inference:
Engine: TensorRT
Test Hardware: Tested on all supported hardware listed in the compatibility section
Ethical Considerations:
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
Please report security vulnerabilities or NVIDIA AI Concerns here.