## Model Overview

### Description

PaddleOCR is an ultra-lightweight Optical Character Recognition (OCR) system developed by Baidu. It supports a variety of cutting-edge OCR algorithms and provides value at every stage of the AI pipeline, including data generation, model training, and inference. This model is ready for commercial use.

## Third-Party Community Consideration

This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party’s requirements for this application and use case; see the non-NVIDIA [PaddleOCR Toolkit](https://github.com/PaddlePaddle/PaddleOCR).

### Terms of use

PaddleOCR is licensed under the [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0). **You are responsible for ensuring that your use of models complies with all applicable laws.**

### References

* [GitHub](https://github.com/PaddlePaddle/PaddleOCR/blob/main/README_en.md)
* [arXiv](https://arxiv.org/abs/2206.03001)

## Model Architecture

**Architecture Type for Text Detector:** CNN
**Network Architecture for Text Detector:** LK-PAN
**Architecture Type for Text Recognition:** Hybrid Transformer CNN
**Network Architecture for Text Recognition:** SVTR-LCNet (NRTR head and CTCLoss head)
## Input

**Input Type(s):** Image
**Input Format(s):** Red, Green, Blue (RGB)
**Input Parameters:** Two Dimensional (2D)
**Other Properties Related to Input:** An nd array, or a batch of nd arrays, is passed in with shape [Batch, Channel, Width, Height]. PaddleOCR applies some internal thresholding; no additional thresholding is implemented on our side.
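
As an illustration of the input layout described above, the following is a minimal sketch of assembling a batch with NumPy. It assumes that images arrive as same-sized `H x W x 3` RGB arrays (the usual decoded-image layout) and takes the axis order [Batch, Channel, Width, Height] literally from this card. The helper name `to_batch` is hypothetical and is not part of PaddleOCR; confirm the expected axis order and dtype against the deployed endpoint before relying on it.

```python
import numpy as np


def to_batch(images):
    """Stack same-sized H x W x 3 RGB images into one array.

    Hypothetical helper: the output layout [Batch, Channel, Width, Height]
    follows the axis order stated in this model card.
    """
    if not images:
        raise ValueError("need at least one image")
    first_shape = images[0].shape
    planes = []
    for img in images:
        if img.ndim != 3 or img.shape[2] != 3:
            raise ValueError("expected an H x W x 3 RGB array")
        if img.shape != first_shape:
            raise ValueError("all images in a batch must share one shape")
        # (H, W, C) -> (C, W, H): reverse the three axes.
        planes.append(np.transpose(img, (2, 1, 0)))
    # Add the leading batch axis.
    return np.stack(planes, axis=0)


# Four blank 100-pixel-wide, 32-pixel-high RGB images.
images = [np.zeros((32, 100, 3), dtype=np.uint8) for _ in range(4)]
batch = to_batch(images)
print(batch.shape)  # (4, 3, 100, 32)
```

Images of different sizes would need resizing or padding to a common shape before stacking; that step is left out here because the card does not specify one.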
## Output

**Output Type(s):** Text
**Output Format:** String
**Output Parameters:** One Dimensional (1D)
**Other Properties Related to Output:** Batch of text strings.
**Supported Hardware Microarchitecture Compatibility:** NVIDIA Ampere, NVIDIA Hopper, NVIDIA Lovelace
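
A deployment script may want to check a GPU against the architectures listed above before loading the model. The sketch below is one way to do that, under stated assumptions: the card names architectures only, so the mapping from CUDA compute capability to architecture name is taken from NVIDIA's public compute capability table, and the names `ARCH_BY_CAPABILITY` and `supported_architecture` are hypothetical.

```python
# Compute capability (major, minor) -> architecture name.
# Assumption: values follow NVIDIA's public CUDA GPU compute capability table;
# this card itself lists architecture names only.
ARCH_BY_CAPABILITY = {
    (8, 0): "NVIDIA Ampere",
    (8, 6): "NVIDIA Ampere",
    (8, 7): "NVIDIA Ampere",
    (8, 9): "NVIDIA Lovelace",
    (9, 0): "NVIDIA Hopper",
}

# Architectures named in this card's compatibility line.
SUPPORTED = {"NVIDIA Ampere", "NVIDIA Hopper", "NVIDIA Lovelace"}


def supported_architecture(major, minor):
    """Return the architecture name if this card lists it, else None."""
    name = ARCH_BY_CAPABILITY.get((major, minor))
    return name if name in SUPPORTED else None


print(supported_architecture(8, 9))  # NVIDIA Lovelace
print(supported_architecture(7, 5))  # None
```

The `(major, minor)` pair has to come from the runtime in use; for example, PyTorch exposes it through `torch.cuda.get_device_capability()`.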
## Supported Operating System(s):

* Linux
## Model Version(s):

* baidu/paddleocr
## Training Dataset:

**Link:**
Text detection datasets include LSVT (Sun et al. 2019), RCTW-17 (Shi et al. 2017), MTWI 2018 (He and Yang 2018), CASIA-10K (He et al. 2018), SROIE (Huang et al. 2019), MLT 2019 (Nayef et al. 2019), BDI (Karatzas et al. 2011), MSRA-TD500 (Yao et al. 2012) and CCPD 2019 (Xu et al. 2018).

Two of the datasets used for text recognition, among others, are:

* [OpenImages](https://github.com/openimages/dataset)
* [InvoiceDatasets](https://github.com/FuxiJia/InvoiceDatasets)

**Data Collection Method by dataset:** Unknown
**Labeling Method by dataset:** Unknown

Text Detection: 127K training images (68K real scene images from Baidu image search and public datasets and 59K synthetic images)

Text Recognition: 18.5M training images (7M real scene images from Baidu image search and public datasets and 11.5M synthetic images)

## Inference:

**Engine:** TensorRT
**Test Hardware:** Tested on all supported hardware listed in the compatibility section
## Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).