The multimodal PDF data extraction workflow uses NVIDIA NeMo™ Retriever NIM microservices to unlock highly accurate insights from massive volumes of enterprise data.
With this enterprise-scale multimodal document retrieval workflow, developers can create digital humans, AI agents, or customer service chatbots that can quickly become experts on any area captured within their corpus of data.
The multimodal retrieval workflow is designed to enhance generative AI applications with RAG capabilities that can be connected to proprietary data, wherever it resides. Use this workflow to supercharge your RAG applications with deeper, more accurate insights from your enterprise data.
The following NIM microservices are used by this blueprint:
NVIDIA NIM™ Agent Blueprints are customizable AI workflow examples that equip enterprise developers with NIM microservices, reference code, documentation, and a Helm chart for deployment.
This blueprint is a reference implementation of a multimodal data ingest pipeline for retrieval-augmented generation (RAG). It includes several key operational components representative of common activities in a complex multimodal workflow. Three independent streams are processed: text, tables, and charts.
Each extraction task uses one or more NIM components to identify (YOLOX), deconstruct (DePlot, CACHED), and recognize (PaddleOCR) text representations of the PDF elements presented to the workflow. This elevates PDF processing from a text-only pipeline to a full multimodal workflow that extracts the rich relationships and context encoded in the charts and tables within each PDF.
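The stream-per-element-type design above can be sketched in a few lines. This is a minimal illustration, not the blueprint's actual API: the element classes, handler functions, and routing table below are hypothetical stand-ins for the detection (YOLOX) and extraction (DePlot, CACHED, PaddleOCR) NIM calls.

```python
# Hypothetical sketch of the three-stream routing described above.
# All names here are illustrative; the real workflow dispatches detected
# PDF regions to NIM microservices rather than local functions.
from dataclasses import dataclass


@dataclass
class Element:
    kind: str     # "text", "table", or "chart" (as identified by detection)
    payload: str  # raw content of the detected region


def extract_text(el: Element) -> str:
    # Stand-in for plain text extraction.
    return el.payload


def extract_table(el: Element) -> str:
    # Stand-in for OCR-style table text recognition.
    return f"[table] {el.payload}"


def extract_chart(el: Element) -> str:
    # Stand-in for chart-to-text deconstruction.
    return f"[chart] {el.payload}"


# Each independent stream gets its own extraction handler.
HANDLERS = {"text": extract_text, "table": extract_table, "chart": extract_chart}


def ingest(elements: list[Element]) -> list[str]:
    # Route each detected element to its stream's handler, producing
    # text representations ready for embedding and retrieval.
    return [HANDLERS[el.kind](el) for el in elements]


if __name__ == "__main__":
    page = [
        Element("text", "Quarterly revenue grew 12%."),
        Element("chart", "Revenue by region, FY24"),
        Element("table", "Region | Revenue"),
    ]
    for rep in ingest(page):
        print(rep)
```

The point of the sketch is the fan-out: every element type is normalized to a text representation, so downstream embedding and retrieval steps can treat all three streams uniformly.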
Ingest and extract highly accurate insights contained in text, graphs, charts, and tables within massive volumes of PDF documents.