Intelligent Document Processing Demo
The PDE engine is a library developed by DataArt that consolidates years of experience in extracting data from PDF documents.
Its core functionality is recognizing the structure of documents and locating the tables inside them. Rather than converting PDFs to plain text, as OCR tools usually do, it converts them to structured JSON ready for further processing by existing financial and insurance systems, CRMs, ERPs, and so on.
The PDE engine plugs into an existing document processing pipeline, finding and extracting tables automatically while preserving their hierarchy, so that it is clear where the column headers are located and what each row represents. It also extracts other blocks of text, recognizing headers and titles.
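
To make the idea of hierarchical table output more concrete, here is a minimal Python sketch of how a downstream system might consume such JSON. The schema (the "blocks", "type", "columns", "children", and "rows" keys), the helper names, and the sample values are all assumptions invented for illustration; this text does not show the actual PDE output format.

```python
import json

# Hypothetical structured output for one document.
# The real PDE schema is not documented here; every key name is an assumption.
document = {
    "blocks": [
        {"type": "title", "text": "Quarterly Statement"},
        {"type": "header", "text": "Account Summary"},
        {
            "type": "table",
            # Nested "children" model a header that spans several sub-columns.
            "columns": [
                {"name": "Account"},
                {"name": "Balance",
                 "children": [{"name": "Opening"}, {"name": "Closing"}]},
            ],
            # Each row holds one value per leaf column, in order.
            "rows": [
                ["ACC-001", 100.0, 150.0],
                ["ACC-002", 200.0, 180.0],
            ],
        },
    ]
}


def leaf_paths(columns, prefix=()):
    """Flatten nested column headers into one header path per leaf column."""
    paths = []
    for col in columns:
        path = prefix + (col["name"],)
        children = col.get("children")
        if children:
            paths.extend(leaf_paths(children, path))
        else:
            paths.append(path)
    return paths


def table_records(table):
    """Turn each row into a dict keyed by the full header path."""
    keys = [" / ".join(path) for path in leaf_paths(table["columns"])]
    return [dict(zip(keys, row)) for row in table["rows"]]


tables = [b for b in document["blocks"] if b["type"] == "table"]
records = table_records(tables[0])
print(json.dumps(records, indent=2))
```

Keeping the header hierarchy in the JSON, rather than flattening it at extraction time, would let each consuming system (a CRM, an ERP, and so on) decide how to map multi-level headers onto its own fields.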