A document understanding API
Cross-referenced across 55 tracked directories
Popularity Rank: #3900
Listed In: 1 / 55
Adoption Stage: Emerging
First Seen: 3/13/2026
Recently added to the ecosystem
Christoph Auer, Michele Dolfi, Maxim Lysak, Nikos Livathinos, Ahmed Nassar, Panos Vagenas, Peter Staar
SDK and CLI for parsing PDF, DOCX, HTML, and more, to a unified document representation for powering downstream workflows such as gen AI applications.
The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.
Generate consolidated text files from websites for LLM training and inference – Powered by Firecrawl
"Outclassing Frontier LLMs in Information Extraction"