Generate consolidated text files from websites for LLM training and inference – Powered by Firecrawl
Cross-referenced across 55 tracked directories
#3501
Popularity Rank
1 / 55
Listed In
Emerging
Adoption Stage
3/13/2026
First Seen
Recently added to the ecosystem
Christoph Auer <cau@zurich.ibm.com>, Michele Dolfi <dol@zurich.ibm.com>, Maxim Lysak <mly@zurich.ibm.com>, Nikos Livathinos <nli@zurich.ibm.com>, Ahmed Nassar <ahn@zurich.ibm.com>, Panos Vagenas <pva@zurich.ibm.com>, Peter Staar <taa@zurich.ibm.com>
SDK and CLI for parsing PDF, DOCX, HTML, and more, to a unified document representation for powering downstream workflows such as gen AI applications.
...moreThe official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.
A document understanding API
"Outclassing Frontier LLMs in Information Extraction"