A high-quality tool for convert PDF to Markdown and JSON
Cross-referenced across 55 tracked directories
#3807
Popularity Rank
1 / 55
Listed In
Emerging
Adoption Stage
2/29/2024
Created
56,572
GitHub Stars
Score: 100/100
0 dependency vulnerabilities found
Run an AI-powered security scan to analyze this package's source code for vulnerabilities, prompt injection vectors, data exfiltration risks, and behavior mismatches.
Scans fetch actual source code from the GitHub repository, not just the README.
Christoph Auer <cau@zurich.ibm.com>, Michele Dolfi <dol@zurich.ibm.com>, Maxim Lysak <mly@zurich.ibm.com>, Nikos Livathinos <nli@zurich.ibm.com>, Ahmed Nassar <ahn@zurich.ibm.com>, Panos Vagenas <pva@zurich.ibm.com>, Peter Staar <taa@zurich.ibm.com>
SDK and CLI for parsing PDF, DOCX, HTML, and more, to a unified document representation for powering downstream workflows such as gen AI applications.
...moreThe official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.
Generate consolidated text files from websites for LLM training and inference – Powered by Firecrawl
"Outclassing Frontier LLMs in Information Extraction"
4,689
Forks
196
Open Issues
3/19/2026
Last Commit
Recently added to the ecosystem