Document (PDF, Word, PPTX ...) extraction and parsing API using OCR and Ollama-supported models. Anonymize documents. Remove PII. Convert any document or picture to structured JSON or Markdown.
Cross-referenced across 55 tracked directories
Popularity Rank: #3774
Listed In: 1 / 55
Adoption Stage: Emerging
Created: 10/23/2024
GitHub Stars: 3,026
Score: 100/100
0 dependency vulnerabilities found
"Outclassing Frontier LLMs in Information Extraction"
A document understanding API
The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.
Generate consolidated text files from websites for LLM training and inference – Powered by Firecrawl
Forks: 254
Open Issues: 47
Last Commit: 12/8/2025
Recently added to the ecosystem