Document Processing
2858 AI tools in the Document Processing category
@lov3kaizen/agentsea-ingest
lov3kaizen
Comprehensive document processing pipeline for Node.js - PDF, DOCX, HTML, Markdown parsing with intelligent chunking, table/image extraction, and OCR
exine
nicolumma
Universal Markdown extraction engine (CLI)
@saemhco/nestjs-html-pdf
saemhco
A NestJS module to generate PDF files from HTML
@cherrystudio/mac-system-ocr
dejeune
Node.js N-API native module for macOS Vision Framework OCR
pdfjs-dist-dj
miraclesol
Generic build of Mozilla's PDF.js library.
paperflow-mcp
davidson11
MCP server that lets Claude process PDFs through a self-hosted PaperFlow backend with smart parser selection, token-saving summaries, and structured Markdown output.
pageindex-ts
tandava0060
LLM-agnostic document indexing for js/ts - bring your own LLM and text
@frasma/extractify
frasma
Functional utilities to extract, transform and flow your data
cordova-plugin-scanbot-sdk
scanbot
Cordova Plugin for the Scanbot Document and Barcode Scanner SDK
suitest-js-api
GitHub Actions
Suitest is a test automation and device manipulation tool for living room devices and web browsers.
node-tesseract-ocr
zapolnoch
A Node.js wrapper for the Tesseract OCR API
ddddocr-node
GitHub Actions
The JS version of DdddOcr
documentation-hub
diatech
A modern document processing and session management desktop application
md-to-pdf
simonhaenisch
CLI tool for converting Markdown files to PDF.
browse-the-web
asayman
AI Browser Automation API - Control Headless Chrome via RESTful HTTP endpoints. Perfect for web scraping, RPA, automated testing, and AI agent integration with 70+ endpoints including screenshots, PDF generation, network monitoring, and more.
node-ts-ocr
nicolaspearson
A simple wrapper around command-line utils to assist in PDF / Image OCR (Optical Character Recognition) processing using Tesseract.
undms
xcvzmoon
Text and Metadata Extraction Library for Document Files with Text Similarity Comparison
webdriver-image-comparison
wdio-user
An image compare module that can be used for different NodeJS Test automation frameworks that support the webdriver protocol
makepdf
jcormont
Opinionated Markdown-to-PDF converter
textract
dbashford
Extracts text from files of various types, including html, pdf, doc, docx, xls, xlsx, csv, pptx, png, jpg, gif, rtf, text/*, and various OpenOffice formats.