Data Extraction
698 AI tools in the Data Extraction category
@tavily/core
guyhartstein
Official JavaScript library for Tavily.
supapup
onepointfour-packs
⚡ Lightning-fast MCP browser dev tool. Navigate → Get instant structured data. No screenshots needed! Puppeteer: 📸 → CSS selectors → JS eval. Supapup: semantic IDs ready to use. 10x faster, 90% fewer tokens.
@mdream/nuxt
GitHub Actions
Nuxt module for converting HTML pages to Markdown using mdream
dom-parser
ershov-konst
Fast DOM parser based on regexps
ts-web-scraper
chrisbreuer
A powerful web scraper for both static and client-side rendered sites using only Bun native APIs
@bigknoxy/exa-cli
bigknoxy
CLI wrapper for Exa MCP tools - search, crawl, and research from the command line
reviewbr-mcp
vic3m
MCP Server for Brazilian Academic Repositories (OAI-PMH, DSpace REST, HTML scraping) and PRISMA Systematic Reviews
machinepack-http
eashaw
Send HTTP requests, scrape webpages, and stream data in your JavaScript/Node.js/Sails.js app with a simple, `jQuery.get()`-like interface for sending HTTP requests and processing server responses.
open-web-unlocker
GitHub Actions
Fetch public web pages through a configurable fetch/browser pipeline and parse them into structured JSON or clean markdown.
better-browse
mylesiyabor
Zero-dependency browser automation via Chrome DevTools Protocol with ARIA accessibility snapshots — 10-100x cheaper than vision-based approaches
ayakashi
zisismaras
The next generation web scraping framework
the-a11y-machine
hywan
The A11y Machine is an automated accessibility testing tool which crawls and tests all pages of any website.
maxun-sdk
karishmashukla
Maxun Node SDK for web scraping and data extraction
proxys-site
urready
The official open-source codebase for Proxys.Site - A comprehensive proxy comparison tool and list.
@cd39390/mcp-web-crawler
cd39390
An MCP server plugin to crawl all hyperlinks from a website for AI learning purposes.
@teng-lin/agent-fetch
teng-lin
Full-content web fetcher with Chrome TLS fingerprinting and multi-strategy content extraction
@obscrd/robots
larsmosr
AI crawler blocking — generate robots.txt, meta tags, and HTTP headers for 30+ AI bots
facebook-marketplace-cli
lotrez
CLI tool for Facebook Marketplace and Messenger automation
devbridge-styleguide
devbproto
Style guide automation tool.
unfluff
ageitgey
A web page content extractor