Data Extraction
677 AI tools in the Data Extraction category
open-web-unlocker
GitHub Actions
Fetch public web pages through a configurable fetch/browser pipeline and parse them into structured JSON or clean markdown.
@bigknoxy/exa-cli
bigknoxy
CLI wrapper for Exa MCP tools - search, crawl, and research from the command line
machinepack-http
eashaw
Send HTTP requests, scrape webpages, and stream data in your JavaScript/Node.js/Sails.js app with a simple, `jQuery.get()`-like interface for processing server responses.
chrome-automation-mcp
jackzhao98
MCP server for browser automation with custom scripts
scrappey-wrapper
dormic97
Official Node.js wrapper for the Scrappey web scraping API. Bypass Cloudflare, Datadome, PerimeterX, and other antibot protections. Solve captchas automatically.
dom-parser
ershov-konst
Fast DOM parser based on regexps
@mdream/nuxt
GitHub Actions
Nuxt module for converting HTML pages to Markdown using mdream
the-a11y-machine
hywan
The A11y Machine is an automated accessibility testing tool which crawls and tests all pages of any website.
better-browse
mylesiyabor
Zero-dependency browser automation via Chrome DevTools Protocol with ARIA accessibility snapshots — 10-100x cheaper than vision-based approaches
@cd39390/mcp-web-crawler
cd39390
An MCP server plugin to crawl all hyperlinks from a website for AI learning purposes.
ayakashi
zisismaras
The next generation web scraping framework
opensteer
timjang3
Open-source browser automation SDK and CLI that lets AI agents build complex scrapers directly in your codebase.
proxys-site
urready
The official open-source codebase for Proxys.Site - A comprehensive proxy comparison tool and list.
@obscrd/robots
larsmosr
AI crawler blocking — generate robots.txt, meta tags, and HTTP headers for 30+ AI bots
devbridge-styleguide
devbproto
Style guide automation tool.
@hyperbrowser/agent
leoscope
Hyperbrowser's Web Agent
@askjo/camofox-browser
askjo
Headless browser automation server and OpenClaw plugin for AI agents - anti-detection, element refs, and session isolation
facebook-marketplace-cli
lotrez
CLI tool for Facebook Marketplace and Messenger automation
online-audit
omc345
MCP server for auditing a person's public online presence — Google search, GitHub, Reddit, web scraping
@thinkbrowse/cli
derivativelabs
CLI for controlling browsers via ThinkBrowse cloud and local infrastructure