Data Extraction
682 AI tools in the Data Extraction category
ghost-puppet
klxy
Undetectable browser automation library with Cloudflare bypass and bot detection evasion. Stealth automation for web scraping, automated testing, and undetected browser automation. Built on Chrome CDP.
instagram-scraping
rzlyp
NPM module for loading media by hashtag without instagram API
clawpage-mcp
clawpage
MCP server for ClawPage web extraction API. Extract and structure any web page into clean JSON.
deepspider
pony-ma
Intelligent crawler engineering platform - an AI crawler agent built on DeepAgents + Patchright
@pinkpixel/web-scout-mcp
sizzlebop
MCP server for web search and content extraction with multiple URL support and memory optimizations
webcrawlerapi-js
niiotyo
JS client for WebcrawlerAPI
chrome-automation-mcp
jackzhao98
MCP server for browser automation with custom scripts
@activepieces/piece-browserless
abdul_activepiecer
Browserless is a cloud-based browser automation platform that allows you to run full Chrome sessions remotely for tasks like taking screenshots, scraping data, converting pages to PDFs, and more without writing scraping code or managing servers.
ag-webscrape
GitHub Actions
TypeScript web scraper with Playwright fallback for anti-scraping protection
scrappey-wrapper
dormic97
Official Node.js wrapper for the Scrappey web scraping API. Bypass Cloudflare, Datadome, PerimeterX, and other antibot protections. Solve captchas automatically.
oxylabs-ai-studio
oxybrain
JavaScript SDK for Oxylabs AI Studio API services
@browserbasehq/convex-stagehand
GitHub Actions
Convex component for AI-powered browser automation with Stagehand
web-content-extract
amoyensis
A library and command-line tool to extract clean content from web pages using Mozilla Readability and convert it to Markdown or JSON.
@activepieces/piece-scrapegrapghai
abdul_activepiecer
ScrapeGraphAI is a powerful web scraping and content extraction API. This piece enables integration with ScrapeGraphAI's API to perform smart scraping, local scraping, and markdown conversion.
@brightdata/ai-sdk
brd-cholpon
Bright Data tools for Vercel AI SDK - scrape, search, and dataset collection
omnifetch-lib
visy_ani
Universal content extraction library with tiered fetching strategies
@dominusnode/openclaw-plugin
0xcircuitbreaker
Dominus Node proxy plugin for OpenClaw — route web requests through rotating proxy networks
@keak/webmcp-core
eamonnkeak
Auto-generate WebMCP tool definitions from any website
raggle-js
raggle_npm
JavaScript client for Raggle API
arcfetch
briansunter
Fetch URLs, extract clean article content, and cache as markdown. Supports automatic JavaScript rendering via Playwright.