Data Extraction
655 AI tools in the Data Extraction category
plato-cli
shinpads
A client for the Plato API
playwright-afp
paleksic
Stop website fingerprinting techniques (Playwright edition)
spider-browser
jeffmendez
Browser automation client for Spider's pre-warmed browser fleet with smart retry and browser switching
request-group-puppeteer
gabrielenunez
Simplifies making requests through Puppeteer instances and sending multiple Puppeteer requests at the same time
postcss-obfuscator
n4j!b-r4ch!d
PostCSS plugin that helps you protect your CSS code by obfuscating class names and IDs, with customizable configuration.
instagram-profilecrawl
nacimgoura
Quickly crawl the information (e.g., followers, tags) of an Instagram profile. No login required!
crawly-ai
mateosanchezl
A simple, lightweight AI web scraping tool.
puppeteer-dsl
311ecode
An intuitive DSL for Puppeteer, simplifying web automation and testing. Currently in alpha and subject to change.
@hillwoodpark/gcp-logger
timjohns
Logger that creates messages in a format that is roughly compatible with Google Cloud Platform log-scraping in App Engine, Google Cloud Functions, and probably several other services
puppeteer-afp-with-vendor
xiloe
Stop website fingerprinting techniques
unfluffjs
yknx4
A web page content extractor
terminal-scrapearange
wolfram77
Terminal interface implementation for ranged web scraping.
just-scrape
vincigit00
ScrapeGraph AI CLI tool
@teng-lin/agent-fetch
teng-lin
Full-content web fetcher with Chrome TLS fingerprinting and multi-strategy content extraction
cloudbypass-skill
cloudbypass
OpenClaw skill implementation of the 穿云 (Cloudbypass) API, used to bypass Cloudflare and other anti-scraping protections
@ghx-dev/core
GitHub Actions
GitHub execution router for AI agents with deterministic routing and normalized output.
@tabstack/pilo
tabstack
AI-powered web automation library and CLI tool
getcontentapi
stabem
Official TypeScript/Node.js SDK for ContentAPI — extract content from any URL
scrapix-cli
simiokunowo
A TypeScript-based CLI application for scraping Google Images
@dmsdc-ai/aigentry-dustcraw
duckyoung_kim
Airborne signal absorber: collects floating public data (RSS/API/web) and feeds it to aigentry-brain