Data Extraction
667 AI tools in the Data Extraction category
secret-agent
blakebyrnes
The Web Browser Built for Scraping
@devbookhq/docsets-pipeline-manager
valentatomas
CLI for adding new documentation to Devbook. The CLI command name is `docsets`. You can start documentation scraping and indexing based on configs in the `devbook-docsets` repository with the `create` sub-command. Then you can release documentation that f...
walkscape-helper
rikurb8
WalkScape helper - wiki scraping and AI-powered Q&A
crawlee-one
juro-oravec
Production-ready web scraping in a single function call. Built on Crawlee. Data transforms, caching, privacy compliance, and error tracking -- out of the box.
bright-data-scraping-browser-nodejs-playwright-project
steiner-hakas
Dependency Confusion to RCE By Steiner254
crawly-mccrawlface
budickda
Crawl data from webpages and apply content extraction.
@sharpapi/sharpapi-node-web-scraping
makowskid
SharpAPI.com Node.js SDK for Web Scraping API
@framers/agentos-ext-news-search
jdunnfive
News article search via NewsAPI for AgentOS
@jambudipa/spider
marknorgate
A comprehensive web scraping library with resumable operations, middleware support, and built-in rate limiting
googlethis
luanrt
A simple yet powerful module to retrieve organic search results and much more from Google.
@evointel/anno
evo-dragon
Web content extraction for AI agents — ensemble extraction with confidence scoring, 93% token reduction vs raw HTML
scrapegraph-js
vincigit00
Official JavaScript/TypeScript SDK for the ScrapeGraph AI API — smart web scraping powered by AI
@olib-ai/owl-browser-sdk
ahstanin
Node.js SDK for Owl Browser automation - Async-first with dynamic OpenAPI method generation
@algolia/netlify-plugin-crawler
h1fra
This plugin links your Netlify site with Algolia's Crawler. It will trigger a crawl on each successful build.
@xcrap/extractor
marcuth
Xcrap Extractor is a package of the Xcrap framework that handles data extraction from text files (currently only HTML, JSON, and Markdown) using declarative models.
@octivas/mcp
soeffing
MCP server for Octivas web scraping, crawling, and search API
@cle-does-things/scpr
cle-does-things
Simple and intuitive CLI tool and MCP server to perform web scraping operations.
gurkha
monitz87
Data extraction module
@sapkotamadan/cache-server
sapkotamadan
CacheServer is an efficient web page extractor that uses Puppeteer to launch a headless browser and fetch web page content.
@rtrvr-ai/core
bhavanikalisetty
Core runtime and API client primitives for rtrvr CLI/SDK