Data Extraction
651 AI tools in the Data Extraction category
@agentic-intelligence/dom-engine
chapa0711
Agentic DOM Intelligence - A lightweight TypeScript library for DOM analysis and manipulation, designed for web automation and AI agents
@activepieces/piece-browserless
abdul_activepiecer
Browserless is a cloud-based browser automation platform that allows you to run full Chrome sessions remotely for tasks like taking screenshots, scraping data, converting pages to PDFs, and more without writing scraping code or managing servers.
openscrape-cl
snowbase-studio
Interactive CLI tool for web scraping with Puppeteer. Extract titles, descriptions, links, headings, paragraphs, and full text from any website.
@mithron/deezer-music-metadata
mithron
A TypeScript package for scraping Deezer music metadata. Supports tracks, albums, playlists, and share links, and also includes a search option.
tavily-cli
pyrytakala
CLI for the Tavily AI search API - search, extract, crawl, and map the web
reviewbr-mcp
vic3m
MCP Server for Brazilian Academic Repositories (OAI-PMH, DSpace REST, HTML scraping) and PRISMA Systematic Reviews
xtor
daijiahua
Declarative HTML data extraction library with schema-based selectors
parallaxapis-sdk-playwright
pxcaptcha
ParallaxAPIs SDK
n8n-nodes-serpex
divyeshradadiya
n8n community node for Serpex - Real-time search results from Google, Bing, DuckDuckGo, and more
ag-webscrape
GitHub Actions
TypeScript web scraper with Playwright fallback for anti-scraping protection
@testmuai/testmu-cloud
testmu
The AI Browser Automation SDK for TestMu AI Browser Cloud
crawl-server
imike3049
Efficient SEO-focused server for Wasm-generated pages
@dmsdc-ai/aigentry-dustcraw
duckyoung_kim
Airborne signal absorber — collects floating public data (RSS/API/web) and feeds aigentry-brain
camofox-browser
redf0x1
Anti-detection browser server for AI agents — REST API wrapping Camoufox engine with OpenClaw plugin support
open-web-unlocker
GitHub Actions
Fetch public web pages through a configurable fetch/browser pipeline and parse them into structured JSON or clean markdown.
@crawl-me-maybe/sitemap
autopsyaardvark
A generic sitemap generation Vite plugin. Outputs sitemap.xml and robots.txt files after build. **This does not scan your directory for outputted routes; that approach only works for fully static sites. ISR and SSR are off-limits, hence I made this.**
@4ier/neo
4ier
Turn any website into an AI-callable API. Passive traffic capture, API schema generation, and execution.
apex-scraper
semo_dev
A stealth web scraper for crawling websites and extracting clean text content with page and word limits.
@monibrand/se-scraper
sviande
A module that uses Puppeteer to scrape several search engines, such as Google, Bing, and DuckDuckGo
almuten-scraper
oliver797
A tool for scraping and calculating almuten (planetary dignity) in astrology