Data Extraction

651

AI tools in the Data Extraction category

All (651) | MCP Servers (64) | Skills (557) | Agents (30)

@agentic-intelligence/dom-engine

chapa0711

Agentic DOM Intelligence - A lightweight TypeScript library for DOM analysis and manipulation, designed for web automation and AI agents

Skill | Data Extraction

11 dir

@activepieces/piece-browserless

abdul_activepiecer

Browserless is a cloud-based browser automation platform that allows you to run full Chrome sessions remotely for tasks like taking screenshots, scraping data, converting pages to PDFs, and more without writing scraping code or managing servers.

Skill | Data Extraction

1 dir

openscrape-cl

snowbase-studio

Interactive CLI tool for web scraping with Puppeteer. Extract titles, descriptions, links, headings, paragraphs, and full text from any website.

Skill | Data Extraction

1 dir

@mithron/deezer-music-metadata

mithron

A TypeScript package for scraping Deezer music (supports tracks, albums, playlists, and share links); also includes a search option

Skill | Data Extraction

1 dir

tavily-cli

pyrytakala

CLI for the Tavily AI search API - search, extract, crawl, and map the web

Skill | Data Extraction

1 dir

reviewbr-mcp

vic3m

MCP Server for Brazilian Academic Repositories (OAI-PMH, DSpace REST, HTML scraping) and PRISMA Systematic Reviews

MCP Server | Data Extraction

1 dir

xtor

daijiahua

Declarative HTML data extraction library with schema-based selectors

Skill | Data Extraction

1 dir

parallaxapis-sdk-playwright

pxcaptcha

ParallaxAPIs SDK

Skill | Data Extraction

351 dir

n8n-nodes-serpex

divyeshradadiya

n8n community node for Serpex - Real-time search results from Google, Bing, DuckDuckGo, and more

Skill | Data Extraction

1 dir

ag-webscrape

GitHub Actions

TypeScript web scraper with Playwright fallback for anti-scraping protection

Skill | Data Extraction

1 dir

@testmuai/testmu-cloud

testmu

The AI Browser Automation SDK for TestMu AI Browser Cloud

Agent | Data Extraction

1 dir

crawl-server

imike3049

Efficient SEO-focused server for Wasm-generated pages

Skill | Data Extraction

11 dir

@dmsdc-ai/aigentry-dustcraw

duckyoung_kim

Airborne signal absorber — collects floating public data (RSS/API/web) and feeds aigentry-brain

Skill | Data Extraction

1 dir

camofox-browser

redf0x1

Anti-detection browser server for AI agents — REST API wrapping Camoufox engine with OpenClaw plugin support

MCP Server | Data Extraction

421 dir

open-web-unlocker

GitHub Actions

Fetch public web pages through a configurable fetch/browser pipeline and parse them into structured JSON or clean markdown.

MCP Server | Data Extraction

41 dir

@crawl-me-maybe/sitemap

autopsyaardvark

A generic sitemap generation Vite plugin. Outputs sitemap.xml and robots.txt files after build. **This does not scan your directory for output routes; that approach only works for fully static sites. ISR and SSR are off-limits, hence I made this.**

Skill | Data Extraction

1 dir

@4ier/neo

4ier

Turn any website into an AI-callable API. Passive traffic capture, API schema generation, and execution.

Agent | Data Extraction

6021 dir

apex-scraper

semo_dev

A stealth web scraper for crawling websites and extracting clean text content with page and word limits.

Skill | Data Extraction

1 dir

@monibrand/se-scraper

sviande

A module using Puppeteer to scrape several search engines, such as Google, Bing, and DuckDuckGo

Skill | Data Extraction

31 dir

almuten-scraper

oliver797

A tool for scraping and calculating almuten (planetary dignity) in astrology

Skill | Data Extraction

11 dir