Data Extraction
667 AI tools in the Data Extraction category
@xcrap/extractor
marcuth
Xcrap Extractor is a package of the Xcrap framework. It handles data extraction from text files (currently only HTML, JSON, and Markdown) using declarative models.
crawl-obj
vkolluru1974
A utility package for telecom automation and integration. Includes telecom-mas-agent and other useful libraries.
nautiljon-scraper-mod
junkofly
Scraping tool for Nautiljon, the anime and manga website
crawl-dir
vkolluru1974
A utility package for telecom automation and integration. Includes telecom-mas-agent and other useful libraries.
@cle-does-things/scpr
cle-does-things
Simple and intuitive CLI tool and MCP server to perform web scraping operations.
transparent-proxy
gr3p
A real transparent HTTP proxy server. Send your requests upstream wherever you want!
deepcrawl
felixlyu1018
JavaScript/TypeScript SDK for Deepcrawl API
browser-tls-fetch
dan1ve
fetch-compatible HTTP client with TLS fingerprinting
gurkha
monitz87
Data extraction module
koonjs
scrapehub
Browser-impersonating HTTP client with TLS/HTTP2 fingerprint spoofing
@rahulxf/random-user-agent
rahulxf
Generates random user agents
hrequests-js
jwriter20
TypeScript port of the hrequests library: a full-featured HTTP client with TLS fingerprinting and browser automation
x-crawl
coderhxl
x-crawl is a flexible Node.js AI-assisted crawler library.
sl-dbmaria
putraadtya26
A powerful web scraping tool for everything
@absahmad/wreq-js
GitHub Actions
Node.js/TypeScript HTTP client with browser TLS fingerprint impersonation (JA3/JA4). Bypass Cloudflare and anti-bot detection. Rust-powered, fetch()-compatible.
component-search2
timaschew
Search through crawl components
@teng-lin/agent-fetch
teng-lin
Full-content web fetcher with Chrome TLS fingerprinting and multi-strategy content extraction
cloudbypass-skill
cloudbypass
An OpenClaw skill implementation of the 穿云 (Cloudbypass) API, used to bypass Cloudflare and other anti-scraping protections
top-user-agents
kikobeats
An always up-to-date list of the top 100 most common browser user-agents for HTTP requests
rebrowser-playwright-core
nwebson
A drop-in replacement for playwright-core, patched with rebrowser-patches. It allows scripts to pass modern automation detection tests.