Data Extraction
654 AI tools in the Data Extraction category
@harvestapi/scraper
xorcuit
HarvestAPI provides LinkedIn data scraping tools for real-time, high-performance scraping at a low cost. The API lets you search for LinkedIn `jobs`, `companies`, `profiles`, and `posts` using a wide range of filters.
node-web-crawler
jaykshah
Node Web Crawler is a web spider written in Node.js. It gives you the full power of jQuery on the server to parse a large number of pages asynchronously as they are downloaded. Scraping should be simple and fun!
@ogulcancelik/pi-web-browse
ogulcancelik
Web search and content extraction skill for pi-coding-agent. Search the web and fetch pages via a real headless browser (CDP). Works on Linux, macOS, and Windows.
codehs_grades
randomdevd3v
This is a [NodeJS](https://nodejs.org/) tool using the [Puppeteer](https://developers.google.com/web/tools/puppeteer) headless browser to crawl the [CodeHS](https://codehs.com) code teaching platform for a teacher's students' grades.
@xcrap/got-scraping-client
marcuth
Xcrap Got Scraping Client is a package of the Xcrap framework that implements an HTTP client using the Got Scraping library.
site-crawl
raj1000
A CLI tool to recursively crawl websites and download content
crawly-mccrawlface
budickda
Crawl data from webpages and apply content extraction.
mycrawl
zunkun
Crawl a specific website
@cap.js/widget
tiagozip
Client-side widget for Cap, a lightweight, modern open-source CAPTCHA alternative designed using SHA-256 PoW.
twitter-crawler
herchu
NodeJS Crawler for Twitter
adex-linkedin-scrapper
adefemigreat
Flexible LinkedIn scraper developed by Adefemigreat
@hyperbrowser/sdk
leoscope
Node SDK for Hyperbrowser API
rebrowser-puppeteer-core
nwebson
A drop-in replacement for puppeteer-core patched with rebrowser-patches. It lets you pass modern automation detection tests.
getcontentapi
stabem
Official TypeScript/Node.js SDK for ContentAPI — extract content from any URL
apify
GitHub Actions
The scalable web crawling and scraping library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.
@mihnea.dev/recaptcha-solver
mihnea.dev
[npm package](https://www.npmjs.com/package/@mihnea.dev/recaptcha-solver) | [MIT license](https://opensource.org/licenses/MIT)
@letsscrapedata/scraper
letsscrapedata
Web scraper that scrapes web pages using LetsScrapeData XML templates
@nampham1106/search-cli
nampham1106
A modern TypeScript CLI tool for web search and content fetching powered by DuckDuckGo
scrapix-cli
simiokunowo
A TypeScript-based CLI Application for scraping Google images
camofox-browser
redf0x1
Anti-detection browser server for AI agents — REST API wrapping Camoufox engine with OpenClaw plugin support