Open-Source Toolkit for Efficient Unstructured Data Processing with Pre-built Modules and Local-to-Cluster Scalability.
Cross-referenced across 55 tracked directories
Popularity Rank: #2561
Listed In: 1 / 55
Adoption Stage: Emerging
First Seen: Mar 13, 2026
Recently added to the ecosystem
Score: 100/100
0 dependency vulnerabilities found
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
A powerful tool for creating high-quality training datasets for Large Language Models.