General Corpus of Contemporary Brazilian Portuguese with provenance and typology information - Corpus Geral do Português Brasileiro Contemporâneo
Cross-referenced across 55 tracked directories
Popularity Rank: #3523
Listed In: 1 / 55
Adoption Stage: Emerging
First Seen: 3/13/2026
Recently added to the ecosystem
an open dataset with 30 trillion tokens for training Large Language Models
Large-scale Artificial Intelligence Open Network
a foundational dataset by Meta for research on video learning and multimodal perception [Dataset Download](https://ego-exo4d-data.org/)
Exploring 12 million of the 2.3 billion images used to train Stable Diffusion's image generator