Skills
85,427 · Reusable AI skills and capabilities for agent workflows
A compendium of information regarding Stable Diffusion (SD)
This repo contains the official code, data, and sample inversions for the Textual Inversion paper
Deforum extension for AUTOMATIC1111's Stable Diffusion webui [[wiki docs]](https://github.com/deforum-art/sd-webui-deforum/wiki)
Image inpainting tool powered by a SOTA AI model
Generating retro pixel game characters with Generative Adversarial Networks. Dataset "TinyHero" included.
GFPGAN aims at developing Practical Algorithms for Real-world Face Restoration
Open Diffusion Models for High-Quality Video Generation
Implementation of MusicLM, Google's new SOTA model for music generation using attention networks, in PyTorch
A simple notebook demonstrating prompt-based music generation via the Mubert API
EPUB to audiobook converter, optimized for Audiobookshelf
A curated list of resources of audio-driven talking face generation
"A multi-voice TTS system trained with an emphasis on quality"
Port of OpenAI's Whisper model in C/C++. It can be executed locally.
An Optimized Speech-to-Text Pipeline for the Whisper Model
Accelerates transcription by combining OpenAI's Whisper Large v2, HF Transformers, Optimum, and flash attention
Foundational Models for State-of-the-Art Speech and Text Translation
Latest Papers and Datasets on Multimodal Large Language Models, and Their Evaluation.
Examples showing how to use the OpenAI vision API to run inference on images, video files and webcam streams
ImageBind: One Embedding Space to Bind Them All