Quantity Isn't Quality
There are thousands of AI skills available across various platforms and repositories. That sounds great until you actually try to find one for a specific task. You'll find ten skills that claim to do code review. Three of them haven't been updated in months. Two work only with a specific model version. One is actually just a wrapper around another skill. And the remaining four vary wildly in quality.
This isn't a problem unique to AI skills. Every software marketplace goes through a phase where quantity grows faster than quality. App stores, plugin marketplaces, and package registries all experienced the same pattern. But for AI skills, the quality problem is harder to detect because skill outputs are variable by nature. A skill that works 80% of the time might look fine during a quick test but frustrate you in daily use.
Why Quality Is Hard to Measure
Traditional software either works or it doesn't. An npm package that handles dates either parses your format correctly or throws an error. AI skills operate in a gray zone. A summarization skill might produce a summary that's technically accurate but misses the key points. A code review skill might catch syntax issues but miss logic bugs. You won't know until you've used it enough to see the patterns.
This makes skill evaluation time-consuming. You can't just install a skill, run one test, and decide. You need to run it against multiple inputs, compare its output against what you'd produce manually, and assess whether the quality level is good enough for your use case. That's a real investment of time, and most people don't make it before committing to a skill.
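One way to make that investment concrete is a small evaluation harness: run the skill over a handful of representative inputs, check each output against what a manual pass would have to include, and track a pass rate before committing. Below is a minimal sketch under assumptions: run_skill is a stand-in for however you actually invoke the skill (CLI, API, or agent call), and the key-point check is deliberately crude.

```python
# Minimal skill-evaluation harness (sketch). run_skill is a placeholder
# stand-in for whatever mechanism actually invokes the skill.

def run_skill(skill_name: str, input_text: str) -> str:
    """Stand-in that just echoes the input; replace with your real invocation."""
    return input_text

# Each case pairs a realistic input with the key points a manual pass would keep.
TEST_CASES = [
    {
        "input": "Q3 report: revenue up 12%, churn up 3%, hiring freeze announced.",
        "must_mention": ["revenue", "churn", "hiring freeze"],
    },
    {
        "input": "Incident review: outage caused by expired TLS cert; rotation now automated.",
        "must_mention": ["outage", "cert", "automated"],
    },
]

def evaluate(skill_name: str, threshold: float = 0.8) -> bool:
    """Run the skill over every case and require a minimum pass rate."""
    passed = 0
    for case in TEST_CASES:
        output = run_skill(skill_name, case["input"]).lower()
        hits = [p for p in case["must_mention"] if p.lower() in output]
        if len(hits) == len(case["must_mention"]):
            passed += 1
        else:
            print(f"MISSED key points in: {case['input'][:50]}...")
    pass_rate = passed / len(TEST_CASES)
    print(f"{skill_name}: {passed}/{len(TEST_CASES)} cases passed ({pass_rate:.0%})")
    return pass_rate >= threshold

if __name__ == "__main__":
    # Swap in the skill you're evaluating; set the threshold to your own quality bar.
    evaluate("summarize-notes", threshold=0.8)
```

A keyword check like this is a low bar. For subjective outputs you might compare side by side against your own draft or grade with a rubric, but even a crude harness run over several inputs surfaces the skills that miss key points consistently.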
The Curation Gap
What's missing is reliable curation. Cross-referencing across directories helps for MCP servers because presence in multiple curated lists is a quality signal. For skills, the curation infrastructure is less developed. There are fewer skill-specific directories, fewer review processes, and fewer quality metrics.
Security scoring helps with the technical quality dimension but doesn't capture output quality. A skill can be perfectly secure and well-maintained while still producing mediocre results. Output quality needs different evaluation approaches, and the ecosystem hasn't standardized them yet.
What Works Now
Until better curation exists, the most reliable approach is to lean on community recommendations. When a developer you trust says a specific skill works well for a specific use case, that recommendation carries more weight than any listing or metric. Building a network of people who evaluate and share skill recommendations is the most practical strategy for finding quality in a noisy marketplace.
Testing skills yourself against your actual use cases is the second-best approach. It takes time upfront but saves frustration downstream. A skill that passes your own quality bar is worth more than one with impressive metrics but untested performance in your context.