Evaluate and benchmark your LLM applications for quality, safety, and performance.
7 articles
Get an introduction to the LLM evaluation platform and its key concepts.
Create and run evaluation experiments to test your models.
Upload, browse, and manage evaluation datasets.
Set up evaluation metrics and scoring thresholds.
Run demographic bias audits against compliance requirements such as NYC Local Law 144 and EEOC guidelines.
View and manage the AI models used across your evaluation experiments.
Compare two models side by side on the same prompt.