How AI systems are assessed or tested.
21 resources
Template and guidance for conducting Fundamental Rights Impact Assessments as required by the EU AI Act for high-risk AI systems. It helps organizations assess potential impacts on fundamental rights and document mitigation measures.
Canada's Algorithmic Impact Assessment tool helps federal departments assess and mitigate risks associated with automated decision systems. It assigns an impact level (I-IV) based on the system's potential effects on individuals and society.
A unified framework for evaluating language models across hundreds of tasks. The LM Evaluation Harness provides standardized benchmarking for assessing model capabilities, safety, and alignment properties.
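As a rough illustration of how such a harness is typically invoked, the sketch below uses the library's Python entry point; the model name, task list, and the `simple_evaluate` call shown are assumptions based on common usage and should be checked against the project's current documentation.

```python
# Hedged sketch: running benchmarks through lm-evaluation-harness's Python API.
# Model and task names below are placeholders; adjust to your own setup.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                    # Hugging Face backend
    model_args="pretrained=EleutherAI/pythia-160m",
    tasks=["hellaswag", "arc_easy"],
    num_fewshot=0,
    batch_size=8,
)

# Per-task metrics (e.g. accuracy) are keyed by task name.
for task, metrics in results["results"].items():
    print(task, metrics)
```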
The ISO/IEC 25000 series (SQuaRE) provides a framework for software product quality requirements and evaluation. It establishes quality models, metrics, and evaluation processes applicable to AI systems as software products.
A practical template for systematically evaluating AI systems from initial concept to real-world deployment. The guide covers the assessment of risks, benefits, and impacts and serves as an operational playbook for organizations implementing AI systems.
This template provides a structured list of questions related to data protection issues that organizations should consider prior to conducting a Data Protection Impact Assessment. It serves as a practical guide for identifying and evaluating privacy risks in data processing activities.
Microsoft's template for conducting responsible AI impact assessments, used to evaluate AI systems against responsible AI goals and principles. It provides structured guidance for determining which goals apply to a given system and for assessing its potential impacts.
A mandatory risk assessment questionnaire that supports Canada's Treasury Board Directive on Automated Decision-Making. The tool determines the impact level of an automated decision system based on its risks and the mitigation measures in place.
This report examines Canada's Algorithmic Impact Assessment (AIA) process through interviews with Treasury Board oversight team members and a Canadian immigration lawyer. It provides insights into how the AIA framework operates in practice and its real-world impacts on government algorithmic decision-making, particularly in immigration cases.
The NIST AI Risk Management Framework is a voluntary framework designed to help organizations incorporate trustworthiness considerations into AI products, services, and systems. It provides guidance for the design, development, use, and evaluation of AI technologies with a focus on risk management and responsible AI practices.
A comprehensive risk assessment guide developed by the UC AI Council to help evaluate AI systems and their potential risks within university settings. The guide provides structured methodologies for assessing AI model training, bias risks, development processes, and validation procedures for institutional AI deployments.
A practical guide for conducting AI risk assessments that covers how to identify potential harms from AI systems and evaluate their likelihood. The resource provides methodologies for implementing mitigation measures and documenting risk assessment processes for governance compliance.
A comprehensive framework developed by The Institute of Internal Auditors for auditing artificial intelligence systems and implementations. The framework provides guidance and methodologies for internal auditors to assess AI-related risks, controls, and governance structures within organizations.
A comprehensive auditing checklist developed by the European Data Protection Board for evaluating AI algorithms based on machine learning. The document covers the complete AI lifecycle including algorithm training, pre-processing, and operational implementation stages from a data processing perspective.
A comprehensive checklist and framework for auditing AI systems, focusing on technical evaluation procedures and compliance requirements. The resource emphasizes building automated testing pipelines for continuous monitoring of AI system performance and data quality within CI/CD environments.
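To give a flavour of the kind of automated check such a pipeline might run, here is a minimal pytest-style sketch; the file path, column names, and thresholds are purely illustrative assumptions, not part of the checklist itself.

```python
# Generic sketch of a data-quality check that could run in a CI/CD pipeline.
import pandas as pd

REQUIRED_COLUMNS = {"feature_a", "feature_b", "label"}
MAX_NULL_FRACTION = 0.01

def test_dataset_quality():
    df = pd.read_csv("data/eval_set.csv")  # hypothetical evaluation dataset
    # Schema check: all expected columns must be present.
    assert REQUIRED_COLUMNS.issubset(df.columns)
    # Completeness check: null rate stays below the agreed threshold.
    assert df[list(REQUIRED_COLUMNS)].isna().mean().max() <= MAX_NULL_FRACTION
    # Label sanity check: only known classes appear.
    assert set(df["label"].unique()) <= {0, 1}
```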
A step-by-step guide that explains how to use fairness metrics to detect and quantify bias in AI models. The resource helps practitioners identify where AI systems may produce disparate outcomes for certain groups and provides methods for building more equitable AI systems.
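To make the idea of fairness metrics concrete, here is a small, self-contained sketch (plain NumPy, not taken from the guide itself) that computes two common group-fairness metrics from binary predictions and a group label.

```python
# Illustrative sketch: demographic parity difference and disparate impact ratio.
import numpy as np

def selection_rates(y_pred: np.ndarray, groups: np.ndarray) -> dict:
    """Fraction of positive predictions per group."""
    return {g: float(y_pred[groups == g].mean()) for g in np.unique(groups)}

def demographic_parity_difference(y_pred, groups):
    """Largest gap in selection rate between any two groups (0 = parity)."""
    rates = selection_rates(np.asarray(y_pred), np.asarray(groups))
    return max(rates.values()) - min(rates.values())

def disparate_impact_ratio(y_pred, groups):
    """Ratio of the lowest to the highest group selection rate (1 = parity)."""
    rates = selection_rates(np.asarray(y_pred), np.asarray(groups))
    return min(rates.values()) / max(rates.values())

# Example: predictions for two groups "A" and "B".
y_pred = np.array([1, 0, 1, 1, 0, 0, 1, 0])
groups = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])
print(demographic_parity_difference(y_pred, groups))  # 0.5
print(disparate_impact_ratio(y_pred, groups))         # ~0.33
```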
This resource provides guidance on evaluating machine learning models for fairness and bias using Google Cloud's Vertex AI platform. It explains how unfair models can cause systemic harm to underrepresented groups and offers specific evaluation metrics to detect bias during data collection and post-training evaluation processes.
This research paper presents a scoping review analyzing fairness techniques in clinical AI applications and identifies evidence gaps in current methodologies. The study examines group fairness approaches, outcome fairness metrics, and various bias mitigation methods used in healthcare AI systems.
Technical documentation for evaluating AI models using Google Cloud's Vertex AI platform. Covers methods for running batch inference jobs and preparing ground truth data for model assessment using both AutoML and custom training approaches.
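As an informal illustration of the batch inference workflow described here, the sketch below uses the Vertex AI Python SDK; the project, model resource name, and bucket paths are placeholders, and the exact call signatures should be verified against Google Cloud's current documentation.

```python
# Hedged sketch: submitting a batch prediction job with the Vertex AI SDK.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

model = aiplatform.Model(
    "projects/my-project/locations/us-central1/models/MODEL_ID"  # placeholder
)

batch_job = model.batch_predict(
    job_display_name="eval-batch-inference",
    gcs_source="gs://my-bucket/eval/instances.jsonl",        # placeholder input
    gcs_destination_prefix="gs://my-bucket/eval/predictions/",
    machine_type="n1-standard-4",
)
# Predictions land in the GCS destination for scoring against ground truth.
batch_job.wait()
```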
OLMES is a standardized framework for reproducible language model evaluations that is open, practical, and fully documented. It can be applied to existing leaderboards and evaluation code bases to ensure consistent and reliable AI model assessment.
DeepEval is an open-source framework for evaluating and testing large language model systems. It provides a simple, pytest-like interface specialized for unit testing LLM outputs and performance.
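As a hedged illustration of that pytest-style workflow, the sketch below follows DeepEval's documented pattern of test cases and metrics; running it requires the library's configured evaluation model, and the class names and threshold parameter should be confirmed against the current release.

```python
# Rough sketch of a DeepEval-style unit test for an LLM output.
from deepeval import assert_test
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric

def test_answer_relevancy():
    test_case = LLMTestCase(
        input="What is the capital of France?",
        actual_output="The capital of France is Paris.",
    )
    # Fails the test if relevancy falls below the threshold.
    assert_test(test_case, [AnswerRelevancyMetric(threshold=0.7)])
```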