Governance-focused datasets, not model training datasets.
16 resources
A face image dataset designed for evaluating fairness in face analysis systems, with balanced representation across race, gender, and age groups to enable bias evaluation.
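A minimal sketch of the disaggregated evaluation such a balanced dataset supports: compute a metric per demographic group and compare the spread. The column names below ("race", "label", "prediction") are illustrative assumptions, not the dataset's actual schema.

```python
# Minimal sketch: disaggregated evaluation over a demographically balanced dataset.
# Column names are illustrative assumptions, not the dataset's actual schema.
import pandas as pd

def per_group_accuracy(df: pd.DataFrame, group_col: str) -> pd.Series:
    """Accuracy of model predictions within each demographic group."""
    correct = df["prediction"] == df["label"]
    return correct.groupby(df[group_col]).mean()

def max_accuracy_gap(df: pd.DataFrame, group_col: str) -> float:
    """Largest accuracy difference across groups (a simple bias signal)."""
    acc = per_group_accuracy(df, group_col)
    return float(acc.max() - acc.min())

# Toy frame standing in for model outputs on the dataset.
df = pd.DataFrame({
    "race": ["A", "A", "B", "B", "C", "C"],
    "label": [1, 0, 1, 0, 1, 0],
    "prediction": [1, 0, 1, 1, 0, 0],
})
print(per_group_accuracy(df, "race"))
print("max accuracy gap:", max_accuracy_gap(df, "race"))
```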
A collaborative benchmark for evaluating large language models across diverse tasks. Includes tasks designed to probe reasoning, knowledge, safety, and alignment properties.
Stanford's comprehensive framework for evaluating language models across multiple dimensions including accuracy, calibration, robustness, fairness, bias, toxicity, and efficiency.
Structured dataset of AI incidents and harms for research and analysis. Enables systematic study of AI failures, harm patterns, and risk factors across different AI applications.
A comprehensive fairness evaluation dataset containing 10,318 consensually sourced images of 1,981 unique subjects with extensive annotations. This dataset serves as a global benchmark for ethical data collection and responsible AI development, specifically designed to evaluate bias and fairness in AI systems.
FHIBE is the first publicly available, consensually collected, and globally diverse fairness evaluation dataset designed for human-centric computer vision tasks. The dataset serves as a global benchmark for ethical data collection and responsible AI development, enabling researchers and developers to evaluate fairness across diverse populations.
The Fair Human-Centric Image Benchmark (FHIBE) is an image dataset designed to evaluate AI systems for fairness and bias in computer vision applications. It implements best practices for responsible data curation and provides standardized benchmarks for testing algorithmic fairness across diverse human populations.
This resource explores methods for detecting bias in computer vision systems, including CNN feature descriptors and SVM classifiers for identifying bias in visual datasets. It examines how explainable AI techniques can improve transparency and trustworthiness of deep learning models used in computer vision applications.
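The dataset-bias test described above can be sketched as a dataset-classification experiment: if a simple SVM can predict which dataset a CNN feature vector came from well above chance, the datasets carry distinguishable (and potentially biased) visual signatures. In the sketch below the feature extraction step is stubbed with random vectors; in practice they would be pooled activations from a pretrained CNN.

```python
# Sketch of a dataset-classification bias test: train an SVM to tell two image
# datasets apart from their CNN feature vectors. Accuracy well above chance
# indicates the datasets encode systematically different visual statistics.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
features_a = rng.normal(loc=0.0, size=(200, 512))  # stand-in for CNN features, dataset A
features_b = rng.normal(loc=0.3, size=(200, 512))  # stand-in for CNN features, dataset B

X = np.vstack([features_a, features_b])
y = np.array([0] * len(features_a) + [1] * len(features_b))

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = LinearSVC(max_iter=5000).fit(X_train, y_train)

# Scores far above 0.5 suggest the two datasets are easy to tell apart.
print("dataset-classification accuracy:", clf.score(X_test, y_test))
```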
An English sequence classification model specifically trained on the MBAD Dataset to automatically detect bias and assess fairness in textual content, particularly news articles. This tool enables automated analysis of potential biases in written content through machine learning-based classification.
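A hedged sketch of how such a sequence classification model is typically run with the Hugging Face transformers pipeline. The model identifier below is a placeholder rather than the checkpoint's real Hub id, and the label names depend on how the MBAD-trained model was configured.

```python
# Sketch: running a text-classification checkpoint over news sentences.
# MODEL_ID is a placeholder; substitute the actual MBAD-trained model's identifier.
from transformers import pipeline

MODEL_ID = "path/or/hub-id-of-the-mbad-bias-model"  # placeholder, not a real id

classifier = pipeline("text-classification", model=MODEL_ID)

sentences = [
    "The senator's reckless plan will obviously ruin the economy.",
    "The senate approved the budget bill by a vote of 61 to 39.",
]
for sentence, result in zip(sentences, classifier(sentences)):
    # Each result is a dict like {"label": ..., "score": ...}; the label set
    # (e.g. "Biased" / "Non-biased") depends on the checkpoint's training.
    print(f"{result['label']:>12} ({result['score']:.2f})  {sentence}")
```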
A technical implementation of the HBAC algorithm that detects bias in algorithmic decision-making systems without requiring labeled data. The tool maximizes differences in bias variables between clusters and includes statistical testing to prevent false conclusions about discriminatory patterns.
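A simplified stand-in for that approach, purely to illustrate the idea (flat k-means rather than the HBAC hierarchy): cluster unlabeled samples, treat per-sample model error as the bias variable, and statistically test whether the highest-error cluster differs from the rest before drawing conclusions.

```python
# Simplified illustration of clustering-based bias detection without labels on
# the bias variable itself: cluster feature vectors, then test whether the
# worst cluster's error rate differs significantly from the remaining samples.
import numpy as np
from scipy import stats
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
features = rng.normal(size=(300, 8))             # unlabeled input features
errors = rng.binomial(1, 0.2, size=300)          # 1 = model made an error on this sample
errors[:60] = rng.binomial(1, 0.45, size=60)     # inject a higher-error subgroup
features[:60] += 2.5                             # make that subgroup separable in feature space

clusters = KMeans(n_clusters=4, n_init=10, random_state=1).fit_predict(features)

# Find the cluster with the highest error rate and test it against the rest.
rates = {c: errors[clusters == c].mean() for c in np.unique(clusters)}
worst = max(rates, key=rates.get)
in_cluster = errors[clusters == worst]
out_cluster = errors[clusters != worst]

# Welch's t-test on the binary error indicator; the statistical-testing step
# plays the same role of guarding against spurious "discrimination" findings.
t_stat, p_value = stats.ttest_ind(in_cluster, out_cluster, equal_var=False)
print(f"worst cluster error={in_cluster.mean():.2f}, others={out_cluster.mean():.2f}, p={p_value:.3f}")
```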
An AI safety index that evaluates model performance using Stanford's AIR-Bench 2024 (AI Risk Benchmark). The benchmark is designed to align with emerging government regulations and company policies for AI safety assessment.
This paper introduces version 0.5 of the AI Safety Benchmark developed by the MLCommons AI Safety Working Group. The benchmark is designed to assess the safety risks of AI systems that use chat-tuned language models, providing a standardized evaluation framework for AI safety.
MLCommons' AI Risk & Reliability working group develops tests and benchmarks for evaluating AI safety across specific use cases. The framework aims to summarize safety assessment results in ways that enable decision-making by non-experts through standardized benchmarking approaches.
A comprehensive dataset consolidating 12,067 data points across 791 evaluation measures covering 11 ethical principles for AI systems. The dataset is extracted from 257 computing literature sources and provides standardized metrics for evaluating the ethical dimensions of AI systems.
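An illustrative sketch of how such a consolidated measures dataset might be sliced by ethical principle with pandas. The column names ("principle", "measure", "source") are hypothetical stand-ins for the published schema, and the toy rows exist only so the snippet runs.

```python
# Sketch: slicing a consolidated evaluation-measures dataset by ethical principle.
# Column names and rows are hypothetical placeholders for the real schema.
import pandas as pd

df = pd.DataFrame([
    {"principle": "fairness",     "measure": "demographic parity gap", "source": "Paper A"},
    {"principle": "fairness",     "measure": "equalized odds gap",     "source": "Paper B"},
    {"principle": "transparency", "measure": "explanation fidelity",   "source": "Paper C"},
])

# How many distinct evaluation measures are attributed to each principle?
measures_per_principle = df.groupby("principle")["measure"].nunique().sort_values(ascending=False)
print(measures_per_principle)

# Pull every measure recorded under a given principle, with its source.
print(df.loc[df["principle"] == "fairness", ["measure", "source"]])
```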
Wikipedia article covering algorithmic bias, including well-documented examples like the COMPAS criminal risk assessment software that has been criticized for exhibiting racial bias. The article discusses how biased datasets can perpetuate and amplify discrimination in algorithmic decision-making systems.
A resource from Arize AI providing examples of algorithmic bias and practical tools for addressing model fairness issues in production environments. The resource highlights various bias mitigation tools including Google's PAIR AI tools for addressing fairness and bias in image datasets using TensorFlow.