Q1 & Q2, 2024 Update: A Comprehensive Guide for GenAI Safety and Security
Welcome to this curated list of tools, frameworks, and best practices for securing and responsibly using generative AI.
This resource is a work in progress. We encourage community contributions! See Contributing for details.
Red Teaming and Adversarial ML
Counterfit: Microsoft's open-source automation tool for security testing of AI systems.
PyRIT: Microsoft's Python Risk Identification Toolkit for generative AI, a framework for red teaming foundation models and their applications.
Adversarial Robustness Toolbox (ART): A Python library for machine learning security, covering evasion, poisoning, extraction, and inference attacks (see the evasion-attack sketch after this list).
TrojAI: A Python library for generating trojaned AI models, helpful for simulating backdoor attacks.
Adversarial Policies for Safeguarding LLMs (APSL): A framework for creating adversarial policies for red teaming LLMs.
RealToxicityPrompts: A dataset of naturally occurring web-text prompts for evaluating toxic degeneration in language models.
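The sketch below shows the kind of test these libraries automate: a simple evasion attack against a scikit-learn classifier using ART's Fast Gradient Method. The dataset, model, and epsilon are illustrative choices, and the API assumes a recent ART release.

```python
# A minimal sketch of an evasion attack with ART (illustrative model and
# parameters; adapt to your own estimator and threat model).
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from art.estimators.classification import SklearnClassifier
from art.attacks.evasion import FastGradientMethod

X, y = load_digits(return_X_y=True)
X = X / 16.0  # scale pixel values into [0, 1]

model = LogisticRegression(max_iter=1000).fit(X, y)
classifier = SklearnClassifier(model=model, clip_values=(0.0, 1.0))

# Craft adversarial examples with the Fast Gradient Method (eps is illustrative).
attack = FastGradientMethod(estimator=classifier, eps=0.1)
X_adv = attack.generate(x=X)

print(f"clean accuracy:       {model.score(X, y):.3f}")
print(f"adversarial accuracy: {model.score(X_adv, y):.3f}")
```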
Benchmarking and Evaluation
HELM: Stanford's Holistic Evaluation of Language Models.
MLCommons AI Safety Benchmark: A standardized benchmark for measuring LLM safety.
NIST GenAI (Evaluating Generative AI): NIST's evaluation program for assessing generative AI technologies.
BIG-bench: A collaborative benchmark for measuring language model capabilities.
Dynabench: A platform for dynamic benchmarking of AI models.
EleutherAI Language Model Evaluation Harness: A framework for few-shot evaluation of language models across a large suite of benchmark tasks (see the sketch after this list).
TruthfulQA: A benchmark for measuring LLM truthfulness.
Ethics in AI Benchmark: A benchmark for evaluating ethical considerations in AI.
RobustBench: A benchmark for evaluating the robustness of image classification models.
ImageNet-C: A benchmark for evaluating the robustness of image classifiers to corruptions.
GLUE Benchmark: A collection of resources for evaluating NLU models.
Dioptra: NIST's software testbed for assessing the trustworthy characteristics of AI, with a focus on adversarial attacks and data poisoning.
Inspect: The U.K. AI Safety Institute's toolkit for assessing model capabilities and safety.
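For a sense of how these harnesses are typically driven, the sketch below runs TruthfulQA through the EleutherAI Evaluation Harness programmatically. The model and task names are illustrative, and the call assumes a recent (v0.4+) lm-eval release.

```python
# A minimal sketch of running a benchmark task with the EleutherAI
# Language Model Evaluation Harness (assumes lm-eval >= 0.4; the model
# and task names here are illustrative).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                      # Hugging Face backend
    model_args="pretrained=gpt2",    # illustrative small model
    tasks=["truthfulqa_mc2"],        # TruthfulQA multiple-choice task
    batch_size=8,
)

# Per-task metrics are returned under "results".
for task, metrics in results["results"].items():
    print(task, metrics)
```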
Data Security and Privacy
Privacy-Preserving Machine Learning (PPML): Resources on techniques for training and serving models without exposing the underlying sensitive data.
Federated Learning: Resources for training AI models on decentralized data.
Differential Privacy: Resources on adding calibrated noise to computations over data so that no individual record can be inferred from the results (see the sketch after this list).
CrypTen: Meta's PyTorch-based framework for privacy-preserving machine learning built on secure multi-party computation.
TensorFlow Privacy: Google's library for training models with differential privacy.
OpenMined: A community developing tools for privacy-preserving machine learning.
PySyft: A library for secure and private deep learning.
Microsoft's Privacy-Preserving Machine Learning: Resources for building privacy-protecting AI systems.
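As a concrete illustration of the differential-privacy idea above, the library-free sketch below applies the Laplace mechanism to a simple counting query. The data and epsilon are illustrative; real systems should rely on a vetted library such as TensorFlow Privacy.

```python
# A minimal, library-free sketch of the Laplace mechanism for a counting
# query (illustrative only; use a vetted DP library in production).
import numpy as np

rng = np.random.default_rng(0)

def dp_count(values, threshold, epsilon):
    """Return a differentially private count of values above a threshold.

    A counting query changes by at most 1 when a single record is added
    or removed, so its sensitivity is 1 and Laplace noise with scale
    1/epsilon yields epsilon-differential privacy.
    """
    true_count = int(np.sum(np.asarray(values) > threshold))
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

ages = [23, 35, 41, 52, 67, 29, 74]  # illustrative records
print(dp_count(ages, threshold=40, epsilon=0.5))
```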
Model Explainability and Interpretability
AI Explainability 360: IBM's toolkit for explaining AI model decisions.
LIME: Local Interpretable Model-agnostic Explanations, a technique that explains individual classifier predictions by fitting local surrogate models.
SHAP: SHapley Additive exPlanations, a game-theoretic method for attributing individual model predictions to input features (see the sketch after this list).
Captum: Meta's model interpretability library for PyTorch.
Gemma Scope: Google DeepMind's set of sparse autoencoders (SAEs) for interpreting Gemma 2 models.
Mishax: Google DeepMind's tool for aiding language model interpretability, used in the development of Gemma Scope.
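The sketch below shows a typical SHAP workflow: exact Shapley-value attributions for a tree-ensemble regressor, summarized across a dataset. The model and dataset are illustrative choices.

```python
# A minimal sketch of explaining model predictions with SHAP
# (the model and dataset are illustrative choices).
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# TreeExplainer computes exact Shapley values for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # one attribution per feature per sample

# Summarize which features drive the model's predictions overall.
shap.summary_plot(shap_values, X)
```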
AI Ethics and Bias Mitigation
AI Fairness 360: IBM's toolkit for detecting and mitigating bias in datasets and models.
Fairlearn: A Python package for assessing and mitigating fairness issues in ML models (see the sketch after this list).
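The sketch below shows a minimal fairness assessment with Fairlearn's MetricFrame, computing accuracy and selection rate per sensitive group; the toy data is illustrative.

```python
# A minimal sketch of a fairness assessment with Fairlearn's MetricFrame
# (the data here is illustrative).
import pandas as pd
from sklearn.metrics import accuracy_score
from fairlearn.metrics import MetricFrame, selection_rate

y_true = pd.Series([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = pd.Series([1, 0, 0, 1, 0, 1, 1, 0])
group  = pd.Series(["a", "a", "a", "b", "b", "b", "b", "a"])  # sensitive feature

mf = MetricFrame(
    metrics={"accuracy": accuracy_score, "selection_rate": selection_rate},
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=group,
)

print(mf.overall)       # metrics over the whole dataset
print(mf.by_group)      # metrics broken out by sensitive group
print(mf.difference())  # largest between-group gap for each metric
```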
Security Vulnerability Scanning and Monitoring
Fiddler: Platform for monitoring, explaining, and debugging ML models.
Arthur: Tool for AI model monitoring and governance.
WhyLabs: Platform for data quality and model performance monitoring.
AI Security Frameworks and Standards
NIST AI Risk Management Framework: A voluntary framework for managing AI system risks.
EU Artificial Intelligence Act: The European Union's risk-based regulation for safe and trustworthy AI, formally adopted in 2024.
OECD AI Principles: Principles for responsible AI stewardship.
The White House Blueprint for an AI Bill of Rights: Five principles to guide the design, use, and deployment of automated systems.
UNESCO Recommendation on the Ethics of Artificial Intelligence: A global framework for ethical AI.
Google's Secure AI Framework (SAIF): A taxonomy of AI security risks and mitigations.
Educational Resources
Partnership on AI: Non-profit advancing responsible AI.
Stanford CRFM: Stanford's Center for Research on Foundation Models, a research center for understanding and improving foundation models.
OpenAI Safety and Security: OpenAI's resource page on AI safety and security.
Contributing
Contributions are welcome! If you know of any tools or resources that should be included, please submit a pull request.