inspect_evals
Copyright 2024 UK AI Security Institute

This product is licensed under the MIT License. It includes code from
third-party projects under various licenses:

  - MIT License — full text is the repository's own `LICENSE` file.
  - Apache License, Version 2.0 — full text in
    `third-party-licenses/Apache-2.0.txt`.
  - Llama 3.2 Community License Agreement — full text at
    https://www.llama.com/llama3_2/license/

Per-project copyright notices and source pointers are listed below.

---

BBEH (BigBench Extra Hard)
Copyright 2025 Google LLC
License: Apache-2.0
Source: https://github.com/google-deepmind/bbeh

Included in: src/inspect_evals/bbeh/utils.py

---

GDM In-House CTF
Copyright 2024 DeepMind Technologies Limited
License: Apache-2.0

Included in:
  src/inspect_evals/gdm_in_house_ctf/challenges/spray/app.py
  src/inspect_evals/gdm_in_house_ctf/challenges/idor/app.py
  src/inspect_evals/gdm_in_house_ctf/challenges/sqli/app.py
  src/inspect_evals/gdm_in_house_ctf/challenges/cmd_injection/app.py

---

GDM In-House CTF
Copyright 2024 Google LLC
License: Apache-2.0

Included in:
  src/inspect_evals/gdm_in_house_ctf/challenges/pw_reuse/db.sql
  src/inspect_evals/gdm_in_house_ctf/challenges/db_3/db.sql
  src/inspect_evals/gdm_in_house_ctf/challenges/sqli/app.sql

---

SciCode
Copyright 2024 The SciCode Authors
License: Apache-2.0
Source: https://github.com/scicode-bench/SciCode

Included in (adapted):
  src/inspect_evals/scicode/process_data.py
    (adapted from https://github.com/scicode-bench/SciCode/blob/main/src/scicode/parse/parse.py)
  src/inspect_evals/scicode/test_util.py
    (adapted from https://github.com/scicode-bench/SciCode/blob/main/src/scicode/compare/cmp.py)

---

GAIA Benchmark Scorer
Copyright 2023 The GAIA Benchmark Authors
License: Apache-2.0
Source: https://huggingface.co/spaces/gaia-benchmark/leaderboard/blob/main/scorer.py

Included in (partially copied): src/inspect_evals/gaia/scorer.py

---

HELM (Holistic Evaluation of Language Models)
Copyright 2022 Stanford CRFM
License: Apache-2.0
Source: https://github.com/stanford-crfm/helm

Included in (partially copied): src/inspect_evals/air_bench/air_bench.py

---

Gorilla (Berkeley Function Call Leaderboard)
Copyright 2023 The Gorilla Authors
License: Apache-2.0
Source: https://github.com/ShishirPatil/gorilla

Included in (partially copied):
  src/inspect_evals/bfcl/data.py
  src/inspect_evals/bfcl/score/scorer.py
  src/inspect_evals/bfcl/score/multi_turn_scorer.py
  src/inspect_evals/bfcl/solve/single_turn_solver.py
  src/inspect_evals/bfcl/solve/multi_turn_solver.py
  src/inspect_evals/bfcl/utils/task_categories.py
  src/inspect_evals/bfcl/utils/function_parsing.py
  src/inspect_evals/bfcl/backends/loader.py
  src/inspect_evals/bfcl/backends/downloader.py
  src/inspect_evals/bfcl/prompts.py

---

HuggingFace Datasets
Copyright 2020 The HuggingFace Datasets Authors and the current dataset script contributor.
License: Apache-2.0

Included in:
  src/inspect_evals/piqa/huggingface_artifact/piqa.py
  src/inspect_evals/apps/huggingface_artifact/apps.py

---

HuggingFace Datasets
Copyright 2022 The HuggingFace Datasets Authors and the current dataset script contributor.
License: Apache-2.0

Included in:
  src/inspect_evals/medqa/huggingface_artifact/med_qa.py

---

TensorFlow Datasets / HuggingFace Datasets
Copyright 2020 The TensorFlow Datasets Authors and the HuggingFace Datasets Authors.
License: Apache-2.0

Included in:
  src/inspect_evals/abstention_bench/recipe/abstention_datasets/huggingface_artifact/situated_qa.py

---

TensorFlow Datasets / HuggingFace Datasets
Copyright 2022 The TensorFlow Datasets Authors and the HuggingFace Datasets Authors.
License: Apache-2.0

Included in:
  src/inspect_evals/abstention_bench/recipe/abstention_datasets/huggingface_artifact/qasper.py

---

APE (Attempt to Persuade Eval)
Copyright 2025 Alignment Research Center
License: Apache-2.0
Source: https://github.com/AlignmentResearch/AttemptPersuadeEval

Included in (adapted):
  src/inspect_evals/ape/prompts.py
  src/inspect_evals/ape/utils.py

---

The Agent Company
Distributed under the MIT License
Source: https://github.com/TheAgentCompany/TheAgentCompany

Included in (adapted):
  src/inspect_evals/theagentcompany/tasks/<task>/evaluator.py

---

latch-eval-tools
Copyright LatchBio
Source: https://github.com/latchbio/latch-eval-tools

Included in (adapted):
  src/inspect_evals/scbench/graders/base.py
  src/inspect_evals/scbench/graders/numeric.py
  src/inspect_evals/scbench/graders/marker_gene.py
  src/inspect_evals/scbench/graders/distribution.py
  src/inspect_evals/scbench/graders/label_set.py
  src/inspect_evals/scbench/graders/multiple_choice.py
  src/inspect_evals/scbench/graders/spatial.py

---

PaperBench (OpenAI frontier-evals)
Distributed under the MIT License
Source: https://github.com/openai/frontier-evals

Included in (ported):
  src/inspect_evals/paperbench/score/blacklist_monitor.py
    (ported from https://github.com/openai/frontier-evals/blob/main/project/paperbench/paperbench/monitor/monitor.py)

---

PurpleLlama CybersecurityBenchmarks / CyberSecEval
Copyright (c) Meta Platforms, Inc. and affiliates.
License: Llama 3.2 Community License Agreement
  (see https://www.llama.com/llama3_2/license/)
Source: https://github.com/meta-llama/PurpleLlama

Included in (adapted):
  src/inspect_evals/cyberseceval_2/prompt_injection/task.py
  src/inspect_evals/cyberseceval_2/interpreter_abuse/task.py
  src/inspect_evals/cyberseceval_2/vulnerability_exploit/task.py
  src/inspect_evals/cyberseceval_3/visual_prompt_injection/task.py
  src/inspect_evals/cyberseceval_4/instruct_or_autocomplete/task.py
  src/inspect_evals/cyberseceval_4/mitre/task.py
  src/inspect_evals/cyberseceval_4/mitre_frr/task.py
  src/inspect_evals/cyberseceval_4/multilingual_prompt_injection/task.py
  src/inspect_evals/cyberseceval_4/multiturn_phishing/task.py
  src/inspect_evals/cyberseceval_4/autonomous_uplift/task.py
  src/inspect_evals/cyberseceval_4/autopatching/task.py
  src/inspect_evals/cyberseceval_4/insecure_code_detector/analyzers.py
  src/inspect_evals/cyberseceval_4/insecure_code_detector/insecure_code_detector.py
  src/inspect_evals/cyberseceval_4/insecure_code_detector/insecure_patterns.py
  src/inspect_evals/cyberseceval_4/insecure_code_detector/issues.py
  src/inspect_evals/cyberseceval_4/insecure_code_detector/languages.py
  src/inspect_evals/cyberseceval_4/insecure_code_detector/oss.py
  src/inspect_evals/cyberseceval_4/insecure_code_detector/usecases.py
  src/inspect_evals/cyberseceval_4/insecure_code_detector/rules/semgrep/python/subprocess_using_shell.py
  src/inspect_evals/cyberseceval_4/insecure_code_detector/rules/semgrep/python/exec_use.py
  src/inspect_evals/cyberseceval_4/insecure_code_detector/rules/semgrep/python/eval_use.py
  src/inspect_evals/cyberseceval_4/insecure_code_detector/rules/semgrep/python/sql_injection_cursor_execute.py
  src/inspect_evals/cyberseceval_4/insecure_code_detector/rules/semgrep/python/deserialization_pickle_use.py

---

CyberSOCEval
Copyright (c) Meta Platforms, Inc. and affiliates.
License: Llama 3.2 Community License Agreement
  (see https://www.llama.com/llama3_2/license/)
Source: https://github.com/meta-llama/PurpleLlama/tree/main/CyberSOCEval

Included in (adapted):
  src/inspect_evals/cyberseceval_4/malware_analysis.py
  src/inspect_evals/cyberseceval_4/threat_intelligence.py

---

simple-evals
Copyright (c) 2024 OpenAI
License: MIT
Source: https://github.com/openai/simple-evals

Included in:
  src/inspect_evals/healthbench/scorer.py
    (GRADER_TEMPLATE, parse_json_to_dict, calculate_score, and the
    bootstrap statistics helper, taken or adapted from
    https://github.com/openai/simple-evals/blob/main/healthbench_eval.py)
  src/inspect_evals/drop/drop.py
    (based on https://github.com/openai/simple-evals/blob/main/drop_eval.py)
  src/inspect_evals/mmlu/mmlu.py
    (based on https://github.com/openai/simple-evals/blob/main/mmlu_eval.py)
  src/inspect_evals/humaneval/humaneval.py
    (based on https://github.com/openai/simple-evals/blob/main/humaneval_eval.py)
  src/inspect_evals/math/math.py
    (based on https://github.com/openai/simple-evals/blob/main/math_eval.py)
  src/inspect_evals/vqa_rad/scorer.py
    (grader template adapted from SimpleQA in
    https://github.com/openai/simple-evals)

---

frontier-evals
Copyright (c) 2025 OpenAI
License: MIT
Source: https://github.com/openai/frontier-evals

Included in (partially ported):
  src/inspect_evals/paperbench/score/utils.py
    (partially ported from
    https://github.com/openai/frontier-evals/blob/main/project/paperbench/paperbench/judge/utils.py)

---

google-research (Instruction-Following Eval, MBPP)
Copyright Google LLC
License: Apache-2.0
Source: https://github.com/google-research/google-research

Included in (based on):
  src/inspect_evals/ifeval/ifeval.py
    (based on
    https://github.com/google-research/google-research/tree/master/instruction_following_eval)
  src/inspect_evals/mbpp/mbpp.py
    (based on
    https://github.com/google-research/google-research/tree/master/mbpp)

---

lm-evaluation-harness
Copyright (c) 2020 EleutherAI
License: MIT
Source: https://github.com/EleutherAI/lm-evaluation-harness

Included in (based on):
  src/inspect_evals/mmlu_pro/mmlu_pro.py
    (based on
    https://github.com/EleutherAI/lm-evaluation-harness/tree/main/lm_eval/tasks/mmlu_pro)
  src/inspect_evals/math/math.py
    (based on minerva_math and hendrycks_math task definitions)

---

ClassEval
Copyright (c) 2023 NLP Group, Nanyang Technological University
License: MIT
Source: https://github.com/FudanSELab/ClassEval

Included in (adapted):
  src/inspect_evals/class_eval/utils.py
    (InferenceUtil and construct_prompt adapted from the ClassEval project)

---

novelty-bench
Copyright (c) 2025 Yiming Zhang
License: MIT
Source: https://github.com/novelty-bench/novelty-bench

Included in (adapted):
  src/inspect_evals/novelty_bench/partition.py
  src/inspect_evals/novelty_bench/score.py

---

MuSR
Copyright (c) 2024 Zayne Sprague
License: MIT
Source: https://github.com/Zayne-sprague/MuSR

Included in (adapted):
  src/inspect_evals/musr/prompts.py

---

tau2-bench
Copyright (c) 2025 Sierra Research
License: MIT
Source: https://github.com/sierra-research/tau2-bench

Included in (closely adapted):
  src/inspect_evals/tau2/data_model/tasks.py
  src/inspect_evals/tau2/environment/db.py
  src/inspect_evals/tau2/airline/data_model.py
  src/inspect_evals/tau2/retail/tools.py
  src/inspect_evals/tau2/retail/data_model.py
  src/inspect_evals/tau2/telecom/agents.py
  src/inspect_evals/tau2/telecom/TelecomAgentTools.py
  src/inspect_evals/tau2/telecom/user_data_model.py
  src/inspect_evals/tau2/telecom/data_model.py

---

LiteLLM
Copyright (c) 2023 Berri AI
License: MIT
Source: https://github.com/BerriAI/litellm

Included in (data referenced):
  src/inspect_evals/paperbench/score/utils.py
    (OPENAI_CONTEXT_WINDOWS values referenced from
    model_prices_and_context_window.json)
