aplomb
Copyright (c) 2026 Shivam Ratnakar, Kartikeya Vats

This product includes and/or derives from third-party materials:

AdvBench (harmful anchors)
  Source: https://github.com/llm-attacks/llm-attacks
  License: MIT
  Use: harmful anchor prompts are loaded at build time to derive the averaged
  u_ref vector. AdvBench prompts are NOT redistributed in this package; only the
  derived (averaged) direction is shipped.

Frozen benign anchor set (data/benign_anchors_v1.json)
  Original benign prompts authored for this project. Hard-negative coverage is
  inspired by the categories in XSTest (https://huggingface.co/datasets/walledai/XSTest,
  CC-BY-4.0); prompts here are original paraphrases, not copies.

Qwen2.5-1.5B-Instruct (default backbone)
  Source: https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct
  License: Apache-2.0 (ungated). Verify the exact checkpoint's license, as some
  Qwen variants ship under the Qwen Research License.

Llama-3.1-8B-Instruct (optional, gated reference backbone)
  Source: https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct
  License: Llama 3.1 Community License, Copyright (c) Meta Platforms, Inc.
  If you build and distribute a u_ref artifact derived from a Llama model, you must:
    - provide a copy of the Llama 3.1 Community License,
    - prominently display "Built with Llama",
    - retain this notice: "Llama 3.1 is licensed under the Llama 3.1 Community
      License, Copyright (c) Meta Platforms, Inc. All Rights Reserved.",
    - comply with the Llama Acceptable Use Policy.
  To avoid distributing a Llama-derived artifact, build the Llama u_ref locally
  rather than committing it.

This library implements detection only. The contrastive-logit steering ATTACK from
the source paper is intentionally excluded and maintained separately under gated access.
