Metadata-Version: 2.4
Name: unbias-plus
Version: 0.1.6
Summary: Bias detection and debiasing: identify segments, classify severity, get reasoning and replacements, full neutral rewrite
Project-URL: Homepage, https://github.com/VectorInstitute/unbias-plus
Project-URL: Repository, https://github.com/VectorInstitute/unbias-plus
Author-email: "Ahmed Y. Radwan" <ahmed.radwan@vectorinstitute.ai>
License: <h2 align="center"><b>Vector Institute License</b></h2>
        
        <p align="center">Last updated: 12-08-2025</p>
        
        THIS VECTOR INSTITUTE LICENSE (“AGREEMENT”) IS A LEGALLY BINDING AGREEMENT BETWEEN YOU OR THE ENTITY YOU REPRESENT AND THE VECTOR INSTITUTE (“VECTOR”) AND GOVERNS YOUR USE OF THE WORK THAT HAS BEEN PROVIDED TO YOU BY VECTOR AND/OR ANY OF ITS AFFILIATES. THE WORK MAY ONLY BE ACCESSED AND USED BY ACADEMIC ENTITIES, SPONSORS, OR PARTNERS (ALL AS DEFINED BELOW). BY ACCESSING OR USING THE WORK, YOU REPRESENT AND WARRANT THAT YOU ARE AN ACADEMIC ENTITY, A SPONSOR, OR A PARTNER OR THAT YOU ARE ENTERING INTO THIS AGREEMENT ON BEHALF OF AN ACADEMIC ENTITY, A SPONSOR, OR A PARTNER. IF YOU ARE ENTERING INTO THIS AGREEMENT ON BEHALF OF AN ACADEMIC ENTITY, A SPONSOR, OR A PARTNER, YOU FURTHER REPRESENT AND WARRANT THAT YOU HAVE FULL LEGAL AUTHORITY TO BIND THEM TO THESE TERMS AND ACKNOWLEDGE THAT ALL REFERENCES TO “YOU” IN THE TERMS REFER TO THAT ENTITY. IF YOU ARE UNABLE TO MAKE THESE REPRESENTATIONS OR COMPLY WITH THE TERMS OF THIS AGREEMENT, DO NOT ACCESS OR USE THE WORK. **<ins>BY ACCESSING OR USING THE WORK, YOU AGREE TO BE BOUND BY THE TERMS OF THIS AGREEMENT</ins>.**
        
        1. **Definitions**.
           1. “**Academic Entity**” means a not-for-profit degree-granting academic institution approved by Vector. Please contact Vector at dataoffice@vectorinstitute.ai for approval.
        
           1. “**Academic Research Purposes**” shall mean use for non-commercial academic research performed by an Academic Entity.
        
           2. “**Partner**” means a legal entity that has entered into a Partnership Agreement, Collaboration Agreement, or FastLane Company Agreement with Vector.
        
           3. “**Sell**” means practicing any or all of the rights granted under this Agreement to provide to any third party, for a fee or other consideration (including, without limitation, fees for hosting or consulting/support services related to the Work), a product or service whose value derives, entirely or substantially, from the functionality of the Work.
        
           4. “**Sponsor**” means a legal entity that has entered into a Sponsorship Agreement with Vector.
        
           5. “**You**” or “**Your**” means the Academic Entity, Sponsor, or Partner entering into this Agreement.
        
           6. “**Your Product**” means a product or service You create (at least in part) by accessing or using the Work and/or that incorporates portions of the Work.
        
           7. “**Work**” means any work, whether or not subject to copyright protection, including, but not limited to, software code (whether in source code or object code form), frameworks, models, applications, documentation, or data licensed pursuant to this Agreement.
        
        2. **License Grant**. If You are an Academic Entity, Vector grants You a limited, royalty-free, non-exclusive, non-sublicensable, non-transferable license to use, copy, modify, merge, publish, and/or distribute the Work solely internally for Academic Research Purposes and subject to the restrictions herein. If You are a Sponsor or a Partner, Vector grants You a limited, royalty-free, non-exclusive, non-sublicensable, non-transferable license to use, copy, modify, merge, publish and/or distribute the Work, subject to the restrictions herein. For the avoidance of doubt and notwithstanding anything herein to the contrary, You are not granted any rights to, and are not permitted to, Sell the Work.
        
        3. **License Restrictions**. Any use of the Work by any person or entity that is not an Academic Entity, a Sponsor, or a Partner or other than in accordance with this Agreement is strictly prohibited. You may not:
        
           1. copy any feature, design, or graphic of the Work;
        
           2. reverse engineer, re-engineer, decompile or disassemble the Work or cause or allow discovery of the source code or underlying ideas or algorithms of the Work;
        
           3. use or access the Work in order to build a competitive product or service or to assist someone else to build a competitive product or service or in connection with any other work that has the same or similar functionality as the Work;
        
           4. use the Work for performance, benchmarking or comparison testing or analysis, or disclose to any third party or otherwise disseminate any results thereof (all of which shall be considered confidential information of Vector);
        
           5. remove, obscure or modify any copyright, trademark, or other proprietary or intellectual property rights notices in the Work;
        
           6. use the Work in a manner that violates any applicable law, ordinance, regulation, or administrative order, or applicable Vector policies; or
        
           7. in any way attempt to do, or assist anyone else with, any of the foregoing. To the extent permissible by law, You waive any rights that You may have to do any of the foregoing.
        
        4. **Additional License Restrictions for Sponsors and Partners**. If You are a Sponsor or a Partner:
        
           1. You may only publish and/or distribute the Work as part of Your Product and may not otherwise transfer, distribute, host, or allow any third party (other than third parties acting on your behalf and bound by written terms at least as protective of Vector as this Agreement) to access or use the Work; and
        
           2. You must ensure that Your customers access and use the Work solely as incorporated into Your Product and solely as necessary for such customers to access and use Your Product.
        
        5. **Term and Termination**. This Agreement shall immediately and automatically terminate upon Your breach of this Agreement. Upon any termination of this Agreement, You shall cease all use of the Work and delete all copies in Your possession or control. The provisions of this Agreement which by their express or implied terms extend beyond the termination of this Agreement shall continue in full force and effect notwithstanding the termination or expiration of this Agreement.<br></br>
        If You are an Academic Entity, Vector may terminate this Agreement for any or no reason, with immediate effect. <br></br>
        If You are a Sponsor or a Partner, Vector, acting reasonably, may terminate this Agreement upon one hundred and eighty (180) days notice to You. <br></br>
        If You are a Sponsor or a Partner, You may, upon expiration or termination of Your Partner or Sponsor relationship with Vector, continue to use only the version(s) of the Work that You obtained from Vector subject to this Agreement when You were a Sponsor or Partner unless and until this Agreement is terminated by Vector.
        
        6. **Ownership**. The Work is licensed to You, not sold. Vector retains all rights, title, and interest in and to the Work, including all copyrights, trade secrets, trademarks, patents, and other forms of proprietary and intellectual property rights therein and thereto, belong to Vector or its licensors. This Agreement does not convey to You any interest in or to any Work, but only a limited right of use revocable in accordance with the terms of this Agreement. No rights are granted to You hereunder other than as expressly set forth in this Agreement and Vector reserves all rights not expressly granted herein.
        
        7. **No Maintenance**. Vector has no obligation under this Agreement to provide any maintenance, support, training or other services for or related to the Work. Vector may cease making the Work available at any time.
        
        8. **Notice and Attribution / Publicity**. You must (a) retain all copyright, patent, trademark, attribution, and other notices that are present in the Work and (b) include the following attribution notice within a "Notice" text file distributed as a part of the Work: "This work is licensed under the Vector Institute License, Copyright © Vector Institute. All Rights Reserved."<br></br>
        For products or services built using the Work,  prominently display 'Built with Vector Institute [Name]’ (where You are to populate the [Name] field with the name and version, if applicable, of the relevant software, model, framework and/or dataset) on all related materials, documentation, blogs, press releases or other customary places where other such statements are typically provided to users.
        
        9. **Disclaimer of Warranty**. THE WORK IS PROVIDED “AS IS,” “WHERE IS,” “AS AVAILABLE,” “WITH ALL FAULTS” AND, TO THE FULLEST EXTENT PERMITTED BY LAW, WITHOUT WARRANTY OF ANY KIND. VECTOR AND ITS LICENSORS DISCLAIM ALL WARRANTIES WITH RESPECT TO THE WORK, WHETHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF, MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NON-INFRINGEMENT, TITLE, SATISFACTORY QUALITY OR ARISING FROM A COURSE OF DEALING, LAW, USAGE, OR TRADE PRACTICE, AND ANY WARRANTIES REGARDING THE SECURITY, QUIET ENJOYMENT, QUALITY OF INFORMATION, RELIABILITY, TIMELINESS, AND PERFORMANCE ARE HEREBY DISCLAIMED TO THE EXTENT ALLOWED BY APPLICABLE LAW. THIS DISCLAIMER OF WARRANTY CONSTITUTES AN ESSENTIAL PART OF THIS AGREEMENT. NO ACCESS TO OR USE OF ANY WORK IS AUTHORIZED UNDER THIS AGREEMENT EXCEPT UNDER THIS DISCLAIMER.
        
        10. **Limitation of Liability**. VECTOR SHALL HAVE NO LIABILITIES TO YOU WHATSOEVER ARISING IN CONNECTION WITH THIS AGREEMENT OR THE WORK. UNDER NO CIRCUMSTANCES AND UNDER NO LEGAL THEORY, WHETHER TORT (INCLUDING NEGLIGENCE), CONTRACT, OR OTHERWISE, SHALL VECTOR OR ITS LICENSORS BE LIABLE FOR ANY DAMAGES, INCLUDING, BUT NOT LIMITED TO, DIRECT, INDIRECT, CONSEQUENTIAL, INCIDENTAL, SPECIAL, PUNITIVE, OR EXEMPLARY DAMAGES, OR ANY LOSS OF REVENUE, PROFITS, SALES, DATA, DATA USE, GOODWILL OR REPUTATION, EVEN IF VECTOR HAS BEEN INFORMED OF THE POSSIBILITY OF SUCH DAMAGES, EVEN IF VECTOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES AND REGARDLESS OF THE FORM OF ACTION (INCLUDING CONTRACT, NEGLIGENCE, TORT OR WARRANTY).
        
        11. **Indemnification**. You shall indemnify, hold harmless, and defend Vector and its officers, directors, licensors, and employees against any and all losses, damages, liabilities, deficiencies, claims, actions, judgments, settlements, interest, awards, penalties, fines, costs, or expenses of whatever kind, including reasonable legal fees and the costs of enforcing any right to indemnification hereunder arising out of Your access to or use of the Work. You are responsible for the acts and omissions of Your employees, contractors, customers, and other users in respect of the Work.
        
        12. **Export Controls**. The Work may include information that may not be received, exported, imported, used, transferred, distributed, accessed, or re-exported except in compliance with the applicable laws and regulations of the relevant government authorities, including U.S. and Canadian export control and sanction regulations, such as provided in the Canadian Sanctions webpage (https://www.international.gc.ca/world-monde/international_relations-relations_internationales/sanctions/index.aspx). You also represent and covenant not to directly or indirectly allow access to or use of the Work in embargoed or sanctioned countries or regions, by sanctioned or denied persons, or for prohibited end-uses under U.S. or Canadian law. You confirm that You are not named on any Canadian or U.S. government list of persons or entities prohibited from transacting with any Canadian or U.S. person; (b) You are not a national of, or a company registered in, any sanctioned jurisdiction; (c) You will not allow any third party to access or use the Work in violation of any Canadian, U.S. or other export embargoes, prohibitions or restrictions; and (d) You will comply with all laws regarding the transmission of data exported from the country in which You (or Your users) are located to Canada and the United States.
        
        13. **Governing Law**. This Agreement and all matters arising out of or relating to this Agreement, are governed by, and construed in accordance with, the laws of Ontario and the federal laws of Canada applicable therein without regard to the choice or conflict of law provisions thereof. Any action arising from or relating to this Agreement shall only be brought in a court of competent jurisdiction in Ontario, and the parties consent to the jurisdiction, venue, and convenience of such courts. This Agreement shall not be governed by the United Nations Convention on Contracts for the International Sale of Goods, the application of which is hereby expressly excluded.
        
        14. **Assignment**. Vector may assign its rights under this Agreement, in whole or part, to any party without Your consent. You may not assign this Agreement or any of the rights granted herein or delegate any of Your obligations to any party, whether by operation of law or otherwise, without the prior written consent of Vector. Any change in control or other transaction that results in a change in Your majority ownership shall be considered an assignment under this Agreement and shall require prior written consent of Vector. Any assignment in violation of this section shall be void. This Agreement is binding upon and enforceable by each party’s permitted successors and assignees.
        
        15. **No Partnership**. This Agreement shall not be interpreted or construed to create an association, joint venture, agency relationship, or partnership between You and Vector or to impose any partnership obligation or partnership liability upon You or Vector. Neither You nor Vector shall have any right, power or authority to enter into any agreement or undertaking for, or act on behalf of, or to act as or be an agent or representative, or to otherwise bind, each other.
        
        16. **Miscellaneous**.
        In the event any term or provision of this Agreement or any application thereof shall be deemed to be illegal, void, or unenforceable, then the same shall not affect the remaining portions of this Agreement or any other application of the same which are not determined to be illegal, void or unenforceable, which remaining provisions and any other such application shall survive and constitute the agreement of the parties.<br></br>
        Vector’s failure at any time to require performance of any provision of this Agreement or to exercise any right provided for herein shall not be deemed a waiver of such provision or such right. All waivers must be in writing. Unless the written waiver contains an express statement to the contrary, no waiver by Vector of any breach of any provision of this Agreement or of any right provided for herein shall be construed as a waiver of any continuing or succeeding breach of such provision, a waiver of the provision itself, or a waiver of any right under this Agreement.<br></br>
        This Agreement constitutes the entire agreement between the parties pertaining to the Work, and supersedes all prior oral and written negotiations, agreements or understandings between the parties with respect to the subject matter hereof. No modification of any provision of this Agreement shall be valid or binding unless made in writing and signed by an authorized officer of Vector. <br></br>
        Section headings are used herein for convenience only and shall not be deemed to affect the scope, meaning or intent of this Agreement or any provisions hereof. Whenever examples are used in this Agreement with the words “including” or “such as,” or any derivation thereof, such examples are intended to be illustrative and not in limitation thereof.
License-File: LICENSE.md
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: Other/Proprietary License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Requires-Dist: accelerate<=1.10.1,>=1.0.0
Requires-Dist: bitsandbytes==0.48.1
Requires-Dist: build>=1.3.0
Requires-Dist: cmake==4.1.0
Requires-Dist: datasets>=2.18.0
Requires-Dist: einops>=0.8.1
Requires-Dist: fastapi>=0.110.0
Requires-Dist: google-cloud-bigquery>=3.0.0
Requires-Dist: jsonlines>=4.0.0
Requires-Dist: ninja>=1.13.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: openai>=1.0.0
Requires-Dist: pandas>=2.0.0
Requires-Dist: peft>=0.17.1
Requires-Dist: pillow>=12.2.0
Requires-Dist: pyarrow>=15.0.2
Requires-Dist: pydantic-settings>=2.11.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: python-dotenv==1.2.2
Requires-Dist: safetensors>=0.4.0
Requires-Dist: setuptools>=80.9.0
Requires-Dist: starlette>=1.0.1
Requires-Dist: tiktoken==0.7.0
Requires-Dist: torch<=2.6.0,>=2.0.0
Requires-Dist: torchaudio<=2.6.0,>=0.13.0
Requires-Dist: torchvision<=0.21.0,>=0.13.0
Requires-Dist: tornado>=6.5.5
Requires-Dist: tqdm>=4.67.3
Requires-Dist: transformers<6.0.0,>=4.35.0
Requires-Dist: uvicorn>=0.29.0
Requires-Dist: wheel>=0.45.1
Provides-Extra: train
Requires-Dist: flash-attn>=2.7.0; extra == 'train'
Requires-Dist: peft>=0.17.0; extra == 'train'
Requires-Dist: trl>=0.22.0; extra == 'train'
Requires-Dist: unsloth-zoo<2026.3.1,>=2025.10.10; extra == 'train'
Requires-Dist: unsloth[colab-new]; extra == 'train'
Description-Content-Type: text/markdown

# unbias-plus
[![website](https://img.shields.io/badge/website-ff00ff)](https://vectorinstitute.github.io/unbias-plus/)
[![code checks](https://github.com/VectorInstitute/unbias-plus/actions/workflows/code_checks.yml/badge.svg)](https://github.com/VectorInstitute/unbias-plus/actions/workflows/code_checks.yml)
[![unit tests](https://github.com/VectorInstitute/unbias-plus/actions/workflows/unit_tests.yml/badge.svg)](https://github.com/VectorInstitute/unbias-plus/actions/workflows/unit_tests.yml)
[![integration tests](https://github.com/VectorInstitute/unbias-plus/actions/workflows/integration_tests.yml/badge.svg)](https://github.com/VectorInstitute/unbias-plus/actions/workflows/integration_tests.yml)
[![docs](https://github.com/VectorInstitute/unbias-plus/actions/workflows/docs.yml/badge.svg)](https://github.com/VectorInstitute/unbias-plus/actions/workflows/docs.yml)
[![codecov](https://codecov.io/github/VectorInstitute/unbias-plus/graph/badge.svg?token=83MYFZ3UPA)](https://codecov.io/github/VectorInstitute/unbias-plus)
[![License: Vector Institute](https://img.shields.io/badge/License-Vector%20Institute-003049.svg)](./LICENSE.md)
[![Contact](https://img.shields.io/badge/Contact-shaina.raza%40vectorinstitute.ai-green)](mailto:shaina.raza@vectorinstitute.ai)

unbias-plus detects and reduces bias in text: it identifies biased segments, classifies their severity, gives the reasoning and a neutral replacement for each segment, and produces a full neutral rewrite. Structured output (binary label, severity, biased segments with offsets) is available via the CLI, the REST API, or Python.

## Overview

Input text → analysis → validated `BiasResult`: binary label (biased/unbiased), overall severity (1–5), `biased_segments` (original phrase, replacement, severity, bias type, reasoning, character offsets), and full `unbiased_text`. Entry points: CLI (`unbias-plus`), REST API (FastAPI + demo UI), or Python (`UnBiasPlus`).

**Project structure:**
```
unbias-plus/
├── src/unbias_plus/
│   ├── __init__.py      # UnBiasPlus, BiasResult, BiasedSegment, serve
│   ├── cli.py           # unbias-plus entry point (--text, --file, --serve)
│   ├── api.py           # FastAPI app, /health, /analyze, serve()
│   ├── pipeline.py      # UnBiasPlus: prompt → model → parse → result
│   ├── model.py         # UnBiasModel: load LM, generate(), 4-bit optional
│   ├── prompt.py        # build_prompt(text), system prompt
│   ├── parser.py        # parse_llm_output() → BiasResult
│   ├── schema.py        # BiasResult, BiasedSegment (Pydantic)
│   ├── formatter.py     # format_cli, format_dict, format_json
│   └── demo/            # bundled web UI (served at / when using --serve)
│       ├── static/      # script.js, style.css
│       └── templates/   # index.html
├── tests/
│   ├── conftest.py      # fixtures (sample_result, sample_json, …)
│   └── unbias_plus/     # test_api, test_pipeline, test_parser, …
├── pyproject.toml
└── README.md
```

## Features

- **Bias detection**: Identifies biased phrases in text and returns them as segments with character-level offsets for highlighting.
- **Classification**: Binary label (biased/unbiased), per-segment severity (low/medium/high), and bias type (e.g. loaded language, framing).
- **Reasoning**: Each segment includes an explanation of why it is considered biased.
- **Debiasing**: Per-segment neutral replacements and a full rewritten `unbiased_text`.
- **Structured output**: Pydantic-validated `BiasResult` with `binary_label`, `severity` (1–5), `biased_segments`, and `unbiased_text`.
- **Demo UI**: `--serve` launches a FastAPI server that also serves a visual web interface at `http://localhost:8000`.
- **CLI**: Analyze from the command line with `--text` or `--file`, or start the API + UI with `--serve`. Optional 4-bit quantization and JSON output.
- **REST API**: FastAPI server with `/health` and `/analyze` (POST JSON `{"text": "..."}`). Model loaded at startup via lifespan.
- **Python API**: Use `UnBiasPlus` in code; call `analyze()`, `analyze_to_cli()`, `analyze_to_dict()`, or `analyze_to_json()`.
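
For clients that consume the JSON output without installing the package, the fields listed above can be mirrored with plain dataclasses. This is a minimal sketch, not the package's own `BiasResult`: `SegmentSketch`, `ResultSketch`, and `result_from_dict` are hypothetical names, and the field types (a string per-segment severity, integer offsets) and the 1–5 range check are assumptions read off this README rather than confirmed against `schema.py`.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SegmentSketch:
    """Hypothetical stand-in for one entry of `biased_segments`."""

    original: str
    replacement: str
    severity: str  # assumed: "low" / "medium" / "high", per the Features list
    bias_type: str
    reasoning: str
    start: int  # character offsets into the input text
    end: int


@dataclass
class ResultSketch:
    """Hypothetical stand-in for the top-level result."""

    binary_label: str
    severity: int
    unbiased_text: str
    biased_segments: list[SegmentSketch] = field(default_factory=list)


def result_from_dict(payload: dict[str, Any]) -> ResultSketch:
    """Build a ResultSketch from a decoded JSON object, with light checks."""
    label = payload["binary_label"]
    if label not in {"biased", "unbiased"}:
        raise ValueError(f"unexpected binary_label: {label!r}")

    severity = int(payload["severity"])
    if not 1 <= severity <= 5:  # assumption: range as stated in this README
        raise ValueError(f"severity out of range: {severity}")

    segments = [
        SegmentSketch(
            original=str(raw["original"]),
            replacement=str(raw["replacement"]),
            severity=str(raw["severity"]),
            bias_type=str(raw["bias_type"]),
            reasoning=str(raw["reasoning"]),
            start=int(raw["start"]),
            end=int(raw["end"]),
        )
        for raw in payload.get("biased_segments", [])
    ]
    return ResultSketch(
        binary_label=label,
        severity=severity,
        unbiased_text=str(payload["unbiased_text"]),
        biased_segments=segments,
    )
```

Inside a Python process that has the package installed, the Pydantic models exported from `unbias_plus` are the better choice; a mirror like this is only useful on the consuming side of the CLI or REST output.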

## Requirements

- Python ≥3.10, <3.12
- CUDA 12.4 recommended (PyTorch + CUDA deps in `pyproject.toml`). CPU is supported with `device="cpu"`.

## Installation

The project uses [uv](https://github.com/astral-sh/uv) for dependency management. Install uv, then from the project root:

```bash
uv sync
source .venv/bin/activate   # or .venv\Scripts\activate on Windows
```

For development (tests, linting, type checking):
```bash
uv sync --dev
source .venv/bin/activate
```

**Optional: flash-attn (GPU only)**

For training or faster inference with flash attention, install the `train` extra (requires CUDA/nvcc to build):
```bash
uv sync --extra train
# On HPC: load CUDA first, e.g. module load cuda/12.4.0
```
Default `uv sync` does **not** install flash-attn, so CI and CPU-only setups work without it.

## Usage

### Command line

```bash
# Analyze a string
unbias-plus --text "Women are too emotional to lead."

# Analyze a file, output JSON
unbias-plus --file article.txt --json

# Start API server + demo UI (default model, port 8000)
unbias-plus --serve
unbias-plus --serve --model path/to/model --port 8000
unbias-plus --serve --load-in-4bit   # reduce VRAM
```

Options: `--model`, `--load-in-4bit`, `--max-new-tokens`, `--host`, `--port`, `--json`.
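
The CLI can also be driven from a script. The sketch below builds the argument list from the flags documented above and parses the result; `build_cli_command` and `analyze_file` are hypothetical helper names, and the code assumes that `--json` writes only the JSON document to stdout and that `--model` and `--load-in-4bit` can be combined with `--file`.

```python
import json
import subprocess
from typing import Any, Optional


def build_cli_command(
    path: str,
    model: Optional[str] = None,
    load_in_4bit: bool = False,
) -> list[str]:
    """Assemble an `unbias-plus --file ... --json` invocation."""
    command = ["unbias-plus", "--file", path, "--json"]
    if model is not None:
        command += ["--model", model]
    if load_in_4bit:
        command.append("--load-in-4bit")
    return command


def analyze_file(path: str, **options: Any) -> Any:
    """Run the CLI and decode its stdout (assumed to be pure JSON)."""
    completed = subprocess.run(
        build_cli_command(path, **options),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)
```

Each call to `analyze_file` starts a new process and loads the model again, so for more than a handful of files the REST server or the Python API is likely the better fit.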

### Test the model (CLI)

After `uv sync` (and optionally `uv sync --extra train` on a GPU machine), verify the pipeline with:

```bash
# Default install (no flash-attn); use a small model or --load-in-4bit on GPU
uv run unbias-plus --text "Women are too emotional to lead."

# With your own model path
uv run unbias-plus --text "Some biased sentence." --model path/to/your/model

# JSON output
uv run unbias-plus --text "Test." --json
```

Or in Python (same env):

```bash
uv run python -c "
from unbias_plus import UnBiasPlus
pipe = UnBiasPlus()  # or UnBiasPlus('your-model-id', load_in_4bit=True)
text = 'Women are too emotional to lead.'
print(pipe.analyze_to_cli(text))
"
```

### REST API + Demo UI

Start the server with `unbias-plus --serve` (or `serve()` in Python). This starts a single FastAPI server that:

- Serves the visual demo UI at **`http://localhost:8000/`**
- Exposes **`GET /health`** → `{"status": "ok", "model": "<model_name_or_path>"}`
- Exposes **`POST /analyze`** → Body: `{"text": "Your text here"}`. Returns JSON matching `BiasResult`.

Programmatic start:
```python
from unbias_plus import serve
serve("your-hf-model-id", port=8000, load_in_4bit=False)
```
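
Once the server is running, `/analyze` can be called from any HTTP client. The following is a minimal standard-library sketch: `build_analyze_request` and `analyze_remote` are hypothetical helper names, the base URL assumes the default port shown above, and the timeout value is an arbitrary placeholder since generation time depends on the model and hardware.

```python
import json
import urllib.request
from typing import Any


def build_analyze_request(
    text: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build the POST /analyze request with a JSON body of {"text": ...}."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze_remote(
    text: str,
    base_url: str = "http://localhost:8000",
    timeout: float = 120.0,  # placeholder; tune for your model and hardware
) -> Any:
    """Send the request and decode the JSON response body."""
    request = build_analyze_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server up, `analyze_remote("Women are too emotional to lead.")` should return a dict with the `BiasResult` fields described in this README.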

> **Running on a remote server or HPC node:** If the server is running on a remote machine, use SSH port forwarding to access the UI in your browser:
> ```bash
> ssh -L 8000:localhost:8000 user@your-server.com
> # or through a login node to a compute node:
> ssh -L 8000:gpu-node-hostname:8000 user@login-node.com
> ```
> Then open `http://localhost:8000`. If port 8000 is already in use locally, use a different local port (e.g. `-L 8001:...`) and open `http://localhost:8001`.
>
> If you're using VS Code remote SSH, port forwarding is handled automatically via the **Ports** tab.

### Python API

```python
from unbias_plus import UnBiasPlus, BiasResult, BiasedSegment

pipe = UnBiasPlus("your-hf-model-id", load_in_4bit=False)
result = pipe.analyze("Women are too emotional to lead.")

print(result.binary_label)   # "biased" | "unbiased"
print(result.severity)       # 1–5
print(result.bias_found)     # bool

for seg in result.biased_segments:
    print(seg.original, seg.replacement, seg.severity, seg.bias_type, seg.reasoning)
    print(seg.start, seg.end)  # character offsets in original text

print(result.unbiased_text)  # full neutral rewrite

# Formatted outputs
cli_str  = pipe.analyze_to_cli("...")    # human-readable colored terminal output
d        = pipe.analyze_to_dict("...")   # plain dict
json_str = pipe.analyze_to_json("...")   # pretty-printed JSON string
```
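
The character offsets are meant for highlighting. A minimal sketch of how they might be applied to the original text is shown below; `highlight` is a hypothetical helper, the `[[`/`]]` markers are arbitrary, and the code assumes `end` is exclusive (so that `text[start:end]` equals `seg.original`), which this README does not state explicitly.

```python
from typing import Iterable, Tuple


def highlight(
    text: str,
    spans: Iterable[Tuple[int, int]],
    open_mark: str = "[[",
    close_mark: str = "]]",
) -> str:
    """Wrap each (start, end) span of `text` in markers.

    Spans are processed left to right; a span that overlaps an earlier
    one or falls outside the text is skipped rather than raising.
    """
    pieces = []
    cursor = 0
    for start, end in sorted(spans):
        if start < cursor or end > len(text) or start > end:
            continue  # overlapping or out-of-range span
        pieces.append(text[cursor:start])  # untouched text before the span
        pieces.append(open_mark + text[start:end] + close_mark)
        cursor = end
    pieces.append(text[cursor:])  # remainder after the last span
    return "".join(pieces)
```

With a pipeline result this could be called as `highlight(text, [(seg.start, seg.end) for seg in result.biased_segments])`. For example, `highlight("Women are too emotional to lead.", [(10, 23)])` returns `"Women are [[too emotional]] to lead."`.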

## Training

The Qwen3-8B checkpoint shipped with the demo was fine-tuned in two stages,
SFT followed by GRPO post-training, on the
[vector-institute/Unbias-plus](https://huggingface.co/datasets/vector-institute/Unbias-plus)
dataset on Hugging Face.

Standalone scripts that reproduce both stages live in [`training/`](training/),
along with a sanity-check inference runner. They depend on the `[train]`
optional extra (`peft`, `trl`, `unsloth`, `flash-attn`) and require an A100
or comparable GPU.

See [`training/README.md`](training/README.md) for details, CLI invocations,
and resource sizing.

## Development

- **Tests**: `pytest` (see `pyproject.toml` for markers). Run from repo root: `uv run pytest tests/`.
- **Linting / formatting**: `ruff` (format + lint), config in `pyproject.toml`.
- **Type checking**: `mypy` with strict options, `mypy_path = ["src", "training"]`.


## Team

Developed by the **AI Engineering** team at the [Vector Institute](https://vectorinstitute.ai).

For research collaborations, partnerships, or technical inquiries, please contact **Shaina Raza, PhD** at shaina.raza@vectorinstitute.ai.


## Acknowledgement

Resources used in preparing this research are provided, in part, by the Province of Ontario, the Government of Canada through CIFAR, and companies sponsoring the Vector Institute.

This research is also supported by the European Union's Horizon Europe research and innovation programme under the AIXPERT project (Grant Agreement No. 101214389).



## License

Licensed under the **Vector Institute License**. Use is restricted to Academic Entities, Sponsors, and Partners of the Vector Institute; by accessing or using the work, you agree to be bound by the license terms. See [LICENSE.md](./LICENSE.md) in the repository.

## Support

* Open an issue on GitHub: [https://github.com/VectorInstitute/unbias-plus/issues](https://github.com/VectorInstitute/unbias-plus/issues)
