Metadata-Version: 2.4
Name: bbbnuke
Version: 0.9.0
Summary: BBB-Nuke: Blood-brain barrier penetration screening pipeline
Author-email: Temi Sobodu <temisobodu@gmail.com>
License: MIT
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Topic :: Scientific/Engineering :: Chemistry
Requires-Python: >=3.9
Requires-Dist: huggingface-hub>=0.20
Requires-Dist: joblib>=1.2
Requires-Dist: numpy>=1.24
Requires-Dist: openpyxl>=3.0
Requires-Dist: pandas>=2.0
Requires-Dist: pydantic>=2.0
Requires-Dist: rdkit
Requires-Dist: scikit-learn>=1.0
Requires-Dist: torch-geometric>=2.0
Requires-Dist: torch-scatter
Requires-Dist: torch-sparse
Requires-Dist: torch>=2.0
Provides-Extra: api
Requires-Dist: aiosqlite>=0.20; extra == 'api'
Requires-Dist: alembic>=1.13; extra == 'api'
Requires-Dist: arq>=0.26; extra == 'api'
Requires-Dist: asyncpg>=0.29; extra == 'api'
Requires-Dist: fastapi>=0.110; extra == 'api'
Requires-Dist: httpx>=0.25; extra == 'api'
Requires-Dist: redis>=5.0; extra == 'api'
Requires-Dist: sqlalchemy[asyncio]>=2.0; extra == 'api'
Requires-Dist: uvicorn[standard]>=0.25; extra == 'api'
Provides-Extra: api-test
Requires-Dist: aiosqlite>=0.20; extra == 'api-test'
Requires-Dist: arq>=0.26; extra == 'api-test'
Requires-Dist: fastapi>=0.110; extra == 'api-test'
Requires-Dist: httpx>=0.25; extra == 'api-test'
Requires-Dist: pytest-asyncio>=0.23; extra == 'api-test'
Requires-Dist: pytest>=7.0; extra == 'api-test'
Requires-Dist: redis>=5.0; extra == 'api-test'
Requires-Dist: sqlalchemy[asyncio]>=2.0; extra == 'api-test'
Requires-Dist: uvicorn[standard]>=0.25; extra == 'api-test'
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Description-Content-Type: text/markdown

# BBB-Nuke

<img width="960" height="540" alt="image" src="https://github.com/user-attachments/assets/42bfaee2-aa67-4675-9423-803386c1dc6c" />


Blood-brain barrier penetration screening pipeline. Scores small molecules for BBB permeability using physicochemical properties, CNS-MPO desirability scoring, protein-ligand affinity, and a heuristic confidence layer.

## Quick Start

### Install

```bash
pip install -e ".[api]"
```

### Run the API

```bash
uvicorn bbbnuke.api.app:app --host 127.0.0.1 --port 8000 --reload
```

Open **http://127.0.0.1:8000/docs** for the interactive Swagger UI.

---

## API Tutorial

### Step 1: Check the server is running

```bash
curl http://127.0.0.1:8000/v1/health
```

**Expected response:**

```json
{"status": "ok"}
```

### Step 2: Check the pipeline version

```bash
curl http://127.0.0.1:8000/v1/version
```

**Expected response:**

```json
{
  "pipeline_version": "0.7.0",
  "api_version": "1.0.0",
  "hyperparameters": {
    "logistic_k": 0.1352,
    "efflux_veto_threshold": 0.7,
    "mpo_gain_coeff": 2.0,
    "mpo_filter_cutoff": 3.0
  }
}
```

### Step 3: Score a compound (CPU-only)

Score caffeine without affinity data:

```bash
curl -X POST http://127.0.0.1:8000/v1/score \
  -H "Content-Type: application/json" \
  -d '{"smiles": "Cn1c(=O)c2c(ncn2C)n(C)c1=O"}'
```

**Expected response (key fields):**

```json
{
  "result": {
    "compound_id": "query",
    "smiles_standardized": "Cn1c(=O)c2c(ncn2C)n(C)c1=O",
    "properties": {
      "mw": 194.194,
      "logp": -1.0293,
      "tpsa": 61.82,
      "hbd": 0,
      "hba": 6
    },
    "cns_mpo": {
      "score": 6.0,
      "passed_filter": true
    },
    "affinity": null,
    "heuristic": {
      "p_bbb": 0.692365,
      "diagnostics": {
        "reason": "MPO-only (no affinity scores)",
        "veto": false,
        "score_bind": 0.0,
        "score_mpo": 6.0,
        "score_total": 6.0
      }
    },
    "passed_mpo_filter": true
  },
  "pipeline_version": "0.7.0"
}
```

**What this means:**
- Caffeine scores a perfect 6.0 on CNS-MPO (excellent CNS drug-likeness)
- Without affinity data, P_BBB is derived from MPO alone, using `score_total` = 6.0 and `logistic_k` = 0.1352: `P_BBB = 1 / (1 + exp(-0.1352 * 6.0))` ≈ 0.69
- `affinity: null` because no pre-computed PSICHIC data is loaded

### Step 4: Score with affinity data

To include PSICHIC affinity scoring, set `include_affinity: true`. This requires pre-computed data files (see [Data Setup](#data-setup) below).

```bash
curl -X POST http://127.0.0.1:8000/v1/score \
  -H "Content-Type: application/json" \
  -d '{
    "smiles": "CCC1(c2ccccc2)C(=O)NC(=O)N(C(=O)c2ccccc2)C1=O",
    "include_affinity": true
  }'
```

**Expected response with affinity (key fields):**

```json
{
  "result": {
    "compound_id": "query",
    "cns_mpo": {
      "score": 5.6234,
      "passed_filter": true
    },
    "affinity": {
      "model": "psichic",
      "scores": [
        {"protein_id": "4F2_HUMAN", "score_raw": 6.42, "score_norm": 0.642},
        {"protein_id": "ACHE_HUMAN", "score_raw": 7.90, "score_norm": 0.790},
        "... 65 proteins total ..."
      ]
    },
    "heuristic": {
      "p_bbb": 0.670261,
      "diagnostics": {
        "reason": "OK",
        "veto": false,
        "score_bind": 0.0,
        "score_mpo": 5.2468,
        "score_total": 5.2468,
        "contributions": ["... per-protein breakdown ..."]
      }
    }
  }
}
```

**What this means:**
- Each of the 65 BBB proteins gets a binding probability (0-1) from PSICHIC
- The heuristic weighs carrier proteins positively, enzyme proteins negatively
- Efflux proteins (P-gp, BCRP, MRPs) can **veto** the compound if binding exceeds their threshold
- `S_bind` (`score_bind` in the response) sums weighted protein contributions; `S_mpo` (`score_mpo`) is the MPO gain above baseline
- `P_BBB = 1 / (1 + exp(-k * (S_bind + S_mpo)))`, where `k` is `logistic_k`
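
The bullets above can be read as the following sketch. It is an illustration under stated assumptions, not the package's implementation. The `Protein` class, its weights and its thresholds are placeholders (the real values are served by `/v1/proteins`). The way a binding score feeds `S_bind` (weight times score, counted only above the threshold) is a guess, and so is returning `0.0` on an efflux veto. The `S_mpo` formula is inferred from the example responses in this tutorial, where `score_mpo` equals `mpo_gain_coeff * (cns_mpo.score - mpo_filter_cutoff)`.

```python
import math
from dataclasses import dataclass


@dataclass
class Protein:
    protein_id: str
    category: str     # "carrier", "efflux", or "enzyme"
    weight: float     # placeholder, not taken from the real protein table
    threshold: float  # placeholder binding threshold


def heuristic(scores, proteins, mpo, k=0.1352, mpo_gain_coeff=2.0, mpo_cutoff=3.0):
    """Return (p_bbb, diagnostics) for normalized scores keyed by protein_id."""
    s_bind = 0.0
    for p in proteins:
        s = scores.get(p.protein_id)
        if s is None or s <= p.threshold:
            continue  # assumption: binding at or below threshold contributes nothing
        if p.category == "efflux":
            # assumption: a veto collapses P_BBB to 0.0
            return 0.0, {"veto": True, "reason": f"efflux veto: {p.protein_id}"}
        sign = 1.0 if p.category == "carrier" else -1.0  # enzymes count against
        s_bind += sign * p.weight * s
    s_mpo = mpo_gain_coeff * (mpo - mpo_cutoff)  # inferred from the example responses
    total = s_bind + s_mpo
    p_bbb = 1.0 / (1.0 + math.exp(-k * total))
    return p_bbb, {"veto": False, "score_bind": s_bind,
                   "score_mpo": s_mpo, "score_total": total}


# With no binding contributions and the CNS-MPO score from the response above,
# compare the output with "p_bbb" and "score_mpo" in that response.
p, diag = heuristic({}, [], mpo=5.6234)
print(round(p, 6), round(diag["score_mpo"], 4))
```

Keeping the veto as an early return makes it explicit that, in this sketch, a single efflux hit overrides any favourable carrier binding or MPO gain.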

### Step 5: Score a compound that fails MPO filter

```bash
curl -X POST http://127.0.0.1:8000/v1/score \
  -H "Content-Type: application/json" \
  -d '{"smiles": "O=C(O)CCCCCCCCCCCCCCCCC"}'
```

**Expected response:**

```json
{
  "result": {
    "cns_mpo": {
      "score": 1.873,
      "passed_filter": false
    },
    "heuristic": null,
    "passed_mpo_filter": false
  }
}
```

**What this means:**
- Stearic acid has poor CNS drug-likeness: its long aliphatic chain gives it a very high LogP
- CNS-MPO < 3.0, so it's filtered out before affinity/heuristic scoring
- `heuristic: null` — no P_BBB is computed for compounds that fail the MPO gate
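
Client code therefore has to handle a `null` heuristic. A minimal sketch, assuming only the response fields shown in this tutorial; the `summarize` helper is hypothetical and not part of the package:

```python
import json


def summarize(response: dict) -> str:
    """One-line summary of a /v1/score response (hypothetical helper)."""
    result = response["result"]
    mpo = result["cns_mpo"]["score"]
    heuristic = result.get("heuristic")
    if heuristic is None:
        # The compound failed the MPO gate, so no P_BBB was computed.
        return f"filtered at MPO gate (CNS-MPO {mpo})"
    return f"P_BBB {heuristic['p_bbb']:.3f} (CNS-MPO {mpo})"


# Body copied from the stearic acid response above.
body = """
{"result": {"cns_mpo": {"score": 1.873, "passed_filter": false},
            "heuristic": null, "passed_mpo_filter": false}}
"""
print(summarize(json.loads(body)))  # filtered at MPO gate (CNS-MPO 1.873)
```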

### Step 6: Override pipeline configuration

```bash
curl -X POST http://127.0.0.1:8000/v1/score \
  -H "Content-Type: application/json" \
  -d '{
    "smiles": "c1ccc2c(c1)c1ccccc1[nH]2",
    "config": {
      "mpo_cutoff": 5.0,
      "logistic_k": 0.2
    }
  }'
```

You can override any pipeline parameter: `mpo_cutoff`, `logistic_k`, `efflux_veto_threshold`, `mpo_gain_coeff`.
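
The same request can be built from Python with the standard library alone. This is a sketch: `build_score_request` and `ALLOWED_OVERRIDES` are hypothetical names, and rejecting unknown keys on the client side is a choice made here, not documented server behaviour.

```python
import json
import urllib.request

# The four override keys named above.
ALLOWED_OVERRIDES = {"mpo_cutoff", "logistic_k", "efflux_veto_threshold", "mpo_gain_coeff"}


def build_score_request(smiles, base_url="http://127.0.0.1:8000", **overrides):
    """Build a POST /v1/score request with optional config overrides."""
    unknown = set(overrides) - ALLOWED_OVERRIDES
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    payload = {"smiles": smiles}
    if overrides:
        payload["config"] = overrides
    return urllib.request.Request(
        f"{base_url}/v1/score",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_score_request("c1ccc2c(c1)c1ccccc1[nH]2", mpo_cutoff=5.0, logistic_k=0.2)

# To send it against a running server:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```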

### Step 7: List all BBB target proteins

```bash
curl http://127.0.0.1:8000/v1/proteins
```

Returns all 65 proteins with their categories (carrier/efflux/enzyme), weights, and binding thresholds.

---

## Data Setup

### Pre-computed affinity data

The API supports pre-computed PSICHIC affinity lookup when `include_affinity: true`. To enable this, place these files in the repo root:

| File | Description |
|------|-------------|
| `psichic_affinity_matrix.csv` | 599 compounds x 65 proteins, normalized scores (0-1) |
| `bbb_plus_pka.csv` | pKa predictions (acid/base lists); also provides the SMILES-to-compound-name mapping |

**Format of `psichic_affinity_matrix.csv`:**
```
Compound_ID,4F2_HUMAN,5NTD_HUMAN,ABCG2_HUMAN,...
caffeine,0.431,0.489,0.497,...
```

**Format of `bbb_plus_pka.csv`:**
```
compound_name,SMILES,acid_pkas,base_pkas
caffeine,Cn1c(=O)c2c(ncn2C)n(C)c1=O,,
```

The API matches incoming SMILES by canonicalizing via RDKit and looking up against the pre-computed set. If a compound is not found, scoring proceeds without affinity data (MPO-only).
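
The lookup described above can be sketched as a join of the two files on compound name, keyed by canonical SMILES. This is an assumption-laden illustration rather than the API's code: `build_affinity_lookup` and `find_affinity` are hypothetical names, and the canonicalizer is passed in as a parameter (identity by default) so the example runs without RDKit.

```python
import csv
import io


def build_affinity_lookup(pka_csv: str, matrix_csv: str, canonicalize=lambda s: s):
    """Map canonical SMILES -> {protein_id: normalized score}."""
    name_to_scores = {}
    for row in csv.DictReader(io.StringIO(matrix_csv)):
        name = row.pop("Compound_ID")
        name_to_scores[name] = {pid: float(v) for pid, v in row.items()}
    lookup = {}
    for row in csv.DictReader(io.StringIO(pka_csv)):
        scores = name_to_scores.get(row["compound_name"])
        if scores is not None:
            lookup[canonicalize(row["SMILES"])] = scores
    return lookup


def find_affinity(lookup, smiles, canonicalize=lambda s: s):
    """Return the score dict, or None to signal MPO-only scoring."""
    return lookup.get(canonicalize(smiles))


# Sample rows from the formats above, truncated to three protein columns.
matrix = "Compound_ID,4F2_HUMAN,5NTD_HUMAN,ABCG2_HUMAN\ncaffeine,0.431,0.489,0.497\n"
pka = "compound_name,SMILES,acid_pkas,base_pkas\ncaffeine,Cn1c(=O)c2c(ncn2C)n(C)c1=O,,\n"

lookup = build_affinity_lookup(pka, matrix)
print(find_affinity(lookup, "Cn1c(=O)c2c(ncn2C)n(C)c1=O"))
print(find_affinity(lookup, "CCO"))  # not in the pre-computed set
```

With RDKit installed, passing `lambda s: Chem.MolToSmiles(Chem.MolFromSmiles(s))` as `canonicalize` in both calls lets differently written SMILES for the same molecule resolve to one key.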

### Running your own PSICHIC screening

To generate affinity data for new compounds, see `modules/affinity.py`, which wraps the PSICHIC subprocess.

---

## Pipeline

```
Input SMILES
  -> Standardize (RDKit: canonicalize, neutralize, strip salts)
  -> Properties (MW, LogP, TPSA, HBD, HBA)
  -> pKa (MolGpKa predictions or placeholder)
  -> CNS-MPO (6 desirability subscores, 0-6 total)
  -> MPO Gate (score >= 3.0 to proceed)
  -> Affinity (PSICHIC: 65 BBB proteins, 0-1 binding probability)
  -> Heuristic (weighted binding + efflux veto + logistic)
  -> P_BBB output (0-1)
```

## Endpoints

| Method | Path | Description |
|--------|------|-------------|
| GET | `/v1/health` | Liveness probe |
| GET | `/v1/version` | Pipeline version + hyperparameters |
| POST | `/v1/score` | Score a single compound |
| GET | `/v1/proteins` | List all 65 BBB target proteins |
| POST | `/v1/batch` | Submit batch job (requires Redis + ARQ) |
| GET | `/v1/batch/{job_id}` | Poll batch progress |

## Tests

```bash
pip install -e ".[api-test]"
pytest tests/ -v
```

## License

Proprietary — ATTN Lab.
