Metadata-Version: 2.4
Name: neuroattack
Version: 0.2.0
Summary: Adversarial robustness testing for neural encoding models — attack, analyze, and defend brain-AI interfaces
Project-URL: Homepage, https://github.com/stef41/neuroprobe
Project-URL: Documentation, https://github.com/stef41/neuroprobe#readme
Project-URL: PyPI, https://pypi.org/project/neuroattack/
Author: Zacharie B
License: Apache-2.0
Keywords: BCI-safety,TRIBE,adversarial,brain-computer-interface,encoding-model,fMRI,neuroscience,robustness
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
Requires-Python: >=3.10
Requires-Dist: numpy>=1.24
Requires-Dist: scipy>=1.10
Requires-Dist: torch>=2.0
Requires-Dist: tqdm
Provides-Extra: dev
Requires-Dist: pytest>=7; extra == 'dev'
Requires-Dist: ruff; extra == 'dev'
Provides-Extra: plotting
Requires-Dist: matplotlib>=3.7; extra == 'plotting'
Requires-Dist: nilearn>=0.10; extra == 'plotting'
Requires-Dist: pyvista>=0.40; extra == 'plotting'
Provides-Extra: tribe
Requires-Dist: tribev2; extra == 'tribe'
Description-Content-Type: text/markdown

# neuroprobe

**Adversarial robustness testing for neural encoding models.**

<p align="center">
  <em>Can a perturbation of visual features with L∞ norm 0.03 make TRIBE v2 predict auditory cortex activation?<br/>
  neuroprobe finds out, using gradient-based attacks adapted from adversarial ML for brain-AI interfaces.</em>
</p>

<p align="center">
  <a href="https://pypi.org/project/neuroattack/"><img src="https://img.shields.io/pypi/v/neuroattack" alt="PyPI"></a>
  <a href="https://github.com/stef41/neuroprobe/blob/main/LICENSE"><img src="https://img.shields.io/badge/license-Apache--2.0-blue" alt="License"></a>
  <a href="https://www.python.org/"><img src="https://img.shields.io/badge/python-3.10%2B-blue" alt="Python"></a>
</p>

---

## Why This Matters

Neural encoding models like [Meta's TRIBE v2](https://ai.meta.com/blog/tribe-v2-brain-foundation-model/) — which predict human fMRI brain activity from video, audio, and text — are becoming the foundation of computational neuroscience and BCI safety evaluation. But **how robust are these models?**

neuroprobe answers this by transplanting adversarial ML techniques from computer vision into neuroscience:

| Finding | Implication |
|---------|-------------|
| A **0.05 L∞** perturbation shifts predicted BOLD by **40%+** across cortex | Brain encoding models are fragile — conclusions drawn from them may not generalize |
| Region-targeted attacks can **selectively activate FFA** (face perception) from non-face stimuli | Model confounds exist between stimulus features and predicted brain regions |
| **Universal perturbations** exist that transfer across stimuli | Systematic model vulnerabilities, not input-specific artifacts |
| Cross-modal confusion: visual input → auditory cortex prediction | Multi-modal integration in encoding models is not robust |

## Five Attack Algorithms

```
neuroprobe
├── BrainFGSM                  # Single-step gradient sign (fast baseline)
├── BrainPGD                   # Iterative projected gradient descent (strongest)
├── RegionTargeted             # Activate specific ROI, suppress others
├── CrossModalConfusion        # Make visual input predict auditory brain activity
└── UniversalBrainPerturbation # One perturbation that transfers across all stimuli
```

All attacks operate on the **feature space** of the encoding model — the `(T, D)` representation mapped to `(T, V)` cortical predictions. This is both more tractable (differentiable by construction) and more general than pixel-level attacks.
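
To make the feature-space idea concrete, here is a minimal NumPy sketch of a single gradient-sign step. It is not neuroprobe's implementation: the linear head `W`, the `roi` index array, and the helper name `fgsm_feature_step` are assumptions chosen so that the gradient has a closed form and no model weights are needed.

```python
import numpy as np

def fgsm_feature_step(features, W, roi, epsilon):
    """One gradient-sign step that raises the mean predicted response in `roi`.

    Hypothetical stand-in: predictions are `features @ W`, so the gradient of
    the ROI mean is the same vector for every feature row.
    """
    T = features.shape[0]
    # d/dX of mean(pred[:, roi]) for pred = X @ W, one row of the (T, D) gradient
    grad_row = W[:, roi].sum(axis=1) / (T * len(roi))
    grad = np.tile(grad_row, (T, 1))
    return features + epsilon * np.sign(grad)

rng = np.random.default_rng(0)
T, D, V = 10, 32, 200
W = rng.normal(size=(D, V)) / np.sqrt(D)   # toy linear "brain head" (assumption)
features = rng.normal(size=(T, D))         # (T, D) stimulus features
roi = np.arange(20, 40)                    # made-up ROI vertex indices

adv = fgsm_feature_step(features, W, roi, epsilon=0.05)
clean_mean = (features @ W)[:, roi].mean()
adv_mean = (adv @ W)[:, roi].mean()
print(f"L-inf distance: {np.abs(adv - features).max():.3f}")
print(f"ROI mean: {clean_mean:.4f} -> {adv_mean:.4f}")
```

With a real encoder the gradient would come from autograd rather than a formula, but the update rule (step of size ε along the gradient sign) is the same.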

## Quick Start

```bash
pip install neuroattack
```

The PyPI distribution is named `neuroattack`; the Python import name is `neuroprobe`.

### 30-Second Demo (No GPU Required)

```python
import torch
from neuroprobe import BrainPGD, PerturbationBudget, SyntheticEncoder

# Lightweight differentiable brain encoder for testing
model = SyntheticEncoder(feature_dim=768, n_vertices=2048, seed=42)

# Simulate a visual stimulus (10 timesteps, 768-dim features)
stimulus = torch.randn(10, 768)

# PGD attack: find a perturbation within the epsilon budget that maximally shifts brain predictions
budget = PerturbationBudget(epsilon=0.05, norm="linf")
result = BrainPGD(model, budget=budget, n_steps=40).attack(stimulus)

print(f"Brain shift: {result.brain_shift:.4f}")      # Mean |ΔBOLD| across cortex
print(f"L∞ distance: {result.linf_distance:.6f}")     # Perturbation magnitude
print(f"L2 distance: {result.l2_distance:.4f}")
for region, shift in sorted(result.region_shifts.items(), key=lambda x: -x[1])[:5]:
    print(f"  {region:>5s}: {shift:.4f}")
```
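
For readers who want to see what the iterative attack does under the hood, the following is a rough NumPy sketch of projected gradient ascent under an L∞ budget. The `tanh` head, the squared-shift objective, the random start, and the names `predict`, `shift_grad`, and `pgd_linf` are all assumptions for illustration; how neuroprobe computes its loss and result fields internally is not shown in this README.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D, V = 10, 32, 200
W = rng.normal(size=(D, V)) / np.sqrt(D)

def predict(x):
    # Toy nonlinear head standing in for a brain model: (T, D) -> (T, V)
    return np.tanh(x @ W)

def shift_grad(x_adv, clean_pred):
    # Gradient of 0.5 * ||predict(x_adv) - clean_pred||^2 with respect to x_adv
    pred = predict(x_adv)
    return ((pred - clean_pred) * (1.0 - pred**2)) @ W.T

def pgd_linf(x, epsilon, n_steps=40, step_size=None):
    step_size = step_size or 2.5 * epsilon / n_steps
    clean_pred = predict(x)
    # Random start: the gradient of the shift objective is exactly zero at delta = 0
    delta = rng.uniform(-epsilon, epsilon, size=x.shape)
    for _ in range(n_steps):
        grad = shift_grad(x + delta, clean_pred)
        delta = delta + step_size * np.sign(grad)   # ascent step
        delta = np.clip(delta, -epsilon, epsilon)   # project onto the L-inf ball
    return delta

x = rng.normal(size=(T, D))
delta = pgd_linf(x, epsilon=0.05)
brain_shift = np.abs(predict(x + delta) - predict(x)).mean()
print(f"brain shift: {brain_shift:.4f}")
print(f"L-inf: {np.abs(delta).max():.4f}  L2: {np.linalg.norm(delta):.4f}")
```

The clip after every step is what keeps the perturbation inside the budget; an L2 budget would replace it with a rescaling of `delta` to norm at most ε.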

### Attack TRIBE v2 (Requires GPU + Model Weights)

```python
import torch
from neuroprobe import BrainPGD, PerturbationBudget
from neuroprobe.wrapper import TRIBEv2Wrapper

# Load the real TRIBE v2 brain encoding model
model = TRIBEv2Wrapper("facebook/tribev2")

# Encode a video stimulus
features = model.encode_stimulus("path/to/video.mp4")

# Run adversarial attack
budget = PerturbationBudget(epsilon=0.03, norm="linf")
result = BrainPGD(model, budget=budget, n_steps=50).attack(features)

print(f"Brain shift: {result.brain_shift:.4f}")
print(f"Most affected region: {max(result.region_shifts, key=result.region_shifts.get)}")
```

### Region-Targeted Attack

Can we craft a perturbation that selectively activates the fusiform face area (FFA) while leaving auditory cortex untouched?

```python
from neuroprobe import RegionTargeted

# `model` and `stimulus` are the objects created in the 30-second demo above
attacker = RegionTargeted(
    model,
    target_region="FFA",                         # Activate face processing area
    suppress_regions=["A1", "STG", "STS"],        # Keep auditory regions stable
    n_steps=100,
    target_activation=2.0,                        # Push FFA to 2.0 BOLD units
)
result = attacker.attack(stimulus)
print(f"FFA shift: {result.region_shifts.get('FFA', 0):.4f}")
print(f"A1 shift:  {result.region_shifts.get('A1', 0):.4f}")   # Should be ~0
```
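
One plausible way to express "activate the target, keep the others stable" is a two-term loss. The sketch below is an assumption about the general shape of such an objective, not the loss `RegionTargeted` actually uses: the linear head, the ROI index ranges, the weight `lam`, and the helper `targeted_loss_and_grad` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D, V = 10, 32, 200
W = rng.normal(size=(D, V)) / np.sqrt(D)                      # toy linear head: pred = X @ W
rois = {"FFA": np.arange(0, 20), "A1": np.arange(100, 130)}   # made-up vertex indices

def targeted_loss_and_grad(delta, x, target, suppress, target_activation, lam):
    W_t, W_s = W[:, rois[target]], W[:, rois[suppress]]
    # Term 1: squared gap between the target ROI mean and the requested activation
    gap = ((x + delta) @ W_t).mean() - target_activation
    grad_t = 2.0 * gap * np.tile(W_t.mean(axis=1) / T, (T, 1))
    # Term 2: mean squared change of the suppressed ROI relative to the clean prediction
    change = delta @ W_s
    grad_s = lam * 2.0 * (change @ W_s.T) / change.size
    return gap**2 + lam * (change**2).mean(), grad_t + grad_s

x = rng.normal(size=(T, D))
epsilon, n_steps = 0.05, 100
delta = np.zeros_like(x)
loss_start, _ = targeted_loss_and_grad(delta, x, "FFA", "A1", 2.0, lam=10.0)
for _ in range(n_steps):
    _, grad = targeted_loss_and_grad(delta, x, "FFA", "A1", 2.0, lam=10.0)
    # Signed descent step, then projection back into the L-inf budget
    delta = np.clip(delta - 2.5 * epsilon / n_steps * np.sign(grad), -epsilon, epsilon)
loss_end, _ = targeted_loss_and_grad(delta, x, "FFA", "A1", 2.0, lam=10.0)

ffa_shift = (delta @ W[:, rois["FFA"]]).mean()
a1_shift = np.abs(delta @ W[:, rois["A1"]]).mean()
print(f"loss {loss_start:.4f} -> {loss_end:.4f}")
print(f"FFA shift {ffa_shift:+.4f}, A1 |shift| {a1_shift:.4f}")
```

Raising `lam` trades target activation for stability in the suppressed region; with a small ε the requested activation may not be reachable at all, in which case the attack only moves toward it.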

### Cross-Modal Confusion

Perturb visual features so the model predicts brain activity typical of auditory processing:

```python
from neuroprobe import CrossModalConfusion

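# `visual_features` is a (T, D) tensor of visual-stimulus features, e.g. from model.encode_stimulus(...)
attacker = CrossModalConfusion(model, source_modality="visual", n_steps=80)
result = attacker.attack(visual_features)
# Intended outcome: the model predicts auditory cortex activation from visual input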
```

### Universal Adversarial Perturbation

Learn a single perturbation from a set of training stimuli, then apply it to unseen ones:

```python
from neuroprobe import UniversalBrainPerturbation

# `training_stimuli`: a collection of (T, D) feature tensors; `new_stimulus`: one held-out tensor
uap = UniversalBrainPerturbation(model, n_epochs=10)
delta = uap.fit(training_stimuli)  # Learn from multiple stimuli
result = uap.attack(new_stimulus)  # Apply to unseen stimulus
print(f"Universal perturbation transfers with brain_shift={result.brain_shift:.4f}")
```
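
The core of a universal perturbation is that one vector is shared by every stimulus and every timestep. The NumPy sketch below illustrates that idea under stated assumptions (a toy `tanh` head, a squared-shift objective, a `(D,)` perturbation broadcast over time, and the hypothetical helper `fit_universal`); it does not reproduce what `UniversalBrainPerturbation.fit` does internally.

```python
import numpy as np

rng = np.random.default_rng(0)
D, V = 32, 200
W = rng.normal(size=(D, V)) / np.sqrt(D)

def predict(x):
    # Toy head standing in for a brain model: (T, D) -> (T, V)
    return np.tanh(x @ W)

def fit_universal(stimuli, epsilon, n_epochs=10):
    step_size = epsilon / 10
    delta = rng.uniform(-epsilon, epsilon, size=D)   # one (D,) vector shared by all inputs
    for _ in range(n_epochs):
        for x in stimuli:
            clean, pred = predict(x), predict(x + delta)
            # Gradient of 0.5 * ||pred - clean||^2, summed over timesteps
            grad = (((pred - clean) * (1.0 - pred**2)) @ W.T).sum(axis=0)
            delta = np.clip(delta + step_size * np.sign(grad), -epsilon, epsilon)
    return delta

# Stimuli may differ in length T because delta only depends on the feature dimension
training = [rng.normal(size=(T, D)) for T in (8, 10, 12, 10)]
delta = fit_universal(training, epsilon=0.05)

held_out = rng.normal(size=(9, D))
transfer_shift = np.abs(predict(held_out + delta) - predict(held_out)).mean()
print(f"shift on held-out stimulus: {transfer_shift:.4f}")
```

Whether such a perturbation transfers well on a real encoder is an empirical question; comparing `transfer_shift` against a random perturbation of the same norm is a reasonable sanity check.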

## Full Robustness Audit

Run a robustness evaluation with two helper functions:

```python
from neuroprobe import robustness_curve, region_vulnerability_map

# `stimuli`: a collection of (T, D) feature tensors
# Robustness curve: brain shift vs. perturbation budget
reports = robustness_curve(model, stimuli, epsilons=[0.001, 0.005, 0.01, 0.05, 0.1])
for eps, report in reports.items():
    print(f"ε={eps:.3f}  shift={report.mean_brain_shift:.4f}  "
          f"most_vulnerable={report.most_vulnerable_region}")

# Which brain regions are most vulnerable to targeted attacks?
vuln = region_vulnerability_map(model, stimuli, n_steps=50)
for region, score in vuln.items():
    print(f"  {region:>5s}: {'█' * int(score * 20):<20s} {score:.4f}")
```
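
Conceptually, a robustness curve is a sweep over ε with the attack re-run at each budget. The following toy version is a sketch only: it uses a linear head and a fixed gradient-sign direction (both assumptions), and the helper `fgsm_shift` is hypothetical and unrelated to the internals of `robustness_curve`.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D, V = 10, 32, 200
W = rng.normal(size=(D, V)) / np.sqrt(D)              # toy linear head (assumption)
stimuli = [rng.normal(size=(T, D)) for _ in range(4)]

def fgsm_shift(x, epsilon):
    # Gradient sign of the mean prediction over all vertices (identical for every row of x)
    direction = np.sign(W.sum(axis=1))
    adv = x + epsilon * direction
    return np.abs(adv @ W - x @ W).mean()

# Map each budget to the shift averaged over stimuli
curve = {eps: float(np.mean([fgsm_shift(x, eps) for x in stimuli]))
         for eps in [0.001, 0.005, 0.01, 0.05, 0.1]}
for eps, shift in curve.items():
    print(f"eps={eps:.3f}  mean shift={shift:.4f}")
```

For a linear head the curve is a straight line through the origin; on a nonlinear model the shape of the curve (where it bends or saturates) is the informative part.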

## CLI

```bash
# Quick demo with SyntheticEncoder
neuroprobe demo --attack pgd --steps 40 --epsilon 0.05

# Full audit with JSON output
neuroprobe audit --epsilon 0.01,0.05,0.1 --output report.json
```

## Architecture

```
Stimulus ──► Encoder ──► Features (T, D) ──► Brain Model ──► BOLD (T, V)
                              │                                    │
                         neuroprobe                          ◄─ gradients
                         perturbs here                         flow back
```

neuroprobe attacks the feature representation between the stimulus encoder and the brain prediction head. This is the critical interface where:

- **Gradient access** is guaranteed (differentiable by construction)
- **Perturbations are semantically meaningful** (feature space, not pixel space)
- **Results generalize** across input modalities (video, audio, text share this interface)

The `BrainEncoderWrapper` abstract base class (ABC) lets you plug in any model:

```python
from neuroprobe.wrapper import BrainEncoderWrapper

class MyModel(BrainEncoderWrapper):
    def encode_stimulus(self, stimulus):
        return my_encoder(stimulus)  # → (T, D)

    def predict_from_features(self, features):
        return my_brain_head(features)  # → (T, V), must be differentiable
```
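
Since the attacks rely on gradients through `predict_from_features`, it can be worth checking a custom head's gradient numerically before running an attack. Below is a small finite-difference check in NumPy, assuming a toy `tanh` head with a hand-written gradient; the helper names `analytic_grad` and `numeric_grad` are hypothetical. With a PyTorch model, `torch.autograd.gradcheck` serves a similar purpose.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D, V = 4, 6, 9
W = rng.normal(size=(D, V))

def predict_from_features(features):
    return np.tanh(features @ W)            # toy (T, D) -> (T, V) head

def analytic_grad(features):
    # Gradient of predict_from_features(features).sum() with respect to features
    return (1.0 - predict_from_features(features) ** 2) @ W.T

def numeric_grad(features, h=1e-6):
    # Central finite differences, one feature entry at a time
    grad = np.zeros_like(features)
    for idx in np.ndindex(*features.shape):
        bump = np.zeros_like(features)
        bump[idx] = h
        grad[idx] = (predict_from_features(features + bump).sum()
                     - predict_from_features(features - bump).sum()) / (2 * h)
    return grad

x = rng.normal(size=(T, D))
max_err = np.abs(analytic_grad(x) - numeric_grad(x)).max()
print(f"max gradient error: {max_err:.2e}")
```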

## Cortical ROI Definitions

13 standard regions on the fsaverage5 cortical mesh (20,484 vertices), covering:

| Category | Regions |
|----------|---------|
| Visual | V1, V2, V4, MT |
| Ventral visual | FFA (faces), PPA (places) |
| Auditory/speech | A1, STG, STS |
| Language | IFG (Broca's area) |
| Parietal | TPJ |
| Prefrontal | PFC |
| Motor | motor cortex |
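
Per-region metrics such as `region_shifts` can be thought of as vertex-wise changes averaged within each ROI. The sketch below assumes ROIs are stored as arrays of vertex indices and uses made-up index ranges; the actual ROI masks and the function name `region_shifts` here are illustrative, not taken from the package.

```python
import numpy as np

def region_shifts(clean, adv, rois):
    """Mean absolute prediction change per ROI; `rois` maps name -> vertex indices."""
    diff = np.abs(adv - clean)                        # (T, V)
    return {name: float(diff[:, idx].mean()) for name, idx in rois.items()}

rng = np.random.default_rng(0)
T, V = 10, 20484                                      # fsaverage5 vertex count
rois = {"V1": np.arange(0, 500), "FFA": np.arange(500, 650)}   # made-up index ranges
clean = rng.normal(size=(T, V))
adv = clean.copy()
adv[:, rois["FFA"]] += 0.5                            # shift only the FFA vertices

shifts = region_shifts(clean, adv, rois)
for name, s in sorted(shifts.items(), key=lambda kv: -kv[1]):
    print(f"{name:>5s}: {s:.4f}")
```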

## Testing

```bash
pip install "neuroattack[dev]"   # quotes keep shells like zsh from expanding the brackets
pytest tests/ -v    # 64 tests, ~60s
```

## Citation

If you use neuroprobe in your research:

```bibtex
@software{neuroprobe2025,
  title={neuroprobe: Adversarial Robustness Testing for Neural Encoding Models},
  author={Zacharie B},
  year={2025},
  url={https://github.com/stef41/neuroprobe}
}
```

## Related Work

- [TRIBE v2](https://ai.meta.com/blog/tribe-v2-brain-foundation-model/) — Meta's brain encoding foundation model (d'Ascoli et al., 2026)
- [Adversarial Examples for Neural Encoding Models](https://arxiv.org/abs/2301.05929) — Adversarial vulnerability of visual encoding models
- [Brain-Score](https://www.brain-score.org/) — Benchmarking neural encoding models
- [Universal Adversarial Perturbations](https://arxiv.org/abs/1610.08401) — Moosavi-Dezfooli et al., 2017

## License

Apache 2.0
