Metadata-Version: 2.4
Name: shapley-value
Version: 0.0.9
Summary: A comprehensive Python package for calculating Shapley values in cooperative game theory
Home-page: https://github.com/Bowenislandsong/shapley-value
Author: Bowen Song
Author-email: Bowen Song <bowenson@example.com>
Maintainer: Bowen Song
Maintainer-email: Bowen Song <bowenson@example.com>
License: MIT
Project-URL: Homepage, https://github.com/Bowenislandsong/shapley-value
Project-URL: Source, https://github.com/Bowenislandsong/shapley-value
Project-URL: Bug Reports, https://github.com/Bowenislandsong/shapley-value/issues
Project-URL: Documentation, https://github.com/Bowenislandsong/shapley-value#readme
Project-URL: Examples, https://github.com/Bowenislandsong/shapley-value/tree/main/examples
Project-URL: Changelog, https://github.com/Bowenislandsong/shapley-value/blob/main/CHANGELOG.md
Project-URL: Author, https://bowenislandsong.github.io/#/personal
Keywords: shapley,shapley-value,game-theory,cooperative-games,fair-allocation,coalition,marginal-contribution,economics,mathematics,optimization,machine-learning,feature-importance,explainable-ai,xai
Platform: any
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: Topic :: Scientific/Engineering :: Mathematics
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Office/Business :: Financial
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Operating System :: OS Independent
Classifier: Environment :: Console
Classifier: Environment :: Web Environment
Classifier: Natural Language :: English
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: joblib>=1.0.0
Provides-Extra: dev
Requires-Dist: pytest>=6.0; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: black; extra == "dev"
Requires-Dist: flake8; extra == "dev"
Requires-Dist: mypy; extra == "dev"
Requires-Dist: pre-commit; extra == "dev"
Provides-Extra: examples
Requires-Dist: pandas>=1.0.0; extra == "examples"
Requires-Dist: numpy>=1.18.0; extra == "examples"
Provides-Extra: performance
Requires-Dist: pandas>=1.0.0; extra == "performance"
Requires-Dist: psutil; extra == "performance"
Provides-Extra: all
Requires-Dist: shapley-value[dev,examples,performance]; extra == "all"
Dynamic: author
Dynamic: home-page
Dynamic: license-file
Dynamic: maintainer
Dynamic: platform
Dynamic: requires-python

# Shapley Value Calculator

[![PyPI Downloads](https://static.pepy.tech/personalized-badge/shapley-value?period=total&units=INTERNATIONAL_SYSTEM&left_color=BRIGHTGREEN&right_color=GREEN&left_text=downloads)](https://pepy.tech/projects/shapley-value)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![Project Page](https://img.shields.io/badge/Project%20Page-Live-blue)](https://bowenislandsong.github.io/shapley-value/)

A comprehensive Python package for calculating Shapley values in cooperative game theory. Shapley values provide a mathematically principled method for fairly distributing payoffs among players based on their marginal contributions to coalitions.

## 📖 Table of Contents

- [Overview](#overview)
- [Installation](#installation)
- [Quick Start](#quick-start)
- [Usage Examples](#usage-examples)
- [API Reference](#api-reference)
- [Features](#features)
- [Performance](#performance)
- [Testing](#testing)
- [Contributing](#contributing)
- [License](#license)

## 🎯 Overview

The Shapley value is a solution concept from cooperative game theory that assigns a unique distribution (among the players) of a total surplus generated by the coalition of all players. This package provides four approaches:

| Class | When to use | Scales to |
|---|---|---|
| `ShapleyCombinations` | Pre-defined coalition values (exact) | ~15 players |
| `ShapleyValue` | Pre-defined coalition values, alt algorithm | ~15 players |
| `ShapleyValueCalculator` | Dynamic evaluation function (exact, parallel) | ~20 players |
| `MonteCarloShapleyValue` | Dynamic evaluation function (approximate, parallel) | 100+ players |

## 🚀 Installation

```bash
pip install shapley-value
```

## ⚡ Quick Start

### Exact calculation from pre-defined coalition values

```python
from shapley_value import ShapleyCombinations

players = ['Alice', 'Bob', 'Charlie']
coalition_values = {
    (): 0,
    ('Alice',): 10,  ('Bob',): 20,  ('Charlie',): 30,
    ('Alice', 'Bob'): 50,  ('Alice', 'Charlie'): 60,  ('Bob', 'Charlie'): 70,
    ('Alice', 'Bob', 'Charlie'): 100,
}

calculator = ShapleyCombinations(players)
shapley_values = calculator.calculate_shapley_values(coalition_values)
# {'Alice': 16.67, 'Bob': 33.33, 'Charlie': 50.0}
```

### Monte Carlo approximation for large games

```python
from shapley_value import MonteCarloShapleyValue

def my_model(coalition):
    """Any callable – ML model, simulation, or analytic function."""
    return sum(coalition) ** 0.9 if coalition else 0.0

mc = MonteCarloShapleyValue(
    my_model,
    players=list(range(50)),   # 50 players – exact is infeasible
    num_samples=5000,
    random_seed=42,
    n_jobs=-1,                 # use all CPU cores
)
values = mc.calculate_shapley_values()   # Dict[player, float]
```

## 📋 Usage Examples

### Method 1 – Pre-defined coalition values (`ShapleyCombinations`)

Use this when every coalition value is already known:

```python
from shapley_value import ShapleyCombinations

players = ['Player1', 'Player2', 'Player3']
coalition_values = {
    (): 0,
    ('Player1',): 100,  ('Player2',): 200,  ('Player3',): 300,
    ('Player1', 'Player2'): 450,
    ('Player1', 'Player3'): 500,
    ('Player2', 'Player3'): 600,
    ('Player1', 'Player2', 'Player3'): 900,
}

calculator = ShapleyCombinations(players)
shapley_values = calculator.calculate_shapley_values(coalition_values)
print(shapley_values)
```

### Method 2 – Exact evaluation function (`ShapleyValueCalculator`)

Use this when a callable computes the coalition value and you have ≤ ~20 players:

```python
from shapley_value import ShapleyValueCalculator

def profit_function(coalition):
    if not coalition:
        return 0
    return sum(coalition) + len(coalition) * 10  # synergy bonus

players = [100, 200, 300]

calculator = ShapleyValueCalculator(profit_function, players, n_jobs=-1)
shapley_values = calculator.calculate_shapley_values()
print(shapley_values)

raw_data = calculator.get_raw_data()           # detailed per-coalition data
calculator.save_raw_data('analysis.csv')       # export to CSV
```

`n_jobs` follows the **scikit-learn convention**: `1` = sequential, `-1` = all cores, `k` = exactly k cores.

### Method 3 – Monte Carlo approximation (`MonteCarloShapleyValue`)

Use this for **large games** (20+ players) where exact computation is infeasible.
Complexity is O(m × n) where m = `num_samples` and n = number of players,
versus O(2ⁿ) for exact methods.

```python
from shapley_value import MonteCarloShapleyValue

def complex_model(coalition):
    """Expensive callable – e.g. a trained ML model."""
    if not coalition:
        return 0.0
    return float(sum(p ** 1.2 for p in coalition))

players = list(range(1, 51))  # 50 players

mc = MonteCarloShapleyValue(
    complex_model,
    players=players,
    num_samples=5000,   # more samples → lower error
    random_seed=0,      # set for reproducibility
    n_jobs=-1,          # parallelise across all CPU cores
)

# Core result
values = mc.calculate_shapley_values()

# Diagnose how many samples you need
convergence_df = mc.get_convergence_data()  # DataFrame shape (5000, 50)

# Inspect per-permutation marginal contributions
raw_df = mc.get_raw_data()  # columns: iteration, permutation, player, marginal_contribution
```

#### Choosing `num_samples`

| Players | Suggested `num_samples` | Typical error |
|---------|------------------------|---------------|
| ≤ 10    | 1 000                  | < 0.5 %       |
| 10–30   | 5 000                  | < 1 %         |
| 30–100  | 10 000–50 000          | 1–3 %         |

Use `get_convergence_data()` to plot running estimates and verify convergence
for your specific game.

#### `n_jobs` quick reference

```python
# Sequential (default – no overhead, good for small games / debugging)
mc = MonteCarloShapleyValue(f, players, n_jobs=1)

# All available CPU cores
mc = MonteCarloShapleyValue(f, players, n_jobs=-1)

# Exactly 4 cores
mc = MonteCarloShapleyValue(f, players, n_jobs=4)
```

> **Note:** Permutations are generated in a fixed order before the parallel
> step, so `random_seed` guarantees bit-identical results regardless of `n_jobs`.

## 📚 API Reference

### `ShapleyCombinations`

```python
class ShapleyCombinations:
    def __init__(self, players: List[Any])
    def calculate_shapley_values(
        self, coalition_values: Dict[Tuple, float]
    ) -> Dict[Any, float]
    def get_all_coalitions(self) -> List[Tuple]
```

### `ShapleyValueCalculator`

```python
class ShapleyValueCalculator:
    def __init__(
        self,
        evaluation_function: Callable[[List[Any]], float],
        players: List[Any],
        n_jobs: int = 1,   # sklearn-style: 1=seq (default), -1=all cores, k=k cores
    )
    def calculate_shapley_values(self) -> Dict[Any, float]
    def get_raw_data(self) -> pd.DataFrame
    def save_raw_data(self, file_path: str) -> None
```

### `MonteCarloShapleyValue`

```python
class MonteCarloShapleyValue:
    def __init__(
        self,
        evaluation_function: Callable[[List[Any]], float],
        players: List[Any],
        num_samples: int = 1000,     # permutations to sample
        random_seed: Optional[int] = None,
        n_jobs: int = 1,             # sklearn-style parallelism
    )
    def calculate_shapley_values(self) -> Dict[Any, float]
    def get_convergence_data(self) -> pd.DataFrame   # running estimates per iteration
    def get_raw_data(self) -> pd.DataFrame           # per-permutation detail
```

### `ShapleyValue`

Low-level calculator using the weighted-marginal-contribution formula directly.

```python
class ShapleyValue:
    def __init__(
        self,
        players: List[Any],
        coalition_values: Dict[Tuple[Any, ...], float],
    )
    def calculate_shapley_values(self) -> Dict[Any, float]
```

## ✨ Features

- **Four calculation methods** – exact (pre-defined values, evaluation function) and
  Monte Carlo approximation for large games
- **sklearn-style `n_jobs`** on both `ShapleyValueCalculator` and `MonteCarloShapleyValue`:
  `1` = sequential, `-1` = all cores, `k` = exactly k cores
- **Reproducible parallel results** – `random_seed` in `MonteCarloShapleyValue` gives
  bit-identical output regardless of `n_jobs`
- **Convergence diagnostics** – `get_convergence_data()` lets you plot running
  estimates and tune `num_samples` for your accuracy target
- **Data export** – `get_raw_data()` and `save_raw_data()` for downstream analysis
- **Type flexibility** – any hashable player type (strings, ints, tuples, …)
- **Comprehensive test suite** – 53 tests covering correctness, reproducibility,
  parallelism, edge cases, and stress / performance

## ⚡ Performance

### Exact methods (`ShapleyCombinations` / `ShapleyValueCalculator`)

Exact computation enumerates all 2ⁿ coalitions and becomes impractical beyond
~20 players.

| Players | Coalitions | Sequential | Parallel (`n_jobs=-1`) | Speedup |
|---------|-----------|------------|------------------------|---------|
| 5       | 32        | < 0.001 s  | < 0.001 s              | 1×      |
| 10      | 1 024     | ~ 6.6 s    | ~ 1.1 s                | ~ 6×    |
| 12      | 4 096     | ~ 31 s     | ~ 2.6 s                | ~ 12×   |
| 15      | 32 768    | ~ 4 min    | ~ 20 s                 | ~ 12×   |

### Monte Carlo (`MonteCarloShapleyValue`)

O(m × n) cost – scales to 100+ players. Parallelism is beneficial when the
evaluation function is expensive (e.g. ML model inference, simulations).

| Players | `num_samples` | `n_jobs=1` | `n_jobs=-1` | Notes |
|---------|--------------|------------|-------------|-------|
| 20      | 5 000        | < 1 s      | < 1 s       | cheap eval function |
| 50      | 5 000        | < 1 s      | < 1 s       | cheap eval function |
| 50      | 10 000       | ~ 2 s      | ~ 0.7 s     | expensive eval (~1 ms/call) |
| 100     | 10 000       | ~ 5 s      | ~ 1.5 s     | expensive eval (~1 ms/call) |

> **Rule of thumb:** if your evaluation function is cheap (< 10 µs), use
> `n_jobs=1` to avoid joblib process-spawn overhead. If it's expensive
> (ML model, simulation), set `n_jobs=-1` for maximum throughput.

## 🧪 Testing

```bash
# Install dev dependencies
pip install -e ".[dev,examples,performance]"

# Run the full test suite (53 tests)
python -m pytest tests/ -v

# Run only stress / performance tests (with printed benchmark table)
python -m pytest tests/test_montecarlo_stress.py -v -s
```

The suite is organised into:

| File | Tests | Coverage |
|------|-------|----------|
| `test_calculator.py` | 11 | `ShapleyValue`, `ShapleyCombinations`, `ShapleyValueCalculator` |
| `test_montecarlo.py` | 29 | Correctness, reproducibility, parallelism, edge cases |
| `test_montecarlo_stress.py` | 13 | 20–100 player stress, timing bounds, benchmark table |

## 🤝 Contributing

We welcome contributions!

1. **Fork** the repository
2. **Create** a feature branch (`git checkout -b feature/amazing-feature`)
3. **Make** your changes and add tests
4. **Commit** your changes (`git commit -m 'Add amazing feature'`)
5. **Push** to the branch (`git push origin feature/amazing-feature`)
6. **Open** a Pull Request

### Development Setup

```bash
git clone https://github.com/Bowenislandsong/shapley-value.git
cd shapley-value
pip install -e ".[dev,examples,performance]"
python -m pytest tests/
```

## 📄 License

This project is licensed under the MIT License – see the [LICENSE](LICENSE) file for details.

## 📝 Citation

If you use this package in your research or project, please cite it as:

```bibtex
@software{song2026shapley,
  author = {Song, Bowen},
  title = {Shapley Value Calculator},
  year = {2026},
  publisher = {GitHub},
  url = {https://github.com/Bowenislandsong/shapley-value},
  version = {0.0.9}
}
```

**APA Format:**
```
Song, B. (2026). Shapley Value Calculator (Version 0.0.9) [Computer software]. https://github.com/Bowenislandsong/shapley-value
```

For more citation formats see [CITATION.cff](CITATION.cff).

## 🏷️ Version History

- **0.0.9**: PyPI publish on `v*` tag push; citation / version metadata
- **0.0.8**: `ShapleyValueCalculator` uses `n_jobs` (sklearn-style); doc and landing-page sync; example runner includes Monte Carlo example
- **0.0.7**: `MonteCarloShapleyValue` with sklearn-style `n_jobs`; comprehensive stress tests; convergence and raw-data diagnostics
- **0.0.6**: Citation information for academic use
- **0.0.5**: CI improvements and Python 3.13 support
- **0.0.4**: Comprehensive examples, parallel processing, performance optimizations
- **0.0.3**: Enhanced parallel processing
- **0.0.2**: Function-based evaluation and data export
- **0.0.1**: Initial release

## 👥 Authors

- **Bowen Song** – [Profile](https://bowenislandsong.github.io/#/personal)

## 🔗 Links

- [PyPI Package](https://pypi.org/project/shapley-value/)
- [GitHub Repository](https://github.com/Bowenislandsong/shapley-value)
- [Examples](examples/)

---

*For more information about Shapley values and cooperative game theory, see the [Wikipedia article](https://en.wikipedia.org/wiki/Shapley_value).*

