Metadata-Version: 2.4
Name: SearchLibrium
Version: 0.0.85
Summary: A Python package for econometric models driven by search
Author: Alexander Paz, Prithvi Beeramole, Robert Burdett
Author-email: Zeke Ahern <z.ahern@qut.edu.au>
Project-URL: Homepage, https://github.com/zahern/HypothesisX
Keywords: econometric models,search,discrete choice,logit,probit
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Mathematics
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: numpy>2.0.0
Requires-Dist: pandas>=2.0.0
Requires-Dist: scikit-learn>=1.3.1
Requires-Dist: statsmodels
Provides-Extra: dev
Requires-Dist: black; extra == "dev"
Requires-Dist: bumpver; extra == "dev"
Requires-Dist: isort; extra == "dev"
Requires-Dist: pip-tools; extra == "dev"
Requires-Dist: pytest; extra == "dev"

# SearchLibrium

[![PyPI version](https://img.shields.io/pypi/v/SearchLibrium.svg)](https://pypi.org/project/SearchLibrium/)
[![Python](https://img.shields.io/pypi/pyversions/SearchLibrium.svg)](https://pypi.org/project/SearchLibrium/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
[![CI](https://github.com/zahern/HypothesisX/actions/workflows/ci.yml/badge.svg)](https://github.com/zahern/HypothesisX/actions/workflows/ci.yml)

**Automated discrete choice model search powered by Simulated Annealing, Harmony Search, and JAX-accelerated MLE.**

SearchLibrium searches over model specifications — which variables to include, whether parameters should be random, which transformations to apply, and which model class to use — and returns the best converged, all-significant model according to your chosen criterion (BIC, AIC, log-likelihood, MAE, or multi-objective combinations).

---

## Install

```bash
pip install SearchLibrium --upgrade
```

**Requirements:** Python ≥ 3.10, numpy > 2.0, scipy ≥ 1.10, pandas ≥ 2.0, scikit-learn ≥ 1.3.1, statsmodels

### Install in Jupyter Notebook

```python
# Run in a notebook cell
import subprocess
import sys

subprocess.check_call([sys.executable, "-m", "pip", "install", "SearchLibrium", "--upgrade"])

# Then import
from SearchLibrium import Parameters, call_siman
print("✓ SearchLibrium installed and ready!")
```

---

## Quick start

```python
import numpy as np
import pandas as pd
from SearchLibrium import Parameters, call_siman

df = pd.read_csv("https://raw.githubusercontent.com/zahern/HypothesisX/refs/heads/main/data/Swissmetro_final.csv")
varnames   = ["TIME", "COST", "HEADWAY", "SEATS"]
choice_set = np.unique(df["alt"]).tolist()

params = Parameters(
    criterions   = [("bic", -1)],        # minimise BIC
    df           = df,
    varnames     = varnames,
    asvarnames   = varnames,
    isvarnames   = [],
    choice_set   = choice_set,
    choices      = df["CHOICE"].values,
    alt_var      = df["alt"].values,
    choice_id    = df["custom_id"].values,
    ind_id       = df["ID"].values,
    base_alt     = "SM",
    models       = ["multinomial", "mixed_logit"],
    allow_random = True,
    p_val        = 0.05,
)

best = call_siman(params, init_sol=None, id_num=1)
```

A **run dashboard** is printed automatically at the end of every search, showing BIC, log-likelihood, AIC, MAE, variables, model type, and (if multi-objective) the full Pareto archive.

---

## Example Notebooks

| Model | Notebook |
| ----- | -------- |
| Multinomial Logit — standalone fit + search | [notebooks/mnl_example.ipynb](src/SearchLibrium/notebooks/mnl_example.ipynb) |
| Mixed Logit — standalone fit + search | [notebooks/mixed_logit_example.ipynb](src/SearchLibrium/notebooks/mixed_logit_example.ipynb) |
| Random Regret Minimisation — standalone fit + search | [notebooks/rrm_example.ipynb](src/SearchLibrium/notebooks/rrm_example.ipynb) |
| Mixed Random Regret — standalone fit + search | [notebooks/mixed_rrm_example.ipynb](src/SearchLibrium/notebooks/mixed_rrm_example.ipynb) |
| Nested Logit — standalone fit + search | [notebooks/Data_Nest.ipynb](src/SearchLibrium/notebooks/Data_Nest.ipynb) |
| HPC Batch Jobs & PyPI Publishing | [notebooks/pbs_batch_jobs_guide.ipynb](src/SearchLibrium/notebooks/pbs_batch_jobs_guide.ipynb) |

---

## How the search works

By default, the search uses **Simulated Annealing (SA)** to explore the space of model specifications:

```text
generate starting solution
  └─ for each SA temperature step
       └─ perturb current specification → guaranteed distinct from current
            ├─ fit model with JAX-accelerated MLE
            ├─ run backward elimination (remove insignificant vars, refit)
            ├─ accept if converged + Metropolis criterion satisfied
            └─ update best solution
print dashboard
```

**Key guarantees:**

- Only **converged** solutions are accepted
- Every accepted solution has **all variables statistically significant** (p < `p_val`, backward elimination)
- Each perturbation is guaranteed to produce a **genuinely different specification** — a distribution-only swap (e.g. normal → lognormal) without any structural change does not count
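
The loop above can be sketched in a few lines. This is a minimal, generic simulated-annealing skeleton for a minimisation objective, not SearchLibrium's internal code. The names `toy_objective`, `perturb`, and `anneal` are hypothetical, a specification is reduced to a tuple of include/exclude flags, and a geometric cooling schedule is assumed. Model fitting and backward elimination are replaced by a toy objective.

```python
import math
import random

def toy_objective(spec):
    # Hypothetical stand-in for "fit the model and return BIC": lower is better.
    target = (1, 0, 1, 1)
    return sum(a != b for a, b in zip(spec, target))

def perturb(spec, rng):
    # Flip exactly one flag, so the candidate always differs from the current spec.
    i = rng.randrange(len(spec))
    return spec[:i] + (1 - spec[i],) + spec[i + 1:]

def anneal(start, t_init=10.0, t_final=0.01, steps=50, iters=10, seed=0):
    rng = random.Random(seed)
    current, f_cur = start, toy_objective(start)
    best, f_best = current, f_cur
    cooling = (t_final / t_init) ** (1.0 / (steps - 1))  # assumed geometric cooling
    temp = t_init
    for _ in range(steps):
        for _ in range(iters):
            cand = perturb(current, rng)
            delta = toy_objective(cand) - f_cur
            # Metropolis criterion: always accept improvements,
            # accept a worse candidate with probability exp(-delta / temp).
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                current, f_cur = cand, f_cur + delta
                if f_cur < f_best:
                    best, f_best = current, f_cur
        temp *= cooling
    return best, f_best

best_spec, best_value = anneal((0, 0, 0, 0))
```

In the real search, a candidate would additionally be rejected when the fit does not converge, before the Metropolis test is applied.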

---

## Data format

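Your dataframe must be in **long format**: one row per alternative per observation. The column names below are illustrative; the relevant columns are passed to `Parameters` as arrays (for example `choices`, `alt_var`, and `choice_id`).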

| obs_id | alt   | choice | TIME | COST | ... |
| ------ | ----- | ------ | ---- | ---- | --- |
| 1      | car   | 1      | 35   | 12   | ... |
| 1      | train | 0      | 60   | 8    | ... |
| 1      | bus   | 0      | 55   | 5    | ... |
| 2      | car   | 0      | 40   | 14   | ... |
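
If your data is in wide format (one row per observation, one column per alternative and attribute), it can be reshaped with pandas. The sketch below assumes a hypothetical wide layout with columns such as `TIME_car` and a `chosen` column naming the selected alternative. The rows for observation 2 beyond `car` are made up for illustration.

```python
import pandas as pd

# Hypothetical wide-format data: one row per observation.
wide = pd.DataFrame({
    "obs_id": [1, 2],
    "chosen": ["car", "train"],
    "TIME_car": [35, 40], "TIME_train": [60, 58], "TIME_bus": [55, 50],
    "COST_car": [12, 14], "COST_train": [8, 9], "COST_bus": [5, 6],
})

# Stack the per-alternative columns into one row per (observation, alternative).
long_df = pd.wide_to_long(
    wide, stubnames=["TIME", "COST"], i="obs_id", j="alt", sep="_", suffix=r"\w+"
).reset_index()

# Binary choice indicator: 1 on the row of the chosen alternative, 0 elsewhere.
long_df["choice"] = (long_df["alt"] == long_df["chosen"]).astype(int)
long_df = (
    long_df.drop(columns="chosen")
    .sort_values(["obs_id", "alt"])
    .reset_index(drop=True)
)
```

Each observation should end up with exactly one row where `choice` equals 1.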

---

## Model types

| Model name | Description | JAX MLE |
| ---------- | ----------- | ------- |
| `"multinomial"` | Multinomial Logit (MNL) | ✓ |
| `"mixed_logit"` | Mixed Logit with simulation-based integration | ✓ |
| `"random_regret"` | Random Regret Minimisation (RRM) | ✓ |
| `"mixed_random_regret"` | Mixed-RRM with random parameters | ✓ |
| `"nested_logit"` | Nested Logit (requires `nests=` and `lambdas=` kwargs) | ✓ |
| `"ordered_logit"` | Ordered Logit | ✓ |
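
To make the first row of the table concrete, here is a minimal sketch of the textbook multinomial logit probabilities and log-likelihood on long-format data. It uses plain NumPy rather than JAX so that it runs without extra dependencies, and the function names `mnl_probabilities` and `mnl_loglik` are hypothetical, not part of the SearchLibrium API. It assumes every observation has the same number of alternatives and that rows are grouped by observation.

```python
import numpy as np

def mnl_probabilities(X, beta, n_alts):
    # X: (n_obs * n_alts, n_vars) attributes, rows grouped by observation.
    utilities = (X @ beta).reshape(-1, n_alts)
    utilities = utilities - utilities.max(axis=1, keepdims=True)  # numerical stability
    expu = np.exp(utilities)
    return expu / expu.sum(axis=1, keepdims=True)

def mnl_loglik(X, y, beta, n_alts):
    # y: 0/1 choice indicator in the same row order as X.
    probs = mnl_probabilities(X, beta, n_alts)
    chosen = y.reshape(-1, n_alts)
    return float(np.sum(chosen * np.log(probs)))

# Illustrative data: 2 observations, 3 alternatives, attributes TIME and COST.
X = np.array([[35.0, 12.0], [60.0, 8.0], [55.0, 5.0],
              [40.0, 14.0], [58.0, 9.0], [50.0, 6.0]])
y = np.array([1, 0, 0, 0, 1, 0])
beta = np.array([-0.05, -0.10])

P = mnl_probabilities(X, beta, n_alts=3)
ll = mnl_loglik(X, y, beta, n_alts=3)
```

Maximum likelihood estimation searches for the `beta` that maximises `ll`. The mixed variants average these probabilities over random draws of `beta`.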

---

## Search examples by model type

### Multinomial Logit

```python
params = Parameters(
    criterions = [("bic", -1)],
    df         = df,
    varnames   = ["TIME", "COST", "HEADWAY"],
    asvarnames = ["TIME", "COST", "HEADWAY"],
    isvarnames = [],
    choice_set = choice_set,
    choices    = df["CHOICE"].values,
    alt_var    = df["alt"].values,
    choice_id  = df["custom_id"].values,
    base_alt   = "SM",
    models     = ["multinomial"],
    p_val      = 0.05,
)
best = call_siman(params, init_sol=None, id_num=1)
```

### Mixed Logit (random parameters)

```python
params = Parameters(
    criterions   = [("bic", -1)],
    df           = df,
    varnames     = ["TIME", "COST", "HEADWAY"],
    asvarnames   = ["TIME", "COST", "HEADWAY"],
    isvarnames   = [],
    choice_set   = choice_set,
    choices      = df["CHOICE"].values,
    alt_var      = df["alt"].values,
    choice_id    = df["custom_id"].values,
    ind_id       = df["ID"].values,
    base_alt     = "SM",
    models       = ["mixed_logit"],
    allow_random = True,     # enable random parameters
    allow_bcvars = True,     # enable Box-Cox transformations
    n_draws      = 500,      # Halton draws for simulation
    p_val        = 0.05,
)
best = call_siman(params, init_sol=None, id_num=1)
```

### Random Regret Minimisation (RRM)

```python
params = Parameters(
    criterions = [("bic", -1)],
    df         = df,
    varnames   = ["TIME", "COST", "HEADWAY"],
    asvarnames = ["TIME", "COST", "HEADWAY"],
    isvarnames = [],
    choice_set = choice_set,
    choices    = df["CHOICE"].values,
    alt_var    = df["alt"].values,
    choice_id  = df["custom_id"].values,
    base_alt   = "SM",
    models     = ["random_regret"],
    p_val      = 0.05,
)
best = call_siman(params, init_sol=None, id_num=1)
```
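
For readers unfamiliar with regret-based models, the sketch below computes choice probabilities under the classical RRM formulation, where the regret of alternative i sums `ln(1 + exp(beta_m * (x_jm - x_im)))` over all competing alternatives j and attributes m. Whether SearchLibrium uses exactly this variant is an assumption; `rrm_probabilities` is a hypothetical helper, not a library function.

```python
import numpy as np

def rrm_probabilities(x, beta):
    # x: (n_alts, n_vars) attributes for a single choice situation.
    diff = x[None, :, :] - x[:, None, :]          # diff[i, j, m] = x[j, m] - x[i, m]
    pairwise = np.log1p(np.exp(beta * diff)).sum(axis=2)  # regret of i relative to j
    np.fill_diagonal(pairwise, 0.0)               # an alternative has no regret against itself
    regret = pairwise.sum(axis=1)
    weights = np.exp(-(regret - regret.min()))    # shift by the minimum for stability
    return weights / weights.sum()

# Illustrative situation: car, train, bus with attributes TIME and COST.
x = np.array([[35.0, 12.0], [60.0, 8.0], [55.0, 5.0]])
beta = np.array([-0.05, -0.10])
p_rrm = rrm_probabilities(x, beta)
```

With negative coefficients on both attributes, the bus is faster and cheaper than the train here, so it receives the higher probability of the two.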

### Mixed Random Regret (regret + heterogeneity)

```python
params = Parameters(
    criterions   = [("bic", -1)],
    df           = df,
    varnames     = ["TIME", "COST", "HEADWAY"],
    asvarnames   = ["TIME", "COST", "HEADWAY"],
    isvarnames   = [],
    choice_set   = choice_set,
    choices      = df["CHOICE"].values,
    alt_var      = df["alt"].values,
    choice_id    = df["custom_id"].values,
    ind_id       = df["ID"].values,
    base_alt     = "SM",
    models       = ["mixed_random_regret"],
    allow_random = True,
    n_draws      = 500,
    p_val        = 0.05,
)
best = call_siman(params, init_sol=None, id_num=1)
```

### Nested Logit

```python
nests   = {"PublicTransport": [0, 1], "Private": [2, 3]}
lambdas = {"PublicTransport": 0.8, "Private": 1.0}

params = Parameters(
    criterions = [("bic", -1)],
    df         = df,
    varnames   = ["TIME", "COST", "HEADWAY"],
    asvarnames = ["TIME", "COST", "HEADWAY"],
    choice_set = choice_set,
    choices    = df["CHOICE"].values,
    alt_var    = df["alt"].values,
    choice_id  = df["custom_id"].values,
    base_alt   = "SM",
    models     = ["nested_logit"],
    nests      = nests,
    lambdas    = lambdas,
    p_val      = 0.05,
)
best = call_siman(params, init_sol=None, id_num=1)
```

### Multi-objective search (BIC + MAE)

```python
params = Parameters(
    criterions   = [("bic", -1), ("mae", -1)],   # minimise both
    df           = df,
    df_test      = df_test,                        # test-set dataframe defined beforehand; required for MAE
    varnames     = varnames,
    asvarnames   = varnames,
    choice_set   = choice_set,
    choices      = df["CHOICE"].values,
    alt_var      = df["alt"].values,
    choice_id    = df["custom_id"].values,
    base_alt     = "SM",
    models       = ["multinomial", "mixed_logit"],
    allow_random = True,
)
best = call_siman(params, init_sol=None, id_num=1)
# Returns a Pareto-optimal solution; full archive is printed in the dashboard
```
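
The Pareto archive mentioned above keeps only non-dominated solutions. The following sketch shows the dominance test for two minimised objectives. The `(BIC, MAE)` pairs are made up for illustration, and `dominates` and `pareto_front` are hypothetical helper names, not library functions.

```python
def dominates(a, b):
    # a dominates b if it is no worse on every objective and strictly better
    # on at least one (all objectives are minimised here).
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    # Keep every point that no other point dominates.
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# Illustrative (BIC, MAE) pairs for four candidate models.
candidates = [(658.2, 0.184), (650.1, 0.201), (670.4, 0.179), (675.0, 0.210)]
front = pareto_front(candidates)
```

The last candidate is worse than the first on both objectives, so it is dropped. An objective declared with sign `+1` would be negated before applying the same test.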

---

## Key parameters

| Parameter | Type | Default | Description |
| --------- | ---- | ------- | ----------- |
| `criterions` | list of `(name, sign)` | required | Objectives: `"bic"`, `"aic"`, `"loglik"`, `"mae"`. Sign: `-1` = minimise, `+1` = maximise |
| `models` | list of str | all | Model classes to search over |
| `allow_random` | bool | `False` | Enable random parameters (required for mixed models) |
| `allow_bcvars` | bool | `False` | Enable Box-Cox variable transformations |
| `allow_corvars` | bool | `False` | Enable correlated random parameters |
| `p_val` | float | `0.05` | Significance threshold: variables with p > `p_val` are eliminated |
| `all_sig` | bool | `True` | Enforce all-significant via backward elimination at each evaluation |
| `n_draws` | int | `1000` | Halton draws for mixed model simulation |
| `maxiter` | int | `2000` | Maximum MLE iterations per model evaluation |
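
The `n_draws` parameter refers to Halton draws, a low-discrepancy sequence that covers the unit interval more evenly than pseudo-random numbers. The sketch below generates a one-dimensional Halton sequence with the standard radical-inverse construction. `halton` is a hypothetical helper; how the library generates, scrambles, or discards initial draws is not specified here.

```python
import numpy as np

def halton(n, base):
    # Radical inverse: write the index in the given base and mirror its digits
    # around the decimal point.
    seq = np.empty(n)
    for i in range(n):
        fraction, value, k = 1.0, 0.0, i + 1
        while k > 0:
            fraction /= base
            value += fraction * (k % base)
            k //= base
        seq[i] = value
    return seq

draws_base2 = halton(8, base=2)   # one dimension, prime base 2
draws_base3 = halton(8, base=3)   # a second dimension would use the next prime
```

Each random parameter typically gets its own prime base, which keeps the dimensions from being correlated.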

### Random parameter distributions

| Code | Distribution |
| ---- | ------------ |
| `"n"` | Normal |
| `"ln"` | Log-normal |
| `"t"` | Triangular |
| `"tn"` | Truncated normal |
| `"u"` | Uniform |
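
As a rough illustration of what these codes mean, the sketch below maps uniform draws to four of the distributions using a location `mean` and a scale `spread`. The parameterisation is an assumption and may differ from the library's. `transform_draws` is a hypothetical helper, and `"tn"` is left out because the truncation convention is not stated here.

```python
import numpy as np
from statistics import NormalDist

def transform_draws(u, code, mean, spread):
    # u: uniform draws strictly inside (0, 1), e.g. from a Halton sequence.
    u = np.asarray(u, dtype=float)
    if code in ("n", "ln"):
        z = np.array([NormalDist().inv_cdf(v) for v in u])  # standard normal draws
        value = mean + spread * z
        return np.exp(value) if code == "ln" else value
    if code == "u":
        return mean + spread * (2.0 * u - 1.0)               # uniform on [mean - spread, mean + spread]
    if code == "t":
        # Symmetric triangular on [-1, 1] via its inverse CDF.
        tri = np.where(u < 0.5, np.sqrt(2.0 * u) - 1.0, 1.0 - np.sqrt(2.0 * (1.0 - u)))
        return mean + spread * tri
    raise ValueError(f"unsupported distribution code: {code}")

u = np.array([0.25, 0.5, 0.75])
normal_draws = transform_draws(u, "n", mean=-0.05, spread=0.02)
lognormal_draws = transform_draws(u, "ln", mean=0.0, spread=1.0)
```

A log-normal coefficient is always positive, which is why it is often paired with a sign-flipped attribute such as cost.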

### SA control parameters

Pass `ctrl=(tI, tF, max_temp_steps, max_iter)` to `call_siman`:

```python
best = call_siman(params, ctrl=(500, 0.001, 100, 20), id_num=1)
```

| Parameter | Description |
| --------- | ----------- |
| `tI` | Initial temperature — higher = more exploration early on |
| `tF` | Final temperature — lower = more exploitation at the end |
| `max_temp_steps` | Number of cooling steps |
| `max_iter` | Iterations evaluated at each temperature step |
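
To see how `tI` and `tF` shape the search, the sketch below builds a temperature schedule and evaluates the Metropolis acceptance probability for a worsening move. A geometric schedule is assumed, since the library's exact cooling rule is not documented here. `geometric_schedule` and `acceptance_probability` are hypothetical helpers.

```python
import math
import numpy as np

def geometric_schedule(t_init, t_final, n_steps):
    # Temperatures spaced evenly on a log scale from t_init down to t_final.
    return np.geomspace(t_init, t_final, num=n_steps)

def acceptance_probability(delta, temp):
    # delta > 0 means the candidate is worse (e.g. a higher BIC).
    return 1.0 if delta <= 0 else math.exp(-delta / temp)

temps = geometric_schedule(500, 0.001, 100)       # matches ctrl=(500, 0.001, 100, 20)
p_early = acceptance_probability(10.0, temps[0])  # BIC worse by 10 at the first step
p_late = acceptance_probability(10.0, temps[-1])  # same move at the last step
```

Early on, a move that worsens BIC by 10 is accepted almost always. By the final steps it is essentially never accepted, so the search behaves greedily.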

---

## Standalone model fitting (no search)

```python
from SearchLibrium import MultinomialLogit, MixedLogit, RandomRegret, MixedRandomRegret

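# X, y, alts, ids and panels are not defined in this snippet; they are
# assumed to be arrays built from your long-format data.

# MNL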
mnl = MultinomialLogit()
mnl.setup(X, y, varnames=varnames, alts=alts, ids=ids)
mnl.fit()
mnl.summarise()

# Mixed Logit
mxl = MixedLogit()
mxl.setup(X, y, varnames=varnames, alts=alts, ids=ids, panels=panels,
          randvars={"TIME": "n", "COST": "ln"}, n_draws=500)
mxl.fit()
mxl.summarise()

# RRM
rrm = RandomRegret(df=df, short=False)
rrm.fit()
rrm.report()

# Mixed RRM
mrrm = MixedRandomRegret(df=df)
mrrm.fit()
```

---

## Interpreting the dashboard

After every `call_siman` run a dashboard is printed:

```text
╔══════════════════════════════════════════════════════╗
║           SEARCHLIBRIUM — RUN DASHBOARD              ║
╠══════════════════════════════════════════════════════╣
║  Model type   : mixed_logit                          ║
║  Variables    : TIME, COST, HEADWAY                  ║
║  Random params: TIME~n, COST~ln                      ║
╠══════════════════════════════════════════════════════╣
║  Log-likelihood : -312.45                            ║
║  AIC            :  634.90                            ║
║  BIC            :  658.22   ◄ best                   ║
║  MAE            :  0.1843                            ║
╠══════════════════════════════════════════════════════╣
║  Evaluations : 247   Converged : 198   Accepted : 43 ║
╚══════════════════════════════════════════════════════╝
```

- **Lower BIC / AIC** = better fit-complexity tradeoff
- All retained variables are **statistically significant** (p < `p_val`)
- **Random parameters** indicate heterogeneity in that attribute's taste
- **RRM** models suit contexts where regret-avoidance drives choice behaviour
- For multi-objective runs the full Pareto archive is shown with one row per non-dominated solution
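
The AIC and BIC figures in the sample dashboard follow the standard definitions `AIC = 2k - 2*LL` and `BIC = k*ln(n) - 2*LL`. The sketch below reproduces them. The values `k = 5` parameters and `n = 784` observations are inferred from the displayed numbers, not stated anywhere in this guide, so treat them as assumptions.

```python
import math

def aic(loglik, k):
    # Akaike information criterion: penalty of 2 per estimated parameter.
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    # Bayesian information criterion: penalty of ln(n) per estimated parameter.
    return k * math.log(n) - 2 * loglik

loglik = -312.45          # from the sample dashboard
k, n = 5, 784             # assumed: inferred from the dashboard's AIC and BIC

aic_value = aic(loglik, k)
bic_value = bic(loglik, k, n)
```

Because `ln(n)` exceeds 2 once `n` is above about 7, BIC penalises extra parameters more heavily than AIC on any realistic sample.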

---

## Bundled datasets

```python
import SearchLibrium as sl
sl.main.preview_dataset()   # prints head of each dataset
```

| Name | Description |
| ---- | ----------- |
| `electricity` | Stated-preference electricity plan choice |
| `travel_mode` | Mode choice: air / train / bus / car |
| `swiss_metro` | Swiss Metro SP study (SM / train / car) |

---

## CLI

```bash
python -m SearchLibrium --info              # print package guide
python -m SearchLibrium --preview_datasets  # preview bundled datasets
python -m SearchLibrium --test_search       # run MNL/MXL search on travel_mode
python -m SearchLibrium --test_search_nest  # run nested logit search
```

---

## Search algorithms

Simulated Annealing (SA) and Harmony Search (HS) share a **consistent interface** through `call_search`:

```python
from SearchLibrium import call_search, estimate_ctrl

# Auto-estimate hyperparameters from problem size (recommended)
best = call_search(params)                            # SA by default
best = call_search(params, algorithm='hs')            # Harmony Search

# Manual hyperparameters
best = call_search(params, ctrl=(1000, 0.001, 100, 20))           # SA
best = call_search(params, algorithm='hs',
                   ctrl=(20, 500, 0.9, 0.6, 0.85, 0.3))          # HS

# Inspect auto-estimated ctrl before running
ctrl = estimate_ctrl(params, algorithm='sa')
print(ctrl)
```

### Simulated Annealing (`call_siman` / `algorithm='sa'`)

| Parameter | Meaning |
| --------- | ------- |
| `tI` | Initial temperature — higher → more exploration |
| `tF` | Final temperature — lower → more exploitation |
| `max_temp_steps` | Number of cooling steps |
| `max_iter` | Evaluations per cooling step |

```python
best = call_siman(params, ctrl=(1000, 0.001, 100, 20), id_num=1)
```

### Harmony Search (`call_harmony` / `algorithm='hs'`)

| Parameter | Meaning |
| --------- | ------- |
| `max_mem` | Harmony memory size (population) |
| `maxiter` | Improvisation iterations |
| `max_harm` | Max harmony consideration rate |
| `min_harm` | Min harmony consideration rate |
| `max_pitch` | Max pitch adjustment rate |
| `min_pitch` | Min pitch adjustment rate |

```python
best = call_harmony(params, ctrl=(20, 400, 0.9, 0.6, 0.85, 0.3), id_num=1)
```
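
The sketch below illustrates how the six HS parameters interact in a generic Harmony Search over include/exclude flags. It is not the library's implementation: `rate_at`, `improvise`, and `toy_cost` are hypothetical, and the linear ramp from the minimum to the maximum rate over the iterations is an assumption about how the min/max pairs are used.

```python
import random

def rate_at(iteration, n_iter, low, high):
    # Assumed schedule: linear ramp from the min rate to the max rate.
    if n_iter <= 1:
        return high
    return low + (high - low) * iteration / (n_iter - 1)

def improvise(memory, hmcr, par, rng):
    new = []
    for i in range(len(memory[0])):
        if rng.random() < hmcr:
            value = rng.choice(memory)[i]   # harmony memory consideration
            if rng.random() < par:
                value = 1 - value           # pitch adjustment: flip the flag
        else:
            value = rng.randrange(2)        # random selection
        new.append(value)
    return tuple(new)

def toy_cost(spec):
    # Hypothetical stand-in for a fitted model's BIC: lower is better.
    return sum(a != b for a, b in zip(spec, (1, 0, 1, 1)))

rng = random.Random(0)
memory = [tuple(rng.randrange(2) for _ in range(4)) for _ in range(5)]  # max_mem = 5
n_iter = 50                                                             # maxiter
for it in range(n_iter):
    hmcr = rate_at(it, n_iter, 0.6, 0.9)    # min_harm, max_harm
    par = rate_at(it, n_iter, 0.3, 0.85)    # min_pitch, max_pitch
    candidate = improvise(memory, hmcr, par, rng)
    worst = max(memory, key=toy_cost)
    if toy_cost(candidate) < toy_cost(worst):
        memory[memory.index(worst)] = candidate   # replace the worst harmony

best_harmony = min(memory, key=toy_cost)
```

Unlike SA, which follows a single trajectory, HS keeps a small population and builds each candidate by mixing components of remembered solutions.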

### Auto hyperparameter estimation

If `ctrl` is omitted, the library estimates appropriate defaults from the
problem complexity (`n_vars × n_alts × n_models`, doubled for random params):

```python
from SearchLibrium import estimate_ctrl
ctrl_sa = estimate_ctrl(params, algorithm='sa')
ctrl_hs = estimate_ctrl(params, algorithm='hs')
print('SA ctrl:', ctrl_sa)
print('HS ctrl:', ctrl_hs)
```

Complexity buckets:

| Complexity | SA tI | SA steps | SA iter/step | HS mem | HS iters |
| ---------- | ----- | -------- | ------------ | ------ | -------- |
| < 50 | 500 | 50 | 10 | 10 | 100 |
| 50–200 | 1 000 | 100 | 15 | 15 | 300 |
| 200–600 | 2 000 | 150 | 20 | 20 | 500 |
| > 600 | 5 000 | 250 | 30 | 25 | 800 |
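
The table can be read as a simple lookup. The sketch below computes the complexity score described above and returns the matching row. `problem_complexity` and `lookup_bucket` are hypothetical names, not the `estimate_ctrl` implementation, and the handling of the boundary values (50 and 200 go to the higher bucket, 600 stays in the third) is an assumption because the table leaves it open.

```python
def problem_complexity(n_vars, n_alts, n_models, random_params=False):
    # n_vars x n_alts x n_models, doubled when random parameters are allowed.
    base = n_vars * n_alts * n_models
    return 2 * base if random_params else base

def lookup_bucket(complexity):
    # Values copied from the complexity table; boundary handling is assumed.
    if complexity < 50:
        return {"sa_tI": 500, "sa_steps": 50, "sa_iter": 10, "hs_mem": 10, "hs_iters": 100}
    if complexity < 200:
        return {"sa_tI": 1000, "sa_steps": 100, "sa_iter": 15, "hs_mem": 15, "hs_iters": 300}
    if complexity <= 600:
        return {"sa_tI": 2000, "sa_steps": 150, "sa_iter": 20, "hs_mem": 20, "hs_iters": 500}
    return {"sa_tI": 5000, "sa_steps": 250, "sa_iter": 30, "hs_mem": 25, "hs_iters": 800}

# Quick-start problem: 4 variables, 3 alternatives, 2 model classes, random params on.
score = problem_complexity(4, 3, 2, random_params=True)
settings = lookup_bucket(score)
```

For the actual values used in a run, rely on `estimate_ctrl(params, algorithm=...)` as shown above.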



---

## License

MIT — see [LICENSE](LICENSE) for details.

## Citation

If you use SearchLibrium in academic work, please cite the repository:

```text
Ahern, Z. (2025). SearchLibrium: Automated discrete choice model search.
https://github.com/zahern/HypothesisX
```
