Metadata-Version: 2.4
Name: sm-rs
Version: 0.1.0
Summary: SM-RS benchmark: data loaders and canonical task evaluators for the single- and multi-objective recommendations dataset.
Project-URL: Homepage, https://github.com/pdokoupil/SM-RS
Project-URL: Repository, https://github.com/pdokoupil/SM-RS
Project-URL: Dataset, https://huggingface.co/datasets/pdokoupil/SM-RS
Project-URL: Paper (TORS 2026), https://doi.org/10.1145/3754459
Project-URL: Paper (SIGIR 2024), https://doi.org/10.1145/3626772.3657863
Author: Patrik Dokoupil
License: MIT
License-File: LICENSE
Keywords: benchmark,beyond-accuracy,dataset,impressions,multi-objective,propensity,recommender-systems
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Requires-Dist: huggingface-hub>=0.20
Requires-Dist: numpy>=1.21
Requires-Dist: pandas>=1.3
Requires-Dist: pyarrow>=12
Requires-Dist: scikit-learn>=1.0
Provides-Extra: dev
Requires-Dist: pytest>=7; extra == 'dev'
Provides-Extra: lstm
Requires-Dist: tensorflow>=2.10; extra == 'lstm'
Description-Content-Type: text/markdown

<div align="center">

# SM-RS

**The single- and multi-objective recommendations benchmark** — self-declared user
propensities (relevance · diversity · novelty · exploration) linked to contextual
impressions, item selections, and perceived quality.

<!-- Badges light up once the repo is pushed and the package is published. -->
[![PyPI](https://img.shields.io/pypi/v/sm-rs.svg)](https://pypi.org/project/sm-rs/)
[![CI](https://github.com/pdokoupil/SM-RS/actions/workflows/ci.yml/badge.svg)](https://github.com/pdokoupil/SM-RS/actions/workflows/ci.yml)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
[![Dataset](https://img.shields.io/badge/🤗-dataset-yellow.svg)](https://huggingface.co/datasets/pdokoupil/SM-RS)

</div>

SM-RS is, to our knowledge, the only public recommender-systems dataset linking
users' **self-declared propensities** toward beyond-accuracy objectives with
**contextual impressions**, item selections, and **explicit perceived-quality**
judgments. This repository is the **benchmark code**: data loaders and the
*canonical* evaluator for each task, so everyone reports comparable numbers. The
**data** lives on Hugging Face; the **leaderboard** lives in the dataset card.

> **Dataset & leaderboard:** 🤗 [pdokoupil/SM-RS](https://huggingface.co/datasets/pdokoupil/SM-RS)
>
> **Cite BOTH** the SM-RS 2.0 (TORS'26) and SM-RS (SIGIR'24) papers; see [Citing](#citing).

## Install

```bash
pip install sm-rs                 # core (numpy / pandas / scikit-learn)
pip install "sm-rs[lstm]"         # + TensorFlow, for the optional LSTM baseline
```

## Quick start (Task 1: propensity estimation)

```python
import numpy as np
from smrs.tasks import task1_propensity as t1

# reproducible 80/20 split (seed 2024), reading a local SM-RS copy for now
X_train, X_test, y_train, y_test = t1.split(data_dir="path/to/sm-rs")

# ... train your estimator, produce an (N, 4) array of [rel, div, nov, exp] ...
predictions = np.full((len(y_test), 4), 0.25)        # placeholder

print(t1.evaluate(predictions, y_test))   # {'MAE': ..., 'MSE': ..., 'KLDiv': ...}
```
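
For intuition about what the three reported numbers measure, here is a minimal sketch of MAE, MSE, and KL divergence over `(N, 4)` propensity arrays. It is not the package's implementation: the function name `propensity_metrics`, the epsilon smoothing, the row renormalization, the KL direction (target relative to prediction), and averaging KL over users are all assumptions. Use `t1.evaluate` for any number you report.

```python
import numpy as np

def propensity_metrics(pred, target, eps=1e-12):
    """Illustrative MAE / MSE / mean KL(target || pred) for (N, 4) arrays."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    if pred.shape != target.shape:
        raise ValueError(f"shape mismatch: {pred.shape} vs {target.shape}")

    err = pred - target

    # Assumption: rows are clipped away from zero and renormalized to sum to 1
    # so that the logarithm in the KL term is always defined.
    p = np.clip(target, eps, None)
    p /= p.sum(axis=1, keepdims=True)
    q = np.clip(pred, eps, None)
    q /= q.sum(axis=1, keepdims=True)
    kl_per_user = np.sum(p * np.log(p / q), axis=1)

    return {
        "MAE": float(np.abs(err).mean()),
        "MSE": float((err ** 2).mean()),
        "KLDiv": float(kl_per_user.mean()),
    }

# Toy data: two users, columns are [rel, div, nov, exp].
target = np.array([[0.4, 0.3, 0.2, 0.1],
                   [0.25, 0.25, 0.25, 0.25]])
uniform = np.full_like(target, 0.25)   # same placeholder as in the quick start

scores = propensity_metrics(uniform, target)
perfect = propensity_metrics(target, target)
print(scores)
```

The uniform placeholder scores an MAE of 0.05 on this toy target, while a perfect prediction scores zero on all three metrics.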

## The six tasks

All six draw on the **same** dataset; they differ in what you predict and how it's
scored. The canonical evaluator for each lives in `smrs.tasks.*`.

| # | Task | You produce | Metric(s) | Evaluator |
|---|------|-------------|-----------|-----------|
| 1 | **Propensity estimation** | a 4-vector propensity per user | MAE · MSE · KL | ✅ `task1_propensity` |
| 2 | **Results proportionality** | a top-k list matching target propensities | MAE · KL · wSUM · Pearson ρ | ✅ `task2_proportionality`¹ |
| 3 | **Selections-aware reranking** | a reranked impression list | nDCG@10 · Precision@5 | ✅ `task3_reranking` |
| 4 | **Diversity-metric definition** | per-list diversity values | MAE · MSE · KL | ✅ `task4_diversity` |
| 5 | **Perceived quality** (5.1 rel / 5.2 div / 5.3 nov / 5.4 ser) | per-objective perception | MAE · MSE · Kendall τ | ✅ `task5_perceived` |
| 6 | **Satisfaction** (6.1 / 6.2) | overall satisfaction | MAE · MSE · Kendall τ | ✅ `task6_satisfaction` |

¹ Task 2's *metric layer* is implemented. Converting a top-k list into achieved
objective proportions requires the derived matrices (see Data); that step will
ship together with the source→rating-matrix builder.
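
As a rough illustration of the Task 3 metrics in the table above, the following sketch computes Precision@5 and nDCG@10 for one reranked impression. It assumes binary relevance (an item counts as relevant if the user selected it) and that the reranked list is a permutation of the impression; both the function names and these conventions are hypothetical and may differ from `task3_reranking`.

```python
import math

def precision_at_k(ranked_items, selected, k=5):
    """Fraction of the top-k slots filled by selected items (divides by k)."""
    return sum(item in selected for item in ranked_items[:k]) / k

def ndcg_at_k(ranked_items, selected, k=10):
    """Binary-relevance nDCG: DCG of the list divided by the best achievable DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 2)            # rank 0 gets a discount of log2(2) = 1
        for rank, item in enumerate(ranked_items[:k])
        if item in selected
    )
    ideal_hits = min(len(selected), k)       # ideal list puts every selection first
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Toy impression of six items; the user selected "b" and "e".
selected = {"b", "e"}
reranked = ["a", "b", "c", "d", "e", "f"]
ideal = ["b", "e", "a", "c", "d", "f"]

p5 = precision_at_k(reranked, selected)
ndcg = ndcg_at_k(reranked, selected)
ndcg_ideal = ndcg_at_k(ideal, selected)
print(p5, round(ndcg, 3), ndcg_ideal)
```

Moving selected items toward the top raises nDCG toward 1.0, while Precision@5 only changes when a selected item crosses the top-5 boundary.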

## Data

Two layers:

1. **Core tables** (the collected study data, **CC-BY**, hosted on Hugging Face):
   `behaviors`, `propensities`, `objective_perceptions`, `criteria_values`,
   `comparative_diversity`, `users`, `movies`, `books`. Items are referenced by ID.

   **Auto-downloaded** from the Hub and cached — no manual download:

   ```python
   from smrs import data
   df = data.load("propensities")                 # downloads from HF, cached
   # offline / local copy (e.g. OSF download):
   df = data.load("propensities", data_dir="path/to/sm-rs")   # or set $SMRS_DATA_DIR
   ```

   Users of the 🤗 `datasets` library can equivalently do
   `load_dataset("pdokoupil/SM-RS", "behaviors")`.

2. **Derived matrices**: *recomputed locally, not downloaded.* The list-scoring
   tasks (2, 3) need per-item and per-pair artifacts: relevance from an item-item
   model, intra-list diversity from a distance matrix, and novelty from item
   popularity. Rather than ship multi-GB blobs or redistribute the third-party
   catalogs they come from, the benchmark **recomputes** them from a rating
   matrix that you build from your own download of the public source datasets
   (movies: **MovieLens 25M**, the "Latest" snapshot at collection time, plus
   **MovieLens Tag Genome 2021**; books: **goodbooks-10k**):

   ```python
   from smrs import derived
   art = derived.build_artifacts(rating_matrix_movies)   # {item_item, distance_matrix, mean_popularities}
   ```

   `derived` provides the deterministic pieces: `popularity`, `cosine_distance`
   (1 − cosine over item rating-vectors), and `ease_item_item` (EASE^R closed form,
   used as the relevance model). This keeps the benchmark lightweight and
   license-clean. The **bit-exact original artifacts are archived on OSF**
   ([v2](https://osf.io/wsakx)) for strict reproduction.

   Fetch the source datasets with the bundled fetcher, which downloads them from
   the official hosts (GroupLens and the goodbooks repo) under their licenses.
   Alternatively, place the files in the destination directory manually:

   ```bash
   smrs-fetch --list                      # show sources + licenses, no download
   smrs-fetch --dest ./sm-rs-sources      # download MovieLens 25M, Tag Genome 2021, goodbooks-10k
   ```
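
The deterministic pieces named above (popularity, cosine distance, and the EASE^R closed form) can be sketched on a toy rating matrix. This is a standalone illustration, not the `smrs.derived` implementation: the function names, the popularity definition (share of users with positive feedback), the choice to feed raw ratings to the distance and binarized ratings to EASE, and `lam=1.0` are all assumptions. The pinned λ value is not stated in this README.

```python
import numpy as np

def item_popularity(X):
    """Share of users with a nonzero entry per item (assumed definition)."""
    return (X > 0).mean(axis=0)

def item_cosine_distance(R):
    """1 - cosine similarity between the item columns of R."""
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0                  # guard items that nobody rated
    Rn = R / norms
    return 1.0 - Rn.T @ Rn

def ease_weights(X, lam=1.0):
    """EASE^R closed form: B = P / -diag(P), zero diagonal, P = (X^T X + lam*I)^-1."""
    G = X.T @ X + lam * np.eye(X.shape[1])
    P = np.linalg.inv(G)
    B = P / (-np.diag(P))                    # column j is divided by -P[j, j]
    np.fill_diagonal(B, 0.0)                 # an item must not predict itself
    return B

# Toy user x item rating matrix (4 users, 3 items); 0 means "not rated".
ratings = np.array([[5, 4, 0],
                    [4, 0, 1],
                    [0, 3, 5],
                    [2, 5, 4]], dtype=float)
X = (ratings >= 3).astype(float)             # positive feedback = rating >= 3

pop = item_popularity(X)
dist = item_cosine_distance(ratings)
B = ease_weights(X, lam=1.0)                 # lam is a placeholder value
scores = X @ B                               # per-user relevance scores for every item
print(pop, dist.shape, scores.shape)
```

On real catalogs the inverse in `ease_weights` is the expensive step, since it is cubic in the number of items.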

> **Why recompute?** MovieLens may not be redistributed, and a 5 GB download hurts
> adoption. You obtain MovieLens/goodbooks under their own licenses; we ship only
> the study data, the id-maps (`movies.json`: movieId→imdbId; `books.json`:
> book_index→goodreads_id), and the recompute code. The canonical scorer is pinned
> (sources above; **positive feedback = rating ≥ 3**; EASE λ) so results stay
> comparable. For strict reproduction of the paper's numbers, use the OSF artifacts.

## Reproduction check

`examples/reproduce_perceived.py` reproduces the paper's **Linear Regression**
baseline for Tasks 5 & 6 (the one baseline needing no derived/source data), scored
with this package's evaluators. MAE matches the paper to three decimal places on
every subtask:

| subtask | MAE (ours / paper) | MSE (ours / paper) | Kendall τ (ours / paper) |
|---|---|---|---|
| 5.1 relevance | 0.235 / 0.235 | 0.086 / 0.085 | 0.076 / 0.080 |
| 5.2 diversity | 0.222 / 0.222 | 0.080 / 0.061 | 0.197 / 0.196 |
| 5.3 novelty | 0.259 / 0.259 | 0.104 / 0.104 | 0.143 / 0.143 |
| 5.4 serendipity | 0.270 / 0.270 | 0.104 / 0.103 | 0.036 / 0.039 |
| 6 satisfaction | 0.255 / 0.255 | 0.102 / 0.102 | 0.039 / 0.045 |

```bash
SMRS_DATA_DIR=/path/to/sm-rs python examples/reproduce_perceived.py
```
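
Kendall τ in the table above measures rank agreement between predicted and reported scores. The sketch below implements the tau-b variant, which corrects for ties, in plain Python. Which variant the package's evaluators use is not stated here, so treat tau-b, the function name `kendall_tau_b`, and the toy scores as assumptions.

```python
import math
from itertools import combinations

def kendall_tau_b(x, y):
    """Kendall tau-b: (C - D) / sqrt((C + D + Tx) * (C + D + Ty))."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue                         # tied in both: ignored entirely
        elif dx == 0:
            ties_x += 1                      # tied only in x
        elif dy == 0:
            ties_y += 1                      # tied only in y
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    denom = math.sqrt(
        (concordant + discordant + ties_x) * (concordant + discordant + ties_y)
    )
    return (concordant - discordant) / denom if denom else float("nan")

# Toy example: model outputs vs. reported scores for four lists.
predicted = [0.2, 0.4, 0.4, 0.9]
reported = [0.25, 0.5, 0.75, 0.75]

tau = kendall_tau_b(predicted, reported)
print(tau)  # 0.8
```

Here four pairs are concordant, none are discordant, and one pair is tied on each side, giving 4 / 5 = 0.8. The pairwise loop is quadratic, which is fine for illustration but slow on large samples.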

## Submitting to the leaderboard

Submission is self-service (there is no submission server): run the canonical
`evaluate()` for a task, then open a PR adding your row to the leaderboard in the
[dataset card](https://huggingface.co/datasets/pdokoupil/SM-RS), with a link to
the code needed to reproduce it. The shipped baselines are the rows to beat.

## Citing

Please cite **both** papers (GitHub's "Cite this repository" reads
[`CITATION.cff`](CITATION.cff)):

```bibtex
@article{dokoupil2026smrs2,
  author  = {Dokoupil, Patrik and Peska, Ladislav},
  title   = {SM-RS 2.0: User-perceived Qualities of Single- and Multi-Objective Recommender Systems},
  journal = {ACM Transactions on Recommender Systems},
  volume  = {4}, number = {3}, year = {2026},
  doi     = {10.1145/3754459}
}
@inproceedings{dokoupil2024smrs,
  author    = {Dokoupil, Patrik and Peska, Ladislav and Boratto, Ludovico},
  title     = {SM-RS: Single- and Multi-Objective Recommendations with Contextual Impressions and Beyond-Accuracy Propensity Scores},
  booktitle = {Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  series    = {SIGIR '24}, pages = {988--995}, year = {2024},
  doi       = {10.1145/3626772.3657863}
}
```

## License

Code: MIT (see [`LICENSE`](LICENSE)). Data: CC-BY-4.0 (see the dataset card).
