Metadata-Version: 2.4
Name: pycasher
Version: 0.3.0
Summary: Cache function results and side effects (stdout, stderr, file writes) with automatic file I/O discovery via strace or audit hooks
License-Expression: MIT
License-File: LICENSE
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.12
Requires-Dist: loguru
Provides-Extra: pandas
Requires-Dist: pandas; extra == 'pandas'
Requires-Dist: pyarrow; extra == 'pandas'
Provides-Extra: polars
Requires-Dist: polars; extra == 'polars'
Description-Content-Type: text/markdown

# pycasher

Cache Python function results **and their side effects** — stdout, stderr, and filesystem writes — with automatic invalidation.

```bash
pip install pycasher
```

## What makes it different

Most caching libraries cache return values. casher also captures and replays:

- **stdout/stderr** printed during execution
- **Files written** by the function (restored from cache on hit)
- **Files read** by the function (used as cache keys — change an input file, cache auto-invalidates)

No manual file declarations needed. casher discovers file I/O automatically via `strace` (subprocess mode) or Python audit hooks (in-process mode).

## Usage

```python
from casher import cached

@cached
def train(data_path: str, output_path: str, lr: float = 0.01) -> dict:
    df = read_csv(data_path)
    model = fit(df, lr=lr)
    save(model, output_path)
    return {"accuracy": model.score}

# First call — runs function, traces file I/O, caches everything
result = train("train.csv", "model.pkl")

# Second call — instant replay from cache (model.pkl restored too)
result = train("train.csv", "model.pkl")

# Change train.csv — casher detects it, re-runs automatically
```

Cache any shell command without code changes:

```bash
acache -- python train.py --data train.csv
```

## Key features

- **Automatic file tracking**: strace (kernel-level, catches C extensions) or audit hooks (zero overhead, Python-only)
- **Dependency invalidation**: changes to imported `.py` files invalidate the cache
- **LRU eviction**: configurable via `max_cache_bytes` or `CASHER_MAX_CACHE_BYTES` env var (default 32 GB)
- **DataFrame support**: polars and pandas DataFrames serialized via Arrow IPC
- **Environment-aware**: include env vars in cache key with `env_vars=["MY_VAR"]`
- **Structured logging**: loguru INFO for every call — cache dir, hit/miss, mode, eviction

## Configuration

| Env var                  | Default               | Description                        |
| ------------------------ | --------------------- | ---------------------------------- |
| `CASHER_CACHE_DIR`       | `~/.cache/casher`     | Cache storage directory            |
| `CASHER_MAX_CACHE_BYTES` | `34359738368` (32 GB) | Max cache size before LRU eviction |

Or set programmatically (takes priority over env vars):

```python
from casher import configure, get_config

configure(cache_dir="/data/my_cache", max_cache_bytes=10 * 1024**3)
print(get_config())  # effective config
```

## Platform support

Full caching on **Linux** only (requires strace for subprocess mode, fcntl for locking). On macOS and Windows the decorator is a transparent pass-through — functions execute normally, caching is skipped with a one-time warning.

## Documentation

See [documentation/](documentation/) for detailed docs:

- [Introduction](documentation/00-Introduction.md) — architecture, limitations, in-process vs subprocess comparison
- [Quick Start](documentation/10-Quick-Start.md) — installation, decorator options, CLI usage
- [API Reference](documentation/20-API-Reference.md) — full API surface

## License

MIT
