Metadata-Version: 2.4
Name: pycasher
Version: 0.5.7
Summary: Cache function results and side effects (stdout, stderr, file writes) with automatic file I/O discovery via strace or audit hooks
License-Expression: MIT
License-File: LICENSE
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.12
Requires-Dist: loguru
Requires-Dist: lz4
Provides-Extra: pandas
Requires-Dist: pandas; extra == 'pandas'
Requires-Dist: pyarrow; extra == 'pandas'
Provides-Extra: polars
Requires-Dist: polars; extra == 'polars'
Description-Content-Type: text/markdown

# pycasher

Cache Python function results **and their side effects** — stdout, stderr, and filesystem writes — with automatic invalidation.

```bash
uv add pycasher
```

If you want the `casher` CLI outside a project environment, install it as a
tool instead:

```bash
uv tool install pycasher
```

## What makes it different

Most caching libraries cache return values. casher also captures and replays:

- **stdout/stderr** printed during execution
- **Files written** by the function (restored from cache on hit)
- **Files read** by the function (used as cache keys — change an input file, cache auto-invalidates)

No manual file declarations needed. casher discovers file I/O automatically via `strace` (subprocess mode) or Python audit hooks (in-process mode).

## Usage

```python
from casher import cached, expand_input_dir

@cached
def train(data_path: str, output_path: str, lr: float = 0.01) -> dict:
    df = read_csv(data_path)
    model = fit(df, lr=lr)
    save(model, output_path)
    return {"accuracy": model.score}

# First call — runs function, traces file I/O, caches everything
result = train("train.csv", "model.pkl")

# Second call — instant replay from cache (model.pkl restored too)
result = train("train.csv", "model.pkl")

# Change train.csv — casher detects it, re-runs automatically
```

For directory-shaped inputs, keep the argument semantics explicit instead of
making every directory `Path` recursive by magic:

```python
from pathlib import Path

from casher import cached, expand_input_dir


@cached(input_files=lambda data_dir: expand_input_dir(data_dir, "*.csv"))
def build_dataset(data_dir: Path) -> int:
    return len(list(data_dir.glob("*.csv")))
```

`Path` arguments that point to files are hashed by file content for the
function-argument portion of the cache key. Auto-discovered input files remain
path-sensitive and content-sensitive.

Cache any shell command without code changes:

```bash
casher -- python train.py --data train.csv
```

## Key features

- **Automatic file tracking**: strace (kernel-level, catches C extensions) or audit hooks (zero overhead, Python-only)
- **Dependency invalidation**: changes to imported `.py` files invalidate the cache
- **File-hash memoization**: unchanged files reuse cached content hashes from a small SQLite metadata store
- **LRU eviction**: configurable via `max_cache_bytes` or `CASHER_MAX_CACHE_BYTES` env var (default 32 GB)
- **DataFrame support**: polars and pandas DataFrames serialized via Arrow IPC
- **Environment-aware**: include env vars in cache key with `env_vars=["MY_VAR"]`
- **Structured logging**: loguru INFO for config changes, enablement, hit/miss, mode, eviction
- **Explicit directory expansion helper**: `expand_input_dir()` for stable `input_files` lists

## Configuration

| Env var                  | Default               | Description                                                        |
| ------------------------ | --------------------- | ------------------------------------------------------------------ |
| `CASHER_CACHE_DIR`       | _unset_               | Cache storage directory. Caching stays disabled until this is set. |
| `CASHER_MAX_CACHE_BYTES` | `34359738368` (32 GB) | Max cache size before LRU eviction                                 |

Or set programmatically (takes priority over env vars):

```python
from casher import configure, get_config

configure(cache_dir="/data/my_cache", max_cache_bytes=10 * 1024**3)
print(get_config())  # effective config
```

If no cache directory is configured via `CASHER_CACHE_DIR`, `configure(cache_dir=...)`,
`@cached(cache_dir=...)`, or `casher --cache-dir ...`, casher runs transparently without
caching and emits a one-time warning.

## Platform support

Full caching on **Linux** only (requires strace for subprocess mode, fcntl for locking). On macOS and Windows the decorator is a transparent pass-through — functions execute normally, caching is skipped with a one-time warning.

## Documentation

See [documentation/](documentation/) for detailed docs:

- [Introduction](documentation/00-Introduction.md) — architecture, limitations, in-process vs subprocess comparison
- [Quick Start](documentation/10-Quick-Start.md) — installation, decorator options, CLI usage
- [API Reference](documentation/20-API-Reference.md) — full API surface

## License

MIT
