Metadata-Version: 2.4
Name: pycache_skip
Version: 0.1.1
Summary: Skip pipeline steps when inputs are unchanged — content-aware, with module dependency tracking
License-Expression: MIT
License-File: LICENSE
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.12
Requires-Dist: loguru
Requires-Dist: xxhash
Description-Content-Type: text/markdown

# pycache_skip

Skip pipeline steps when their inputs have not changed.

```bash
uv add pycache_skip
```

## What it does

`cache_skip` wraps a pipeline step function and skips re-execution when all
inputs are unchanged. It stores a compact state file (`.input_state.json`)
alongside each output directory. On subsequent calls it compares the current
inputs against the stored state and only reruns the function when something
actually changed.
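
The skip decision can be pictured with a minimal sketch. It is not the library's implementation: `skip_if_unchanged` is a hypothetical stand-in for `cache_skip`, and the fingerprint is reduced to the `repr()` of the positional arguments. The sketch also leaves out the output cleanup that the real decorator performs on rerun.

```python
import functools
import json
import tempfile
from pathlib import Path

STATE_FILE = ".input_state.json"  # file name taken from the text above


def skip_if_unchanged(func):
    """Hypothetical stand-in for cache_skip: compare stored state, then run or skip."""

    @functools.wraps(func)
    def wrapper(*args, _output: Path):
        state_path = _output / STATE_FILE
        # Simplified fingerprint: the repr of the positional arguments only.
        current = {"args": repr(args)}
        if state_path.exists() and json.loads(state_path.read_text()) == current:
            return _output  # stored state matches: skip the step
        _output.mkdir(parents=True, exist_ok=True)
        result = func(*args, _output=_output)
        # Record the state only after the step finished successfully.
        state_path.write_text(json.dumps(current))
        return result

    return wrapper


calls = []


@skip_if_unchanged
def step(n: int, *, _output: Path) -> Path:
    calls.append(n)
    return _output


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "step"
    step(1, _output=out)  # runs
    step(1, _output=out)  # skipped, state matches
    step(2, _output=out)  # runs again, argument changed
```

Writing the state file after the function returns means a step that raises leaves no matching state behind, so it runs again on the next call.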

## Usage

### Basic example (single input directory)

```python
from pathlib import Path
from cache_skip import cache_skip, Dirmaker

dm = Dirmaker(Path("/data/pipeline/run-001"))

@cache_skip
def step_transform(raw: Path, *, _output: Path) -> Path:
    # heavy transformation ...
    return _output

# First call — runs the function and records input state.
step_transform(Path("/data/raw"), _output=dm.path_for("transform"))

# Second call — skips the function, returns the output path immediately.
step_transform(Path("/data/raw"), _output=dm.path_for("transform"))
```

### Example with non-Path args

Non-`Path` arguments (dates, strings, ints, etc.) are also part of the cache
key. Changing them triggers a rerun.

```python
import datetime as dt
from pathlib import Path

from cache_skip import cache_skip

@cache_skip(track_dependencies=False)
def step_build_config(
    schedule_date: dt.date,
    template: Path,
    *,
    _output: Path,
) -> Path:
    ...

# Changing schedule_date from 2025-01-01 to 2025-01-02 invalidates the cache.
```
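
One way to see why a changed scalar invalidates the cache is to compute the argument fingerprint by hand. The sketch below is an assumption-laden illustration: `args_fingerprint` is a hypothetical helper, the `repr()`-based hashing follows the description in "How invalidation works", and `hashlib.sha256` stands in for `xxhash` so the snippet needs only the standard library.

```python
import datetime as dt
import hashlib
import inspect
from pathlib import Path


def args_fingerprint(func, *args, **kwargs) -> str:
    """Hash every bound argument that is neither a Path nor `_output`."""
    bound = inspect.signature(func).bind(*args, **kwargs)
    bound.apply_defaults()
    scalars = {
        name: repr(value)
        for name, value in bound.arguments.items()
        if name != "_output" and not isinstance(value, Path)
    }
    # Sort by parameter name so the digest does not depend on call style.
    payload = repr(sorted(scalars.items())).encode()
    return hashlib.sha256(payload).hexdigest()


def step_build_config(schedule_date: dt.date, template: Path, *, _output: Path) -> Path:
    return _output


out = Path("out")
a = args_fingerprint(step_build_config, dt.date(2025, 1, 1), Path("tpl"), _output=out)
b = args_fingerprint(step_build_config, dt.date(2025, 1, 2), Path("tpl"), _output=out)
c = args_fingerprint(step_build_config, dt.date(2025, 1, 1), Path("other"), _output=out)
```

Here `a` and `b` differ because the date changed, while `a` and `c` are equal: in this sketch a `Path` argument does not enter the args hash, since its contents are compared separately.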

### Dirmaker companion

`Dirmaker` allocates named output directories under a staging root. Use
`path_for(name)` to resolve a path without side effects (the form to pass to a
`@cache_skip` step), or `new_output_dir(name)` to delete and recreate the
directory explicitly.

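```python
from pathlib import Path
from cache_skip import Dirmaker

dm = Dirmaker(Path("/data/pipeline/run-001"))
raw = Path("/data/raw")

# step_transform is the decorated function from the basic example.
# Pass the path to the decorated step; the decorator manages deletion on rerun.
step_transform(raw, _output=dm.path_for("transform"))

# Or manage the directory yourself:
out = dm.new_output_dir("transform")   # deletes existing, creates fresh
```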
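
The difference between the two methods can be sketched as follows. `SketchDirmaker` is a hypothetical stand-in written for illustration only; the real `Dirmaker` may differ in details such as error handling.

```python
import shutil
import tempfile
from pathlib import Path


class SketchDirmaker:
    """Hypothetical stand-in for Dirmaker; not the library's implementation."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def path_for(self, name: str) -> Path:
        # Pure path arithmetic: nothing is created or deleted.
        return self.root / name

    def new_output_dir(self, name: str) -> Path:
        # Remove any previous output, then create an empty directory.
        path = self.path_for(name)
        if path.exists():
            shutil.rmtree(path)
        path.mkdir(parents=True)
        return path


with tempfile.TemporaryDirectory() as tmp:
    dm = SketchDirmaker(Path(tmp) / "run-001")
    resolved = dm.path_for("transform")
    existed_after_path_for = resolved.exists()  # path_for created nothing

    out = dm.new_output_dir("transform")
    (out / "stale.txt").write_text("old")
    out = dm.new_output_dir("transform")  # wipes stale.txt
    leftover = list(out.iterdir())
```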

## How invalidation works

On every call after the first, `cache_skip` applies three tiers of change
detection:

1. **Args hash** — all non-`Path`, non-`_output` arguments are hashed via
   `repr()`. A change in any scalar argument (date, string, int, …) triggers
   a rerun immediately.

2. **Dependency hash** — the source files of the decorated function and all
   modules it imports (static AST analysis) are hashed. Editing the function's
   source code triggers a rerun. Disable with `track_dependencies=False`.

3. **File content hash** — every file under each input `Path` is compared.
   Metadata (mtime, inode, size) is checked first as a fast path. If the
   metadata is identical, the stored hash is trusted and the file is not
   re-read. If the metadata drifted but the content hash still matches, the
   state file is updated silently without a rerun (this handles `rsync` /
   `cp -p` copies with timestamp noise).
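
The third tier can be sketched per file as below. This is an illustration under stated assumptions, not the library's code: the helper names (`file_meta`, `content_hash`, `snapshot`, `check_file`) and the layout of the stored record are hypothetical, and `hashlib.sha256` stands in for `xxhash`.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def file_meta(path: Path) -> dict:
    st = path.stat()
    return {"mtime_ns": st.st_mtime_ns, "inode": st.st_ino, "size": st.st_size}


def content_hash(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def snapshot(path: Path) -> dict:
    return {"meta": file_meta(path), "hash": content_hash(path)}


def check_file(path: Path, stored: dict) -> str:
    """Return 'unchanged', 'meta_drift', or 'changed' for one file."""
    if file_meta(path) == stored["meta"]:
        return "unchanged"  # fast path: trust the stored hash, skip reading
    if content_hash(path) == stored["hash"]:
        stored["meta"] = file_meta(path)  # refresh metadata, no rerun needed
        return "meta_drift"
    return "changed"


with tempfile.TemporaryDirectory() as tmp:
    f = Path(tmp) / "input.csv"
    f.write_text("a,b\n1,2\n")
    state = snapshot(f)
    first = check_file(f, state)

    # Bump the timestamps by one second without touching the content.
    later = state["meta"]["mtime_ns"] + 10**9
    os.utime(f, ns=(later, later))
    second = check_file(f, state)
    third = check_file(f, state)  # metadata was refreshed by the previous call

    f.write_text("a,b\n1,2\n3,4\n")  # real content change
    fourth = check_file(f, state)
```

Only the `changed` result would trigger a rerun; `meta_drift` costs one hash of the file and then returns to the fast path on later calls.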

## track_dependencies

```python
@cache_skip(track_dependencies=False)
def step(raw: Path, *, _output: Path) -> Path:
    ...
```

Set `track_dependencies=False` to skip module source hashing. This is useful
in tests, or when the function imports large, rarely changing libraries and
startup cost matters.
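
To give a sense of the work this option skips, here is a rough sketch of static import discovery followed by source hashing. It rests on assumptions: `imported_module_names` and `dependency_hash` are hypothetical helpers, only top-level absolute imports are followed (no recursion into the imported modules), and `hashlib.sha256` stands in for `xxhash`.

```python
import ast
import hashlib
import importlib.util
from pathlib import Path


def imported_module_names(source: str) -> set[str]:
    """Collect module names from `import x` and `from x import y` statements."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module)
    return names


def dependency_hash(source: str) -> str:
    """Hash the source text plus the source files of the modules it imports."""
    digest = hashlib.sha256(source.encode())
    for name in sorted(imported_module_names(source)):
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            spec = None  # unresolvable import: ignore it in this sketch
        origin = getattr(spec, "origin", None)
        if origin and origin.endswith(".py"):
            digest.update(Path(origin).read_bytes())
    return digest.hexdigest()


SOURCE = "import json\nfrom pathlib import Path\n\ndef step():\n    return Path('.')\n"
names = imported_module_names(SOURCE)
before = dependency_hash(SOURCE)
after = dependency_hash(SOURCE + "# edited\n")
```

Every imported module with a `.py` source is read and hashed, which is the cost that grows with large dependencies.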

## Comparison with auto_skip

`cache_skip` is a simpler, self-contained alternative to `auto_skip`:

| Feature             | `cache_skip`                 | `auto_skip`          |
| ------------------- | ---------------------------- | -------------------- |
| Input detection     | explicit `Path` args         | strace / audit hooks |
| Non-Path args       | hashed                       | ignored              |
| Module dep tracking | static AST                   | runtime import list  |
| External deps       | `xxhash`, `loguru`           | heavier stack        |
| Output format       | dir with `.input_state.json` | opaque cache store   |
