Metadata-Version: 2.4
Name: storage-driven-events
Version: 0.1.0
Summary: Git-based change detection for folders
Author: Vijay Balasubramaniam
License-Expression: MIT
License-File: LICENSE
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Software Development :: Version Control :: Git
Requires-Python: >=3.10
Description-Content-Type: text/markdown

# storage-driven-events

Git-based change detection for folders. Detect what changed in a git repository since your last scan and pipe the changes to a handler script.

No daemon, no polling loop — just run `storage-events scan` whenever you want to check for changes.

## How it works

The tool uses a custom git ref (`refs/storage-events/last-processed`) to track the last commit you processed. On each scan, it runs `git diff-tree` between that ref and the current `HEAD` to get a list of added, modified, deleted, and renamed files. If a handler is configured, the changes are piped to it via stdin. The ref only advances after the handler succeeds (exit code 0), giving you automatic retry on failure.

## Installation

### With pip

```bash
pip install storage-driven-events
```

### With uv

```bash
uv add storage-driven-events
```

### From source

```bash
git clone https://github.com/your-username/storage-driven-events.git
cd storage-driven-events
uv sync
```

## Quick start

### Option 1: Interactive setup

```bash
storage-events setup
```

This walks you through cloning a repo, initializing change tracking, and optionally installing a git hook and cron job.

### Option 2: Non-interactive setup

```bash
storage-events setup \
    --repo git@github.com:user/data-repo.git \
    --path ./data-repo \
    --branch main
```

### Option 3: Manual setup on an existing repo

```bash
cd /path/to/your/repo
storage-events scan  # First run initializes tracking
```

## Usage

### Scanning for changes

```bash
# Scan the current directory
storage-events scan

# Scan a specific repo
storage-events scan /path/to/repo

# Pull first, then scan
storage-events scan --pull

# Preview changes without advancing the ref
storage-events scan --dry-run

# Pipe changes to a custom handler
storage-events scan --handler ./my-handler.sh
```

### Change output format

Changes are printed as tab-separated lines matching `git diff-tree --name-status` output:

```
A       reports/q1-summary.pdf
M       data/metrics.csv
D       tmp/scratch.txt
R       docs/guide-v2.md
```

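Status codes: `A` (added), `M` (modified), `D` (deleted), `R` (renamed), `C` (copied). When git itself reports a rename or copy, it appends a similarity score to the code (for example `R100`) and lists both the old and the new path, so handlers should tolerate an extra tab-separated field.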
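
As a minimal sketch of how a handler might turn these lines into records, the parser below accepts both the single-path form shown above and git's raw two-path rename form (`R100<TAB>old<TAB>new`). The names `Change` and `parse_change` are hypothetical and not part of this package.

```python
from typing import NamedTuple, Optional


class Change(NamedTuple):
    status: str               # single-letter code: A, M, D, R, or C
    path: str                 # affected path (the new path for a rename or copy)
    old_path: Optional[str]   # original path, when the line carries one


def parse_change(line: str) -> Change:
    """Parse one tab-separated change line into a Change record."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2:
        raise ValueError(f"not a change line: {line!r}")
    status = fields[0][:1]  # "R100" becomes "R"; a plain "M" stays "M"
    if len(fields) >= 3:
        # Two-path form: the old path comes first, then the new path.
        return Change(status, fields[2], fields[1])
    return Change(status, fields[1], None)
```

Keeping only the first character of the status field lets the rest of a handler compare against the plain codes listed above.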

### Setup command

```bash
# Interactive mode (prompts for everything)
storage-events setup

# Non-interactive mode
storage-events setup --repo <url> [options]
```

Options:

| Flag | Default | Description |
|---|---|---|
| `--repo URL` | — | Git repository URL (required for non-interactive) |
| `--path PATH` | `./<repo-name>` | Local clone path |
| `--branch NAME` | `main` | Branch to track |
| `--handler PATH` | built-in | Path to handler script |
| `--cron MINUTES` | — | Set up cron polling at this interval |
| `--no-hook` | — | Skip post-merge hook installation |

The setup command:
1. Clones the repo (or reuses an existing clone)
2. Initializes the last-processed ref to the current HEAD
3. Installs a default handler (`default-handler.py`) that pretty-prints changes
4. Installs a `post-merge` git hook (so changes are shown automatically after `git pull`)
5. Optionally sets up a cron job for automated polling

## Writing handlers

A handler is any executable that reads tab-separated change lines from stdin. Exit `0` to mark changes as processed (advances the ref). Exit non-zero to leave the ref unchanged so the same changes are retried on the next scan.
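
The exit-code contract can be sketched in Python as follows. This is an illustrative skeleton under stated assumptions, not code shipped with the package: `handle` and `log_change` are hypothetical names, and `log_change` stands in for whatever real work a handler does.

```python
import sys
from typing import Callable, Iterable


def handle(lines: Iterable[str], process: Callable[[str, str], None]) -> int:
    """Call process(status, path) for each change line; return an exit code."""
    failures = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        try:
            status, path = line.split("\t", 1)
            process(status, path)
        except Exception as exc:
            # Report on stderr and keep going so every failure is visible.
            print(f"failed: {line!r}: {exc}", file=sys.stderr)
            failures += 1
    # Non-zero leaves the ref unchanged, so the whole batch is retried.
    return 1 if failures else 0


def log_change(status: str, path: str) -> None:
    """Placeholder work (hypothetical); replace with real processing."""
    print(f"{status} {path}")


# In a real handler script, finish with:
#     sys.exit(handle(sys.stdin, log_change))
```

Because a failed run causes the same changes to be delivered again, the per-file work should be safe to repeat.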

### Default handler

The setup command installs `default-handler.py`, which pretty-prints changes:

```
ADDED      reports/q1-summary.pdf
MODIFIED   data/metrics.csv
DELETED    tmp/scratch.txt
```

### Example: Slack notification

```bash
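#!/usr/bin/env bash
set -euo pipefail
# Read every change line from stdin and wrap it in a Slack JSON payload.
payload=$(jq -Rs '{text: ("Files changed:\n" + .)}')
# -f makes curl exit non-zero on HTTP errors, so a failed post is retried.
curl -fsS -X POST -H 'Content-type: application/json' \
    --data "$payload" "$SLACK_WEBHOOK_URL"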
```

### Example: Process only CSV files

```bash
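#!/usr/bin/env bash
set -euo pipefail  # a failing pipeline run exits non-zero, so the ref stays put
while IFS=$'\t' read -r status file; do
    # Skip deletions: the file no longer exists on disk.
    if [[ "$file" == *.csv && "$status" != "D" ]]; then
        # Redirect stdin so pipeline.py cannot consume the remaining change lines.
        python3 pipeline.py "$file" < /dev/null
    fi
done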
```

### Example: Python handler

```python
#!/usr/bin/env python3
import sys

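for line in sys.stdin:
    line = line.rstrip("\n")
    if not line:
        continue  # skip blank lines
    # Split on the first tab only, so the path field is kept intact.
    status, filepath = line.split("\t", 1)
    if status == "A":
        print(f"New file detected: {filepath}")
        # Do something with the new file...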
```

## Automation

### Cron

Poll every 15 minutes:

```bash
# Via setup
storage-events setup --repo <url> --cron 15

# Or add manually to crontab
*/15 * * * * cd /path/to/repo && git pull -q && storage-events scan
```

### Post-merge hook

The `setup` command installs a `.git/hooks/post-merge` hook that shows changes automatically after each `git pull` that merges new commits. To use a different handler for a single pull, set the `STORAGE_EVENTS_HANDLER` environment variable:

```bash
STORAGE_EVENTS_HANDLER=./notify.sh git pull
```

### Launchd (macOS)

Create `~/Library/LaunchAgents/com.storage-events.scan.plist` for a timer-based approach that's more reliable than cron on macOS.
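
One way to produce that file is with Python's standard `plistlib` module. The sketch below is an assumption about what a suitable agent could look like, not a file shipped with the package: `build_launch_agent` is a hypothetical helper, and it assumes the `storage-events` executable is on the `PATH` that launchd provides (use an absolute path otherwise).

```python
import plistlib


def build_launch_agent(repo_path: str, interval_seconds: int = 900) -> bytes:
    """Return a launchd property list that runs a scan at a fixed interval."""
    agent = {
        "Label": "com.storage-events.scan",
        # Assumption: storage-events resolves via launchd's PATH.
        "ProgramArguments": ["storage-events", "scan", "--pull"],
        # launchd changes to this directory before starting the job.
        "WorkingDirectory": repo_path,
        # Seconds between runs; 900 matches the 15-minute cron example.
        "StartInterval": interval_seconds,
    }
    return plistlib.dumps(agent)


# To install, write the bytes to
# ~/Library/LaunchAgents/com.storage-events.scan.plist and load the agent
# with launchctl.
```

`Label`, `ProgramArguments`, `WorkingDirectory`, and `StartInterval` are standard launchd keys; the values shown are illustrative.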

## How the ref works

The tool stores a single git ref at `refs/storage-events/last-processed` inside the repo's `.git` directory. This ref points to the last commit that was successfully processed.

- **First scan**: The ref is created pointing to the current `HEAD`. No changes are reported.
- **Subsequent scans**: Changes between the ref and `HEAD` are reported. If the handler succeeds, the ref advances to `HEAD`.
- **Handler failure**: The ref stays where it is. The next scan will report the same changes again.
- **No external state**: Everything lives inside the git repo. No config files, databases, or lock files.
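
The lifecycle above can be approximated with plain git commands. The following is a simplified sketch of one scan cycle, not the tool's actual source: `scan` and `run_git` are hypothetical names, the `git` parameter exists only so the logic can be exercised without a real repository, and advancing the ref when no handler is configured is an assumption.

```python
import subprocess
from typing import Callable, Optional

REF = "refs/storage-events/last-processed"


def run_git(repo: str, *args: str) -> subprocess.CompletedProcess:
    """Run a git command inside repo and capture its output as text."""
    return subprocess.run(["git", "-C", repo, *args],
                          capture_output=True, text=True)


def scan(repo: str, handler: Optional[str] = None,
         git: Callable[..., subprocess.CompletedProcess] = run_git) -> str:
    """Run one scan cycle and return the change lines that were reported."""
    head = git(repo, "rev-parse", "HEAD").stdout.strip()
    last = git(repo, "rev-parse", "--verify", "--quiet", REF)
    if last.returncode != 0:
        # First scan: create the ref at HEAD and report nothing.
        git(repo, "update-ref", REF, head)
        return ""
    changes = git(repo, "diff-tree", "-r", "--name-status",
                  last.stdout.strip(), head).stdout
    if changes and handler:
        result = subprocess.run([handler], input=changes, text=True)
        if result.returncode != 0:
            return changes  # handler failed: leave the ref for a retry
    # Assumption: with no handler configured, the ref still advances.
    git(repo, "update-ref", REF, head)
    return changes
```

Since the ref is an ordinary git ref, `git rev-parse refs/storage-events/last-processed` shows where the next scan will start.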

## Requirements

- Python 3.10+
- Git
- No runtime dependencies (stdlib only)

## Development

```bash
git clone https://github.com/your-username/storage-driven-events.git
cd storage-driven-events
uv sync
uv run pytest -v
```

## License

MIT
