Metadata-Version: 2.4
Name: safe-state
Version: 0.1.1
Summary: Resumable execution for Python. One decorator. Zero retry loops.
Author-email: Nishant Bhatte <ironfighter23@users.noreply.github.com>
License: MIT License
        
        Copyright (c) 2026 Nishant Bhatte (IronFighter23)
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Homepage, https://github.com/IronFighter23/safe-state
Project-URL: Repository, https://github.com/IronFighter23/safe-state
Project-URL: Issues, https://github.com/IronFighter23/safe-state/issues
Project-URL: Changelog, https://github.com/IronFighter23/safe-state/blob/main/CHANGELOG.md
Keywords: checkpoint,resume,fault-tolerance,retry,serialization,dill,automation,batch-processing,decorator
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: System Administrators
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: System :: Recovery Tools
Classifier: Topic :: Utilities
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: dill>=0.3.7
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: requests>=2.28; extra == "dev"
Dynamic: license-file

# safe-state

**Resumable execution for Python. One decorator. Zero retry loops.**

[![PyPI](https://img.shields.io/pypi/v/safe-state.svg)](https://pypi.org/project/safe-state/)
[![Python](https://img.shields.io/pypi/pyversions/safe-state.svg)](https://pypi.org/project/safe-state/)
[![License](https://img.shields.io/pypi/l/safe-state.svg)](LICENSE)

You wrote a Python script that loops through 10,000 things — sending welcome
emails, downloading files, calling an API for each user in your database,
resizing images, scraping URLs. Somewhere around item 6,432 the network blips,
a rate-limit kicks in, or someone unplugs your laptop. Everything dies. You
have no idea what was done and what wasn't.

The usual fix is a thicket of `try/except` blocks, manual retry loops, a "last
processed ID" column in some side database, and a `--resume-from` CLI flag.
`safe-state` deletes all of that:

```python
from safe_state import safe_state

@safe_state
def send_welcome_emails(users, mailer):
    for user in users:
        mailer.send(user.email, "Welcome!", render_template(user))

send_welcome_emails(load_users(), open_mailer())
# Crashes at user 6,432? Just run the script again. It skips the first 6,431
# and picks up at 6,432. No code changes needed.
```

---

## What makes this hard (and why most checkpointing tools don't actually work)

Python's built-in `pickle` can serialize dictionaries, lists, integers, and most
plain objects. It **cannot** serialize:

- Open network sockets
- Live database connections (`sqlite3`, `psycopg2`, `pymongo`)
- Open file handles
- `requests.Session` objects with active TCP keep-alives
- Any object holding a C-level resource
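
The limitation is easy to reproduce with the standard library alone. The sketch below tries to pickle a plain dictionary and three live resources, recording the exception type each attempt raises. The helper name `try_pickle` is illustrative and not part of `safe-state`.

```python
import os
import pickle
import socket
import sqlite3


def try_pickle(obj):
    """Return None if obj pickles cleanly, else the name of the exception type."""
    try:
        pickle.dumps(obj)
        return None
    except Exception as exc:
        return type(exc).__name__


# Three live resources of the kinds listed above.
conn = sqlite3.connect(":memory:")
sock = socket.socket()
handle = open(os.devnull, "rb")

results = {
    "dict": try_pickle({"a": 1}),
    "sqlite3.Connection": try_pickle(conn),
    "socket.socket": try_pickle(sock),
    "file handle": try_pickle(handle),
}

for resource in (conn, sock, handle):
    resource.close()

print(results)
```

The dictionary serializes without complaint, while each live resource is rejected by `pickle`.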

So a naive "just pickle everything" checkpointer crashes the moment your script
holds anything useful. `safe-state` solves this with a **reconnect registry**:
when it finds a live object, it serializes a small metadata record describing
*how to recreate the object*, then rebuilds a fresh one on resume.

Built-in handlers ship for `sqlite3.Connection`, `socket.socket`,
`requests.Session`, and file handles. Custom types are a five-line
`register_reconnector()` call away.
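
To make the idea concrete, here is a minimal sketch of a reconnect registry built on the standard library only. It is not `safe-state`'s implementation: the names `register`, `freeze`, and `thaw` are hypothetical, plain `pickle` stands in for `dill`, and the SQLite handler assumes a file-backed database whose path can be read back with `PRAGMA database_list`.

```python
import os
import pickle
import sqlite3
import tempfile

# Hypothetical registry: type -> (extract, reconnect). Not safe-state's internals.
_REGISTRY = {}


def register(cls, extract, reconnect):
    _REGISTRY[cls] = (extract, reconnect)


def freeze(obj):
    """Pickle obj directly, or a metadata stub if its type has a handler."""
    for cls, (extract, _) in _REGISTRY.items():
        if isinstance(obj, cls):
            return pickle.dumps(("stub", cls.__name__, extract(obj)))
    return pickle.dumps(("value", None, obj))


def thaw(blob):
    """Inverse of freeze: rebuild a fresh live object from its metadata stub."""
    kind, type_name, payload = pickle.loads(blob)
    if kind == "value":
        return payload
    for cls, (_, reconnect) in _REGISTRY.items():
        if cls.__name__ == type_name:
            return reconnect(payload)
    raise LookupError(f"no reconnector registered for {type_name}")


register(
    sqlite3.Connection,
    # Row layout of PRAGMA database_list is (seq, name, file).
    extract=lambda c: {"path": c.execute("PRAGMA database_list").fetchone()[2]},
    reconnect=lambda meta: sqlite3.connect(meta["path"]),
)

with tempfile.TemporaryDirectory() as tmp:
    conn = sqlite3.connect(os.path.join(tmp, "demo.db"))
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.execute("INSERT INTO t VALUES (42)")
    conn.commit()

    blob = freeze(conn)    # a small metadata record, not the connection itself
    conn.close()           # simulate the original process going away

    restored = thaw(blob)  # a fresh connection to the same database file
    row = restored.execute("SELECT x FROM t").fetchone()
    restored.close()

print(row)
```

The restored object is a new connection, so anything tied to the old one (open cursors, uncommitted transactions) does not carry over in this sketch.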

---

## Install

```bash
pip install safe-state
```

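Requires Python 3.9+ and `dill`, the only runtime dependency. `dill` extends
`pickle` to objects the standard library cannot serialize, such as lambdas and
closures, which can appear among a function's locals.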

---

## How it works

`@safe_state` does three things to the function it wraps:

1. **Intercepts the iterable argument** (the first positional argument by
   default; see `iterable_arg`). The function still sees a normal iterable,
   but `safe-state` silently tracks which items have completed.
2. **Persists progress after every item** (or every N items — configurable) to
   a `.safestate` file on disk via an atomic write.
3. **Captures locals on failure.** When an exception escapes the function,
   `safe-state` walks the traceback, grabs the local variables from the failing
   frame, freezes them with `dill` plus the reconnect registry, and writes them
   to the checkpoint. The exception then re-raises as normal — `safe-state`
   never silently swallows errors.

On the next invocation with the same job name, the checkpoint is loaded,
already-completed indices are skipped, and the iteration resumes from where it
stopped.

On successful completion, the checkpoint file is deleted.
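
The skip-and-persist part of this flow can be sketched in a few lines of standard-library Python. This is a simplified illustration under stated assumptions, not the library's code: `run_resumable` and `atomic_write_json` are hypothetical names, the checkpoint is plain JSON rather than a `.safestate` file, and capturing locals on failure is left out.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path, payload):
    """Write payload so that readers never observe a half-written file."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump(payload, tmp)
    os.replace(tmp_name, path)  # atomic rename within one filesystem


def run_resumable(items, work, checkpoint_path):
    """Call work(item) for each item, skipping indices already recorded as done."""
    path = Path(checkpoint_path)
    done = set(json.loads(path.read_text())["done"]) if path.exists() else set()
    for index, item in enumerate(items):
        if index in done:
            continue
        work(item)  # an exception propagates; earlier progress is already on disk
        done.add(index)
        atomic_write_json(path, {"done": sorted(done)})
    path.unlink(missing_ok=True)  # success: clear the checkpoint


with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "job.json"
    seen = []
    fail_once = {"armed": True}

    def work(item):
        if item == "c" and fail_once["armed"]:
            fail_once["armed"] = False
            raise RuntimeError("simulated network blip")
        seen.append(item)

    try:
        run_resumable("abcde", work, ckpt)  # run 1: dies on "c"
    except RuntimeError:
        pass
    run_resumable("abcde", work, ckpt)      # run 2: skips "a" and "b"
    checkpoint_cleared = not ckpt.exists()

print(seen, checkpoint_cleared)
```

Each item is processed exactly once across the two runs, and the item that failed is the first one attempted on resume.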

---

## Full example: downloading 500 images

```python
import requests
from safe_state import safe_state

@safe_state(name="image-scrape", verbose=True)
def download_all(urls, session):
    for url in urls:
        filename = url.split("/")[-1]
        response = session.get(url, timeout=10)
        response.raise_for_status()
        with open(f"downloads/{filename}", "wb") as f:
            f.write(response.content)

if __name__ == "__main__":
    with open("urls.txt") as f:  # close the file once the URLs are read
        urls = f.read().splitlines()
    download_all(urls, requests.Session())
```

**Run 1** (the connection times out on file 234, which is zero-based index 233):

```
[safe_state] starting fresh job 'image-scrape'
[safe_state] 'image-scrape' failed at item 233:
  ConnectionError: HTTPSConnectionPool... Read timed out.
  Progress 233/500 saved to .safe_state/image-scrape.safestate
Traceback (most recent call last): ...
```

**Run 2** — same command, no flags, no edits:

```
[safe_state] resuming 'image-scrape': 233/500 done (run #2)
[safe_state] skip index 0 (done)
...
[safe_state] skip index 232 (done)
# resumes at item 233, completes through 499
[✓] Job complete. Checkpoint cleared.
```

---

## More use cases

Anything that loops through a batch of work benefits from this:

```python
# Bulk database backfill
@safe_state(name="backfill-2026")
def backfill(user_ids, conn):
    for uid in user_ids:
        new_value = expensive_computation(uid)
        conn.execute("UPDATE users SET score = ? WHERE id = ?", (new_value, uid))
        conn.commit()

# Processing a giant CSV
@safe_state(name="csv-cleanup")
def clean_rows(rows, output_writer):
    for row in rows:
        cleaned = normalize(row)
        output_writer.writerow(cleaned)

# Calling an API for every record
@safe_state(name="enrich-leads", save_every=10)
def enrich(leads, api_client):
    for lead in leads:
        data = api_client.lookup(lead.email)
        lead.enriched_data = data
        lead.save()

# Resizing thousands of images (requires Pillow: from PIL import Image)
@safe_state(name="thumbnails")
def make_thumbs(image_paths):
    for path in image_paths:
        img = Image.open(path)
        img.thumbnail((256, 256))
        img.save(path.replace(".jpg", "_thumb.jpg"))
```

In every case, if the script crashes partway, you just rerun it. No retry
logic, no progress columns, no resume flags.

---

## API

### `@safe_state`

```python
@safe_state(
    name=None,                # job identifier; defaults to fn.__qualname__
    state_dir=".safe_state",  # checkpoint directory
    iterable_arg=0,           # which arg is the iterable (int index or kwarg name)
    save_every=1,             # persist every N completed items
    store_results=False,      # also store each item's value (must be serializable)
    keep_on_success=False,    # keep checkpoint after successful completion
    verbose=False,            # print progress to stderr
    auto_iterate=True,        # set False for manual checkpoint() mode
)
```

The decorator works with or without parentheses:

```python
@safe_state                # equivalent to @safe_state()
def f(items): ...

@safe_state(name="custom")
def g(items): ...
```
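
For readers curious how one decorator can support both forms, the usual Python pattern is sketched below. The name `resumable` and the `job_name` attribute are made up for the illustration, and the real `safe_state` signature may differ (for instance, it may not be keyword-only).

```python
import functools


def resumable(fn=None, *, name=None):
    """Decorator usable as @resumable or @resumable(name=...). Illustrative only."""

    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return func(*args, **kwargs)

        # Fall back to the function's qualified name, as the `name` default does.
        wrapper.job_name = name or func.__qualname__
        return wrapper

    if fn is not None:  # bare form: @resumable
        return decorate(fn)
    return decorate     # called form: @resumable(...)


@resumable
def f(items):
    return len(items)


@resumable(name="custom")
def g(items):
    return len(items)


print(f.job_name, g.job_name)
```

In the bare form Python passes the function straight in as `fn`; in the called form `fn` stays `None` and the inner `decorate` is returned to receive the function.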

### Inspecting checkpoints

Every decorated function exposes three helpers:

```python
@safe_state
def my_job(items): ...

my_job.peek_checkpoint()    # -> Checkpoint object, or None
my_job.clear_checkpoint()   # -> deletes the .safestate file
my_job.checkpoint_path      # -> Path to the .safestate file
```

A `Checkpoint` object holds:

- `completed_indices: set[int]`
- `total_items: int | None`
- `last_failure: dict | None` — exception type, message, traceback, index
- `frozen_state: bytes | None` — `dill`-serialized locals from the failing frame
- `run_count: int`
- `progress() -> dict` — human-readable summary
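
As a mental model only, the fields above can be pictured as a small dataclass. `CheckpointSketch` is a hypothetical stand-in, not the library's class, and the keys returned by its `progress()` are invented for illustration; the real summary may look different.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CheckpointSketch:
    """Mirrors the documented field names; not safe-state's actual class."""

    completed_indices: set = field(default_factory=set)
    total_items: Optional[int] = None
    last_failure: Optional[dict] = None
    frozen_state: Optional[bytes] = None
    run_count: int = 0

    def progress(self) -> dict:
        # Key names below are assumptions made for this sketch.
        done = len(self.completed_indices)
        remaining = None if self.total_items is None else self.total_items - done
        return {
            "done": done,
            "total": self.total_items,
            "remaining": remaining,
            "runs": self.run_count,
            "failed": self.last_failure is not None,
        }


cp = CheckpointSketch(completed_indices=set(range(233)), total_items=500, run_count=1)
summary = cp.progress()
print(summary)
```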

### Reconnect registry

Built-in handlers cover `sqlite3.Connection`, `socket.socket`,
`requests.Session`, and `io.IOBase` (file handles). To add your own:

```python
from safe_state import register_reconnector

class MyApiClient:
    def __init__(self, host, token):
        self.host = host
        self.token = token
        self.session = open_some_session(host, token)

register_reconnector(
    MyApiClient,
    extract=lambda c: {"host": c.host, "token": c.token},
    reconnect=lambda meta: MyApiClient(meta["host"], meta["token"]),
)
```

That's it — any `MyApiClient` instance held in your function's locals will now
survive checkpoint/restore.
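
Before registering a pair of callables, it can help to check them in isolation: the metadata should survive serialization, and `reconnect(extract(obj))` should yield an equivalent but distinct object. The sketch below does this without importing `safe-state`; `FakeClient` is a hypothetical stand-in whose `threading.Lock` plays the role of an unpicklable live resource.

```python
import pickle
import threading


class FakeClient:
    """Stand-in for a client that holds a live, unpicklable resource."""

    def __init__(self, host, token):
        self.host = host
        self.token = token
        self.session = threading.Lock()  # cannot be pickled, like a socket


def extract(client):
    return {"host": client.host, "token": client.token}


def reconnect(meta):
    return FakeClient(meta["host"], meta["token"])


original = FakeClient("api.example.com", "secret-token")

# The metadata itself must be serializable, so round-trip it explicitly.
meta = pickle.loads(pickle.dumps(extract(original)))
restored = reconnect(meta)

print(restored.host, restored is original)
```

Since the metadata record is what gets serialized, anything `extract` returns (a token, for example) presumably ends up in the checkpoint file on disk, which is worth bearing in mind for secrets.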

### Manual checkpointing (advanced)

If your function doesn't fit the "loop over items" mould — e.g. it processes a
graph or a single very long task — set `auto_iterate=False` and call
`checkpoint()` manually:

```python
from safe_state import safe_state, checkpoint

@safe_state(auto_iterate=False)
def big_job(graph):
    visited = set()
    for node in graph.walk():
        process(node)
        visited.add(node.id)
        checkpoint(visited=visited)  # freeze progress here
```

---

## What `safe-state` is **not**

- **Not a distributed task queue.** For multi-machine job dispatch use Celery,
  Dramatiq, or RQ. `safe-state` solves the much smaller problem of "this one
  process crashed; let me rerun the same script and resume."
- **Not a transaction manager.** If your work involves multi-step database
  state that needs rollback, use real transactions. `safe-state` checkpoints at
  iteration boundaries; an item is either complete or it isn't.
- **Not magic.** It doesn't freeze CPython frames mid-instruction. The
  iteration boundary is the resume granularity. If a single item's work is
  itself a long pipeline, decompose it into smaller items.
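
Because the resume granularity is one item, an item interrupted halfway is attempted again from its start on the next run. One common way to keep that harmless is to make each item's side effect all-or-nothing. The sketch below shows the pattern for file output, using a temporary file plus a rename; it is a general technique rather than something `safe-state` provides, and `save_atomically` is a hypothetical helper.

```python
import os
import tempfile
from pathlib import Path


def save_atomically(dest, data):
    """Write bytes so that dest is either untouched or fully written."""
    dest = Path(dest)
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.replace(tmp_name, dest)  # atomic rename within one filesystem
    except BaseException:
        # Clean up the partial file, then let the error propagate.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "photo.jpg"
    save_atomically(target, b"first attempt")
    save_atomically(target, b"retry after crash")  # re-running the item is safe
    final_bytes = target.read_bytes()
    leftovers = [p.name for p in Path(tmp).iterdir() if p.name != "photo.jpg"]

print(final_bytes, leftovers)
```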

---

## Performance

The default `save_every=1` writes a checkpoint after every iteration. For most
real workloads (network calls, DB writes) each write adds well under a
millisecond, which is negligible next to the work itself. If your inner loop
runs at microsecond scale, raise `save_every` to batch progress flushes:

```python
@safe_state(save_every=100)
def fast_loop(items):
    for item in items:
        cheap_in_memory_work(item)
```
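
To get a rough feel for the per-write cost on your own machine, a stand-in measurement like the one below may help. It times an atomic write of a small JSON file and is only an approximation: `safe-state`'s actual on-disk format and write path are not reproduced here, no `fsync` is performed, and the function name and payload size are arbitrary choices.

```python
import json
import os
import tempfile
import time
from pathlib import Path


def time_checkpoint_writes(n_writes=200, n_indices=1000):
    """Return the mean seconds per atomic write of a small JSON payload."""
    payload = {"done": list(range(n_indices))}
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "job.json"
        scratch = Path(tmp) / "job.json.tmp"
        start = time.perf_counter()
        for _ in range(n_writes):
            scratch.write_text(json.dumps(payload))
            os.replace(scratch, target)  # swap the new file into place
        elapsed = time.perf_counter() - start
    return elapsed / n_writes


mean_seconds = time_checkpoint_writes()
print(f"mean write: {mean_seconds * 1e6:.0f} microseconds")
```

Comparing that figure with the time one item of real work takes gives a reasonable starting point for choosing `save_every`.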

---

## License

MIT. See [LICENSE](LICENSE).

---

## Contributing

Issues and pull requests welcome. Run the test suite with:

```bash
pip install -e ".[dev]"
pytest
```
