Metadata-Version: 2.4
Name: timecal
Version: 0.1.0
Summary: Cross-agent time-calibration corpus served over MCP
Author-email: Conal Hickey <conal.hg@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/Conalh/timecal
Project-URL: Repository, https://github.com/Conalh/timecal
Keywords: mcp,llm,estimation,calibration,agents
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Software Development :: Libraries
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: mcp>=1.0.0
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: ruff>=0.6; extra == "dev"
Requires-Dist: build>=1.0; extra == "dev"
Requires-Dist: twine>=5.0; extra == "dev"
Dynamic: license-file

# TimeCal

A cross-agent time-calibration corpus, served over MCP. It counters the systematic over-estimation that LLM agents inherit from ~30 years of human software-engineering timelines.

## Problem

When an agent reads "build a Reddit→Claude pipeline," its training prior maps that to engineer-weeks. A capable agent driving Claude Code can ship the same thing in an afternoon. That bad prior cascades: tasks get called "infeasible solo," scope gets cut that didn't need cutting, and multi-phase rollouts get proposed where one session would do.

Asserting "you're powerful" doesn't override the prior — examples do.

## Solution

A small [MCP](https://modelcontextprotocol.io) server backed by a local SQLite corpus of real "human-estimated → actually-took" rows. Any MCP-aware agent (Claude Code, Codex, Cursor, Cline) can:

- Call `calibrate_task(task_description)` to retrieve similar past rows **before** scoping.
- Call `log_completion(...)` to append new rows as work finishes, so the corpus grows.
- Load the `timecal://preamble` resource to reset the prior at conversation start.

The corpus separates **two clocks** (wall-clock days vs. active hours — different units, never compared raw) and tags every row with a **regime** so the reading agent can tell whether a human "months" estimate was a fake prior or a real external constraint.

## Quickstart

Zero-install, with [uv](https://docs.astral.sh/uv/):

```bash
uvx timecal          # runs the MCP server over stdio
```

Or install it:

```bash
pip install timecal
python -m timecal.server
```

The corpus DB is **created and seeded automatically on first run** with a 10-row example corpus, at `~/.timecal/timecal.db`. No setup step is needed. Point `TIMECAL_DB` at any path to use your own corpus instead; the server, scripts, and tests all read that variable at call time, not at import time.
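
The path lookup described above can be sketched as follows. This is a minimal illustration, not the code in `db.py`: the function name `resolve_db_path` is hypothetical, and only the `TIMECAL_DB` variable and the `~/.timecal/timecal.db` default come from this README. Reading the environment inside the function, rather than once at import, is what would let tests and scripts switch corpora between calls.

```python
import os
from pathlib import Path

# Default location stated in this README.
DEFAULT_DB = Path.home() / ".timecal" / "timecal.db"


def resolve_db_path() -> Path:
    """Return the corpus path, honoring TIMECAL_DB if it is set.

    Hypothetical helper: the real lookup lives in timecal's db.py and
    may differ in detail.
    """
    # Read the environment on every call so a changed TIMECAL_DB is picked up.
    override = os.environ.get("TIMECAL_DB")
    return Path(override).expanduser() if override else DEFAULT_DB
```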

From source (for development):

```bash
git clone https://github.com/Conalh/timecal && cd timecal
pip install -e ".[dev]"
pytest -q            # 25 passing
```

## Use it from an agent

**Claude Code:**

```bash
claude mcp add timecal -- uvx timecal
```

**Generic MCP client** (`mcp.json` / client config):

```json
{
  "mcpServers": {
    "timecal": {
      "command": "uvx",
      "args": ["timecal"],
      "env": { "TIMECAL_DB": "/path/to/your/corpus.db" }
    }
  }
}
```

(`env` is optional — drop it to use the auto-seeded default.) Once connected, the agent gets two tools (`calibrate_task`, `log_completion`) and one resource (`timecal://preamble`).

## What it changes — before / after

Same task, same model; the only difference is one `calibrate_task` call:

> **User:** Can we build a Slack bot that summarizes standup messages?
>
> **Without TimeCal:** "That's a solid 2–3 week project — you'll need to design the webhook integration, build the summarization pipeline, handle Slack's API rate limits, and test across channels. I'd suggest phasing it: week 1 plumbing, week 2 the LLM layer, week 3 hardening."
>
> **With TimeCal** (after `calibrate_task("slack bot that summarizes standup messages")` returns the `standup-summary-bot` row — *human bucket: weeks, actual: ~5h active, agent_executed*): "A close match in the corpus shipped in about 5 active hours, agent-executed — the webhook plumbing dominated, not the LLM call. This is an afternoon, not a sprint. Want me to start on the webhook handler?"

## Layout

```
timecal/
  README.md             this file
  LICENSE               MIT
  pyproject.toml        packaging + pytest/ruff config
  src/timecal/
    server.py           MCP server entrypoint (FastMCP, stdio)
    calibrate.py        retrieval / ranking + agent-facing formatting
    log.py              validation + insert (mcp-free, unit-tested)
    db.py               DB path + auto-init/seed (honors TIMECAL_DB)
    data/
      schema.sql        SQLite schema (shipped in the wheel)
      example.csv       10-row synthetic corpus, auto-seeded on first run
  scripts/
    bootstrap.py        pre-create + seed the DB without starting the server
    init_db.py          create an empty DB from schema
    import_seed.py      import a reviewed CSV (validates enums)
  tests/                pytest suite (db, calibrate, log, import_seed)
```

## Data model

Every row in `projects` carries:

- **`regime`** — what *kind* of work it was; drives whether a human estimate was a fake prior or a real constraint:
  - `agent_executed` — agent does the work end-to-end; human-week estimates are usually agent-hour.
  - `review_bound` — agent produces code in minutes, but human review / re-prompting dominates wall-clock.
  - `external_bound` — gated by people, data accrual, or training runs; "months" is months, not a prior.
- **two clocks** — `wall_clock_days` (includes idle gaps) and `active_hours` (real work time). Different units; the corpus never compares them raw.
- **`estimate_bucket`** — ordinal of what a human team would have estimated (`hours`…`year_plus`), plus `estimate_raw` for nuance.
- **`data_quality`** — how the row was *measured*: `dates_only`, `timed_session`, or `self_reported`.
- **`source`** — provenance. Rows with an empty `source` are filtered out of the default `calibrate_task` response, so synthetic exploration data can't pollute the agent's view.
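
As a rough illustration of the `source` filter and the two-clock separation, the sketch below queries a cut-down stand-in for the `projects` table. The column types, the sample rows, the treatment of both `NULL` and blank strings as "empty", and the query itself are assumptions for illustration; the shipped `schema.sql` and the server's real retrieval logic may differ.

```python
import sqlite3

# Cut-down stand-in for the projects table (assumed types, not schema.sql).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE projects (
        what_shipped    TEXT NOT NULL,
        regime          TEXT NOT NULL,
        wall_clock_days REAL,
        active_hours    REAL,
        source          TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO projects VALUES (?, ?, ?, ?, ?)",
    [
        # Made-up rows for the demo only.
        ("standup summary bot", "agent_executed", 1.0, 5.0, "example"),
        ("scratch experiment", "agent_executed", 0.5, 2.0, ""),
        ("unlabelled import", "review_bound", 3.0, 4.0, None),
    ],
)

# Keep only rows with a non-empty source. Each clock is selected as its own
# column and is never combined or compared with the other.
rows = conn.execute(
    """
    SELECT what_shipped, wall_clock_days, active_hours
    FROM projects
    WHERE source IS NOT NULL AND TRIM(source) <> ''
    """
).fetchall()

for what, days, hours in rows:
    print(f"{what}: {days} wall-clock days, {hours} active hours")
```

Only the first row survives the filter here; the two rows without provenance are hidden from the result.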

## Bring your own corpus

The auto-seeded example rows (marked `source=example`) are synthetic — enough to make the demo runnable, not enough to be your real prior. Build your own:

- point `TIMECAL_DB` at a fresh path, then log tasks as you finish them via the `log_completion` tool, or
- import a reviewed CSV into your `TIMECAL_DB`: `python scripts/import_seed.py path/to/your.csv` (rejects, rather than silently drops, rows with bad enums or empty `what_shipped`).
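
The reject-rather-than-drop behavior can be sketched like this. It is an illustration under stated assumptions, not `import_seed.py` itself: `validate_row`, `check_csv`, and the set names are hypothetical, only `regime` and `data_quality` are checked because this README lists their values in full, and the CSV column names are assumed to match the field names used in the data model.

```python
import csv
import io

# Enum values as listed in the data model section of this README.
REGIMES = {"agent_executed", "review_bound", "external_bound"}
DATA_QUALITIES = {"dates_only", "timed_session", "self_reported"}


def validate_row(row: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the row is acceptable."""
    problems = []
    if not (row.get("what_shipped") or "").strip():
        problems.append("empty what_shipped")
    if row.get("regime") not in REGIMES:
        problems.append(f"bad regime: {row.get('regime')!r}")
    if row.get("data_quality") not in DATA_QUALITIES:
        problems.append(f"bad data_quality: {row.get('data_quality')!r}")
    return problems


def check_csv(text: str) -> None:
    """Raise on the first invalid row instead of silently skipping it."""
    # Data starts on line 2 because line 1 is the header.
    for line_no, row in enumerate(csv.DictReader(io.StringIO(text)), start=2):
        problems = validate_row(row)
        if problems:
            raise ValueError(f"line {line_no}: {'; '.join(problems)}")
```

Failing loudly with a line number means a reviewer fixes the CSV before anything reaches the corpus, instead of discovering later that rows quietly went missing.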

To keep your corpus free of the example rows, delete them with `DELETE FROM projects WHERE source = 'example';` or start from an empty DB via `python scripts/init_db.py`.

## License

MIT — see [LICENSE](LICENSE).
