Metadata-Version: 2.4
Name: convex-to-pydantic
Version: 0.1.2
Summary: Pydantic codegen from Convex schemas. Requires Node.js ≥18 on PATH.
License: MIT
License-File: LICENSE
Requires-Python: >=3.11
Requires-Dist: pydantic>=2.0
Requires-Dist: typer>=0.12
Requires-Dist: watchdog>=3.0
Description-Content-Type: text/markdown

# convex-to-pydantic

Generate fully-typed Pydantic models and async (or sync) client wrappers from your [Convex](https://convex.dev) schema — automatically, with zero manual wiring.

```bash
convex-to-pydantic generate --convex-dir ./convex --output-dir ./src/myapp/convex_generated
```

Your IDE immediately gets autocomplete, type checking, and inline docs for every Convex function.

## Requirements

- **Python >= 3.11**
- **Node.js >= 18** (must be on `PATH` — used once per run to introspect your Convex schema)

## Installation

```bash
uv add convex-to-pydantic

# Or with pip
pip install convex-to-pydantic

# Install as a CLI tool (globally)
uv tool install convex-to-pydantic
```

### Install from source (development)

```bash
git clone https://github.com/ExSidius/convex-to-pydantic.git
cd convex-to-pydantic
uv sync
```

## Quick start

### 1. Generate types

```bash
convex-to-pydantic generate \
  --convex-dir ./convex \
  --output-dir ./src/myapp/convex_generated
```

This produces two files in `./src/myapp/convex_generated/`:

| File | Contents |
|------|----------|
| `_types.py` | Pydantic `BaseModel` classes for every table and function argument, plus keyword-arg constructor functions |
| `_client.py` | Wrapper functions (async by default, sync with `--client-style sync`) that validate args and call `ConvexClient` methods |

### 2. Use in your code

```python
from myapp.convex_generated._types import messages_send_mutation
from myapp.convex_generated._client import messages_send_mutation_call

# Option 1: Just validate args (e.g. for tests)
args = messages_send_mutation(body="Hello!", author="Alice")

# Option 2: Full async call with validation
# (run inside an `async def`; `client` is your existing ConvexClient instance)
result = await messages_send_mutation_call(client, body="Hello!", author="Alice")
```

### 3. Watch mode (for development)

```bash
convex-to-pydantic watch \
  --convex-dir ./convex \
  --output-dir ./src/myapp/convex_generated
```

Run this alongside `npx convex dev`. When your Convex schema changes, types regenerate automatically. See [Watch mode](#watch-mode) for details on how this stays efficient.

## What gets generated

Given a Convex schema with a `messages` table and a `messages:send` mutation:

**`_types.py`**

```python
class MessagesTable(BaseModel):
    model_config = ConfigDict(extra="forbid", populate_by_name=True)
    author: str
    body: str
    id_: str | None = Field(default=None, alias="_id")
    creation_time: float | None = Field(default=None, alias="_creationTime")

class MessagesSendMutationArgs(BaseModel):
    model_config = ConfigDict(extra="forbid", populate_by_name=True)
    body: str
    author: str

def messages_send_mutation(*, body: str, author: str) -> MessagesSendMutationArgs:
    """Validate args for Convex mutation messages:send."""
    return MessagesSendMutationArgs(body=body, author=author)
```

**`_client.py`**

```python
async def messages_send_mutation_call(
    client: "ConvexClient",
    *,
    body: str,
    author: str,
) -> Any:
    """Convex mutation: messages:send"""
    args = messages_send_mutation(body=body, author=author)
    return await client.mutation("messages:send", args.model_dump(by_alias=True, exclude_none=True))
```

### Sync client wrappers

Not every caller can use `async`/`await` (Flask handlers, scripts, notebooks, sync
test harnesses, etc.). Pass `--client-style sync` — or set
`client_style = "sync"` in `[tool.convex-to-pydantic]` — to emit plain `def`
wrappers that call the client synchronously:

```python
def messages_send_mutation_call(
    client: "ConvexClient",
    *,
    body: str,
    author: str,
) -> Any:
    """Convex mutation: messages:send"""
    args = messages_send_mutation(body=body, author=author)
    return client.mutation("messages:send", args.model_dump(by_alias=True, exclude_none=True))
```

`_types.py` is identical across both styles; only the wrappers in `_client.py`
(and per-module files in `tree` output mode) differ.
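
Because each wrapper only forwards a function name and an argument dict to the client, it can be exercised against a stub in tests. The sketch below is illustrative only: `RecordingClient` is a hypothetical test double, and `messages_send_mutation_call` here is a hand-written stand-in for the generated sync wrapper that skips the Pydantic validation step so the snippet runs without generated files.

```python
from typing import Any


class RecordingClient:
    """Hypothetical stub: records calls instead of contacting Convex."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, dict[str, Any]]] = []

    def mutation(self, name: str, args: dict[str, Any]) -> Any:
        self.calls.append((name, args))
        return None


def messages_send_mutation_call(client: Any, *, body: str, author: str) -> Any:
    """Stand-in for the generated sync wrapper (validation omitted here)."""
    return client.mutation("messages:send", {"body": body, "author": author})


client = RecordingClient()
messages_send_mutation_call(client, body="Hello!", author="Alice")
print(client.calls)
```

In a real project you would import the wrapper from `_client.py` and pass the same kind of stub as `client`.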

## CLI reference

### `generate`

```
convex-to-pydantic generate [OPTIONS]
```

| Option | Description |
|--------|-------------|
| `--convex-dir PATH` | Path to your Convex directory (e.g. `./convex`). Works whether or not you've run `npx convex dev`. |
| `--input PATH` | Path to a pre-exported JSON file (alternative to `--convex-dir`). |
| `--output-dir PATH` | **(required)** Directory to write `_types.py` and `_client.py`. |
| `--force / -f` | Regenerate even if the schema hasn't changed. |
| `--client-style [async\|sync]` | Flavor of client wrappers to emit. `async` (default) emits `async def` + `await`; `sync` emits plain `def` for callers that can't use async. |

Either `--convex-dir` or `--input` must be provided. Use `--input` for CI workflows or when you've pre-exported the schema JSON.

### Configuration via `pyproject.toml`

```toml
[tool.convex-to-pydantic]
convex_dir = "./convex"
output_dir = "./src/myapp/convex_generated"
output_mode = "single"        # or "tree"
client_style = "async"        # or "sync"
format = true
```

### `watch`

```
convex-to-pydantic watch [OPTIONS]
```

| Option | Description |
|--------|-------------|
| `--convex-dir PATH` | **(required)** Path to your Convex directory. |
| `--output-dir PATH` | **(required)** Directory to write generated files. |

## Watch mode

Watch mode is designed to run alongside `npx convex dev` during development. Here's how it handles the realities of a live development environment:

### What happens on a file change

```
1. Filesystem event (inotify/FSEvents, not polling)
       │
2. Debounce: 500ms quiet window
   (rapid saves collapse into one check)
       │
3. Source-file hash check
   SHA-256 of all .ts/.js/.mjs file contents
   ┌─ Unchanged? → STOP. No Node.js call.
   │
4. Node.js extraction (single subprocess)
   schema_export.mjs → JSON blob
       │
5. Blob hash check
   SHA-256 of canonical JSON
   ┌─ Unchanged? → STOP. No codegen.
   │  (source changed but schema didn't,
   │   e.g. comment edits, formatting)
   │
6. Pure transform: JSON → IR → _types.py + _client.py
       │
7. Write files + update .convex_codegen_hash
```

### Two-layer staleness detection

The hash file (`.convex_codegen_hash`) stores two lines:

1. **Source-file hash**: SHA-256 over the contents of every `.ts`, `.js`, and `.mjs` file in the convex directory. Checked *before* calling Node.js. This is the fast path: if you save a file without changing its contents (or touch a file that is not `.ts`/`.js`/`.mjs`), Node.js is never spawned.
2. **Blob hash**: SHA-256 over the canonical extraction JSON. Checked *after* extraction. Catches cases where source files changed but the schema didn't (comments, formatting, non-schema code).

This means:
- **Editor auto-save** with no changes → stopped at layer 1 (no Node.js)
- **Editing comments** in a Convex file → stopped at layer 2 (Node.js runs, but no codegen)
- **Actual schema change** → full regeneration (~200ms for typical projects)
- **`npx convex dev` writing intermediates** → stopped at layer 1 or 2
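
The two layers can be sketched roughly as follows. This is a minimal illustration, not the code in `hasher.py`: the function names, the way file paths are mixed into the digest, and the canonical JSON form (`sort_keys=True` with compact separators) are all assumptions, and `extract` stands in for the Node.js subprocess call.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Callable

SOURCE_SUFFIXES = {".ts", ".js", ".mjs"}


def source_hash(convex_dir: Path) -> str:
    """Layer 1: hash source contents in a stable (sorted) order."""
    digest = hashlib.sha256()
    files = sorted(
        p for p in convex_dir.rglob("*")
        if p.is_file() and p.suffix in SOURCE_SUFFIXES
    )
    for path in files:
        digest.update(path.relative_to(convex_dir).as_posix().encode())
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def blob_hash(blob: dict[str, Any]) -> str:
    """Layer 2: hash a canonical JSON encoding (assumed form)."""
    canonical = json.dumps(blob, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


def check_staleness(
    convex_dir: Path,
    stored: "tuple[str, str] | None",
    extract: Callable[[], dict[str, Any]],
) -> str:
    """Return which step of the pipeline the run stops at."""
    if stored is not None and stored[0] == source_hash(convex_dir):
        return "skip: sources unchanged"  # extract() is never called
    if stored is not None and stored[1] == blob_hash(extract()):
        return "skip: schema unchanged"  # extraction ran, codegen skipped
    return "regenerate"
```

Passing the extractor in as a callable makes the cost model visible: the expensive step only runs when the cheap source hash has already changed.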

### What watch mode does NOT do

- No polling — uses OS-native filesystem events (inotify on Linux, FSEvents on macOS, ReadDirectoryChangesW on Windows)
- No periodic timers — the debounce timer only runs when a filesystem event actually occurs
- No file diffing — hashing is cheaper and more reliable than line-by-line comparison
- No hot reload — it generates static `.py` files. Your IDE/type checker picks up the changes via its own file watcher.
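
The event-driven debounce can be pictured with a small sketch. `Debouncer` is a hypothetical class written for illustration and is not taken from `watcher.py`; it assumes a `threading.Timer` that is restarted on every event, so no timer exists while the filesystem is quiet.

```python
import threading
from typing import Callable


class Debouncer:
    """Hypothetical sketch: run `callback` once after a quiet window."""

    def __init__(self, delay: float, callback: Callable[[], None]) -> None:
        self._delay = delay
        self._callback = callback
        self._timer = None  # no timer exists until an event arrives
        self._lock = threading.Lock()

    def trigger(self) -> None:
        """Call on every filesystem event; restarts the quiet window."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self._delay, self._callback)
            self._timer.daemon = True
            self._timer.start()


# Three rapid "saves" collapse into a single callback invocation.
calls = []
debouncer = Debouncer(0.05, lambda: calls.append("check"))
for _ in range(3):
    debouncer.trigger()
```

With the 500 ms window described above you would construct it as `Debouncer(0.5, run_check)`, where `run_check` is whatever starts the hash comparison.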

## Supported Convex types

| Convex type | Python type | Notes |
|-------------|-------------|-------|
| `v.string()` | `str` | |
| `v.number()` / `v.float64()` | `float` | JSON wire: `"number"` |
| `v.int64()` | `int` | JSON wire: `"bigint"`. Inline `# int64` comment |
| `v.boolean()` | `bool` | |
| `v.null()` | `None` | |
| `v.bytes()` | `bytes` | |
| `v.any()` | `Any` | |
| `v.id("table")` | `str` | Inline `# Id[table]` comment |
| `v.literal("x")` | `Literal["x"]` | |
| `v.array(T)` | `list[T]` | Nested arrays supported |
| `v.record(K, V)` | `dict[K, V]` | |
| `v.object({...})` | Named `BaseModel` subclass | Nested objects get their own class |
| `v.union(A, B)` | `A \| B` | |
| `v.union(T, v.null())` | `T \| None` | Null-simplified |
| `v.optional(T)` | `T \| None = None` | Default `None` |
| `v.union(v.literal("a"), v.literal("b"))` | `StrEnum` subclass | All-string-literal unions → enum |

> **Note on the Convex JSON spec:** Convex does not publish a formal spec for `ValidatorJSON`. The canonical definition lives in [`get-convex/convex-js/src/values/validators.ts`](https://github.com/get-convex/convex-js/blob/main/src/values/validators.ts). Key gotcha: the JSON wire format uses `"number"` (not `"float64"`) and `"bigint"` (not `"int64"`) — legacy naming from the server. This tool handles both forms.
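
To make the mapping concrete, here is a small sketch that turns a validator JSON dict into a Python annotation string. It is not the tool's `converter.py`: the function name `annotation` is hypothetical, the dict shapes (`{"type": ..., "value": ...}`) are assumed from the linked `validators.ts`, and objects, records, and the `StrEnum` rule are left out for brevity.

```python
import json
from typing import Any

# Both the wire names and the validator names map to the same Python type.
PRIMITIVES = {
    "string": "str",
    "number": "float",
    "float64": "float",
    "bigint": "int",
    "int64": "int",
    "boolean": "bool",
    "null": "None",
    "bytes": "bytes",
    "any": "Any",
}


def annotation(validator: dict[str, Any]) -> str:
    """Map an (assumed) validator JSON shape to an annotation string."""
    kind = validator["type"]
    if kind in PRIMITIVES:
        return PRIMITIVES[kind]
    if kind == "id":
        return "str"
    if kind == "literal":
        value = validator["value"]
        rendered = json.dumps(value) if isinstance(value, str) else repr(value)
        return f"Literal[{rendered}]"
    if kind == "array":
        return f"list[{annotation(validator['value'])}]"
    if kind == "union":
        members = [annotation(member) for member in validator["value"]]
        non_null = [m for m in members if m != "None"]
        # Null-simplification: keep a single trailing None.
        has_null = len(non_null) < len(members)
        return " | ".join(non_null + (["None"] if has_null else []))
    raise ValueError(f"unsupported validator type: {kind!r}")


nullable_id = {"type": "union", "value": [{"type": "null"}, {"type": "id", "tableName": "users"}]}
print(annotation(nullable_id))
```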

### System fields

All table models automatically include:

- `id_: str | None = Field(default=None, alias="_id")` — document ID (server-assigned)
- `creation_time: float | None = Field(default=None, alias="_creationTime")` — creation timestamp (server-assigned)

These are excluded from table constructor functions (since Convex manages them) and default to `None` so client code can construct table models for fixtures, validation, or JSONL export without supplying server-managed values. They still parse correctly when present (e.g. from `model_validate` on a server document).

## Architecture

```
src/convex_to_pydantic/
├── __init__.py           # Public API: generate(), generate_from_json()
├── pipeline.py           # Pure core: transform(blob) → GeneratedFiles
├── types.py              # Frozen Pydantic IR (immutable discriminated unions)
├── converter.py          # Raw JSON dict → IR (pure)
├── namer.py              # Collision-free naming → NameRegistry (pure)
├── hasher.py             # SHA-256 staleness detection
├── codegen/
│   ├── types_file.py     # IR + NameRegistry → _types.py string (pure)
│   └── client_file.py    # IR + NameRegistry → _client.py string (pure)
├── extractor/
│   ├── runner.py          # Node.js subprocess wrapper (IO edge)
│   └── schema_export.mjs  # Bundled JS: walks convex/ directly, imports each module
├── watcher.py             # Debounced watchdog file monitor (IO edge)
└── cli.py                 # Typer CLI (IO edge)
```

### Pure functional core

The entire transformation pipeline is a single pure function:

```python
from convex_to_pydantic.pipeline import transform

result = transform(blob)  # dict → GeneratedFiles (frozen dataclass)
result.types_content      # str — the _types.py file
result.client_content     # str — the _client.py file
```

No IO, no mutation, no side effects. The IR models are frozen (immutable), the namer returns a `NameRegistry` instead of mutating, and codegen produces strings. All side effects (file reads, subprocess calls, file writes, filesystem watching) live exclusively at the edges: `cli.py`, `__init__.py`, `runner.py`, and `watcher.py`.

This means:
- **Testing is trivial** — pass a dict, assert on strings. No mocking.
- **Deterministic** — same input always produces identical output.
- **Debuggable** — inspect the IR and NameRegistry at any point without worrying about mutation order.

### Module auto-discovery

The bundled `schema_export.mjs` walks the user's `convex/` directory directly, importing each module file to discover queries, mutations, and actions. It works whether or not `npx convex dev` has been run — we don't rely on `_generated/api.js` being populated (Convex ships an `anyApi` Proxy stub there until it is). TypeScript is supported via `esbuild` (a transitive dependency of the `convex` npm package). Reserved filenames (`schema.*`, `http.*`, `crons.*`, `auth.config.*`, `convex.config.*`) and underscore/dot-prefixed files are skipped; internal functions (`internalQuery`, `internalMutation`, `internalAction`) are filtered out. No hard-coded `CONVEX_MODULES` list — add a new Convex function and it's picked up automatically.
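
The skip rules can be expressed as a small predicate. The real logic lives in the bundled JavaScript extractor; this Python sketch only mirrors the rules stated above. The name `is_candidate_module`, the suffix list, and the choice to also skip underscore- or dot-prefixed directories (such as `_generated/`) are assumptions.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

RESERVED = ("schema.*", "http.*", "crons.*", "auth.config.*", "convex.config.*")
SOURCE_SUFFIXES = {".ts", ".js", ".mjs"}  # assumed set of module extensions


def is_candidate_module(relative_path: str) -> bool:
    """Return True if a path under convex/ should be imported for discovery."""
    path = PurePosixPath(relative_path)
    if path.suffix not in SOURCE_SUFFIXES:
        return False
    # Assumption: a prefixed directory hides everything beneath it.
    if any(part.startswith(("_", ".")) for part in path.parts):
        return False
    return not any(fnmatchcase(path.name, pattern) for pattern in RESERVED)


for name in ("messages.ts", "schema.ts", "_generated/api.js", "users/profile.ts"):
    print(name, is_candidate_module(name))
```

Filtering of internal functions happens after import and is not covered by this path-level check.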

## Programmatic API

```python
from pathlib import Path
from convex_to_pydantic import generate, generate_from_json

# From a live Convex project (requires Node.js)
generate(
    convex_dir=Path("./convex"),
    output_dir=Path("./src/myapp/convex_generated"),
)

# From a pre-exported JSON file (no Node.js needed)
generate_from_json(
    input_json=Path("./schema_export.json"),
    output_dir=Path("./src/myapp/convex_generated"),
)

# Emit sync (`def`) wrappers instead of async (`async def`)
generate(
    convex_dir=Path("./convex"),
    output_dir=Path("./src/myapp/convex_generated"),
    client_style="sync",
)
```

For lower-level access to the pure pipeline:

```python
import json
from pathlib import Path

from convex_to_pydantic.pipeline import transform

blob = json.loads(Path("schema.json").read_text())
result = transform(blob)
print(result.types_content)   # the _types.py source
print(result.client_content)  # the _client.py source
print(result.num_tables, result.num_functions)
```

## Code quality guidelines

This project follows a few conventions to keep the codebase clean and predictable:

1. **Pure functional core.** The entire transformation pipeline (`converter.py` → `namer.py` → `codegen/`) is pure: no IO, no mutation, no side effects. All IR models are frozen/immutable. Side effects live exclusively at the edges: `cli.py`, `__init__.py`, `runner.py`, and `watcher.py`.

2. **Guard clauses over nesting.** Functions use early returns to handle edge cases at the top, keeping the main logic at a single indentation level. Avoid deep `if/elif/else` chains.

3. **No blanket exception catching.** Catch specific exception types (`OSError`, `json.JSONDecodeError`, `ValueError`, etc.), never a blanket `except Exception` or a bare `except:`.

4. **Focused, typed functions.** Every function has a clear input/output contract. No hidden state, no global mutation. Type annotations on all public APIs.

5. **No speculative abstractions.** Don't add helpers, utilities, or configurability for hypothetical future needs. Three similar lines of code are better than a premature abstraction.
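
As an illustration of guidelines 2 to 4, here is a short sketch of a loader written with guard clauses and narrow exception handling. `load_blob` is a hypothetical helper, not a function from this codebase.

```python
import json
from pathlib import Path
from typing import Any


def load_blob(path: Path) -> "dict[str, Any] | None":
    """Return the parsed schema export, or None if missing or malformed."""
    # Guard clause: handle the missing-file case first.
    if not path.is_file():
        return None
    # Catch only the failures this step can actually produce.
    try:
        blob = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, UnicodeDecodeError, json.JSONDecodeError):
        return None
    # Guard clause: a schema export must be a JSON object.
    if not isinstance(blob, dict):
        return None
    return blob
```

Each edge case exits early, so the successful path stays at a single indentation level.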

## Development

```bash
# Clone and install
git clone https://github.com/ExSidius/convex-to-pydantic.git
cd convex-to-pydantic
uv sync

# Run tests
uv run pytest

# Integration tests that exercise the JS extractor against real Convex
# projects need `pnpm` on PATH (auto-skip otherwise). The subset marked
# `requires_docker` also needs a running Docker daemon — those deploy
# each fixture to a self-hosted Convex backend and verify the extractor
# works against both the pre-deploy `anyApi` stub and the concrete
# post-deploy `_generated/api.js`.

# Lint + format
uv run ruff check .
uv run ruff format .

# Type check
uv run ty check
```

### Pre-commit hooks (prek)

This project uses [prek](https://prek.j178.dev/) for pre-commit hooks. To set up:

```bash
# Install prek
uv tool install prek
# or: brew install prek

# Install git hooks
prek install

# Run hooks manually
prek run --all-files
```

Configured hooks (see `.pre-commit-config.yaml`):
- **ruff check** — lint with auto-fix
- **ruff format** — code formatting
- **ty** — type checking (Astral)
- **pytest** — full test suite (catches bugs in generated output that lint/typecheck can't see)
- **biome check** — JS/TS linting and formatting

`prek` reads the same `.pre-commit-config.yaml` as classic `pre-commit`, so contributors can use either tool interchangeably.

### Test fixtures

Tests use four JSON fixtures covering all type variants:

| Fixture | Covers |
|---------|--------|
| `chat_app.json` | Basic strings, empty args, simple mutations/queries |
| `auth_app.json` | `v.id()`, `v.optional()`, `v.union()` with null, string-literal unions (→ StrEnum) |
| `ai_app.json` | `v.array()`, nullable ID pattern |
| `kitchen_sink.json` | `v.int64()`, `v.record()`, `v.bytes()`, nested objects (2 levels), non-string literals, mixed literal+null union, nested arrays, empty args, actions |

## License

MIT
