Metadata-Version: 2.4
Name: auzek
Version: 0.1.1
Summary: Auzek — an autonomous coding agent that plans, executes, self-verifies and self-heals across multiple LLM providers.
Author: Azaan (Auzek)
License: MIT
Keywords: ai,agent,autonomous,coding-agent,llm,langgraph,groq,developer-tools
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Code Generators
Classifier: Topic :: Software Development :: Build Tools
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: langgraph>=0.2.40
Requires-Dist: langchain-core>=0.3.0
Requires-Dist: litellm>=1.51.0
Requires-Dist: pydantic>=2.7
Requires-Dist: pydantic-settings>=2.3
Requires-Dist: python-dotenv>=1.0
Requires-Dist: rich>=13.7
Requires-Dist: typer>=0.12
Requires-Dist: gitpython>=3.1
Requires-Dist: pathspec>=0.12
Requires-Dist: tenacity>=8.3
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: ruff>=0.5; extra == "dev"
Requires-Dist: mypy>=1.10; extra == "dev"
Requires-Dist: build>=1.2; extra == "dev"
Requires-Dist: twine>=5.0; extra == "dev"

# Auzek

> An autonomous coding agent by **Azaan (Auzek)**.

**Auzek** is an autonomous coding agent that **understands the repo, plans before
it codes, executes one step at a time, verifies its own work, and self-heals on
failure** before moving on. It runs on **any major LLM provider** — bring your own
API key (Anthropic, OpenAI, **Groq**, Google, Mistral, DeepSeek, or local Ollama).

It is built on **LangGraph** (orchestration) and **LiteLLM** (provider gateway).

```bash
pip install auzek
auzek run "add input validation to the /signup endpoint" --provider groq
```

---

## Why it's different from a "blind" coding bot

| Naive agent | This agent |
|---|---|
| Starts editing immediately | **Onboards** to the repo first (stack, tests, layout, git history) |
| Holds the plan in context | Writes the plan to **disk** (`.agent/plan.md`) — survives crashes |
| "Looks done" after writing | Marks a step done only **after running its verification** |
| Retries forever | **Hard stop** after N recovery attempts, then escalates |
| One giant change | **Atomic steps**, optionally **micro-committed** |
| "Done" = code written | "Done" = full test/lint/typecheck pass + diff reviewed vs. the task |

---

## The lifecycle (a LangGraph state machine)

```
context → planning → [human approval] → execution ⇄ recovery → verification → report
```

1. **Context** – lists/reads files, searches code, reads git history → a briefing.
2. **Planning** – emits a structured, ordered, atomic plan (`submit_plan` tool).
3. **Approval** – optional human gate (pause/approve the plan).
4. **Execution** – implements **one** step, then **runs its verification**.
5. **Recovery** – on failure, widens investigation and retries (capped).
6. **Verification** – runs the full suite, reviews the whole diff vs. the task.
7. **Report** – writes an honest `.agent/report.md`.
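
The routing between these phases can be pictured as a small transition function. The sketch below is dependency-free and is not the actual LangGraph graph from `graph.py`: the `Phase` enum, the `next_phase` function and its parameters are hypothetical names, and sending an exhausted recovery straight to the report is an assumption about what "escalates" means.

```python
from enum import Enum


class Phase(str, Enum):
    CONTEXT = "context"
    PLANNING = "planning"
    APPROVAL = "approval"
    EXECUTION = "execution"
    RECOVERY = "recovery"
    VERIFICATION = "verification"
    REPORT = "report"


def next_phase(phase, *, step_ok=True, steps_left=0, attempts=0,
               max_attempts=3, require_approval=True):
    """Return the phase that follows `phase` (hypothetical routing rules)."""
    if phase is Phase.CONTEXT:
        return Phase.PLANNING
    if phase is Phase.PLANNING:
        # The human gate is optional, so planning can go straight to execution.
        return Phase.APPROVAL if require_approval else Phase.EXECUTION
    if phase is Phase.APPROVAL:
        return Phase.EXECUTION
    if phase is Phase.EXECUTION:
        if not step_ok:
            return Phase.RECOVERY
        # A verified step either hands over to the next step or ends execution.
        return Phase.EXECUTION if steps_left > 0 else Phase.VERIFICATION
    if phase is Phase.RECOVERY:
        # Capped: once the attempts are used up, stop looping (assumed: report).
        return Phase.EXECUTION if attempts < max_attempts else Phase.REPORT
    if phase is Phase.VERIFICATION:
        return Phase.REPORT
    raise ValueError(f"no transition out of {phase}")
```

Keeping the routing in one pure function like this makes the retry cap easy to test without calling a model.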

State and plan live in `.agent/` so a run is inspectable and resumable.
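
A minimal sketch of what a durable plan could look like, assuming a JSON file for resuming next to the Markdown file for reading. Only `.agent/plan.md` is named in this README; `plan.json`, the `Step` fields and both function names are hypothetical, and the real `memory/plan_store.py` may be organised differently.

```python
import json
from dataclasses import dataclass, asdict
from pathlib import Path


@dataclass
class Step:
    title: str
    verify: str          # command that proves the step worked
    done: bool = False


def save_plan(workspace, task, steps):
    """Write the plan as JSON (for resuming) and Markdown (for reading)."""
    agent_dir = Path(workspace) / ".agent"
    agent_dir.mkdir(parents=True, exist_ok=True)
    payload = {"task": task, "steps": [asdict(s) for s in steps]}
    (agent_dir / "plan.json").write_text(json.dumps(payload, indent=2), encoding="utf-8")
    lines = [f"# Plan: {task}", ""]
    lines += [f"- [{'x' if s.done else ' '}] {s.title} (verify: `{s.verify}`)" for s in steps]
    (agent_dir / "plan.md").write_text("\n".join(lines) + "\n", encoding="utf-8")


def load_plan(workspace):
    """Reload the plan; return (task, steps, index of first unfinished step or None)."""
    text = (Path(workspace) / ".agent" / "plan.json").read_text(encoding="utf-8")
    payload = json.loads(text)
    steps = [Step(**s) for s in payload["steps"]]
    resume_at = next((i for i, s in enumerate(steps) if not s.done), None)
    return payload["task"], steps, resume_at
```

Because every step records its own `done` flag on disk, a crashed run can pick up at the first unfinished step instead of replanning.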

---

## Install

```bash
# from PyPI (once published)
pip install auzek

# or with pipx, so the `auzek` command is globally available in an isolated environment
pipx install auzek
```

From source (for development):

```bash
cd Autonomous_Agent
python -m venv .venv
. .venv/Scripts/activate            # Windows (Git Bash); PowerShell: .venv\Scripts\Activate.ps1
# or:  source .venv/bin/activate    # macOS/Linux
pip install -e .
```

## Configure keys

```bash
cp .env.example .env
# fill in the provider(s) you use, e.g. GROQ_API_KEY=...
```

Check what's wired up:

```bash
auzek providers
```

## Run

```bash
# operate on the current repo
auzek run "Add input validation to the /signup endpoint and a test for it"

# pick a provider/model explicitly (Groq example)
auzek run "Refactor utils.py to remove the duplicated date parsing" \
    --provider groq --model llama-3.3-70b-versatile

# point at another repo, auto-approve the plan, micro-commit each step
auzek run "Fix the failing login test" \
    --workspace ../my-project --yes --auto-commit
```

Useful flags: `--provider`, `--model`, `--api-key`, `--workspace`, `--yes`
(auto-approve), `--no-approval`, `--max-steps`, `--auto-commit`, `--temperature`.

Inspect the plan any time:

```bash
auzek plan-show --workspace ../my-project
```

---

## Configuration (`config.yaml`)

Verification commands are auto-detected when left blank; set them to be explicit:

```yaml
provider: anthropic
model: claude-sonnet-4-6
max_recovery_attempts: 3
max_steps: 40
auto_commit: false
require_plan_approval: true
test_command: "pytest -q"
lint_command: "ruff check ."
typecheck_command: "mypy ."
```

Resolution order: **CLI flags > env vars (`AGENT_*`) > `config.yaml` > defaults**.
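
The precedence rule can be illustrated with a plain dictionary merge. This is a sketch under assumptions, not the code in `config.py`: the `resolve` function is hypothetical, the values in `DEFAULTS` are placeholders, and mapping `AGENT_PROVIDER` to the `provider` key is a guess at how the `AGENT_*` prefix is applied.

```python
import os

# Placeholder defaults for illustration only.
DEFAULTS = {"provider": "anthropic", "max_recovery_attempts": 3, "auto_commit": False}


def resolve(cli, file_cfg, env=None, prefix="AGENT_"):
    """Merge settings, lowest precedence first so later layers win."""
    env = os.environ if env is None else env
    # Assumed mapping: AGENT_MAX_STEPS -> max_steps. Values stay strings here.
    env_cfg = {
        key[len(prefix):].lower(): value
        for key, value in env.items()
        if key.startswith(prefix)
    }
    merged = dict(DEFAULTS)
    for layer in (file_cfg, env_cfg, cli):
        # Skip None so an unset CLI flag does not mask a lower layer.
        merged.update({k: v for k, v in layer.items() if v is not None})
    return merged
```

For example, with `provider: openai` in `config.yaml`, `AGENT_PROVIDER=mistral` in the environment and `--provider groq` on the command line, this merge yields `groq`.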

---

## Project layout

```
src/auzek/
  cli.py            # Typer CLI, approval gate, output
  config.py         # layered config
  llm.py            # multi-provider gateway (LiteLLM) + key handling
  runtime.py        # shared deps + the core tool-calling loop
  state.py          # LangGraph state schema
  graph.py          # the state machine (nodes + conditional edges)
  prompts.py        # per-phase system prompts
  memory/plan_store.py   # the durable plan (json + markdown)
  tools/            # read/write/edit, list, search, shell, git
  nodes/            # context, planning, approval, execution, recovery,
                    # verification, report
```

---

## Adding a provider

Add one line to `PROVIDERS` in [llm.py](src/auzek/llm.py):

```python
"xai": ProviderSpec("xai", "XAI_API_KEY", "grok-2-latest"),
```

LiteLLM handles the wire format; nothing else changes.
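
For orientation, here is one way a registry like this could be shaped. It is a guess based on the one-liner above, not the contents of `llm.py`: the field names, the `litellm_model` and `has_key` helpers, and the Groq default model are all hypothetical. The `provider/model` string follows LiteLLM's usual naming convention.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderSpec:
    name: str            # LiteLLM provider prefix (assumed meaning)
    api_key_env: str     # environment variable holding the key
    default_model: str   # model used when none is given


PROVIDERS = {
    # The Groq default below is a placeholder, not the project's real default.
    "groq": ProviderSpec("groq", "GROQ_API_KEY", "llama-3.3-70b-versatile"),
    "xai": ProviderSpec("xai", "XAI_API_KEY", "grok-2-latest"),
}


def litellm_model(provider, model=None):
    """Build the "provider/model" string that LiteLLM routes on."""
    spec = PROVIDERS[provider]
    return f"{spec.name}/{model or spec.default_model}"


def has_key(provider, env=None):
    """Report whether the provider's API key is present and non-empty."""
    env = os.environ if env is None else env
    return bool(env.get(PROVIDERS[provider].api_key_env))
```

A check like `has_key` is the kind of lookup a command such as `auzek providers` could perform for each entry.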

---

## Safety

- All file access is sandboxed to the workspace; `deny_globs` blocks `.env`,
  `.git`, `node_modules`, etc.
- The shell tool has a destructive-command guardrail and output/time limits —
  but it is **not** a security boundary. For untrusted tasks, run in a
  container or VM.
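
The file sandbox can be sketched as a path check that runs before every read or write. This is an illustration under assumptions rather than the project's implementation: `resolve_in_workspace` is a hypothetical name, and matching each path component with `fnmatch` is a simplification of gitignore-style globbing.

```python
from fnmatch import fnmatch
from pathlib import Path

# Patterns are matched against each path component (a simplification).
DENY_GLOBS = [".env", ".git", "node_modules"]


def resolve_in_workspace(workspace, candidate):
    """Return the absolute path for `candidate`, or raise PermissionError."""
    root = Path(workspace).resolve()
    # Resolving first collapses ".." and symlinks, so escapes are caught below.
    target = (root / candidate).resolve()
    if not target.is_relative_to(root):
        raise PermissionError(f"{candidate!r} escapes the workspace")
    parts = target.relative_to(root).parts
    if any(fnmatch(part, pattern) for part in parts for pattern in DENY_GLOBS):
        raise PermissionError(f"{candidate!r} matches a deny glob")
    return target
```

Like the shell guardrail, a check of this kind limits accidents; it does not replace a container or VM for untrusted tasks.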

---

## A note on SWE-bench / "beating" other models

This is a production-shaped **harness**, not a model. On agentic benchmarks the
score is dominated by (a) the underlying model and (b) harness discipline:
plan/verify/self-heal loops, tight diffs, and real test execution, all of which
this implements. To measure it, wire `auzek run` to the SWE-bench task format
(clone the repo at the given commit, feed the issue as the task, export the
resulting `git diff` as the prediction patch) and run the official evaluation.
Treat any ranking as something you **measure**, not something you assume.
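
A rough sketch of that wiring for a single task, assuming the repo is already cloned at the task's commit. The three function names are hypothetical. The JSON keys follow the SWE-bench prediction format as commonly documented; check them against the harness version you run. Only the `--workspace` and `--yes` flags come from this README.

```python
import json
import subprocess


def run_agent(workspace, issue_text):
    """Run the agent non-interactively on one checked-out task repo."""
    subprocess.run(
        ["auzek", "run", issue_text, "--workspace", workspace, "--yes"],
        check=True,
    )


def collect_patch(workspace):
    """Return the uncommitted changes to tracked files as a unified diff."""
    result = subprocess.run(
        ["git", "diff"], cwd=workspace, check=True, capture_output=True, text=True
    )
    return result.stdout


def prediction_line(instance_id, patch, model_name="auzek"):
    """Serialise one prediction as a JSON Lines record."""
    return json.dumps(
        {"instance_id": instance_id, "model_name_or_path": model_name, "model_patch": patch}
    )
```

Plain `git diff` skips untracked files and anything already committed, so with `--auto-commit` the diff would need to be taken against the task's base commit instead.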

## License

MIT
