Metadata-Version: 2.1
Name: compendiumscribe
Version: 0.3.1
Summary: A package for automating the creation of comprehensive and organized domain knowledge bases for AI applications.
Author-Email: "B.T. Franklin" <brandon.franklin@gmail.com>
License: MIT License
         
         Copyright (c) 2024 B.T. Franklin
         
         Permission is hereby granted, free of charge, to any person obtaining a copy
         of this software and associated documentation files (the "Software"), to deal
         in the Software without restriction, including without limitation the rights
         to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
         copies of the Software, and to permit persons to whom the Software is
         furnished to do so, subject to the following conditions:
         
         The above copyright notice and this permission notice shall be included in all
         copies or substantial portions of the Software.
         
         THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
         IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
         FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
         AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
         LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
         OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
         SOFTWARE.
         
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Typing :: Typed
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Software Development :: Code Generators
Classifier: Topic :: Utilities
Classifier: Environment :: Console
Project-URL: Homepage, https://github.com/btfranklin/compendiumscribe
Project-URL: Issues, https://github.com/btfranklin/compendiumscribe/issues
Project-URL: Changelog, https://github.com/btfranklin/compendiumscribe/releases
Project-URL: Repository, https://github.com/btfranklin/compendiumscribe.git
Requires-Python: >=3.12
Requires-Dist: openai>=2.11.0
Requires-Dist: python-dotenv>=1.2.1
Requires-Dist: click>=8.3.1
Requires-Dist: promptdown>=1.1.5
Requires-Dist: mistune>=3.2.0
Requires-Dist: fpdf2>=2.8.5
Description-Content-Type: text/markdown

# Compendium Scribe

![Compendium Scribe banner](https://raw.githubusercontent.com/btfranklin/compendiumscribe/main/.github/social%20preview/compendiumscribe_social_preview.jpg "Compendium Scribe")

[![Build Status](https://github.com/btfranklin/compendiumscribe/actions/workflows/python-package.yml/badge.svg)](https://github.com/btfranklin/compendiumscribe/actions/workflows/python-package.yml) [![Supports Python versions 3.12+](https://img.shields.io/pypi/pyversions/compendiumscribe.svg)](https://pypi.python.org/pypi/compendiumscribe)

Compendium Scribe is a Click-driven command-line tool and library that uses OpenAI's **deep research** models to assemble a comprehensive research compendium for any topic. The workflow combines optional prompt refinement, a "deep research" call with web search tooling, and deterministic post-processing. It produces human-readable Markdown by default, backed by a rich XML data model that can also be exported.

---

## Features

- 🔍 **Deep research pipeline** — Orchestrates prompt planning, background execution, and tool-call capture with `o3-deep-research`.
- 🧱 **Rich data model** — Includes sections, insights, and citations for cross-format rendering.
- 🧾 **Structured XML output** — Produces a schema-friendly document ready for downstream conversion (HTML, Markdown, PDF pipelines, etc.).
- 🌐 **HTML Site Export** — Generates a static, multi-page HTML site with navigation and semantic structure.
- 🧩 **Skill Export** — Emits an AI agent skill folder with `SKILL.md` plus the compendium Markdown in `references/`.
- 🔄 **Re-rendering** — Ingest existing XML compendiums to generate new output formats without re-running costly research.
- ⚙️ **Configurable CLI** — Control background execution, tool call limits, and output paths via a unified command structure.
- 🧪 **Testable architecture** — Research orchestration is decoupled from the OpenAI client, making it simple to stub in tests.

---

## Quick Start

### 1. Install

```bash
pdm install --group dev
```

Ensure `PDM_HOME` points to a writable location when developing within a sandboxed environment.

### 2. Configure credentials

Create a `.env` file (untracked) with your OpenAI credentials:

```
OPENAI_API_KEY=sk-...
PROMPT_REFINER_MODEL=gpt-5.2
DEEP_RESEARCH_MODEL=o3-deep-research
SKILL_NAMER_MODEL=gpt-5.2
SKILL_WRITER_MODEL=gpt-5.2
POLLING_INTERVAL_IN_SECONDS=10
MAX_POLL_TIME_IN_MINUTES=60
```

Deep research requires an OpenAI account with the browsing tooling enabled. Document any environment keys for additional tooling in the repo as you add them.
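
As a rough illustration of how these variables might be consumed, the sketch below reads them from an environment-like mapping using only the standard library. The `Settings` dataclass and `load_settings` helper are hypothetical names, not part of Compendium Scribe, and the fallback values simply mirror the sample above rather than any documented defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for some of the variables in the sample .env."""

    api_key: str
    deep_research_model: str
    polling_interval_seconds: int
    max_poll_time_minutes: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from an environment-like mapping.

    Fallbacks mirror the sample .env above; they are assumptions,
    not the tool's documented defaults.
    """
    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return Settings(
        api_key=api_key,
        deep_research_model=env.get("DEEP_RESEARCH_MODEL", "o3-deep-research"),
        polling_interval_seconds=int(env.get("POLLING_INTERVAL_IN_SECONDS", "10")),
        max_poll_time_minutes=int(env.get("MAX_POLL_TIME_IN_MINUTES", "60")),
    )
```

Passing a plain dictionary instead of `os.environ` keeps a helper like this easy to exercise in tests without touching real credentials.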

### 3. Generate a compendium

Use the `create` subcommand to verify a topic and run the research process:

```bash
pdm run compendium create "Lithium-ion battery recycling"
```

**Options:**

- `--output PATH` — Base path/filename for the output (extension is ignored).
- `--no-background` — Force synchronous execution (useful for short or restricted queries).
- `--max-tool-calls N` — Cap the total number of tool calls for cost control.
- `--format FORMAT` — Output format (defaults to `md`). Available: `md`, `xml`, `html`, `pdf`, `skill`. Can be repeated for multiple outputs.

Example output file name: `lithium-ion-battery-recycling.md`.

The `skill` format writes a folder named after the skill. The folder contains `SKILL.md` and a `references/` directory holding the compendium Markdown under the standard output filename.
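
The example file name above suggests that the topic is turned into a lowercase, hyphen-separated slug. The sketch below shows one way such a name could be derived; `slugify_topic` and `output_filename` are hypothetical helpers, and the exact rule Compendium Scribe applies may differ.

```python
import re


def slugify_topic(topic: str) -> str:
    """Lowercase the topic and join alphanumeric runs with hyphens (assumed rule)."""
    words = re.findall(r"[a-z0-9]+", topic.lower())
    return "-".join(words)


def output_filename(topic: str, fmt: str = "md") -> str:
    """Build a file name from the slug and a format extension."""
    return f"{slugify_topic(topic)}.{fmt}"


print(output_filename("Lithium-ion battery recycling"))
# lithium-ion-battery-recycling.md
```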

### 4. Render formats from existing XML

If you have an existing XML compendium (e.g., `my-topic.xml`), you can re-render it into other formats:

```bash
pdm run compendium render my-topic.xml --format html
```

**Options:**

- `--format FORMAT` — Output format(s) to generate (`md`, `xml`, `html`, `pdf`, `skill`).
- `--output PATH` — Base path/filename for the output.

### 5. Recover from a timeout

If a research task times out (exceeding `MAX_POLL_TIME_IN_MINUTES`), recovery information is saved to `timed_out_research.json`. You can resume checking for its completion without starting over:

```bash
pdm run compendium recover
```

**Options:**

- `--input PATH` — Path to the recovery JSON file (defaults to `timed_out_research.json`).
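
To make the timeout behaviour concrete, here is a minimal sketch of a polling loop governed by an interval and a total time budget, in the spirit of `POLLING_INTERVAL_IN_SECONDS` and `MAX_POLL_TIME_IN_MINUTES`. The `poll_until_done` function and its `check` callback are hypothetical and do not reflect the tool's internal implementation.

```python
import time
from typing import Callable, Optional


def poll_until_done(
    check: Callable[[], Optional[str]],
    interval_seconds: float,
    max_minutes: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call check() until it returns a result or the time budget runs out.

    check is a hypothetical callback that returns None while the research
    task is still running and the finished payload once it completes.
    """
    deadline = clock() + max_minutes * 60
    while True:
        result = check()
        if result is not None:
            return result
        if clock() >= deadline:
            # A caller could save recovery information at this point.
            raise TimeoutError("research did not finish within the poll budget")
        sleep(interval_seconds)
```

Injecting `sleep` and `clock` lets a test drive the loop with a fake clock instead of waiting in real time.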

---

## Library Usage

```python
from compendiumscribe import build_compendium, ResearchConfig, DeepResearchError

try:
    compendium = build_compendium(
        "Emerging pathogen surveillance",
        config=ResearchConfig(
            background=False,
            max_tool_calls=30,
            max_poll_time_minutes=15,
        ),
    )
except DeepResearchError as exc:
    # Report the failure, then re-raise so callers can decide how to respond
    print(f"Deep research failed: {exc}")
    raise

xml_payload = compendium.to_xml_string()

# Alternate exports
markdown_doc = compendium.to_markdown()
html_files = compendium.to_html_site()  # Returns dict of filename -> content
pdf_bytes = compendium.to_pdf_bytes()
```

The returned `Compendium` object contains structured sections, insights, citations, and open questions.
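
Because `to_html_site()` returns a mapping of filename to content, persisting the site takes a short loop. The sketch below assumes the content values are strings; `write_site` is a hypothetical helper, not part of the library.

```python
from pathlib import Path


def write_site(files: dict[str, str], out_dir: Path) -> list[Path]:
    """Write each filename -> content pair under out_dir and return the paths."""
    written = []
    for name, content in files.items():
        target = out_dir / name
        # Create parent folders in case a filename contains subdirectories.
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")
        written.append(target)
    return written
```

With the objects from the example above, a call might look like `write_site(html_files, Path("site"))`, and the PDF bytes could be saved with `Path("compendium.pdf").write_bytes(pdf_bytes)`.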

---

## Data Model Overview

Compendium Scribe produces XML shaped like:

```xml
<compendium topic="Lithium-ion Battery Recycling" generated_at="2025-01-07T14:32:33+00:00">
  <overview><![CDATA[Comprehensive synthesis of the state of lithium-ion recycling...]]></overview>
  <methodology>
    <step><![CDATA[Surveyed peer-reviewed literature from 2022–2025]]></step>
    <step><![CDATA[Corroborated industrial capacity data with regulatory filings]]></step>
  </methodology>
  <sections>
    <section id="S01">
      <title><![CDATA[Technology Landscape]]></title>
      <summary><![CDATA[Dominant recycling modalities and throughput metrics...]]></summary>
      <key_terms>
        <term><![CDATA[hydrometallurgy]]></term>
        <term><![CDATA[direct recycling]]></term>
      </key_terms>
      <guiding_questions>
        <question><![CDATA[Which processes yield the highest cobalt recovery rates?]]></question>
      </guiding_questions>
      <insights>
        <insight>
          <title><![CDATA[Hydrometallurgy remains the throughput leader]]></title>
          <evidence><![CDATA[EPRI 2024 data shows >95% cobalt recovery in commercial plants.]]></evidence>
          <implications><![CDATA[Capital efficiency favors hydrometallurgy for near-term scaling.]]></implications>
          <citations>
            <ref>C1</ref>
          </citations>
        </insight>
      </insights>
    </section>
  </sections>
  <citations>
    <citation id="C1">
      <title><![CDATA[EPRI Lithium-ion Recycling Benchmarking 2024]]></title>
      <url><![CDATA[https://example.com/epri-li-benchmark]]></url>
      <publisher><![CDATA[EPRI]]></publisher>
      <published_at><![CDATA[2024-09-01]]></published_at>
      <summary><![CDATA[Performance metrics for recycling modalities across 12 facilities.]]></summary>
    </citation>
  </citations>
  <open_questions>
    <question><![CDATA[How will policy incentives shape regional plant siting post-2025?]]></question>
  </open_questions>
</compendium>
```

This format is intentionally verbose to support downstream transformation. Markdown links within text (e.g., `[Label](URL)`) are preserved in the XML to ensure they render correctly in final outputs.
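
As an illustration of downstream transformation, the following sketch reads XML of the shape shown above with the standard library's `xml.etree.ElementTree` and resolves each insight's `<ref>` entries to citation titles. The function names are hypothetical, and the element paths are taken from the sample document rather than from a published schema.

```python
import xml.etree.ElementTree as ET


def index_citations(root: ET.Element) -> dict[str, str]:
    """Map each top-level citation id to its title."""
    return {
        c.get("id", ""): (c.findtext("title") or "").strip()
        for c in root.findall("./citations/citation")
    }


def insights_with_sources(xml_text: str) -> list[tuple[str, list[str]]]:
    """Return (insight title, cited source titles) pairs for every insight."""
    root = ET.fromstring(xml_text)
    titles = index_citations(root)
    pairs = []
    for insight in root.findall("./sections/section/insights/insight"):
        refs = [r.text.strip() for r in insight.findall("./citations/ref") if r.text]
        insight_title = (insight.findtext("title") or "").strip()
        # Fall back to the raw id when a ref has no matching citation.
        pairs.append((insight_title, [titles.get(ref, ref) for ref in refs]))
    return pairs
```

CDATA sections need no special handling here, since the parser exposes their contents as ordinary element text.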

---

## Testing & Quality

- `pdm run test` — Executes the unit suite. Tests stub the OpenAI client, so they run offline.
- `pdm run lint` — Linting.
- `pdm build` — Produce distributable artifacts.

If `pdm` fails to write log files in restricted environments, set `PDM_HOME` to a writable directory (for example, `export PDM_HOME=.pdm_home`).

---

## Contributing

1. Fork and clone the repository.
2. Run `pdm install --group dev`.
3. Make changes following the style guide and update/add tests.
4. Run `pdm run test` and `pdm run lint`.
5. Open a pull request with:
   - A concise description of the change.
   - Verification commands executed locally.
   - Representative XML samples if the user-facing structure changes.
