Metadata-Version: 2.4
Name: paperpilot
Version: 1.0.3
Summary: A CLI research agent for AI-related paper search, code discovery, PDF collection, and bilingual reports.
Keywords: literature-review,research-agent,papers,arxiv,openalex,openreview
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: jinja2>=3.1
Requires-Dist: pypdf>=4.0
Requires-Dist: reportlab>=4.0

# PaperPilot

A command-line AI literature-search agent that follows a nine-stage v1.0 research workflow:

`Intake -> Protocol -> Search -> Corpus -> Screening -> Verification -> Synthesis -> Review -> Report`

## Quick Start

```bash
pip install -e .
PaperPilot
```

Command mode:

```bash
PaperPilot "LLM agent" --auto-confirm --max-papers 30 --since-year 2021
PaperPilot "RNA inverse folding sequence design" --github-filter required --mode apa --quality balanced
PaperPilot inspect runs/<task-id>
PaperPilot resume runs/<task-id>
```

## LLM Config

Recommended:

```bash
PaperPilot config set --base-url https://api.deepseek.com --model deepseek-chat
PaperPilot config import ./api.json
PaperPilot config show
PaperPilot config list
PaperPilot config use deepseek
```

The config is stored at `~/.paperpilot/config.json` (file mode `600` where supported). Running `PaperPilot` without an LLM config starts a setup wizard. Inside the interactive shell, use `/model` to add, import, switch, test, or delete model profiles.

Priority:

1. Environment variables: `OPENAI_API_KEY`, `OPENAI_BASE_URL`, `OPENAI_MODEL`
2. User config: `~/.paperpilot/config.json`
3. Legacy project file: `llmapi.txt`

Without a working LLM config, PaperPilot will pause and ask you to configure one first.

## Outputs

Each run writes:

- `task.json`
- `state.json`
- `manifest.json`
- `query_understanding.md`
- `plan.json`
- `protocol.json`
- `prompt_manifest.json`
- `registries.json`
- `events.jsonl`
- `metadata.json`
- `user_corpus_log.json`
- `corpus.json`
- `core_papers.json`
- `adjacent_papers.json`
- `excluded_papers.json`
- `ranked_papers.json`
- `verification.json`
- `quality_gate.json`
- `literature_matrix.json`
- `synthesis.json`
- `evidence_ledger.json`
- `review_agent_findings.json`
- `report.canonical.json`
- `reflection.json`
- `report.zh.md`
- `report.en.md`
- `report.zh.html`
- `report.en.html`
- `report.zh.pdf`
- `report.en.pdf`
- `download_log.json`
- `pdfs/`
- `fulltext/`
- `paper_notes.json`

The Chinese and English Markdown, HTML, and PDF reports are rendered from the same `report.canonical.json`, so paper lists and conclusions stay aligned.
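The single-source idea can be shown with a minimal sketch. The schema below (`papers`, per-language `conclusion` keys) is invented for illustration and is not the real `report.canonical.json` layout; the point is that both language renderings are projections of one canonical object, so their paper lists cannot drift apart.

```python
# Hypothetical canonical report; both language views are rendered from it.
canonical = {
    "papers": [
        {"title": "Attention Is All You Need", "year": 2017},
    ],
    "conclusion": {"en": "Transformers dominate.", "zh": "Transformer 占主导地位。"},
}

def render_markdown(report: dict, lang: str) -> str:
    # The paper list is rendered identically for every language.
    lines = ["# Report" if lang == "en" else "# 报告", ""]
    for p in report["papers"]:
        lines.append(f"- {p['title']} ({p['year']})")
    lines += ["", report["conclusion"][lang]]
    return "\n".join(lines)

zh_md = render_markdown(canonical, "zh")
en_md = render_markdown(canonical, "en")
```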

Local corpus input is supported with `--user-corpus <file-or-dir>` for PDF, BibTeX, RIS, Markdown, and text files. User corpus entries are logged in `user_corpus_log.json`; skipped files include reasons instead of being silently dropped.
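A sketch of that intake rule, with skip reasons recorded rather than files silently dropped. The function name and log shape are illustrative, not PaperPilot's actual `user_corpus_log.json` schema; the extension set mirrors the supported formats listed above.

```python
from pathlib import Path

SUPPORTED = {".pdf", ".bib", ".ris", ".md", ".txt"}

def scan_user_corpus(path: Path) -> dict:
    """Classify corpus files; skipped entries carry a reason instead of vanishing."""
    log = {"accepted": [], "skipped": []}
    files = [path] if path.is_file() else sorted(path.rglob("*"))
    for f in files:
        if not f.is_file():
            continue
        if f.suffix.lower() in SUPPORTED:
            log["accepted"].append(str(f))
        else:
            log["skipped"].append(
                {"file": str(f), "reason": f"unsupported extension {f.suffix!r}"}
            )
    return log
```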

## GitHub Filter

```bash
PaperPilot "retrieval augmented generation" --auto-confirm --github-filter required
```

- `any`: keep all papers and annotate code availability.
- `required`: the final report view keeps only papers with a detected public code link; the full screened corpus is still saved.
- `none`: keep only papers without a detected public code link.
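The three modes reduce to a simple predicate over each paper's detected code link. This sketch assumes a `code_url` field on the paper records; that name is an assumption, not PaperPilot's actual schema.

```python
def apply_github_filter(papers: list[dict], mode: str) -> list[dict]:
    """Filter the report view by detected code availability."""
    if mode == "any":
        return papers  # keep everything; availability is only annotated
    if mode == "required":
        return [p for p in papers if p.get("code_url")]
    if mode == "none":
        return [p for p in papers if not p.get("code_url")]
    raise ValueError(f"unknown github-filter mode: {mode!r}")
```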

## v1.0 Quality Layer

- `prompt_manifest.json` records versioned prompt roles and required JSON keys.
- `registries.json` records the built-in ToolRegistry and CapabilityRegistry.
- `events.jsonl` records stage start/progress/completion events for inspection.
- `protocol.json` records research question, inclusion/exclusion rules, sources, and negative keywords.
- `corpus.json` stores every screened paper with `core`, `adjacent`, or `exclude` decisions.
- `verification.json` records DOI, URL, PDF, and code confidence status.
- `quality_gate.json` emits `pass`, `retry`, or `needs_user_attention`.
- `evidence_ledger.json` maps report-level claims to paper citations or `MATERIAL GAP`.
- `review_agent_findings.json` records SourceVerifier, RelevanceJudge, CitationCompliance, and Devil's Advocate checks.
- `literature_matrix.json` and `synthesis.json` support APA-style reports with evidence limits and AI disclosure.
- Reports include a real review narrative: field background, method families, representative paper summaries, method comparison, trends, and open questions.
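Because `events.jsonl` is plain JSON Lines, a run can be inspected with a few lines of Python. The `stage` field name is an assumption for the sketch; any per-event key the file actually uses would work the same way.

```python
import json
from pathlib import Path

def stage_events(run_dir: Path, stage: str) -> list[dict]:
    """Read events.jsonl and keep only events for one pipeline stage."""
    events = []
    with open(run_dir / "events.jsonl", encoding="utf-8") as fh:
        for line in fh:
            if line.strip():  # tolerate blank lines
                ev = json.loads(line)
                if ev.get("stage") == stage:
                    events.append(ev)
    return events
```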
