Get multiple expert perspectives on your draft research paper in a few minutes. Upload a PDF and pick how many reviewers from a pool of AI personas should examine it; each selected reviewer produces structured review comments in parallel, and the results are clustered and ranked so that issues raised by multiple reviewers float to the top.
This system is a draft-polishing aid for papers you are writing. It is designed to help authors spot weaknesses in their own in-progress work before they submit (e.g., missing baselines, thin ablations, unclear contributions, reproducibility gaps, and writing issues) and to anticipate the kinds of questions reviewers are likely to ask.
The system ships with a default reviewer database for computer architecture (200 reviewers: 10 sub-domains × 20 personas). The reviewer database is a swappable input — you can build one for any research field externally and upload it through the Database page.
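The exact schema is whatever the Database page accepts; as a rough illustration (field names below are assumptions, not the actual schema), a single persona entry might pair a sub-domain with a reviewing focus and matching keywords:

```python
# Hypothetical reviewer-persona entry; field names are illustrative only.
reviewer_entry = {
    "sub_domain": "memory systems",            # one of the ~10 sub-domains
    "persona": "DRAM scheduling specialist",   # one of ~20 personas per sub-domain
    "keywords": ["DRAM", "row buffer", "memory controller"],
    "review_focus": "Expects rigorous baselines against state-of-the-art "
                    "schedulers and sensitivity studies across workloads.",
}
```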
Each step corresponds to a page in the navigation bar above. Start with Model to confirm your LLM provider is set up, then run Review on a draft. Once you have real human reviews back, use Validation to see how well the AI review matched them. After several validations accumulate, use Aggregation to surface the tuning suggestions that repeat across papers.
Defaults from config.yaml; adjustable per session
Pick which LLM provider and model run reviews, and (optionally) a different one for validation. Supported providers include Anthropic, OpenAI, Google Gemini, xAI, GitHub Models, GitHub Copilot SDK, and any OpenAI-compatible endpoint (Ollama, Together, Groq, DeepSeek, local llama.cpp, …). The page shows which providers have credentials configured, and you can change provider / model / base URL via the form; changes apply to the current session and are cleared on server restart. For permanent changes, edit config.yaml directly.
Output: review and validation models.
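As a rough sketch of the defaults-plus-overrides behaviour described above (key names and file layout are assumptions, not the tool's actual config schema), the effective model settings for a session might be computed like this:

```python
# Illustrative only: per-session overrides layered over config.yaml defaults.
# The "review_model" key and field names are assumptions, not the real schema.
import yaml

def effective_model_config(session_overrides: dict, path: str = "config.yaml") -> dict:
    with open(path) as f:
        defaults = yaml.safe_load(f) or {}     # permanent defaults from disk
    # Overrides win for this session only; they live in memory, so a server
    # restart falls back to the config.yaml values.
    return {**defaults.get("review_model", {}), **session_overrides}

# e.g. effective_model_config({"provider": "openai", "model": "gpt-4o"})
```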
Upload a PDF and get a ranked review report
Upload your draft PDF, pick a reviewer database, and choose how many reviewers to run. The system extracts text and keywords from the paper, selects the top-N reviewer personas most relevant to your topic (diversified across personas), and runs them in parallel against the configured LLM, with each reviewer emitting structured comments; it then clusters similar comments and ranks the clusters by commonality × severity. A separate writing-clarity reviewer always runs on every paper. It focuses strictly on writing quality (flow, terminology, figures, structure), is not clustered with the domain reviewers, and never enters Validation: writing feedback is for polishing your prose, not for persona calibration.
Output: per-reviewer comments, ranked comments, and writing-clarity comments.
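A minimal sketch of the ranking idea, assuming each cluster records which reviewers raised it and how severe they judged it (field names and the exact formula are illustrative, not the tool's implementation):

```python
# Clusters raised by more reviewers, with higher severity, float to the top.
from statistics import mean

def rank_clusters(clusters):
    """clusters: list of dicts such as
    {"issue": "...", "reviewers": {"R3", "R7"}, "severities": [4, 5]}"""
    def score(c):
        commonality = len(c["reviewers"])   # how many personas raised the issue
        severity = mean(c["severities"])    # average severity within the cluster
        return commonality * severity
    return sorted(clusters, key=score, reverse=True)
```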
Compare the AI review against real human reviews
Once you receive real human peer reviews, upload them alongside the AI review to see a calibration report: which persona prompts the AI is getting right, where it misses human-raised issues, and where it false-alarms. The validator converts the human review into the same structured format the AI pipeline uses, then asks the LLM for a similarity score between every (human comment, AI comment) pair across parallel chunked calls, and derives hit / partial / miss verdicts from the resulting matrix. Use the report to hand-tune the reviewer database over multiple papers.
Output: validation report and calibration delta.
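A minimal sketch of how verdicts could be derived from such a similarity matrix, assuming one verdict per human comment (the 0.75 / 0.40 cutoffs are illustrative assumptions, not the validator's actual thresholds):

```python
# similarity[i, j] = LLM-scored similarity between human comment i and
# AI comment j, in [0, 1]. Thresholds below are illustrative only.
import numpy as np

def verdicts(similarity: np.ndarray, hit: float = 0.75, partial: float = 0.40):
    out = []
    for i in range(similarity.shape[0]):
        best = similarity[i].max() if similarity.shape[1] else 0.0
        out.append("hit" if best >= hit else "partial" if best >= partial else "miss")
    return out
```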
Cross-paper aggregation of calibration deltas to help adjust reviewer personas
After you have run Validation on several papers, Aggregation groups the resulting suggestions across all completed validations and surfaces only those that repeat across papers. A single paper's calibration delta is noisy, since one human reviewer's idiosyncrasies can make any persona look miscalibrated; what survives the cross-paper filter is robust advice on which reviewer-database knobs to turn.
Output: cross-paper aggregation report.
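A minimal sketch of the recurrence filter, assuming each validation contributes a flat list of tuning suggestions (the data shape and the two-paper threshold are illustrative assumptions):

```python
# Keep only tuning suggestions that recur across independently validated papers.
from collections import defaultdict

def recurring_suggestions(validations, min_papers: int = 2):
    """validations: one suggestion list per paper, e.g.
    [[("persona_17", "raise severity weighting"), ...], ...]"""
    seen_in = defaultdict(set)
    for paper_idx, suggestions in enumerate(validations):
        for suggestion in suggestions:
            seen_in[suggestion].add(paper_idx)   # track which papers raised it
    return [s for s, papers in seen_in.items() if len(papers) >= min_papers]
```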