Metadata-Version: 2.4
Name: med-discover-ai
Version: 1.0.11
Summary: Med-Discover is an AI-powered tool designed to assist biomedical researchers by leveraging Retrieval-Augmented Generation (RAG) with fine-tuned LLMs on PubMed literature. It enables efficient document retrieval, knowledge extraction, and interactive querying of biomedical research papers, helping researchers find relevant insights quickly. The package supports both GPU-based embeddings (MedCPT) and CPU-friendly alternatives (OpenAI Ada-002 embeddings), making it accessible to a wide range of users.
License: CC BY-NC-ND 4.0
Author: VatsalPatel18
Author-email: vatsal1804@gmail.com
Requires-Python: >=3.11,<4.0
Classifier: License :: Other/Proprietary License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Requires-Dist: datasets (>=2.14.0)
Requires-Dist: faiss-cpu (>=1.7.0)
Requires-Dist: gradio (>=4.0.0)
Requires-Dist: nltk (>=3.8.0)
Requires-Dist: numpy (>=1.23.0)
Requires-Dist: ollama (>=0.2.0)
Requires-Dist: openai (>=1.0.0)
Requires-Dist: pandas (>=2.0.0)
Requires-Dist: pypdf2 (>=3.0.0)
Requires-Dist: python-dotenv (>=1.0.0)
Requires-Dist: ragas (>=0.1.11)
Requires-Dist: rouge-score (>=0.1.0)
Requires-Dist: scipy (>=1.10.0)
Requires-Dist: statsmodels (>=0.14.0)
Requires-Dist: torch (>=2.0.0)
Requires-Dist: torchvision (>=0.15.0,<0.30.0)
Requires-Dist: transformers (>=4.30.0,<4.50.0)
Description-Content-Type: text/markdown

# MedDiscover
MedDiscover is an AI-powered tool that assists biomedical researchers by combining Retrieval-Augmented Generation (RAG) with LLMs fine-tuned on PubMed literature.

## CLI evaluation (headless)
Install the package (or use it in editable mode), set your `OPENAI_API_KEY`, and run the built-in evaluator:

```bash
pip install .
export OPENAI_API_KEY=...
# optional: ALLOW_MEDCPT_CPU=1 to force MedCPT on CPU

meddiscover-eval \
  --pdfs med_discover_ai/eval_samples/sample_pdfs/fmed-11-1345659.pdf med_discover_ai/eval_samples/sample_pdfs/s10549-023-07033-8.pdf \
  --qa_csv med_discover_ai/eval_samples/sample_qa.csv \
  --embedding_model "MedCPT (GPU Recommended)" \
  --llm_models gpt-4.1-mini \
  --k 3 \
  --max_tokens 64 \
  --out_dir ./eval_outputs_demo

# Evaluate both decoders in one run (example)
meddiscover-eval \
  --pdfs med_discover_ai/eval_samples/sample_pdfs/fmed-11-1345659.pdf med_discover_ai/eval_samples/sample_pdfs/s10549-023-07033-8.pdf \
  --qa_csv med_discover_ai/eval_samples/sample_qa.csv \
  --embedding_model "MedCPT (GPU Recommended)" \
  --llm_models gpt-4.1-mini,gpt-4.1-nano \
  --k 3 --max_tokens 64 --out_dir ./eval_outputs_all
```

- For Ada-based retrieval, switch `--embedding_model` to `OpenAI Ada-002 (CPU/Cloud)`.
- RAGAS metrics are optional; if their dependencies are missing or the QA CSV lacks a `reference` column, the metrics are reported as `None`.
- Re-ranking stays disabled on CPU; enable `--rerank` only when a GPU and cross-encoder are available.
- Ollama models are available in both the UI and CLI (prefix with `ollama:`), e.g. `--llm_models ollama:gemma3:4b`.
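The QA CSV passed via `--qa_csv` can also be generated programmatically. A minimal sketch using only the standard library; the `reference` column comes from the RAGAS note above, while the `question` column name is an assumption, so consult the bundled `sample_qa.csv` for the exact schema:

```python
import csv

# Hypothetical rows; only the `reference` column is documented above
# (RAGAS uses it as ground truth). The `question` column is assumed.
rows = [
    {"question": "Which biomarker does the paper discuss?", "reference": "HER2"},
    {"question": "What cohort size was used?", "reference": "120 patients"},
]

with open("my_qa.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["question", "reference"])
    writer.writeheader()
    writer.writerows(rows)
```

Point `--qa_csv` at the resulting file in place of `sample_qa.csv`.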

