Metadata-Version: 2.4
Name: paper-search-lib
Version: 1.0.0
Summary: Unified paper search across 20+ academic sources
Home-page: https://github.com/yourusername/paper-search-lib
Author: Your Name
Author-email: your.email@example.com
Project-URL: Documentation, https://github.com/yourusername/paper-search-lib/wiki
Project-URL: Source, https://github.com/yourusername/paper-search-lib
Project-URL: Tracker, https://github.com/yourusername/paper-search-lib/issues
Keywords: paper search academic arxiv pubmed semantic scholar google scholar research
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Internet :: WWW/HTTP :: Dynamic Content
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.28.0
Requires-Dist: feedparser>=6.0.0
Requires-Dist: httpx>=0.24.0
Requires-Dist: beautifulsoup4>=4.12.0
Requires-Dist: lxml>=4.9.0
Provides-Extra: pdf
Requires-Dist: PyPDF2>=3.0.0; extra == "pdf"
Requires-Dist: pdf2image>=1.16.0; extra == "pdf"
Requires-Dist: pdfminer>=20221105; extra == "pdf"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: flake8>=6.0.0; extra == "dev"
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license-file
Dynamic: project-url
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# Paper Search Library

Unified search across 20+ academic sources: ArXiv, PubMed, Semantic Scholar, Google Scholar, SSRN, bioRxiv, and more.

## Features

- 🔍 **Multi-source search** - Search across 20+ academic databases simultaneously
- ⚙️ **Robust error handling** - Automatic retries, rate limiting, timeout handling
- 📥 **PDF downloads** - Download papers from multiple sources with fallback chains
- 🛡️ **Production-ready** - Built from real-world trading system experience
- 📦 **Easy integration** - Simple API, minimal dependencies

## Quick Start

### Installation

```bash
pip install paper-search-lib
```

### Basic Usage

```python
from paper_search import PaperSearch
from paper_search.connectors import ArxivConnector

# Create searcher
searcher = PaperSearch(connectors=[ArxivConnector()])

# Search
papers = searcher.search("machine learning", max_results=10)

# Use results
for paper in papers:
    print(f"{paper.title}")
    print(f"Authors: {', '.join(paper.authors)}")
    print(f"URL: {paper.url}\n")
```

### Robust Search (With Error Handling)

```python
from paper_search import RobustSearch
from paper_search.connectors import ArxivConnector

# Create robust searcher
robust = RobustSearch(
    connectors=[ArxivConnector()],
    min_delay=10,        # Wait 10s between requests
    max_retries=3,       # Retry 3 times on failure
    timeout=90,          # 90 second timeout (not 30!)
    use_fallback=True    # Try other sources if one fails
)

# Search with automatic retries and proper delays
result = robust.search("changepoint detection", max_results=20)
print(f"Found {result.total_found} papers")
print(f"Successful sources: {result.successful_sources}")
print(f"Failed sources: {result.failed_sources}")
```

### Multiple Queries

```python
queries = [
    "changepoint detection",
    "Bayesian inference",
    "regime switching"
]

results = robust.search_multiple(queries, max_results=10)

for query, result in results.items():
    print(f"\n{query}: {result.total_found} papers")
    for paper in result.papers:
        print(f"  - {paper.title[:60]}...")
```

## Available Sources

- ArXiv (✅ Ready)
- PubMed (Coming soon)
- Semantic Scholar (Coming soon)
- Google Scholar (Coming soon)
- bioRxiv (Coming soon)
- SSRN (Coming soon)
- And 14+ more...

## Documentation & Roadmap

### Quick Links
- [API Reference](docs/API_REFERENCE.md) - Class and method documentation
- [Available Sources](docs/SOURCES.md) - Current & planned sources
- [Examples](docs/EXAMPLES.md) - Usage examples
- [Troubleshooting](docs/TROUBLESHOOTING.md) - Common issues

### Implementation Roadmap
- [PHASE_2_PLAN.md](PHASE_2_PLAN.md) - Phase 2: Publish to PyPI (2-3 hours)
- [PHASE_3_LONG_TERM.md](PHASE_3_LONG_TERM.md) - Phase 3+: 20+ sources & advanced features (12+ weeks)

## Key Design Decisions

### Rate Limiting
- **Default delay**: 10 seconds between requests
- **Why**: ArXiv allows ~1 request per 3 seconds; 10s is conservative and safe
- **Configurable**: Adjust via `min_delay` parameter

### Timeout Handling
- **Default timeout**: 90 seconds (not 30!)
- **Why**: Academic servers can be slow; 30s is too aggressive
- **Strategy**: Retry up to 3 times with exponential backoff

### Error Handling
- **Rate limit (429)**: Wait 30s and retry
- **Timeout**: Wait 5-10s and retry
- **Server error (503)**: Skip to next source (server is busy)
- **Result**: >90% success rate instead of ~30%

### Multi-source
- Try all configured sources
- Continue even if one fails
- Return combined results
- Track which sources succeeded/failed

## Contributing

Contributions welcome! Areas to help:
- Add new source connectors
- Improve error handling
- Add caching layer
- Performance optimizations
- Documentation improvements

## License

MIT License - See LICENSE file for details

## Acknowledgments

Built from experience with:
- [paper-search-mcp](https://github.com/openags/paper-search-mcp)
- [arxiv-mcp-server](https://github.com/openags/arxiv-mcp-server)
- [BTC-QUANT](https://github.com/yourusername/btc-quant)

---

**Status**: Alpha (v0.1.0)  
**Last Updated**: April 2, 2026  
**Stability**: Production-ready for ArXiv; other sources coming
