Metadata-Version: 2.4
Name: lemma-ai
Version: 2.0.0
Summary: Local-first paper manager with semantic search and LLM reasoning
Home-page: https://github.com/alMohimanul/lemma
Author: Mahir
Author-email: aislam192054@gmail.com
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: click>=8.1.0
Requires-Dist: pypdf2>=3.0.0
Requires-Dist: pdfplumber>=0.11.0
Requires-Dist: sentence-transformers>=2.3.0
Requires-Dist: faiss-cpu>=1.7.4
Requires-Dist: sqlalchemy>=2.0.0
Requires-Dist: groq>=0.4.0
Requires-Dist: google-generativeai>=0.3.0
Requires-Dist: httpx>=0.26.0
Requires-Dist: rich>=13.7.0
Requires-Dist: python-dotenv>=1.0.0
Provides-Extra: dev
Requires-Dist: pytest>=7.4.0; extra == "dev"
Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
Requires-Dist: ruff>=0.1.9; extra == "dev"
Requires-Dist: mypy>=1.8.0; extra == "dev"
Requires-Dist: pre-commit>=3.5.0; extra == "dev"
Requires-Dist: build>=1.0.0; extra == "dev"
Requires-Dist: twine>=4.0.0; extra == "dev"
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license-file
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# 📚 Lemma - Local-First Research Paper Manager

> A privacy-first research paper manager with local semantic search and optional AI-powered insights.

[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

## ✨ Features

- 🔒 **Privacy-First**: All papers stored locally, no cloud uploads
- 🚀 **Fast Semantic Search**: Local vector search across all papers
- 🤖 **AI Q&A (Optional)**: Ask questions using cloud LLMs
- 📊 **Auto-Processing**: One command to scan, rename, and embed
- 🔄 **Incremental Updates**: Re-embeds only new or changed papers, 70-90% faster than re-embedding everything
- 📂 **Smart Cleanup**: Automatically removes deleted papers from database
- 👀 **Watch Mode**: Auto-process new papers as they're added
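
The incremental-update idea can be sketched roughly as follows. This is an illustration under stated assumptions, not Lemma's actual implementation: it assumes changes are detected by SHA-256 content hash, and `file_digest`, `files_to_embed`, and the `known_hashes` mapping are hypothetical names.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_to_embed(folder: Path, known_hashes: dict[str, str]) -> list[Path]:
    """Return PDFs in `folder` that are new or whose content changed.

    `known_hashes` maps a file name to the digest recorded at the last run
    (a hypothetical stand-in for whatever the real database stores).
    """
    pending = []
    for pdf in sorted(folder.glob("*.pdf")):
        if known_hashes.get(pdf.name) != file_digest(pdf):
            pending.append(pdf)
    return pending
```

Only the files returned by `files_to_embed` would need new embeddings; unchanged papers keep their stored vectors, which is where the time saving would come from.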

## 🚀 Quick Start

### 1. Installation

```bash
# Run from the root of a cloned repository
pip install -r requirements.txt
```

### 2. Set Your Papers Folder (One-Time Setup)

```bash
# Set default papers directory and process all papers
lemma sync ~/Papers --set-default

# That's it! Your papers are now:
# ✓ Scanned and indexed
# ✓ Renamed with metadata
# ✓ Embedded for semantic search
# ✓ Ready for questions
```

### 3. Add New Papers (Automatic)

**Option A: Manual Sync**
```bash
# Just drop PDFs into ~/Papers, then run:
lemma sync
```

**Option B: Auto-Sync (Watch Mode)**
```bash
# Start watching (leave running in terminal)
lemma sync --watch

# Now just drop PDFs into ~/Papers
# They're automatically processed in seconds!
```

### 4. Query Your Papers

```bash
lemma ask "What are the main findings?"
```

## 📖 Common Workflows

### First-Time Setup
```bash
# 1. Set your papers folder and sync everything
lemma sync ~/Papers --set-default

# 2. Query your papers
lemma ask "What is the main contribution?"
```

### Daily Use
```bash
# Download new papers to ~/Papers, then:
lemma sync

# Or enable auto-processing:
lemma sync --watch  # Leave running
```

### Browse Your Library
```bash
lemma list                    # List all papers
lemma search "transformers"   # Search by keyword
lemma show 5                  # Show paper details
```

## 🔧 Advanced Usage

### Sync Options
```bash
lemma sync                    # Use default directory
lemma sync ~/Papers           # Specify directory
lemma sync --no-rename        # Skip automatic renaming
lemma sync --no-embed         # Skip embedding (faster)
lemma sync --watch            # Continuous monitoring
```
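
For intuition, continuous monitoring can be approximated by polling the folder. The sketch below is an assumption about the general technique, not a description of how `lemma sync --watch` is implemented; `new_pdfs`, `watch`, and their parameters are hypothetical names.

```python
import time
from pathlib import Path
from typing import Callable, Optional


def new_pdfs(folder: Path, seen: set[str]) -> list[Path]:
    """Return PDFs not seen before and record their names in `seen`."""
    fresh = [p for p in sorted(folder.glob("*.pdf")) if p.name not in seen]
    seen.update(p.name for p in fresh)
    return fresh


def watch(folder: Path, handle: Callable[[Path], None],
          interval: float = 2.0, max_polls: Optional[int] = None) -> None:
    """Poll `folder` and call `handle` on each newly found PDF.

    With `max_polls=None` the loop runs until interrupted (Ctrl+C).
    """
    seen: set[str] = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for pdf in new_pdfs(folder, seen):
            handle(pdf)
        polls += 1
        # Sleep only if another poll will follow.
        if max_polls is None or polls < max_polls:
            time.sleep(interval)
```

For example, `watch(Path.home() / "Papers", print)` would print each PDF once as it appears. Polling is simple and portable; an event-based file watcher would react faster at the cost of an extra dependency.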

### Manual Control (If Needed)
```bash
lemma scan ~/Papers           # Just scan (no rename/embed)
lemma organize                # Rename existing files
lemma embed                   # Generate embeddings only
lemma embed-status            # Check embedding coverage
lemma verify --remove         # Clean up missing files
```
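
Conceptually, cleaning up missing files means deleting database rows whose PDF is no longer on disk. The following is a minimal sketch using the standard-library `sqlite3` module; the `papers(id, path)` table and the `remove_missing` function are hypothetical and do not reflect Lemma's real schema.

```python
import sqlite3
from pathlib import Path


def remove_missing(conn: sqlite3.Connection) -> list[str]:
    """Delete rows whose file no longer exists; return the removed paths.

    Assumes a hypothetical table `papers(id, path)`.
    """
    rows = conn.execute("SELECT id, path FROM papers").fetchall()
    missing = [(paper_id, path) for paper_id, path in rows
               if not Path(path).exists()]
    conn.executemany("DELETE FROM papers WHERE id = ?",
                     [(paper_id,) for paper_id, _ in missing])
    conn.commit()
    return [path for _, path in missing]
```

Returning the removed paths makes it easy to report what was cleaned up before or after the deletion.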

## 🤖 AI Q&A Setup (Optional)

Set up an API key to enable question answering:

```bash
# Option 1: Environment variable
export GROQ_API_KEY="your_key_here"

# Option 2: .env file (>> appends, so existing entries are kept)
mkdir -p ~/.lemma
echo "GROQ_API_KEY=your_key_here" >> ~/.lemma/.env

# Then ask questions
lemma ask "What are the main approaches discussed?"
```

**Get Free API Keys:**
- [Groq](https://console.groq.com/) - Fast and generous free tier (recommended)
- [Google Gemini](https://makersuite.google.com/) - Alternative option
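
The two key sources described above could be combined along these lines. This is a sketch only: it assumes the environment variable takes precedence over the `.env` file (the default behaviour of `python-dotenv`, which is listed as a dependency), and `read_env_file` and `resolve_api_key` are hypothetical helpers written with the standard library.

```python
import os
from pathlib import Path
from typing import Optional


def read_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    if not path.is_file():
        return values
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_api_key(name: str, env_file: Path) -> Optional[str]:
    """Prefer the process environment, then fall back to the .env file."""
    return os.environ.get(name) or read_env_file(env_file).get(name)
```

For example, `resolve_api_key("GROQ_API_KEY", Path.home() / ".lemma" / ".env")` returns `None` when no key is configured, which a caller could treat as "Q&A disabled".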

## 📋 Key Commands

| Command | Description |
|---------|-------------|
| `lemma sync` | Auto-process papers (scan + rename + embed) |
| `lemma sync --watch` | Monitor folder and auto-process new papers |
| `lemma list` | List all indexed papers |
| `lemma ask <question>` | Ask questions across papers (requires API key) |
| `lemma search <query>` | Search papers by keyword |
| `lemma show <id>` | Show paper details |
| `lemma embed-status` | Check embedding coverage |

## 🔒 Privacy

- **All papers stay on your machine** - never uploaded anywhere
- **Embeddings generated locally** - no external API calls
- **Cloud APIs only for Q&A** - and only if you configure them
- **Database stored locally** at `~/.lemma/lemma.db`
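
To illustrate why search needs no network access, here is a toy sketch of local vector search by cosine similarity. It is not Lemma's code: real vectors would come from a locally run embedding model and would likely live in a dedicated index (`sentence-transformers` and `faiss-cpu` are listed as dependencies), whereas the hand-written vectors and the `cosine` and `top_k` functions here are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], index: dict[str, list[float]],
          k: int = 3) -> list[tuple[str, float]]:
    """Return the `k` entries of `index` most similar to `query`."""
    scored = [(name, cosine(query, vec)) for name, vec in index.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]
```

Everything in this sketch runs in-process on data already on disk, so neither the papers nor the query leave the machine.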

## 📦 Requirements

- Python 3.10 or higher
- ~500MB disk space for embedding models
- Internet connection only for optional AI Q&A (and, typically, a one-time download of the embedding model)

## License

MIT License. See the LICENSE file for details.

## Support

- Report issues: [GitHub Issues](https://github.com/alMohimanul/lemma/issues)
- Questions: Open a discussion on GitHub
